Slide 2
![From the very beginning ...AACCCGTACGTTTTGCAAACGACCGT...](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-1.jpg)
From the very beginning
...AACCCGTACGTTTTGCAAACGACCGT...
Slide 3
![From the very beginning Sequencing ...AACCCGTACGTTTTGCAAACGACCGT... AACCCGTACGT CGTACGTTTTG AACGACCG GTTTTGCAAACG GTACGTTTTGCA](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-2.jpg)
From the very beginning
Sequencing
...AACCCGTACGTTTTGCAAACGACCGT...
AACCCGTACGT
CGTACGTTTTG
AACGACCG
GTTTTGCAAACG
GTACGTTTTGCA
Slide 4
![From the very beginning Sequencing Coverage ...AACCCGTACGTTTTGCAAACGACCGT... AACCCGTACGT CGTACGTTTTG AACGACCG GTTTTGCAAACG GTACGTTTTGCA 3x 2x](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-3.jpg)
From the very beginning
Sequencing
Coverage
...AACCCGTACGTTTTGCAAACGACCGT...
AACCCGTACGT
CGTACGTTTTG
AACGACCG
GTTTTGCAAACG
GTACGTTTTGCA
3x
2x
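As a rough illustration of what coverage means, the sketch below counts how many of the slide's reads overlap each reference position. Placing reads with an exact string match is an assumption made for this toy example; real reads need an aligner.

```python
REFERENCE = "AACCCGTACGTTTTGCAAACGACCGT"
READS = ["AACCCGTACGT", "CGTACGTTTTG", "AACGACCG", "GTTTTGCAAACG", "GTACGTTTTGCA"]


def per_base_coverage(reference, reads):
    """Count how many reads cover each reference position."""
    coverage = [0] * len(reference)
    for read in reads:
        # Assumption: each read matches the reference exactly and only once.
        start = reference.find(read)
        if start == -1:
            continue  # no exact match, skip the read in this sketch
        for pos in range(start, start + len(read)):
            coverage[pos] += 1
    return coverage


coverage = per_base_coverage(REFERENCE, READS)
mean_coverage = sum(coverage) / len(REFERENCE)
print(coverage)
print(f"mean coverage: {mean_coverage:.2f}x")
```

The mean coverage is simply the total number of sequenced bases divided by the reference length.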
Slide 5
![From the very beginning Sequencing Coverage Errors Mismatches ...AACCCGTACGTTTTGCAAACGACCGT... AACCCGTTCGT CGTACGTTTTC AACGACCG GTTTTGCAAACG GTACGTTTTGCA](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-4.jpg)
From the very beginning
Sequencing
Coverage
Errors
Mismatches
...AACCCGTACGTTTTGCAAACGACCGT...
AACCCGTTCGT
CGTACGTTTTC
AACGACCG
GTTTTGCAAACG
GTACGTTTTGCA
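A minimal sketch of how the mismatches in the first two reads could be located. The start offsets (0 and 4) are assumptions read off the slide's alignment, not something computed here.

```python
REFERENCE = "AACCCGTACGTTTTGCAAACGACCGT"


def mismatches(read, reference, start):
    """Return (reference position, reference base, read base) per mismatch."""
    window = reference[start:start + len(read)]
    return [
        (start + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]


# Assumed alignment offsets for the two erroneous reads on the slide.
first = mismatches("AACCCGTTCGT", REFERENCE, 0)
second = mismatches("CGTACGTTTTC", REFERENCE, 4)
print(first)
print(second)
```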
Slide 6
![From the very beginning Sequencing Coverage Errors Mismatches Indels ...AACCCGTACGTTTTGCAAACGACCGT... AACCCGTTCGT CGTACGTTTTTC AACGACCG GTTTTGCAAACG GTA_GTTTTGCA](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-5.jpg)
From the very beginning
Sequencing
Coverage
Errors
Mismatches
Indels
...AACCCGTACGTTTTGCAAACGACCGT...
AACCCGTTCGT
CGTACGTTTTTC
AACGACCG
GTTTTGCAAACG
GTA_GTTTTGCA
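Indels shift the read relative to the reference, so position-by-position comparison no longer works. One common way to measure the difference is edit (Levenshtein) distance; the sketch below is a plain dynamic-programming version, used here only as an illustration. The reference fragments are taken from the slide's alignment.

```python
def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# "_" on the slide marks a deleted base, so it is removed before comparing.
deletion_read = "GTA_GTTTTGCA".replace("_", "")
print(edit_distance(deletion_read, "GTACGTTTTGCA"))
print(edit_distance("CGTACGTTTTTC", "CGTACGTTTTG"))
```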
Slide 7
![Early days Sanger sequencing Long reads (~900 bp) Low coverage](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-6.jpg)
Early days
Sanger sequencing
Long reads (~900 bp)
Low coverage (< 10x)
Extreme cost
Human Genome Project
3 Gbp
3 billion USD
10 years
Slide 8
![NGS Shorter reads (25-400bp) High coverage (50-1000x) Huge amount of](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-7.jpg)
NGS
Shorter reads (25-400 bp)
High coverage (50-1000x)
Huge amount of data
Low cost
More applications
Required completely new algorithms
Slide 9
![NGS technologies](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-8.jpg)
Slide 10
![Illumina sequencing http://www.youtube.com/watch?v=77r5p8IBwJk](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-9.jpg)
Illumina sequencing
http://www.youtube.com/watch?v=77r5p8IBwJk
Slide 11
![IonTorrent sequencing https://www.youtube.com/watch?v=WYBzbxIfuKs](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-10.jpg)
IonTorrent sequencing
https://www.youtube.com/watch?v=WYBzbxIfuKs
Slide 12
![Paired reads AACCCGTACGTTTTGCAAACGACCGTAACCAAATTGG AACCCGTACGT........TAACCAAATTGG insert size Paired-end ( Mate-pairs (1 - 20 kbp)](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-11.jpg)
Paired reads
AACCCGTACGTTTTGCAAACGACCGTAACCAAATTGG
AACCCGTACGT........TAACCAAATTGG
insert size
Paired-end (< 1 kbp)
Mate-pairs (1 - 20 kbp)
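A small sketch of the insert size for the pair shown on this slide, assuming "insert size" means the outer distance from the start of the left read to the end of the right read (definitions vary between tools). For simplicity the right read is kept in the forward orientation, as drawn on the slide.

```python
FRAGMENT = "AACCCGTACGTTTTGCAAACGACCGTAACCAAATTGG"
LEFT_READ = "AACCCGTACGT"
RIGHT_READ = "TAACCAAATTGG"


def insert_size(fragment, left, right):
    """Outer distance between the two reads of a pair on the fragment."""
    left_start = fragment.find(left)
    right_start = fragment.rfind(right)
    if left_start == -1 or right_start == -1:
        raise ValueError("read not found in fragment")
    return right_start + len(right) - left_start


size = insert_size(FRAGMENT, LEFT_READ, RIGHT_READ)
print(size)
```

Here the two reads sit at the very ends of the fragment, so the insert size equals the fragment length.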
Slide 13
![Insert size distribution Insert size # of reads](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-12.jpg)
Insert size distribution
Insert size
# of reads
Slide 14
![FASTA/FASTQ FASTA >EAS20_8_6_1_9_1972/1 ACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGC >EAS20_8_6_1_163_1521/1 GCAGAAAACGTTCTGCATTTGCCACTGATGTACCGCCGAACTTCAACACTCGCA FASTQ @EAS20_8_6_1_1477_92/1 ACCGTTACCTGTGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTT +EAS20_8_6_1_1477_92/1](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-13.jpg)
FASTA/FASTQ
FASTA
>EAS20_8_6_1_9_1972/1
ACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGC
>EAS20_8_6_1_163_1521/1
GCAGAAAACGTTCTGCATTTGCCACTGATGTACCGCCGAACTTCAACACTCGCA
FASTQ
@EAS20_8_6_1_1477_92/1
ACCGTTACCTGTGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTT
+EAS20_8_6_1_1477_92/1
HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG
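Phred quality
Q = -10 log10(p), where p is the probability that the base call is wrong
Solexa variant (early Illumina pipelines): Q = -10 log10(p / (1 - p))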
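To make the quality line concrete, the sketch below decodes the FASTQ quality string from this slide. It assumes the standard Phred definition Q = -10 log10(p) and the Phred+33 ASCII offset; the offset is inferred from characters such as `8` and `?`, which fall below the Phred+64 range.

```python
QUALITY_LINE = "HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG"


def decode_qualities(quality_string, offset=33):
    """Convert ASCII-encoded qualities to integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


def error_probability(q):
    """Invert Q = -10 * log10(p) to get the base-call error probability."""
    return 10 ** (-q / 10)


scores = decode_qualities(QUALITY_LINE)
print(scores[:5])
print(min(scores), max(scores))
print(error_probability(20))  # Q20 corresponds to a 1% error rate
```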
Slide 15
![seqtk utility Subsampling sample Converting between interleaved/paired files mergepe, seq](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-14.jpg)
seqtk utility
Subsampling
sample
Converting between interleaved/paired files
mergepe, seq -1/-2
fastq->fasta
seq -A
Quality trimming
Shifting the quality
Modifying names
etc...
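For intuition, the fastq->fasta conversion amounts to keeping the name and sequence of each record and dropping the separator and quality lines. The Python sketch below illustrates the idea only; it is not how seqtk is implemented, and it assumes strict four-line FASTQ records.

```python
import io


def fastq_to_fasta(handle):
    """Yield FASTA-formatted records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        yield f">{header[1:]}\n{sequence}\n"


FASTQ_TEXT = (
    "@EAS20_8_6_1_1477_92/1\n"
    "ACCGTTACCTGTGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTT\n"
    "+EAS20_8_6_1_1477_92/1\n"
    "HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG\n"
)
fasta_text = "".join(fastq_to_fasta(io.StringIO(FASTQ_TEXT)))
print(fasta_text, end="")
```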
Slide 16
![Quality Control](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-15.jpg)
Slide 17
![FastQC Easy and lightweight quality control for sequencing data Does not require reference genome](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-16.jpg)
FastQC
Easy and lightweight quality control for sequencing data
Does not require a reference genome
Slide 18
![Per base sequence quality](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-17.jpg)
Per base sequence quality
Slide 19
![Per base sequence quality](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-18.jpg)
Per base sequence quality
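The plot summarizes the quality scores observed at each read position. As a hedged sketch of the underlying idea, the code below computes only the mean score per position (FastQC also reports medians and quartiles). The quality strings are made-up toy values and Phred+33 encoding is assumed.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Average Phred score at each read position across all reads."""
    longest = max(len(q) for q in quality_strings)
    means = []
    for pos in range(longest):
        # Shorter reads simply do not contribute to later positions.
        scores = [ord(q[pos]) - offset for q in quality_strings if pos < len(q)]
        means.append(sum(scores) / len(scores))
    return means


# Hypothetical quality strings, for illustration only.
toy_qualities = ["IIII5", "IIH5", "II?"]
means = mean_quality_per_position(toy_qualities)
print(means)
```

A downward trend toward the end of the reads, as in this toy input, is the pattern that later motivates trimming.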
Slide 20
![Per sequence GC content](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-19.jpg)
Slide 21
![Per sequence GC content](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-20.jpg)
Slide 22
![Per sequence GC content](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-21.jpg)
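Per sequence GC content is the fraction of G and C bases in each read; the plot shows how that fraction is distributed over all reads. A minimal sketch, using reads from the earlier slides as input:

```python
def gc_fraction(sequence):
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


reads = ["AACCCGTACGT", "CGTACGTTTTG", "AACGACCG"]
gc_values = [gc_fraction(read) for read in reads]
print(gc_values)
```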
Slide 23
![Per base sequence content](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-22.jpg)
Per base sequence content
Slide 24
![Per base sequence content](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-23.jpg)
Per base sequence content
Slide 25
![FastQC fastqc -h mkdir fastqc … -o](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-24.jpg)
FastQC
fastqc -h
mkdir
Slide 26
![Error correction](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-25.jpg)
Slide 27
![Per base sequence quality](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-26.jpg)
Per base sequence quality
Slide 28
![Trimmomatic SE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 Remove leading low quality](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-27.jpg)
Trimmomatic
SE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Remove leading low quality or N bases (below quality 3) (LEADING:3)
Remove trailing low quality or N bases (below quality 3) (TRAILING:3)
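The two end-trimming steps can be sketched as follows. This is a simplified illustration of the behaviour described on the slide, not Trimmomatic's actual code; the explicit check for `N` and the toy read are assumptions.

```python
def trim_ends(sequence, qualities, leading=3, trailing=3):
    """Strip low-quality or N bases from both ends of a read."""
    start, end = 0, len(sequence)
    # LEADING: drop bases from the 5' end while they are below the threshold.
    while start < end and (qualities[start] < leading or sequence[start] == "N"):
        start += 1
    # TRAILING: drop bases from the 3' end while they are below the threshold.
    while end > start and (qualities[end - 1] < trailing or sequence[end - 1] == "N"):
        end -= 1
    return sequence[start:end], qualities[start:end]


# Hypothetical read with poor bases at both ends.
trimmed_seq, trimmed_qual = trim_ends("NACGTAN", [2, 2, 30, 30, 30, 2, 2])
print(trimmed_seq, trimmed_qual)
```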
Slide 29
![Trimmomatic Scan the read with a 4-base wide sliding window,](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-28.jpg)
Trimmomatic
Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
Drop reads shorter than 36 bases (MINLEN:36)
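A simplified sketch of these two steps: cut the read at the start of the first window whose mean quality falls below the threshold, then discard reads that end up too short. Trimmomatic's real implementation may treat edge cases differently (for example reads shorter than the window), and the quality values below are made up.

```python
def sliding_window_keep(qualities, window=4, threshold=15):
    """Return how many bases to keep from the 5' end of the read."""
    if len(qualities) < window:
        return len(qualities)  # assumption: too short to scan, keep as is
    for start in range(len(qualities) - window + 1):
        if sum(qualities[start:start + window]) / window < threshold:
            return start  # cut where the first failing window begins
    return len(qualities)


def passes_minlen(length, minlen=36):
    """MINLEN: keep only reads that are at least minlen bases long."""
    return length >= minlen


# Hypothetical reads: good quality followed by a poor 3' tail.
long_read = [30] * 40 + [10] * 4
short_read = [30] * 10 + [2] * 4
kept_long = sliding_window_keep(long_read)
kept_short = sliding_window_keep(short_read)
print(kept_long, passes_minlen(kept_long))
print(kept_short, passes_minlen(kept_short))
```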
Slide 30
![Trimmomatic PE OPTIONS ILLUMINACLIP: ILLUMINACLIP:TruSeq3-PE.fa](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-29.jpg)
Trimmomatic
PE
OPTIONS
ILLUMINACLIP:
ILLUMINACLIP:TruSeq3-PE.fa
Slide 31
![Adapter trimming ILLUMINACLIP: : : threshold>: ILLUMINACLIP:NexteraPE-PE.fa:2:10:30](/_ipx/f_webp&q_80&fit_contain&s_1440x1080/imagesDir/jpg/429445/slide-30.jpg)
Adapter trimming
ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
ILLUMINACLIP:NexteraPE-PE.fa:2:10:30