Illumina data QC & basic NGS tools (presentation)

Contents

Slide 2

From the very beginning

...AACCCGTACGTTTTGCAAACGACCGT...

Slide 3

From the very beginning

Sequencing

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTACGT

CGTACGTTTTG

AACGACCG

GTTTTGCAAACG

GTACGTTTTGCA

Slide 4

From the very beginning

Sequencing
Coverage

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTACGT

CGTACGTTTTG

AACGACCG

GTTTTGCAAACG

GTACGTTTTGCA

3x

2x
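
The coverage idea on this slide can be sketched in a few lines of Python. This is only an illustration under a strong assumption: each read matches the reference exactly and is placed at its first occurrence, whereas real pipelines use an aligner. The function name `per_base_coverage` is hypothetical; the reference fragment and reads are copied from the slide.

```python
# Reference fragment and reads copied from the slide.
REFERENCE = "AACCCGTACGTTTTGCAAACGACCGT"
READS = ["AACCCGTACGT", "CGTACGTTTTG", "AACGACCG", "GTTTTGCAAACG", "GTACGTTTTGCA"]


def per_base_coverage(reference, reads):
    """Count how many reads cover each reference position.

    Assumption: every read matches the reference exactly and is placed
    at its first occurrence (a real pipeline would use an aligner).
    """
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        if start == -1:
            continue  # no exact match; skipped in this sketch
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


depth = per_base_coverage(REFERENCE, READS)
mean_coverage = sum(depth) / len(REFERENCE)
print(depth)
print(f"mean coverage: {mean_coverage:.2f}x")
```

Per-base depth varies along the fragment, which is why coverage is usually reported as a mean together with its distribution.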

Slide 5

From the very beginning

Sequencing
Coverage
Errors
Mismatches

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTTCGT

CGTACGTTTTC

AACGACCG

GTTTTGCAAACG

GTACGTTTTGCA
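
A small sketch of how the mismatches in these reads could be located. It handles substitutions only, since indels require a real alignment. The offsets 0 and 4 are assumptions: they are where the error-free versions of the two reads sit in the reference fragment. The helper name `mismatch_positions` is hypothetical.

```python
REFERENCE = "AACCCGTACGTTTTGCAAACGACCGT"


def mismatch_positions(reference, read, offset):
    """Return reference positions where the read disagrees with the reference.

    Substitutions only: the read is compared base by base against the
    reference window starting at `offset`, with no gaps allowed.
    """
    window = reference[offset:offset + len(read)]
    return [offset + i for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base]


# Reads with sequencing errors, copied from the slide.
print(mismatch_positions(REFERENCE, "AACCCGTTCGT", 0))
print(mismatch_positions(REFERENCE, "CGTACGTTTTC", 4))
```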

Slide 6

From the very beginning

Sequencing
Coverage
Errors
Mismatches
Indels

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTTCGT

CGTACGTTTTTC

AACGACCG

GTTTTGCAAACG

GTA_GTTTTGCA

Slide 7

Early days

Sanger sequencing
Long reads (~900 bp)
Low coverage (< 10x)
Extreme cost
Human genome project
3 Gbp
3 billion USD
10 years

Slide 8

NGS

Shorter reads (25-400bp)
High coverage (50-1000x)
Huge amount of data
Low cost
More applications
Required completely new algorithms
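
To get a feel for the data volumes involved, expected coverage can be estimated as C = N * L / G (number of reads times read length, divided by genome size). The sketch below applies that relation. The 100 bp read length and 50x target are illustrative assumptions, and both function names are hypothetical.

```python
import math


def expected_coverage(num_reads, read_length, genome_size):
    """Average coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_000_000_000  # 3 Gbp
READ_LENGTH = 100            # assumption: 100 bp reads
TARGET = 50                  # assumption: 50x average coverage

n = reads_needed(TARGET, READ_LENGTH, GENOME_SIZE)
print(n, expected_coverage(n, READ_LENGTH, GENOME_SIZE))
```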

Slide 9

NGS technologies

Slide 10

Illumina sequencing

http://www.youtube.com/watch?v=77r5p8IBwJk

Slide 11

IonTorrent sequencing

https://www.youtube.com/watch?v=WYBzbxIfuKs

Slide 12

Paired reads
AACCCGTACGTTTTGCAAACGACCGTAACCAAATTGG

AACCCGTACGT........TAACCAAATTGG
insert size

Paired-end (< 1 kbp)
Mate-pairs (1 - 20 kbp)
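
A rough sketch of what "insert size" means for the pair drawn above. Two assumptions are made here. First, insert size is taken as the outer distance from the start of the first mate to the end of the second; some tools report the inner gap instead. Second, both mates are shown in reference orientation as on the slide, whereas in real data one mate is reverse-complemented. The function name `fragment_span` is hypothetical.

```python
REFERENCE = "AACCCGTACGTTTTGCAAACGACCGTAACCAAATTGG"
MATE1 = "AACCCGTACGT"
MATE2 = "TAACCAAATTGG"


def fragment_span(reference, mate1, mate2):
    """Return (insert_size, inner_gap) for an exactly matching read pair.

    insert_size: start of mate 1 to end of mate 2 (outer distance).
    inner_gap:   unsequenced bases between the two mates.
    """
    start1 = reference.find(mate1)
    start2 = reference.find(mate2)
    if start1 == -1 or start2 == -1:
        raise ValueError("both mates must match the reference exactly")
    end1 = start1 + len(mate1)
    end2 = start2 + len(mate2)
    return end2 - start1, start2 - end1


insert_size, inner_gap = fragment_span(REFERENCE, MATE1, MATE2)
print(insert_size, inner_gap)
```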

Slide 13

Insert size distribution

[Figure: histogram of insert sizes; x axis: insert size, y axis: # of reads]

Slide 14

FASTA/FASTQ

FASTA
>EAS20_8_6_1_9_1972/1
ACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGC
>EAS20_8_6_1_163_1521/1
GCAGAAAACGTTCTGCATTTGCCACTGATGTACCGCCGAACTTCAACACTCGCA
FASTQ
@EAS20_8_6_1_1477_92/1
ACCGTTACCTGTGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTT
+EAS20_8_6_1_1477_92/1
HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG
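Phred quality
Q = -10 log10 p, where p is the probability that the base call is wrong
(the older Solexa scale used Q = -10 log10 [ p / (1 - p) ])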
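
The quality line of a FASTQ record stores one character per base. A minimal decoding sketch follows, assuming the Phred+33 (Sanger) ASCII offset; the slide does not state the encoding, and older Illumina pipelines used an offset of 64. The function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 encoding unless another offset is passed.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Invert Q = -10 * log10(p) to get the error probability p."""
    return 10 ** (-q / 10)


# Quality string of the FASTQ record shown on the slide.
QUALITY = "HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG"
scores = phred_scores(QUALITY)
print(min(scores), max(scores))
print(error_probability(20), error_probability(30))
```

Under this assumption Q20 corresponds to an expected error rate of 1 in 100 and Q30 to 1 in 1000.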

Slide 15

seqtk utility

Subsampling: sample
Converting between interleaved/paired files: mergepe, seq -1/-2
FASTQ -> FASTA: seq -A
Quality trimming
Shifting the quality
Modifying names
etc.
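
To make the FASTQ -> FASTA conversion concrete, here is a minimal Python sketch of the idea. It is not how seqtk is implemented. It assumes strict four-line FASTQ records with no wrapped sequence lines, and the function name `fastq_to_fasta` is hypothetical.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA, dropping the qualities.

    Assumption: each record is exactly four lines
    (@name, sequence, +, quality).
    """
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    fasta_lines = []
    for i in range(0, len(lines), 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        fasta_lines.append(">" + header[1:])
        fasta_lines.append(sequence)
    return "\n".join(fasta_lines) + "\n"


record = (
    "@EAS20_8_6_1_1477_92/1\n"
    "ACCGTTACCTGTGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTT\n"
    "+EAS20_8_6_1_1477_92/1\n"
    "HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG\n"
)
print(fastq_to_fasta(record), end="")
```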

Slide 16

Quality Control

Slide 17

FastQC

Easy and lightweight quality control for sequencing data
Does not require a reference genome

Slide 18

Per base sequence quality

Slide 19

Per base sequence quality
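
As a rough idea of what sits behind a per-base quality plot, the sketch below averages the Phred score at each read position. It is a simplified stand-in, not FastQC's own computation. It assumes Phred+33 encoding, and the function name is hypothetical.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Mean Phred score at each read position across all reads.

    Reads may differ in length; each position is averaged over the
    reads that are long enough to reach it.
    """
    totals, counts = [], []
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two toy quality strings (assumed Phred+33): 'I' is Q40, '5' is Q20.
print(mean_quality_per_position(["II", "5"]))
```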

Slide 20

Per sequence GC content

Slide 21

Per sequence GC content

Slide 22

Per sequence GC content
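
The quantity behind this plot is simple to compute per read. A minimal sketch follows, assuming that ambiguous bases such as N are ignored; that is a choice made for this example, and the function name is hypothetical.

```python
def gc_content(sequence):
    """Percentage of G and C among the called (A/C/G/T) bases of a read."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0  # no called bases; defined as 0 in this sketch
    gc = sum(base in "GC" for base in called)
    return 100.0 * gc / len(called)


# The two FASTA reads from the earlier slide.
for read in ("ACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGC",
             "GCAGAAAACGTTCTGCATTTGCCACTGATGTACCGCCGAACTTCAACACTCGCA"):
    print(f"{gc_content(read):.1f}%")
```

A histogram of these per-read values over the whole library gives the distribution that is compared against the expected shape.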

Slide 23

Per base sequence content

Slide 24

Per base sequence content

Slide 25

FastQC

fastqc -h
mkdir
fastqc … -o

Slide 26

Error correction

Slide 27

Per base sequence quality

Slide 28

Trimmomatic

SE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Remove leading low quality or N bases (below quality 3) (LEADING:3)
Remove trailing low quality or N bases (below quality 3) (TRAILING:3)

Slide 29

Trimmomatic

Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
Drop reads shorter than 36 bases (MINLEN:36)
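
The four steps (LEADING, TRAILING, SLIDINGWINDOW, MINLEN) can be sketched in Python as below. This is a simplified illustration of the described behaviour, not Trimmomatic's actual implementation. It assumes the qualities are already decoded to integer Phred scores and that N bases carry low quality values, so only quality is checked. The function name `trim_read` is hypothetical.

```python
def trim_read(seq, quals, leading=3, trailing=3, window=4, min_avg=15, minlen=36):
    """Apply LEADING, TRAILING, SLIDINGWINDOW and MINLEN in that order.

    Returns (sequence, qualities) or None if the read is dropped.
    """
    # LEADING / TRAILING: strip bases below the threshold from each end.
    start, end = 0, len(seq)
    while start < end and quals[start] < leading:
        start += 1
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # SLIDINGWINDOW: cut at the start of the first window whose mean
    # quality is below min_avg (simplification of the real tool).
    cut = len(seq)
    for i in range(len(seq) - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: drop reads that ended up too short.
    if len(seq) < minlen:
        return None
    return seq, quals


# Toy read: one bad leading base, 38 good bases, 4 poor trailing bases.
toy_seq = "A" * 43
toy_quals = [2] + [30] * 38 + [10] * 4
print(trim_read(toy_seq, toy_quals))
```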

Slide 30

Trimmomatic

PE

OPTIONS
ILLUMINACLIP:
ILLUMINACLIP:TruSeq3-PE.fa

Slide 31

Adapter trimming

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
ILLUMINACLIP:NexteraPE-PE.fa:2:10:30
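
To show what each colon-separated field in the example stands for, the following sketch splits an ILLUMINACLIP step into named parts. The field names follow the Trimmomatic manual's description of this step. The parser itself (`parse_illuminaclip`, `IlluminaClip`) is hypothetical and is not part of Trimmomatic. It ignores any optional trailing fields and assumes the adapter file path contains no colon.

```python
from typing import NamedTuple


class IlluminaClip(NamedTuple):
    adapter_fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int


def parse_illuminaclip(step):
    """Split an ILLUMINACLIP step string into its four required fields."""
    parts = step.split(":")
    if len(parts) < 5 or parts[0] != "ILLUMINACLIP":
        raise ValueError(f"not an ILLUMINACLIP step: {step!r}")
    return IlluminaClip(parts[1], int(parts[2]), int(parts[3]), int(parts[4]))


print(parse_illuminaclip("ILLUMINACLIP:NexteraPE-PE.fa:2:10:30"))
```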

File name: Illumina-data-QC-&-basic-NGS-tools.pptx