Illumina data QC & basic NGS tools presentation

Contents

Slide 2

From the very beginning

...AACCCGTACGTTTTGCAAACGACCGT...

Slide 3

From the very beginning

Sequencing

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTACGT

CGTACGTTTTG

AACGACCG

GTTTTGCAAACG

GTACGTTTTGCA

Slide 4

From the very beginning

Sequencing
Coverage

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTACGT

CGTACGTTTTG

AACGACCG

GTTTTGCAAACG

GTACGTTTTGCA

3x

2x
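
The coverage labels on this slide can be reproduced with a few lines of Python. This is a minimal sketch that assumes every read matches the reference exactly (so a plain substring search is enough to place it); the names REFERENCE, READS and per_base_coverage are illustrative and not part of any tool.

```python
# Reference fragment and reads copied from the slide.
REFERENCE = "AACCCGTACGTTTTGCAAACGACCGT"
READS = [
    "AACCCGTACGT",
    "CGTACGTTTTG",
    "AACGACCG",
    "GTTTTGCAAACG",
    "GTACGTTTTGCA",
]


def per_base_coverage(reference, reads):
    """Count how many reads cover each reference position.

    Assumption: reads are error-free, so str.find locates them.
    Reads that do not match exactly are skipped.
    """
    coverage = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        if start == -1:
            continue
        for pos in range(start, start + len(read)):
            coverage[pos] += 1
    return coverage


coverage = per_base_coverage(REFERENCE, READS)
# Mean coverage = total sequenced bases / reference length.
mean_coverage = sum(coverage) / len(REFERENCE)
print(coverage)
print(round(mean_coverage, 2))
```

Real data needs an aligner to place the reads; the counting step afterwards is the same idea.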

Slide 5

From the very beginning

Sequencing
Coverage
Errors
Mismatches

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTTCGT

CGTACGTTTTC

AACGACCG

GTTTTGCAAACG

GTACGTTTTGCA
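
To make the mismatches concrete, the sketch below compares the two erroneous reads from this slide with the reference segments they came from. It assumes the read position is already known and that the read and the segment have the same length; mismatch_positions is a hypothetical helper name.

```python
def mismatch_positions(read, reference_segment):
    """Return 0-based positions where the read differs from the reference.

    Assumption: both strings have the same length and are already aligned
    without gaps, so a position-by-position comparison is valid.
    """
    return [
        i
        for i, (read_base, ref_base) in enumerate(zip(read, reference_segment))
        if read_base != ref_base
    ]


# (read with an error, error-free reference segment), both from the slides.
pairs = [
    ("AACCCGTTCGT", "AACCCGTACGT"),
    ("CGTACGTTTTC", "CGTACGTTTTG"),
]
for read, segment in pairs:
    print(read, mismatch_positions(read, segment))
```

A position-by-position comparison like this only detects substitutions. Inserted or deleted bases shift the rest of the read, so they require a gapped alignment.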

Slide 6

From the very beginning

Sequencing
Coverage
Errors
Mismatches
Indels

...AACCCGTACGTTTTGCAAACGACCGT...

AACCCGTTCGT

CGTACGTTTTTC

AACGACCG

GTTTTGCAAACG

GTA_GTTTTGCA

Slide 7

Early days

Sanger sequencing
Long reads (~900 bp)
Low coverage (< 10x)
Extreme cost
Human genome project
3 Gbp
3 billion USD
10 years

Slide 8

NGS

Shorter reads (25-400bp)
High coverage (50-1000x)
Huge amount of data
Low cost
More applications
Required completely new algorithms

Slide 9

NGS technologies

Slide 10

Illumina sequencing

http://www.youtube.com/watch?v=77r5p8IBwJk

Slide 11

IonTorrent sequencing

https://www.youtube.com/watch?v=WYBzbxIfuKs

Slide 12

Paired reads
AACCCGTACGTTTTGCAAACGACCGTAACCAAATTGG

AACCCGTACGT........TAACCAAATTGG
insert size

Paired-end (< 1 kbp)
Mate-pairs (1 - 20 kbp)

Slide 13

Insert size distribution

Insert size

# of reads

Slide 14

FASTA/FASTQ

FASTA
>EAS20_8_6_1_9_1972/1
ACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGC
>EAS20_8_6_1_163_1521/1
GCAGAAAACGTTCTGCATTTGCCACTGATGTACCGCCGAACTTCAACACTCGCA
FASTQ
@EAS20_8_6_1_1477_92/1
ACCGTTACCTGTGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTT
+EAS20_8_6_1_1477_92/1
HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG
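Phred quality
Q = -10 log10 p, rounded to an integer, where p is the probability that the base call is wrong
(the older Solexa scale used the odds p / (1 - p) in place of p)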
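
As an illustration of how the quality line maps to numbers, here is a small sketch that uses the standard Phred definition Q = -10 log10 p. It assumes the Phred+33 (Sanger) encoding, which fits the example above because the character '8' would be negative under a +64 offset; the function names are illustrative.

```python
import math


def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores.

    Assumption: Phred+33 encoding unless another offset is given.
    """
    return [ord(char) - offset for char in quality_string]


def error_probability(phred):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-phred / 10)


def phred_score(probability):
    """Phred score for a given error probability, rounded to an integer."""
    return round(-10 * math.log10(probability))


# Quality line of the FASTQ record shown on the slide.
quality_line = "HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG"
scores = decode_quality(quality_line)
print(min(scores), max(scores))
print(error_probability(max(scores)))
```

With this encoding 'H' corresponds to Q39 and '8' to Q23.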

Slide 15

seqtk utility

Subsampling: sample
Converting between interleaved/paired files: mergepe, seq -1/-2
FASTQ -> FASTA: seq -A
Quality trimming
Shifting the quality
Modifying names
etc...
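
To show what the FASTQ -> FASTA conversion amounts to, here is a minimal Python sketch. It is not how seqtk is implemented; it assumes plain four-line FASTQ records (no wrapped sequence lines), and fastq_to_fasta is a hypothetical helper name.

```python
def fastq_to_fasta(fastq_lines):
    """Convert four-line FASTQ records to FASTA lines.

    Assumption: each record is exactly four lines
    (@name, sequence, +, quality). Quality is discarded.
    """
    lines = [line.rstrip("\n") for line in fastq_lines if line.strip()]
    fasta = []
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError("record does not start with '@': " + header)
        fasta.append(">" + header[1:])
        fasta.append(sequence)
    return fasta


record = [
    "@EAS20_8_6_1_1477_92/1\n",
    "ACCGTTACCTGTGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTT\n",
    "+EAS20_8_6_1_1477_92/1\n",
    "HHGHFHHHHHHHHHGFFHHHBG?GGC8DD9GF??=FFBCGBAF>FGCFHGHGGG\n",
]
print("\n".join(fastq_to_fasta(record)))
```

Stepping through the input four lines at a time matters: a quality line may itself begin with '@', so headers cannot be found by scanning for that character.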

Slide 16

Quality Control

Slide 17

FastQC

Easy and lightweight quality control for sequencing data
Does not require reference genome

Slide 18

Per base sequence quality

Slide 19

Per base sequence quality
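
The idea behind this plot can be sketched as a per-position average over all reads. This is a simplification: FastQC draws a box plot (median and quartiles) per position, while the sketch below computes only the mean. Phred+33 encoding is assumed, and the function name is illustrative.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Mean Phred score at each read position.

    Assumptions: Phred+33 encoding; reads may differ in length,
    so each position is averaged over the reads that reach it.
    """
    if not quality_strings:
        return []
    max_len = max(len(q) for q in quality_strings)
    totals = [0] * max_len
    counts = [0] * max_len
    for quality in quality_strings:
        for pos, char in enumerate(quality):
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Toy input: 'I' is Q40 and '5' is Q20 under Phred+33.
print(mean_quality_per_position(["II", "5I5"]))
```

A curve that drops towards the end of the reads is the typical pattern that motivates quality trimming.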

Slide 20

Per sequence GC content

Slide 21

Per sequence GC content

Slide 22

Per sequence GC content
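
The distribution shown in these plots is built from one GC value per read. A minimal sketch, with illustrative function names and a toy set of reads:

```python
from collections import Counter


def gc_percent(sequence):
    """Percentage of G and C bases in one read (0.0 for an empty read)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)


def gc_histogram(reads):
    """Number of reads per rounded GC percentage."""
    return Counter(round(gc_percent(read)) for read in reads)


reads = ["ACGT", "AATT", "GGCC", "ACGT"]
for percent, count in sorted(gc_histogram(reads).items()):
    print(percent, count)
```

For a single genome the histogram is usually close to one bell-shaped peak; a second peak or a shifted one can hint at contamination or adapter content.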

Slide 23

Per base sequence content

Slide 24

Per base sequence content
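
This plot shows the share of A, C, G and T at each read position. A rough sketch of the underlying computation follows; the function name is illustrative, and bases other than A, C, G and T (such as N) count towards the total but are not reported.

```python
def base_content_per_position(reads):
    """Fraction of A, C, G and T at each read position.

    Each position is computed over the reads long enough to reach it.
    """
    if not reads:
        return []
    max_len = max(len(read) for read in reads)
    result = []
    for pos in range(max_len):
        column = [read[pos].upper() for read in reads if pos < len(read)]
        result.append({base: column.count(base) / len(column) for base in "ACGT"})
    return result


for pos, fractions in enumerate(base_content_per_position(["AC", "AG", "AT"])):
    print(pos, fractions)
```

In an unbiased library the four lines stay roughly flat along the read; strong deviations at the first positions often come from priming or adapter sequence.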

Slide 25

FastQC

fastqc -h
mkdir
fastqc … -o
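
The same three steps (check the help, create the output directory, run FastQC with -o) can be scripted. The sketch below assumes fastqc is on the PATH; the file and directory names are hypothetical placeholders, not the ones used in the course. The directory is created first because FastQC expects the -o directory to exist.

```python
import os
import subprocess


def build_fastqc_command(fastq_files, output_dir):
    """Build the argument list for a FastQC run that writes to output_dir."""
    return ["fastqc", *fastq_files, "-o", output_dir]


def run_fastqc(fastq_files, output_dir):
    """Create the output directory, then run FastQC on the given files.

    Assumption: the fastqc executable is installed and on the PATH.
    """
    os.makedirs(output_dir, exist_ok=True)
    subprocess.run(build_fastqc_command(fastq_files, output_dir), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(build_fastqc_command(["reads_1.fastq", "reads_2.fastq"], "fastqc_out")))
# To actually run it: run_fastqc(["reads_1.fastq", "reads_2.fastq"], "fastqc_out")
```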

Slide 26

Error correction

Slide 27

Per base sequence quality

Slide 28

Trimmomatic

SE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Remove leading low quality or N bases (below quality 3) (LEADING:3)
Remove trailing low quality or N bases (below quality 3) (TRAILING:3)

Slide 29

Trimmomatic

Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
Drop reads shorter than 36 bases (MINLEN:36)
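
The four steps LEADING, TRAILING, SLIDINGWINDOW and MINLEN can be sketched in Python to show what each one does to a read. This is a simplified illustration, not Trimmomatic's actual implementation: it works on already decoded Phred scores, applies the steps in the order given on the command line, and treats N bases only through their low quality. The function name trim_read is hypothetical.

```python
def trim_read(sequence, qualities, leading=3, trailing=3,
              window=4, min_avg=15, minlen=36):
    """Apply simplified LEADING, TRAILING, SLIDINGWINDOW and MINLEN steps.

    qualities is a list of Phred scores, one per base.
    Returns (sequence, qualities) after trimming, or None if the read
    ends up shorter than minlen.
    """
    # LEADING: drop bases from the start while quality is below the threshold.
    start = 0
    while start < len(qualities) and qualities[start] < leading:
        start += 1
    # TRAILING: drop bases from the end while quality is below the threshold.
    end = len(qualities)
    while end > start and qualities[end - 1] < trailing:
        end -= 1
    sequence, qualities = sequence[start:end], qualities[start:end]

    # SLIDINGWINDOW: cut at the first window whose mean quality is too low.
    keep = len(qualities)
    for i in range(len(qualities) - window + 1):
        if sum(qualities[i:i + window]) / window < min_avg:
            keep = i
            break
    sequence, qualities = sequence[:keep], qualities[:keep]

    # MINLEN: discard reads that became too short.
    if len(sequence) < minlen:
        return None
    return sequence, qualities


# Toy read: two bad leading bases, 40 good bases, a low-quality tail.
quals = [2, 2] + [30] * 40 + [10] * 4 + [2]
trimmed = trim_read("A" * len(quals), quals)
print(None if trimmed is None else len(trimmed[0]))
```

In the toy read, the leading and trailing steps remove the bases with quality 2, and the sliding window then cuts where the Q10 tail pulls the window average below 15.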

Slide 30

Trimmomatic

PE

OPTIONS
ILLUMINACLIP:
ILLUMINACLIP:TruSeq3-PE.fa

Slide 31

Adapter trimming

ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>
ILLUMINACLIP:NexteraPE-PE.fa:2:10:30
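
To make the meaning of the three numbers explicit, the sketch below splits an ILLUMINACLIP step into named fields. The field order follows the Trimmomatic manual (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold); the class and function names are illustrative, and optional trailing parameters are ignored.

```python
from typing import NamedTuple


class IlluminaClip(NamedTuple):
    adapters_fasta: str             # FASTA file with adapter sequences
    seed_mismatches: int            # mismatches allowed in the seed match
    palindrome_clip_threshold: int  # score needed for a paired (palindrome) match
    simple_clip_threshold: int      # score needed for a single-read adapter match


def parse_illuminaclip(step):
    """Split 'ILLUMINACLIP:<fasta>:<seed>:<palindrome>:<simple>' into fields.

    Assumption: the FASTA path itself contains no ':' characters.
    """
    parts = step.split(":")
    if len(parts) < 5 or parts[0] != "ILLUMINACLIP":
        raise ValueError("not a complete ILLUMINACLIP step: " + step)
    return IlluminaClip(parts[1], int(parts[2]), int(parts[3]), int(parts[4]))


print(parse_illuminaclip("ILLUMINACLIP:NexteraPE-PE.fa:2:10:30"))
```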
