Bioinformatics - fastp FastQ Preprocessing Tool (Timestamps)

Alex Soupir

8,440 views

Published on 16 Jan 2021
Welcome back, everyone. Today I'll talk about fastp, which is useful for processing raw FASTQ files before using them with an alignment tool like STAR or Bowtie2. Here I quickly go through the different options and uses for fastp. While I haven't used it in the past, I think I'll use it in the future!
00:41 - fastp is useful
01:15 - installing fastp with miniconda `conda install -c bioconda fastp`
02:05 - RNA-Seq files `wget -i readlist.txt`
03:26 - running fastp
03:33 - fastp help `fastp --help`
03:51 - input files `-i/o` and `-I/O`
05:30 - removing short reads `-l`
06:19 - adapter trimming
09:54 - sliding window quality trimming
14:09 - base correction `-c`
16:35 - poly 'G' trimming `-g/G`
17:57 - UMI processing `-U`
19:22 - overrepresented sequences `-p/P`
21:12 - interleaving paired end read files `-m`
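Several of the options in the timestamps above can be combined into a single paired-end call. What follows is only a minimal sketch: the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming, the output names, and every threshold are illustrative assumptions rather than values taken from the video, and `fastp_pe` is a hypothetical helper name. By default the function only prints the command; setting `RUN=1` executes it.

```shell
# Print (default) or run (RUN=1) a paired-end fastp command.
# File naming and all thresholds are illustrative assumptions.
fastp_pe() {
  s=$1
  # -i/-I                     read 1 / read 2 input
  # -o/-O                     read 1 / read 2 output
  # -l 36                     discard reads shorter than 36 bp
  # --detect_adapter_for_pe   adapter auto-detection for paired-end data
  # --cut_right -W 4 -M 20    sliding window of 4 bp, mean quality 20
  # -c                        overlap-based base correction (paired-end only)
  # -g                        force poly-G tail trimming
  # -p                        overrepresented sequence analysis
  # -j/-h                     JSON and HTML report paths
  set -- fastp \
    -i "${s}_1.fastq.gz" -I "${s}_2.fastq.gz" \
    -o "${s}_1.trimmed.fastq.gz" -O "${s}_2.trimmed.fastq.gz" \
    -l 36 --detect_adapter_for_pe --cut_right -W 4 -M 20 \
    -c -g -p \
    -j "${s}.fastp.json" -h "${s}.fastp.html"
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

fastp_pe sample   # prints the command for a hypothetical sample named "sample"
```

Printing first makes it easy to check the file names before committing to a long run.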
fastp: an ultra-fast all-in-one FASTQ preprocessor:
academic.oup.com/bioinformati...
fastp GitHub:
github.com/OpenGene/fastp
Example Files:
github.com/ACSoupir/Bioinform...
Please consider contributing to my Patreon where I may do merch, gather ideas for future content, and have further discussions:
/ alexsoupir
Howto & Style

Comments • 21

@promethius7820 1 year ago ⁺¹
(Sees a man actually running informatics on windows and not throwing a billion errors) "This man is the one who was only spoken about in legends."
@geocarvalhont 2 years ago
That is a great introduction, thanks.
@donquijote4442 1 year ago
you saved my life
@KiranKumar-gs7mp 5 months ago ⁺¹
Great video for understanding and using fastp, thank you. It would be a great help if you could explain fastp usage with a bash script for processing multiple fastq files.
@alexsoupir 5 months ago
Hey, Kiran. Thanks for the comments.
For processing several files at once, I think it can actually be done much more simply than with a bash script. Essentially, you can do it in one line using `parallel` (I suggest the conda install: anaconda.org/conda-forge/parallel ). Following the code for running STAR with multiple samples, you should be able to adapt the fastp code to run within the parallel call. The only additional starting file you would need is a text file where each sample name is on its own line. Check out my GitHub page with code for STAR and see if it helps: github.com/ACSoupir/Bioinformatics_YouTube/tree/master/STAR
`cat samples.txt | parallel -j 4 "fastp ..."`
for 4 samples at once. Can bump the -j if you want to do more samples.
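To make the `parallel` one-liner above concrete, here is a rough sketch that previews the command generated for each sample. The `_1`/`_2` file naming, the output names, and the helper name `fastp_jobs` are assumptions for illustration. It uses GNU parallel's `--dry-run` when GNU parallel is installed, and otherwise falls back to substituting `{}` by hand so the preview still works.

```shell
# Preview one fastp command per sample name listed in a text file.
# Assumes sample names contain no "/" or "&" (needed by the sed fallback).
fastp_jobs() {
  tmpl='fastp -i {}_1.fastq.gz -I {}_2.fastq.gz -o {}_1.trimmed.fastq.gz -O {}_2.trimmed.fastq.gz -j {}.fastp.json -h {}.fastp.html'
  if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # {} is replaced by each input line; -j 4 means four jobs at a time.
    parallel --dry-run -j 4 "$tmpl" < "$1"
  else
    while IFS= read -r name; do
      printf '%s\n' "$tmpl" | sed "s/{}/$name/g"
    done < "$1"
  fi
}

# Demo with two hypothetical sample names.
demo_dir=$(mktemp -d)
printf '%s\n' sampleA sampleB > "$demo_dir/samples.txt"
fastp_jobs "$demo_dir/samples.txt"
rm -r "$demo_dir"
```

Dropping `--dry-run` from the `parallel` call runs the jobs for real instead of printing them.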
@KiranKumar-gs7mp 5 months ago
Hi Alex, I appreciate your suggestion, I will definitely try it. Anyhow, I figured out how to run fastp for multiple fastq files using a bash script. Thanks 😊
@shathahanya9280 2 years ago ⁺¹
thank u sooo much, can u please do more Videos on this ?
@lawrencemckinney7464 3 years ago
Great walk-through! I'd like to hear your thoughts on the summary for FASTp vs FASTQC. Visually, which QC report do you prefer?
@alexsoupir 3 years ago
Thanks! Personally I like FastQC visually more than fastp, more or less due to what seems like a cleaner layout in FastQC. fastp is more informative though, since it gives before and after, which is awesome.
@wiggiag 3 years ago
FastQC output is very informative
@nooraleslam1000 8 months ago
Thank you so much, it is really informative to me. However, if you are able to record a video on how I can interpret the fastp report, it would be helpful for us. Thanks again
@adetayoaborisade9346 2 years ago
Thanks a lot Alex, I get the error “Permission is denied” when I tried to use trimmomatic. What can be the problem pls. Thanks a lot
@aprilmaetabonda2020 2 years ago
Thank you so much for this video. I have a question. What may be the problem in the raw read file (.gz format) with results that are empty? Zero reads and everything.
@alexsoupir 2 years ago ⁺¹
Hey, April. Depending on where/how you got the raw reads, there may be some issues, but I don't know why fastqc or fastp would have an issue. Mostly that is something that would come up in downstream analyses or programs. Something I would suggest is doing something like `head` (or if it's gzipped you might need `zcat raw_reads.gz | head`) and then comparing the FASTQ header with something you can find online to see if there are issues. There might be a missing line, no header at all, or maybe an extra line or an extra character at the beginning of the reads' header.
Long suggestion short, compare your raw read file with some others that you can find online (or if you have some that work) to see what the difference in formatting is.
Hope this helps!
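The manual `head` comparison above can also be roughed out as an automated check. The sketch below is not a full validator; it only tests the standard four-line FASTQ record layout (header starting with `@`, separator starting with `+`, quality string as long as the sequence) and reports the first problem it finds. `check_fastq` is a hypothetical helper name.

```shell
# Report the first structural problem in a FASTQ file (plain or .gz).
# Exit status is 0 when every record has the expected 4-line layout.
check_fastq() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "line %d: header does not start with @\n", NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "line %d: separator does not start with +\n", NR; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit }
    END {
      if (bad) exit 1
      if (NR == 0) { print "file is empty"; exit 1 }
      if (NR % 4 != 0) { print "line count is not a multiple of 4"; exit 1 }
      printf "OK: %d records\n", NR / 4
    }'
}

# Demo on a tiny hand-written record.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$demo_dir/good.fastq"
check_fastq "$demo_dir/good.fastq"
rm -r "$demo_dir"
```

An "empty" result from a `.gz` file would show up here as `file is empty`, which points at the download or compression step rather than at fastp.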
@aprilmaetabonda2020 2 years ago ⁺¹
Alex Soupir thank you so much 😊. New subscriber here 🥰
@user-bm4nb4to9d 7 months ago
Thank you so much, this is very helpful. How would you run fastp on 100 fastq PE raw reads? Surely not one at a time
@alexsoupir 7 months ago
Not exactly sure the question but there are a few ways to do an entire folder/sample set in a single line of code. For example, if folder contains 100 samples with PE reads (2 files each), I personally would use `ls` to list the files, parse out the sample names with `awk` or `sed`, use `uniq` to get single entry per sample ID, then pass to `parallel` to run the samples. This can all be piped using `|` making it streamlined.
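The `ls` / `sed` / `uniq` step described above might look something like the sketch below. It assumes files end in `_1`/`_2` or `_R1`/`_R2` followed by `.fastq.gz` or `.fq.gz`; other naming schemes would need a different pattern. `list_samples` is a hypothetical helper name, and `sort -u` stands in for `sort | uniq`.

```shell
# List unique sample IDs in a directory of paired-end FASTQ files.
# Only names ending in _1/_2 or _R1/_R2 + .fastq.gz/.fq.gz are matched.
list_samples() {
  ls "$1" | sed -n -E 's/_R?[12]\.(fastq|fq)\.gz$//p' | sort -u
}

# Demo on empty placeholder files with hypothetical names.
demo_dir=$(mktemp -d)
touch "$demo_dir/liver_1.fastq.gz" "$demo_dir/liver_2.fastq.gz" \
      "$demo_dir/brain_R1.fq.gz"   "$demo_dir/brain_R2.fq.gz"
list_samples "$demo_dir"
rm -r "$demo_dir"

# The IDs can then be piped on, for example (paths are assumptions):
# list_samples raw | parallel -j 4 "fastp -i raw/{}_1.fastq.gz -I raw/{}_2.fastq.gz -o trimmed/{}_1.fastq.gz -O trimmed/{}_2.fastq.gz"
```

Checking the list of IDs before piping it on is a cheap way to catch files that do not follow the expected naming.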
@muhammadakmal1414 1 year ago
Thank you for the wonderful video. I tried to detect and trim the adapters as per the commands shown in the video, but FastQC still shows that I have Nextera transposase sequence in my FastQ file. How to deal with this?
@alexsoupir 1 year ago
This is interesting. Might need to use a program like `trimmomatic` where the adapters can be specified in a file. There is also the ability to align to the genome and what is called "soft-clip" reads so the adapter part doesn't get used and the alignment isn't penalized.
Depending on the alignment tool, the adapter in the reads will decrease the alignment quality. If going with counts, the minimum quality (-Q with featureCounts) can be set to still count those that had lower mapping quality due to the adapter included. However, this is going to be a tradeoff between being able to count reads with adapters and actually bad aligned reads. Best to find a way to remove them from ends of reads for highest quality data.
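Besides switching tools, fastp itself accepts explicit adapter sequences through `--adapter_sequence` and `--adapter_sequence_r2`. The sketch below passes the Nextera transposase sequence `CTGTCTCTTATACACATCT` for both reads. Whether this clears the FastQC warning in the case above is untested, and the file naming and the helper name `fastp_nextera` are assumptions. As written it only prints the command unless `RUN=1` is set.

```shell
# Print (default) or run (RUN=1) fastp with an explicit adapter sequence.
# File naming is an illustrative assumption.
fastp_nextera() {
  s=$1
  nextera=CTGTCTCTTATACACATCT   # Nextera transposase adapter sequence
  set -- fastp \
    -i "${s}_1.fastq.gz" -I "${s}_2.fastq.gz" \
    -o "${s}_1.trimmed.fastq.gz" -O "${s}_2.trimmed.fastq.gz" \
    --adapter_sequence "$nextera" --adapter_sequence_r2 "$nextera"
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

fastp_nextera sample   # prints the command for a hypothetical sample
```

Re-running FastQC on the trimmed output would show whether the adapter content actually dropped.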
@christhianulisesfrancofria3353 2 years ago ⁺²
How can I process a lot of `*fastq.gz` files?
@riazhussain2330 1 year ago
same question
