Writing an R function to read FASTA-formatted files (CC289)
- Published on Jul 16, 2024
- Watch along as Pat shows the development of a function for reading in a FASTA-formatted file. FASTA files are commonly used to store DNA sequence information. He uses Test Driven Development (TDD) to develop and refactor the function to suit our needs. Along the way, he makes use of temporary files using the tempfile function and a variety of functions from the stringi package that stringr is based on. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
If you want to get a physical copy of R Packages: amzn.to/43pMR8L
If you want a free, online version of R Packages: r-pkgs.org/
You can find my blog post for this episode at www.riffomonas.org/code_club/....
Check out the GitHub repository at the:
* Beginning of the episode: github.com/riffomonas/phyloty...
* End of the episode: github.com/riffomonas/phyloty...
#rstats #paste #paste0 #refactor #testthat #tdd #microbenchmark #vectors #rdp #16S #classification #classifier #microbialecology #microbiome
Support Riffomonas by becoming a Patreon member!
/ riffomonas
Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.
If you're interested in purchasing a video workshop, be sure to check out riffomonas.org/workshops/
You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/
0:00 Introduction
7:19 Working with temporary files
10:36 Basic version of read_fasta
15:37 Parsing sequence names when comments are present
21:41 Parsing comments from fasta header line
24:01 Reading in multiple lines of sequence data
37:08 Using read_fasta in vignette
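If you'd like a feel for what the chapters above build toward, here is a minimal sketch of a FASTA reader in base R. It is not the code from the episode, which uses stringi and is developed test-first. The function name read_fasta_sketch and the column names id, sequence, and comment are assumptions made for illustration. It writes a small FASTA file to a temporary file with tempfile, splits each header into a sequence name and an optional comment, and joins sequences that span multiple lines.

```r
# Hypothetical sketch; NOT the read_fasta function written in the episode.
read_fasta_sketch <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")      # header lines start with ">"

  # Header layout assumed here: ">" + name + optional whitespace + comment
  headers <- sub("^>\\s*", "", lines[is_header])
  id <- sub("\\s.*$", "", headers)               # text before the first space
  comment <- trimws(sub("^\\S+", "", headers))   # whatever follows the name

  # Each sequence line belongs to the most recent header above it
  seq_lines <- lines[!is_header]
  record <- cumsum(is_header)[!is_header]
  sequence <- vapply(
    seq_along(id),
    function(i) paste(seq_lines[record == i], collapse = ""),
    character(1)
  )

  data.frame(id = id, sequence = sequence, comment = comment,
             stringsAsFactors = FALSE)
}

# Try it on a temporary file, then clean up
fasta_file <- tempfile(fileext = ".fasta")
writeLines(
  c(">seqA first sequence", "ATGC", "GGTA", ">seqB", "TTAA"),
  fasta_file
)
fasta <- read_fasta_sketch(fasta_file)
unlink(fasta_file)
print(fasta)
```

Grouping sequence lines by the running count of header lines keeps the function vectorized; a header with no sequence lines after it simply gets an empty string.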
I recommend the fastverse, mainly the collapse package, which offers very high performance with tidyverse-like syntax.
Instead of mutate, try collapse::fmutate. Instead of an inner join, try collapse::join(..., how = "inner"), etc.
I also like bench::mark(..., ..., relative = TRUE) now to get relative times (the fastest expression is 1.00 and the others are scaled against it).
These are super helpful functions in a great ecosystem that is very similar to the tidyverse, just faster.
Thanks - I'll have to check it out
Sir, I have been working on bioinformatics programming for several years now, and I need to say thank you for your wonderful efforts 😄 First like 😁
Ha! Thanks for watching. Good luck with your own bioinformatics project.