Writing an R function to read FASTA-formatted files (CC289)

  • Published on 16 Jul 2024
  • Watch along as Pat shows the development of a function for reading in a FASTA-formatted file. FASTA files are commonly used to store DNA sequence information. He uses Test Driven Development (TDD) to develop and refactor the function to suit our needs. Along the way, he makes use of temporary files using the tempfile function and a variety of functions from the stringi package that stringr is based on. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
    If you want to get a physical copy of R Packages: amzn.to/43pMR8L
    If you want a free, online version of R packages: r-pkgs.org/
    You can find my blog post for this episode at www.riffomonas.org/code_club/....
    Check out the GitHub repository at the:
    * Beginning of the episode: github.com/riffomonas/phyloty...
    * End of the episode: github.com/riffomonas/phyloty...
    #rstats #paste #paste0 #refactor #testthat #tdd #microbenchmark #vectors #rdp #16S #classification #classifier #microbialecology #microbiome
    Support Riffomonas by becoming a Patreon member!
    / riffomonas
    Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.
    If you're interested in purchasing a video workshop be sure to check out riffomonas.org/workshops/
    You can also find complete tutorials for learning R with the tidyverse using...
    Microbial ecology data: www.riffomonas.org/minimalR/
    General data: www.riffomonas.org/generalR/
    0:00 Introduction
    7:19 Working with temporary files
    10:36 Basic version of read_fasta
    15:37 Parsing sequence names when comments are present
    21:41 Parsing comments from fasta header line
    24:01 Reading in multiple lines of sequence data
    37:08 Using read_fasta in vignette
  • Science & Technology
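
The steps listed in the description (a temporary file, a basic reader, splitting the sequence name from the header comment, and joining multi-line sequences) can be sketched roughly as follows. This is not the code from the episode: the episode uses stringi functions, while this sketch swaps in base R equivalents so that it runs without extra packages. The returned column names (`id`, `sequence`, `comment`) and the example records are assumptions made for illustration.

```r
# Hypothetical sketch of a FASTA reader; not the function developed in the episode.
read_fasta <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]            # drop blank lines

  is_header <- startsWith(lines, ">")
  headers <- sub("^>\\s*", "", lines[is_header])   # strip the leading ">"

  # The sequence name runs up to the first whitespace; the rest is the comment.
  id <- sub("\\s.*$", "", headers)
  comment <- ifelse(grepl("\\s", headers), sub("^\\S+\\s+", "", headers), "")

  # Assign every sequence line to the most recent header, then join the pieces.
  group <- cumsum(is_header)[!is_header]
  seq_lines <- lines[!is_header]
  sequence <- vapply(
    seq_along(headers),
    function(i) paste(seq_lines[group == i], collapse = ""),
    character(1)
  )

  data.frame(id = id, sequence = sequence, comment = comment,
             stringsAsFactors = FALSE)
}

# Write a small made-up FASTA file to a temporary location and read it back.
fasta_file <- tempfile(fileext = ".fasta")
writeLines(
  c(">seqA first sequence", "ATGC", "GGTA", ">seqB", "TTAACC"),
  fasta_file
)
fasta <- read_fasta(fasta_file)
unlink(fasta_file)                                 # clean up the temporary file
```

Writing the test input with tempfile() keeps the example from leaving files in the working directory, which is also why temporary files fit well inside unit tests.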

Comments • 4

  • @djangoworldwide7925 several months ago +4

    I recommend the fastverse, mainly the collapse package, which offers very high performance with tidyverse-like syntax.
    Instead of mutate, try collapse::fmutate. Instead of an inner join, try collapse::join(..., how = "inner"), etc.
    I also like bench::mark(..., ..., relative = TRUE) to get relative times (slower is 1.00, all others align).
    These are super helpful functions in a great ecosystem very similar to the tidyverse, just faster.

    • @Riffomonas several months ago

      Thanks - I'll have to check it out

  • @SammanMahmoud several months ago +2

    Sir, I have been working on bioinformatics programming for several years now, and I want to say thank you for your wonderful efforts 😄. First like 😁

    • @Riffomonas several months ago +1

      Ha! Thanks for watching. Good luck with your own bioinformatics project.