Benchmarking R functions for reading tsv files (CC291)
Published on 16 Jul 2024
Reading data tables into R is a very common activity, and there are many ways to do it: read.delim in base R, the read_tsv function from readr, the vroom function from the vroom package, or the fread function from data.table. Pat benchmarks these four approaches and discusses the tradeoffs between speed and dependencies for package development. He then uses test-driven development (TDD) to implement a read_taxonomy function, built on readr::read_tsv, in his phylotypr R package. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
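If you want a feel for this kind of comparison before watching, here is a minimal sketch. It is not the benchmarking script from the episode: the toy file, the `time_reader` helper, and the use of base R's `system.time` are all assumptions made for illustration, and any of the three add-on packages that is not installed is simply skipped.

```r
# Write a small tab-separated file to read back (made-up data, not the
# taxonomy file used in the episode).
tsv_file <- tempfile(fileext = ".tsv")
n <- 10000
toy <- data.frame(
  id = paste0("seq_", seq_len(n)),
  taxonomy = rep("Bacteria;Firmicutes;Bacilli;", n),
  stringsAsFactors = FALSE
)
write.table(toy, tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Wrap each reader in a function so they are all timed the same way.
readers <- list(
  read.delim = function(f) read.delim(f),
  read_tsv   = function(f) readr::read_tsv(f, show_col_types = FALSE),
  vroom      = function(f) vroom::vroom(f, delim = "\t", show_col_types = FALSE),
  fread      = function(f) data.table::fread(f)
)

# Keep only the readers whose package is installed.
pkgs <- c(read.delim = "utils", read_tsv = "readr",
          vroom = "vroom", fread = "data.table")
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
readers <- readers[available]

# Hypothetical helper: median elapsed seconds over a few repeated reads.
time_reader <- function(reader, file, times = 5) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(reader(file))[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

timings <- vapply(readers, time_reader, numeric(1), file = tsv_file)
print(sort(timings))
```

Timings on a file this small are noisy and will vary by machine. Also keep in mind that vroom reads lazily, so timing the read call alone can understate the cost of actually using the data afterwards.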
If you want to get a physical copy of R Packages: amzn.to/43pMR8L
If you want a free, online version of R packages: r-pkgs.org/
You can find my blog post for this episode at www.riffomonas.org/code_club/....
Check out the GitHub repository at the:
* Beginning of the episode: github.com/riffomonas/phyloty...
* End of the episode: github.com/riffomonas/phyloty...
#rstats #readr #vroom #data.table #read.delim #rdp #16S #classification #classifier #microbialecology #microbiome
Support Riffomonas by becoming a Patreon member!
/ riffomonas
Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.
If you're interested in purchasing a video workshop be sure to check out riffomonas.org/workshops/
You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: www.riffomonas.org/minimalR/
General data: www.riffomonas.org/generalR/
0:00 Introduction
3:30 Benchmarking methods for reading tsv files
24:23 Writing tests for read_taxonomy
28:18 Writing read_taxonomy function
32:09 Refactoring read_taxonomy
35:51 Package hygiene
You should totally include arrow::read_tsv_arrow
Thanks for the suggestion! I just went back and added it to my benchmarking script. If I leave out the stri_replace_last_regex function call, it is pretty similar to vroom (5.7 vs 5.8 ms), and both are a smidge faster than data.table (7.6 ms). With the stri_replace_last_regex function call, it is still similar to vroom (33 ms), but a smidge slower than data.table (30.7 ms). I committed the additional test to the repository if you want to see what I did.