Incorporating C++ code in an R package with Rcpp and devtools (CC288)

  • Published on 16 Jul 2024
  • Sometimes R code just isn't fast enough. When you've tried all of the other options, rewriting the R code in C++ can be a great choice. In this Code Club, Pat uses the Rcpp package to add C++ code to his phylotypr package. Using the use_rcpp function, which devtools makes available from usethis, he shows how to create the framework for incorporating C++ code. Then he uses the microbenchmark package to benchmark and optimize his Rcpp code to see if he can make it faster than his base R code. Can he do it? This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
    If you want to get a physical copy of R Packages: amzn.to/43pMR8L
    If you want a free, online version of R packages: r-pkgs.org/
    You can find my blog post for this episode at www.riffomonas.org/code_club/....
    Check out the GitHub repository at the:
    * Beginning of the episode: github.com/riffomonas/phyloty...
    * End of the episode: github.com/riffomonas/phyloty...
    #rstats #paste #paste0 #refactor #testthat #tdd #microbenchmark #vectors #rdp #16S #classification #classifier #microbialecology #microbiome
    Support Riffomonas by becoming a Patreon member!
    / riffomonas
    Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at shop.riffomonas.org/youtube to get practice problems, tips, and insights.
    If you're interested in purchasing a video workshop be sure to check out riffomonas.org/workshops/
    You can also find complete tutorials for learning R with the tidyverse using...
    Microbial ecology data: www.riffomonas.org/minimalR/
    General data: www.riffomonas.org/generalR/
    0:00 Introduction
    3:31 Getting set up to use Rcpp in a package
    6:11 Writing Rcpp/C++ version of function
    15:35 Optimize performance of Rcpp code
    23:03 Tradeoffs with using Rcpp vs. pure R
    24:07 Committing changes to repository
  • Science & Technology

Comments • 12

  • @pedrobittencourt_ several months ago +1

    I really need to watch this entire package development series!

    • @Riffomonas several months ago

      Thanks! I think you'll learn a lot - I certainly have 🤓

  • @rgpy several months ago +2

    I do not know whether this is relevant to your case, but something similar happened to me when solving a problem in economics: R was faster than C++. I then avoided loops and resorted to the RcppArmadillo package. This was after a long search though :). The library is optimized for high-performance linear algebra and it might help at some point in the future.

    • @Riffomonas several months ago +1

      Thanks - I've never used it, but RcppArmadillo is on my radar

  • @astrucmael5534 several months ago

    I think you can parallelize your Rcpp loops because each iteration is independent of the previous one. I have no experience with parallel code in C++, but I tried to implement it in a fork with OpenMP, and it seems faster in my benchmarks! Thanks for this series, it is really nice.

    • @Riffomonas several months ago +1

      Thanks! I may come back and try again when we talk about parallelization, but at this point, I'm pretty happy with its speed.

    • @astrucmael5534 several months ago

      @Riffomonas I'm looking forward to it! One thing that might affect the performance is that when using devtools::load_all(), it says that the Rcpp code is compiled in debug mode, which does not include the compiler optimizations. Including the Makevars files with OpenMP parameters seems to change it and might explain the faster results with Rcpp in my case, but I didn't do proper benchmarks.

    • @Riffomonas several months ago +1

      Thank you for your comment - wow! I did not know. It's a little confusing because the compiler flags for load_all() have both -O2 and -O0. If you do install() it drops the -O0. For this case, the function goes from about 10.5 seconds down to about 5.9 seconds, which is about 0.5 seconds faster than base R. Certainly something I'll remember and try to bring up again in a future episode!
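
The idea raised in this thread, running independent loop iterations in parallel with OpenMP, can be sketched roughly as follows. This is plain C++ rather than the Rcpp code from the episode or from the commenter's fork: the function name `log_ratio_parallel`, the pseudocount of 0.5, and the formula itself are hypothetical stand-ins chosen only to give each iteration some work. In an R package, the OpenMP flags would typically be supplied through `src/Makevars` (for example via `$(SHLIB_OPENMP_CXXFLAGS)`); without them the pragma is simply ignored and the loop runs serially.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-element computation; NOT the function from the video.
// Each output element depends only on the matching input element.
std::vector<double> log_ratio_parallel(const std::vector<double>& counts,
                                       double total) {
    const double pseudo = 0.5;  // assumed pseudocount, for illustration only
    std::vector<double> out(counts.size());
    const long n = static_cast<long>(counts.size());

    // Every iteration writes only out[i], so iterations are independent
    // and can be split across threads. The pragma takes effect only when
    // the compiler is run with OpenMP enabled (e.g. -fopenmp).
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = std::log((counts[i] + pseudo) / (total + 1.0));
    }
    return out;
}
```

Because the result is the same with or without OpenMP, a sketch like this can be benchmarked both ways; as the thread notes, the optimization level used for the build should be checked before comparing timings.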

  • @qwerty11111122 several months ago

    You could try to unroll the loop yourself. I think the compiler should understand your CPU, but you can do 8 different adds or multiplies at once. The log and divide might not benefit, lol.

    • @Riffomonas several months ago

      Thanks - I tried this when I was looking into options and unfortunately, it didn't really do anything (perhaps made it slower?). My understanding is that the compiler already does a lot of the optimization for this type of thing.

    • @Unaimend several months ago

      @Riffomonas Checking whether the compiler uses SIMD instructions should be easy by looking at the assembly.
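
For readers unfamiliar with the manual unrolling suggested in this thread, a minimal sketch might look like the following. It is plain C++ with a hypothetical function name, `sum_unrolled`, and it sums a vector rather than reproducing the function from the episode. It uses four independent accumulators so that consecutive additions do not depend on each other; whether this helps at all depends on the compiler and flags, and the thread reports that it made no real difference in this case.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example of a loop unrolled by hand with four accumulators.
double sum_unrolled(const std::vector<double>& x) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    const std::size_t n = x.size();
    std::size_t i = 0;

    // Main loop: process four elements per iteration.
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }

    double total = (a0 + a1) + (a2 + a3);

    // Remainder loop: handle the last 0 to 3 elements.
    for (; i < n; ++i) {
        total += x[i];
    }
    return total;
}
```

Note that splitting the sum across accumulators changes the order of floating-point additions, so results can differ from a simple loop in the last bits. To see what the compiler actually generated, the source can be compiled to assembly (for example with `g++ -O2 -S`) and inspected for vector instructions.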