Accessing values from data frames, data tables, tibbles, matrices, and vectors (CC278)

  • Published on 3 Oct 2024

Comments • 18

  • @eric13hill
    @eric13hill 5 months ago +5

    Pat, I wanted to thank you for your selfless service that you do to help others. I really have benefitted from your kindness.

    • @Riffomonas
      @Riffomonas  5 months ago

      My pleasure and many thanks for your generous comment!

  • @PhilippusCesena
    @PhilippusCesena 5 months ago +3

    This is why we love your videos: they also teach different approaches and help us get into the right mode and mindset. I have had the pleasure of watching a lot of your videos and used R for about two years, which is not very long. Since changing jobs I use it much less, but simple things take me too long in Excel, so in the end I import the data into R and carry on as I always have. Thanks to your videos I don't lose too much dexterity, which unfortunately fades quickly when you stand still.

    • @Riffomonas
      @Riffomonas  5 months ago

      Thank you so much!🤓

  • @spacelem
    @spacelem 5 months ago

    Fascinating! I saw the note about how the weirdness goes away with bigger N, but I was surprised by how bad the results were there. All I can think is that there's a huge overhead for the actual getting of indices, relative to using the indices to extract the data. I don't care how much more performant "x == n1 | x == n2 | ..." is, I'm not giving up "x %in% c(n1, n2, ...)"!

    • @Riffomonas
      @Riffomonas  5 months ago +1

      Yeah, remember it's all about context and application. I use %in% all the time for analyses where speed doesn't matter. 99% of the time it takes longer to save a TIFF than to filter rows from a data frame 🤓
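
The two filtering styles compared in this exchange can be sketched in base R as follows. This is only a minimal illustration: the vector `x` and the targets `n1` and `n2` are made-up values, not the data or code from the video, and no timings are implied.

```r
# Hypothetical data: the vector and the target values are invented for illustration.
set.seed(1)
x <- sample(1:10, size = 1000, replace = TRUE)
n1 <- 3L
n2 <- 7L

# Chained equality tests combined with element-wise OR
by_or <- x[x == n1 | x == n2]

# Set membership with %in%
by_in <- x[x %in% c(n1, n2)]

# With no missing values in x, both approaches select the same elements in the same order
identical(by_or, by_in)
```

Note that the two forms differ when `x` contains `NA`: `NA == n1` evaluates to `NA`, while `NA %in% c(n1, n2)` evaluates to `FALSE`, so the results are only interchangeable for data without missing values.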

  • @djangoworldwide7925
    @djangoworldwide7925 5 months ago

    You might be interested in trying the single-vector function with a map or for loop, running through the desired kmers. You might find that iterating over a single vector read, with parallelization, is the most performant.

    • @Riffomonas
      @Riffomonas  5 months ago

      I tried map/sapply in an earlier episode to build a vector; it was pretty slow relative to other options.

  • @sounkoumahamanetoure4607
    @sounkoumahamanetoure4607 5 months ago

    What is the effect of the JIT on these comparisons?

    • @Riffomonas
      @Riffomonas  5 months ago

      Not sure what you mean by JIT?

  • @AKBARESFAHANI
    @AKBARESFAHANI 5 months ago

    Why not use Arrow to read the data out of memory?

    • @Riffomonas
      @Riffomonas  5 months ago +1

      I haven't tried arrow, but in the next episode (Thursday, 2024-05-02) I'll try duckdb with duckplyr - it's pretty slick

    • @Riffomonas
      @Riffomonas  5 months ago +1

      Just tried arrow - it's about 3x slower than duckdb with the filter function on a table with 1e7 rows and 3 columns. Check back on Thursday afternoon and I'll post the updated timings with arrow included. Thanks for asking!

    • @AKBARESFAHANI
      @AKBARESFAHANI 5 months ago

      @Riffomonas Try saving your data as Parquet with partitions for better performance.

    • @AKBARESFAHANI
      @AKBARESFAHANI 5 months ago

      @Riffomonas And I really enjoy your videos.