Doing More with Data: An Introduction to Arrow for R Users

  • Published on 16 Jul 2024
  • Speaker: Danielle Navarro, Developer Advocate at Voltron Data
    As datasets become larger and more complex, the boundaries between data engineering and data science are becoming blurred. Data analysis pipelines with larger-than-memory data are becoming commonplace, creating a gap that needs to be bridged: between engineering tools designed to work with very large datasets on the one hand, and data science tools that provide the analysis capabilities used in data workflows on the other.
    One way to build this bridge is with Apache Arrow, a multi-language toolbox for working with larger-than-memory tabular data. Arrow is designed to improve performance and efficiency, and places emphasis on standardization and interoperability among workflow components, programming languages, and systems.
    This talk introduces the arrow package for R, a mature interface to Apache Arrow that offers an appealing solution for data scientists working with large data in R. It covers the core concepts behind Apache Arrow and the arrow package, walks through a sample data analysis using a large tabular data set (containing about 1.7 billion rows), and highlights possible pain points for an R user new to the Arrow ecosystem.

Comments • 8

  • @tamararodrigues3471, 4 months ago

    Greaaaat video, thanks!!

  • @nndegwa1, 4 months ago

    Love it!

  • @user-gg5fc6yg9f, 1 year ago

    Thank you Danielle Navarro !

  • @dasrotrad, 2 years ago

    Super tutorial Danielle. Thank you.

  • @jorgenengmann4856, 1 year ago

    super! thanks for this very useful tutorial.

  • @arturocdb, 1 year ago

    Incredibly useful, thank you so much!…

  • @tarasst6887, 10 months ago

    🎉🎉🎉😊

  • @robinkohrs8097, 2 years ago

    That looks fantastic! But what if I do not have my data as cleanly organized in many "smaller" files, but rather in one giant CSV? Does Arrow still have benefits? :)