Accelerating Python Data Analysis with DuckDB

  • Published on 27 Aug 2024
  • This presentation will introduce DuckDB and how it can speed up data analysis in Python. DuckDB is an in-process OLAP SQL database that integrates seamlessly with Python and packages like Pandas, and it allows for fast, local data processing in SQL. We’ll explore how to use it, why you might choose DuckDB over something like Pandas, and some of its other features.
    Justin Smithers has been a data engineer and business intelligence engineer at Best Choice Products for the past 2 years. Previously, he worked as a data analyst at Sweetwater Sound. Justin is new to Ann Arbor and will be moving to the area from Fort Wayne, Indiana this summer.

Comments • 3

  • @JohnFrederickRosas 3 months ago

    May I ask how a 44.2 GB CSV file managed to work? I thought there was a maximum of 2 million rows in a single Excel file. Thank you for clarifying.

    • @leassis91 3 months ago

      It wasn't an Excel file, it was a .txt.

    • @justinsmethers 3 months ago

      Hi John, thanks for the question. The row limit applies to Excel worksheets (1,048,576 rows per sheet), not to CSV files. CSV files are essentially plain text files (which is why it's a .txt file here) that store data in a tabular format with no inherent row or column limits. In the demo, DuckDB reads from the file directly without using Excel. I hope that clears things up.