DuckDB in Python - The Next Pandas Killer?

  • Published Sep 16, 2024

Comments • 49

  • @graymars1097
    @graymars1097 12 days ago +12

    I never tried duckdb but I'm convinced now thanks to you
    Pandas is great but "I can't query a DF like it's a DB using SQL" was one of my first complaints using pandas, and "it needs its own process" is a huge deal breaker for data analysis, especially in finance
    This solves all these problems 😮
    Great video 😊

  • @rsbenari
    @rsbenari 13 days ago +4

    Concise, complete, helpful. As usual. Thanks.

  • @jaybakerwork
    @jaybakerwork 12 days ago +3

    Big big fan of DuckDB. It is nice to be able to switch to SQL for some things. The engine is great. Pairing it with MotherDuck opens up some nice options.
    I like the relational api too.
    I switch between duckdb and polars.

  • @nocturnomedieval
    @nocturnomedieval 13 days ago +11

    This time every new tool seems faster than pandas even after the 2.0 arrow-based upgrade

    • @maximilianrausch5193
      @maximilianrausch5193 2 days ago

      What else is faster?

    • @nocturnomedieval
      @nocturnomedieval 1 day ago

      @@maximilianrausch5193 polars, vaex

    • @nocturnomedieval
      @nocturnomedieval 1 day ago

      @@maximilianrausch5193 1:27 polars and vaex are examples

  • @pawjast
    @pawjast 10 days ago

    I would have never thought duckDB is so versatile and a really cool replacement for Pandas, even when working with local files!

  • @Andrew-ud3xl
    @Andrew-ud3xl 9 days ago +1

    For speed this is very comparable to polars. I started using SQL within polars but have switched to duckdb because duck shows where errors come from, and I don't think polars' SQL has all the functionality of duck.
    Being able to save the result as a dataframe is nice for changing things such as date columns to UK format and the display options of the output table.

  • @apefu
    @apefu 1 day ago

    So if you don't need the sugar, just use a db already? I can't stress how much time this insight has saved me. Especially for live data.

  • @vuongphaminh2293
    @vuongphaminh2293 12 days ago

    Love your videos, very helpful and informative! I have a suggestion that you could consider using AI completion tools like SuperMaven, Copilot, Tabnine, Cursor, etc., to speed up your typing. Watching you type long code lines slowly can be a bit tedious.

  • @joeingle1745
    @joeingle1745 13 days ago +9

    Those two queries are CTEs (Common Table Expressions) not sub-selects. Nit picking, but there is a difference.

    • @NeuralNine
      @NeuralNine 12 days ago +2

      Thanks for the clarification! :)

    • @marvinalone
      @marvinalone 8 days ago

      In my point of view, you can treat a CTE as a sub-select; it's just clearer and more readable, as it can be reused within the query statement. But yes, they do have some differences. In the video, the author calling them 'sub-selects' or 'sub-queries' is okay, I think.

    • @joeingle1745
      @joeingle1745 8 days ago

      @@marvinalone Apart from the fact that they're not sub-selects...

  • @InhumanBean
    @InhumanBean 11 days ago

    Really great intro, thank you for making this.

  • @rohitpammi
    @rohitpammi 13 days ago +1

    Really very informative, thanks for sharing.

  • @carlmunkby6140
    @carlmunkby6140 13 days ago +6

    Any plans for making a video about airflow?

  • @andybecker5001
    @andybecker5001 12 days ago

    Duckdb is super convenient for one off analysis or those that don’t have extra hardware to run their own database server.

  • @dandyexplorer4252
    @dandyexplorer4252 11 days ago +2

    Can you compare the speed of polars and duckdb?

  • @lowkeygaming4716
    @lowkeygaming4716 13 days ago

    Thank you for another great video. I learned something new again.

  • @Jianfeng-ny7to
    @Jianfeng-ny7to 11 days ago +1

    when considering speed, how about using pd.read_csv with engine="pyarrow"? That also loads big CSV files very fast

  • @goodmanshawnhuang
    @goodmanshawnhuang 7 days ago +1

    Thanks for the video. I am considering querying a large result set from BigQuery, then outputting it to a CSV file, but preferably zipped; any suggestions please? Thanks
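The BigQuery half of this question needs a cloud client (e.g. google-cloud-bigquery) and is omitted here, but the compression half is one pandas argument: a `.gz` suffix (or `compression="gzip"`) makes `to_csv` write a gzip-compressed file directly. A sketch with a stand-in DataFrame in place of the query result:

```python
import os
import tempfile
import pandas as pd

# Stand-in for the BigQuery result set (the real rows would come from a
# client library; that part is omitted in this sketch).
df = pd.DataFrame({"id": range(5), "val": list("abcde")})

# pandas writes gzip-compressed CSV directly; no separate zip step needed.
path = os.path.join(tempfile.mkdtemp(), "export.csv.gz")
df.to_csv(path, index=False, compression="gzip")

# Round-trip check: read_csv decompresses transparently as well.
print(len(pd.read_csv(path)))  # 5
```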

  • @xade8381
    @xade8381 12 days ago

    would be great if you could also cover cozo DB

  • @BaronCorso
    @BaronCorso 12 days ago +1

    How about duckdb vs polars?

  • @christopherc4526
    @christopherc4526 11 days ago

    Well done. Thank you

  • @nishantkumar9570
    @nishantkumar9570 10 days ago

    Can we do comparison between DuckDB vs Polars?

  • @timz2917
    @timz2917 12 days ago

    Is data transformation still better in Pandas or Polars? I appreciate that the analysis part is better in DuckDB.

  • @booster404
    @booster404 11 days ago +2

    This makes my belief even truer: "master Python and you master everything"

  • @user-ty2gg8vv6m
    @user-ty2gg8vv6m 10 days ago

    Interesting

  • @Rothbardo
    @Rothbardo 13 days ago

    Answer = yes

  • @matheuspegorari1094
    @matheuspegorari1094 9 days ago

    Wtf. Now I can use CSV files as tables and query them with SQL. I'd only heard about this, but never seen it done so easily

  • @philtoa334
    @philtoa334 13 days ago +1

    Thx_.

  • @mapleigue
    @mapleigue 9 days ago

    Does anyone know how would this compare to a PySpark setup?

  • @user-td4pf6rr2t
    @user-td4pf6rr2t 12 days ago +2

    4:01 Lol he said INSERT INTO person.

  • @jaybakerwork
    @jaybakerwork 12 days ago

    What is the go-to screen capture that he is using that works on Linux these days?

    • @nobiado484
      @nobiado484 9 days ago

      Same as the one for Windows: OBS Studio.

  • @egonkirchof
    @egonkirchof 12 days ago +3

    Fun fact: after all these pandas killers, people are still using pandas.

    • @gs-e2d
      @gs-e2d 11 days ago

      Pandas is a different use case

    • @egonkirchof
      @egonkirchof 11 days ago

      @@gs-e2d So they should be called "pandas killers" even though it's a different use case.

  • @SuperLimeWorld
    @SuperLimeWorld 13 days ago

    Hi

  • @harikrishnanb7273
    @harikrishnanb7273 12 days ago

    not the next, it's the real killer

  • @iegorshevchenko8365
    @iegorshevchenko8365 13 days ago

    +++++++++

  • @HH-mw4sq
    @HH-mw4sq 12 days ago

    For the complex query, I have:
    query = """
    WITH filtered_data AS (
    SELECT job, AVG(age) as avg_age
    FROM personnel_df
    WHERE age > 25
    GROUP BY job
    ),
    job_counts AS (
    SELECT job, COUNT(*) as count
    FROM personnel_df
    GROUP BY job
    )
    SELECT fd.job, fd.avg_age, jc.count
    FROM filtered_data fd
    JOIN job_counts jc
    ON fd.job = jc.job
    WHERE jc,count > 1
    ORDER BY fd.avg_age DESC
    """
    print(conn.sql(query).df())
    I get the following error:
    Traceback (most recent call last):
    File "D:/python/DuckDB/foo1.py", line 60, in
    main()
    File "D:/python/DuckDB/foo1.py", line 55, in main
    print(conn.sql(query).df())
    duckdb.duckdb.ParserException: Parser Error: syntax error at or near ","
    I am running python3.12, using the IDLE interface. Can someone please explain how to correct this error?

    • @benrontol2010
      @benrontol2010 12 days ago

      probably this one "WHERE jc,count > 1", it should be "jc.count", period not comma.

    • @HH-mw4sq
      @HH-mw4sq 12 days ago +1

      @benrontol2010 - wow!!!! I looked that over about 100 times and missed that. Thank you very much.

    • @amalzulkipli2248
      @amalzulkipli2248 12 days ago

      @@benrontol2010 real chad!