Caching Ibis Expressions for Lightning Fast Analytics

In this video I show how to use the cache method for Ibis table expressions, new in 5.0. During interactive analysis, caching lets you avoid repeating computations that are expensive to compute but cheap to store.
Check out the ibis project at ibis-project.org
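
To make the "expensive to compute, cheap to store" idea concrete, here is a minimal sketch using only Python's built-in sqlite3 module. It is not the Ibis cache API and does not show how Ibis implements caching; it only illustrates the general pattern of materializing a result once and pointing follow-up queries at it. The table names `events` and `per_user` and the sample data are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (3, 1.0)],
)

# Stand-in for an expensive aggregation: run it once and store the small
# result in a temporary table instead of recomputing it for every query.
con.execute(
    """
    CREATE TEMP TABLE per_user AS
    SELECT user_id, SUM(amount) AS total, COUNT(*) AS n
    FROM events
    GROUP BY user_id
    """
)

# Follow-up interactive queries read only the stored result.
top_user = con.execute(
    "SELECT user_id, total FROM per_user ORDER BY total DESC LIMIT 1"
).fetchone()
big_spenders = con.execute(
    "SELECT COUNT(*) FROM per_user WHERE total >= 10"
).fetchone()[0]

print(top_user, big_spenders)
```

The trade-off is the usual one for caches: the stored result does not track later changes to the source table, so it suits exploratory sessions where the inputs are stable.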

Views: 388

Videos

21:17

Connect to Your Data with Ibis

598 views · 2 years ago

Comments

@Sac-x5p 15 days ago
Why create an intermediary/broker in the form of Ibis when we can do well with DuckDB directly?
@iamchinmayapadhi 2 months ago
Hi Phillip, is there a video on installing Ibis?
@MegaTheDamir 6 months ago
Oh man, biggest respect. Having to figure out all the commonalities and special cases for different DBs, map everything in code, and maintain the code as the different products evolve... I can imagine how challenging it is. ❤ Amazing work!
@nicholasshook2474 7 months ago
This was a great video, thanks for creating it!
@holandacaua643 10 months ago
Good evening! I'm implementing the SQL Server convert function. My question is how to define a variable return type in 'dtype'. I tried dt.Any but it doesn't work, e.g.:
    class Convert(Value):
        arg: Value[dt.Any, ds.Any]
        data_type: Value[dt.String, ds.Any]
        style: Value[dt.Integer, ds.Any] | None = None
        dtype = dt.Any
        shape = rlz.shape_like('arg')
@cpcloud 6 months ago
You need to import ibis.expr.datatypes as dt!
@holandacaua643 6 months ago
@cpcloud Ha, thank you! I had already made the import. Since the output of the convert function can vary, I solved it like this:
    class Convert(Value):
        arg: Value[dt.Any, ds.Any]
        data_type: Value[dt.String, ds.Any]
        style: Value[dt.Integer, ds.Any] | None = None
        shape = rlz.shape_like('arg')
        @property
        def dtype(self) -> dt.DataType:
            key = self.data_type.value
            return _from_type_dtype[key.upper()]
@kefahissa 10 months ago
Nice. How did you manage to configure IPython to produce tables in that pretty way?
@cpcloud 6 months ago
We're using a library called rich for this!
@weeb3277 11 months ago
Are you using Jupyter Notebook in your terminal? How does it work? Or is it VS Code?
@cpcloud 11 months ago
It's just regular IPython in the terminal, no Jupyter, no VS Code.
@XRavenxMotionsX 1 year ago
Is this a Python REPL? What are you running Python in, in the video?
@cpcloud 1 year ago
I use IPython and it's my daily driver, I love it!
@XRavenxMotionsX 1 year ago
@cpcloud As in IPython notebooks?
@cpcloud 1 year ago
No, you can pip install ipython. Then run ipython.
@aakoss 1 year ago
How does it compare to Polars?
@cpcloud 1 year ago
Good question! Perhaps a Polars comparison is in order!
@cpcloud 11 months ago
Stay tuned, a blog post about this is on the way!
@aakoss 11 months ago
@cpcloud Awesome, looking forward to it.
@xaviernogueira 1 year ago
Great video. Subbed! Curious what you use for your terminal to get the nice colors?
@cpcloud 1 year ago
I'm using Alacritty and starship!
@bFix 1 year ago
2.8 times slower on execution, but you shouldn't forget load times and memory usage (though you looked at that already).
@cpcloud 1 year ago
What load times are you referring to?
@bFix 1 year ago
df = t.execute(), where pandas loaded the database into RAM. Or am I misunderstanding what pandas/Ibis did there? Around 14:00.
@cpcloud 1 year ago
Pandas loads the entire dataset into memory. DuckDB operates in a streaming fashion. If you have enough RAM, it might behave similarly with respect to memory as pandas. If you're RAM-limited, then it will work, whereas pandas will simply fail.
@bFix 1 year ago
Well yes, but loading it into RAM takes additional time, so pandas is even slower. Sure, if you execute many operations in pandas, the initial overhead of loading the data into memory shrinks per operation, but not every script does that many operations.
@manishmarx 1 year ago
Hi Phillip, nice tutorial. It would help a lot if you could make a tutorial on reading an S3 file and caching it using Ibis, including all the cache-related configuration. I have tried reading S3 files and caching them using fsspec, but I don't know how to do it the proper Ibis and DuckDB way. TIA
@cpcloud 1 year ago
Great idea! We'll consider a cloud storage blog post soon on ibis-project.org/posts
@manishmarx 11 months ago
@cpcloud Thanks. Please let me know once you have posted it; it would help a lot. Thanks again.
@manishmarx 8 months ago
Hi @Phillip, is there any way to read a large Parquet file from S3 and cache it, for example holding up to the last 2 hours of data from those files as a cache, or a similar approach to achieve caching? TIA
@kryptec9668 1 year ago
What REPL are you using to get those completions?
@cpcloud 1 year ago
I am using IPython!
@vineetsansi 1 year ago
Hi, this is a really good package. I have been following your videos, but I wasn't able to install `ibis-bigquery` in Anaconda, so I wanted to nest/implode (the reverse of unnest) the data before trying the unnest() feature. I know how to implode the data in Polars, but I am not sure how I can do it in Ibis or ibis-polars.
@cpcloud 1 year ago
The implode functionality is a method called collect(): ibis-project.org/reference/expression-generic#ibis.expr.types.generic.Value.collect
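
For readers unsure what "implode" and "unnest" mean in the exchange above, here is a rough plain-Python illustration of the two operations. This is not Ibis or Polars code; `implode` and `unnest` are hypothetical helper names used only to show the shape of the data before and after.

```python
from collections import defaultdict

# One row per (key, value) pair: the "unnested" or long form.
rows = [("a", 1), ("a", 2), ("b", 3)]


def implode(pairs):
    """Group values by key into lists (what a collect-style aggregation yields)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def unnest(groups):
    """Expand each list back into one row per element."""
    return [(key, value) for key, values in groups.items() for value in values]


collected = implode(rows)
print(collected)
print(unnest(collected))
```

In this sketch the round trip gives back the original rows because dictionaries preserve insertion order; a database engine makes no such ordering promise unless you ask for one.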
@sameerlalwani1 1 year ago
This video was very useful in helping me wrangle a dataset by mixing selectors -> pivot_longer -> pivot_wider. Doing it with pandas would have required me to break it up into multiple dataframes.
@cpcloud 1 year ago
Glad you like it!
@BarbaraGrazielleFirminoDeArauj 1 year ago
Unnest() doesn't work when you have an array of structs. Which version of Ibis are you using?
@cpcloud 1 year ago
I don't remember! But we just released 7.0, give it a try. Unnest works with even more backends now!
@gg8117 1 year ago
My guy didn't even know the nlargest and nsmallest functions. Kinda clear he doesn't know pandas that well, and it hurts the video's credibility. Try doing a little homework! Not coming back to this channel unless the dude shows he knows what he is talking about. Too many other actually informed people on YouTube to waste a 25-minute video with this dude.
@cpcloud 1 year ago
Feel free to do a video to correct my mistakes and link it here!
@arturocdb 1 year ago
Very useful. More videos on pyarrow + DuckDB + Polars would be nice, thank you.
@cpcloud 1 year ago
More to come!
@parikannappan1580 1 year ago
Awesome
@cpcloud 1 year ago
Thanks for watching!
@walterdiaz2003 1 year ago
Excellent. Thanks for sharing.
@cpcloud 1 year ago
Glad you enjoyed it!
@walterdiaz2003 1 year ago
Reminds me of Slick for Scala.
@cpcloud 1 year ago
Nice, hadn't heard of that. Thanks!
@walterdiaz2003 1 year ago
@cpcloud I forgot to say, excellent video though. Subscribed!
@holandacaua643 1 year ago
Great content! If possible, would you be able to do a demo using MSSQL?
@cpcloud 1 year ago
For sure! It'll look pretty similar to other things, which is the whole point!
@incremental_failure 1 year ago
I'm looking to use ClickHouse as persistent storage and Polars with ConnectorX as an interface. I would like to use the pandas-style API of Ibis, but how can that fit in here?
@cpcloud 1 year ago
How about using Ibis with the ClickHouse backend and forgoing Polars and ConnectorX altogether?
@hengainizhang4274 1 year ago
I have a suggestion. It would be best to keep the line you are working on in the video a little higher on the screen; three to five lines up will do. When I pause the video, the player's progress bar covers exactly the code in question.
@cpcloud 1 year ago
Great suggestion!
@hkpeaks 1 year ago
You can also learn about DuckDB vs Peaks: th-cam.com/video/bzess7_pKoc/w-d-xo.html
@DerekMahar 1 year ago
Where can we find the Suba (sp?) library that you mention at the start of the video?
@DerekMahar 1 year ago
Where did you post the links to the projects to which you referred in the video? I'm watching this in Firefox on my phone and don't see the links in the video description. Did the live stream have a chat window?
@cpcloud 1 year ago
Siuba: github.com/machow/siuba
@DerekMahar 1 year ago
You mentioned near the start of this video that you are using IPython as your shell.
@cpcloud 1 year ago
Yep, IPython is great!
@DerekMahar 1 year ago
If I recall correctly, DuckDB 0.8 moves NULL values to the bottom of a query result.
@cpcloud 1 year ago
That sounds right!
@DerekMahar 1 year ago
What shell do you use to display that "counter" prompt and the tables? It resembles a Jupyter notebook, but in a terminal.
@cpcloud 1 year ago
Regular IPython! Nothing super fancy!
@kennethcopeland9445 1 year ago
I am unsuccessfully trying to apply this technique to a BigQuery table. Should it work with BQ?
@cpcloud 1 year ago
You should be able to use a BigQuery table as part of the join, but it won't work as the primary execution engine that ties everything together: only DuckDB is supported for that. Please open an issue on the Ibis issue tracker on GitHub if you're still having trouble!
@manojjoshi4321 1 year ago
Hey Phillip, thanks a lot for taking the time to create this video. Very helpful. Looking forward to seeing more interesting stuff on Ibis. Thanks. Manoj
@cpcloud 1 year ago
Thanks for watching!
@holandacaua643 1 year ago
Does this approach also work using the 'mssql' backend?
@cpcloud 1 year ago
You can certainly use MSSQL with Ibis, but the examples are designed to work with DuckDB because it's easy to set up and get started with.
@holandacaua643 1 year ago
@cpcloud Thanks
@gw1284 1 year ago
Thanks for this tutorial! Curious how you display your camera in the screencast?
@cpcloud 1 year ago
Not sure what you mean!
@sayantangoswami7432 1 year ago
What IDE/text editor setup is this?
@cpcloud 1 year ago
Neovim with a bunch of plugins!
@hengainizhang4274 2 years ago
This work is similar to a Python Polars example.
@cpcloud 1 year ago
Totally! That's the idea, we're reproducing their example with Ibis.
@hengainizhang4274 2 years ago
What about DuckDB?
@cpcloud 1 year ago
Good question! What do you mean?
@OskarAustegard 2 years ago
Impressive results. May want to zoom in on the bottom left of the terminal window next time for ease of watching this on a phone.
@cpcloud 1 year ago
Thanks for the tip!
@bennguyen1313 2 years ago
Interesting how struct.unpack returns a tuple; that's not immediately obvious to me! In any case, I understand that 'b' is 8-bit, 'H' is 16-bit, and 'i' is 32-bit, and that upper/lowercase just selects unsigned/signed. But what about 24-bit? For example, many ADCs output 24-bit signed data. If I save a file with 24-bit data I've captured, is it possible for struct.unpack to automatically read every 3 bytes as a signed 32-bit integer (sign-extending and adding an extra byte automatically)? Also, given that struct.unpack requires a byte array, is it possible to pass it an array of c_ubytes? I have a CAN bus structure that has a data array element:
    class TPCANMsg(Structure):
        _fields_ = [
            ("ID", c_uint),                 # 11/29-bit message identifier
            ("MSGTYPE", TPCANMessageType),  # Type of the message
            ("LEN", c_ubyte),               # Data Length Code of the message (0..8)
            ("DATA", c_ubyte * 8),          # Data of the message (DATA[0]..DATA[7])
        ]
It'd be nice to do something like struct.unpack('>h', msgval.DATA[0:4]), but that wouldn't be proper syntax since DATA is not a type that could be sliced.
@cpcloud 1 year ago
The struct unpack I am showing here isn't related to the standard library struct module!
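
As the reply says, the struct unpack in the video is unrelated to the standard library. For the standard-library question itself, here is a hedged sketch using only `struct`, `ctypes`, and `int.from_bytes`. It assumes big-endian samples, simplifies the commenter's `MSGTYPE` field to `c_ubyte`, and uses made-up payload bytes; `unpack_int24_be` is a hypothetical helper name. The struct module has no 24-bit format code, so the sketch decodes 3-byte groups with `int.from_bytes`, which handles the sign extension.

```python
import struct
from ctypes import Structure, c_ubyte, c_uint


class TPCANMsg(Structure):
    # Simplified stand-in for the commenter's structure (assumption:
    # MSGTYPE is a single unsigned byte here).
    _fields_ = [
        ("ID", c_uint),
        ("MSGTYPE", c_ubyte),
        ("LEN", c_ubyte),
        ("DATA", c_ubyte * 8),
    ]


def unpack_int24_be(buf):
    """Decode consecutive big-endian signed 24-bit samples into Python ints."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    return [
        int.from_bytes(buf[i:i + 3], byteorder="big", signed=True)
        for i in range(0, len(buf), 3)
    ]


# Made-up payload for illustration.
msg = TPCANMsg(
    ID=0x123,
    MSGTYPE=0,
    LEN=8,
    DATA=(c_ubyte * 8)(0xFF, 0xFE, 0x00, 0x01, 0x80, 0x00, 0x00, 0x7F),
)

# bytes() copies the ctypes array into a bytes object that struct accepts.
raw = bytes(msg.DATA)

# '>h' consumes exactly 2 bytes, so the slice must be 2 bytes long.
(first_int16,) = struct.unpack(">h", raw[0:2])

# The first 6 payload bytes interpreted as two signed 24-bit samples.
samples = unpack_int24_be(raw[0:6])

print(first_int16, samples)
```

Slicing a `c_ubyte` array directly, as in `msg.DATA[0:2]`, is also valid and returns a list of ints, which `bytes(...)` turns into a buffer that `struct.unpack` can read.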
@TimSwast 2 years ago
Video intro starts at 3:30
@cpcloud 2 years ago
Thanks Tim! We'll edit the video!
@cpcloud 2 years ago
Edited!
@adibauI 2 years ago
That's an incredible Ibis video, thank you so much!
@cpcloud 2 years ago
Glad you like it!

Phillip in the Cloud
