Just ran this on one of my scripts with the three lines added: 5 min 0.3 sec before, 4 min 17.9 sec after. That is amazing.
Dain, that is awesome. Thanks for sharing your findings.
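(For context, the "three lines" in question are presumably something along these lines; a minimal sketch assuming the gc.set_threshold tuning approach from the video, with illustrative numbers rather than the exact ones used:)

import gc

# Raise the collection thresholds so the cyclic GC runs far less often.
# CPython's defaults are (700, 10, 10); 50_000 here is illustrative, not gospel.
gc.set_threshold(50_000, 500, 1_000)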
Always a big fan of Mike and these amazing videos. I learn so much in this space.
Thanks so much :)
I have to give this a try. A lot of my data does come from a database, but I tend to put it into a Pandas dataframe.
Excellent. Please report back on whether it makes a difference for you (positive or negative). It's one of those "your mileage may vary" things. With Pandas, the data is in C and outside the GC's purview, so it might matter less.
Nice one! I'll definitely be trying this out. One small thing: I'd recommend using time.monotonic() to avoid clock drift from NTP adjustments, daylight savings changes, etc. when measuring this kind of stuff.
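(A minimal sketch of that pattern; run_pipeline() is a hypothetical stand-in for whatever job you are timing:)

import time

start = time.monotonic()  # monotonic clock: unaffected by NTP or DST adjustments
run_pipeline()            # hypothetical long-running job being measured
elapsed = time.monotonic() - start
print(f"Elapsed: {elapsed:.1f} s")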
Awesome video Michael! Thank you!
Thanks for the share Michael, I have some very long running data pipelines that I think could benefit from this.
You're welcome! Let us know how it turns out.
Thanks Michael!
Is this valid for Python 3.11+?
Why aren't the memory settings optimized by default?
thank you
A question I need to ask: why are those numbers so low?
What numbers are you talking about specifically? And what are you comparing them to?
@mikeckennedy Kinda all the numbers, but mainly the 700. Having thought about this a bit since, I guess it could be a bit of a historical number. On older machines with less memory, higher thresholds could cause problems, so something like 50,000 could be costly. I am now wondering if those defaults will be changed in the future, as Python is trying to make performance improvements.
@jamesfitzpatrick9607 I see. It's interesting that it's 700. That seems insanely low to me. I bet you're right that in 1991 that made sense; it doesn't now. Plus, the thing to keep in mind is that it *only* applies to cycles that are missed by ref counting. Ref counting runs first, and only if there are cycles that would otherwise have leaked does the GC kick in. For 95% of apps, that almost never happens. I'm glad they make it configurable.
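(To make the ref-counting-first point concrete, here's a small sketch you can run as-is:)

import gc

print(gc.get_threshold())  # CPython's defaults: (700, 10, 10)

# No cycle: ref counting frees this object immediately; the cyclic GC never gets involved.
x = [1, 2, 3]
del x

# A self-referencing cycle survives ref counting and waits for the cyclic GC.
a = []
a.append(a)
del a
print(gc.collect())  # force a collection; returns the number of unreachable objects found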