The thing I would really enjoy is the troubleshooting process which led to this solution.
you'll want to check out next week's video then :)
i work at datadog and it's so cool seeing you use it and visualize everything nicely!!
It always feels good to see our product being used in the wild, even more so by major companies. Great job guys, amazing product
How do you locate the place in the code where optimization is possible? Did you learn about gc.freeze() somewhere else first and then realize it could be used in the project? Or did you notice high memory usage in the services and then actively look for potential solutions and encounter gc.freeze()?
it depends on the framework and how things are set up. usually you want it as late in the parent process before forking as possible.
I've known about this particular function for a while (even made a video on it a year or so ago). I'm currently trying to upgrade python and was hunting for a memory leak and decided to try this out for fun (and profit). had some success with this and similar approaches at previous employers
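For anyone curious, here is a minimal stdlib sketch of what gc.freeze() actually does (the cache dict is just a made-up stand-in for a warmed-up application):

```python
import gc

# stand-in for a warmed-up application: lots of long-lived tracked objects
cache = {i: [i] for i in range(10_000)}

gc.collect()                    # clear out existing garbage first
gc.freeze()                     # move all surviving tracked objects into a
                                # "permanent generation" future GC passes skip
frozen = gc.get_freeze_count()  # how many objects are now exempt

gc.unfreeze()                   # demo only: put them back in the oldest gen
```

Because the frozen objects are never visited by the collector again, their GC bookkeeping fields are never written to, which is what keeps their memory pages shared after a fork.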
For me, locating a problem is usually a mix of debugging, experience (checking known bottlenecks for your application, for example: disk access, API interactions, parsing of big data sources, DB queries), and benchmarking: running operations with different data to evaluate response times. You follow the data step by step until you hit a performance drop in a specific function (rarely is your whole chain of calls equally slow in all parts).
The whole optimization process usually goes like this: optimization is needed for a certain piece of code because it is too slow or resource-consuming; we analyze the code to try to understand the cause of the issue (e.g. inefficient algorithm, too much memory used, slow operation because of too many API/database requests). We first try to just make the code better. If that's not sufficient, we try to apply known but maybe more complex optimization methods (if appropriate) like caching or optimizing external interactions. If we're still not satisfied, we try to find new solutions by studying existing libraries, checking whether we need new tools or libraries, or even restructuring part of the code/infrastructure.
It is a set of skills that you acquire with study (knowing the industry way to do something) and by knowing the tools at your disposal through reading the documentation of your libraries; with time you build a set of solutions, at least for many common problems.
Isn't that why you generally avoid fork and use threads instead? All threads live in the same process, sharing the heap while having their own stacks.
But Python can't achieve true parallelism when you use threads. Maybe the new subinterpreters might deliver a solution
@@JohnZakaria I would say that's a design flaw in the language. Just another reason to hate on python 😂
Python was designed at a time when single-core CPUs were the norm.
Yeah, it might be a problem now.
Yes, they could release Python 4 and break everything to make that work, but that's painful for everyone
@@JohnZakaria Wasn't there some news that they are going to remove the GIL?
You're right, I forgot about PEP 703.
I think it was more for library devs.
The PEP by itself wouldn't speed up code.
If I remember correctly it would slow down regular code
Talk about some great numbers to add to the resume!
In my work I also noticed this at 9:25: the block allocation algorithm is tuned for small objects. However, I need to optimize for storing bigger objects: bytes and str objects with sizes up to 5-10 MB (to be precise, thousands of incoming and outgoing HTML responses), which, as we know, are immutable and require a large contiguous block to store.
As a result I end up in a strange situation where the process has, for example, 50 MB of free RAM already allocated to it, but since there is no free contiguous block of 5 MB, the process asks the OS to allocate more RAM. So I quickly run out of RAM while having a lot of free memory I can't use efficiently (all inside a single process).
Where or how can I get more detailed info about this? And in what direction should I look?
try jemalloc perhaps?
@@anthonywritescode Thank you for the advice. I will try that
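For context on why those big objects behave differently: CPython's small-object allocator (pymalloc) only serves requests up to 512 bytes, so multi-megabyte bytes/str objects go straight to the C-level allocator, and the fragmentation happens at the malloc/libc layer rather than inside pymalloc arenas (which is why swapping in an allocator like jemalloc can help). A tiny illustration:

```python
import sys

small = b"x" * 100               # fits in a pymalloc pool
big = b"x" * (5 * 1024 * 1024)   # over the 512-byte cutoff: handed to malloc

# getsizeof shows the payload plus the object header overhead
print(sys.getsizeof(small), sys.getsizeof(big))
```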
Could we use this in any Django project that uses Celery, or is it specific to Sentry?
should be pretty universally useful, yeah
4:40 Oh this is cool, I really need to learn more about the C implementation underlying Python.
edit: now I wonder how a circular garbage collector works...
Generational algorithm
I don't know for sure how it's implemented in Python, but in general a GC works not by deleting the stuff that needs to be deleted, but by attempting to find everything that is referenced and keeping that (by just traversing the object graph and keeping everything that is reachable)
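A toy sketch of that mark-from-the-roots idea (the graph and names here are made up; CPython itself actually uses reference counting plus a generational cycle detector):

```python
def reachable(roots, refs):
    """Collect everything reachable from the roots by walking references."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(refs.get(obj, ()))
    return live

# a toy object graph: 'c' and 'd' form a cycle nothing else points to
refs = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
live = reachable({"a"}, refs)
garbage = set(refs) - live   # the unreachable cycle gets collected
```

Note that the cycle between 'c' and 'd' never gets marked, so a tracing collector frees it without ever needing to "detect" the cycle explicitly.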
It's not the way Python does it, but Floyd’s Cycle Finding Algorithm is a pretty interesting way of finding circular references.
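It's indeed not how CPython does it, but Floyd's algorithm is fun to sketch on a linked list (the class and names here are my own):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    # Floyd's tortoise and hare: the fast pointer advances two steps per
    # iteration; if there is a cycle, it eventually laps the slow pointer.
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
no_cycle = has_cycle(a)   # a -> b -> c -> None
c.next = a                # close the loop: a -> b -> c -> a
cycle = has_cycle(a)
```

It finds a cycle in O(n) time with O(1) extra space, which is what makes it interesting compared to keeping a visited set.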
I know nothing about any code or programming but I keep getting this video and still have no idea what's being said or how the solution worked
you've got to be so proud of yourself jesus
This is a great video. Could you mention whether you saw a visible change in CPU usage and task latency?
We implemented this at work and we did see a decrease in memory consumption, but CPU usage increased quite a bit, which also shows up as some tasks taking twice as long.
our CPU didn't change noticeably, if anything it improved a tiny bit (which is what I expect)
Hey Anthony - just found your last few videos and they have been great - I've been using memray, cProfile, and pystack a lot over the last year and it's good to see how other folks are using them.
One question on gc.freeze() --- I've tried to recreate the standard Python CoW-and-fork behavior with a basic example (load a handful of modules, fork, do some minor calculations, force gc.collect()). Examining the shared and unique memory sets on Debian, I don't seem to be able to recreate the issue in trivial cases.
it's impossible to tell without seeing your setup
If you disable the GC at this point before the fork, doesn't that make your program never free memory at any point after the fork? Do you ever re-enable the GC?
gc freeze does not disable the gc
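A quick sketch showing that: collection stays enabled after gc.freeze(); only the objects alive at freeze time are exempted (the Pair class here is just for the demo):

```python
import gc

gc.freeze()
assert gc.isenabled()        # freeze() does not switch collection off

class Pair:
    pass

a, b = Pair(), Pair()
a.other, b.other = b, a      # reference cycle created *after* the freeze
del a, b
collected = gc.collect()     # new garbage is still found and freed
```

So anything the workers allocate after the fork is still collected normally; the frozen parent objects are simply skipped.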
I know some Python, but not that in-depth; I can barely understand what you are showing in CPython.
How would one learn this stuff?
That would be because the CPython internals are C code, not Python. And much of that code is macros (the lines beginning with a #), which, to simplify, is code that runs before the rest is compiled. Mostly it's checking what compiler and system the code is going to be built with.
__GNUC__ indicates GCC and __clang__ the Clang C compiler, respectively. __STDC_VERSION__ is the version of the C language standard being used. _MSC_VER is the version of Microsoft's Visual C compiler.
how did he open paint when he's on ubuntu?
Can you make a guide on how to use Celery with Flask and Django? Especially how you create Celery workers and wait on them in Flask.
personally I would not recommend using celery. the architectural decision to use it at work predates me and is almost too big to change at this point
@@anthonywritescode what are the alternatives?
any work queue really
I think you can also do this trick with gunicorn
yep! or really any prefork framework
Exactly why I came to the comments. Wondering if anyone has tried this on Gunicorn and saw the results.
Hey how do you use these windows apps directly on your linux desktop?
VM
@@drz1 I know that, but how does he make the individual apps appear directly on the Linux desktop? I have seen it multiple times, e.g. Paint in this video
@@rkdeshdeepak4131 This is not a Linux desktop. It's Windows with a Linux VM in fullscreen mode, so he can simply tab out to other Windows apps
not even full screen either but yes -- I crop the obs scene to just the Linux vm
Had to think a bit to understand. To put it in other words: he does not have a Windows VM in Linux, but a Linux VM in Windows. OBS is running on Windows and is cropped to the area of the Linux VM. When he moves a Windows window on top of the Linux VM window, it is not in the VM but on top of it.
great work
Neat trick. Instead of using Celery prefork, why not use the solo worker, which is single-process, and let k8s scale the workers? This works well for our application and uses far fewer resources. The health probes and pod termination are tricky with long-running tasks, but possible by touching a file periodically. This way k8s handles hung tasks, and you scale up with more pods, not more worker processes.
in theory that's better. practically though there are memory leaks and significant (unused) overhead of just getting the django app initialized. so single worker would be pretty wasteful (that prefork had such an impact is kind of a testament to that)
if each worker were a separate service that had very specific dependencies it would probably make sense? though that would involve tons of work since we have hundreds of different tasks
i am sorry if i missed it, but what does "paging into those objects" mean?
without going into too much detail memory is segmented into chunks which are called pages. when paged in they become resident (copied from the parent process)
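A rough Linux-only sketch of watching residency grow as pages get touched (it reads /proc/self/statm, so it won't work on other platforms; the helper name is my own):

```python
import os

def resident_kib():
    # /proc/self/statm (Linux): the second field is resident pages
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") // 1024

before = resident_kib()
blob = bytearray(8 * 1024 * 1024)           # 8 MiB allocation
step = 4096
blob[::step] = b"\x01" * len(blob[::step])  # write to every page -> resident
after = resident_kib()
```

The copy-on-write twist in CPython is that even read-only access to an object bumps its refcount, which is a write to the page holding the object header, so pages get copied from the parent far more eagerly than you'd expect.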
Great video
What a good engineer! This is why some guys rake in more dough than others.
You know how to make programs more efficient
I know how to use Paint more efficiently
We are not the same
Jeeze what type of server has 6+ terabytes of ram 😮
not a single server, a kubernetes cluster
@@anthonywritescode Thanks, that makes sense.
Hope you get a raise or a bonus for this! ;)
Is that an ubuntu vm on windows?
dang python sucks at copy on write!!
hmm, tbh i would never run something this big in python. maybe rather nodejs? but maybe that's another can of worms... still, the severe performance problems I keep running into with python would strongly disincentivize investing that deeply in it on a high-performance server...
NodeJS has big performance problems too. Something native like Rust would be better
this just reinforces my belief that garbage collection based memory management is evil
a bit naive don't you think
@@squishy-tomato Projection much?
Yeah, just throw hardware at the problem. Cloud vendors must love you.
I didn't understand a damn thing, but the video is interesting. Thanks, Anthony.
Well, what didn't you understand? They told the garbage collector not to track references, so its structures stopped being copied into the child processes.
Imagine someone tries to learn Python and they start on their merry way, learning the basics, building their first hello world. And then you run in, Dumbledore style, and ask them calmly: "HARRY! Did you waste a terabyte of RAM using garbage collection?!?!"
Huh?
Do you think that running gc.freeze() after gc.collect() would improve memory usage even more?
    def _create_worker_process(self, i):
        worker_before_create_process.send(sender=self)
        gc.collect()  # Issue #2927
        return super()._create_worker_process(i)
I put that signal just before collect, and that's why this came to mind.
collect will likely make it worse because it will make more holes in arenas
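worth noting: the CPython docs for gc.freeze() recommend a specific ordering for exactly this reason: disable the GC early in the parent (so collections don't punch holes in memory pages), freeze right before fork, and re-enable in the children. roughly:

```python
import gc
import os

gc.disable()   # early in the parent: collections stop punching holes in pages

# ... application warm-up (imports, caches, app setup) would happen here ...

gc.freeze()    # right before fork(): exempt everything allocated so far

pid = os.fork()
if pid == 0:           # child process
    gc.enable()        # the child collects only what it allocates itself
    os._exit(0)
os.waitpid(pid, 0)     # parent
```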