Profiling and optimizing your Python code | Python tricks

Sebastiaan Mathôt

มุมมอง 71 532

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 24 พ.ย. 2024

ความคิดเห็น • 139

@senajoaop 5 ปีที่แล้ว ⁺¹⁹
Had to do a fast and superficial analysis in a very long code. This video made it possible, thanks a lot pal.
@simonbrecher878 5 ปีที่แล้ว ⁺³³
Good video, but it is actually quadratic (polynomial), not exponential. Quadratic is n^2, polynomial n^k, where k is a constant and exponential is k^n. n is length of input. You would not be able to do even near 5000 in exponential problem.
@daviddvorak3278 3 ปีที่แล้ว ⁺¹⁰
The final solution is also not linear, but n*log(n), since python sort is not linear.
@eilonavizemer7755 3 ปีที่แล้ว ⁺¹⁷
Great video!
here is another, perhaps easier, solution to make this code's complexity linear:
1. lowercase all the movies
2. convert your movie list into a set (sets in python avoid duplicates)
3. convert the set back into a list and return it.
@liorbm1 3 ปีที่แล้ว ⁺¹
It will be nice to see perf difference between his final code to your idea..
@vermarajat2596 2 ปีที่แล้ว
i think conversion from set to list will take more time.
@cosminturtureanu692 2 ปีที่แล้ว ⁺¹
The goal is to find the duplicates, not to remove them
@sheikhakbar2067 4 ปีที่แล้ว
This channel needs a couple of millions subscribers... I always come back to it to learn those marvellous tips and tricks!
@Drahagoon 4 ปีที่แล้ว ⁺⁸
Awesome video! Well explained, with a simple, clear and typical hands-on example illustration.
Great work.
@mervynwinn1852 5 ปีที่แล้ว ⁺⁶
i love the way you say "popping"
@mstr_rprochowicz 5 ปีที่แล้ว ⁺¹
It helped me a lot in tracking expensive functions that were unnecessary used 2 million times in a loop. Thanks for this useful tutorial!
@MsSuyash1995 5 ปีที่แล้ว ⁺¹
I came here to get a glimpse of how cProfile module but leaving here impressed with your final solution...
I loved how you combined the zip() function after sorting the list...
And, a great job in illustrating the importance of why profilers are an important tool in a programmer's armamentarium...
@legau2k 6 ปีที่แล้ว ⁺¹⁰
Awesome video. It was great watching you go step by step through the ode optimization. Your solution for finding duplicates also was very clever and elegant. Worth a subscribe ^.^
@AyushMandowara_xx7 4 ปีที่แล้ว
This helped me optimize code by about 50-75% depending on the file contents being scanned. Earlier it was consuming about a minute on a large scan, while now it takes about 20 at max. The average speed is reduced to 10secs from 25secs. All I did after analyzing was change my Pandas Series objects (generated from Google Spreadsheets) to tuples (lists would also have the same effect but my data never changes in a single run).
Using cprofiler I could see that Pandas library was consuming loads of resources just to fetch values based on an index number. Thanks a ton!!
@balmittal1770 ปีที่แล้ว ⁺¹
Nice code optimization. specially the last one.
@MrAmbarish710 4 ปีที่แล้ว ⁺²
Man your videos are really really helpful! Best explanation of cProfile, profiling and optimization in python. Please keep posting videos...
@sailalmishra4860 5 ปีที่แล้ว ⁺⁵
Buddy this is Amazing, You should not be on such low subscriber count.. God bless
@prakharchaurasiya8107 4 ปีที่แล้ว
Finally some optimization that is not too complex. Thanks.
@shivan2418 4 ปีที่แล้ว ⁺⁴
In case anyone in the future reads this
I found that this method executes even faster than the method he ended up with.
from collections import Counter
def find_duplicate_words_counter(src='movies.txt'):
return [movie for movie, count in Counter([movie.lower() for movie in read_movies(src)]).items() if count>1]
@Alister222222 3 ปีที่แล้ว ⁺³
Was going to post this as well - converting the list into a Counter (e.g. a special dict type from the collections module) and running a comprehension to get back everything that had a count above 1 does seem to be the cleanest way to get to the solution, and I am pleased it is also the fastest!
@droit19 2 ปีที่แล้ว
@@Alister222222 - I tried this and was 0.13 seconds faster or 44% faster than the Zip method
@0versun0 6 ปีที่แล้ว ⁺³¹
More more more videos. Yours video is very helpful! Keep going
@abhishekpandey7096 5 ปีที่แล้ว
Hey🌏🌏🏕️
@sm3801_smo 6 ปีที่แล้ว ⁺²
I initially subbed because of your Biological Psychology videos, but I didn't know you're into programming, very useful video!
6 ปีที่แล้ว
+Samuel Muñoz thanks ! Yes, the Bio Psy lectures are something new for me. Most of the videos are about Python and/or OpenSesame
@DaanWaardenburg 4 ปีที่แล้ว ⁺²³
Keep coming back here when my code starts running slow :P
@sschmachtel8963 4 ปีที่แล้ว ⁺¹
Yeah I can imagine :-) First time here.
Coming back to remember yourself that errr... yeah ... why on earth does it take so long and frustrates me ... how am I ever going to find out wth this code is slow. Is it me or is it some crazy circumstance that goes on in my libraries that I use.
While waiting for your code to finnish you can actually study the problems and get them fixed. But I do think not by hand just by profilers
@razintailor 4 ปีที่แล้ว ⁺¹
Great explanation. Lucid and fundamental. It is indeed helpful.
@vinitkumar2923 2 ปีที่แล้ว ⁺¹
Great video and explanation. Thanks for sharing this.
@yildirimicen766 2 ปีที่แล้ว ⁺¹
Hi Mr. Mathot,
you are great, I love your Python sessions... :)
@Julien-hg8jh 4 ปีที่แล้ว ⁺¹
15:30 auto corection !
Nice video BTW :D
@AmrXcellent 4 ปีที่แล้ว ⁺³
Good video but If I understand correctly the final change in code change does not account for a movie title that is duplicated more than once. So the first two iterations of the code are doing more functionality.
All in all nice video, I learned something new watching it. so thank you for that.
4 ปีที่แล้ว
That's correct: triple duplicates are not caught with this method. And thank you!
@deadman1999 2 ปีที่แล้ว
yes, I was thinking the same thing, the final code was so fast because it only checked its 1st neighbor, taking into account that there were only 1 duplicate.
@Grimlor 6 ปีที่แล้ว ⁺¹
I've found this so useful! Thank you for this video.
By analyzing my code and applying a little tweak, I've already managed to save 0.8 seconds of runtime. And I've only just started! :D
6 ปีที่แล้ว
+Grimlor glad to hear it!
@bunlonglay463 3 ปีที่แล้ว ⁺¹
Hey, shouldn't be your last solution, where you sort the movies list, O(n log(n)) and not as you said O(n)?. Sorting the movies list takes O(nlog(n)) time. Also when you use zip with slices of the movies arrays, copies of movies are created. This is also inefficient.
Could someone maybe confirm what I said?
Anyway, great video explaining the profiler
@ranelpadon8834 3 ปีที่แล้ว
Good analysis and build up of improvements. Thanks!
@arjunkirpal9776 6 ปีที่แล้ว ⁺¹
Thank you Sebastiaan! Would love more Asyncio videos!
@AutoXplorerYT 2 ปีที่แล้ว ⁺¹
Well explained! Thanks for this video....
@luciano_remes 2 ปีที่แล้ว ⁺²
Your last solution runs in NlogN time complexity, but you could actually make it faster by just using a set of found movies. It would run in Linear time and be way simpler:
found = set()
duplicates = []
for movie in movies:
if movie not in found:
found.add(movie)
else:
duplicates.append(movie)
@norwegiandud 3 ปีที่แล้ว
Helpful video, thanks!
Just one 🐛 with the 007-method (or weird feature).
If there are movies that are represented more than two times they appear as duplicates in the duplicates list.
E.G. 'the phantom of the opera' appears five times in the TXT file, and four times in the list of duplicates.
Now if this is a 🐛 or feature ... depends on who you ask.
@tonyradice4166 ปีที่แล้ว
Outstanding presentation!!!
@mahesh_kok 5 ปีที่แล้ว ⁺¹
This guy is crazy in coding , concepts and thinking...he brought down the execution time from 6 sec to .002 sec......this is insane ... tremendous work done bro...
@marveltv5341 4 ปีที่แล้ว
Careful... he is a hero 🙌
@maedehshahabi4744 2 ปีที่แล้ว
Thank you sir for your clear explanation.
@MultiRick15 2 ปีที่แล้ว
Wow! Great explanation.
@IsmaelRDeMelo 2 ปีที่แล้ว
You earned my follow at 15:33
@15kasturi ปีที่แล้ว
I just subscribed you by watching this video, very informative and nice goggles!
@farooqseeru948 5 ปีที่แล้ว ⁺¹
Brillant. Clear explanation.
@deividaspelakauskas9394 3 ปีที่แล้ว
Underrated.
@pygemssoftware4254 2 ปีที่แล้ว
Great work and explanation.
I would like to email my eyes to you as token of my appreciation😃
@marazDNG 2 ปีที่แล้ว ⁺¹
Great video man!
@fantasdeck 2 ปีที่แล้ว
I like how you edited your video to hide the little typo you made. But, cool tool. Will be using...
@nutcrackeroverdrive 6 ปีที่แล้ว ⁺¹
Thanx, Sebastiaan, very useful and helpful video.
@fuanka1724 6 ปีที่แล้ว ⁺²
Loved this. Optimization is really important to me. Thanks.
@hamol3d 3 ปีที่แล้ว ⁺¹
Great Video! Thank you.
@thecaveofthedead 6 ปีที่แล้ว ⁺¹
Excellent tutorial. Thanks.
@hrithiksharma2047 3 ปีที่แล้ว ⁺¹
Great tut bro! Thank you
@rgrapey 6 ปีที่แล้ว ⁺¹
Clear and informative!
@migovas1483 5 ปีที่แล้ว
This was great and clear, right to the point!!
@babuasian 6 ปีที่แล้ว
Appreciate it. Really useful for most of the programs..
@neelojp8460 6 ปีที่แล้ว ⁺¹
thank you so much for your videos they are really very helpful!
Do you have any own books about python ?
6 ปีที่แล้ว ⁺¹
+post fix Thank you! No, I'm afraid that I do not have any Python books myself. But there are plenty of good free Python books out there, such as Byte of Python.
@neelojp8460 6 ปีที่แล้ว
thank you for your answer, dank je wel :-).... you should wirte one about the tricks which you show us here... and here is the link for the Byte of Python for all others: www.gitbook.com/book/swaroopch/byte-of-python/details
@ikramu5719 4 ปีที่แล้ว
Thank you for that explanation. Neat solution with the zip and slices too! ps The link for the movies file is now out of date though.
@mariusnorheim 6 ปีที่แล้ว ⁺²
Hi Sebastian, I tried running the profile decorator, but 1) I'm using python 2.7 and 2) I'm running it in atom, not jupyter, so I get an error message. Would be awesome if you could post the code for python 2.7 as well in the file
@mariusnorheim 6 ปีที่แล้ว ⁺³
Actually got it to work. It seems that you'll have to encode the Unicode strings to byte strings, and use io.BytesIO, instead of io.StringIO.
@xanterx 4 ปีที่แล้ว ⁺¹
Love your shades 🤘
@ЕвгенийТитов-и9ю 4 ปีที่แล้ว
This is super nice video, thank you sir
@vyl6781 5 ปีที่แล้ว ⁺²
Saved my sanity.
@vaibhavjain1914 3 ปีที่แล้ว
Bruh in this video you are teaching code optimization but looking at your choice of wearable I feel I am learning how to assassinate enemy but amazing video 😀
@danyalt8221 2 ปีที่แล้ว ⁺¹
It Was Great! Thank You.
@anumsheraz4625 4 ปีที่แล้ว ⁺¹
is there any tool to identify how much memory is consumed by the code ?
@blanky_nap 6 ปีที่แล้ว ⁺³
Great video!
@siddharthindora7182 4 ปีที่แล้ว
Great Video...Thanks for explanation :)
@kpespinosa 5 ปีที่แล้ว ⁺¹
great explanation! cheers
@yildirimicen766 2 ปีที่แล้ว
Hi Mr. Mathot,
how about the following with "combinations" (you can even omit "movies.sort()"):
from itertools import combinations
# find duplicates in list of movies
movies = ['abc','abc','xyz','ddddd','ddddd','star wars']
print([m for m,n in combinations(movies, 2) if m==n])
@acho8387 4 ปีที่แล้ว ⁺¹
very good video! thanks!
@jeremyalvaprathama4069 4 ปีที่แล้ว
Awesome work! I just subscribed
@onlymusic2005 4 ปีที่แล้ว
Real treasure... bunch of thanx
@alishermatkurbanov9205 5 ปีที่แล้ว ⁺¹
What if the list has more than 1 duplicate, e.g. [1, 3, 1, 4, 1, 4, 4, 5] -> sorted [1, 1, 1, 3, 4, 4, 4, 5] -> zipped smth like this [(1, 1), (1, 1), (1, 3), (4, 4), (4, 4), (4, 5)], so 1 and 4 will be added to duplicates twice. Doesnt duplicates should be list with unique items?
5 ปีที่แล้ว ⁺²
That's correct. Triplicates will end up twice in the list of duplicates, which may not be what you want. An easy trick to get around that would be to use a set comprehension (th-cam.com/video/uTUV2eONqSQ/w-d-xo.html), rather than a list comprehension. Because sets by definition consist of unique items.
@svalaboj 4 ปีที่แล้ว ⁺¹
your video is very useful, thanks for the same.
@ke30_ 4 ปีที่แล้ว
I love this so much
@botenbireu7875 6 ปีที่แล้ว ⁺¹
Thank you a lot! very clear explanation!!!
@adityakushwaha3654 3 ปีที่แล้ว
But how do you know which code will be more efficient wrt present code ?
@vanglequy7844 4 ปีที่แล้ว
13:30 Who else pause the video and challenge yourselves? But beware of the tendency to jump into redesign the solution before profiling.
@mohammedgt8102 2 ปีที่แล้ว
Awesome video.
@benedictcoltman1983 4 ปีที่แล้ว
Superb! Thanks
@7aygames35 3 ปีที่แล้ว
The 22 people who disliked are those who were writing bad code and when it was pointed out to them, they just got angry
@drewduncan5774 6 ปีที่แล้ว ⁺³⁵
12:54 Quadratic, not exponential.
4 ปีที่แล้ว ⁺¹
NO it's in O(n*ln(n)) because of the Sort()
@stephenaiesi6073 4 ปีที่แล้ว ⁺⁴
With Big O notation we are really ony concerend with the term with the highest power. An algorithm on the order of O(3x² + 2x + 11) is usually reduced down to to O(3x²). I've seen books drop the coefficient as well but that has a fairly large impact on the accuracy of the expression in my opinion.
So in terms of Big-O, an algorithm on the order of a quadratic equation is usually considered to be on the order of its highest term.
If you think about comparing two algorithms, one operating at O(3x² + 2x + 11) to one that runs at O(3x²), let's see how different they really are:
So given the following equations:
f(x) = 3x² + 2x + 11
g(x) = 3x²
Let's see how they correspond given a single input (n=1)
f(1) = 16
g(1) = 3
The ratio between these two results is 5.33 and would go to show that quadratic and exponential are not swappable in this context
Now lets scale it to 100 inputs, n=100
f(100) = 30211
g(100) = 30000
Now they are operating at a ratio of 1.007. Not identical, but damn near close dependng on the precision needed. In terms of making algorithms efficient with computers 100 inputs is not considered much anyways.
Now let's scale it to 1,000,00 inputs
f(1,000,000) = 3000002000011
g(1,000,000) = 3000000000000
Ratio of 1.00000066
The difference in comparing these two with without the extra terms is often negligible when comparing them to algorithms on the order of a different exponential power.
Run the same exeriment with comparing f(x²) and g(x³), with and without extra quadatic terms and you can see that dropping the lower terms, though not exact, is definitely enough to compare the efficiency of algorithms.
So as the size of the inputs grows, paritculary towards quantities where optimization is necessary, we are usually dealing with such vast amounts of data that including the lower terms of the quadratic formula in our assessment of an algorithms efficiency does not necessarily provide extra insight.
ps: i'm fully aware this isn't the case in every domain, but it is for the most part how it is done and definitely applies to the kinds of problems in this video.
@_treed1 4 ปีที่แล้ว ⁺¹
Lol these comments. It's a loop in a loop which is n * n so n^2 tadah
@__gavin__ 4 ปีที่แล้ว ⁺¹
@@stephenaiesi6073
> I've seen books drop the coefficient as well but that has a fairly large impact on the accuracy of the expression in my opinion.
Big O notation has a formal mathematical definition. A function f(x) is said to be O(g(x)) if |f(x)| =x_0, where A and x_0 are some constant values. Hence, when considering big O notation, it really doesn't matter if you drop the 3 or not. If f(x) = O(3x^2) then all we are saying is that there exists some x_0 such that for all x>=x_0, |f(x)|
@calebmunuru3598 4 ปีที่แล้ว
Stephen Aiesi Thanks mate. This is a really good explanation
@graycybermonk3068 5 ปีที่แล้ว
You will kill me. Really Awesome.
@Jure1234567 3 ปีที่แล้ว
Can I do it with wxwidgets classes and multithreading?
@haonanqiu4251 3 ปีที่แล้ว ⁺¹
thanks a lot!
@JakobRobert00 4 ปีที่แล้ว
Thank you, this helped me so much :)
@alvaromartin6301 5 ปีที่แล้ว ⁺²
Excelente Content! New sub.
@TheFilipo2 6 ปีที่แล้ว
Thank you, this was super helpful!
6 ปีที่แล้ว
Good to hear!
@emasmach 5 ปีที่แล้ว ⁺¹
Nice. Excelent.
@Excess-qn7qh 4 ปีที่แล้ว
does the @profile annotaion only work with jUpiter?
@parietal100 6 ปีที่แล้ว ⁺¹
Thank you Sebastian
@dhananjaykansal8097 5 ปีที่แล้ว ⁺¹
YOU ARE JUST AWESOMEEEEEE
@IsmaelRDeMelo 2 ปีที่แล้ว
"Well, you can see that our code it's taking about 0.00023 to execute. But if you're not satisfied with that..." lmao
@fcoignmo 6 ปีที่แล้ว
Where did you get the "movies.txt" file (link)? Thank you for the vide, great work.
6 ปีที่แล้ว ⁺¹
My reply is a bit late, but I got this data from here: osf.io/r73y9/
@sashkazayebashka 6 ปีที่แล้ว
Great video/ Thank you man!
@АлексейТрофимов-ф5у 2 ปีที่แล้ว ⁺¹
thanks!
@naughtybuddha3942 4 ปีที่แล้ว
Where is the movies.txt? Please provide it, thanks.
@pushpendrasingh1819 6 ปีที่แล้ว ⁺²
Bro it would give duplicates result if our file is containing movies that are there in file more than 2.
Ex. movie.txt
Hello
Hello
Hello
then your code will print hello atleast twice.
@sailalmishra4860 5 ปีที่แล้ว ⁺¹
Hey, Hope you understand this is for demonstration purpose. Would be good to concentrate on the technique rather than logic. u can probably go about refining the logic.. A solution is to use set and then extract elements with more than 1 occurance.
@Memfis0 4 ปีที่แล้ว
you can use this: `duplicates = [name for name, count in Counter(movies).items() if count > 1]` instead of that zip zip method also this one doesn't require the list to be sorted remember to `from collections import Counter`
@xspager 5 ปีที่แล้ว
Awesome explanation but when you removed the function you also changed the way you do the searching, you stopped looping over all the movies and used the "in" operator
@nikithar3628 6 ปีที่แล้ว ⁺¹
Awesome
@DragonRazor9283 3 ปีที่แล้ว
from 6 seconds to 0.007 seconds wow!
@bhupendram5212 2 ปีที่แล้ว
Thanks a lot, but the duplicate logic is wrong. It will fail if the movies list has 3 or more consecutive duplicate movies.
@deepak1725 5 ปีที่แล้ว
Very Very nice
@BullishBuddy 2 ปีที่แล้ว ⁺¹
👍👍
@ВикторДзеба 2 ปีที่แล้ว
May you give us the movies.txt file please???
@chunceywei8284 6 ปีที่แล้ว ⁺¹
Thank you

ต่อไป

เล่นอัตโนมัติ