Profiling and optimizing your Python code | Python tricks

แชร์
ฝัง
  • เผยแพร่เมื่อ 14 ต.ค. 2024
  • (For more, visit pythontutorial... !) In this video, I show how you can profile Python code using the cProfile module, and how you can use this information to optimize your code, resulting (sometimes) in massive performance improvements.
    The Jupyter notebook is available from osf.io/upav8/

ความคิดเห็น • 139

  • @senajoaop
    @senajoaop 5 ปีที่แล้ว +19

    Had to do a fast and superficial analysis in a very long code. This video made it possible, thanks a lot pal.

  • @Drahagoon
    @Drahagoon 4 ปีที่แล้ว +8

    Awesome video! Well explained, with a simple, clear and typical hands-on example illustration.
    Great work.

  • @eilonavizemer7755
    @eilonavizemer7755 3 ปีที่แล้ว +17

    Great video!
    here is another, perhaps easier, solution to make this code's complexity linear:
    1. lowercase all the movies
    2. convert your movie list into a set (sets in python avoid duplicates)
    3. convert the set back into a list and return it.

    • @liorbm1
      @liorbm1 3 ปีที่แล้ว +1

      It will be nice to see perf difference between his final code to your idea..

    • @vermarajat2596
      @vermarajat2596 2 ปีที่แล้ว

      i think conversion from set to list will take more time.

    • @cosminturtureanu692
      @cosminturtureanu692 2 ปีที่แล้ว +1

      The goal is to find the duplicates, not to remove them

  • @balmittal1770
    @balmittal1770 ปีที่แล้ว +1

    Nice code optimization. specially the last one.

  • @simonbrecher878
    @simonbrecher878 5 ปีที่แล้ว +33

    Good video, but it is actually quadratic (polynomial), not exponential. Quadratic is n^2, polynomial n^k, where k is a constant and exponential is k^n. n is length of input. You would not be able to do even near 5000 in exponential problem.

    • @daviddvorak3278
      @daviddvorak3278 3 ปีที่แล้ว +10

      The final solution is also not linear, but n*log(n), since python sort is not linear.

  • @mervynwinn1852
    @mervynwinn1852 5 ปีที่แล้ว +6

    i love the way you say "popping"

  • @legau2k
    @legau2k 6 ปีที่แล้ว +10

    Awesome video. It was great watching you go step by step through the ode optimization. Your solution for finding duplicates also was very clever and elegant. Worth a subscribe ^.^

  • @sheikhakbar2067
    @sheikhakbar2067 4 ปีที่แล้ว

    This channel needs a couple of millions subscribers... I always come back to it to learn those marvellous tips and tricks!

  • @sailalmishra4860
    @sailalmishra4860 5 ปีที่แล้ว +5

    Buddy this is Amazing, You should not be on such low subscriber count.. God bless

  • @mstr_rprochowicz
    @mstr_rprochowicz 5 ปีที่แล้ว +1

    It helped me a lot in tracking expensive functions that were unnecessary used 2 million times in a loop. Thanks for this useful tutorial!

  • @MrAmbarish710
    @MrAmbarish710 4 ปีที่แล้ว +2

    Man your videos are really really helpful! Best explanation of cProfile, profiling and optimization in python. Please keep posting videos...

  • @MsSuyash1995
    @MsSuyash1995 5 ปีที่แล้ว +1

    I came here to get a glimpse of how cProfile module but leaving here impressed with your final solution...
    I loved how you combined the zip() function after sorting the list...
    And, a great job in illustrating the importance of why profilers are an important tool in a programmer's armamentarium...

  • @0versun0
    @0versun0 6 ปีที่แล้ว +31

    More more more videos. Yours video is very helpful! Keep going

  • @vinitkumar2923
    @vinitkumar2923 2 ปีที่แล้ว +1

    Great video and explanation. Thanks for sharing this.

  • @AyushMandowara_xx7
    @AyushMandowara_xx7 4 ปีที่แล้ว

    This helped me optimize code by about 50-75% depending on the file contents being scanned. Earlier it was consuming about a minute on a large scan, while now it takes about 20 at max. The average speed is reduced to 10secs from 25secs. All I did after analyzing was change my Pandas Series objects (generated from Google Spreadsheets) to tuples (lists would also have the same effect but my data never changes in a single run).
    Using cprofiler I could see that Pandas library was consuming loads of resources just to fetch values based on an index number. Thanks a ton!!

  • @ganeshchaudhari8087
    @ganeshchaudhari8087 2 ปีที่แล้ว +1

    Well explained! Thanks for this video....

  • @sm3801_smo
    @sm3801_smo 6 ปีที่แล้ว +2

    I initially subbed because of your Biological Psychology videos, but I didn't know you're into programming, very useful video!

    •  6 ปีที่แล้ว

      +Samuel Muñoz thanks ! Yes, the Bio Psy lectures are something new for me. Most of the videos are about Python and/or OpenSesame

  • @mahesh_kok
    @mahesh_kok 4 ปีที่แล้ว +1

    This guy is crazy in coding , concepts and thinking...he brought down the execution time from 6 sec to .002 sec......this is insane ... tremendous work done bro...

  • @prakharchaurasiya8107
    @prakharchaurasiya8107 4 ปีที่แล้ว

    Finally some optimization that is not too complex. Thanks.

  • @Julien-hg8jh
    @Julien-hg8jh 4 ปีที่แล้ว +1

    15:30 auto corection !
    Nice video BTW :D

  • @yildirimicen766
    @yildirimicen766 2 ปีที่แล้ว +1

    Hi Mr. Mathot,
    you are great, I love your Python sessions... :)

  • @ranelpadon8834
    @ranelpadon8834 2 ปีที่แล้ว

    Good analysis and build up of improvements. Thanks!

  • @arjunkirpal9776
    @arjunkirpal9776 6 ปีที่แล้ว +1

    Thank you Sebastiaan! Would love more Asyncio videos!

  • @shivan2418
    @shivan2418 4 ปีที่แล้ว +4

    In case anyone in the future reads this
    I found that this method executes even faster than the method he ended up with.
    from collections import Counter
    def find_duplicate_words_counter(src='movies.txt'):
    return [movie for movie, count in Counter([movie.lower() for movie in read_movies(src)]).items() if count>1]

    • @Alister222222
      @Alister222222 3 ปีที่แล้ว +3

      Was going to post this as well - converting the list into a Counter (e.g. a special dict type from the collections module) and running a comprehension to get back everything that had a count above 1 does seem to be the cleanest way to get to the solution, and I am pleased it is also the fastest!

    • @droit19
      @droit19 2 ปีที่แล้ว

      @@Alister222222 - I tried this and was 0.13 seconds faster or 44% faster than the Zip method

  • @maedehshahabi4744
    @maedehshahabi4744 2 ปีที่แล้ว

    Thank you sir for your clear explanation.

  • @Grimlor
    @Grimlor 6 ปีที่แล้ว +1

    I've found this so useful! Thank you for this video.
    By analyzing my code and applying a little tweak, I've already managed to save 0.8 seconds of runtime. And I've only just started! :D

    •  6 ปีที่แล้ว

      +Grimlor glad to hear it!

  • @tonyradice4166
    @tonyradice4166 ปีที่แล้ว

    Outstanding presentation!!!

  • @razintailor
    @razintailor 4 ปีที่แล้ว +1

    Great explanation. Lucid and fundamental. It is indeed helpful.

  • @15kasturi
    @15kasturi ปีที่แล้ว

    I just subscribed you by watching this video, very informative and nice goggles!

  • @Nobrezando
    @Nobrezando ปีที่แล้ว

    You earned my follow at 15:33

  • @MultiRick15
    @MultiRick15 2 ปีที่แล้ว

    Wow! Great explanation.

  • @hamol3d
    @hamol3d 3 ปีที่แล้ว +1

    Great Video! Thank you.

  • @pygemssoftware4254
    @pygemssoftware4254 2 ปีที่แล้ว

    Great work and explanation.
    I would like to email my eyes to you as token of my appreciation😃

  • @DaanWaardenburg
    @DaanWaardenburg 4 ปีที่แล้ว +23

    Keep coming back here when my code starts running slow :P

    • @sschmachtel8963
      @sschmachtel8963 4 ปีที่แล้ว +1

      Yeah I can imagine :-) First time here.
      Coming back to remember yourself that errr... yeah ... why on earth does it take so long and frustrates me ... how am I ever going to find out wth this code is slow. Is it me or is it some crazy circumstance that goes on in my libraries that I use.
      While waiting for your code to finnish you can actually study the problems and get them fixed. But I do think not by hand just by profilers

  • @marveltv5341
    @marveltv5341 4 ปีที่แล้ว

    Careful... he is a hero 🙌

  • @nutcrackeroverdrive
    @nutcrackeroverdrive 6 ปีที่แล้ว +1

    Thanx, Sebastiaan, very useful and helpful video.

  • @AmrXcellent
    @AmrXcellent 4 ปีที่แล้ว +3

    Good video but If I understand correctly the final change in code change does not account for a movie title that is duplicated more than once. So the first two iterations of the code are doing more functionality.
    All in all nice video, I learned something new watching it. so thank you for that.

    •  4 ปีที่แล้ว

      That's correct: triple duplicates are not caught with this method. And thank you!

    • @deadman1999
      @deadman1999 2 ปีที่แล้ว

      yes, I was thinking the same thing, the final code was so fast because it only checked its 1st neighbor, taking into account that there were only 1 duplicate.

  • @marazDNG
    @marazDNG 2 ปีที่แล้ว +1

    Great video man!

  • @luciano_remes
    @luciano_remes 2 ปีที่แล้ว +2

    Your last solution runs in NlogN time complexity, but you could actually make it faster by just using a set of found movies. It would run in Linear time and be way simpler:
    found = set()
    duplicates = []
    for movie in movies:
    if movie not in found:
    found.add(movie)
    else:
    duplicates.append(movie)

  • @norwegiandud
    @norwegiandud 3 ปีที่แล้ว

    Helpful video, thanks!
    Just one 🐛 with the 007-method (or weird feature).
    If there are movies that are represented more than two times they appear as duplicates in the duplicates list.
    E.G. 'the phantom of the opera' appears five times in the TXT file, and four times in the list of duplicates.
    Now if this is a 🐛 or feature ... depends on who you ask.

  • @danyalt8221
    @danyalt8221 2 ปีที่แล้ว +1

    It Was Great! Thank You.

  • @thecaveofthedead
    @thecaveofthedead 6 ปีที่แล้ว +1

    Excellent tutorial. Thanks.

  • @bunlonglay463
    @bunlonglay463 3 ปีที่แล้ว +1

    Hey, shouldn't be your last solution, where you sort the movies list, O(n log(n)) and not as you said O(n)?. Sorting the movies list takes O(nlog(n)) time. Also when you use zip with slices of the movies arrays, copies of movies are created. This is also inefficient.
    Could someone maybe confirm what I said?
    Anyway, great video explaining the profiler

  • @ЕвгенийТитов-и9ю
    @ЕвгенийТитов-и9ю 3 ปีที่แล้ว

    This is super nice video, thank you sir

  • @farooqseeru948
    @farooqseeru948 5 ปีที่แล้ว +1

    Brillant. Clear explanation.

  • @ke30_
    @ke30_ 3 ปีที่แล้ว

    I love this so much

  • @hrithiksharma2047
    @hrithiksharma2047 3 ปีที่แล้ว +1

    Great tut bro! Thank you

  • @svalaboj
    @svalaboj 3 ปีที่แล้ว +1

    your video is very useful, thanks for the same.

  • @babuasian
    @babuasian 6 ปีที่แล้ว

    Appreciate it. Really useful for most of the programs..

  • @benedictcoltman1983
    @benedictcoltman1983 4 ปีที่แล้ว

    Superb! Thanks

  • @siddharthindora7182
    @siddharthindora7182 4 ปีที่แล้ว

    Great Video...Thanks for explanation :)

  • @xanterx
    @xanterx 4 ปีที่แล้ว +1

    Love your shades 🤘

  • @neelojp8460
    @neelojp8460 6 ปีที่แล้ว +1

    thank you so much for your videos they are really very helpful!
    Do you have any own books about python ?

    •  6 ปีที่แล้ว +1

      +post fix Thank you! No, I'm afraid that I do not have any Python books myself. But there are plenty of good free Python books out there, such as Byte of Python.

    • @neelojp8460
      @neelojp8460 6 ปีที่แล้ว

      thank you for your answer, dank je wel :-).... you should wirte one about the tricks which you show us here... and here is the link for the Byte of Python for all others: www.gitbook.com/book/swaroopch/byte-of-python/details

  • @fantasdeck
    @fantasdeck 2 ปีที่แล้ว

    I like how you edited your video to hide the little typo you made. But, cool tool. Will be using...

  • @onlymusic2005
    @onlymusic2005 4 ปีที่แล้ว

    Real treasure... bunch of thanx

  • @deividaspelakauskas9394
    @deividaspelakauskas9394 3 ปีที่แล้ว

    Underrated.

  • @acho8387
    @acho8387 4 ปีที่แล้ว +1

    very good video! thanks!

  • @botenbireu7875
    @botenbireu7875 6 ปีที่แล้ว +1

    Thank you a lot! very clear explanation!!!

  • @Nobrezando
    @Nobrezando ปีที่แล้ว

    "Well, you can see that our code it's taking about 0.00023 to execute. But if you're not satisfied with that..." lmao

  • @mohammedgt8102
    @mohammedgt8102 ปีที่แล้ว

    Awesome video.

  • @fuanka1724
    @fuanka1724 6 ปีที่แล้ว +2

    Loved this. Optimization is really important to me. Thanks.

  • @anumsheraz4625
    @anumsheraz4625 4 ปีที่แล้ว +1

    is there any tool to identify how much memory is consumed by the code ?

  • @jeremyalvaprathama4069
    @jeremyalvaprathama4069 4 ปีที่แล้ว

    Awesome work! I just subscribed

  • @migovas1483
    @migovas1483 5 ปีที่แล้ว

    This was great and clear, right to the point!!

  • @rgrapey
    @rgrapey 6 ปีที่แล้ว +1

    Clear and informative!

  • @haonanqiu4251
    @haonanqiu4251 3 ปีที่แล้ว +1

    thanks a lot!

  • @yildirimicen766
    @yildirimicen766 2 ปีที่แล้ว

    Hi Mr. Mathot,
    how about the following with "combinations" (you can even omit "movies.sort()"):
    from itertools import combinations
    # find duplicates in list of movies
    movies = ['abc','abc','xyz','ddddd','ddddd','star wars']
    print([m for m,n in combinations(movies, 2) if m==n])

  • @mariusnorheim
    @mariusnorheim 6 ปีที่แล้ว +2

    Hi Sebastian, I tried running the profile decorator, but 1) I'm using python 2.7 and 2) I'm running it in atom, not jupyter, so I get an error message. Would be awesome if you could post the code for python 2.7 as well in the file

    • @mariusnorheim
      @mariusnorheim 6 ปีที่แล้ว +3

      Actually got it to work. It seems that you'll have to encode the Unicode strings to byte strings, and use io.BytesIO, instead of io.StringIO.

  • @kpespinosa
    @kpespinosa 5 ปีที่แล้ว +1

    great explanation! cheers

  • @АлексейТрофимов-ф5у
    @АлексейТрофимов-ф5у 2 ปีที่แล้ว +1

    thanks!

  • @BullishBuddy
    @BullishBuddy 2 ปีที่แล้ว +1

    👍👍

  • @vaibhavjain1914
    @vaibhavjain1914 3 ปีที่แล้ว

    Bruh in this video you are teaching code optimization but looking at your choice of wearable I feel I am learning how to assassinate enemy but amazing video 😀

  • @ikramu5719
    @ikramu5719 4 ปีที่แล้ว

    Thank you for that explanation. Neat solution with the zip and slices too! ps The link for the movies file is now out of date though.

  • @blanky_nap
    @blanky_nap 6 ปีที่แล้ว +3

    Great video!

  • @adityakushwaha3654
    @adityakushwaha3654 3 ปีที่แล้ว

    But how do you know which code will be more efficient wrt present code ?

  • @Jure1234567
    @Jure1234567 3 ปีที่แล้ว

    Can I do it with wxwidgets classes and multithreading?

  • @parietal100
    @parietal100 6 ปีที่แล้ว +1

    Thank you Sebastian

  • @alvaromartin6301
    @alvaromartin6301 4 ปีที่แล้ว +2

    Excelente Content! New sub.

  • @7aygames35
    @7aygames35 3 ปีที่แล้ว

    The 22 people who disliked are those who were writing bad code and when it was pointed out to them, they just got angry

  • @alishermatkurbanov9205
    @alishermatkurbanov9205 5 ปีที่แล้ว +1

    What if the list has more than 1 duplicate, e.g. [1, 3, 1, 4, 1, 4, 4, 5] -> sorted [1, 1, 1, 3, 4, 4, 4, 5] -> zipped smth like this [(1, 1), (1, 1), (1, 3), (4, 4), (4, 4), (4, 5)], so 1 and 4 will be added to duplicates twice. Doesnt duplicates should be list with unique items?

    •  5 ปีที่แล้ว +2

      That's correct. Triplicates will end up twice in the list of duplicates, which may not be what you want. An easy trick to get around that would be to use a set comprehension (th-cam.com/video/uTUV2eONqSQ/w-d-xo.html), rather than a list comprehension. Because sets by definition consist of unique items.

  • @JakobRobert00
    @JakobRobert00 3 ปีที่แล้ว

    Thank you, this helped me so much :)

  • @vyl6781
    @vyl6781 5 ปีที่แล้ว +2

    Saved my sanity.

  • @TheFilipo2
    @TheFilipo2 6 ปีที่แล้ว

    Thank you, this was super helpful!

    •  6 ปีที่แล้ว

      Good to hear!

  • @Excess-qn7qh
    @Excess-qn7qh 3 ปีที่แล้ว

    does the @profile annotaion only work with jUpiter?

  • @sashkazayebashka
    @sashkazayebashka 6 ปีที่แล้ว

    Great video/ Thank you man!

  • @drewduncan5774
    @drewduncan5774 6 ปีที่แล้ว +35

    12:54 Quadratic, not exponential.

    •  4 ปีที่แล้ว +1

      NO it's in O(n*ln(n)) because of the Sort()

    • @stephenaiesi6073
      @stephenaiesi6073 4 ปีที่แล้ว +4

      With Big O notation we are really ony concerend with the term with the highest power. An algorithm on the order of O(3x² + 2x + 11) is usually reduced down to to O(3x²). I've seen books drop the coefficient as well but that has a fairly large impact on the accuracy of the expression in my opinion.
      So in terms of Big-O, an algorithm on the order of a quadratic equation is usually considered to be on the order of its highest term.
      If you think about comparing two algorithms, one operating at O(3x² + 2x + 11) to one that runs at O(3x²), let's see how different they really are:
      So given the following equations:
      f(x) = 3x² + 2x + 11
      g(x) = 3x²
      Let's see how they correspond given a single input (n=1)
      f(1) = 16
      g(1) = 3
      The ratio between these two results is 5.33 and would go to show that quadratic and exponential are not swappable in this context
      Now lets scale it to 100 inputs, n=100
      f(100) = 30211
      g(100) = 30000
      Now they are operating at a ratio of 1.007. Not identical, but damn near close dependng on the precision needed. In terms of making algorithms efficient with computers 100 inputs is not considered much anyways.
      Now let's scale it to 1,000,00 inputs
      f(1,000,000) = 3000002000011
      g(1,000,000) = 3000000000000
      Ratio of 1.00000066
      The difference in comparing these two with without the extra terms is often negligible when comparing them to algorithms on the order of a different exponential power.
      Run the same exeriment with comparing f(x²) and g(x³), with and without extra quadatic terms and you can see that dropping the lower terms, though not exact, is definitely enough to compare the efficiency of algorithms.
      So as the size of the inputs grows, paritculary towards quantities where optimization is necessary, we are usually dealing with such vast amounts of data that including the lower terms of the quadratic formula in our assessment of an algorithms efficiency does not necessarily provide extra insight.
      ps: i'm fully aware this isn't the case in every domain, but it is for the most part how it is done and definitely applies to the kinds of problems in this video.

    • @_treed1
      @_treed1 4 ปีที่แล้ว +1

      Lol these comments. It's a loop in a loop which is n * n so n^2 tadah

    • @__gavin__
      @__gavin__ 4 ปีที่แล้ว +1

      @@stephenaiesi6073
      > I've seen books drop the coefficient as well but that has a fairly large impact on the accuracy of the expression in my opinion.
      Big O notation has a formal mathematical definition. A function f(x) is said to be O(g(x)) if |f(x)| =x_0, where A and x_0 are some constant values. Hence, when considering big O notation, it really doesn't matter if you drop the 3 or not. If f(x) = O(3x^2) then all we are saying is that there exists some x_0 such that for all x>=x_0, |f(x)|

    • @calebmunuru3598
      @calebmunuru3598 4 ปีที่แล้ว

      Stephen Aiesi Thanks mate. This is a really good explanation

  • @emasmach
    @emasmach 5 ปีที่แล้ว +1

    Nice. Excelent.

  • @dhananjaykansal8097
    @dhananjaykansal8097 5 ปีที่แล้ว +1

    YOU ARE JUST AWESOMEEEEEE

  • @graycybermonk3068
    @graycybermonk3068 5 ปีที่แล้ว

    You will kill me. Really Awesome.

  • @DragonRazor9283
    @DragonRazor9283 3 ปีที่แล้ว

    from 6 seconds to 0.007 seconds wow!

  • @fcoignmo
    @fcoignmo 6 ปีที่แล้ว

    Where did you get the "movies.txt" file (link)? Thank you for the vide, great work.

    •  6 ปีที่แล้ว +1

      My reply is a bit late, but I got this data from here: osf.io/r73y9/

  • @vanglequy7844
    @vanglequy7844 3 ปีที่แล้ว

    13:30 Who else pause the video and challenge yourselves? But beware of the tendency to jump into redesign the solution before profiling.

  • @naughtybuddha3942
    @naughtybuddha3942 4 ปีที่แล้ว

    Where is the movies.txt? Please provide it, thanks.

  • @deepak1725
    @deepak1725 5 ปีที่แล้ว

    Very Very nice

  • @chunceywei8284
    @chunceywei8284 6 ปีที่แล้ว +1

    Thank you

  • @pushpendrasingh1819
    @pushpendrasingh1819 6 ปีที่แล้ว +2

    Bro it would give duplicates result if our file is containing movies that are there in file more than 2.
    Ex. movie.txt
    Hello
    Hello
    Hello
    then your code will print hello atleast twice.

    • @sailalmishra4860
      @sailalmishra4860 5 ปีที่แล้ว +1

      Hey, Hope you understand this is for demonstration purpose. Would be good to concentrate on the technique rather than logic. u can probably go about refining the logic.. A solution is to use set and then extract elements with more than 1 occurance.

    • @Daimakaicho
      @Daimakaicho 4 ปีที่แล้ว

      you can use this: `duplicates = [name for name, count in Counter(movies).items() if count > 1]` instead of that zip zip method also this one doesn't require the list to be sorted remember to `from collections import Counter`

  • @nikithar3628
    @nikithar3628 6 ปีที่แล้ว +1

    Awesome

  • @xspager
    @xspager 5 ปีที่แล้ว

    Awesome explanation but when you removed the function you also changed the way you do the searching, you stopped looping over all the movies and used the "in" operator

  • @ВикторДзеба
    @ВикторДзеба 2 ปีที่แล้ว

    May you give us the movies.txt file please???

  • @bhupendram5212
    @bhupendram5212 2 ปีที่แล้ว

    Thanks a lot, but the duplicate logic is wrong. It will fail if the movies list has 3 or more consecutive duplicate movies.