The Fastest Way to Loop in Python - An Unfortunate Truth

  • Published Dec 18, 2020
  • What's faster, a for loop, a while loop, or something else?
    We try several different ways to accomplish a looping task and discover which is fastest.
    ― mCoding with James Murphy (mcoding.io)
    Source code: github.com/mCodingLLC/VideosS...
    SUPPORT ME ⭐
    ---------------------------------------------------
    Patreon: / mcoding
    Paypal: www.paypal.com/donate/?hosted...
    Other donations: mcoding.io/donate
    BE ACTIVE IN MY COMMUNITY 😄
    ---------------------------------------------------
    Discord: / discord
    Github: github.com/mCodingLLC/
    Reddit: / mcoding
    Facebook: / james.mcoding
  • Science & Technology

Comments • 2.2K

  • @AlexPadula
    @AlexPadula 3 years ago +7620

    You know, if the time difference between a while loop and a for loop matters to your application, Python might not be the most appropriate language.

    • @abebuckingham8198
      @abebuckingham8198 2 years ago +329

      Comparisons are expensive and iterating is cheap. That's true in assembly, C and Python.

    • @firstname4337
      @firstname4337 2 years ago +100

      WRONG -- he just showed that numpy works fine

    • @HiltownJoe
      @HiltownJoe 2 years ago +618

      @@firstname4337 Yes, because if you use numpy you use C. That is the whole point of numpy, to not use python for operations that are repeated millions of times.
      Alex' point stands, because in the numpy case we use Python for everything that is not time sensitive, but for the thing that is time sensitive we use a small C monster wrapped in a thin fluffy Python blanket.
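
The split this thread describes, Python for orchestration and compiled C for the hot loop, can be sketched as follows (a minimal illustration, not the video's exact benchmark; the function names here are made up):

```python
import numpy as np

def sum_loop(n):
    # Pure-Python loop: every iteration goes through the bytecode interpreter.
    total = 0
    for i in range(n):
        total += i
    return total

def sum_numpy(n):
    # Vectorized: the loop runs inside numpy's compiled C code.
    return int(np.arange(n, dtype=np.int64).sum())

# Same result either way; the numpy version is typically orders of magnitude faster.
print(sum_loop(10_000), sum_numpy(10_000))
```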

    • @boredhero
      @boredhero 2 years ago +7

      @@HiltownJoe Have you heard of PyPy? I ran his source file in pypy3 and got less than a quarter of a second for all methods except sum generator and sum list comp (0.72 seconds for those)

    • @coreylazy
      @coreylazy 2 years ago +184

      Clearly not a protip. A professional understands the constraints of their environment and the project they're working on. The choice of programming language is not (always) up to the individual writing the application.

  • @nathandecena7694
    @nathandecena7694 3 years ago +5787

    "The Fastest way to Loop in Python is not to Loop in Python" Great sentence.

    • @karx11erx
      @karx11erx 2 years ago +31

      Talk about understanding the letters, but not the meaning.

    • @samuelhulme8347
      @samuelhulme8347 2 years ago +83

      To loop, or not to loop, that is the question

    • @nathandecena7694
      @nathandecena7694 2 years ago +9

      @@samuelhulme8347 Hahaha DEEP

    • @mohamedfilali977
      @mohamedfilali977 2 years ago +1

      @@samuelhulme8347 ok, nice job, now im lost ..lool

    • @Nerketur
      @Nerketur 2 years ago +22

      "a strange language. the only winning way is not to use it.
      how about a nice game of chess?"

  • @peterwilson8039
    @peterwilson8039 1 year ago +1184

    I'm a physicist and my experience is that when you spend the majority of your time writing the code, and running it takes only a few seconds, reducing the development time is much more important than reducing the run-time, and that's why I like Python.

    • @mCoding
      @mCoding  1 year ago +252

      So true, and something many eager-optimizers need to hear! They will certainly learn from experience as they discover where they spend their time.

    • @yutubl
      @yutubl 1 year ago +22

      Hi Peter, well I've been a software developer delivering a (Basic-)scriptable technical measurement data PC application, NI DIAdem, and we put most of our development effort into giving users like you (engineers & physicists) optimized script commands, which behave similarly to those of Python. So using an already-implemented loop command may give you better performance than your own loops, as described here.

    • @FlanPoirot
      @FlanPoirot 1 year ago +25

      That's the thing though, they're not mutually exclusive; you can have a language that's both very easy to write and decently performant (look at Julia).
      Python isn't slow because it can't go faster, or as a sacrifice for expressiveness (there are Python implementations that are way, way faster but lack total compatibility with libraries).
      Python is slow because of its design choices: heavy per-element overhead in its data structures, compiling to an intermediate bytecode but leaving it at that, no JIT in CPython, a global interpreter lock that impedes people from writing highly performant code, etc.
      Python is slow because of the choices made over the years; they've consistently made choices that hurt performance, never taking the time to see how abstractions could be implemented without a big toll, just adding things on top as they saw fit.
      But that's not a big surprise once you see that Python started as a pet project of Guido's; he made it because he could, and people happened to pick it up and spread it (even though its performance and features were worse back then).

    • @CottidaeSEA
      @CottidaeSEA 1 year ago +37

      You're using Python for simpler scripts, like what it was made for. That's a valid use case. Making large systems in Python is just stupid for the simple reason that it will be expensive to run.

    • @spl420
      @spl420 1 year ago +3

      That depends on the task, but when we're talking about science, it's 100% true.

  • @tomaskot9278
    @tomaskot9278 3 years ago +1096

    Fun fact: A good C compiler would detect this common pattern (summing sequential numbers) and replace the for loop with the direct calculation of the result. And if even the number of elements was hard-coded, the compiler would even simply replace the whole code with the hard-coded numeric answer.
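
The transformation being described is easy to check from Python, even though the compiler discussion concerns C (a sketch; the function names are illustrative):

```python
def sum_below_loop(n):
    # O(n): the summation loop as written.
    total = 0
    for i in range(n):
        total += i
    return total

def sum_below_formula(n):
    # O(1): Gauss's closed form, sum(0..n-1) == n*(n-1)/2,
    # which an optimizing compiler may substitute for the loop.
    return n * (n - 1) // 2

assert all(sum_below_loop(n) == sum_below_formula(n) for n in range(500))
```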

    • @Czeckie
      @Czeckie 3 years ago +33

      I always wondered what the point of this is. Do series we have summation formulas for really come up in real software? As far as I can tell, the only application is to show it in a presentation about optimizing compilers. And frankly, this is even the wrong showcase of optimization, because the useful techniques are general and pliable enough for a compiler to reason about. This is just hardcoded nonsense.

    • @vinno97
      @vinno97 3 years ago +145

      @@Czeckie this exact problem isn't common. But good compilers can recognize loads of (small) patterns in a codebase that, when combined, can amount to a large speedup. Sometimes some optimized chunks can again be combined into a new, smaller and/or faster, chunk.

    • @joestevenson5568
      @joestevenson5568 3 years ago +65

      @@oj0024 Yes, if you write shitty enough code the compiler cannot fix it for you because it can’t make assumptions about what you’re trying to accomplish. This is not a revelation or even a shortcoming, its a feature.

    • @oj0024
      @oj0024 3 years ago +30

      ​@Matt Murphy He is right that compilers can do it, but if we look at the two most modern compilers right now only one of them can detect it and only if you write the code in a non-obvious way.
      The second part showed that compilers don't do it with constants either, they just execute the code internally. I'm not saying that compilers aren't doing great optimizations, just that the claim that "A good C compiler would detect this common pattern" is wrong.
      Especially since such optimizations don't gain anything (I'm talking about the pattern transformation, not the constant evaluation). They can only be done in very few cases, and you usually assume that the programmers know what they are doing. (e.g. the only reason to implement the summation code would probably be for benchmarking purposes)

    • @oj0024
      @oj0024 3 years ago +17

      @@joestevenson5568 clang didn't detect the pattern when using a normal for loop; it only detected it if you write the non-obvious and rarely used while(n--) loop.

  • @daifee9174
    @daifee9174 3 years ago +7991

    pro tip how to boost python: use C

    • @gutoguto0873
      @gutoguto0873 3 years ago +496

      or C++, either choice is good.

    • @embeddor2230
      @embeddor2230 3 years ago +368

      C++ should be the better option here, because of the OOP support.

    • @mCoding
      @mCoding  3 years ago +1433

      Or any compiled language, really.

    • @Jonas_Meyer
      @Jonas_Meyer 3 years ago +706

      Assembler is the only real choice.

    • @leopoldbroom7679
      @leopoldbroom7679 3 years ago +333

      @@Jonas_Meyer You're not a real programmer until you can program in assembly

  • @FaranAiki
    @FaranAiki 3 years ago +739

    What I did not know before this video is 100_000_000.
    I did not know Python has a feature to put underscores between the digits of a number.
    Thanks anyway :)

  • @julianmahler2388
    @julianmahler2388 2 years ago +204

    5:08 It's not about NumPy being primarily written in C (so is CPython), but about Python's dynamic vs. NumPy's static typing. In a dynamically typed language like Python, every time the interpreter encounters an expression like c = a + b, it will first have to check a's type (and find out it's an integer). Then it checks b's type (and finds out it's an integer too). Then it checks if there's any known operation to add (+) two integer types (of course there is). If you do this one million times in a row (e.g. in a loop), the same checks will be performed over and over again. It's like asking "Is a still an integer? ... What about b? ... Do I know how to add two integers?" one million times in a row. And of course all of that happens at runtime. That's why Python is so damn slow in that regard.
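
A quick way to see this overhead is to compare an explicit loop, where those checks repeat per iteration, against the built-in sum(), where the adds happen in C (a minimal sketch, not the video's benchmark):

```python
import timeit

n = 100_000

def explicit_loop():
    total = 0
    for i in range(n):
        total += i  # type checks and operator dispatch repeat every iteration
    return total

def builtin_sum():
    return sum(range(n))  # the adds happen inside CPython's C implementation

assert explicit_loop() == builtin_sum()
print("loop:", timeit.timeit(explicit_loop, number=20))
print("sum: ", timeit.timeit(builtin_sum, number=20))  # usually several times faster
```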

    • @dariuszspiewak5624
      @dariuszspiewak5624 1 year ago +15

      Yeah... that's the price you pay for flexibility. But there are ways to overcome this. They all require some kind of hinting that the structures you use contain only elements of the same type.

    • @sumnerhayes3411
      @sumnerhayes3411 1 year ago +38

      > In a dynamically typed language like Python, every time the interpreter encounters an expression like c = a + b, it will first have to check a's type (and find out it's an integer). Then it checks b's type (and finds out it's an integer too). Then it checks if there's any known operation to add (+) two integer types (of course there is).
      A dynamically typed language implementation doesn't *have* to do things this way, that's just the naïve approach (albeit the easiest approach out of the box for new implementations). A smarter specializing JIT can create precompiled versions of functions and expressions for commonly used types (e.g. integers), and can determine that they are invariant and avoid having to re-check the types each time you hit the expression.
      The default CPython implementation does things the way you describe, but alternate implementations like psyco and PyPy take the specializing approach in their JITs.
      Not surprisingly, PyPy executes these for and while loops on the order of 50-100 times faster than CPython does on my machine (and ~10 times faster than numpy).
      $ python3 benchmark.py
      while: 5.254076757992152
      for: 2.9640753349813167
      sum_range: 1.1016407490242273
      sum_numpy: 0.6697274879843462
      $ pypy3 benchmark.py
      while: 0.05818561598425731
      for: 0.07444041498820297
      sum_range: 0.07667819698690437
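
For reference, a plausible reconstruction of a benchmark script producing output labeled like the above (the function names are inferred from the printed labels; the sum_numpy variant, which would wrap np.arange(n).sum(), is omitted, and n is shrunk from the video's 100_000_000 so it runs quickly):

```python
import timeit

N = 1_000_000  # the video uses 100_000_000; smaller here for a quick run

def while_loop(n=N):
    i = s = 0
    while i < n:
        s += i
        i += 1
    return s

def for_loop(n=N):
    s = 0
    for i in range(n):
        s += i
    return s

def sum_range(n=N):
    return sum(range(n))

for f in (while_loop, for_loop, sum_range):
    print(f"{f.__name__}: {timeit.timeit(f, number=1):.4f}")
```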

    • @abhishankpaul
      @abhishankpaul 1 year ago

      I hope this problem can be partly solved if people declare the data types of nearly all their variables beforehand, as they need to do in C, like I do most of the time.

    • @kylehart8829
      @kylehart8829 1 year ago +2

      @@sumnerhayes3411 There are tons of other reasons why dynamic typing is pointless and bad. It saves you having to *gasp* tell the program the type of a variable one single time, and it costs you dearly. All Python projects should enforce static typing, even though that doesn't gain performance. All languages should be statically typed, period, because static typing is infinitely more readable and dynamic typing provides no development benefit at all.

    • @grawss
      @grawss 1 year ago +2

      @@kylehart8829 I would have disagreed prior to learning Python, back when it was just dogma in my head. Interesting how that works.
      However, the learning process was made easier by removing a step. I think a better solution would be to make it extremely clear how to toggle Python into enforcing static typing, and why that's a huge benefit.

  • @kitrodriguez992
    @kitrodriguez992 3 years ago +448

    I'm in my 3rd year of Computer Science and trust me, I've watched SO many youtubers that do this, and you're by far the smallest channel for an unknown reason. You explain things really well, you don't beat around the bush, and overall you just make quality content. :D I'm now subbed and I'll be coming back for more :D

    • @mCoding
      @mCoding  3 years ago +33

      Thanks so much for the kind words!

    • @deedewald1707
      @deedewald1707 3 years ago +1

      Excellent content and testing summaries!

    • @krele14
      @krele14 2 years ago +3

      Then again, he's the only one that would place "just know your result ahead of time lul" as a viable suggestion.

  • @NickByers-og9cx
    @NickByers-og9cx 3 years ago +865

    You want to be careful when timing functions, your processor likely speeds up or down based on system load and ambient temperature changes. Although you can still capture order of magnitude changes very easily like you did. If you're on a laptop though where cooling and power are limited, take extra care, and run multiple simulations in different orders at random times!
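
One standard mitigation is to repeat each measurement and report the minimum, since the fastest run is the one least disturbed by external load (a minimal sketch using the stdlib timeit module):

```python
import timeit

def work():
    return sum(range(10_000))

# Repeat the measurement several times and report the minimum: the fastest run
# is the one least disturbed by clock boosting, throttling, and background load.
times = timeit.repeat(work, number=200, repeat=5)
print(f"best of 5: {min(times):.4f}s, spread: {max(times) - min(times):.4f}s")
```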

    • @mCoding
      @mCoding  3 years ago +490

      I cannot stress this enough! Benchmarking is such a difficult topic even though it seems so simple.

    • @luziferius3687
      @luziferius3687 3 years ago +91

      One thing you can do is to disable CPU boost clocks during benchmarks. This way, you get stable clocks regardless of load created by the tested function. For example on Linux, you can just write 0 to /sys/devices/system/cpu/cpufreq/boost to disable the boost or write 1 to enable it again. This'll get you much more stable results that are comparable even if ambient temperature changes.

    • @NickByers-og9cx
      @NickByers-og9cx 3 years ago +29

      @@luziferius3687 that's a good approach! Myself I usually try to do perf stat whenever possible, and count the actual CPU cycles it takes. Run each command say 3 or 5 times and take the median!

    • @Teriton
      @Teriton 3 years ago +46

      @@mCoding Obviously the best way is to time it yourself with a stopwatch

    • @MisterAssasine
      @MisterAssasine 3 years ago +4

      imo it would have been better to use a smaller n in the function and a higher number of runs

  • @SteinGauslaaStrindhaug
    @SteinGauslaaStrindhaug 3 years ago +288

    Halfway through the video I thought: actually, for this simple sum you don't even need to loop, just use the formula. And you didn't disappoint.
    While this might seem like a very synthetic example which won't often be possible in real life, you'd be surprised how many times I (a programmer who is not even very good at maths) have been able to replace large loops and delete large chunks of code and replace it with a simple formula.
    Not usually this clean and simple, but quite often you find that the code is looping through a lot of data calculating stuff and then filtering out stuff, and if you simply rearrange things you might be able to do the filtering before the loop which might even reduce the loop to a few statements without a loop. Very often this is not obvious in the code but very obvious when sketching out the desired results on paper.
    Pencil and paper is definitely not obsolete.

    • @mCoding
      @mCoding  3 years ago +100

      Indeed! Never underestimate math! The feeling of deleting 30 lines of scratch and replacing it with 3 lines of math is indescribable.

    • @OatmealTheCrazy
      @OatmealTheCrazy 3 years ago +44

      @@mCoding it's the moments when you're lying in bed and it kinda hits you out of nowhere that are the best
      Of course, it is kinda a let down when the divine revelations are actually wrong though

    • @ttt69420
      @ttt69420 2 years ago +13

      Doing it on paper with a pencil probably triggers a different part of your thinking process from all the years of doing math that way.

    • @delqyrus2619
      @delqyrus2619 2 years ago +8

      You can actually be even faster than with math, if you know how CPUs work. If instead of calculating (n * (n - 1)) // 2 you just calculate (n * (n - 1)) >> 1, it should (in theory) be a few clock cycles faster. While most math operations are pretty trivial for the CPU (+, -, *), division is a pretty annoying task, so you might save some time by just bit-shifting.
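
The claimed equivalence holds for non-negative integers and is easy to check, though in CPython the interpreter overhead makes any difference unmeasurable:

```python
n = 100_000_000

# For non-negative integers, floor division by 2 and a right shift by 1 agree.
assert (n * (n - 1)) // 2 == (n * (n - 1)) >> 1 == 4_999_999_950_000_000

# Note: in CPython, interpreter overhead dwarfs the cost of either operation,
# and a C compiler performs this strength reduction on its own.
```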

    • @timewave02012
      @timewave02012 2 years ago +9

      @@delqyrus2619 Bit shifting is math too. No different than multiplying and dividing by powers of ten being easy in decimal. A good compiler will optimize it though, so it's probably best not to bother.

  • @opticalmoose8091
    @opticalmoose8091 2 years ago +75

    Sometimes it’s just impossible to express something in numpy - iterative algorithms, for instance. I had a very hot piece of code, which implemented an A* pathfinding algorithm, which was slowing my entire program down.
    I looked for solutions and found a package called Numba - it allows you to decorate a single function, and it will try to compile it the best it can. There are various optimization settings to consider, and in order to get *significant* improvements in performance (like, an order of magnitude and more) you basically have to rewrite your program as if you were writing in C, but using Python's syntax - so, no dynamic typing, using a very limited subset of built-in types, and refraining from using custom classes as much as possible.
    But it didn’t require too much work, and it was a great benefit in the end - I ended up with a blazing fast function without needing to write it in actual C (not very hard) and then figure out how to couple this with Python (quite finicky if you’ve never done it). So yeah, I’d recommend looking at Numba if you need a more general solution.
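
A minimal sketch of the Numba approach described above (the A* code itself isn't shown; `triangular` is an illustrative stand-in, with a fallback so the snippet still runs, slowly, where numba isn't installed):

```python
# numba's @njit compiles a function to machine code on first call; plain ints
# and a simple loop keep it in fast "nopython" mode.
try:
    from numba import njit
except ImportError:
    def njit(func):  # identity-decorator fallback when numba is absent
        return func

@njit
def triangular(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(triangular(1_000_000))  # near-C speed once compiled, if numba is present
```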

    • @NJ-wb1cz
      @NJ-wb1cz 1 year ago +1

      Or just ditch Python and go for Java or other JVM languages, or Go or Rust, etc.

    • @pavelperina7629
      @pavelperina7629 1 year ago +4

      @@NJ-wb1cz Eh, Rust is likely one of the hardest languages to learn (after "modern" C++ with template metaprogramming). Julia seems quite easy to learn and pretty fast (if strongly typed it's close to C, maybe two or three times slower, but not two orders of magnitude like Python). On the other hand, I'm not sure how to structure code which is too complex for a single file and too simple to put into a package. It was basically developed for data crunching.
      Also, a year or two ago Rust was lacking some libraries, especially for plotting graphs (C++ lacks them too), painting (Cairo) and even exporting data as images (no support for writing 16-bit PNG files)

    • @julians.2597
      @julians.2597 3 months ago +2

      ​@@NJ-wb1cz right, who would do something as stupid as just adding another library as dependency when they could add an entirely different development stack just to optimise one function.

    • @NJ-wb1cz
      @NJ-wb1cz 3 months ago

      @@julians.2597 why is adding libraries stupid? That's a strange stance. Sure, moving to another language is an option, but you'll likely use libraries there as well.

  • @CodeDisease
    @CodeDisease 3 years ago +415

    “Python is an incredibly slow language”
    You just insulted my entire race of people...
    but yes.

    • @CodeDisease
      @CodeDisease 3 years ago +4

      I guess I’ll be first then...

    • @suryanshthakur5820
      @suryanshthakur5820 3 years ago +6

      You guessed right

    • @soupnoodles
      @soupnoodles 2 years ago +8

      It's not incredibly slow; compared to others it's slower, yes,
      but it can accomplish the tasks it's meant for with no issues:
      machine learning, automation, data science and cybersecurity.

    • @CodeDisease
      @CodeDisease 2 years ago +4

      @@soupnoodles Yes but you can do all of that much faster with C, you're better off using C.

    • @CodeDisease
      @CodeDisease 2 years ago +1

      @@soupnoodles Also why did you like your own comment?

  • @saivardhanchowdary7918
    @saivardhanchowdary7918 3 years ago +308

    The YouTube algorithm got you, mate. All the best!

    • @mCoding
      @mCoding  3 years ago +27

      Thanks for the support!

  •  3 years ago +223

    This is the reason why python is such a good programming language. You can write your program with easy and pretty syntax and use packages wrapping C code to do the complex and time-consuming tasks. This just shows that you can use python and have fast running code.
    I would also add that if your program's only focus is speed and performance then don't use python, simple as that.

    • @mCoding
      @mCoding  3 years ago +158

      People too often leave out the speed of development in their calculation of speed!

    • @waldolemmer
      @waldolemmer 2 years ago +13

      Python isn't special in this regard

    • @imrobbinganyonewhotalkstom4881
      @imrobbinganyonewhotalkstom4881 1 year ago +4

      People oftentimes forget lua.

    • @HydratedBeans
      @HydratedBeans 1 year ago +8

      A faster processor is cheaper than another developer to speed up development. That’s why I use python.

    • @NJ-wb1cz
      @NJ-wb1cz 1 year ago +1

      @@HydratedBeans unless you depend on Python-specific libraries, I don't think the development speed is noticeably greater compared to, say, Kotlin. And a loosely typed language will always have more mistakes and be harder to statically check

  • @sploofmcsterra4786
    @sploofmcsterra4786 1 year ago +4

    This was a valuable lesson to learn when I was coding a neural network. I wasn't aware of this fact and was trying to parallelise, before realising numpy automatically did that. Turning everything into vector operations led to a crazy speed increase.

  • @deemon710
    @deemon710 2 years ago +34

    Oh snap! I thought I was saving on resources by not using additional modules like numpy but now I see it's ridiculously faster. I wasn't aware pre-built functions were that much better either. Thanks for the reveal!

    • @eric_d
      @eric_d 1 year ago +10

      He actually did say in the video that Numpy uses a lot more resources, and with a bigger number, you may not have enough memory. Your original thought was correct, that you do save resources by not using additional modules. You may save time, but not resources. Also, if you compile your code, those additional modules will need to be compiled in, which will make the executable larger, and require more resources as well. As we are quickly approaching the limits of how far we can go with hardware (CPU technology is pretty much capped out, unless someone figures out how to make atoms smaller), we need to be more mindful of the amount of resources we use. Programmers need to stop being lazy. They have to stop thinking that they can bloat their code with hundreds of modules because CPUs will be faster by the time their code is in use.
      Just because speed and bandwidth are available, doesn't mean they should be used. For example, I know of some websites that include 50-100MB of Javascript code for each page view. Sure, dial-up is a thing of the past in most parts of the world, and people have faster internet connections than ever before. The page loads pretty fast on a gigabit connection. What about the people who have much slower connections? Is it really necessary to design your website so that every single page view uses an entire day's worth of bandwidth on a dial-up connection? If you have several pages of the same website open in different browser tabs, your memory utilization goes through the roof! I've seen a single browser tab on one of those websites using over 2.5GB of RAM! I haven't even checked how much unnecessary Javascript is downloaded when using something like Facebook, but I've seen a single tab with Facebook open using over _5GB RAM_!!!
      This kind of waste needs to stop. Even if it does save a second or two here and there, it could cost several more seconds (or longer) in other areas. I've always hated importing modules in any language I've ever programmed in. I would much rather write my own code to do what I need done, whenever possible. I realize that's not always possible, but it's always better than adding bloat to a program that's not needed. Look at what's known as the "demo scene". It's something that started LONG ago, where people try to make the most exotic program possible in assembly language that fits within a 64KB executable. People have fit 10-minute-long full-screen videos with music into 64KB! They have created complete first-person-shooter games in 64KB! These programs run plenty fast on hardware from over 30 years ago! If more of today's programmers took the time to write good code, we wouldn't even need computers as fast as we have.
      Sorry, that started off as a one-line reply, and ended up going 100 directions. There's still so much more I want to say about the topic, but who is going to read it anyway?

    • @moustafael-shaboury2659
      @moustafael-shaboury2659 1 year ago +3

      @@eric_d I will! I'm very new to programming and I really appreciated reading that. Thank you for the reminder :)

  • @hemanthkotagiri8865
    @hemanthkotagiri8865 3 years ago +52

    To the point, quality content. Subscribed!

    • @mCoding
      @mCoding  3 years ago +6

      Thanks for the support!

  • @GuilhermeFArantesSouza
    @GuilhermeFArantesSouza 3 years ago +254

    "The best way to loop in python is to not loop", hahaha, loved it.
    Also got curious about doing the numpy sum but with the built-in range (because of the memory problem); how much would that impact the performance?

    • @mCoding
      @mCoding  3 years ago +35

      In my experience, I've never hit a RAM limit because of this, but in memory-constrained systems this is definitely something to watch out for. I wish numpy had something like std::ranges::iota to use as an array-like.

    • @aquilesviza5550
      @aquilesviza5550 3 years ago +6

      @@mCoding There is a library called dask; it's used in distributed computing and has support for numpy arrays. Working with blocks consumes less memory but might be a bit slower.

    • @tonyflury2291
      @tonyflury2291 1 year ago

      I think numpy.arange(...) creates a potential memory issue (it allocates a large array in this case), but the Python 3 builtin range(...) doesn't allocate a large array - it is a lazy iterator that simply tracks the last value generated so it knows what to generate next.
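
The difference is easy to observe directly (sizes are CPython-on-64-bit ballparks; the numpy comparison is noted in a comment rather than executed):

```python
import sys

n = 100_000_000

r = range(n)             # lazy: stores only start, stop, and step
print(sys.getsizeof(r))  # a few dozen bytes, independent of n
print(r[12_345_678])     # indexing computes the value on demand

# Materializing is what costs memory: list(range(n)) or np.arange(n)
# would allocate on the order of n elements (~800 MB of int64 at this n).
```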

  • @DeuxisWasTaken
    @DeuxisWasTaken 6 months ago +8

    Very good video. Note: NumPy isn't just C, it's highly optimised C with SIMD capabilities. The only way you can go faster than that is with GPGPU like OpenCL, CUDA, Vulkan or Metal. (And even that isn't assured since GPGPU has initial overhead.)

  • @npip99
    @npip99 3 years ago +8

    Fun fact: If you write this in C, gcc/clang will automatically optimize the for-loop sum into n(n-1)/2 for you.

  • @cottawalla
    @cottawalla 2 years ago +6

    It was the one addition statement executed many times that slowed the python "for" loop. The conclusion is that looping can be very fast but executing python statements anywhere in the code is very slow.
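
This can be made concrete by disassembling such a loop with the stdlib dis module: each iteration re-executes several bytecode instructions through the interpreter's dispatch loop (a sketch):

```python
import dis

def accumulate(n):
    total = 0
    for i in range(n):
        total += i
    return total

# The loop body expands to several bytecode instructions, and the interpreter
# dispatches every one of them on every iteration; that per-statement dispatch,
# not the act of looping, is the dominant cost.
dis.dis(accumulate)
```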

  • @chrisray1567
    @chrisray1567 3 years ago +5

    Great video. I’ve always found list comprehensions to be faster than the equivalent for loop or while loop.
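
A sketch of that comparison (illustrative names and sizes, measured with timeit):

```python
import timeit

def with_loop():
    out = []
    for i in range(10_000):
        out.append(i * i)  # name lookup + method call on every iteration
    return out

def with_comprehension():
    return [i * i for i in range(10_000)]  # specialized list-building bytecode

assert with_loop() == with_comprehension()
print("loop:", timeit.timeit(with_loop, number=100))
print("comp:", timeit.timeit(with_comprehension, number=100))  # usually faster
```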

    • @simonmultiverse6349
      @simonmultiverse6349 3 years ago +3

      There is a language called Lython, so you don't try to loop in Python; instead you poop in Lython.

  • @Connor-of6mu
    @Connor-of6mu 3 years ago +1

    I really like your channel! These are very interesting insights into the Python language in small videos which makes them easy to digest. Keep up the good work!

    • @mCoding
      @mCoding  3 years ago

      Awesome, thank you!

  • @aidanwelch4763
    @aidanwelch4763 5 months ago +1

    Thanks for this. I was trying to explain to people in JS that calling a native function is essentially always faster than writing a more optimized (in terms of Big-O) way of doing something. They were asking for Big-O proofs and didn't believe me that it's just not relevant. Looping through a string 10 times with a native regex check is faster than iterating through it once in pure JS.

  • @brianchandler3346
    @brianchandler3346 3 years ago +8

    Expertly presented. I never would have thought to look at where C was used behind the scenes. I will definitely try to make this better-to-worse progression a habit when writing Python.

  • @laughtale1607
    @laughtale1607 3 years ago +36

    That's why Gauss exists: simply use (N-1)N/2 in this case

    • @mCoding
      @mCoding  3 years ago +18

      And he came up with this at such a young age too!

    • @bFix
      @bFix 3 years ago +8

      wait, isn't it (N+1)*N/2?

    • @laughtale1607
      @laughtale1607 3 years ago +7

      @@bFix He takes the sum to N-1 in the example

    • @Samsam-kl2lk
      @Samsam-kl2lk 3 years ago +2

      @@laughtale1607 Yes, so it is n*(n-1)/2

    • @laughtale1607
      @laughtale1607 3 years ago

      @@Samsam-kl2lk yup

  • @supergamerfr
    @supergamerfr 3 years ago +4

    This channel is gonna blow up, great work

  • @reprC
    @reprC 3 years ago +22

    Cool video. I’d be curious how `numba` stacks up in this, both the JIT and AOT variants

    • @youuuuuuuuuuutube
      @youuuuuuuuuuutube 2 years ago

      Numba would make it closer to the unoptimized C/C++ code. I used it to render fractals and I got it to run 50x faster than unoptimized Python, and almost the same speed as the C code. The thing is ... that you can make the loop a lot faster in C too because you can use SIMD intrinsics, so you could do 4 or 8 or even more operations in a single instruction.

    • @tobiasgrun5116
      @tobiasgrun5116 2 years ago

      Just tested: the JIT execution time is similar to the math version. To all Numba / LLVM developers: great job!

  • @minghaoliang4311
    @minghaoliang4311 3 years ago +69

    Can you do a comparison of recursion versus loops?

    • @mCoding
      @mCoding  3 years ago +33

      I have many video ideas in the works, this is definitely a contender.

    • @ikickss
      @ikickss 3 years ago +5

      Usually recursion is slower and takes up more memory. And many "big" problems cannot be coded recursively due to stack size limitations. But if the compiler/interpreter can do tail recursion optimization, it's a very nice option. Unfortunately, not all problems are amenable to it.
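
A sketch of the trade-off in CPython (illustrative names; the recursive version is written in tail-call form to show that CPython still consumes a stack frame per call):

```python
import sys

def sum_to_recursive(n, acc=0):
    # Tail-call form, but CPython performs no tail-call elimination:
    # each call still pushes a new stack frame.
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)

def sum_to_iterative(n):
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total

assert sum_to_recursive(500) == sum_to_iterative(500) == 125250
print(sys.getrecursionlimit())  # ~1000 by default; much deeper recursion raises RecursionError
```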

    • @shukterhousejive
      @shukterhousejive 3 years ago +3

      The CPython interpreter doesn't do tail recursion so even if you write it out correctly it's still faster to loop, save the recursion for when you forget how to convert your code into a loop correctly

    • @sbwlearning1372
      @sbwlearning1372 3 years ago

      @Sec Coder Check out Lisp or Prolog. The recursion is conceptual in terms of the problem or model.
      Imagine a family tree. The definitions and relationships are recursive. Mom and Dad produce offspring each of which in turn can become Mom's or Dads who produce offspring.
      Comes into its own when these relationships are highly complex (imagine every leaf on a real tree was in fact another tree, and so on and so forth; the Mandelbrot set is recursive).
      Essentially they become huge lists or recursive data structures.
      Those languages became prioritized and optimised around lists and recursive data structures.( With a few constructs and language primitives not found in the likes of C or Pascal such as "is a type of "
      "is connected to" "is related to", "is derived from" etc etc )
      Pythons modelling capabilities Tuples Lists etc etc Means most of it can now at least be modelled easily before being translated to something else for speed.
      Ps for fun check out Occam . It's editor was indent sensitive just like Python
      It also let you mix sequential and parallel programming .!!!! Oh lordy 😁

    • @fat_pigeon
      @fat_pigeon 3 ปีที่แล้ว

      @Sec Coder It lets you express common algorithms in a purely functional way, i.e. without needing to mutate variables. Doing so makes your code easier to analyze mathematically and makes it more composable.

  • @dwarftoad
    @dwarftoad ปีที่แล้ว +3

    I think numpy also will generally, if it can, use parallel CPU operations (SIMD) to process the array, which is probably why it's faster than standard sum() (and also why it makes sense to create the array of data with arange() even if each value could be calculated in the loop. But definitely good to be aware of this memory requirement.)

  • @christianhill8651
    @christianhill8651 3 ปีที่แล้ว +2

    You just convinced me to learn algorithms and functions. This was an amazing illustration.

  • @Jcewazhere
    @Jcewazhere 3 ปีที่แล้ว +2

    In my one of my programming classes we learned about loop unrolling. This is much more handy to know about. :)

  • @dddbra1748
    @dddbra1748 3 ปีที่แล้ว +46

    I think your channel deserves more subscribers.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +5

      I appreciate that!

  • @kenhaley4
    @kenhaley4 3 ปีที่แล้ว +20

Another reason the numpy approach is faster: the elements of a numpy array must all be of the same type, and there's no overhead to check (and convert to numeric, if necessary) the type of each array element. This probably contributes much more to the increase in speed than the language difference (C vs. Python).

    • @vishnukumar4531
      @vishnukumar4531 2 ปีที่แล้ว

Other than that, the sum itself can be computed in parallel and fused together later, so we are not forced to iterate serially, element by element!

    • @dmitripogosian5084
      @dmitripogosian5084 2 ปีที่แล้ว

      In compiled languages loops are often optimized on compilation stage, such as being unrolled where necessary, with particular emphasis on efficient cache/prefetch from memory utilization. One should look on original Fortran code of LAPACK routines where some loop optimizations were done 'by hand'

    • @shinobuoshino5066
      @shinobuoshino5066 5 หลายเดือนก่อน

@@dmitripogosian5084 loop unrolling is never efficient when cache and prefetch are the topic, unless you explicitly wrote "from 0 to 4" just to avoid copy-pasting code 4 times yourself, and even that may not be worth unrolling depending on the body of the for loop, which most compilers can decide by themselves if you do a profile-guided optimization pass. There's a reason why -O2 is more performant than -O3 for most software, and unrolling is one culprit of that.

  • @tharfagreinir
    @tharfagreinir ปีที่แล้ว +45

    There's a fun story about that summation formula. Apparently mathematician Carl Friedrich Gauss came up with it at a young age when his teacher set his class the task of adding up all the numbers from 1 to 100 and was very surprised by Gauss' solution. Regardless of the veracity of the story, the formula is indeed often referred to as the Gauss summation formula. Mind you, a lot of things in math are named after Gauss as he was extremely prolific.

    • @magicmulder
      @magicmulder ปีที่แล้ว +14

      You left out one important aspect; the teacher wanted to keep the kids busy for a while and Gauss came up with the solution within minutes.

    • @thomas-sk3cr
      @thomas-sk3cr 7 หลายเดือนก่อน +9

      Another important aspect, the formula is actually (n+1)*n/2.

    • @MattDunlapCO
      @MattDunlapCO 6 หลายเดือนก่อน +1

      @@thomas-sk3cr Exactly. I was looking for this correction.

    • @adrianmizen5070
      @adrianmizen5070 6 หลายเดือนก่อน

      @@thomas-sk3cr which actually illustrates a pitfall of using external math formulas instead of letting the computer do the work, that they are going to be less clear about what the code is supposed to do. getting the wrong answer really fast is worse than getting the right answer slowly

    • @thomas-sk3cr
      @thomas-sk3cr 6 หลายเดือนก่อน +2

      @@adrianmizen5070 not sure if I understand you correctly. I do think that it's definitely better to use a formula if there is one. But it should be properly documented of course and you have to make sure that it's the right one. Maybe we can agree on that? I don't think you should reimplement well-known formulas with loops just because it's more clear to the reader at first glance :)
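Both closed forms in this thread are easy to check against Python's built-in sum. The video iterates over range(n), which stops at n-1, so n*(n-1)//2 matches it, while (n+1)*n//2 matches summing 1 through n inclusive. A minimal check:

```python
# Compare both closed-form triangular sums with the built-in sum.
for n in (0, 1, 2, 10, 1000):
    # The video sums range(n), i.e. 0 + 1 + ... + (n - 1).
    assert sum(range(n)) == n * (n - 1) // 2
    # Summing 1 + 2 + ... + n inclusive uses the (n + 1) form instead.
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("both formulas agree with sum()")
```

So neither formula is "wrong"; they just answer two slightly different questions.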

  • @ahmadhafian3785
    @ahmadhafian3785 2 ปีที่แล้ว +17

    It would be interesting to see how recursive function would perform

    • @danejohnson8657
      @danejohnson8657 2 ปีที่แล้ว +4

      Python doesn't have tail-call optimization, so you probably just overflow the stack.

  • @andreabradpitto
    @andreabradpitto 3 ปีที่แล้ว +6

    This is very informative and well presented. Also, it seems like you are very talented in explaining stuff, but this is only the first video of yours I have watched, so I'll have to do some further benchmarking ;)
    Thank you!

  • @bartkl
    @bartkl 2 ปีที่แล้ว +8

    Nice!
    Question: if you increment i inside the for loop, won't that affect the amount of iterations (skip every other one)?

    • @mCoding
      @mCoding  2 ปีที่แล้ว +7

      It actually does not affect the number of iterations. The iterator of range keeps its own internal counter that python uses to assign to i each iteration.
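A minimal sketch (not from the video) showing that rebinding the loop variable does not skip iterations:

```python
count = 0
for i in range(5):
    i += 10   # rebinds the name i; the range iterator is unaffected
    count += 1
print(count)  # → 5: the body still runs once per value yielded by range(5)
```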

    • @incremental_failure
      @incremental_failure ปีที่แล้ว

      The range is built before executing the loop contents.

  • @TheSandkastenverbot
    @TheSandkastenverbot 2 ปีที่แล้ว +2

Thanks for the information! It's really helpful to get a glimpse into the Python interpreter's inner workings

  • @TARS..
    @TARS.. 3 ปีที่แล้ว

    This channel is a gold mine, thank you so much for these. Keep them up!

    • @mCoding
      @mCoding  3 ปีที่แล้ว

      Thanks, will do!

  • @shingo371
    @shingo371 3 ปีที่แล้ว +72

    Awesome content, got very surprised when I saw your low sub count

    • @mCoding
      @mCoding  3 ปีที่แล้ว +10

      I appreciate that!

    • @anb1142
      @anb1142 3 ปีที่แล้ว +1

      me too

  • @GuRuGeorge03
    @GuRuGeorge03 3 ปีที่แล้ว +66

    Oh this is an amazing recommendation by youtube. I came to the same conclusion as you while having massive performance issues in my machine learning models in non obvious parts of the code. Once I noticed that it is the loops that are awfully slow I tried refactoring them as much as possible and ended up using a mixture of things u've shown here. I didn't even know or speculate that it's because of C but now everything is much clearer in my head. Thanks!

    • @mCoding
      @mCoding  3 ปีที่แล้ว +13

      You should also check out Jax, Numba, and Cython for potentially enormous speedup for little effort if you are doing ML.

    • @robertmielewczyk4219
      @robertmielewczyk4219 3 ปีที่แล้ว +8

      you're not supposed to do ML with for loops in theory you can and it's very undestandable if you do it that way but any solution using matrices/tensorflow will be faster. Like he says in the video numpy is fast

  • @Parthsean
    @Parthsean 3 ปีที่แล้ว +2

This is probably the first coding channel that I looked up just for fun. Great work. Can you cover advanced Pythonic syntax and topics?

    • @mCoding
      @mCoding  3 ปีที่แล้ว

I'm always open to suggestions! What do you want to see?

    • @Parthsean
      @Parthsean 3 ปีที่แล้ว

      @@mCoding Nothing in particular just any fancy pythonic things which help improve performance and code readability.
      Also, one thing you can do is make a mess of a code and start refactoring it. That would be amazing to see.

  • @googleuser4203
    @googleuser4203 3 ปีที่แล้ว

    A lot of important tips! Thank you. Subscribed!

  • @jetison333
    @jetison333 3 ปีที่แล้ว +4

    Very informative video, keep at it and im sure you'll grow a ton.

    • @mCoding
      @mCoding  3 ปีที่แล้ว

      I appreciate that!

  • @gregorymorse8423
    @gregorymorse8423 3 ปีที่แล้ว +3

    Did you run these tests with Cython? Probably bringing everything to C would have interesting results. The async model on the for iterator for example should not be faster than a while loop unless the async code were optimized out.

  • @TranquilSeaOfMath
    @TranquilSeaOfMath 2 ปีที่แล้ว

    Nicely demonstrated and explained.

  • @fisch37
    @fisch37 2 ปีที่แล้ว +1

    Question: How are those rates when you precompile using py_compile to create a .pyc file?

  • @spacelem
    @spacelem 2 ปีที่แล้ว +33

    My own results were 6.06 seconds for the while loop, 4.25 seconds for the for loop, and 569 ms for the numpy loop.
    I tried it in Julia (almost identical code except for the end tags), and the results were 1.3 ns for both the for and the while loop. Checking the lowered code it turned out it had optimised the loop away completely, so we got sum_math for free! So instead to force it to do the actual sum, I modified it to s += i * (rand() < 0.5), and it came in at 576 ms. So even with a call to rand() and a branch, Julia was still nearly 10x faster than raw Python, and was about on par with NumPy (which didn't have a call to rand() ).
If I force Julia to construct the full array with collect(1:n), so that it matches NumPy, then it gets even faster at 290 ms, but now it allocates 763 MiB of memory, so clearly there are pros and cons of using range objects, although I'd stick with the range object for most usages.
    So, if you like Python but you want C performance (and you like to use arrays a lot and don't like to add all the extra numpy syntax just to do that), maybe check out Julia.

    • @incremental_failure
      @incremental_failure ปีที่แล้ว

Most people, like myself, use Python for the libraries; nothing else comes close except maybe C++. It would take a lifetime to recreate all the libraries in Julia.

    • @spacelem
      @spacelem ปีที่แล้ว

@@incremental_failure there has been a lot of ongoing work to make native Julia libraries (mostly in the scientific and statistical communities), but Julia can also call C, C++, and Fortran code, and the PyCall package already allows you to embed Python in Julia, so you're never left in the lurch if you choose to use Julia.
I've already seen Julia be hugely faster and more flexible when doing a lot of ODE modelling than when using R and calling deSolve, which is written in Fortran.

  • @supermariobrosz8
    @supermariobrosz8 3 ปีที่แล้ว +5

    Great video! Another idea is to time yield in functions and generator expressions in comparison to the traditional loops

    • @mCoding
      @mCoding  3 ปีที่แล้ว

      Great suggestion!

    • @ferociousfeind8538
      @ferociousfeind8538 2 ปีที่แล้ว

      yes, graph the size of n against the time it takes different strategies to compute the sum >:3
      make the computer dance for my entertainment!

  • @fillfreakin2245
    @fillfreakin2245 ปีที่แล้ว +1

If you're using a counter within the while loop, then yes, that counter is slower than the for loop. If you don't, and you use the while loop as intended, doing the same stuff inside the loop, then it is essentially the same speed.
def while_loop(n=100_000_000):
    s = 0
    while s < n:
        s += 1
    return s
takes about the same time as:
def for_loop(n=100_000_000):
    s = 0
    for i in range(n):
        s += 1
    return s
Use the right loop for the right task. If you have to put an incremental counter in your while loop then you're using the wrong loop.

    • @ssfjor
      @ssfjor หลายเดือนก่อน

Comparing the sum with the counter limit certainly won't give the correct result in s: we are calculating the sum of all integers and analyzing various ways to do that. I.e., if the limit were 10, with 1, 2, 3, 4 we reach it, way before reaching the correct sum 55.
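Worth noting: both functions in the comment above do s += 1, not s += i, so they count iterations rather than sum them. As timing probes of pure loop overhead they are fine; they just don't compute the video's triangular sum. A small-n sketch of the difference:

```python
def while_count(n=10):
    # Counts up to n, one iteration at a time (shape of the comment above).
    s = 0
    while s < n:
        s += 1
    return s

def for_count(n=10):
    s = 0
    for i in range(n):
        s += 1
    return s

# Both count to n, so they agree with each other...
assert while_count() == for_count() == 10
# ...but neither computes the video's sum of 0..n-1.
assert sum(range(10)) == 45
```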

  • @thelazyrabbit4220
    @thelazyrabbit4220 2 ปีที่แล้ว

    Dayumn... TIL something new as well as some new trick. You definitely earned my sub!

  • @BartMassey-PO8
    @BartMassey-PO8 3 ปีที่แล้ว +8

    Rust with 64-bit numbers looks to be about 50× faster than numpy on my box (once I convinced the compiler to stop solving the problem at compile-time without losing optimization). Rust with BigUint bignums looks to be about 5x faster than numpy. The obvious for loop with pypy3 is about 12x faster than numpy. So there's still plenty of room for speedup.

  • @ColinBroderickMaths
    @ColinBroderickMaths 2 ปีที่แล้ว +5

    Just wanted to note that using an iteration index in a while loop isn't really good practice anyway, since that alone would almost always mean a for loop is the correct pattern. So it's not really surprising that the while loop was slower in this experiment.
    I'd also like to warn people that numpy is only faster if you are operating on large sets or large arrays of numbers. If you apply a numpy function to a single number, it's almost always much slower than the equivalent function in Python, presumably due to some overhead.
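The per-call overhead is easy to observe with timeit. Here is a rough sketch (the exact ratios depend on your machine, so no numbers are claimed):

```python
import math
import timeit

import numpy as np

x = 0.5
# On a single scalar, np.sin pays dispatch/boxing overhead on every call.
t_math = timeit.timeit(lambda: math.sin(x), number=100_000)
t_numpy = timeit.timeit(lambda: np.sin(x), number=100_000)

# On a large array, one vectorized np.sin call amortizes that overhead.
arr = np.linspace(0.0, 1.0, 1_000_000)
t_vector = timeit.timeit(lambda: np.sin(arr), number=10)

print(f"scalar math.sin: {t_math:.4f}s  "
      f"scalar np.sin: {t_numpy:.4f}s  "
      f"array np.sin x10: {t_vector:.4f}s")
```

On most machines the scalar np.sin line is noticeably slower than math.sin, while the array line processes a million elements per call.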

    • @rursus8354
      @rursus8354 ปีที่แล้ว

      That isn't the point. The point is the efficiency of raw Python.

  • @dragonsage6909
    @dragonsage6909 2 ปีที่แล้ว +2

    Wow .. I learned about 5 new things from this.. using timeit is a great idea.. thank you!
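For anyone who hasn't used timeit, the pattern from the video is roughly this (the function name and n here are just illustrative):

```python
import timeit

def for_loop(n=1_000_000):
    # Same shape as the video's benchmark, scaled down for a quick run.
    s = 0
    for i in range(n):
        s += i
    return s

# timeit.timeit takes a zero-argument callable; number=10 runs it 10 times
# and returns the total elapsed time in seconds.
elapsed = timeit.timeit(for_loop, number=10)
print(f"for_loop x10: {elapsed:.3f} s")
```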

  • @MrMonishSoni
    @MrMonishSoni 3 ปีที่แล้ว

    This Video is so awesome, never came across such a beautiful way to understand loop in such depth. Thank you so much for making this video. ❤❤❤💕💕💕

    • @mCoding
      @mCoding  3 ปีที่แล้ว +1

      Glad you enjoyed it!

  • @sergeyshchelkunov5762
    @sergeyshchelkunov5762 3 ปีที่แล้ว +5

Very informative. Excellent insight into Python's inner workings. Great incentive to employ NumPy (and multiprocessing) whenever possible. Gives a simple yet splendid example of why knowing math is important for any programmer who tries to use Python for data-science-type applications.

  • @logicprojectsspeed2023
    @logicprojectsspeed2023 3 ปีที่แล้ว +11

    Great video. Commenting mostly to help the algorithm push this. Would have loved to see a comparison with straight C thrown in there given that C code was the source of the speedup. What's the cost of the python to C transitions?

    • @mCoding
      @mCoding  3 ปีที่แล้ว +5

      A C compiler would probably compute the answer at compile time so there is nothing to compare! A Python to C transition would be a huge undertaking even for a small project.

    • @oj0024
      @oj0024 3 ปีที่แล้ว +20

      Here are the results on my pc:
      python with for loop: 8.74s
      c with gcc and no compiler optimizations: 0.235734s
      c with gcc and full compiler optimizations: 0.048770s (gcc doesn't detect the pattern and actually executes the entire code)
      c with clang and full compiler optimizations: 0.000001s (clang detects the pattern and optimizes it using the Gaussian formula)

    • @alejandromorales8784
      @alejandromorales8784 2 ปีที่แล้ว

      @@oj0024 wow, there is absolutely nothing to compare. I knew python was slow and C was fast, but I didn't know that the difference is to this extent

    • @tonyflury2291
      @tonyflury2291 ปีที่แล้ว

      @@mCoding Nuitka will take Python code and transpile to C (which uses the Python standard Library to do the object iterations) - on some applications it is 2x faster than normal Python (and it is a work in progress).

  • @maxch.9135
    @maxch.9135 2 ปีที่แล้ว

    you are the only programmer that i pressed liked, subscribed and the bell icon :)

  • @AeryelleCat
    @AeryelleCat 2 ปีที่แล้ว +1

    That is ridiculously clear, it blew me away. Straight to the point, extremely clear explanation, very methodical, and everything is shown perfectly. Just purely amazing tbh.

  • @TheParkitny
    @TheParkitny 3 ปีที่แล้ว +3

    There is another option. Often you can use the numba library to get a speed boost over numpy.

  • @micirei
    @micirei 3 ปีที่แล้ว +10

    Your channel is great! I suggest you use a high pass filter for your voice to cut off the low frequencies that make your voice sound muddy, great job so far.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +5

      Thanks for the tip! I have no idea what I'm doing video editing so tips like this can really help improve my video quality!

  • @bva0
    @bva0 3 ปีที่แล้ว +2

    Cool video. I would add that numba and cython are also super fast alternatives.

  • @SanixDarker
    @SanixDarker 2 ปีที่แล้ว

    Good content !
    Liked and subscribed !

  • @edwinjonah
    @edwinjonah 3 ปีที่แล้ว +5

    Awesome video! A few days ago I was comparing the execution time on using a loop vs. the Gaussian formula, it's nice to see some other options in between.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +1

      Thanks for the support!

    • @lb2040
      @lb2040 2 ปีที่แล้ว +1

      @@mCoding Very interesting video indeed, something I never really thought about … but the math-guy in me needs to say it: You got the formula wrong… 😬

  • @fanoflinoa6109
    @fanoflinoa6109 2 ปีที่แล้ว +14

    Great comparative video
The math solution if you include n in the addition is ((n+1)*n)/2.
Because you used range() and the while loop was set to < and not <=, n itself was never included.

    • @aioia3885
      @aioia3885 2 ปีที่แล้ว +1

      Flooring the division is very much recommended, because otherwise python will convert the result to a floating point number which may cause you to lose precision

    • @fanoflinoa6109
      @fanoflinoa6109 9 หลายเดือนก่อน

      @kurogami2771 it does in this case as we are adding up a series of numbers from 0 to n, which by definition excludes negative values.
      As for flooring, it is true that we should avoid using floats when not needed even for just one operation so we can either floor or int() the results.

    • @abebuckingham8198
      @abebuckingham8198 หลายเดือนก่อน

      @@aioia3885 Either n or n-1 is even so it's never a float. Using floor isn't required.

    • @aioia3885
      @aioia3885 หลายเดือนก่อน

      @@abebuckingham8198
edit: "either n or n-1 is even so it's never a float" is not correct, because in Python division with / will always result in float conversion (see type(1/1)). The rest of this reply is just trying to clarify what I now realize was not necessary to clarify.
In my comment I said "flooring the division is very much recommended", but what I really meant was not using the floor operator on the result but instead using the integer division operator. In Python, if you do a/b the result will always be a float, but if you do a//b then the result will be an integer if both a and b are as well. Doing a//b is basically like doing int(a/b) except that there will be no conversion to a float and so no precision will be lost. For example, int((10**18+2)/2) evaluates to 5e17, because the division was a floating-point one, but (10**18+2)//2 does result in the expected value of 5*10**17+1.
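The precision point above is easy to demonstrate; the value below is chosen to exceed the 53 bits of mantissa a Python float (an IEEE 754 double) carries:

```python
big = 10**18 + 2          # too large for a float to represent exactly

exact = big // 2          # integer division: no float is ever created
rounded = int(big / 2)    # big is rounded to the nearest float first

assert exact == 500_000_000_000_000_001
assert rounded == 500_000_000_000_000_000  # the +2 was lost to rounding
```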

  • @daveys
    @daveys 5 หลายเดือนก่อน

    I’m a Python novice and I found this to be really interesting and useful. Many thanks!

  • @Sorestlor
    @Sorestlor 2 ปีที่แล้ว +1

One can also use numba if applicable, which oddly can speed up numpy code in some cases. I believe it's because it bypasses the intermediary Python steps.

  • @mrtnsnp
    @mrtnsnp 3 ปีที่แล้ว +18

    The numba package may be useful in some cases.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +3

Absolutely, if you can use Numba to @jit your problem, then it will very likely speed up the solution! I'll probably cover this in a future video.

    • @mrtnsnp
      @mrtnsnp 3 ปีที่แล้ว +4

@@mCoding I did a small test.
from numba import njit

@njit
def int_sum(n=1_000_000_000):
    s: int = 0
    for i in range(n):
        s += i
    return s

timeit.timeit(int_sum, number=1)
5.373000021791086e-06
timeit.timeit(int_sum_p, number=1)
44.33714248199999
int_sum_p is the same function, without the decorator. The result is very close to C speed. But obviously, the best improvement comes from doing the maths properly.

    • @dmccallie
      @dmccallie 3 ปีที่แล้ว

      Numba is amazing, if you can limit the code in the @jit’ed routine to simple loops and math and/or numpy operations. You can’t use things like lists or dicts

    • @gregorymorse8423
      @gregorymorse8423 3 ปีที่แล้ว +1

      Should do all the examples with numba this is also interesting. Cython also. Masterclass in squeezing loop performance, could also show sum(x for x in range(n)). Ideally use matplotlib to show the data for different sum sizes and prove the obvious that it is linear

    • @mrtnsnp
      @mrtnsnp 3 ปีที่แล้ว +1

      @@gregorymorse8423 When trying to make a comparison with compiled C++ code, I discovered that clang removes the loop altogether, applying the maths that the programmer should have done in the first place. And I suspect that numba applies the same optimisation, otherwise the total time does not make sense with the clockspeed I have.

  • @zf0666
    @zf0666 3 ปีที่แล้ว +3

    Good example of how being good at math and algorithms beats all micro optimizations of the code itself :)

    • @mCoding
      @mCoding  3 ปีที่แล้ว +3

      Exactly!

  • @maxteer2800
    @maxteer2800 3 ปีที่แล้ว

    Great stuff! I never considered that while loops would be slower than for loops, but even as you started I was guessing it would!

    • @mCoding
      @mCoding  3 ปีที่แล้ว

      Glad you liked it!

  • @smellyD5655
    @smellyD5655 2 ปีที่แล้ว

    Does it get any faster if you right shift 1 bit instead of using / division, or does the optimiser do that for us when we use a literal 2?
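As a partial answer (a sketch, not benchmarked here since timings are machine-dependent): for non-negative integers, x >> 1 and x // 2 give the same value, and either avoids the float conversion that / would introduce; any speed difference between the two is tiny next to interpreter overhead.

```python
n = 100_000_000
# Same result either way for non-negative integers...
assert (n * (n - 1)) >> 1 == n * (n - 1) // 2 == 4999999950000000
# ...whereas true division / produces a float, which loses exactness
# for integers beyond 2**53.
assert isinstance(n * (n - 1) / 2, float)
```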

  • @emilemil1
    @emilemil1 2 ปีที่แล้ว +4

    It would be a good idea to show examples that does a bit more work within the loop, to show that if your loop body is costly then how you loop is largely insignificant.

    • @K9Megahertz
      @K9Megahertz 2 ปีที่แล้ว

      Yeap, I was thinking the same thing. These tests really don't tell you anything meaningful.

  • @Generlc_Human
    @Generlc_Human 3 ปีที่แล้ว +9

    when you dont know how to code but you know english: "i know some of these words"

  • @sebastienollquist1318
    @sebastienollquist1318 2 ปีที่แล้ว

    I was so much expecting this formula to come up at the end :p

  • @HombrexGSP
    @HombrexGSP 3 ปีที่แล้ว +1

    Hey! I'm just wondering if you can tell me the name of the font that you're using in the editor, I think it looks really great. Thanks! Have a nice one.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +1

      It's called JetBrains Mono, and it is the default one in PyCharm. I also recommend Consolas!

  • @KonradKeck
    @KonradKeck 2 ปีที่แล้ว +4

    Thanks for that! I had no idea about the ratios. Two ideas though:
    I'm doing statistics and ETL jobs with pandas. I would've been interested in other methods as well like recursion or doing the same with pandas instead of numpy (whether it's any slower than pure numpy), pandas + apply + lambda, etc.
    Second, going one step further and optimising performance for if-elif-else / mapping function would be great for another video.

  • @OMGclueless
    @OMGclueless 3 ปีที่แล้ว +31

    It's also worth taking a second to remember that even in its slowest form, the python code is adding up 100 million numbers in 20 seconds. It's slow, but "slow" is a relative term.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +6

      This is a good point that everyone should keep in mind!

    • @carmelo5991
      @carmelo5991 7 หลายเดือนก่อน

It's slower than other languages but still fast enough, because computers are very fast

    • @edwinontiveros8701
      @edwinontiveros8701 6 หลายเดือนก่อน

On any CPU capable of many teraflops, 20 seconds for 100 million numbers means eons. It's like walking and taking a step only every 5 seconds: for short distances it's not that bad, but it scales badly once you need to travel farther.

  • @tamoregi
    @tamoregi 7 หลายเดือนก่อน

If I recall correctly there was an optimisation in LLVM/Clang that was able to detect looping over elements in order to add them and inject n(n-1)/2 directly into the generated assembly. :)

  • @lakshyachopra_
    @lakshyachopra_ 2 ปีที่แล้ว +1

    This video was very informative. Thanks a lot James!

    • @mCoding
      @mCoding  2 ปีที่แล้ว +1

      Glad it was helpful! You are very welcome!

  • @MrGryph78
    @MrGryph78 3 ปีที่แล้ว +7

    Pretty certain the numpy functions are built using SSE/AVX intrinsics, hence their massive speed up compared to scalar only methods.

  • @JensRoland
    @JensRoland 3 ปีที่แล้ว +3

    No loop unrolling variant? Nice roundup of methods, and nice video in general!

    • @mCoding
      @mCoding  3 ปีที่แล้ว +1

      You're right, I totally forgot about some "optimizations" one could try. Maybe a later video. Thanks for the support!

    • @superscatboy
      @superscatboy 3 ปีที่แล้ว +3

      Unroll to 100,000,000 lines of additions, then reduce them to a single value and have the function just return that value. Boom, orders of magnitude faster!

    • @n00blamer
      @n00blamer 3 ปีที่แล้ว +1

@@mattmurphy7030 Unrolling is more about reducing dependencies than iteration overhead; the multiple accumulators are summed in the end. The branch predictor doesn't play much of a role here, as this loop is fully predictable: only the last iteration flushes the pipeline after the loop gets up to speed. Data dependencies can potentially stall the ALU, but if the CPU has a really short pipeline it won't have any effect. This is highly dependent on the microarchitecture, so arguing one way or another is a bit pointless unless we specify the hardware.

  • @verycrazyguy1
    @verycrazyguy1 3 ปีที่แล้ว +1

I'd love to see a discussion including a JIT, which is also a parallel, C-based implementation of for loops.

  • @ButtKraken01
    @ButtKraken01 2 ปีที่แล้ว

    This was a great explanation and earned you my subscription!

  • @yolamontalvan9502
    @yolamontalvan9502 3 ปีที่แล้ว +60

    That’s why I would use C to program a missile.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +48

      Watch out for those null pointers though, wouldn't want your missile program to CRASH!

    • @bamberghh1691
      @bamberghh1691 3 ปีที่แล้ว +9

      Usually Ada or SPARK are used in high reliability systems like aircraft instead of c since they are much safer

    • @mCoding
      @mCoding  3 ปีที่แล้ว +7

I had heard about this before, thanks for sharing, it makes a lot of sense in safety-critical applications. I know NASA still uses C for (at least some?) space shuttles!

    • @mwanikimwaniki6801
      @mwanikimwaniki6801 3 ปีที่แล้ว +1

@@bamberghh1691 I knew Ada is used for missiles... but is there a specific reason why?

    • @arpandhatt6011
      @arpandhatt6011 3 ปีที่แล้ว +9

      @@mwanikimwaniki6801 Ada has been specifically designed to catch as many mistakes at compile time. It’s static analysis tools are extremely thorough, and in general, it’s designed to prevent undefined behavior, from its type system, to its syntax. It’s an interesting language.

  • @nvmcomrade
    @nvmcomrade 3 ปีที่แล้ว +13

Halfway through the video I thought to myself, why even loop to find the answer to a simple arithmetic progression, and then I was surprised that this was addressed as well. Wow, impressive.

    • @mCoding
      @mCoding  3 ปีที่แล้ว +1

      Thanks!

    • @lepepito
      @lepepito 3 ปีที่แล้ว

It serves the purpose of explaining the problem with loops and also the process of finding different solutions to a problem.
It's a nice example

  • @user-hk3ej4hk7m
    @user-hk3ej4hk7m 3 ปีที่แล้ว

I'm quite satisfied seeing that the fastest way to do stuff is using a declarative style; it's probably the safest option too, since you eliminate off-by-one errors.

  • @g.4279
    @g.4279 ปีที่แล้ว

    Look up tables are also a great way of trading computation time for some additional program size.

  • @andreiiaz2097
    @andreiiaz2097 3 ปีที่แล้ว +46

    Python is kinda like the doctor:
    It's the last place you want to be, but you go when you need it, because otherwise things might get more complicated

    • @akshaybodla163
      @akshaybodla163 3 ปีที่แล้ว +8

I personally like to start with Python. Its vast libraries and simple structure make it easy for me to test programs. Later, when I decide on a specific implementation, I'll code that in C++.

    • @andreiiaz2097
      @andreiiaz2097 3 ปีที่แล้ว +1

      @@akshaybodla163 Yeah, I can see that, it's nice, but how do you transfer a python library to C++?

    • @antonliakhovitch8306
      @antonliakhovitch8306 3 ปีที่แล้ว +1

      @@andreiiaz2097 A lot of C/C++ libraries have Python bindings so you can just use the same ones a lot of the time.

    • @maythesciencebewithyou
      @maythesciencebewithyou 3 ปีที่แล้ว +2

In reality, Python is the first thing most people will use because it is easy to use.
Python is only useful because it works as a glue for other languages: easy access to libraries written in more difficult languages.

    • @jaredgreathouse3672
      @jaredgreathouse3672 3 ปีที่แล้ว

      Such a fact. I do statistics research using Stata. My rule is Stata where I can, Python where I must.

  • @PKua007
    @PKua007 3 ปีที่แล้ว +4

Actually any modern C or C++ compiler with optimization turned on (-O3 or similar) would itself compute the value of the sum at compile time and just inline the result. I am shocked that the Python interpreter does not even try to, for example, throw out unused instructions during compilation. There must be a way to enable this. Or maybe these instructions actually have side effects because of its weird namespace policy and cannot be thrown away without potentially spoiling the code. I am not sure...

    • @shinobuoshino5066
      @shinobuoshino5066 5 หลายเดือนก่อน

Both C and C++ compilers such as GCC and LLVM are written in C++.
Python is written in C.
JavaScript is Python's main competitor as an easy-to-use scripting language... and its V8 engine is written in C++, and its JIT is incredibly fast. The only thing that can compare is LuaJIT, which is practically abandonware because the one dev who made it stopped caring and no one else is intelligent enough to pick it up and maintain it properly. Still, the difference is that LuaJIT is very fast... but it can't stream code and execute it in my browser as fast as possible, and most projects don't even use Lua for one reason or another.
Just shows that C is an obsolete language and all good programmers use C++.
There will of course still be C zealots that pretend otherwise, but they can't even write their own compiler and have to depend on the C++ programmers they hate more than they hate C++, for some reason.

  • @GNew0
    @GNew0 ปีที่แล้ว

Is it possible to use numpy and clean up the memory so that you can work with large numbers? Idk if it even makes sense though, I'm really a noob at coding yet

  • @tonyflury2291
    @tonyflury2291 ปีที่แล้ว

The sum(...) builtin is also written in C, but is slower than numpy because the sum(...) builtin also has to deal with non-numerics (you can use sum(...) on a list of lists or Fractions, for example); numpy is only ever dealing with numbers, so it doesn't have to deal with the type-dispatching layer.

  • @WhiteThunder121
    @WhiteThunder121 3 years ago +5

    "Is there anything to beat numpy?"
    Try compiling the numpy code to native machine code using numba:
    import numba as nb
    import numpy as np
    @nb.njit(cache=True, parallel=True)
    def numba_sum():
        return np.sum(np.arange(100_000_000))
    > 0.00064 seconds
    Or using your GPU (CuPy):
    import cupy as cp
    def cupy_sum():
        return cp.sum(cp.arange(100_000_000))
    > 0.001747 seconds
    Cupy should be an order of magnitude faster than your CPU, but only if you do some more complex computation than calculating the sum of the array.

    • @mCoding
      @mCoding  3 years ago +1

      Good point! Maybe some day I'll do a comparison of which C library for Python is the fastest... so many choices though.

    • @WhiteThunder121
      @WhiteThunder121 3 years ago

      @@mCoding Would be interesting to watch.
      Good video btw. I did not expect the "i = i+1" to have such a huge impact when iterating.

  • @SillyLittleMe
    @SillyLittleMe 1 year ago +12

    If you are writing pure and iterative heavy Python code, then PyPy is your best bet for performance. I ran the same test as mentioned in the video on Python 3.10 and PyPy 3.9 and my god the difference was staggering! Here are my results :
    Python 3.10:
    for loop : 6.6 secs
    while loop : 11 secs
    PyPy3.9:
    for loop : 0.22 secs
    while loop : 0.36 secs
    PyPy is really meant for long running code and real world results show that.
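
For anyone wanting to reproduce these numbers, a minimal version of the benchmark (assumed to match the video's setup; function names are illustrative, and n is shrunk here so it finishes quickly even on CPython):

```python
import timeit

N = 1_000_000  # the video uses 100_000_000; smaller here for a quick run

def for_loop(n=N):
    # Sum 0..n-1 with a for loop over range.
    s = 0
    for i in range(n):
        s += i
    return s

def while_loop(n=N):
    # Same sum, but with manual index bookkeeping.
    s = 0
    i = 0
    while i < n:
        s += i
        i += 1
    return s

for func in (for_loop, while_loop):
    elapsed = timeit.timeit(func, number=1)
    print(f"{func.__name__}: {elapsed:.3f}s")
```

Running the same file under both interpreters (python3 vs pypy3) shows the gap the commenter describes.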

  • @novakboskov9090
    @novakboskov9090 1 year ago

    Do you think that numpy's arange realizing the sequence (keeping it in memory before the summation) is exactly the reason why it is faster than sum? Is writing one integer to memory a million times at arbitrary locations as fast as writing one large contiguous chunk?

  • @mahdiashrafmahir1534
    @mahdiashrafmahir1534 1 year ago

    Thanks for the video, it was insightful

  • @VivekYadav-ds8oz
    @VivekYadav-ds8oz 3 years ago +5

    This seems insane to me that summation took more than a second! What is the record for the same in raw C/C++?

    • @brandonkirincich8181
      @brandonkirincich8181 3 years ago +2

      It would definitely be much faster. Especially since this is a special optimization case. It might not even generate a loop.

    • @mCoding
      @mCoding  3 years ago +1

      100_000_000 is a big number! Agreed, C/C++ would likely optimize it away.

    • @toebs_
      @toebs_ 3 years ago +2

      @@mCoding I think you should be more clear regarding the difference between language and implementation. It was definitely an interesting video, but this is not a problem inherent to the language itself, but rather a problem of (probably?) CPython. Also, C doesn’t optimize anything, gcc does. At least if you don’t tell it otherwise.

    • @xrafter
      @xrafter 3 years ago

      @@toebs_
      Oh yes, you need to enable the optimization flag -On, with n being the optimization level.

  • @waterdrinkert
    @waterdrinkert 3 years ago +2

    Did you check if the functions return the right answer? When I ran them the numpy functions gave the wrong answer.
    Works fine with n=10_000 but as soon as you add more digits it fails.

    • @mCoding
      @mCoding  3 years ago +1

      There's a missing dtype=numpy.int64 which is fixed in the git repo now.
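
The overflow is easy to check against Gauss' closed form. This sketch forces 32-bit elements to reproduce the bug seen on platforms where numpy's default integer is 32-bit (e.g. Windows); n is scaled down from the video's 100_000_000 to keep memory use small, but still large enough to overflow int32:

```python
import numpy as np

n = 100_000                   # sum 0..n-1 = 4_999_950_000 > 2**31 - 1
expected = n * (n - 1) // 2   # Gauss' formula for the true answer

# Forcing a 32-bit dtype and accumulator wraps around and gives a wrong answer:
wrong = int(np.arange(n, dtype=np.int32).sum(dtype=np.int32))
# Explicit 64-bit integers (the fix in the repo) are correct on any platform:
right = int(np.sum(np.arange(n, dtype=np.int64)))

print(wrong == expected, right == expected)  # False True
```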

    • @waterdrinkert
      @waterdrinkert 3 years ago

      ​@@mCoding With this addition to the code, the numpy loop takes almost twice as much time to run. It is still lightning fast compared to the rest.

    • @CFSworks
      @CFSworks 3 years ago

      @@waterdrinkert Have you tried the 64-bit Python executable? 32-bit binaries can't access the full 64-bit width of the processor registers and have to perform 64-bit sums as a pair of 32-bit sums instead.

    • @waterdrinkert
      @waterdrinkert 3 years ago

      @@CFSworks I am using the 64-bit version of Python. I believe the issue was that C uses static types. Static types are also what makes C so fast and memory efficient. They are not a bad thing, you just have to pay attention to them.

    • @CFSworks
      @CFSworks 3 years ago

      @@waterdrinkert Sorry, I should have clarified that my reply was to your second comment, not the first.
      If you're getting a significant slowdown even with native 64-bit operations, it could possibly be taking twice as much time to run because the bottleneck is the arange operation, and switching from 32-bit to 64-bit integers requires writing twice as many bytes to memory before summation can begin.

  • @FallinPython
    @FallinPython 3 years ago +2

    Thanks for the tips!
    The comparison with NumPy is not fair because it is made for array computing and holds a single data type per array.
    By the way, you deserve far more subscribers.

    • @mCoding
      @mCoding  3 years ago +1

      Thanks!

  • @edwardwong654
    @edwardwong654 1 year ago

    Quite interesting. I will probably never need to know this but if I get a chance to show off, I will find a way, haha. Thanks and very interesting.