The Genius Way Computers Multiply Big Numbers

PurpleMind

มุมมอง 323 159

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 1 ก.พ. 2025

ความคิดเห็น • 754

@PurpleMindCS หลายเดือนก่อน ⁺⁴⁵
To try everything Brilliant has to offer-free-for a full 30 days, visit brilliant.org/PurpleMind/ . You’ll also get 20% off an annual premium subscription.
@qqqqqq-ox4ck 29 วันที่ผ่านมา ⁺¹
NOOOOOOOOOOOOOO!!!!!!!!!
@qqqqqq-ox4ck 29 วันที่ผ่านมา ⁺²
STOP!!
@le-pas-cuit44719 29 วันที่ผ่านมา ⁺¹
How to have the offer free for 30 days?
@platedpen 22 วันที่ผ่านมา ⁺¹
loved this video
@enormousearl8838 29 วันที่ผ่านมา ⁺⁷³¹
Cunningham's Law: "The best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer." Kolmogorov demonstrated Cunningham's Law before the internet existed lol.
@PurpleMindCS 29 วันที่ผ่านมา ⁺⁶³
Lol
@edwardmacnab354 29 วันที่ผ่านมา ⁺¹⁹
No , the best way to get a flood of unwanted email ads is to post an answer on Reddit .
@mustgreetor 29 วันที่ผ่านมา ⁺²⁵
@@edwardmacnab354 Cunningham's Law in action right here
@lexigan6896 29 วันที่ผ่านมา ⁺⁵
@edwardmacnab354 wait what?? could you explain more
@BillAnt 26 วันที่ผ่านมา
Well my meth mind misread it as Cumminginher Law.... my bad. lol jk
@Qfeys หลายเดือนก่อน ⁺³³⁴
On the graph, you can see that the bend for the python code curve starts at 2^6, which is incidentally the same as 64. This is because on your computer - assuming you use a 64 bit computer - there is a chunk of silicon that has the multiplication algorithm burnt into it. Burning it in the silicon has the advantage that it can do all the additions in parallel, so the time is no longer dependent on the number of simple operations.
When python has higher numbers and needs to use Karatsuba algorithm, it only breaks down numbers until they are 64 bit, and then sends that to the cpu, as the cpu always does its multiplication at the same speed, no matter how many digits you need.
@thomquiri9860 29 วันที่ผ่านมา ⁺⁶
I'm dumb af, I'm not yet ready to instinctively assume that 64 "digits" can be binary digits lmao, good catch! But then he's wrong when he says it's just the basic algorithm being applicated here, it just uses an hardware instruction, and any "big number" uses karatsuba's algorithm anyway
@thomquiri9860 29 วันที่ผ่านมา ⁺⁶
so after thinking more about it, there's no way python uses a simple mul instruction because that wouldn't handle overflow, but maybe it makes multiple 32 bits multiplications in a 64 bits register so that it can't overflow?
@kellan5431 29 วันที่ผ่านมา ⁺²⁵
@thomquiri9860It probably depends on the architecture, but x86_64 does have a 64-bit multiplication instruction. It stores the full result in 2 registers.
@christianbarnay2499 29 วันที่ผ่านมา ⁺¹⁷
I'm surprised he didn't mention it since it corresponds to the Word model and "doing some basic arithmetic on single digit numbers" that he mentioned in the introduction at 1:30.
As you described it, the 64-bit CPU architecture has physical components that ensure constant time operation on 64-bit numbers. So in this architecture, 64-bit numbers (which we call a word) are the actual "single digit" numbers that cost exactly one operation. And this graph is a direct confirmation that the 64-bit version of Python performs as it should by delegating all 64-bit operations to the CPU and cutting bigger numbers into 64-bit blocks.
@asandax6 29 วันที่ผ่านมา ⁺⁷
If you are going to go down this rabbit hole you might as well mention SIMD which can do multiple 64bit or less operations per instruction and can also do multiplication and addition in one instruction.
@EdgarRoock หลายเดือนก่อน ⁺⁶⁰⁵
8:18 Speaker: I'd encourage you to pause the video and see if you can figure that out.
Me: No, no, please go on.
@DavidLindes หลายเดือนก่อน ⁺¹
mood!
@louisrobitaille5810 หลายเดือนก่อน ⁺¹²
8:22 I didn't need to pause the video. a*c and b*d can be calculated first, then stored in memory and used for the "middle step", saving computation time. The proof of turning 100(a*c + b*d) into 100[ (a+b)*(c+d) - a*c - b*d] isn't nearly as obvious though, unless you know the mathematical trick for it I think 🤔. I haven't tried to do it yet 🤷‍♂️.
@projekcja หลายเดือนก่อน ⁺²⁰
I actually paused the video, figured it out, and continued the video only to hear him suggesting I would pause the video.
@DavidLindes หลายเดือนก่อน ⁺¹
@@projekcja Been there! I didn't with this one, but I've totally done that. :)
@klausgrnbk6862 หลายเดือนก่อน ⁺⁴
@@louisrobitaille5810 yeah, It felt counter intuitive to add b and c to the terms, and then subtract the products of the addition again. That to me is the genius, it was a flashback to wonderfully simple math learned 30 years ago ;)
@vlc-cosplayer หลายเดือนก่อน ⁺¹⁸⁸
Saying "it's impossible to improve on this" is the cheat code to get people to find a better solution, just so they can prove you wrong.
Reminds me of posting a question and then giving yourself a wrong answer on purpose, because people would rather prove someone wrong than help someone else 😆
@SheelByTorn 29 วันที่ผ่านมา ⁺⁵
that's how conjectures work tho
@altrag 28 วันที่ผ่านมา ⁺⁶
@@SheelByTorn I'd strengthen that statement: it's not just how conjectures work, it's their entire purpose for existing - a challenge to the world to either prove or disprove a hypothesis.
@romannasuti25 26 วันที่ผ่านมา ⁺³
Just as clarification, there are also actual algorithmic hard limits: comparison sorts, for example, can literally never outperform n log n, proven formally. When you see sorts outperform n log n, they’re exploiting absolute ordering.
@vlc-cosplayer 26 วันที่ผ่านมา ⁺¹
@@romannasuti25 if you definitively prove that something is impossible, then it's even better, since no one can improve on your result. 👀
@jwpogue หลายเดือนก่อน ⁺²³⁰
This video is ridiculously well made! holy crap! The animations and scripts are beautiful and incredible!
@PurpleMindCS หลายเดือนก่อน ⁺²⁵
Thanks so much! Glad you enjoyed.
@friedrichmyers หลายเดือนก่อน ⁺⁵
It probably uses panim
@David_Lloyd-Jones หลายเดือนก่อน ⁺³
I know your type. I bet you're one of those pancake-eaters the machine seems to like so much.
@JordanMetroidManiac หลายเดือนก่อน ⁺⁹
It looks similar, if not identical, to 3Blue1Brown’s open source animation tool.
@friedrichmyers หลายเดือนก่อน ⁺¹
@@JordanMetroidManiac It is VERY similar.
@Starwort หลายเดือนก่อน ⁺⁶⁵⁵
FYI Python integers have a `.bit_length()` method you could have used instead of converting them to a string and taking their length
@PurpleMindCS หลายเดือนก่อน ⁺²⁴⁹
Ah! I did not realize that. Was looking for something like this. Thanks :)
@YatharthRai หลายเดือนก่อน ⁺⁸
Thanks I'll use this
@mohamedahmed-rk5ez หลายเดือนก่อน ⁺⁴⁰
However this .bit_length() method is really good but it gives you 1+floor(log(n)) and couldn’t give you rational number
It will be good for discrete cases
@0xDEAD_Inside หลายเดือนก่อน ⁺²
Damn! Nice!
@sdspivey หลายเดือนก่อน ⁺¹¹
@@mohamedahmed-rk5ez Why? There is an exact number of bits. Even if it rounded up to the next whole byte, it would still be an integer, ie. rational. Seems like python would be faster just to count the number of bytes or whatever int-class is used.
Right off, I could just take the max_index of the array that holds the number, multiply that by the int_size, then subtract off each zero bit at the top. Always an integer answer, no slow FP functions needed. Am I smarter than the python programmers? Probably not.
@esra_erimez หลายเดือนก่อน ⁺⁹⁶
In Knuth's "The Art of computer Programming" Volume 2 has a Chapter 4.3.3. "How Fast Can We Multiply?". It is a very interesting read.
@drdilyor หลายเดือนก่อน ⁺¹²⁰
btw, the schonhage-streissen algorithm uses FFT / NTT which is another genius very practical O(n log n) algorithm for multiplying polynomials, but when applying it to number multiplication, we run into precision issues or the NTT modulo becomes too small beyond some point, thats why there is an extra log log n factor.
@chrisalex82 29 วันที่ผ่านมา ⁺⁶
Based pfp btw
@asandax6 29 วันที่ผ่านมา ⁺³
Most of Physics and Mathematics is just Precision Problems. Equations tend to work until they don't.
@Polyamathematics 29 วันที่ผ่านมา ⁺²⁴
This video is exceptionally well made and well paced. I got to the end of the video without realising 20 mins had passed. Amazing work!
@PurpleMindCS 29 วันที่ผ่านมา ⁺⁶
Thanks so much! Everyone go check out Polyamathematics by the way -- fantastic channel that you'd definitely really enjoy if you like my content.
@leisti 28 วันที่ผ่านมา ⁺²
I agree. I subscribed immediately.
@kaustubhpandey1395 หลายเดือนก่อน ⁺⁸³
Nice of you to Grant exposure to a small channel like 3Blue1Brown❤
@kaustubhpandey1395 หลายเดือนก่อน ⁺¹¹
P.S. this is a joke
@anon_y_mousse หลายเดือนก่อน ⁺⁵
@@kaustubhpandey1395 I think what you meant to say was that you *intended* for it to be a joke, but jokes have to be funny.
@denki2558 หลายเดือนก่อน ⁺¹⁷
@@anon_y_mousse being funny is an extrinsic property (a relationship between an audience and the joke), not an intrinsic one. So, you can't define what is a joke or not based on how funny it is to you.
@anon_y_mousse หลายเดือนก่อน ⁺³
@@denki2558 There's enough universality that you often can, and let's face it, the majority of dad "jokes" are universally reviled.
@denki2558 หลายเดือนก่อน ⁺¹⁰
@@anon_y_mousse Jokes can simply be unfunny to a demographic. The universality claim can also be easily disproven. For any arbitrary joke, there's at least one person who thinks it is funny- the author if it.
@zacharyzadams หลายเดือนก่อน ⁺³⁷
13:12 If you want a concrete interpretation of why O(n^log2(3)) shows up, look into the Master Theorem. Basically, for any recursive divide-and-conquer algorithm that's leaf-heavy, all you need to find the complexity class is to know the number of problems it splits into at each level and the new problem size. So every recursion of Karatsuba calls Karatsuba 3 times, each with a problem size of n/2. That's where the 2 and 3 come from, and that's also why naive multiplication which calls itself 4 times becomes O(n^log2(4))=O(n^2).
@gregorymorse8423 หลายเดือนก่อน ⁺⁴
A simple divide and conquer algorithm has a log(n) tree with n leafs. Karatsuba has the lo-hi vs hi-lo mul making it not a simple case. The Master Theorem formally proves it but it was already obvious
@PurpleMindCS 29 วันที่ผ่านมา ⁺⁶⁷
Hi everyone! A lot of you are pointing a particular spot in the video (namely around 16:00) where I'll admit I was a bit unclear with my explanation of what's going on, so let me clarify a few things here:
First of all, on the horizontal axis, n is the number of **base 10** digits, not the number of bits for the numbers being multiplied. Secondly, the "version of the naive O(n^2) algorithm" I referred to is, on most modern computers (for up to 64-bit numbers), implemented in parallel on a CPU. That takes us up to 19 base 10 digits, which is roughly 2^4.24. On the graph, this is the section from log_2 n = 0 to log_2 n = 4.24, and there you really do see an almost perfect horizontal line, which makes sense if multiplications of this size are done during just a few clock cycles in hardware.
Secondly, I pointed to n = 2^7 = 128 **base 10** digits (up to 426 bits) for the part where the slope changes, indicating a switch to Karatsuba's algorithm at that point. After reading through the source code: github.com/python/cpython/blob/main/Objects/longobject.c, it appears that the exact cutoff happens at KARATSUBA_CUTOFF = 70 **base 2^30** digits on 64-bit machines, or 70 **base 2^15** digits on 32-bit machines. My computer is a 64-bit machine and (2^30)^70 is a 64 digit **base 10** number, so I think the correct place to draw the dotted red line on the graph in the video was at log_2 n = 6 instead of 7. Coincidentally, 64 just happens to be a power of 2 which is probably what was causing some of the confusion. But between roughly 4.24 and 6 on the graph, Python really is using the O(n^2) "schoolbook algorithm," as referred to in the source code linked above, because the asymptotic performance of Karatsuba's algorithm isn't kicking in yet for numbers that small due large amounts of overhead. However the "schoolbook algorithm" is also implemented with base 2^30 (for 64-bit) or base 2^15 (for 32-bit) numbers, so the video I have on screen with the base 10 representation is just to give you an approximate picture of what's going on behind the scenes, since base 2^30 numbers are obviously very hard to represent visually in an appealing way.
Let me know if there's anything else you think I missed or that I was unclear about!
@victorvanlent1312 28 วันที่ผ่านมา ⁺¹
Can't you pin this comment?
@erikkonstas 28 วันที่ผ่านมา ⁺²
@victorvanlent1312 No, the sponsor comment most probably has to be pinned, and you can't pin two comments at the same time.
@robertbackhaus8911 25 วันที่ผ่านมา
Could you think of the computer doing these in base 2⁶⁴, each operation of the ALU using 64-bit numbers? 2⁶⁴ makes a ~19-digit base 10 number, so it's changing to the faster algo on what for it is to it is only a 4 'digit' number, if I'm correct.
Then you have whatever algorithms the multiplication unit in the ALU is using. And all those branching operations look very parallelizeable,
@Pystro หลายเดือนก่อน ⁺³²
I wish you would have mentioned that a+b and c+d have one more digit (or bit) than each of the halves, and how that affects (or actually doesn't affect) the scaling.
And surprisingly it doesn't even mess up the "powers of two" scheme that hardware is typically aimed at. Your most elementary unit that you eventually fall back to just has to be able to multiply numbers that are 1 longer than you'd think from dividing your number's length by 2 to the power of the number of "Karatsuba divisions" that you do.
@PurpleMindCS หลายเดือนก่อน ⁺¹⁰
That is correct! I did actually mention this in a red blurb on the screen at that time but for the sake of concision I decided to leave it as a footnote of sorts.
@Alphabetatralala หลายเดือนก่อน ⁺³
I mean as much as this detail is interesting, about anyone can figure it out by themselve while watching the video. When there's 'one more digit' due to addition, this digit is always a 1 no matter the base, and this is especially easy to take into account in base 2.
@yashprajapati8857 หลายเดือนก่อน ⁺⁸
This video is so well made! The flow of information in this is perfect, when you were talking about time complexity I was in fact recalling about galactic algorithms and how the constant multiple could sometimes grow so large seemingly faster functions could be outperformed by "slower" ones for data used in practice, and then you mentioned about galactic algorithms at the end and I was utterly delighted. The flow of info was exceedingly good going from introducing about algorithms and mathematics and then going into a story form talking about history of its discovery for the breakthrough idea, deriving everything from scratch and explaining the reason behind everything, showing the practical implementation in Python and testing it, and even after that not stopping there and explaining the scenario of development and improvement on algorithms beyond that and showing how useful research is. Amazing video, I'm impressed!
@Derpinator01 29 วันที่ผ่านมา ⁺⁸
Me seeing reused terms in Katsuraba's method: "Ooh, and then we can merge the similar terms to get 99900[axc] + 100[(a+b)x(c+d)]-99[bxd] to bring the total number of multiplications down to 3!"
Video: *Goes straight to the math comparing O(n^log(2)3) to O(n^2)*
Me: "Oh, that works too."
@galacticminx 26 วันที่ผ่านมา ⁺⁴
I had the same thought, but later realised 10000axc - 100axc is the fast way to compute 9900axc in base ten. I expect the same would apply in base two.
@ItsGray3 หลายเดือนก่อน ⁺⁹
Man, how in the world do you not have more views and subscribers???? Like seriously, the animations beautiful, the math interesting, and the explanations are chopped down into more digestible chunks.
@PurpleMindCS หลายเดือนก่อน ⁺²
Thanks so much!
@nicemathproblems 27 วันที่ผ่านมา ⁺²
Awesome video! Well done 🥰
@PurpleMindCS 27 วันที่ผ่านมา ⁺¹
Thank you! Very much appreciated
@donaldhobson8873 หลายเดือนก่อน ⁺⁸
The last term in the sum is
n*1.5^k where k=log_2(n)
=n*(2^log_2(1.5))^k
=n*2^(log_2(1.5)*k)
=n*(2^k)^log_2(1.5)
=n*n^log_2(1.5)
=n^(1+log_2(1.5))
=n^(log_2(1.5*2))
=n^log_2(3)
So the last term in the sum, the bottom row of the recursion, takes asymptotically the same amount of compute as the whole thing. (up to a factor of 3.)
@PurpleMindCS หลายเดือนก่อน ⁺⁶
Indeed! There's a special name for that (it's called a leaf-dominated recurrence).
@Lewehot หลายเดือนก่อน ⁺⁵¹
I figured out the two digit at a time multiplication and a few things not mentioned in the video when I was 14 y/o in 1982. I was rotating points of a tie fighter on an Apple II+ 6502 CPU. I think I got it down to 20 clock cycles to do a 16 bit multiplication on an 8 bit CPU that took 1 to 5 clocks to do a single operation on the CPU. Reached 50,000 mults per second. Then rotating 3D points required sine and cosine so I had to optimize that as well plus square roots. Lots of work and experimenting. I could rotate 35 or so 3D points on Darth’s Tie Fighter at 3 frames a second including time to clear the screen and draw the lines one pixel at a time in assembly code. This was 1 frame per second faster than Woz who built a 3D rotation utility program. I didn’t realize the importance of what I was doing, was just trying to rotate a tie fighter to show my geeky friends.
@TheFrewah หลายเดือนก่อน ⁺⁸
How funny! At 14, I figured out how to manually calculate square roots. My aha moment came when i did 9/3=3. That can be seen as an equilibrium. Divide by something smaller than the square root and you get something larger. So 9/2=4,5. Just calculate the average and repeat (2+4.5)/2=3,25 which is better. Similar to the babylonian method but more intuitive. Best of all is that it can be extended to do cube roots. Or fourth roots. I did nothing, not even telling my lousy math teacher who I thought would say that someone had told me this.
@leif1075 หลายเดือนก่อน ⁺³
That does sound like a lot of work..was it.modtly fun and enjoyable? Thanks for sharing.
@edwardmacnab354 29 วันที่ผ่านมา ⁺²
@@TheFrewah And then there's the guy who can just get the answer in his head with zero calculation
@ThomasPalm-w5y 29 วันที่ผ่านมา ⁺²
I did something similar on the Apple II, only to realize the bottleneck was how long it took to draw lines not to calculate the end points.
@TheFrewah 29 วันที่ผ่านมา ⁺¹
@@ThomasPalm-w5y Some things may come as a surprise. Someone wanted to make a c++ program to outperform python and it didn’t work as expected. The reason was that he used a print statement that which contained
@micah6635 9 วันที่ผ่านมา ⁺¹
Came across these video a couple of weeks ago and started it but never finished. Today was the first day of my algorithms course and this was the subject. Definitely wish I finished this video back when I saw it for the first time.
@AdityaRaj-ki3md หลายเดือนก่อน ⁺⁴
Never expected this type of great explanation. One of the greatest channel I found on computation.
@PurpleMindCS 29 วันที่ผ่านมา ⁺²
Thanks so much! Glad you liked it.
@RandomBurfness หลายเดือนก่อน ⁺²⁹
OH MY GOD YOU WRITE A SET INCLUSION SYMBOL INSTEAD OF EQUALITY WHEN TALKING ABOUT TIME COMPLEXITY, FINALLY SOMEONE THAT DOES IT PROPERLY!!!!!!!!!! THANK YOU!!!!!!!!
@drdilyor หลายเดือนก่อน ⁺⁵
🤓
@PurpleMindCS หลายเดือนก่อน ⁺⁷
😃😃😃😃
@spicybaguette7706 หลายเดือนก่อน ⁺¹
🎉🎉
@spicybaguette7706 หลายเดือนก่อน ⁺²
One minor nitpick though: big O notation is an upper bound, so not all algorithms that are O(n) look like a line. For example, any algorithm that is O(log n) is also O(n), but it doesn't look like a line
@gregorymorse8423 หลายเดือนก่อน ⁺¹
@@spicybaguette7706wrong. It's worst case does look like a line which is exactly what big O is addressing. You completely missed the point.
@raphaelmonserate1557 28 วันที่ผ่านมา ⁺³
just finished the data structures and parallelism class at UW, and you did a great job compressing everything in that class into a wonderfully small package. 🏆
@PurpleMindCS 28 วันที่ผ่านมา ⁺¹
Thanks so much! Glad you enjoyed it.
@whiteoutbored หลายเดือนก่อน ⁺²
woah! as i was watching i didn't realize just how underrated this channel was! subbed, and great work on the video!!
@MobiusHorizons 28 วันที่ผ่านมา ⁺⁴
Probably worth mentioning that python doesn't "switch from the naive algorithm to karotsuba at some size" it's actually going from doing a single hardware multiply to multiple hardware multiplies. ie the inflection point on your graph is where the number goes from 1 digit to multiple digits, where each digit is the largest number the computer internally knows how to do math on. from your graph it looks like that is 2^6 (64 bit) which would make sense. Modern CPUs operate on 64bit numbers natively.
@PurpleMindCS 28 วันที่ผ่านมา ⁺²
Hey, I just made a comment about this :) Hopefully that should help clear things up.
@ke9tv หลายเดือนก่อน ⁺²⁰
Most Python implementations use Comba multiplication rather than the schoolboy method for numbers too small to resort to the Karatsuba method. It has the same big-O time as the schoolboy method, but takes an aggressive approach to carry propagation that reduces the constant factor significantly. (The recursion in the Karatsuba logic also bottoms out into Comba, rather than going all the way down to single precision.)
Lovely explanation! Subscribed.
@DjVortex-w หลายเดือนก่อน ⁺¹
The more official name for "the schoolboy method" is "long multiplication".
@EricKolotyluk หลายเดือนก่อน ⁺⁵
I could feel the joy you have in your explanation. Thanks.
@markbloyd9852 29 วันที่ผ่านมา ⁺¹
He does. We're climbing buddies, and I enjoy when we are resting sometimes that he shares with me some ideas that are more on my level of comprehension. He's very good at helping me understand what he's talking about.
@sidreddy7030 หลายเดือนก่อน ⁺⁵
This is such a great video. Can't wait to see the CS content you will come up with next.
@PurpleMindCS 29 วันที่ผ่านมา ⁺²
Thanks so much! I'm looking forward to it as well :)
@kart_elon_xd หลายเดือนก่อน ⁺¹²
9:54 oh no, math?? In a math vídeo??!
@trueriver1950 หลายเดือนก่อน ⁺³
It's also got maths, but that's ok because it's also a maths video 😅
@PurpleMindCS หลายเดือนก่อน ⁺⁶
XD
@leif1075 หลายเดือนก่อน ⁺²
@@PurpleMindCSCan you PLEWSE share how or why Laratsuba would come up with thia and hiw can I come upmwith something like thst orngresterand be a math genius. I won't settle for anything less. Thanks for sharing and hope to hear frlm you.
@cdeerinck 29 วันที่ผ่านมา ⁺³
If the numbers are kept in binary, there is an algorithm (I just created it) to multiply them in o(n) time, that actually requires no multiplication operations. Just bit shifts, addition, and a logical AND test for odd numbers. Say we have 2 numbers x and y. Given:
x = 10101...
y = 11011…
r = 0
while x > 0 # since x is divided by 2 each time through the loop, this runs in o(n) time, where n is the number of digits in x
If x % 2 != 0
r+= y
x = x >> 1 # equivalent to dividing x by 2
y = y
@henrikherranen2610 29 วันที่ผ่านมา ⁺²
Interesting question.
Your while loop is performed n times, and i guess that leads you to think O(n). However, do.note that addition is on average performed every other loop, and (trivial) addition is also O(n), as are shift operations that are performed every loop. So, n loops with n execution complexity for each loop -> O(n²)
Did this help?
@cdeerinck 29 วันที่ผ่านมา
@@henrikherranen2610nope, that would be 3*o(n), which is still considered o(n). Still linear, but with a different slope.
@ric6611 29 วันที่ผ่านมา
@@cdeerinck but you do those "3*O(n)" n times.
@vulcanfeline 29 วันที่ผ่านมา
booth's algorithm, which is the only one i learned during my time getting a master's degree in computer science and i have no clue how i remembered that name after 40 yrs
@henrikherranen2610 29 วันที่ผ่านมา ⁺¹
@@cdeerinck Well, in that case you just like that have invented the revolutional algorithm that no-one ever thought of in over 60 years. Congratulations!
No, but in real life, if you have two *nested* loops, then you have O(n²). The addition and shifts all contain "invisible" / implicit loops, as you have to make a loop of n to implement them.
I have no will to argue more, so:
End of line.
@aidanthird หลายเดือนก่อน ⁺⁹
2:07 nice touch with pi and e
@PurpleMindCS หลายเดือนก่อน ⁺⁴
Thanks!
@parthsavyasachi9348 29 วันที่ผ่านมา ⁺¹
I started listening to the video and it gave me idea to multiply two large digit numbers far far faster than this method. The method is so simple that I am surprised no one is using it.
@victorfunnyman 29 วันที่ผ่านมา
a fermat fan I see
@parthsavyasachi9348 29 วันที่ผ่านมา
@@victorfunnyman f ING TH-cam removed my comment
@TheNameOfJesus หลายเดือนก่อน ⁺²⁴
@20:04 - "efficient multiplications with thousands of digits are now essential to public Key Cryptographic systems like RSA that keep your information secure." This is not accurate, because RSA is NOT used to encrypt user data. RSA is only used to encrypt the user's session key for a symmetric algorithm such as AES, Triple DES, DES, Blowfish, etc. The *average* session key used on the Internet is 200 bits, which is about 25 decimal digits. That's not a big number at all, by this video's definition of "big." Having to multiply a 25 digit number by another 25 digit number using any algorithm just once for a single SSL session (eg, one web page) is not going to make any difference to the user's experience. That's my opinion.
@seheyt หลายเดือนก่อน ⁺¹
You still need the RSA keys which are routinely 4096 bits
@TheNameOfJesus หลายเดือนก่อน ⁺⁷
@@seheyt That's true, but in order to get a 4096-bit RSA key you have to multiply two 1024-bit prime numbers, p and q, then append the product of two other 1024-bit numbers, p-1 and q-1. And my point was that you have to do this process only once per session, not once per block of user text. Moreover, in this video, he said that these faster algorithms are better only when you have thousands of decimal digits, and 1024 bits is only 128 digits. So there's no performance benefit to the newest algorithms for something that's done only once every web page.
@seheyt หลายเดือนก่อน ⁺²
@@TheNameOfJesus Fair analysis. It was rather irrelevant to point out the session key though :) I wonder what the role of FFT/convolution based algorithms is in this picture. I assume they may not be suited for exact integers (rather for numerical computation)
@BobJewett 29 วันที่ผ่านมา ⁺²
10 bits is 3 decimal digits (2^10 = 1024) so 200 bits is about 60 decimal digits.
@BobJewett 29 วันที่ผ่านมา ⁺²
@@TheNameOfJesus No, 1024 bits corresponds to about 308 decimal digits.
@markthart2661 หลายเดือนก่อน ⁺²⁶
GPUs use the naive O(n^3) for matmul, because that one can be done in parallel. Recursive (divide and conquer) does not work great with specialized circuits.
@gregorymorse8423 หลายเดือนก่อน ⁺¹
Wrong. Papers are published showing Strassen and Winograd outperform naive algorithms for as little as 2048×2048 matrices. Either you've been living under a rock for 20 years or rely only on free public tools which aren't providing state of the art.
@framegrace1 หลายเดือนก่อน ⁺¹
They use special parallell algorithms for matrix multiplication. Which is the clue of their speed.
As all hardware implementations I know, yes GPUS also use some sort of O(n^2) algorithm for plain multiplication.
@TskJafri 23 วันที่ผ่านมา ⁺¹
I just opened TH-cam for a nice video to watch during lunch. Thank you! Instant subscribe. Kudos, and looking forward to more.
@PurpleMindCS 22 วันที่ผ่านมา ⁺¹
Glad you enjoy the content :)
@rocketman5004 23 วันที่ผ่านมา ⁺²
make 3 arrays. one for each number you want to multiply and one for the result. the result array has the lenghth of both numbers combined.
now make 9 arrays, the lenghth of number a. array 1 is a, array 2 is array 1+a, array 3 is array 2+a, array 4 is array 3+a, and so on.
now go along the result array, shifting it by one number everytime you multiply it with the next number from array b and just add the corresponding array 1-9.
that way you just have to do 9+n additions of large numbers.
additions should be easier than multiplications anyway.
@WuddupDok 27 วันที่ผ่านมา ⁺¹
Awesome video! Great pacing, depth, and use of animation to reinforce the ideas being presented
@TmOnlineMapper 26 วันที่ผ่านมา ⁺²
As a little note as to why the naive approach is actually fast for smaller numbers.
When we go back to the long multiplication, the idea is that you shift the top number and multiply it by the digit of the lower number you're currently doing. In binary all the digits you have are 0 and 1. Multiplying by 0 gives zero, so you have nothing to add. Multiplying by 1 just gives the same number. So essentially you just shift the top number over one by one and if the corresponding digit in the bottom number is a 1, you just add it to the total. And as a further optimization, you can just ignore the tailing digits of your sum as you go along, further reducing the work you need to do. It's still O(n^2), but very efficient, since the individual operations are extremely fast on CPUs.
@OliviaCynderAera หลายเดือนก่อน ⁺⁸
This video reminds me how I got good at multiplying by 9. Just add a 0 at the end and then subtract the original number! Somehow just trying to multiply by 9 without that was too much headache.
@leif1075 29 วันที่ผ่านมา ⁺¹
Zero at the end of what? Youboeft it kind of vague sorry..or isnt itneasier or maybe as easy and as clear as ypur method is just replace 9nwith 10 minus 1..that's what i think of and Inthink as effective as your method..
@minecraft_user2 29 วันที่ผ่านมา
@@leif1075 they meant the same thing
9 * x = (10 - 1) * x = 10x - x
@KulaGGin 29 วันที่ผ่านมา
Yes, it's a system. That's how a mathemagician does it, you can find the video about it here on YT.
When I have to multiply like 7 by 39, you can do the same: 7x4 = 28 * 10 = 280 - 7 = 273. You can do all that in your head in 10 seconds.
@Matthew-eu4ps หลายเดือนก่อน ⁺²
In university we had a method that involved: reducing the inputs modulo a series of primes, where a large power of 2 divided p-1. Then do a finite field Fourier transform, add the results, and reconstruct using Chinese remainder theorem. I can't remember the cost, but I think the reason had to do with the maximizing the computational benefit of the word size on the computer.
@Jonasz314 22 วันที่ผ่านมา ⁺²
Karatsuba algorithm also exists in C++ through boost library. C++ does not have native support for big ints, as long as you're dealing with 64-bit quantities or less, multitplication is done by hardware, so there's no point in trying to optimize. The boost bigint library is a little clunky to use, and word to the wise - use the decimal version rather than the binary version, it's actually faster (the reason being that you never have to worry about overflow during the intra-limbs multiplication at leaf level, which saves a ton of cycles).
in C++, Karatsuba is slower than straight-up multiplication with anything less that ~1000 bits of data, so that's typically where the branching stops (Boost cuts off Karatsuba at 40 limbs, each limb is a 32-bit quantity).
@Laurencio-lm7ui หลายเดือนก่อน ⁺¹⁰
1:59 the numbers being added are the first 6 digits of pi and e
@notafraidofchange หลายเดือนก่อน ⁺⁴
Multiplied by 100,000 of course. 😁
@PurpleMindCS 29 วันที่ผ่านมา ⁺²
:D
@matthewwoods6972 24 วันที่ผ่านมา ⁺²
It can be a fraction. 117/200. So n^1&117/200
@A_literal_cube 23 วันที่ผ่านมา
log2(3) is an irrational number.
@matthewwoods6972 23 วันที่ผ่านมา
@A_literal_cube fair but you said .585 in my defense.
@A_literal_cube 23 วันที่ผ่านมา ⁺¹
@@matthewwoods6972 Sorry, I didn't create this video, and he said "approximately 1.585" not exactly.
@morten-punnerud_engelstad 27 วันที่ผ่านมา ⁺⁴
Can't we add a parallell number scan and split up (reduce) bases where two numbers in the same position combined is not over 10, then it can be split into chunks and processed faster with parallell processing. Then combine them together because we know the spits is independent som its just "text" combination afterwards (map it together).
I see that numbers with 5000000 digits takes on average 11 seconds to process on Python (M1 MacBook Air), with standard deviation of 0.25 seconds.
A good place to start testing new algorithms.
@paperiswhite2 22 วันที่ผ่านมา ⁺¹
I got no idea why youtube recommending me this, but it got me into math for probably a week now
@shadeblackwolf1508 28 วันที่ผ่านมา ⁺³
Funny detail, in binary, multiplication, is expressable as summing one number with bitshifted copies of itself where the bitshifts match the indices of every 1 in the other number. imagine A times B. This algorithm iterates over A to get the indices, then for each index creates a bitshifted version of B, then needs to do some number of summations of the numbers. Not sure how well this performs, but it's fun to come up with algorithms like this. Time complexity for each step: iterating over a number's bits is O(N), Iterating over the list of indices to create bitshifted copies which while fast is still O(N^2). summing many large numbers would also be O(N^2).
@FunWithBits 28 วันที่ผ่านมา ⁺⁴
16:10 - the slope is flat at the start on the chart because a 64-bit CPU can multiply two 31-bit numbers directly. Then for up to two 63-bit numbers in just takes two to three instructions.(basically a multiply high and multiply low) After 63 bits it starts grow with the expected growth.
@PurpleMindCS 28 วันที่ผ่านมา ⁺²
Hey, I just made a comment about this :) Hopefully that should help clear things up.
@Blingsss หลายเดือนก่อน ⁺²
This video beautifully illustrates how a mathematical breakthrough can change the course of technology. Karatsuba’s Algorithm remains a cornerstone in computational theory. Pairing these insights with SolutionInn’s study tools is a fantastic way to deepen understanding.
@mamiemando หลายเดือนก่อน ⁺²
Maybe you could be interested in the paper "addition is all you need" which describes an optimized multiplication for floats.
@trueriver1950 หลายเดือนก่อน ⁺¹
In the eighties there was a considerable (though minority) interest in doing everything in interest arithmetic. The language Forth is probably the best place to start if you want to understand how that worked back in the day.
Numbers like pi and e would be represented as rationals, that is as ordered pairs of integers, rather than as floats.
The introduction of the floating point co processor, and the later incorporation of floating point arithmetic into the standard CPU was more programmer-friendly so soon won the battle. However with sufficient programmer skill it would be possible even now to get better performance using only integer-rational arithmetic. That would entail scientists, engineers, and accountants all learning more integer number theory than they seem to want to...
@bananacraft69 28 วันที่ผ่านมา ⁺¹
We weren't taught some algorithm for multiplication, we were told to memorise the result of every multiplication x*y for 1
@punkbutcher5321 หลายเดือนก่อน ⁺¹³
I am not sure about the initial behaviour of the builtin multiplication at 17:00. 2**-22 s is about 10**-7 s, which is horribly slow for a simple multiplication, which should be done in ns (10**-9 s).
Probably some function calling overhead is the bottle neck at that point, where it also is important how you benchmark functions (measuring around or in loop? which time function? threading? etc)
In Python I definitely would compile the functions with numba or alike, to reduce the calling overhead.
The math is presented really well though :)
@bru57000 หลายเดือนก่อน ⁺¹
My guess is this measurement is biased because of incompressible time of API calls handling datastructure transfert or transformation from the interpretive language to the machine code.
@JonBrase หลายเดือนก่อน ⁺¹
10-7 is 100 ns, which is to be expected when you're working at arbitrary precision, even at less than a machine word of precision, an arbitrary-precision multiplication can't just do a hardware multiply and nothing else: you have to follow a pointer to each number, determine the length of each, decide what algorithm to use, retrieve the actual value from memory (if it indeed fits in a machine word and can thereby be loaded into a register), and only then can you perform the hardware multiplication, and then you have to store it back into memory.
Even without any function call overhead, the bookkeeping will be expensive, as will the memory accesses if they miss in cache (especially if they get punted all the way down to L3 or RAM. A single access that misses in cache can easily eat 100 cycles).
@trueriver1950 หลายเดือนก่อน
This doesn't matter, as the discrepancy will be constant. The demo is only looking at orders of magnitude here, hence the validity of the O() notation
@punkbutcher5321 หลายเดือนก่อน
@@trueriver1950 I am not doubting the validity of the O notation, but people might draw wrong conclusions from the example shown. For example the location of the "kink" will be shifted due to the offset, leading to wrong conclusions about where the transition happens, which actually does impact the presented explanation.
I did not expect an in-depth performance tutorial, but hinting at caveats would have been nice. I am just trying to provide a clearer picture, point out pit-falls and what should be expected when people try to do similar studies out of curiosity.
@punkbutcher5321 หลายเดือนก่อน
@@JonBrase As the presented code does not show the data type used, it is absolutely possible more book keeping is done than I expected, plus I am not familiar with performance levels of arbitrary precision modules in Python.
However, Python does have a very heavy footprint when it comes to calling functions, I would not even look at different cache levels for this issue. Also keep in mind, that even for a single-digit multiply the cost is allegedly 100ns, which to me is absurd, so I still doubt this without seeing more details.
However, it would be important to know how the performance is measured, otherwise we are just guessing. I shot myself into the foot once, because the rng for the data was part of the loop, and as the algorithm got faster (rewriting and compiling) that was the limiting factor at some point. The amount of input data pre-calculated and stored then also affects the cache levels used during the loop, assuming we measure the sequential throughput.
@pvic6959 28 วันที่ผ่านมา ⁺³
18:37 loved the 3blue1brown reference. I feel like this channel is the same thing but or computer science and I LOVE it!!
you and 3b1b in my (purple)mind are at the same level of effort and good videos
@PurpleMindCS 28 วันที่ผ่านมา ⁺²
Thank you! Much appreciated.
@walkergege2105 29 วันที่ผ่านมา ⁺²
What a dedication for this video, im definitely supporting you...
@marca9955 24 วันที่ผ่านมา ⁺³
Ingenious. Ingenious is the adjective.
@pagenotfound_code_404 27 วันที่ผ่านมา ⁺¹
correction at pseudocode line 3 change this line:
if n = 1: output x*2
into this line:
if n/2 = 1: output x*2
@RayHorn5128088056 หลายเดือนก่อน ⁺¹
Amazingly, in five decades of professionally writing code, i never needed to know any of this. Interestingly, low-level math works differently on numbers you can express using binary bits rather than strings of digits.
@Roxor128 29 วันที่ผ่านมา ⁺¹
The recursive splitting is something I came up with independently while trying to figure out how to build a multiplier that could be used to either do two n-bit multiplications in parallel or a single 2n-bit multiplication.
I ended up on that track after noticing that a lot of GPU specs on Wikipedia listed half-precision performance being double that of single-precision, and wondering how Nvidia and AMD did it.
That factor of two difference suggested the possibility of parallel operations, and while it was easy to figure out for addition, multiplication had me stumped for quite a while before I realised you could do a 2-digit multiplication as 4 single-digit ones with appropriate adding and scaling of the results, and that the same thing could be translated into circuitry by treating blocks of bits like digits for smaller multipliers.
I eventually came up with something in Logisim Evolution that would do a single 32-bit multiplication, or two 16-bit ones, or four 8-bit ones.
@adamlakhal4035 7 วันที่ผ่านมา ⁺¹
your channel is great! continue
@PurpleMindCS 7 วันที่ผ่านมา ⁺¹
Thanks so much!
@Morbazan125 29 วันที่ผ่านมา ⁺²
I’ve largely forgotten most of what I learned about math due to not having to do anything other than basic calculations for nearly 30 years😂
@baxtermullins1842 28 วันที่ผ่านมา ⁺¹
Interesting! In 1970, a couple of us in a radar lab developed code for a 16-bit, fixed point computer to multiply 2048 bit numbers as a lark. But it did allow for quad precision numbers. We used a straight-line assembly program with no recursive properties. Fast method!
@seheyt หลายเดือนก่อน ⁺¹⁰
1:44 allocating is not a neutral example of "accessing memory", and is usually several orders of magnitude more costly than the other examples of prototypical "operations"
@nurmr 29 วันที่ผ่านมา ⁺⁴
It depends on the memory allocator being used.
@KevinDay 29 วันที่ผ่านมา ⁺¹
But how long each operation takes doesn't affect algorithmic complexity. It matters for your actual real-world performance in the end, but it doesn't change the O notation.
@seheyt 28 วันที่ผ่านมา
@KevinDay, true, however, O(n) with the constant being 2 orders of magnitude bigger still makes them uncomparable. So, it matters if you can express and evaluate the the complexity of the compound operation in the underlying operations
@greenstonegecko หลายเดือนก่อน ⁺²
Time Complexity is such a difficult topic. This video doesn't do it justice, but it's an entire field of math that specializes in optimizing math.
It's weird how a field of math based on "being practical" could become so theoretical, it stops being practical...
@MrKahrum 27 วันที่ผ่านมา ⁺¹
next leap: irrational parsing for large solid state virtual storage, and prime factorization to maximize its utility.
@tnealclips 29 วันที่ผ่านมา ⁺²
That is a very surprising and interesting result. Great video
@PurpleMindCS 28 วันที่ผ่านมา ⁺²
Thanks! Glad you enjoyed.
@rafaelsantiagoaltoe6606 หลายเดือนก่อน ⁺³
Quick question: At 12:14, shouldn't it be from 1 to log2(N)? I don't think I can summarize well my line of thought, but I will give it a try.
Suppose we have two numbers of two digits (N = 2) being multiplied. To calculate it, we divide them into 3 subproblems of size N=1, so the total work would be 3 * c * 1. As the stated summation goes from K=0 to K = 1, there would be two steps (3 * c * 1 + c * 2), while in reality only one is actually made, at K = 0 we already have the result, so no work is done in that bit.
Well, this is my counter example. Now as for logically speaking, the algorithm's last step is summing up the calculated subproducts obtained from the step before, said last step is of size 3^1*N/2, after that we have the result and there is no step further to add for the work. If we made one more step, its work would be c * N, as if we had to do some calculations with the result.
I know it is just semantics, and in the end everything will be eaten up by the bigO notation, but I just want to make sure I got everything right.
@PurpleMindCS 29 วันที่ผ่านมา
The summation gives an upper bound. It's true that the work done at the base cases is very small compared to the linear work done to use the subproblems at all higher levels, but if you look for example at the tree depicted at 7:08 (which has the same height as the tree you would get from Karatsuba's algorithm), you'll notice that for n = 4, the height of the tree is 3 = log_2(4) + 1, so when we add up the work done at each level of the tree, we get a summation from 0 to log_2 n.
@BryndanMeyerholtTheRealDeal 25 วันที่ผ่านมา ⁺²
Wouldn't be more efficient if the numbers are in scientific notation, then adding the exponents while multiplying the mantissas, normalizing the number as needed?
@OneOfTheDips 23 วันที่ผ่านมา
That was my first thought tbh, I have no idea if it is practical in use
@beaconofwierd1883 22 วันที่ผ่านมา ⁺¹
Shouldn’t the Python native implementation be better than Karatsuba since it switches to regular ”hardware accelerated” multiplication for the sub problems aswell?
Or does it not do this? Feels like you should save a lot of time if it did, since you’re cutting n in half, meaning you go back to ”0 time complexity” very quickly.
@minebuilder1805 27 วันที่ผ่านมา ⁺²
Screw my math, I'll learn this now.
@akruijff 29 วันที่ผ่านมา ⁺¹
@3:30 It is like sorting. Qsort and mergesort have a complexity of O(n log n) but sleepsort has a complexity of O(n). The latter is a lot slower than the former.
@tristantheoofer2 หลายเดือนก่อน ⁺¹
honestly fire video, and your explanation is actually phenomenal for.. literally everything. im subbing
@PurpleMindCS 29 วันที่ผ่านมา ⁺¹
Thanks so much! Gad you liked it :)
@pablodelafuente4810 หลายเดือนก่อน ⁺²
This is beautifully explained. Congrats!
@PurpleMindCS 29 วันที่ผ่านมา ⁺¹
Thanks so much! Glad you enjoyed :)
@fiboot-oaken 26 วันที่ผ่านมา ⁺¹
This is a nice video, excpect the 'every 2min bait reminding to see the result in python'
@der.Schtefan หลายเดือนก่อน ⁺¹
This reminds me how Fast Fourier Transform magically is faster than a tradition approach.
@considerthehumbleworm หลายเดือนก่อน ⁺²
Fast adders are really interesting too!
@pulkitsingh6966 29 วันที่ผ่านมา ⁺³
amazing video! keep up the work
@PurpleMindCS 29 วันที่ผ่านมา ⁺¹
Thanks so much!
@fblua 28 วันที่ผ่านมา ⁺²
Thank you !! Great channel !! Please continue with those enricherous videos !!
Greetings from Buenos Aires, Argentina.
@PurpleMindCS 28 วันที่ผ่านมา ⁺²
Thanks so much! Glad you enjoy the content :)
@IvanToshkov หลายเดือนก่อน ⁺¹
5:45 Count Dracula! :D
Great video btw. I really enjoyed it.
@Gandi800 27 วันที่ผ่านมา ⁺¹
This is some true 2Blue1Brown quality. Great job and I hope your subscriber count takes off!
@PurpleMindCS 27 วันที่ผ่านมา
Thank you so much!
@supersmashbghemming6445 25 วันที่ผ่านมา ⁺¹
Great video! I think a cool idea for a similar video would be the brilliant deterministic k-select algorithm/median of medians! You'd do a great job explaining that I think
@PurpleMindCS 25 วันที่ผ่านมา ⁺²
I was actually already considering making a video on that... how funny!
@FishSticker หลายเดือนก่อน ⁺²
Good to see you're getting attention
@vlynn5695 หลายเดือนก่อน ⁺¹
Incredible Video!! You make Algorithmic Mathematics look like art!
@PurpleMindCS หลายเดือนก่อน
Thanks so much! Glad you enjoyed.
@unusedTV 28 วันที่ผ่านมา ⁺¹
Absolutely banger video, it's been a while since TH-cam recommended me something this good!
@PurpleMindCS 28 วันที่ผ่านมา ⁺¹
Thanks so much! Glad you liked it :)
@faith-chess 17 วันที่ผ่านมา ⁺¹
8:15 Every math Prof ever.
@blockshift758 หลายเดือนก่อน ⁺²
7:15 if i am understanding how this works it like separating each digits to their 10^n places? Like 1234 is broken down to 1000 200 30 4 and then individually multiplied to 5000 600 70 8?
@trueriver1950 หลายเดือนก่อน ⁺¹
For explanatory purposes yes.
In a computer the break down would be into powers of two rather than ten. (Exception below)
Or on some machines that are operated for 8bit manipulation the smallest growing might turn out to be in groups of 8 bits.
However, some machines also implement BCD arithmetic (or at least used to: for example the IBM 360 and 370 series, where fixed point integers of arbitrary length were stored in decimal, using 4 bits for each digit: hence the name "binary coded decimal").
Those machines could natively multiply arbitrarily long BCD integers in a single machine instruction, a non-interruptible instruction that typically took huge numbers of CPU cycles to complete.
These were much favoured in the COBOL era, as there's a direct mapping from the COBOL data types that don't tend to exist in science oriented languages.
Whether the microcode implemented the naïve algorithm or something better is an interesting question, which I never bothered to ask back in the day. Anyone with a preserved 370 is welcome to do the timing tests ...
@PurpleMindCS 29 วันที่ผ่านมา
Yep! @trueriver1950 does a good job explaining it here :)
@Tynach 17 วันที่ผ่านมา
Now do a video about how doing some of the operations in parallel can drastically reduce actual computation time, while not actually reducing complexity. That's how you can get an O(n²) algorithm to run in n*log(n) real time.
Examples include Wallace and Dadda multipliers.
@ericfielding668 หลายเดือนก่อน ⁺¹
This makes me more interested in division algorithms. I wrote my own division algorithm for arbitrarily large integers decades ago, but it was not efficient at all. It would have been quicker to subtract logarithms and then ant-log the result.
@Vasia787 28 วันที่ผ่านมา ⁺¹
9:25 bro, u can simplify the second expression to (10²c+d)(10²a+b)
„oh no, 10 to the power! Oh no, multiplication!“ - just add another function, That add n zeros to the end of the number
And now, -2×
@zikos3743 27 วันที่ผ่านมา
I think you missed the whole point. 10 to the power is not difficult to do, but the multiplication (10²c + d)(10²a + b) is exactly the think we want to compute in the first place (which is done in O(n)).
We don't necessarily want to get less × operations, especially if it is to use them on larger numbers.
The whole point of this is that instead of calculating a multiplication between 2 numbers of size 2n (giving us a complexity of O((2n)²) basically), we can use 3 multiplications on numbers of size n (giving us a complexity of 3×O(n²), so a bit faster).
By applying this principle recursively, each time dividing the size numbers to multiply by 2, we can get a better overall complexity, as explained in the video.
I hope this is a bit clearer now, it's easy to loose track of what we are trying to accomplish with those sorts of things
@coulie27 21 วันที่ผ่านมา
12:25 I have found this number (log2 3) = 1.585 related to Sierpinski triangles (and other fractals of that Hausdorff dimension) and the Collatz Conjecture. Here it is again! 🙌 Super neat
@Zyugo 24 วันที่ผ่านมา ⁺²
Let's get this to 10K likes
@michaelbauers8800 หลายเดือนก่อน ⁺³
Even if we implemented long multiplication in a CPU, I am surprised that would be O(N*N). Because all it has to do, is a max of N additions for N bits. And addition itself doesn't do addition in serial as that would be way too slow. It does fast addition, with look ahead carry, to parallelize addition. That being said, CPU's still don't use long multiplication I think, as there's faster methods like this video talks about. I just don't think a CPU doing shift adds in binary with fast adders is O(N*N). Feel free to correct me.
@ccpersonguy หลายเดือนก่อน ⁺⁴
Big-O notation describes how an algorithm behaves as N trends toward infinity. Modern CPUs do have fast multiplication and addition ***for specific finite-sized inputs*** (say, 64-bit integers). On arbitrarily large inputs, long multiplication is still O(N*N). Let's assume that some CPU can do a 64-bit multiply-shift-add in 1 clock cycle. Long multiplying 128-bit still requires 4 mult-shift-adds, 256-bit requires 16, etc. The inputs double in size, time complexity quadruples. Still O(N*N).
@framegrace1 หลายเดือนก่อน ⁺¹
They have lots of improvements reducing the constant time to the minimum, by using fast multipliers and such. But the complexity is still O(N*N)
@j3r3miasmg 28 วันที่ผ่านมา ⁺¹
Nice video that I'll use in the future to explain other people about karatsuba's method. The only thing I disagree a little bit is providing code through discord. It's almost like this is not a computer science channel if you do in that way just to marketing the discord channel. Congratz either way!
28 วันที่ผ่านมา ⁺¹
[question] At 20:02 it's mentioned that javascript is among the languages using Karatsuba's algorithm. I always assumed numbers in javascript are stored as 64-bit values which the CPU can multiple in constant time, and I also know that if you try to make numbers higher than MAX_SAFE_INTEGER you actually get gaps because everything is stored as floating point values. So, does this apply specifically to the BigInt class in javascript, or is something else happening with number primitives in javascript that I don't know about?
@PurpleMindCS 27 วันที่ผ่านมา ⁺¹
I believe it applies specifically to the BigInt class.
27 วันที่ผ่านมา ⁺¹
@@PurpleMindCS Thanks for the clarification! That makes sense to me. I would also suspect it depends on the runtime. I doubt the spec says anything about using Karatsuba's algorithm specifically because otherwise a further optimization would put a runtime out of spec which would kind of suck. It would be really interesting to see if there are differences here between alternative javascript runtimes or compilers.
@gianluca.g 25 วันที่ผ่านมา
Just one minor note, the cpu is multiplying numbers in constant time if you set your operation unit as the clock cycle. But the cpu itself uses some kind of algorithm baked as logic gates, so at the logic gate operation, the complexity is not constant
@TallinuTV 29 วันที่ผ่านมา ⁺¹
This was a really good explanation. Thank you!
@PurpleMindCS 29 วันที่ผ่านมา ⁺¹
Thanks so much! Glad you enjoyed :)
@axelk.1782 29 วันที่ผ่านมา
Wow, this video was a real eye opener! The first time that I heard about Karazuba's algorithm must have been in the late 80s or early 90s, and for all this time I have completely missed the point. Thank you so much! 🥰
@PurpleMindCS 29 วันที่ผ่านมา
Glad you enjoyed and learned something!
@pandorairq หลายเดือนก่อน ⁺²
What would interest me now is what performs better on the task of multiplying a large number with a small number lets say a number with more than 1000 digits with one with less than 2. Is the school way better or karatsubas method?
@framegrace1 หลายเดือนก่อน ⁺¹
It becomes an O(n) complexity problem, so use the faster for low values of n.
@PurpleMindCS 29 วันที่ผ่านมา ⁺¹
I believe that as @framegrace1 is indicating, the school way would be better for these types of inputs because it becomes an O(n) complexity problem and the extra additions and subtractions needed for Karatsuba's algorithm are expensive.
@markhelmick8084 29 วันที่ผ่านมา ⁺¹
Great video. May I suggest one on algorithms for finding primes? Or maybe a basic one for beginning students showing binary arithmetic? I've coded arithmetic libraries in assembly on 8 bit machines and it's cool to me how simple and elegant binary math works. I think every programming student would benefit from understanding this.
@PurpleMindCS 28 วันที่ผ่านมา ⁺¹
Thank you! Algorithms for finding primes is something I've been thinking about covering at some point, so it's great to see some interest for it.
@Pravin.Shidore 29 วันที่ผ่านมา ⁺²
No computer in the world can go beyond 18 digit of accuracy of the decimal.
@BesserWisserCode 27 วันที่ผ่านมา
What about __uint128?
@ProfRoxas 29 วันที่ผ่านมา ⁺¹
Now i can relate to the GameBoy's CPU for not having multiplication instruction in it, I wouldn't want this complexity for speed either
@FilmscoreMetaler 29 วันที่ผ่านมา ⁺¹
18:27 "And _this_ is where the record stands today." 🎵

ต่อไป

เล่นอัตโนมัติ

How AI Discovered a Faster Matrix Multiplication Algorithm