Source code of all of the techniques (for educational purposes): github.com/strager/big-oh-lines Corrections: 19:23 I show vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 12, 12, ...], But this is incorrect. The Vec items should be line numbers, not offsets. (The keys are the offsets!) So the Vec would actually look like this: vec![1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, ...]. (Thanks to @polyfoxgames9006 for pointing this out.) Pop quizzes: 04:24 average time complexity? answer at 18:34 09:42 another way to preprocess? answer at 19:09 12:23 why does binary search 2x? answer at 20:14 14:17 time complexity of SIMD algo? answer at 20:47 Lessons: 01:13 1: Try it! 03:28 2: Check your loop bounds 03:46 3: Stress test your algorithms 06:09 4: Big O is not the full story 07:42 5: O(n) ≠ O(L) 09:17 6: Preprocess your data 12:09 7: Big O is approximate 13:53 8: Big O is for big inputs 17:44 9: SIMD is awesome
I think that you should mention that big O notation really is really nothing more than a set of functions of a "certain shape", and that the shape of your expression measuring what you want (running time, space complexity, number of steps etc.) is in that set if it is asymptotically dominated by functions in that set. More importantly, I think it is quite harmful to bring up the notion of the average case runtime for a deterministic algorithm over a distribution of inputs as if it is says anything meaningful - since how do you define the input distribution, and what does it tell you? Nothing at all for a deterministic algorithm. You can flip the picture and consider an algorithm like the one you describe but where its input is considered a cyclical buffer (just start over when you reach the end...). Then you could have it pick an index uniformly at random and do average case analysis based on the random choice. This is a good example of both probabilistic analysis of running time and of the advantages of a randomized algorithm.
> I think that you should mention that big O notation really is really nothing more than Plenty of other videos explain it this way. I wanted to explain it a different way which I think is more intuitive. > I think it is quite harmful to bring up the notion of the average case runtime for a deterministic algorithm I brought up average time complexity after talking about best and worst case. I wanted to show that, given a best case of O(1) and worst case of O(n), average case is *not* necessarily 'in the middle' of best and worst. Average case time complexity was not O(n/2) or O(sqrt(n)) or something; it was just O(n).
@@strager_ But wait, O(n/2)=O(n). I think intuition is fine, but I think it's important to mention that that it is not the whole picture. Regarding your use of average case analysis, my point is that, sure, you can perform a thought experiment - but I think it is important to mention that it only has any merits when comparing algorithms w.r.t. a "meaningful" distribution and that it is very hard to define. My concern would be that viewers would be inclined to use uniform distributions over input spaces when they see it done this way when this is not mentioned. Anyways, take it for what it's worth. People seem to like your video, and so do I.
I'm primarily specialized in graphics programming & self taught. Got major ADD issues, and can't even remember the months cuz what's the point? Your formatting is perfect. It keeps me entertained, and you're good at speaking, all while teaching and representing ideas in simple ways (: sick af dawg, dis gnarly.
> I think it is important to mention that it only has any merits when comparing algorithms w.r.t. a "meaningful" distribution and that it is very hard to define. Agreed. I didn't make that nuance clear in my video.
Big O is not bad nor a lie, you just have to understand it. It does not tell you how fast an algorithm is. It only tells you how it scales with respect to its input size
Yea, it took me a moment to understand what he was saying in the screen shot, since it never dawned on me that anyone had ever believed the simplified (and therefore incorrect) "myth" as he was using it.
C'mon guys, everyone uses some way to lure people to watch their video. All in all, great video! A lot can be learned from this video, I think you'd be a great teacher if you aren't already.
The biggest thing to take away is that big O is not equivalent to the actual performance, it is an upper bound for the growth rate of an algorithm. When comparing algorithms look at the behavior when you double the amount of input.
@@tear728 I can't remember who it was but I recently saw a video where someone said they almost failed their Google interview because *the interviewer* didn't understand this. The question was something like "Algorithm A runs in O(n), algorithm B runs in O(logn). What's faster?" They correctly answered that B is better asymptotically but A might be faster for small inputs, and the interviewer insisted that was wrong.
This is like how they discovered a new “efficient” multiplication algorithm for large numbers but the constants are so huge it’s only faster once you’re multiplying numbers over 4.5 trillion bits long
Yup. Those "galactic algorithms", as theyre called, dont really have any direct real world use; they however are useful for the theortical part of computer science such as by showing that certain bounds can be broken, or by giving a framework of techniques by which practical algorithms can be newly developed or improved further. Theres also the chance that at one point well have to deal with such huge inputs, but as with abstract mathematics the usual way they become useful is by laying the groundworks.
@@kuhluhOG Yeah, but that falls under the "we might get to that amount of input data at some point" category, as we currently very much do not use even remotely the amount of input in any application to make a galactic algorithm faster; that is somewhat tautological since thats a part of the definition of what a galactic algorithm even is.
The worst algorithm I ever wrote had a complexity of 4^n. It worked instantly for anything less then 10, took 30 seconds for n=12 and when it hadn't finished after several minutes when I tried n=13, I decided to abort and look into writing a better algorithm.
As someone with a background in mathematics, this was all... kind of trivial to me. But I see misconceptions about big O notation all the time from programmers so I can imagine this being very much needed. It's really just about limiting behavior, the same tricks mathematicians use to tame infinities, divisions by zero, and all that. It feels absolutely absurd to say it... but the actual specifics of implementation, and real world performance metrics are FAR more important than a cool mathematical trick used for shenangians. Anyway, very good video, and a very needed one to try to push against the absurd hyperfocus on big O time complexity over more relevant discussions about overheads and actual implementation.
7:51 "[...] and, in fact, you have fewer lines than you have bytes" A file containing only always has one more line than bytes. In fact, an empty file has no bytes at all, but one (empty) line.
When I got taught big O at the university, many people (including me) failed to comprehend the actual meaning of it (some still don't). One could replace, like, 2 lectures worth of material with this one video. Keep up the good work!
at our uni we weren't even taught what big O actually means (what is explained in this video), instead we were taught to blindly follow all the myths...
The easiest way to understand big O in my opinion is to not just focus on the most significant term, but rather write out the entire time complexity, so instead of O(n^2) it'll be something like 4n^2+80n+20n*ln(n)+35. That kinda helps understand that the constants do matter and can overpower the n and which term is the most significant for a given n. Like O(n) isn't better than O(n^2) if O(n) is 8,000,000n and O(n^2) is n^2 unless n > 8,000,000. Like take this scenario, you have some counting function that relies on sorting and it's O(n*ln(n)), is it worth the effort in switching to a priority queue to get that O(n)? That depends on what you expect n to be. If this stuff will involve at most a thousand items, probably not unless there's some serious time critically going on (which generally there won't be), if it's going to involve billions then yeah.
Big O notation says that, given a function f and a function g, where f(n) is in O(g(n)), there exists some constants c and k such that f(n)≤c*g(n) for all n>k. I never found it hard to understand or formalize, but I can see why it seems hard to understand. In general, big O notation only matters when you have differing time complexities and large values of n. It tells you that, as n grows larger, a function that is not in O(g(n)) is going to necessarily be larger than a function that is in O(g(n)).
This channel is pure gold. Clear explanations, starting from simple problems and going to more sophisticated ones, step by step, while not focusing on little details for too long and constantly tossing in some new questions for viewer to think about. Thanks for your effort
It's important to keep in mind that Big O is for classifying the scalability of algorithms, not their performance in a given context. Big O analysis is not always a substitute for a benchmark.
The reason you don't need the log base is because you can convert any log to any other log using a constant factor - which is ignored in the O-notation. log_e(t) = log_2(t) / log_2(e) log_2(e) is a constant. In O-notation, log_2(t) and log_2(t)/C are the same, which means log_2(t) and log_e(t) are the same.
I met you on twitch and you even helped me with a terrible code that I wrote, I already knew you were like a genius but what you are doing on this youtube channel surpasses everything!
I'm a physics PhD student who has to code all the time despite having taking very few formal courses on it. I find your videos super helpful and have been learning a lot!
O notation might be for big data but different O complexities have different Ns that count as big. N=50000 may not matter for difference between O(N log N) and O(N log log N) but it is big enough to make a difference for O(N^2). I still have PTSD from times when I spent hours upon hours waiting for my Windows to update because someone in Microsoft deep down the history decided that number of updates will never be big and used some kind of bubblesort to sort them.
Actually, there are still some places in Windows like this, that someone have 10 minutes opening Start menu in some conditions. Or Explorer starting to slow down when doing something like deletion in a folder with thousands of files. (Explorer meaning desktop as well.)
You know, I think it's dumb they used bubble sort (at least use insertion sort!), but I really have to wonder how many updates you missed if bubble sort was enough to slow down the time to update significantly.
so basically programmers leave out constants that would tell them the full picture, and then they are suprised that it doesnt give them the full picture
You are not just a good instructor but also a good video creator/editor. Explanation: 10/10, Video Quality for creating graphs and all : 10/10, Depth of topic: 10/10. AMAZING...
Didn't learn anything new about Big O, though the reminder is nice. What I did learn, though, was how easy it actually is to use SIMD, at least for simple applications like this. I've tried looking into it in the past and just got super confused. I have to say, a video specifically on SIMD would be awesome.
> a video specifically on SIMD would be awesome. My previous video on hash tables also contained some SIMD stuff. That hash table SIMD was more advanced, but I didn't walk through the code. th-cam.com/video/DMQ_HcNSOAI/w-d-xo.html I do want to share more SIMD knowledge in future videos. 👍
@@treelibrarian7618 I was talking more about the structured approach to lanes and its ops over intrinsic instructions on small buffers, plus the nice Into/Froms. The names are funny though, what do you think about the swizzle trait? :P
@@strager_ I would love to see it, something I'm very interested in learning more about
ปีที่แล้ว +30
Great explanation! One aspect that isn't talked about much is that due to laws of physics the actual *random* access time (in seconds) to a pre-computed table of size n is O(sqrt(n)), not O(1). As a result there are algorithms that are actually slower in real time for large inputs even though they are predicted to be faster by big-O. There's a great article called "The myth of RAM" that explains it deeply.
In modern computers, are cache effects what you mean by sqrt(n)? The simplicity of RAM (Random Access Machine) which we normally use for complexity analysis is indeed deceiving. My case study didn't lend itself to an explanation of RAM (or CPU caches), so I kept it simple and focused on other issues with time complexity.
ปีที่แล้ว +5
@@strager_ yes, it's cache but the article I mentioned explains it can't be fixed by making cache bigger because of quantum physics (maximum amount of information in black hole is proportional to its surface area and nothing can be denser). And yes, I understand that this is introductory video so advanced stuff like this wouldn't fit in there.
If I understood that "great article" correct (I only had a quick view on it) the reason for that O(sqrt(n)) is that memory (RAM) is typically flat - so the double amount of memory needs the double area. If these area is a square when doubling the memory causes that the width of this square is increasing by factor SQRT(2) - and therefore the average physical distance the information has to "travel" is proportional to SQRT(2) of the size of the memory. As information could not travel faster than the speed of light that really would mean that the access to memory is getting slower the larger the memory is.... On the other side transistor density on chips is still increasing - double memory therefore not mean that the double area is needed - that might will no longer true in the future but for the next years O(1) makes more sense than O(SQRT(n)) for memory access.
ปีที่แล้ว
@@elkeospert9188 almost, circle is a better representation but when we use big-O we don't write constants, so we drop Pi. The article has other parts, where the author explains that quantum physics dictates it will never be O(1) and stay O(sqrt(n)). Increased density of transistors helps only by a constant. If you need to process 100PB of data, you can't fit it into RAM and will need to use other storage which will be slower.
@@strager_ Even ignoring physics fundamentals, if you're using virtual memory the pagetable lookups are O(ln n) so the O(1) claim for lookup tables was always suspect. Of course caches - incl TLB - mitigate this in practice, so long as you fit within it.
Obligatory comment noting that in the manually unrolled loop, the code may reach outside of the array bounds, resulting in a crash, if you're lucky. This is still one of the best complexity explanations I've seen.
> the code may reach outside of the array bounds, resulting in a crash Yes, that is true. I didn't explain this in the video, but when generating the table, I added 7 dummy elements to prevent out-of-bounds issues. github.com/strager/big-oh-lines/blob/537e6a048d10c89ee6ea421c48c4ecc7229d2d01/bol_linelinear8/src/lib.rs#L24-L26
@@strager_ alternatively you can just modulus the SIMD size, and do the non-SIMD naive stuff on the remainder, if you don't want to do array resizing nonsense
I know this was just educational, I know this was supposed to showcase Big O and different approaches to programming. BUT if I dont say this, I'll lose sleep over it so here goes. You can just increment a variable by 1 everytime you encounter a newline while compiling. So when it encounters an error, you have the line number ready. Time complexity O( time to reach error ), Space Complexity O(1). You are designing a compiler here, I'm sure you can add a line or two to the code can't you?
You would need to store that line number somewhere. Storing line+offset or line+column uses more memory than just storing the byte offset. That memory consumption is multiplied by every AST node, so it's actually O(n) space usage. Also, the line tables I describe in the video are handy for implementing LSP.
my favorite "non linear" algorithm is manipulations within the Disjoint-set data structure - it has an average and worst case of O(a(n)) where a is the inverse Ackermann function -- functionally less than 5 for any number you can write.
Wow that's amazing hahaha. At that point it may as well be constant time. But no, an infinite input size will have an infinite lookup time. Incredible.
Hey while we are traversing through the string, as soon as we hit a char we can store a counter which increments for every line and if we want to show the line number just use the counter variable which will not take any time
definetly possible, but usually in compilers there is a single pre processing step before the representation of the code changes. Compile time errors, as far as I know, are usually not found until the second step of compilation
> we can store a counter which increments for every line That is true, but in the real world it's usually more complicated than that. Context: A compiler in practice does not always report errors immediately. It might need to parse some code then report an error in the past. (For example, Rust cannot report a use-of-undeclared-function error until it reads the entire file because the function might be declared after the call.) Therefore the compiler needs to attach location information to each token and AST node. In addition to line information, a compiler also wants to report column information. The compiler *could* attach two numbers (line and column) to each token and AST node. But some IDEs want byte offsets, so we might need to store three numbers (line, column, and offset from beginning). That's a non-trivial amount of data to add to *every* token and AST node, so memory usage goes up. To reduce memory usage, a compiler could attach only *one* number (offset) to each token and AST node, and compute the line number and column number as needed. (Or not compute anything if the IDE just needs a byte offset.) Hence the table-based solutions discussed in the video. But that's for a traditional batch compiler. Nowadays compilers are integrated into editors. Compilers are given *updates*. LSP (Language Server Protocol) updates communicate line and column information *into* the compiler. Having the line table data structure makes this operation much cheaper. I didn't discuss any of this in the video because I didn't want to bore people who aren't interested in compiler frontends. But it's a good discussion. =]
Fantastic presentation and explanation! Although I watched you put this together, seeing the final version was even more engaging than I expected. Great work and thank you!
I really like how this video explains how better big O doesn't necessarily mean better performance for data one's working with, while at the same time it doesn't overdramatise things. As video states, big O describes the general shape of performance curve with increased data, but it isn't be all and end all. And as SIMD example shows, even if it has the exact same big O parameters as line table, it makes up for it with what I call "small c", i.e. those "insignificant" constant factors like the slope. Personally, my intuition is to prefer structures and algorithms with better big O parameters for key operations - the idea being that for most scenarios either the data is so small extra constant costs are negligible, or the data is so big that better big O matters. But like most things, it's not an absolute rule and in the end, it takes real life performance testing to get the most complete picture.
Having started ThePrimeagen’s course on Algorithms & Data Structures, this was a a pleasant addition for me. . I like that you did a pop quiz ‘before’ the material. This follows the research of learning effectively as described by Make it Stick and other books. . 👏👏👏 This week I learned ways to optimize if-statements in Bash Scripts for a CPU prefetcher-unit’s performance, and about misprediction penalties for the branch-priority features. . And I’ve only started learning to script/code/program for a couple months. . I’m so grateful for the efforts being out into maintaining and contributing via the internet so I could access all this! Whoo Hoo! 🤠
Wow, I’m actually planning on optimizing a batch processing API for tomorrow and I came across this video. Learned new insights that will help me make decisions on optimization. Thank you for this! 🙌
Thanks for this Amazing Video!! I am a fan of your channel, and love the kind of content you have been putting out recently Its really good to see you uploading more often, cant wait to see what you share with us next!
Excellent video! I found it to be perfectly balanced in terms of explanation, providing just the right amount of clarity without becoming overly lengthy or tedious.
I remember there was someone who also made a video about how they were frowned upon by an interviewer for mentioning that Big O is just about how algorithms scales and not comparing how they perform in all cases. It shocked me because it seems almost common knowledge.
the base of the logarithm doesn't matter because it is just a multiple and we are already omitting coefficient! also as a mathematician I would have loved to see Landau's Big-O notation in its actual form. we say that f(x) = O( g(x) ) as x -> ∞ if there exists an M > 0 and an N > 0 such that if n > N then f(x)
In my experience Big O is very rarely a useful metric in web frontend code, since frontend deals mostly with interactivity the number of "things" is relatively small and low latency and minimising network traffic is almost always more important than number crunching efficiency. In fact for almost all DOM related things; if the number of objects is even barely approaching a size large enough for Big O to be meaningful; the DOM tree is probably so large it will kill most mobile browsers for excessive memory use. Also since your code runs on any random client you don't control how fast it's hardware is so most number crunching is better done on the server (in which case it's the backend developers problem 😉)
These are some good informational videos. I really like the fast paced nature of them, focused on the question at hand. I feel I can learn in 20 mins what would take me a few hours of watching some other videos.
Thank you for the helpful information learning these subject alone sometimes feel depress but watching contents like this gives me hope thank you again!!
Big O is meant for describing limiting behaviour as it reaches infinity. We never use infinite elements. So there are plenty of algorithms that could have worse Big O notation and be faster because we use fewer elements. I personally never liked Big O notation in Computer Science because it often detracts from practical algorithms. Nice presentation. On computers, it's really difficult to know what will speed something up sometimes :)
Big O is also meant to describe worst case scenarios. Maybe the worst case is vanishingly rare for your particular use case, so optimizing for Big O might not lead to any speed improvements. Context matters! At best, Big O can be used as a guidance, but should never replace real analysis and measurements on actual real world data. You lose so much information about an algorithm by trying to describe it with just a few symbols.
@@shiinondogewalker2809 people who complain about big O's practical value often have no understanding what it means. Big O doesn't need anything in your program to go to infinity. The information it gives you becomes true for input size above N, what N is, depends on your formula. Could be 10, could be 10^10. People who think "it only matters for infinite value" remind me of debacle at Microsoft, some "practical programmers" who thought the number of updates will never be "sufficiently large" so they used the wrong sort algorithm leading to millions of people waiting hours, days for windows updates 😂
@@oscarfriberg7661 big O tells you the upper bound, while big Omega telling you the lower bound. So any argument against big O citing worst case scenario could be reverted into argument for big Omega, now you optimize for lower bound, for large input your algorithm will always be at least as good as the value you optimize for :)) And to many people big O and big Omega and big theta are all "Big O notations", so when they spout "Big O are useless" they mean any asymptotic notations.
"Big O notation in Computer Science because it often detracts from practical algorithms" Big O detracts from practical algorithms the same way the ability to count detracts from practical algorithm There are two stages when you design an algorithm the theory stage, where you use reasonings in discrete mathematics to derive the right computation then formally prove the time/space complexity (if you skip this stage it's because some smart person already did it for you and wrote it in a textbooks/papers somewhere then other people repeat it into wisdom and eventually it reaches you without you having to open a textbook). And the implementation stage, where you take into account what the compiler does, what the CPU does. Big O doesn't care about algorithm or its implementation, it cares about functions f(n), g(n) on (typically) natural numbers. It is a mathematical concept, much like counting, both deal with natural numbers, and both are used to estimate. Now guess which stage are big O and counting more often used in :)) Probably stage 1, but it does help in stage 2 too, it gives you a direction on which practical things to try when you don't have all the technical details of your machine, what kind of optimization is done on your CPU etc. because in practice, you don't start out knowing all these technical details. If you already knew like in this video, what kind of sampling distribution your input data have, how fast your algorithm runs on this specific CPU, then why would you go back to look at the more general idea which makes no such assumption about your system? if the aim is to disprove the theory using practical implementation I suggest at least understand the theory first. e.g. big O never implies O(log N) is faster than O(N^2), that's stupid, an algorithm in O(1) is also in O(N^2)... Or of course you could spend time running your algorithm on billion different input sizes, that's very practical too :)) Now do I need someone who is good at discrete math and algorithm on my team? no I don't. Do I want someone who doesn't understand the basics such as what big O notations tell you... absolutely not, that's a recipe for disaster.
Title is a bit clickbaity but I really appreciate your channel. Not a lot of people go to the lengths you go to to help viewers(/readers) really see the thought process and the knowledge that contributes to it. These videos, like the best documentation and code comments, makes both 100% clear.
I am so glad I found you from your hash table video. Concise, informative content, with proper suggestions and pop quizzes to keep the viewer entertained. What more could you ever want?!
very important video. I've seen a lot of novice programmers misusing big O notation... people like to put multipliers inside thinking it means anything, but O(1 billionth * n) really is the same as O(1 billion n). I think you could also have mentioned that big O is merely an upper bound. this means anything that is O(n) is also O(n³), and anything that is O(log n) is also O(n!).
> I think you could also have mentioned that big O is merely an upper bound. this means anything that is O(n) is also O(n³), and anything that is O(log n) is also O(n!). I think this confuses engineers more than it helps. In the world of computer science this matters, but not so much when we're talking about code.
@@strager_ yeah, I guess. coming from a cs background myself, I'm a bit too used to big os, little os, big omegas, little omegas... and theta. which is what most people think big O is.
Really good video! I found this explanation of big o a little unconventional but good. I especially enjoyed your visuals and your dive into SIMD at the end. Thanks!
Normally don't comment. Normally find these types of videos boring. Im commenting, i love these videos, i loved the other optimization video you did. In my opinion you should do more of these style, with the same quality, no need to rush to have video quantity. Good job man. 👍
Extremely informative and intuitive presentation. Thank you! :) However, I'll admit I'm not as familiar with SIMD stuff, so the speed at which that was moved through left me with some questions. Will probably look into that on my own, but I would love to see something on that if it is a topic you'd want to cover in the future too!
All software (algorithms) perform transformations on data structures: 1. Choose the correct data structure to distill your data (eg vectors vs hash tables). 2. Choose the correct algorithms to operate on your data (eg. for-loop vs binary search). 3. (Advanced) Note which ones work best toward your design goals.
Amazing dude.... how do you make those graphics? In case you use Adobe Premiere or any other editing software for it... what software do you recommend for drawing graphics like it?
People misunderstand Big O. It's *ONLY* supposed to tell you how it scales with more data, not how performant it is. I love the video, it really shows the reality.
8:34 You could also be much clever and notice that *the naïve algorithm is doing a part of the job of finding indexes you need to build the table used in the line table algorithm,* so you can *combine both naïve search loop with table building* in a new smarter algorithm. It's basically relying on a table as much as it is known until it is not, then rely on naïve, so the first run is as fast as naïve but the subsequent runs are faster as they don't re-compute what the table keep, building the table only as needed, never more than needed. It's basic bracket memoization. Nothing fancy.
You are one of the best computer science educators ive seen. Ive been self teaching CS, being a former mechanical engineer. So far this video, ans your hash table video have filled in large gaps in understanding concretely, as ive dealved deeper into the low level fundamentals
big O is a theoretical bound that nulls out the constants. You should take big O with a grain of salt and learn the trade offs. For example, bubble sort is actually faster than quick sort for low N values. This is due to the overhead of the constants (constants caused by recursive function calls mostly). For a sufficiently large N, the time saved by the algorithm itself overcomes the time lost by doing the function overhead, but as N get's smaller, you lose those time savings since the time saved by the algorithm is diminished. There is also worst case situations that come into play. It's all about what you know about your situation. In general big O is a great starting place, but context of your data ALWAYS matters.
Really nice video! Maybe one thing I would've added is how we could switch to a different algorithm based on the offset. Like "if (offset is very large) binarySearch() else simdLineTable()"
Wanna hear the best thing? It's Levin's universal search. It can solve any computable problem in an optimal time complexity. The only downside? The universal search adds a constant element which scales exponentially with the length of the optimal program. If the optimal program is L bits long and achieves O(f(n)) time complexity, Levin's universal search will achieve O(f(n)) time complexity, but with an added 2^L cycle delay. This delay quickly gets absurd, so the universal search has no practical application.
9:10 - Worth mentioning that theoretically (in context) you're already scanning the file once to work out that you need the line number, if your tokeniser is also calculating line numbers then you'd have to take that into account. You would be sacrificing the case where you didn't need any line numbers as you'd be spending some small amount of time to calculate where lines are even if you never needed them. Also, you would be making your code more complex however so make sure that the extra complexity is worth it. 14:00 - The somewhat tongue-in-cheek saying is "Everything is fast when N is small". Of course, the real meta-lesson here is that testing performance on a small set tells you very little.
> if your tokeniser is also calculating line numbers If you use the data structure from the video, the tokenizer would *not* also calculate line numbers. > the real meta-lesson here is that testing performance on a small set tells you very little. Huh? It tells you how your code performs with small data sets! In compilers, small files (
I have a violent fever and haven't been able to open my mouth or eat in three days. This was a great mental distraction and I definitely learned some things, I even got one of the pop quiz questions!
@@strager_ yeah, it's pretty common in generic sorting algorithms, that can't make any assumptions about the data, but should perform as well as possible in all cases regardless. for the line numbers, i would prefer the simplicity of only using a single algorithm. we're on the low end of the latency spectrum already. taking 50 ns vs 25 ns to compute line numbers for < 1000 errors doesn't really matter. (it's different for the sorting algorithm, cause some user may want to sort millions of tiny arrays, and that should still work.)
It's worth it with almost every divide and conquer-style algorithm due to the constant factors that O(n) notation doesn't show and the way caches improve performance on small datasets. But you don't just pick one algorithm at the start, instead you keep dividing the problem into smaller and smaller chunks, and you don't stop at the base case of 1 element. You stop earlier, once you reach some minimum number of elements that's better handled by the naive algorithm. Then you use the naive algorithm on those small chunks, and let the main algorithm make the higher-level decisions.
@@strager_ Another idea is to "optimize" binary search by checking inside the binary search function if the search range contains less than (8 or 10 or something like that) elements and if so do a linear search. On the other side. Calculating line numbers in small files is anyway "fast enough" - even if the normal binary search is a little bit slower than linear search for small amount of lines it does not matter.
Source code of all of the techniques (for educational purposes):
github.com/strager/big-oh-lines
Corrections:
19:23 I show vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 12, 12, ...], But this is incorrect. The Vec items should be line numbers, not offsets. (The keys are the offsets!) So the Vec would actually look like this: vec![1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, ...]. (Thanks to @polyfoxgames9006 for pointing this out.)
Pop quizzes:
04:24 average time complexity? answer at 18:34
09:42 another way to preprocess? answer at 19:09
12:23 why does binary search 2x? answer at 20:14
14:17 time complexity of SIMD algo? answer at 20:47
Lessons:
01:13 1: Try it!
03:28 2: Check your loop bounds
03:46 3: Stress test your algorithms
06:09 4: Big O is not the full story
07:42 5: O(n) ≠ O(L)
09:17 6: Preprocess your data
12:09 7: Big O is approximate
13:53 8: Big O is for big inputs
17:44 9: SIMD is awesome
I think that you should mention that big O notation really is really nothing more than a set of functions of a "certain shape", and that the shape of your expression measuring what you want (running time, space complexity, number of steps etc.) is in that set if it is asymptotically dominated by functions in that set.
More importantly, I think it is quite harmful to bring up the notion of the average case runtime for a deterministic algorithm over a distribution of inputs as if it is says anything meaningful - since how do you define the input distribution, and what does it tell you? Nothing at all for a deterministic algorithm.
You can flip the picture and consider an algorithm like the one you describe but where its input is considered a cyclical buffer (just start over when you reach the end...). Then you could have it pick an index uniformly at random and do average case analysis based on the random choice. This is a good example of both probabilistic analysis of running time and of the advantages of a randomized algorithm.
> I think that you should mention that big O notation really is really nothing more than
Plenty of other videos explain it this way. I wanted to explain it a different way which I think is more intuitive.
> I think it is quite harmful to bring up the notion of the average case runtime for a deterministic algorithm
I brought up average time complexity after talking about best and worst case. I wanted to show that, given a best case of O(1) and worst case of O(n), average case is *not* necessarily 'in the middle' of best and worst. Average case time complexity was not O(n/2) or O(sqrt(n)) or something; it was just O(n).
@@strager_ But wait, O(n/2)=O(n). I think intuition is fine, but I think it's important to mention that that it is not the whole picture.
Regarding your use of average case analysis, my point is that, sure, you can perform a thought experiment - but I think it is important to mention that it only has any merits when comparing algorithms w.r.t. a "meaningful" distribution and that it is very hard to define. My concern would be that viewers would be inclined to use uniform distributions over input spaces when they see it done this way when this is not mentioned. Anyways, take it for what it's worth. People seem to like your video, and so do I.
I'm primarily specialized in graphics programming & self taught. Got major ADD issues, and can't even remember the months cuz what's the point?
Your formatting is perfect. It keeps me entertained, and you're good at speaking, all while teaching and representing ideas in simple ways (: sick af dawg, dis gnarly.
> I think it is important to mention that it only has any merits when comparing algorithms w.r.t. a "meaningful" distribution and that it is very hard to define.
Agreed. I didn't make that nuance clear in my video.
Big O is not bad nor a lie, you just have to understand it. It does not tell you how fast an algorithm is. It only tells you how it scales with respect to its input size
Yea, it took me a moment to understand what he was saying in the screen shot, since it never dawned on me that anyone had ever believed the simplified (and therefore incorrect) "myth" as he was using it.
The whole video is a strawman take on Big O. Debunking and correcting claims nobody actually believes, and if they do, they are misunderstanding it.
this video proves that youtube is full of people that teaches from ignorance.
C'mon guys, everyone uses some way to lure people to watch their video.
All in all, great video! A lot can be learned from this video, I think you'd be a great teacher if you aren't already.
@@Meneer456 yeah, why people so toxic and calling him strawman huh? bad day? Why dont you get in front of a camera, lets see how that goes.
The biggest thing to take away is that big O is not equivalent to the actual performance, it is an upper bound for the growth rate of an algorithm. When comparing algorithms look at the behavior when you double the amount of input.
Usually it becomes way more obvious if you decuple it instead of just doubling it 😅.
yea everyone seems to forget this
Isn't that the entire point of big O lol how could anyone forget this
@@tear728 they never actually learn what it is
@@tear728 I can't remember who it was but I recently saw a video where someone said they almost failed their Google interview because *the interviewer* didn't understand this. The question was something like "Algorithm A runs in O(n), algorithm B runs in O(logn). What's faster?" They correctly answered that B is better asymptotically but A might be faster for small inputs, and the interviewer insisted that was wrong.
This is like how they discovered a new “efficient” multiplication algorithm for large numbers but the constants are so huge it’s only faster once you’re multiplying numbers over 4.5 trillion bits long
Yup! Luckily, I haven't needed to deal with data sizes that big. 😅
Yup. Those "galactic algorithms", as theyre called, dont really have any direct real world use; they however are useful for the theortical part of computer science such as by showing that certain bounds can be broken, or by giving a framework of techniques by which practical algorithms can be newly developed or improved further. Theres also the chance that at one point well have to deal with such huge inputs, but as with abstract mathematics the usual way they become useful is by laying the groundworks.
@@RepChris probably if you want to have highly accurate (as accurate as Quantum Physics let's you be) Physics simulations, these can be useful
@@kuhluhOG Yeah, but that falls under the "we might get to that amount of input data at some point" category, as we currently very much do not use even remotely the amount of input in any application to make a galactic algorithm faster; that is somewhat tautological since thats a part of the definition of what a galactic algorithm even is.
@@RepChris it was more of a comment on your sentence "Theres also the chance that at one point well have to deal with such huge inputs"
The worst algorithm I ever wrote had a complexity of 4^n. It worked instantly for anything less then 10, took 30 seconds for n=12 and when it hadn't finished after several minutes when I tried n=13, I decided to abort and look into writing a better algorithm.
As someone with a background in mathematics, this was all... kind of trivial to me. But I see misconceptions about big O notation all the time from programmers so I can imagine this being very much needed. It's really just about limiting behavior, the same tricks mathematicians use to tame infinities, divisions by zero, and all that. It feels absolutely absurd to say it... but the actual specifics of implementation, and real world performance metrics are FAR more important than a cool mathematical trick used for shenangians. Anyway, very good video, and a very needed one to try to push against the absurd hyperfocus on big O time complexity over more relevant discussions about overheads and actual implementation.
This is one of the greatest programming channels, keep up the good work!
The greatest
Every person I've seen using comic sans in a presentation is an absolute beast. Like you have to earn your right to use it.
@wernersmidt3298 It's an honor to be compared to Simon Peyton Jones!
@@strager_ So... are you gonna make a Monad tutorial next? 🙃
7:51 "[...] and, in fact, you have fewer lines than you have bytes"
A file containing only
always has one more line than bytes. In fact, an empty file has no bytes at all, but one (empty) line.
Factual.
When I got taught big O at the university, many people (including me) failed to comprehend the actual meaning of it (some still don't). One could replace, like, 2 lectures worth of material with this one video. Keep up the good work!
Thanks!
I personally always found O notation one of the easiest things in theoretical computer science.
at our uni we weren't even taught what big O actually means (what is explained in this video), instead we were taught to blindly follow all the myths...
The easiest way to understand big O in my opinion is to not just focus on the most significant term, but rather write out the entire time complexity, so instead of O(n^2) it'll be something like 4n^2+80n+20n*ln(n)+35. That kinda helps understand that the constants do matter and can overpower the n and which term is the most significant for a given n. Like O(n) isn't better than O(n^2) if O(n) is 8,000,000n and O(n^2) is n^2 unless n > 8,000,000.
Like take this scenario, you have some counting function that relies on sorting and it's O(n*ln(n)), is it worth the effort in switching to a priority queue to get that O(n)? That depends on what you expect n to be. If this stuff will involve at most a thousand items, probably not unless there's some serious time critically going on (which generally there won't be), if it's going to involve billions then yeah.
Big O notation says that, given a function f and a function g, where f(n) is in O(g(n)), there exists some constants c and k such that f(n)≤c*g(n) for all n>k. I never found it hard to understand or formalize, but I can see why it seems hard to understand. In general, big O notation only matters when you have differing time complexities and large values of n. It tells you that, as n grows larger, a function that is not in O(g(n)) is going to necessarily be larger than a function that is in O(g(n)).
One of, if not the best big O notation video out there. Bravo and thank you!
This channel is pure gold. Clear explanations, starting from simple problems and going to more sophisticated ones, step by step, while not focusing on little details for too long and constantly tossing in some new questions for viewer to think about. Thanks for your effort
And zero content fluff!
It's important to keep in mind that Big O is for classifying the scalability of algorithms, not their performance in a given context. Big O analysis is not always a substitute for a benchmark.
The reason you don't need the log base is because you can convert any log to any other log using a constant factor - which is ignored in the O-notation.
log_e(t) = log_2(t) / log_2(e)
log_2(e) is a constant. In O-notation, log_2(t) and log_2(t)/C are the same, which means log_2(t) and log_e(t) are the same.
Mate, you've got the nack for educating. I'm loving the channel.
I met you on twitch and you even helped me with a terrible code that I wrote, I already knew you were like a genius but what you are doing on this youtube channel surpasses everything!
I'm a physics PhD student who has to code all the time despite having taking very few formal courses on it. I find your videos super helpful and have been learning a lot!
Let me guess, fellow HEP enjoyer?
@@43chord cosmology :)
@@fireballs619 what do you do in cosmology?
@@mastershooter64 CMB analysis mostly
O notation might be for big data but different O complexities have different Ns that count as big. N=50000 may not matter for difference between O(N log N) and O(N log log N) but it is big enough to make a difference for O(N^2).
I still have PTSD from times when I spent hours upon hours waiting for my Windows to update because someone in Microsoft deep down the history decided that number of updates will never be big and used some kind of bubblesort to sort them.
> different Ns that count as big
Agreed! But the notation doesn't give you *any* clue where this point might be.
Actually, there are still some places in Windows like this, that someone have 10 minutes opening Start menu in some conditions. Or Explorer starting to slow down when doing something like deletion in a folder with thousands of files. (Explorer meaning desktop as well.)
You know, I think it's dumb they used bubble sort (at least use insertion sort!), but I really have to wonder how many updates you missed if bubble sort was enough to slow down the time to update significantly.
This was an absolute treat to find in my notifications after work. Thank you
so basically programmers leave out constants that would tell them the full picture, and then they are suprised that it doesnt give them the full picture
Yup!
You are not just a good instructor but also a good video creator/editor.
Explanation: 10/10,
Video Quality for creating graphs and all : 10/10,
Depth of topic: 10/10.
AMAZING...
Didn't learn anything new about Big O, though the reminder is nice. What I did learn, though, was how easy it actually is to use SIMD, at least for simple applications like this.
I've tried looking into it in the past and just got super confused.
I have to say, a video specifically on SIMD would be awesome.
> a video specifically on SIMD would be awesome.
My previous video on hash tables also contained some SIMD stuff. That hash table SIMD was more advanced, but I didn't walk through the code. th-cam.com/video/DMQ_HcNSOAI/w-d-xo.html
I do want to share more SIMD knowledge in future videos. 👍
The portable SIMD struct that nightly rust has is awesome.
@@Dorumin yeah, intel should've named the vbroadcast intructions "splat" instead lol
@@treelibrarian7618 I was talking more about the structured approach to lanes and its ops over intrinsic instructions on small buffers, plus the nice Into/Froms. The names are funny though, what do you think about the swizzle trait? :P
@@strager_ I would love to see it, something I'm very interested in learning more about
Great explanation!
One aspect that isn't talked about much is that due to laws of physics the actual *random* access time (in seconds) to a pre-computed table of size n is O(sqrt(n)), not O(1). As a result there are algorithms that are actually slower in real time for large inputs even though they are predicted to be faster by big-O.
There's a great article called "The myth of RAM" that explains it deeply.
In modern computers, are cache effects what you mean by sqrt(n)?
The simplicity of RAM (Random Access Machine) which we normally use for complexity analysis is indeed deceiving. My case study didn't lend itself to an explanation of RAM (or CPU caches), so I kept it simple and focused on other issues with time complexity.
@@strager_ yes, it's cache but the article I mentioned explains it can't be fixed by making cache bigger because of quantum physics (maximum amount of information in black hole is proportional to its surface area and nothing can be denser).
And yes, I understand that this is introductory video so advanced stuff like this wouldn't fit in there.
If I understood that "great article" correct (I only had a quick view on it) the reason for that O(sqrt(n)) is that memory (RAM) is typically flat - so the double amount of memory needs the double area. If these area is a square when doubling the memory causes that the width of this square is increasing by factor SQRT(2) - and therefore the average physical distance the information has to "travel" is proportional to SQRT(2) of the size of the memory.
As information could not travel faster than the speed of light that really would mean that the access to memory is getting slower the larger the memory is....
On the other side transistor density on chips is still increasing - double memory therefore not mean that the double area is needed - that might will no longer true in the future but for the next years O(1) makes more sense than O(SQRT(n)) for memory access.
@@elkeospert9188 almost, circle is a better representation but when we use big-O we don't write constants, so we drop Pi.
The article has other parts, where the author explains that quantum physics dictates it will never be O(1) and stay O(sqrt(n)). Increased density of transistors helps only by a constant. If you need to process 100PB of data, you can't fit it into RAM and will need to use other storage which will be slower.
@@strager_ Even ignoring physics fundamentals, if you're using virtual memory the pagetable lookups are O(ln n) so the O(1) claim for lookup tables was always suspect. Of course caches - incl TLB - mitigate this in practice, so long as you fit within it.
I didn't know there were misunderstandings about this topic. I guess my uni did a good job (thanks Prof. Schöning
Loved your ending! Take that, Rust Foundation! :) Thank you for the Big O lesson as well.
Obligatory comment noting that in the manually unrolled loop, the code may reach outside of the array bounds, resulting in a crash, if you're lucky. This is still one of the best complexity explanations I've seen.
> the code may reach outside of the array bounds, resulting in a crash
Yes, that is true. I didn't explain this in the video, but when generating the table, I added 7 dummy elements to prevent out-of-bounds issues. github.com/strager/big-oh-lines/blob/537e6a048d10c89ee6ea421c48c4ecc7229d2d01/bol_linelinear8/src/lib.rs#L24-L26
@@strager_ alternatively you can just modulus the SIMD size, and do the non-SIMD naive stuff on the remainder, if you don't want to do array resizing nonsense
I know this was just educational, I know this was supposed to showcase Big O and different approaches to programming. BUT if I dont say this, I'll lose sleep over it so here goes. You can just increment a variable by 1 everytime you encounter a newline while compiling. So when it encounters an error, you have the line number ready. Time complexity O( time to reach error ), Space Complexity O(1). You are designing a compiler here, I'm sure you can add a line or two to the code can't you?
You would need to store that line number somewhere. Storing line+offset or line+column uses more memory than just storing the byte offset. That memory consumption is multiplied by every AST node, so it's actually O(n) space usage.
Also, the line tables I describe in the video are handy for implementing LSP.
my favorite "non linear" algorithm is manipulations within the Disjoint-set data structure - it has an average and worst case of O(a(n)) where a is the inverse Ackermann function -- functionally less than 5 for any number you can write.
Wow that's amazing hahaha. At that point it may as well be constant time. But no, an infinite input size will have an infinite lookup time. Incredible.
I think they use the inverse ackermann function for amortized analysis which basically just treats it as a constant :)
This video is an excellent explanation of Big O notation and what it really means for our algorithms. Great stuff here!
Best big O notation i ever watched, thank you very much. ❤
Hey while we are traversing through the string, as soon as we hit a
char we can store a counter which increments for every line and if we want to show the line number just use the counter variable which will not take any time
definetly possible, but usually in compilers there is a single pre processing step before the representation of the code changes. Compile time errors, as far as I know, are usually not found until the second step of compilation
> we can store a counter which increments for every line
That is true, but in the real world it's usually more complicated than that.
Context: A compiler in practice does not always report errors immediately. It might need to parse some code then report an error in the past. (For example, Rust cannot report a use-of-undeclared-function error until it reads the entire file because the function might be declared after the call.) Therefore the compiler needs to attach location information to each token and AST node.
In addition to line information, a compiler also wants to report column information. The compiler *could* attach two numbers (line and column) to each token and AST node. But some IDEs want byte offsets, so we might need to store three numbers (line, column, and offset from beginning). That's a non-trivial amount of data to add to *every* token and AST node, so memory usage goes up.
To reduce memory usage, a compiler could attach only *one* number (offset) to each token and AST node, and compute the line number and column number as needed. (Or not compute anything if the IDE just needs a byte offset.) Hence the table-based solutions discussed in the video.
But that's for a traditional batch compiler. Nowadays compilers are integrated into editors. Compilers are given *updates*. LSP (Language Server Protocol) updates communicate line and column information *into* the compiler. Having the line table data structure makes this operation much cheaper.
I didn't discuss any of this in the video because I didn't want to bore people who aren't interested in compiler frontends. But it's a good discussion. =]
As a CS student, I truly appreciate this way of explaining theorethical computer science. Thank you so much!
This was randomly recommended to me, and it was one of the best explanations on big O I've seen
I love the build up of information and background of a problem.
Fantastic presentation and explanation! Although I watched you put this together, seeing the final version was even more engaging than I expected. Great work and thank you!
Certified strager classic.
I really like how this video explains how better big O doesn't necessarily mean better performance for data one's working with, while at the same time it doesn't overdramatise things. As video states, big O describes the general shape of performance curve with increased data, but it isn't be all and end all. And as SIMD example shows, even if it has the exact same big O parameters as line table, it makes up for it with what I call "small c", i.e. those "insignificant" constant factors like the slope.
Personally, my intuition is to prefer structures and algorithms with better big O parameters for key operations - the idea being that for most scenarios either the data is so small extra constant costs are negligible, or the data is so big that better big O matters. But like most things, it's not an absolute rule and in the end, it takes real life performance testing to get the most complete picture.
Having started ThePrimeagen’s course on Algorithms & Data Structures, this was a a pleasant addition for me.
.
I like that you did a pop quiz ‘before’ the material. This follows the research of learning effectively as described by Make it Stick and other books.
.
👏👏👏
This week I learned ways to optimize if-statements in Bash Scripts for a CPU prefetcher-unit’s performance, and about misprediction penalties for the branch-priority features.
.
And I’ve only started learning to script/code/program for a couple months.
.
I’m so grateful for the efforts being out into maintaining and contributing via the internet so I could access all this! Whoo Hoo! 🤠
Man, that's one of the most succinct, best explanations I've heard. Keep posting!!!
I've followed the making of on Twitch. Glad this is out!
Wow, I’m actually planning on optimizing a batch processing API for tomorrow and I came across this video. Learned new insights that will help me make decisions on optimization. Thank you for this! 🙌
Really loving the past few videos. Learned a ton. Thanks.
Thanks for this Amazing Video!! I am a fan of your channel, and love the kind of content you have been putting out recently Its really good to see you uploading more often, cant wait to see what you share with us next!
Excellent video! I found it to be perfectly balanced in terms of explanation, providing just the right amount of clarity without becoming overly lengthy or tedious.
This is probably the most helpful time complexity video I've watched to date (and I've watched a lot). Thanks so much.
Everytime I watch your vid, everytime I get a tons of insights from you.
Thanks and please keep going!
I remember there was someone who also made a video about how they were frowned upon by an interviewer for mentioning that Big O is just about how algorithms scales and not comparing how they perform in all cases. It shocked me because it seems almost common knowledge.
the base of the logarithm doesn't matter because it is just a multiple and we are already omitting coefficient!
also as a mathematician I would have loved to see Landau's Big-O notation in its actual form.
we say that f(x) = O( g(x) ) as x -> ∞ if
there exists an M > 0 and an N > 0 such that
if n > N then f(x)
Yeah, I didn't explain clearly why we don't care about the log base.
In my experience Big O is very rarely a useful metric in web frontend code, since frontend deals mostly with interactivity the number of "things" is relatively small and low latency and minimising network traffic is almost always more important than number crunching efficiency. In fact for almost all DOM related things; if the number of objects is even barely approaching a size large enough for Big O to be meaningful; the DOM tree is probably so large it will kill most mobile browsers for excessive memory use. Also since your code runs on any random client you don't control how fast it's hardware is so most number crunching is better done on the server (in which case it's the backend developers problem 😉)
This is a great point!
These are some good informational videos. I really like the fast paced nature of them, focused on the question at hand. I feel I can learn in 20 mins what would take me a few hours of watching some other videos.
Thank you for the helpful information learning these subject alone sometimes feel depress but watching contents like this gives me hope thank you again!!
Your tone and editing make this comfy.
Big O is meant for describing limiting behaviour as it reaches infinity. We never use infinite elements. So there are plenty of algorithms that could have worse Big O notation and be faster because we use fewer elements. I personally never liked Big O notation in Computer Science because it often detracts from practical algorithms. Nice presentation.
On computers, it's really difficult to know what will speed something up sometimes :)
Big O is also meant to describe worst case scenarios. Maybe the worst case is vanishingly rare for your particular use case, so optimizing for Big O might not lead to any speed improvements. Context matters!
At best, Big O can be used as a guidance, but should never replace real analysis and measurements on actual real world data. You lose so much information about an algorithm by trying to describe it with just a few symbols.
@@oscarfriberg7661 even in that case, it's specifically for the case where the input/problem size is sufficiently large
@@shiinondogewalker2809 people who complain about big O's practical value often have no understanding what it means. Big O doesn't need anything in your program to go to infinity. The information it gives you becomes true for input size above N, what N is, depends on your formula. Could be 10, could be 10^10.
People who think "it only matters for infinite value" remind me of debacle at Microsoft, some "practical programmers" who thought the number of updates will never be "sufficiently large" so they used the wrong sort algorithm leading to millions of people waiting hours, days for windows updates 😂
@@oscarfriberg7661 big O tells you the upper bound, while big Omega telling you the lower bound. So any argument against big O citing worst case scenario could be reverted into argument for big Omega, now you optimize for lower bound, for large input your algorithm will always be at least as good as the value you optimize for :))
And to many people big O and big Omega and big theta are all "Big O notations", so when they spout "Big O are useless" they mean any asymptotic notations.
"Big O notation in Computer Science because it often detracts from practical algorithms"
Big O detracts from practical algorithms the same way the ability to count detracts from practical algorithm
There are two stages when you design an algorithm
the theory stage, where you use reasonings in discrete mathematics to derive the right computation then formally prove the time/space complexity (if you skip this stage it's because some smart person already did it for you and wrote it in a textbooks/papers somewhere then other people repeat it into wisdom and eventually it reaches you without you having to open a textbook).
And the implementation stage, where you take into account what the compiler does, what the CPU does.
Big O doesn't care about algorithm or its implementation, it cares about functions f(n), g(n) on (typically) natural numbers. It is a mathematical concept, much like counting, both deal with natural numbers, and both are used to estimate.
Now guess which stage are big O and counting more often used in :))
Probably stage 1, but it does help in stage 2 too, it gives you a direction on which practical things to try when you don't have all the technical details of your machine, what kind of optimization is done on your CPU etc.
because in practice, you don't start out knowing all these technical details. If you already knew like in this video, what kind of sampling distribution your input data have, how fast your algorithm runs on this specific CPU, then why would you go back to look at the more general idea which makes no such assumption about your system? if the aim is to disprove the theory using practical implementation I suggest at least understand the theory first. e.g. big O never implies O(log N) is faster than O(N^2), that's stupid, an algorithm in O(1) is also in O(N^2)...
Or of course you could spend time running your algorithm on billion different input sizes, that's very practical too :))
Now do I need someone who is good at discrete math and algorithm on my team? no I don't. Do I want someone who doesn't understand the basics such as what big O notations tell you... absolutely not, that's a recipe for disaster.
This is the best introductions to Big O I've ever seen or read. Thankyou.
Title is a bit clickbaity but I really appreciate your channel. Not a lot of people go to the lengths you go to to help viewers(/readers) really see the thought process and the knowledge that contributes to it. These videos, like the best documentation and code comments, makes both 100% clear.
Excellent video!
About leaving out the logarith base, you do it because changing the base only changes the function by a constant factor.
Wow, this is as clear, concise, and comprehensive as I could imagine!
I must admit I found your explanation very straightforward and easy to understand.
I was expecting to see SIMD binary search :D
But the video is a great refresher, thanks for it!
I love how you use the shrimp emoji for the left arm muscle flex. Great video!
You are the first one to point this out! 💪😍🍤
information quality over 9000! pure signal, no noise
I love your videos, your explanation is straight forward and comprehensive.
Totally agree, the best channel I bumped into in years!
That is so good, super clear and super interesting ! Love your work
I am so glad I found you from your hash table video. Concise, informative content, with proper suggestions and pop quizzes to keep the viewer entertained. What more could you ever want?!
An extra thicc rust crab sticker, probably
For the array lookup, wouldn't you store the line number at each position not the offset?
You're right! I messed up the explanation. 🤦♀️
Really love your optimization videos and the way you explain. Keep it up my man!
I love the presentation, and I also love the dratini shirt
very important video. I've seen a lot of novice programmers misusing big O notation... people like to put multipliers inside thinking it means anything, but O(1 billionth * n) really is the same as O(1 billion n). I think you could also have mentioned that big O is merely an upper bound. this means anything that is O(n) is also O(n³), and anything that is O(log n) is also O(n!).
> I think you could also have mentioned that big O is merely an upper bound. this means anything that is O(n) is also O(n³), and anything that is O(log n) is also O(n!).
I think this confuses engineers more than it helps. In the world of computer science this matters, but not so much when we're talking about code.
@@strager_ yeah, I guess. coming from a cs background myself, I'm a bit too used to big os, little os, big omegas, little omegas... and theta. which is what most people think big O is.
Really good video! I found this explanation of big o a little unconventional but good. I especially enjoyed your visuals and your dive into SIMD at the end. Thanks!
Creel has made a pretty nice video touching on the subject as well, regarding sorting algorithms.
Banger!
Normally don't comment. Normally find these types of videos boring. Im commenting, i love these videos, i loved the other optimization video you did. In my opinion you should do more of these style, with the same quality, no need to rush to have video quantity. Good job man. 👍
Extremely informative and intuitive presentation. Thank you! :)
However, I'll admit I'm not as familiar with SIMD stuff, so the speed at which that was moved through left me with some questions.
Will probably look into that on my own, but I would love to see something on that if it is a topic you'd want to cover in the future too!
> the speed at which that was moved through left me with some questions.
Thanks for the feedback!
I definitely busted my Big O 😮 watching this video. Amazing!
All software (algorithms) perform transformations on data structures: 1. Choose the correct data structure to distill your data (eg vectors vs hash tables). 2. Choose the correct algorithms to operate on your data (eg. for-loop vs binary search). 3. (Advanced) Note which ones work best toward your design goals.
Amazing dude.... how do you make those graphics? In case you use Adobe Premiere or any other editing software for it... what software do you recommend for drawing graphics like it?
I used Microsoft PowerPoint and Motion Canvas.
I love PowerPoint. I'm new to on Motion Canvas.
People misunderstand Big O. It's *ONLY* supposed to tell you how it scales with more data, not how performant it is. I love the video, it really shows the reality.
The past two videos have been great!
this was a really smart way of making me watch a video on big O notation nice job
That simd Rust library seems really nice, so far I only used intrinsics in C , but those are too much work for casual playing.
Agreed, intrinsics take a lot of effort. But Rust SIMD is missing *many* things and might limit your thinking. Tradeoffs!
I am impressed about the didactic skills you brought into that educational marble!
flawless.
Interesting stuff. Thank you for demystifying some of these concepts.
8:34 You could also be much clever and notice that *the naïve algorithm is doing a part of the job of finding indexes you need to build the table used in the line table algorithm,* so you can *combine both naïve search loop with table building* in a new smarter algorithm. It's basically relying on a table as much as it is known until it is not, then rely on naïve, so the first run is as fast as naïve but the subsequent runs are faster as they don't re-compute what the table keep, building the table only as needed, never more than needed. It's basic bracket memoization. Nothing fancy.
> you can combine both naïve search loop with table building in a new smarter algorithm
That's a good idea! It's worth a try.
You are one of the best computer science educators ive seen.
Ive been self teaching CS, being a former mechanical engineer. So far this video, ans your hash table video have filled in large gaps in understanding concretely, as ive dealved deeper into the low level fundamentals
thanks for making this video, helped me remember a lot of useful stuff from university ^^
big O is a theoretical bound that nulls out the constants. You should take big O with a grain of salt and learn the trade offs. For example, bubble sort is actually faster than quick sort for low N values. This is due to the overhead of the constants (constants caused by recursive function calls mostly). For a sufficiently large N, the time saved by the algorithm itself overcomes the time lost by doing the function overhead, but as N get's smaller, you lose those time savings since the time saved by the algorithm is diminished. There is also worst case situations that come into play. It's all about what you know about your situation. In general big O is a great starting place, but context of your data ALWAYS matters.
quality content at my youtube feed at last!
i just got a job at google thanks to this video. thanks!
Wow, that was quick!
Good job, cuckl0rd!
You are a phenomenal teacher. Thanks for these videos.
This was a much better video than i expected
I love you man, keep on uploading please
Really nice video!
Maybe one thing I would've added is how we could switch to a different algorithm based on the offset. Like "if (offset is very large) binarySearch() else simdLineTable()"
Yup! Hybrid algorithms would give you the best of both worlds (in theory).
my man i have my algorithms midterm in a few hours. timing couldn't have been better
Wanna hear the best thing? It's Levin's universal search. It can solve any computable problem in an optimal time complexity. The only downside? The universal search adds a constant element which scales exponentially with the length of the optimal program. If the optimal program is L bits long and achieves O(f(n)) time complexity, Levin's universal search will achieve O(f(n)) time complexity, but with an added 2^L cycle delay. This delay quickly gets absurd, so the universal search has no practical application.
Gotta love ivory tower computer scientists!
9:10 - Worth mentioning that theoretically (in context) you're already scanning the file once to work out that you need the line number, if your tokeniser is also calculating line numbers then you'd have to take that into account. You would be sacrificing the case where you didn't need any line numbers as you'd be spending some small amount of time to calculate where lines are even if you never needed them. Also, you would be making your code more complex however so make sure that the extra complexity is worth it.
14:00 - The somewhat tongue-in-cheek saying is "Everything is fast when N is small". Of course, the real meta-lesson here is that testing performance on a small set tells you very little.
> if your tokeniser is also calculating line numbers
If you use the data structure from the video, the tokenizer would *not* also calculate line numbers.
> the real meta-lesson here is that testing performance on a small set tells you very little.
Huh? It tells you how your code performs with small data sets! In compilers, small files (
People need to be seeing dis at my university, this is 10/10
Really good video. Thank you!
I have a violent fever and haven't been able to open my mouth or eat in three days. This was a great mental distraction and I definitely learned some things, I even got one of the pop quiz questions!
I'm sorry my video was so bad it gave you a fever. I hope you get better!
What about choosing algorithm every time checking file size? For small and large files - different algorithms
That's an interesting idea! Hybrid algorithms are something I should think about more. The idea seems to work well with sorting, for example.
@@strager_ yeah, it's pretty common in generic sorting algorithms, that can't make any assumptions about the data, but should perform as well as possible in all cases regardless.
for the line numbers, i would prefer the simplicity of only using a single algorithm. we're on the low end of the latency spectrum already. taking 50 ns vs 25 ns to compute line numbers for < 1000 errors doesn't really matter. (it's different for the sorting algorithm, cause some user may want to sort millions of tiny arrays, and that should still work.)
It's worth it with almost every divide and conquer-style algorithm due to the constant factors that O(n) notation doesn't show and the way caches improve performance on small datasets.
But you don't just pick one algorithm at the start, instead you keep dividing the problem into smaller and smaller chunks, and you don't stop at the base case of 1 element. You stop earlier, once you reach some minimum number of elements that's better handled by the naive algorithm. Then you use the naive algorithm on those small chunks, and let the main algorithm make the higher-level decisions.
@@tiarkrezar it's just another algorithm for choosing better algorithms
@@strager_ Another idea is to "optimize" binary search by checking inside the binary search function if the search range contains less than (8 or 10 or something like that) elements and if so do a linear search.
On the other side.
Calculating line numbers in small files is anyway "fast enough" - even if the normal binary search is a little bit slower than linear search for small amount of lines it does not matter.
Quality content! 😀