I came into this video thinking maybe there was some additional nuance to Big-O that I wasn't aware of, instead I learned that interviewers at Google don't necessarily know what they're talking about. Excellent breakdown of how Big-O works.
Most technical interviews are like that, that's why regardless of your skill there will always be some factor of luck i.e "You can reject good candidates, but never hire a bad candidate even if its your bias".
@@LMB222 The Google bloke was ignoring the explanation, as if it wasn't given. He might not have been wrong for the set of conditions being "nothing matters except for what I want to hear", but for more rational conditions, he was indeed very wrong. It's quite amazing really, the similarities of his behaviour then, and Google's now. Maybe he and those like him took the reins at some point and grew it into this fat, deaf giant of a data harvesting corporation we, uh, know and love.
I love how this is something they teach at Computer Science year 1 and still a Google employee tried to outsmart you on this, while you were absolutely right! Great video!
I think I had an overview of this in AP Computer Science. Crazy. But then again I argued with my PI in grad school about how this reaction chamber was getting oxygen as embers literally formed inside a CVD setup. Needless to say, I left without a degree.
I was really terrified right now, I was like, "well, he is right, what the f..., does this google employee knows something I don't? f... I don't know shit". Caught off guard
I’ve been doing code for 24 years in large corporations and smaller companies. And these little videos warm my heart. Nothing new for me here, but clarity and simplicity of explanations is brilliant. Thank you.
Yeah I never fully understood Big-O notation except that it's supposed to show whats faster but thats about it and school didn't do a good job explaining it to me
Very practical example: You can make Quicksort faster by switching to a different sort in the recursion step when the size is small enough (like, below 10). I think it's called hybrid quick sort.
I always use what is called a "network sort" for small N. Network sorts are essentially a hard coded optimal sequence of compare-and-exchange operations, optimal in both number of comparisons ("length"), as well as maximized parallelism ("depth") .. best case, worse case, all case times are equal as well. I will only accept subjective refutations to network sorts being the best possible sorts for small N, for example they have the longest description (sequence of hard-coded taps) and thats a subjective downside, vs say the conciseness of a merge sort description.
Yeah. The sort in the STL part of the C++ standard library generally uses introsort which switches from quicksort to heapsort when the recursion depth exceed a current level (the logarithm of the size); and if the data set has fewer elements than a certain threshold it just uses insertion sort.
That interview story really gets my blood boiling. I can't help but feel like there's systemic misinformation about optimization, people often deduce performance based on assumptions rather than observations. Implementation details are everything when it comes to performance, its the reason an optimized brute force algorithm can easily out perform a poorly implemented "smart" algorithm, while simultaneously being cleaner. Not that smart algorithms are bad, but you should actually verify optimization rather than assume it must be better.
I've seen a pattern with a lot of engineers who haven't actually done any real performance work, but really want to act like experts on the subject. This was just another case of that. I don't think of myself as an expert either, but I've taken point on it for multiple game titles, and was hired at Google for that.
This reminds me of a story. In 1933, the animator Milt Kahl went out with a stop watch with the goal of measuring the speed that people walk at. He discovered regular people walk at 12 frames(That would be a step every 1/2 a second). But other animators would argue with him, and say people take a step every 8 frames.(1/3 a second) This was because 8 frames was easier to divide and required less work from the animator. All they had to do was test it for themselves and see.
@@simondev758 Your understanding was perfectly on point - with the main "gotcha" being that once the "known good" domain-space of the simpler algorithm with poor scaling is passed, you definitely want to have a fallback algorithm which scales better with greater inputs. not even necessarily a ready implementation with active processing time due constant checking, as much as a very helpful suggestive comment As otherwise you end up with a "pie in the face" moment where presumptions of past limitations - and optimizations around those past limitations - results in FUTURE changes to those limitations (by other people) running hilariously suboptimal algorithms within those new limits. BUT WAIT these are all dreamy-dreams of codebases that are NOT REAL nevermind what I just said don't implement fallback algorithms in your spaghetti code in case people want to suddenly expand the limitations of your thingymabob it's never gonna happen the future doesn't exist you'll never be there we only live here and now in REALity
@@TheDoomerBlox well this sure hasnt happened before :DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
My favorite example of "O(n) is a meaningless metric when n is small". Key observations: 1) multiplying 2 numbers (as we learn in school) takes O(n^2) steps; 2) multiplying 2 numbers is pretty much the same as calculating a convolution of those 2 numbers (after splitting them into 2 lists of digits); 3) you can calculate convolution using Fourier transform (takes O(n^2) steps) + n times single-digit(-ish) multiplication (takes O(n) because we multiply single digits which is O(1) lookup) + inverse Fourier transform (takes O(n^2) steps); 4) ... but you can use fast Fourier transform instead to reduce complexity from O(n^2) to O(n log n) steps; 5) considering facts 1-4, using FFT you can multiply 2 numbers using O(n log n) + O(n) + O(n log n) == O(n log n) steps, which is faster than using school multiplication algorithm. All 5 facts are mathematically provable, nothing wrong there. ... but for sOmE WeIrD rEaSoN, computers don't use FFT to multiply numbers, at least until numbers are bonkers huge. Hmm, probably software and hardware engineers just don't know about this neat trick
I watched a video a while back about the Karatsuba algorithm being the first to break O(n^2) steps, which performs integer multiplication in O(n^log3) steps, but it's only faster than schoolbook long multiplication for huge N-bit integers. I was thinking about it through this whole video.
I was a TA in a Computer Science class a few years ago, and when I did a presentation on BigO notation, I made a point to remind people that there's there's always (at least) a constant k scaling up the time costs of one algorithm or the other. For example, k*x > x^2 when k is very large. I illustrated this by drawing a very large letter "K" on the board in front of the formula.
Yeah, a lot of people look at Big O as a way to sort algorithms from best to worst. All it describes is how well it scales to increasingly large data sets.
As the end of the video tells you, just never rely on big-O alone. I did that once, choosing a hashtable for a situation where data is frequently looked up but rarely added or removed, because it is O(1) (it's a little worse for collisions, but the table would never be loaded by more than 50%, so collisions would be very rare). I didn't even consider other solutions. Years later, just for fun, I wanted to test how much slower a sorted list would be at lookup, after all it is O(log2 n), only to find out that it beat the hashtable by a long shot; it was much faster. So I calculated at what point the hash table would take over, and it turns out I have to have over 300 entries before the hashtable gets faster (I benchmarked that and the benchmark confirmed it). The thing is: Due to other technical limitations, it was impossible to ever have more than 256 entries, and even numbers over 50 would be pretty rare. Why was the hashtable so slow? Because of the hashing. In the time it took to hash the data for a search, a binary search of a sorted list would have found the entry long ago. However, a good hashing is indispensable, because with a bad hashing there are many collisions and the hashtable then becomes even slower.
You should never rely on big-O alone PROVIDED you are a software engineer or applying asymptotic analysis to algorithms you write to be used. Asymptotic analysis is applied to computer science from a theoretical perspective, it is used in the theoretical study of algorithmics and data structures. It wasn't ever meant to have any claim on the practicality or non-practicality of specific implementation of algorithms. Algorithmics is a theoretical/mathematical subject, and asymptotic notation certainly has its place there. The problem occurs when say software engineers (understandably) want to use the same general ideas of "faster" and "slower" implementations, but slowly fudge the meaning of big-O so much that it is no longer accurate. And then claim they understand big-O notation.
@@moatef1886 I mean, the alternative is that you rely on gut instincts and misinformation. It's better to use big O responsibly then not to use it at all just because of some pseudo disciplinary boundary.
@@moatef1886 IMO software engineering not being a 'real' engineering discipline (read: Not relying on physical systems, not being primarily done by engineers, not requiring an engineering background, etc.) implies that it is just an application of computer science in an ordered, process - driven, formal manner. If that is your definition of software engineering, and mine is, then there is no disciplinary boundary. Software engineering would be akin to laboratory work in the same way experimentation would be for a physicist or chemist.
Red-Black trees or similar binary search will be faster than hash tables at smaller number or elements yes. In my experience, in C++ the unordered_map will be slower than the map depending in the context. Could still be faster after thousands of elements.
A good hash may depend on the data being hashed. It might even be the case that no hashing is required. If you know the normal data we'll enough, you might be able to use some internal aspect of the data as a hash and still get the benefit of a hash. With 256 entries, a single byte in the middle of compressed data is sufficient as a hash. Even a CRC-8 on a smallish (but unique) text field would likely suffice, and be computationally efficient. Knowing this ahead of time is called domain knowledge.
Occasionally, based on context, an O(x²) operation like bubble sort, will reliably outperform an O(nlogn) operation like merge sort, regardless of the size. One example - you have a huge list that is almost sorted. You know that one entry was added to the end of the list recently. You don't know where it belongs. A properly written bubble sort will get the list in sorted order in O(n) time, despite the fact that bubble sort is an O(n²) algorithm. A merge sort will still take O(nlogn) time, with a significant extra load applied to set up. Knowing the situation changes the relative speed of operations.
In a database, a btree table always returns results that are sorted in accordance with the table. However, an isam table returns older entries that are sorted by key in accordance to when the table was last maintained, but any newer entries added to an isam table after the last table maintenence procedure happen at the end of an unsorted table listing, and if there are no new entries, the listing will be sorted on key. Even with a btree table the database administrator in theory might change the table's key, but perhaps in a minor way. In either situation, a listing can be mostly sorted but not completely sorted, and a bubble sort technique could be the fastest sort type.
I'm really glad that you explain the difference and fallacies for Big O notation for small numbers too! I've been bitten by this misconception of co-workers too, especially when CPU cache-friendly algorithms with worse Big O notation are just way faster for certain use cases
Not just Cache friendly, but preload-friendly may matter. Quickort (and most other O(nlogn) algorithms) may be better in big-O notation, but every compare requires loading from main memory in a way that cannot be predicted easily by preload operations. Even Level 2 cache won't help if the data is big enough. Bubble Sort may be worse, but the data will likely already have been preloaded into cache by the time it is needed. Depending on the specifics, bubble sort might even use so little of a multi threaded CPU that an entire other thread can run at full speed. Yes, these times are constants, but we can be talking hundreds of times faster (plus potentially bogging down an entire CPU and RAM channel doing nothing but reading/writing memory)
Your channel is so under rated. I’m a cs student and hated data structures and algorithms before watching your videos... now I can’t get enough. Thank you Simon.
Programmers and engineers can get touchy about Big-O. I also once mentioned to a senior dev that O notation is accurate as n approaches infinity and they got pretty emotional about it. Probably because I have no professional experience? So I backed up and pointed out that even if an algorithm is O(n) and n was equal to 1 million if our target platform is powerful (like the average cellphone) it will still execute quickly. But that backfired because he never considered that Big-O was being applied to a hypothetical machine and argued that O notation was "just math". I wasn't in a job interview but things still got awkward.
@@FlareGunDebate Or it might be because of some programmers that LOVE micro-optimization and stupid games like "instruction counting". The so called "engineer", but in reality computing is a science, you have to observe, you take the performance report, then you formulate an hypothesis, then you change the system to observe and repeat until you get your target performance adequate for the task at hand. Trying to theorize performance of a physical system (aka, an actual computer) using formulas is a big no, because there is a lot of unknown variables in the system that isn't captured by a simple model like Big-O. Big-O is just a tool. Silly Google engineer was wrong, and Simon was right.
@@monad_tcp don't know what flare gun said but you should have an idea of the big o when you create an algorithm IF you the algorithm may (even unexpectedly) be used for large inputs. It's a programmers first rule not to optimize first but I'm pretty sure that only means when we aren't competent. For example, throwing a kd tree at a problem without really understanding the kd tree will result in bad code. However have another programmer proficient in kd trees work on it and boom, efficient method. To be fair, from the Google engineers pov, small inputs barely affect runtime. If you have slow code that barely runs and has barely any inputs usually it won't be a problem compared to even efficient algorithms on really large inputs. Therefore it's more important to consider the larger inputs. (Naturally this isn't a one size fits all argument, if no arguments are large there is no reason to worry about efficiency. [Though it really doesn't matter about speed when your only making a calculator lol])
@@monad_tcp Ironically actual engineers are very aware that nothing you design will end up working exactly how you think in the field and half the job is knowing which level of detail is actually worth dealing with.
You know you're a _real_ professional when you have temper tantrums over a technical discussion with a peer. On a brighter note: Your content is amazing. It's very well polished, presented with a perfect balance between brevity and detail without losing neither educational quality nor entertainment value. Thank you for your hard work, I automatically click your vides when they show up on my feed.
I had a quite similar interview with the big G many years ago, making almost the same observation and was met with the same disbelief. I wasn't hired, but I had a quite long and productive career anyway. This was a really clear explanation, and yes sometimes the details and specifics of the constants can't be swept under the big O carpet.
I dunno if that's what I'd recommend. I'd guess that most of the time, you'll do better by going beyond and showing you've got some depth to your knowledge. But sometimes, like in this case, it'll backfire.
In my last on-site interview for Google, I ended up correcting my interviewer, as he was claiming things about tail-call elimination that would not have applied to the code on the whiteboard that we were discussing. Then we ended up drawing stack diagrams. Then I worked there for 5 years or so. So, at least some anecdata that shows that it's useful.
Thanks for a good counter example! Most interviewers I've spoken to, at Google and at other FAANG, were great, this one happened to make a good story. I still ended up at Google for 7 years myself, so it worked out still.
I have been enjoying your content. You have so much to teach about game design, software development, and mathematics. I appreciate it very much that you have posted on youtube for people to learn for free. I will be buying you a beer, friend!
Wow....the worst part is you were NOT wrong. Big-O is a notation that describes the limiting behavior of a function when the argument tends towards infinity, it doesn't say ANYTHING for small or limited values.
Maybe the interviewer was evaluating his reaction to being faced with someone who is being aggressively wrong. SimonDev passed the test by first trying to explain, not becoming angry, and dropping the subject gracefully. He did get the job offer in the end.
they also very often tend to not realize that even a O(n²) algorithm will tend to be faster on actual hardware than even O(log n) types algorithm for small values of N ( could be up to millions) because of stuff like cache optimisation, memory locality branch predictions, and other kinds of optimizations. they also tend to ignore the time a single iteration take. yes O(log n) will have less iteration than O(n^2) but if a single iteration is thousands of time slower you might prefer the O(n^2) one
@@OFfic3R1K often times when doing binary operations you're using heap allocated objects that may be in random parts of memory. Since you'll most likely be doing pointer dereferences to access the objects the memory controller will need to access random parts of the ram. Ram can take 100-1000 cycles of the cpu waiting for the data to be able to continue processing. However, if instead you had all of the data in one continuous part of memory you could iterate over the data and because you're not doing random jumps the memory controller can see that you're fetching blocks of the same type of data continuously in a stream. Basically if you're accessing data all in a row instead of randomly throughout memory the computer can predict what memory you want before the code execution manages to get there.
I also work in game development, and my philosophy for these types of things is to first consider the size of the data sets the algorithm will be working with, then consider things like Big O, and lastly test at least the top 2 choices I have to verify what I expect to happen. Because of what you point out in this video, often times a "better" algorithm will be inferior in your particular use case. But, it's hard to know for sure until you test it.
@@simondev758 I feel like all of this is ties in nicely with Amdahl's law - "the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used". We can sit back and say "wow; nice optimization here!" or "ah yes, the asymptotic runtime is much better here", but without thinking of the context of the optimization or algorithm they can be exercises in mental gymnastics instead of providing a true benefit to the product. Always refreshing to see videos and comments like the above. I've been focusing on how to retool how I conduct interviews the last couple years (game industry) and I think it's important to keep these types of wisdom in mind.
I always found big-o to be kind of a tool to do initial guess rather than any super accurate analysis. Like, you think of an algorithm, you can in your head calculate the O complexity related to inputs that you don't know how to constrain, and try to gauge if you think the complexity is needlessly large, and if you can do better. For constrained inputs, O(1) is always an option.
@@travcollier I remember reading about how some complex algorithm with only one float as an input found it actually faster to interpret the float data as an integer and look it up in a LUT than it was to do the math on the fly.
@@sycration Yes, it was (might still be) a common optimization for things like trig functions... Especially in old video games where 'good enough' is the precision requirement. ;)
@@travcollier This is kind of the case in games in general as they usually do constrain the input sizes (often via settings). Thus what you really care about is whether the given profile can calculate frames in an acceptable period of time on the target consumer hardware. This of course being due to the fact games are very realtime in nature so ideally you want to cap the frame processing time somewhere. Probably in the range of 16ms on recommended hardware and 33ms on minimum hardware ideally as you have 16.66 ms at 60 fps and 33.33 ms at 30 fps to calculate and draw a frame. Any more and the game is likely to get uncomfortable to play although it depends somewhat on the game too with RTS games the limitation is often less about drawing frames in time as it is time dilation as entity counts grow but the game would still grow too glacial to be fun without such constraints.
Excellent video! Small comment: Big-O notation is actually an upper bound, not the 'lowest' of those upper bounds. So any algorithm that can run in x^2 operations is O(x^2), but it is also technically O(x^3) or O(x^100). This means that on any exam if you're asked for the Big-O complexity of a given algorithm, you can just write down O(x^x^x) and always be technically correct. What MOST people (this video included) describe as Big-O notation is actually Big-Theta notation (tight upper bound), and is a much more useful description in practice.
You might want to rewatch, that's specifically mentioned at 6:09. I mention there's the proper definition of O(n), which is what's covered in the video, and then the sort of "colloquial" use among programmers which corresponds more to sort of Big-Theta.
Thanks for this video! Really interesting to see how optimizations play out at smaller scales. In a recent project of mine, I was considering using a spatial hash grid or some kind of tree structure rather than the naive O(n^2) algorithm. However, I ended up just trying the naive algo out, and since I didn't have too many agents, it performed well enough that I could still get 60 fps. Not sure if that directly relates to the topic of this video but definitely a cautionary experience with premature optimization.
I'm not a mathematician by any means (failed pretty much every math exam I took) and while a lot of the terms getting thrown around might as well have been klingon I was able to completely grasp the concept. I'm in my 60s and teaching myself really basic computer stuff as a hobby, this has actually made me re-think several things I thought I knew (which is always a good thing).
Very good conclusion. Also good to note that the complexity of an algorithm may depend on the structure of the input. For example, a sorting algorithm may be much faster on inputs which are already partially ordered. In the end you need to think very carefully about what kinds of inputs you expect and what the expected average case is and that is what you need to optimize for.
One of the first things I learned about algorithmic complexity when I was just starting out is that small scale and large scale are quite different. It's why the standard method for implementing a quick sort is to use insertion sort under a certain threshold. It's obvious how this concept can be applied to many things, but I guess they didn't send someone who knew what they were doing to interview you.
Very interesting and accessible for a person like me who has learned programming from TH-cam tutorials and has a background in something completely different. It'd be very interesting to see more videos on the subject of efficient/performant code, and then to see you apply these principles in practice on something like JavaScript game code that you wrote.
The example you showed here is one case where the O(n) outperforms the O(log n) algorithm for pretty large examples. I think up to about 10 MB of data, thanks to caching and SIMD, linear search usually outperforms binary search. In a lot of my own time at Google, I noticed people writing slow code by using the asymptotically-fastest algorithm and ignoring this.
Duude... I was totally thinking about this same exact thing like yesterday... I love it when I just randomly think of something interesting and find someone else talking about that same thing.
the sketch-animations are a great aid to the narrative. I am liking the idea at the end and how it relates to intuition, since we keep in mind what is relevant and what resonates, and leave behind the irrelevant bits.
I knew where this was going from the title alone. It actually concerns me that so many people don't understand it, but I've also had to explain it before. I gave the example of a custom function that sorts an array using quick sort, but before starting the function simply sleeps for 5 seconds. That function is still O(log n) - though that alone confuses people at first - yet it's clearly going to perform worse than even a bubble sort on small list. But start scaling up that input and suddenly that 5 second sleep becomes just a fraction of the sorting time, and O(log n) saves the day. I've seen similar with people who code using a variety of OO best practices, but because they don't fully understand why those things are done they end up creating a monstrosity of interconnected classes that can't be untangled. At the end of the day, real word experience really does matter a lot, but equally it seems to be easy to go years and years without ever being challenged on these points, particularly if people job hop. Also, how amusing that TH-cam just randomly suggested I watch this 2 year old video... Almost exactly 2 years old as well!
Wouldn't it be more likely to be challenged on these points since you run into more peers and thus have a higher chance of encountering someone who actually knows wtf they are doing?
The one time I had occasion to write a sort algorithm of my own (in VBA), the arrays I was sorting I knew, a priori, were never going to have more than about four items (in fact that may literally have been the limit, can't remember why). There was no way I was going to bother looking up a fancy sorting algorithm and did the first algorithm that I could do from memory. I totally sympathize with that interview situation.
As a corollary to the sorting analogy people use, I think there's an even goofier variant of linear search that blows binary search even more out of the water, called "sentinel linear search". Instead of checking if you've reached past the end of the array at each step, just put the element you are looking for at the end of the array. You'll never go out of bounds, and you save many comparisons. Just need to check if you caught an actual match or just reached your sentinel.
Its also important to point out that when working with asymptotic bounds they are describing a mathematical behaviour and even when they do provide an initial cost (n) before the algorithm becomes faster the practical implementation almost always has a higher initial cost before it becomes faster.
Just found this channel through the JS MMO video. What an absolute gem. Thank you! Subbed with the bell on, but first I need to go back and watch all of these videos
Thanks for the video! I never understood Big-O properly or so i thought. I basically had the same understanding as the one you presented but my lectures somehow convinced me otherwise and i ended up VERY confused.
Was going to say, that in general, could just use theta instead of O for average run time- but you clarified at 6:29. But I think generally, Software Engineers are only really concerned with worst case- and avoiding that upper bound function (depending on input ofc) .
Amazing video Simon! And loved that story from google, unfortunately there was that discussion, but also great to hear you still got a job there! Looking forward to your next video!
If you like that, I once was tasked with trying figure out why another engineer was taking 3 weeks to implement binary search. There's no trick here, it's exactly as simple as it sounds.
That interview story was irritating. I'm a mathematician and try to emphasize the same points you made in this video to my students when we cover big-O. Your video was very well explained!
If you know the break even point, you can test for it. Sometimes below the break even point, the difference isn't worth the test time. In those cases the lost efficiency isn't re-capturable. So the default would be technique > N break even. After this, I'm going to have to get the weeds off my ankles. Thanks, good video.
Have seen similar things, usually with NP completeness. Like colleagues insisting that you cannot implement say a traveling salesman algo because that's NP complete, although you're sure to have at most five elements. And when you show them the implementation and runtime, you're the guy who "proved all uni wrong" and you're the "coding deity", etc etc. Or they don't look at your implementation and you're a liar and trickster from then on and they won't listen to anything you bring up anymore. -_- Neither outcome is helpful.
That little detail can get overlooked: if you know the size of your input is constant and the thing you're implementing won't be used elsewhere on a different scale, it's O(1) and the Big-O thing becomes kinda moot.
Geez, hard to believe that the hiring engineer at google couldn't comprehend that haha. Like you said, depending on the engineer, Big O can refer to the Typical or worst case performance. They often won't even consider the best case that you can guarantee with known input domain.
Don't feel bad for standing your ground if it is the truth! This concept is also applicable even in non-contrived real-life applications. Sometimes, your input to a particular problem will never be larger than a googol or something. Hence, an algorithm with worse theoretical time complexity can still be better 99.99% of the time compared to an algorithm with better theoretical time complexity. The keyword here is "theoretical". At other times, you have to make a tradeoff between space and time that might make the algorithm with better theoretical time complexity unfeasible to be implemented. Examples that come to mind include hybrid sort algorithms, BVH for ray tracing, and even integer multiplication. For example, using the Schönhage-Strassen algorithm is way too expensive for everyday integer multiplication. It's not always so black-and-white as "O(logN) is ALWAYS better than O(N) and that's that!". The whole point of software engineering is to be able to weigh these different options and tradeoffs and select the appropriate solution for that particular application, guided by testing, profiling, benchmarking, etc. Sometimes, even measurements can lie (though not often), hence the importance of code review, design/architectural discussions, etc. For example, embedded systems might not have the luxury of space (or even power!) to perform heavily recursive algorithms or algorithms that require large data structures such as trees or graphs. Hence, selecting the slightly slower algorithm might be better for that specific case.
As I can remember Big-O notation being presented in the algorithms class (back in the 80s for me), I do recall the instructor touching upon what SimonDev is presenting. It was select your algorithm to your needs. If your data set is small, go ahead with the bubble sort. it was probably easier to implement, test, and validate over a more complex sorting algorithm.
If that's how you handled the interview, it shows some really great character, that you would stay calm and be sympathetic to the misunderstand of someone shutting you down like that, especially in a nervous situation like an interview.
I also know a software engineer who optimizes almost solely on Big-O. However, I work on embedded systems with limited storing and computing capabilities. The amount of data I can receive and process is very limited. I need an algorithm that uses my small CPU very efficiently on these few data points. I don't need an algorithm which could potentially work on millions or billions of data points in a data center application. But even for such big applications, my feeling is that some software engineers too often look solely at the algorithm and not enough to the data. Maybe the data is sorted in a specific way already due to some external restrictions? This definitely changes the real-life behavior of the whole thing. Also, it's important to look at requirements. Do I want to have a guaranteed maximum processing time? Or is the throughput more important? (aka average processing should be fast). For some tasks this might actually result in the same algorithm, but for other tasks it might not. Also: How accurate does the result actually have to be? I know that software engineers like to think in deterministic ways, but if working on real-life data and when we want to create practical results, sometimes it might be more important to be accurate to a few percent accuracy instead of being fully accurate - but the processing time can be vastly smaller on some tasks.
in my comsci fund 3 class we had to write a more efficient sorting algorithm than n^3 (the task at hand was to sort pixels in an image line by line, using like its red value or something) and i technically wrote an O(n) sorting (similar but not radix sort). thing is even with sorting 405,000 pixels the O(n) sort was slower because it required the creation and run through of a 4GB array. i just implemented a heap sort for the lab which got me a 100 but in class just to goof around on a slow week my professor actually attempted to implement my sort. heap sorts ran in like 16ms and mine took 4,000. this was on a very small image, like 635x640. we then tried it on a 3000x3000 image and the heap sort ran in 600ms while my sort took 10,000. so it seems like a failure yet the heap sort went up by 37.5x times longer while my sort only went up by 16x. EVENTUALLY my sort would have worked better but only on images that are 100,000x100,000 or something dumb lmao.
This is so true. I'm just a humble sysadmin, but overhead matters once N gets large enough. Process a data set big enough (300 million records) and a 50ms constant response time from the API for each record adds up! Even if you parallelize it, the API-based processing can still take longer than linear batch processing on a single host.
Showing someone a page on "Galactic algorithms" is good way to convince them that time complexity is not always a good indicator of expected performance.
This is why I test my performance sensitive solutions with different sizes of problems to see if the performance overhead scaling is linear, sub-linear or super-linear.
A good example is standard library implementations of sorting -- many libraries use a hybrid "introsort" type algorithm, which uses quicksort most of the time, but insertion sort on small arrays because it's faster (it also will switch from quicksort to heapsort in certain cases, typically when the depth of the recursion becomes large). And of course, insertion sort is O(n^2) compared to the expected O(n log n) for quicksort.
This is super important in cloud computing where you might saw split your computer across thousands of tiny virtual instances running on small data sets. Like a classic serverless on Lambda solution.
People often slip through the cracks at these large orgs, like your interviewer here. Glad the system worked and you didn't have to have him as a direct colleague in the end.
I had a similar issue in my Google interview, the interviewer stopped told me I didn't seem to know anything about the querySelectorAll function (how I would write my own version of it), got very frustrated with me, and then I asked him to correct my mistake. He did not and then the interview was over as we had ran out of time.
Is there any chance that that last interaction was just them testing how you deal with conflict? Seeing how you would deal with somebody telling you that you are wrong even though you are actaully correct. I heard that they sometimes do tests like that, but maybe it was incorrect info?
I worked at Google for 7 years. I've done countless interviews. I've never done that, nor read feedback from someone who did, no spoke to anybody who planned to test for that, nor was trained in interview training to look for that. What I HAVE run into at Google is an absolute abundance of people with a raging ego, who saw getting in as validation.
@@simondev758 When i first saw the title of your video, I thought you were going to say that you mistakenly said that O(N) is always faster than O(N^2), not realizing it doesnt apply to small N. I was surprised it was the interviewer who didnt know that!
Hah, possibly a little... but it's been 10 years since this happened and Google was/is a huge place so it's unlikely you'll ever run into your interviewers again. But I do like telling repeating this to people because it's a fun little story about not looking down on people and being a complete snob.
Great Video! I'm a ce student and I failed my "languages and complexities class" once and really did a lot of studying on the second try. There are also quite a few more interesting things to this topic. It feels so awkward that the engineer tried to put you down on something like this, especially because it's not super difficult and something students learn pretty early on. Sry for the bad interview but honestly that gives me some confidence that these people are also just humans and not some sort of godlike, super smart, higher entity.
I ended up at Google for ~7 years, senior engineer leading teams. Don't buy into the myth that they're all superhuman. They're not in the slightest. There's some great people there, and some really mediocre ones (I count myself in this group). Honestly, a big problem at Google is just getting people to even do basic work in a reasonable timeframe. I was once asked by another lead to look into why one of their engineers had taken 3 weeks to just implement binary search.
Big-O notation always confused me when I was first introduced to it. I wasn't taught a formal definition right away, but just the informal one where it just means that f looks like some constant multiple of g. What confused me about it was the smaller values. In an Analysis course I took some years later, I was finally taught a proper definition, and was introduced to big-theta, little-o, and more. And the teacher made it extremely clear that big-O (or any-O) notation is meaningless without the phrase "as x tends to a" (where a can be any number including infinity). If that phrase is not included, then we could say the function f(x)=x^2 is both O(x^2) and O(1). The first is true when x tends to infinity, the second when x tends to 0.
Thanks for pointing me in in the direction of what math I should be learning. Always been coding, never had an interest for math. Strange combo but what can you do.
That's pretty normal, see a lot of coders who don't have a mathy/computer science background. Had a bunch who were my bosses, so didn't hurt their careers much heh.
I hated maths in school. I love it now but I struggle endlessly with proper maths notations. It's so much easier when people describe maths through code.
that's also a pretty useful thing to consider: what do you expect from the current problem? like what are its specifications? for example quick sort is obviously good, but mostly when speaking of fully random arrays, but when your typical array to sort is already almost sorted or it's reversed, it's better to use algorithms that are better in their best case
I programmed a game base not too long ago and i had such a bad code that tiny maps (100x100 squares) took way over 8 hours to generate (8 hours equaled 13% generated and getting slower for quite a while), after putting in an emergency optimisation i already shrunk it to 6 minutes and i still know a way to optimize it even further. Thanks for showing the big o because i didnt knew something like that exists
I've had the interview story happen to me. I dabble in a bit of competitive programming where micro-optimizations actually matter, and I study quite a bit of theory as part of my job, so this kind of pedantry actually barks up my tree more often than I'd like. I was asked "Integer to Roman" and after answering, I commented that a.) Naive string concat in Python makes my solution O(n^2); I implemented my solution using list[str] then joined them together to obviate this, which my interviewer argued with me over for 10 minutes, basically saying I was chasing ghosts. b.) The time complexity depends on what you're basing your primitive operation on, so if you observe that 3,999 is the highest Roman Numeral and bound the input there, it's O(1); if you base your number system off of bits and ignore the 4-consecutive-symbols rule, then it is O(log n) because the number of bits in a number grows logarithmically. But no, the interviewer argued with me again and said it's O(n) because at each step you're shaving off another number??? I got the job offer but man, that was definitely the "uhhh ACKSHUALLY 🤓🤓🤓☝️☝️☝️" moment of the year for me.
A lot of sorting algorithms' time complexity (and thus efficiency) can highly depend on what you know about the expected input data. If you're dealing with highly sorted data (e.g. you're appending a few items to an already sorted list), then insertion sort will vastly outperform any "efficient" algorithm like quicksort or mergesort despite technically being O(n^2). If comparisons are costly for whatever reason, then mergesort will outperform quicksort due to making far fewer comparisons. If you use a non-stable algorithm, you might be introducing extra comparisons that could otherwise be avoided with a (less efficient) stable algorithm (e.g. input data is already sorted by time, and it needs to be sorted by something else followed by time). One algorithm could be vastly slower despite the same big O scaling. If space complexity is an issue (which it often is for big data), then you'll benefit from an in-place algorithm, but if dealing with small datasets, doing them in-place might be slower. Unused memory doesn't speed up the CPU. Always assuming the average complexity is just as naive as always assuming the best complexity (in which case bogosort wins hands down). Also, small datasets aren't something that should be dismissed just because it's Google. One day, you'll be writing a function that will typically sort lists with ~10 million items, and another you'll be writing a function that will typically sort lists with ~100 items, but the latter might have to run for 100,000 small lists concurrently. Picking the wrong algorithm may easily set the company back $100k in server costs, at which point practical correctness is valued a lot more than theoretical. tl;dr The interviewer could get their head out of their ass and maybe write some production code once in a while. #doyouevencodebro
Hey, i remember something similar, i remember i made a little rpg game in rpg maker, and in that game every single character grew exponentially stronger when they leveled up, i then made a character who, would start out 10x weaker than the rest, but would became stronger faster, i was reminded of that character by the curve at the end of the video :)
I think the simplest way to explain to somebody how much constants matter in performance analysis to somebody is to show them two implementations of an algorithm, e.g. linear and binary search and then insert something like a wait(Duration::from_seconds(3600)) in the “faster version”, then ask them about the complexity of the algorithm again (it didn’t change), before benchmarking one against the other (you probably don’t want wait until the binary search which waits an hour for each split completes though)
Alternatively, you can make the wait time scale linearly to show how an algorithm scales in general (for some tests it's difficult to see a difference)
Are you sure that guy was an engineer? Quicksort, one of the most used sorting algorithms is a prime example, since its worst-case performance is O(n^2), but on average it's O(n log n), and faster than Merge sort, which is O(n log n) in the worst case. The point all my programming teachers always made was that you have to *benchmark* to actually know the performance of your algorithm. Hardware details and input characteristics can make a huge difference. And you also need to know your priorities. Sometimes the average case doesn't matter at all, if you have a hard time/memory budget (e.g real-time programming like robotics and audio, or limited memory environments like embedded devices)
that's still comparing O(n log n) with O(n log n), because if you hit worst case the quicksort will be very slow. A better example is that a common implementation of quick sort switches to an O(n^2) algorithm such as selection sort once the ranges gets small enough. If O(n log n) always was faster than O(n) you wouldn't be doing that and just keep using quicksort until it's fully sorted
Should jave asked the interviewer about Python's default sorting algorithm, then. It's called timsort, which uses the "slower" algorithms on smaller collection sizes and "faster" ones on big collections. I forget which ones, specifically, but it supports your point.
It is really easy to see how this assumption that lower time-complexities doesn't hold up at smaller scales. Let's say we have an array of 3 elements, defined as x = [1, 2, 3]. We know that x is sorted from smallest to largest, and we need to know what the index of the element marked "3" is. First we do a simple search from start to end O(n), and the computer does the following: Checks element 0. Increment Checks element 1 Increment Checks element 2 return 2, since x[2] == 3 Now let's look at a binary search O(log(n)): Set 0 as the left bound of the search Set 2 for the right bound of the search Add right and left bounds together Divide this by two Check divided index Since its smaller, set left bound as that index plus one Repeat above steps, this time return index since it is equal to 3 Clearly the second method has much more "steps" (even ignoring the fact that division is much more expensive to a CPU than addition/incrementation). Big-O is only a limit as n -> ∞
I know that Bubble sort is slow (O(n^2)), but I was perfectly happy to use it on an Arduino to sort the results of 3..5 thermometers (and use the 2nd largest). Code complexity matters as well.
Yeah, something else a lot of devs don't understand, the more senior you get, the more multidimensional the problem becomes. You're balancing performance, memory, time constraints, maintainability, etc..
My estimation was wrong about 1 thing. I did see the: 'big-O describes better *at larger inputs* '-issue coming. And how that was a lesson that was going to be a big part on this video. However I also had the 'assumption of competency @ google hiring practices' Therefore assumed (when I clicked on the video), that you would be the one making the mistake, not the other way around.
Having worked at both ends of the size scale for 40+ years I really feel for you! That interviewer simply didn't know what he was talking about. The only way to really know is to benchmark & profile your code while working with real (or at least realistic) input sizes. Personally I do care a lot about performance so I tend to keep thinking about a problem for sometimes several days just to be sure I understand enough about it, then I'll write my first implementation. During this stage I almost always realized that some of those initial ideas were simply wrong so I then write the second version. Back to your examples, you can often make a hybrid solution where you switch from n*log n to n*n as soon as n drops below some cut-off value, simply because that will be faster. A classic example is quicksort where, in my experience, the best approach is to switch at somewhere in the 4-8 range, or you can put the cutoff at 3 and simply use branchless code to sort the 2 or 3 items remaining. It is however far more important to always recurse into the smaller partition first and then handle the larger part with tail recursion, i.e. looping.
I came into this video thinking maybe there was some additional nuance to Big-O that I wasn't aware of, instead I learned that interviewers at Google don't necessarily know what they're talking about. Excellent breakdown of how Big-O works.
Most technical interviews are like that, that's why regardless of your skill there will always be some factor of luck i.e "You can reject good candidates, but never hire a bad candidate even if its your bias".
But the Google guy was right for the set of conditions they were in.
@@LMB222 The Google bloke was ignoring the explanation, as if it wasn't given. He might not have been wrong for the set of conditions being "nothing matters except for what I want to hear", but for more rational conditions, he was indeed very wrong. It's quite amazing really, the similarities of his behaviour then, and Google's now. Maybe he and those like him took the reins at some point and grew it into this fat, deaf giant of a data harvesting corporation we, uh, know and love.
Google doesn’t hire smart or qualified people, just the ones that play their little game and put up with their BS
as an interviewer at a tech company, we're just humans too
I love how this is something they teach at Computer Science year 1 and still a Google employee tried to outsmart you on this, while you were absolutely right! Great video!
In my CS studies it actually was my second year (3rd semester). They taught a lot of maths and programming basics before
Currently in my second CS course in first year and learning about big O as we speak lol
It’s not even computer science it’s just common math. Then again I learned about big O in aerospace math… yeah nvm that’s just applied CS, carry on.
I think I had an overview of this in AP Computer Science. Crazy. But then again I argued with my PI in grad school about how this reaction chamber was getting oxygen as embers literally formed inside a CVD setup. Needless to say, I left without a degree.
I was really terrified right now, I was like, "well, he is right, what the f..., does this google employee knows something I don't? f... I don't know shit". Caught off guard
I’ve been doing code for 24 years in large corporations and smaller companies. And these little videos warm my heart. Nothing new for me here, but clarity and simplicity of explanations is brilliant. Thank you.
~20 here
Yeah I never fully understood Big-O notation except that it's supposed to show whats faster but thats about it and school didn't do a good job explaining it to me
@@tekbox7909 pick up a theoretical computer science book my friend! There are good books out there that explain it much better than most schools!
Very practical example: You can make Quicksort faster by switching to a different sort in the recursion step when the size is small enough (like, below 10). I think it's called hybrid quick sort.
Yeah, I think there's another along those same lines called introspective sort
Timsort is a popular one
I always use what is called a "network sort" for small N. Network sorts are essentially a hard coded optimal sequence of compare-and-exchange operations, optimal in both number of comparisons ("length"), as well as maximized parallelism ("depth") .. best case, worse case, all case times are equal as well. I will only accept subjective refutations to network sorts being the best possible sorts for small N, for example they have the longest description (sequence of hard-coded taps) and thats a subjective downside, vs say the conciseness of a merge sort description.
Yeah. The sort in the STL part of the C++ standard library generally uses introsort which switches from quicksort to heapsort when the recursion depth exceed a current level (the logarithm of the size); and if the data set has fewer elements than a certain threshold it just uses insertion sort.
indeed! qsort from stdlib.h in C does this, as the person above me basically said
That interview story really gets my blood boiling. I can't help but feel like there's systemic misinformation about optimization, people often deduce performance based on assumptions rather than observations. Implementation details are everything when it comes to performance, its the reason an optimized brute force algorithm can easily out perform a poorly implemented "smart" algorithm, while simultaneously being cleaner. Not that smart algorithms are bad, but you should actually verify optimization rather than assume it must be better.
I've seen a pattern with a lot of engineers who haven't actually done any real performance work, but really want to act like experts on the subject. This was just another case of that.
I don't think of myself as an expert either, but I've taken point on it for multiple game titles, and was hired at Google for that.
This reminds me of a story. In 1933, the animator Milt Kahl went out with a stop watch with the goal of measuring the speed that people walk at. He discovered regular people walk at 12 frames(That would be a step every 1/2 a second). But other animators would argue with him, and say people take a step every 8 frames.(1/3 a second) This was because 8 frames was easier to divide and required less work from the animator.
All they had to do was test it for themselves and see.
@@simondev758 Your understanding was perfectly on point - with the main "gotcha" being that once the "known good" domain-space of the simpler algorithm with poor scaling is passed, you definitely want to have a fallback algorithm which scales better with greater inputs.
not even necessarily a ready implementation with active processing time due constant checking, as much as a very helpful suggestive comment
As otherwise you end up with a "pie in the face" moment where presumptions of past limitations - and optimizations around those past limitations - results in FUTURE changes to those limitations (by other people) running hilariously suboptimal algorithms within those new limits.
BUT WAIT
these are all dreamy-dreams of codebases that are NOT REAL
nevermind what I just said don't implement fallback algorithms in your spaghetti code in case people want to suddenly expand the limitations of your thingymabob
it's never gonna happen the future doesn't exist you'll never be there we only live here and now in REALity
@@TheDoomerBlox well this sure hasnt happened before :DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
It's dogmatism.
My favorite example of "O(n) is a meaningless metric when n is small". Key observations:
1) multiplying 2 numbers (as we learn in school) takes O(n^2) steps;
2) multiplying 2 numbers is pretty much the same as calculating a convolution of those 2 numbers (after splitting them into 2 lists of digits);
3) you can calculate convolution using Fourier transform (takes O(n^2) steps) + n times single-digit(-ish) multiplication (takes O(n) because we multiply single digits which is O(1) lookup) + inverse Fourier transform (takes O(n^2) steps);
4) ... but you can use fast Fourier transform instead to reduce complexity from O(n^2) to O(n log n) steps;
5) considering facts 1-4, using FFT you can multiply 2 numbers using O(n log n) + O(n) + O(n log n) == O(n log n) steps, which is faster than using school multiplication algorithm.
All 5 facts are mathematically provable, nothing wrong there.
... but for sOmE WeIrD rEaSoN, computers don't use FFT to multiply numbers, at least until numbers are bonkers huge. Hmm, probably software and hardware engineers just don't know about this neat trick
I watched a video a while back about the Karatsuba algorithm being the first to break O(n^2) steps, which performs integer multiplication in O(n^log3) steps, but it's only faster than schoolbook long multiplication for huge N-bit integers. I was thinking about it through this whole video.
Really makes you wonder why we teach kids that slow n^2 algo when the fast Fourier transform is right there!
@@tsawy6 I taught my nephew FFT and he is blazing through his arithmetic with numbers 1-20
@@sohn7767 Thank goodness you didn't teach your nephew regular Fourier transform, that would cause huge slowdown, almost by an order of n
@@sohn7767I mean, duuuuh! It's right in the name. Gotta math fast boiiiii ⚡
I was a TA in a Computer Science class a few years ago, and when I did a presentation on BigO notation, I made a point to remind people that there's there's always (at least) a constant k scaling up the time costs of one algorithm or the other.
For example, k*x > x^2 when k is very large. I illustrated this by drawing a very large letter "K" on the board in front of the formula.
Yeah, a lot of people look at Big O as a way to sort algorithms from best to worst.
All it describes is how well it scales to increasingly large data sets.
As the end of the video tells you, just never rely on big-O alone. I did that once, choosing a hashtable for a situation where data is frequently looked up but rarely added or removed, because it is O(1) (it's a little worse for collisions, but the table would never be loaded by more than 50%, so collisions would be very rare). I didn't even consider other solutions. Years later, just for fun, I wanted to test how much slower a sorted list would be at lookup, after all it is O(log2 n), only to find out that it beat the hashtable by a long shot; it was much faster. So I calculated at what point the hash table would take over, and it turns out I have to have over 300 entries before the hashtable gets faster (I benchmarked that and the benchmark confirmed it). The thing is: Due to other technical limitations, it was impossible to ever have more than 256 entries, and even numbers over 50 would be pretty rare. Why was the hashtable so slow? Because of the hashing. In the time it took to hash the data for a search, a binary search of a sorted list would have found the entry long ago. However, a good hashing is indispensable, because with a bad hashing there are many collisions and the hashtable then becomes even slower.
You should never rely on big-O alone PROVIDED you are a software engineer or applying asymptotic analysis to algorithms you write to be used. Asymptotic analysis is applied to computer science from a theoretical perspective, it is used in the theoretical study of algorithmics and data structures. It wasn't ever meant to have any claim on the practicality or non-practicality of specific implementation of algorithms. Algorithmics is a theoretical/mathematical subject, and asymptotic notation certainly has its place there. The problem occurs when say software engineers (understandably) want to use the same general ideas of "faster" and "slower" implementations, but slowly fudge the meaning of big-O so much that it is no longer accurate. And then claim they understand big-O notation.
@@moatef1886 I mean, the alternative is that you rely on gut instincts and misinformation. It's better to use big O responsibly then not to use it at all just because of some pseudo disciplinary boundary.
@@moatef1886 IMO software engineering not being a 'real' engineering discipline (read: Not relying on physical systems, not being primarily done by engineers, not requiring an engineering background, etc.) implies that it is just an application of computer science in an ordered, process - driven, formal manner. If that is your definition of software engineering, and mine is, then there is no disciplinary boundary. Software engineering would be akin to laboratory work in the same way experimentation would be for a physicist or chemist.
Red-Black trees or similar binary search will be faster than hash tables at smaller number or elements yes. In my experience, in C++ the unordered_map will be slower than the map depending in the context. Could still be faster after thousands of elements.
A good hash may depend on the data being hashed.
It might even be the case that no hashing is required. If you know the normal data we'll enough, you might be able to use some internal aspect of the data as a hash and still get the benefit of a hash.
With 256 entries, a single byte in the middle of compressed data is sufficient as a hash. Even a CRC-8 on a smallish (but unique) text field would likely suffice, and be computationally efficient.
Knowing this ahead of time is called domain knowledge.
Occasionally, based on context, an O(x²) operation like bubble sort, will reliably outperform an O(nlogn) operation like merge sort, regardless of the size.
One example - you have a huge list that is almost sorted. You know that one entry was added to the end of the list recently. You don't know where it belongs. A properly written bubble sort will get the list in sorted order in O(n) time, despite the fact that bubble sort is an O(n²) algorithm. A merge sort will still take O(nlogn) time, with a significant extra load applied to set up.
Knowing the situation changes the relative speed of operations.
In a database, a btree table always returns results that are sorted in accordance with the table.
However, an isam table returns older entries that are sorted by key in accordance to when the table was last maintained, but any newer entries added to an isam table after the last table maintenence procedure happen at the end of an unsorted table listing, and if there are no new entries, the listing will be sorted on key.
Even with a btree table the database administrator in theory might change the table's key, but perhaps in a minor way. In either situation, a listing can be mostly sorted but not completely sorted, and a bubble sort technique could be the fastest sort type.
I'm really glad that you explain the difference and fallacies for Big O notation for small numbers too!
I've been bitten by this misconception of co-workers too, especially when CPU cache-friendly algorithms with worse Big O notation are just way faster for certain use cases
Not just Cache friendly, but preload-friendly may matter.
Quickort (and most other O(nlogn) algorithms) may be better in big-O notation, but every compare requires loading from main memory in a way that cannot be predicted easily by preload operations. Even Level 2 cache won't help if the data is big enough.
Bubble Sort may be worse, but the data will likely already have been preloaded into cache by the time it is needed.
Depending on the specifics, bubble sort might even use so little of a multi threaded CPU that an entire other thread can run at full speed.
Yes, these times are constants, but we can be talking hundreds of times faster (plus potentially bogging down an entire CPU and RAM channel doing nothing but reading/writing memory)
As a Civil engineer currently working on a highway bridge I have found this dismissing of coefficients and scalars so liberating! We are Big O(n)!
Your channel is so under rated. I’m a cs student and hated data structures and algorithms before watching your videos... now I can’t get enough. Thank you Simon.
Programmers and engineers can get touchy about Big-O. I also once mentioned to a senior dev that O notation is accurate as n approaches infinity and they got pretty emotional about it. Probably because I have no professional experience? So I backed up and pointed out that even if an algorithm is O(n) and n was equal to 1 million if our target platform is powerful (like the average cellphone) it will still execute quickly. But that backfired because he never considered that Big-O was being applied to a hypothetical machine and argued that O notation was "just math". I wasn't in a job interview but things still got awkward.
Yeah not really sure what it is, but it's a weirdly touchy subject with some people heh
@@simondev758 it might be because a lot of programmers try to pass as mathematician. Who knows.
@@FlareGunDebate Or it might be because of some programmers that LOVE micro-optimization and stupid games like "instruction counting". The so called "engineer", but in reality computing is a science, you have to observe, you take the performance report, then you formulate an hypothesis, then you change the system to observe and repeat until you get your target performance adequate for the task at hand. Trying to theorize performance of a physical system (aka, an actual computer) using formulas is a big no, because there is a lot of unknown variables in the system that isn't captured by a simple model like Big-O.
Big-O is just a tool. Silly Google engineer was wrong, and Simon was right.
@@monad_tcp don't know what flare gun said but you should have an idea of the big o when you create an algorithm IF you the algorithm may (even unexpectedly) be used for large inputs.
It's a programmers first rule not to optimize first but I'm pretty sure that only means when we aren't competent. For example, throwing a kd tree at a problem without really understanding the kd tree will result in bad code.
However have another programmer proficient in kd trees work on it and boom, efficient method.
To be fair, from the Google engineers pov, small inputs barely affect runtime. If you have slow code that barely runs and has barely any inputs usually it won't be a problem compared to even efficient algorithms on really large inputs. Therefore it's more important to consider the larger inputs. (Naturally this isn't a one size fits all argument, if no arguments are large there is no reason to worry about efficiency. [Though it really doesn't matter about speed when your only making a calculator lol])
@@monad_tcp Ironically actual engineers are very aware that nothing you design will end up working exactly how you think in the field and half the job is knowing which level of detail is actually worth dealing with.
You know you're a _real_ professional when you have temper tantrums over a technical discussion with a peer.
On a brighter note: Your content is amazing. It's very well polished, presented with a perfect balance between brevity and detail without losing neither educational quality nor entertainment value. Thank you for your hard work, I automatically click your vides when they show up on my feed.
Hah!
I had a quite similar interview with the big G many years ago, making almost the same observation and was met with the same disbelief. I wasn't hired, but I had a quite long and productive career anyway. This was a really clear explanation, and yes sometimes the details and specifics of the constants can't be swept under the big O carpet.
That might explain why some internal sort implementations use Insertion Sort for small arrays and real Quick Sort implementation for big ones :)
I'm considering this studying, as this is tangentially related to what I'm supposed to be doing right now :)
So essentially, when in an interview, just play dumb if your interviewer is not receptive to learning.
I dunno if that's what I'd recommend. I'd guess that most of the time, you'll do better by going beyond and showing you've got some depth to your knowledge. But sometimes, like in this case, it'll backfire.
so, just like school?
In my last on-site interview for Google, I ended up correcting my interviewer, as he was claiming things about tail-call elimination that would not have applied to the code on the whiteboard that we were discussing. Then we ended up drawing stack diagrams. Then I worked there for 5 years or so. So, at least some anecdata that shows that it's useful.
Thanks for a good counter example! Most interviewers I've spoken to, at Google and at other FAANG, were great, this one happened to make a good story. I still ended up at Google for 7 years myself, so it worked out still.
@@simondev758 that is when you whip out the definition of Big-O and demonstrate your point mathematically.
I have been enjoying your content. You have so much to teach about game design, software development, and mathematics. I appreciate it very much that you have posted on youtube for people to learn for free. I will be buying you a beer, friend!
Awesome, thanks! Been a bit slow about new videos (doing a house reno, can see on my IG), but hoping to have a new one on Monday.
Wow....the worst part is you were NOT wrong. Big-O is a notation that describes the limiting behavior of a function when the argument tends towards infinity, it doesn't say ANYTHING for small or limited values.
Maybe the interviewer was evaluating his reaction to being faced with someone who is being aggressively wrong. SimonDev passed the test by first trying to explain, not becoming angry, and dropping the subject gracefully. He did get the job offer in the end.
@@Wambueducation Correct.
they also very often tend to not realize that even a O(n²) algorithm will tend to be faster on actual hardware than even O(log n) types algorithm for small values of N ( could be up to millions) because of stuff like cache optimisation, memory locality branch predictions, and other kinds of optimizations.
they also tend to ignore the time a single iteration take.
yes O(log n) will have less iteration than O(n^2) but if a single iteration is thousands of time slower you might prefer the O(n^2) one
Watch last few minutes 😁
@@simondev758 haha yea hadn't finished lol
You generalize, but can you give a specific example?
@@OFfic3R1K often times when doing binary operations you're using heap allocated objects that may be in random parts of memory. Since you'll most likely be doing pointer dereferences to access the objects the memory controller will need to access random parts of the ram. Ram can take 100-1000 cycles of the cpu waiting for the data to be able to continue processing. However, if instead you had all of the data in one continuous part of memory you could iterate over the data and because you're not doing random jumps the memory controller can see that you're fetching blocks of the same type of data continuously in a stream. Basically if you're accessing data all in a row instead of randomly throughout memory the computer can predict what memory you want before the code execution manages to get there.
@@OFfic3R1K, many quick sort algorithms will use an O(n²) algorithm for small enough arrays.
I also work in game development, and my philosophy for these types of things is to first consider the size of the data sets the algorithm will be working with, then consider things like Big O, and lastly test at least the top 2 choices I have to verify what I expect to happen. Because of what you point out in this video, often times a "better" algorithm will be inferior in your particular use case. But, it's hard to know for sure until you test it.
Definitely, profiler always has final say
@@simondev758 I feel like all of this is ties in nicely with Amdahl's law - "the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used". We can sit back and say "wow; nice optimization here!" or "ah yes, the asymptotic runtime is much better here", but without thinking of the context of the optimization or algorithm they can be exercises in mental gymnastics instead of providing a true benefit to the product.
Always refreshing to see videos and comments like the above. I've been focusing on how to retool how I conduct interviews the last couple years (game industry) and I think it's important to keep these types of wisdom in mind.
I always found big-o to be kind of a tool to do initial guess rather than any super accurate analysis. Like, you think of an algorithm, you can in your head calculate the O complexity related to inputs that you don't know how to constrain, and try to gauge if you think the complexity is needlessly large, and if you can do better.
For constrained inputs, O(1) is always an option.
If the problem is finite, a lookup table is always theoretically possible ;)
Memory vs time optimization are different things.
isn't O(1) the 'only' option for constrained inputs?
@@travcollier I remember reading about how some complex algorithm with only one float as an input found it actually faster to interpret the float data as an integer and look it up in a LUT than it was to do the math on the fly.
@@sycration Yes, it was (might still be) a common optimization for things like trig functions... Especially in old video games where 'good enough' is the precision requirement. ;)
@@travcollier This is kind of the case in games in general as they usually do constrain the input sizes (often via settings). Thus what you really care about is whether the given profile can calculate frames in an acceptable period of time on the target consumer hardware. This of course being due to the fact games are very realtime in nature so ideally you want to cap the frame processing time somewhere. Probably in the range of 16ms on recommended hardware and 33ms on minimum hardware ideally as you have 16.66 ms at 60 fps and 33.33 ms at 30 fps to calculate and draw a frame. Any more and the game is likely to get uncomfortable to play although it depends somewhat on the game too with RTS games the limitation is often less about drawing frames in time as it is time dilation as entity counts grow but the game would still grow too glacial to be fun without such constraints.
Excellent video! Small comment: Big-O notation is actually an upper bound, not the 'lowest' of those upper bounds. So any algorithm that can run in x^2 operations is O(x^2), but it is also technically O(x^3) or O(x^100). This means that on any exam if you're asked for the Big-O complexity of a given algorithm, you can just write down O(x^x^x) and always be technically correct.
What MOST people (this video included) describe as Big-O notation is actually Big-Theta notation (tight upper bound), and is a much more useful description in practice.
You might want to rewatch, that's specifically mentioned at 6:09. I mention there's the proper definition of O(n), which is what's covered in the video, and then the sort of "colloquial" use among programmers which corresponds more to sort of Big-Theta.
Thanks for this video! Really interesting to see how optimizations play out at smaller scales.
In a recent project of mine, I was considering using a spatial hash grid or some kind of tree structure rather than the naive O(n^2) algorithm. However, I ended up just trying the naive algo out, and since I didn't have too many agents, it performed well enough that I could still get 60 fps. Not sure if that directly relates to the topic of this video but definitely a cautionary experience with premature optimization.
Great advice! A lot of programmers get caught up optimizing parts of the code that barely register on the profiler, wastes a lot of development time.
I'm not a mathematician by any means (failed pretty much every math exam I took) and while a lot of the terms getting thrown around might as well have been klingon I was able to completely grasp the concept. I'm in my 60s and teaching myself really basic computer stuff as a hobby, this has actually made me re-think several things I thought I knew (which is always a good thing).
That's awesome, good luck and keep at it!
Very good conclusion. Also good to note that the complexity of an algorithm may depend on the structure of the input. For example, a sorting algorithm may be much faster on inputs which are already partially ordered. In the end you need to think very carefully about what kinds of inputs you expect and what the expected average case is and that is what you need to optimize for.
As always, profiler is king.
One of the first things I learned about algorithmic complexity when I was just starting out is that small scale and large scale are quite different. It's why the standard method for implementing a quick sort is to use insertion sort under a certain threshold. It's obvious how this concept can be applied to many things, but I guess they didn't send someone who knew what they were doing to interview you.
I feel like a lot of people have made this mistake, but just aren't willing to admit to it
Very interesting and accessible for a person like me who has learned programming from TH-cam tutorials and has a background in something completely different. It'd be very interesting to see more videos on the subject of efficient/performant code, and then to see you apply these principles in practice on something like JavaScript game code that you wrote.
The example you showed here is one case where the O(n) outperforms the O(log n) algorithm for pretty large examples. I think up to about 10 MB of data, thanks to caching and SIMD, linear search usually outperforms binary search. In a lot of my own time at Google, I noticed people writing slow code by using the asymptotically-fastest algorithm and ignoring this.
Duude... I was totally thinking about this same exact thing like yesterday... I love it when I just randomly think of something interesting and find someone else talking about that same thing.
One of the best explanations of time complexity i've seen, thank you!
Beautiful explain. Just spent 4 weeks in lectures learning what was beautifully explained in 10 mins
It's hard to put an upper bound on how good this video is.
the sketch-animations are a great aid to the narrative. I am liking the idea at the end and how it relates to intuition,
since we keep in mind what is relevant and what resonates, and leave behind the irrelevant bits.
I knew where this was going from the title alone. It actually concerns me that so many people don't understand it, but I've also had to explain it before. I gave the example of a custom function that sorts an array using quick sort, but before starting the function simply sleeps for 5 seconds. That function is still O(log n) - though that alone confuses people at first - yet it's clearly going to perform worse than even a bubble sort on small list. But start scaling up that input and suddenly that 5 second sleep becomes just a fraction of the sorting time, and O(log n) saves the day.
I've seen similar with people who code using a variety of OO best practices, but because they don't fully understand why those things are done they end up creating a monstrosity of interconnected classes that can't be untangled. At the end of the day, real word experience really does matter a lot, but equally it seems to be easy to go years and years without ever being challenged on these points, particularly if people job hop.
Also, how amusing that TH-cam just randomly suggested I watch this 2 year old video... Almost exactly 2 years old as well!
It's super weird that Google suddenly picked this video up. I made it almost 2 years ago, but now suddenly YT is pushing it.
Wouldn't it be more likely to be challenged on these points since you run into more peers and thus have a higher chance of encountering someone who actually knows wtf they are doing?
I have been confused about this concept for a long while now and you explained it beautifully
The one time I had occasion to write a sort algorithm of my own (in VBA), the arrays I was sorting I knew, a priori, were never going to have more than about four items (in fact that may literally have been the limit, can't remember why). There was no way I was going to bother looking up a fancy sorting algorithm and did the first algorithm that I could do from memory. I totally sympathize with that interview situation.
This is a fantastic video, and the way it was broken down visually along side the explanation of the definitions was very helpful
As a corollary to the sorting analogy people use, I think there's an even goofier variant of linear search that blows binary search even more out of the water, called "sentinel linear search". Instead of checking if you've reached past the end of the array at each step, just put the element you are looking for at the end of the array. You'll never go out of bounds, and you save many comparisons. Just need to check if you caught an actual match or just reached your sentinel.
Thanks for putting this together! It was a great encapsulation of the notation.
Its also important to point out that when working with asymptotic bounds they are describing a mathematical behaviour and even when they do provide an initial cost (n) before the algorithm becomes faster the practical implementation almost always has a higher initial cost before it becomes faster.
Just like manufacturing processes.
Thanks for the video. To be honest, i was just here to refresh my knowledge for upcomming interviews, but eventually realized something new.
Actualy, you outsmarted the google guy but he will never note that anyway great explanation dude
That's ok, I still made it in.
I bet that Google engineer got learnt from his coworkers after you pointed out that it's called asymptotic notation for a reason.
That google engineer looked it up anxiously that night and never mentioned this interview again.
Just found this channel through the JS MMO video. What an absolute gem. Thank you! Subbed with the bell on, but first I need to go back and watch all of these videos
Thanks for the video! I never understood Big-O properly or so i thought. I basically had the same understanding as the one you presented but my lectures somehow convinced me otherwise and i ended up VERY confused.
"If I got anything wrong, please let me know...(long pause) Politely." That cracked me up. Great info. Thanks for making the video!
Gotta remind people sometimes heh
Was going to say, that in general, could just use theta instead of O for average run time- but you clarified at 6:29.
But I think generally, Software Engineers are only really concerned with worst case- and avoiding that upper bound function (depending on input ofc) .
Amazing video Simon! And loved that story from google, unfortunately there was that discussion, but also great to hear you still got a job there! Looking forward to your next video!
Your videos are really eductional. This one taught me something I didn't know.
That google interview story instantly took a huge bite out of my imposter syndrome, thanks.
If you like that, I once was tasked with trying figure out why another engineer was taking 3 weeks to implement binary search.
There's no trick here, it's exactly as simple as it sounds.
That interview story was irritating. I'm a mathematician and try to emphasize the same points you made in this video to my students when we cover big-O. Your video was very well explained!
If you know the break even point, you can test for it. Sometimes below the break even point, the difference isn't worth the test time. In those cases the lost efficiency isn't re-capturable. So the default would be technique > N break even. After this, I'm going to have to get the weeds off my ankles. Thanks, good video.
Yep, this matches my experience. The first time I encountered this, I was so puzzled why it was slower
I appreciate the Office Space reference 😉 great video as always.
Have seen similar things, usually with NP completeness. Like colleagues insisting that you cannot implement say a traveling salesman algo because that's NP complete, although you're sure to have at most five elements. And when you show them the implementation and runtime, you're the guy who "proved all uni wrong" and you're the "coding deity", etc etc.
Or they don't look at your implementation and you're a liar and trickster from then on and they won't listen to anything you bring up anymore.
-_-
Neither outcome is helpful.
People are weirdly dogmatic.
That little detail can get overlooked: if you know the size of your input is constant and the thing you're implementing won't be used elsewhere on a different scale, it's O(1) and the Big-O thing becomes kinda moot.
The TSP general case isn't NP-Complete, it's NP-Hard. Factorization is NP-Complete
@@salerio61 factoring is not known to be NP-hard (and our best guess is that it isn't)
Geez, hard to believe that the hiring engineer at google couldn't comprehend that haha. Like you said, depending on the engineer, Big O can refer to the Typical or worst case performance. They often won't even consider the best case that you can guarantee with known input domain.
Don't feel bad for standing your ground if it is the truth!
This concept is also applicable even in non-contrived real-life applications. Sometimes, your input to a particular problem will never be larger than a googol or something. Hence, an algorithm with worse theoretical time complexity can still be better 99.99% of the time compared to an algorithm with better theoretical time complexity. The keyword here is "theoretical".
At other times, you have to make a tradeoff between space and time that might make the algorithm with better theoretical time complexity unfeasible to be implemented. Examples that come to mind include hybrid sort algorithms, BVH for ray tracing, and even integer multiplication. For example, using the Schönhage-Strassen algorithm is way too expensive for everyday integer multiplication.
It's not always so black-and-white as "O(logN) is ALWAYS better than O(N) and that's that!". The whole point of software engineering is to be able to weigh these different options and tradeoffs and select the appropriate solution for that particular application, guided by testing, profiling, benchmarking, etc. Sometimes, even measurements can lie (though not often), hence the importance of code review, design/architectural discussions, etc. For example, embedded systems might not have the luxury of space (or even power!) to perform heavily recursive algorithms or algorithms that require large data structures such as trees or graphs. Hence, selecting the slightly slower algorithm might be better for that specific case.
Well said!
As I can remember Big-O notation being presented in the algorithms class (back in the 80s for me), I do recall the instructor touching upon what SimonDev is presenting. It was select your algorithm to your needs. If your data set is small, go ahead with the bubble sort. it was probably easier to implement, test, and validate over a more complex sorting algorithm.
this is so easy to understand, even though i was lost halfway through it all came together at the end
If that's how you handled the interview, it shows some really great character, that you would stay calm and be sympathetic to the misunderstand of someone shutting you down like that, especially in a nervous situation like an interview.
somehow you reallymade it look simple and thats just great
Thank you for your video, and congratulations on surviving that interview despite the part where it went off the rails.
Great job as always Simon.
Cheers! Thanks for touching on that subject! Very useful with a clear statement!
Glad it was useful!
Super interesting video, happy to see your channel growing fast haha!
I also know a software engineer who optimizes almost solely on Big-O. However, I work on embedded systems with limited storing and computing capabilities. The amount of data I can receive and process is very limited. I need an algorithm that uses my small CPU very efficiently on these few data points. I don't need an algorithm which could potentially work on millions or billions of data points in a data center application. But even for such big applications, my feeling is that some software engineers too often look solely at the algorithm and not enough to the data. Maybe the data is sorted in a specific way already due to some external restrictions? This definitely changes the real-life behavior of the whole thing. Also, it's important to look at requirements. Do I want to have a guaranteed maximum processing time? Or is the throughput more important? (aka average processing should be fast). For some tasks this might actually result in the same algorithm, but for other tasks it might not. Also: How accurate does the result actually have to be? I know that software engineers like to think in deterministic ways, but if working on real-life data and when we want to create practical results, sometimes it might be more important to be accurate to a few percent accuracy instead of being fully accurate - but the processing time can be vastly smaller on some tasks.
in my comsci fund 3 class we had to write a more efficient sorting algorithm than n^3 (the task at hand was to sort pixels in an image line by line, using like its red value or something) and i technically wrote an O(n) sorting (similar but not radix sort). thing is even with sorting 405,000 pixels the O(n) sort was slower because it required the creation and run through of a 4GB array.
i just implemented a heap sort for the lab which got me a 100 but in class just to goof around on a slow week my professor actually attempted to implement my sort. heap sorts ran in like 16ms and mine took 4,000. this was on a very small image, like 635x640. we then tried it on a 3000x3000 image and the heap sort ran in 600ms while my sort took 10,000. so it seems like a failure yet the heap sort went up by 37.5x times longer while my sort only went up by 16x. EVENTUALLY my sort would have worked better but only on images that are 100,000x100,000 or something dumb lmao.
Sounds like you stumbled onto your own galactic algorithm: en.wikipedia.org/wiki/Galactic_algorithm
This is so true. I'm just a humble sysadmin, but overhead matters once N gets large enough. Process a data set big enough (300 million records) and a 50ms constant response time from the API for each record adds up! Even if you parallelize it, the API-based processing can still take longer than linear batch processing on a single host.
This is why interviews should have multiple interviewers.
Showing someone a page on "Galactic algorithms" is good way to convince them that time complexity is not always a good indicator of expected performance.
This is why I test my performance sensitive solutions with different sizes of problems to see if the performance overhead scaling is linear, sub-linear or super-linear.
A good example is standard library implementations of sorting -- many libraries use a hybrid "introsort" type algorithm, which uses quicksort most of the time, but insertion sort on small arrays because it's faster (it also will switch from quicksort to heapsort in certain cases, typically when the depth of the recursion becomes large). And of course, insertion sort is O(n^2) compared to the expected O(n log n) for quicksort.
It appears to me merge sort is more popular than quick sort, at quick glance. But indeed in hybrids like timsort or powersort.
I love this so much....... The way you explained it all was exactly how my brain absorbs it lol
This is super important in cloud computing where you might saw split your computer across thousands of tiny virtual instances running on small data sets. Like a classic serverless on Lambda solution.
People often slip through the cracks at these large orgs, like your interviewer here. Glad the system worked and you didn't have to have him as a direct colleague in the end.
A smart interviewer might still do this to test how you'd deal with nonsense.
I've done the interview training at Google, and conducted who knows how many interviews myself. This was totally off the rails.
I had a similar issue in my Google interview, the interviewer stopped told me I didn't seem to know anything about the querySelectorAll function (how I would write my own version of it), got very frustrated with me, and then I asked him to correct my mistake. He did not and then the interview was over as we had ran out of time.
Is there any chance that that last interaction was just them testing how you deal with conflict? Seeing how you would deal with somebody telling you that you are wrong even though you are actaully correct. I heard that they sometimes do tests like that, but maybe it was incorrect info?
I worked at Google for 7 years. I've done countless interviews. I've never done that, nor read feedback from someone who did, no spoke to anybody who planned to test for that, nor was trained in interview training to look for that.
What I HAVE run into at Google is an absolute abundance of people with a raging ego, who saw getting in as validation.
@@simondev758 When i first saw the title of your video, I thought you were going to say that you mistakenly said that O(N) is always faster than O(N^2), not realizing it doesnt apply to small N. I was surprised it was the interviewer who didnt know that!
LOL, I feel like this video was just a round-a-bout way to call out that snob at Google.
Hah, possibly a little... but it's been 10 years since this happened and Google was/is a huge place so it's unlikely you'll ever run into your interviewers again.
But I do like telling repeating this to people because it's a fun little story about not looking down on people and being a complete snob.
I saw what you did with the Office Space reference. Nicely done
Great Video! I'm a ce student and I failed my "languages and complexities class" once and really did a lot of studying on the second try. There are also quite a few more interesting things to this topic.
It feels so awkward that the engineer tried to put you down on something like this, especially because it's not super difficult and something students learn pretty early on.
Sry for the bad interview but honestly that gives me some confidence that these people are also just humans and not some sort of godlike, super smart, higher entity.
I ended up at Google for ~7 years, senior engineer leading teams. Don't buy into the myth that they're all superhuman. They're not in the slightest. There's some great people there, and some really mediocre ones (I count myself in this group). Honestly, a big problem at Google is just getting people to even do basic work in a reasonable timeframe. I was once asked by another lead to look into why one of their engineers had taken 3 weeks to just implement binary search.
Big-O notation always confused me when I was first introduced to it. I wasn't taught a formal definition right away, but just the informal one where it just means that f looks like some constant multiple of g. What confused me about it was the smaller values. In an Analysis course I took some years later, I was finally taught a proper definition, and was introduced to big-theta, little-o, and more. And the teacher made it extremely clear that big-O (or any-O) notation is meaningless without the phrase "as x tends to a" (where a can be any number including infinity). If that phrase is not included, then we could say the function f(x)=x^2 is both O(x^2) and O(1). The first is true when x tends to infinity, the second when x tends to 0.
Thanks for pointing me in in the direction of what math I should be learning. Always been coding, never had an interest for math. Strange combo but what can you do.
That's pretty normal, see a lot of coders who don't have a mathy/computer science background. Had a bunch who were my bosses, so didn't hurt their careers much heh.
I hated maths in school. I love it now but I struggle endlessly with proper maths notations. It's so much easier when people describe maths through code.
that's also a pretty useful thing to consider: what do you expect from the current problem? like what are its specifications? for example quick sort is obviously good, but mostly when speaking of fully random arrays, but when your typical array to sort is already almost sorted or it's reversed, it's better to use algorithms that are better in their best case
I programmed a game base not too long ago and i had such a bad code that tiny maps (100x100 squares) took way over 8 hours to generate (8 hours equaled 13% generated and getting slower for quite a while), after putting in an emergency optimisation i already shrunk it to 6 minutes and i still know a way to optimize it even further. Thanks for showing the big o because i didnt knew something like that exists
I've had the interview story happen to me. I dabble in a bit of competitive programming where micro-optimizations actually matter, and I study quite a bit of theory as part of my job, so this kind of pedantry actually barks up my tree more often than I'd like.
I was asked "Integer to Roman" and after answering, I commented that
a.) Naive string concat in Python makes my solution O(n^2); I implemented my solution using list[str] then joined them together to obviate this, which my interviewer argued with me over for 10 minutes, basically saying I was chasing ghosts.
b.) The time complexity depends on what you're basing your primitive operation on, so if you observe that 3,999 is the highest Roman Numeral and bound the input there, it's O(1); if you base your number system off of bits and ignore the 4-consecutive-symbols rule, then it is O(log n) because the number of bits in a number grows logarithmically. But no, the interviewer argued with me again and said it's O(n) because at each step you're shaving off another number???
I got the job offer but man, that was definitely the "uhhh ACKSHUALLY 🤓🤓🤓☝️☝️☝️" moment of the year for me.
A lot of sorting algorithms' time complexity (and thus efficiency) can highly depend on what you know about the expected input data.
If you're dealing with highly sorted data (e.g. you're appending a few items to an already sorted list), then insertion sort will vastly outperform any "efficient" algorithm like quicksort or mergesort despite technically being O(n^2).
If comparisons are costly for whatever reason, then mergesort will outperform quicksort due to making far fewer comparisons.
If you use a non-stable algorithm, you might be introducing extra comparisons that could otherwise be avoided with a (less efficient) stable algorithm (e.g. input data is already sorted by time, and it needs to be sorted by something else followed by time). One algorithm could be vastly slower despite the same big O scaling.
If space complexity is an issue (which it often is for big data), then you'll benefit from an in-place algorithm, but if dealing with small datasets, doing them in-place might be slower. Unused memory doesn't speed up the CPU.
Always assuming the average complexity is just as naive as always assuming the best complexity (in which case bogosort wins hands down).
Also, small datasets aren't something that should be dismissed just because it's Google. One day, you'll be writing a function that will typically sort lists with ~10 million items, and another you'll be writing a function that will typically sort lists with ~100 items, but the latter might have to run for 100,000 small lists concurrently. Picking the wrong algorithm may easily set the company back $100k in server costs, at which point practical correctness is valued a lot more than theoretical.
tl;dr The interviewer could get their head out of their ass and maybe write some production code once in a while. #doyouevencodebro
Well said!
great as always simon!
The best way to know performance is not by guessing but by stress testing
But you're right on the principal of O(f)
Hey, i remember something similar, i remember i made a little rpg game in rpg maker, and in that game every single character grew exponentially stronger when they leveled up, i then made a character who, would start out 10x weaker than the rest, but would became stronger faster, i was reminded of that character by the curve at the end of the video :)
I think the simplest way to explain to somebody how much constants matter in performance analysis to somebody is to show them two implementations of an algorithm, e.g. linear and binary search and then insert something like a wait(Duration::from_seconds(3600)) in the “faster version”, then ask them about the complexity of the algorithm again (it didn’t change), before benchmarking one against the other (you probably don’t want wait until the binary search which waits an hour for each split completes though)
Alternatively, you can make the wait time scale linearly to show how an algorithm scales in general (for some tests it's difficult to see a difference)
Are you sure that guy was an engineer? Quicksort, one of the most used sorting algorithms is a prime example, since its worst-case performance is O(n^2), but on average it's O(n log n), and faster than Merge sort, which is O(n log n) in the worst case.
The point all my programming teachers always made was that you have to *benchmark* to actually know the performance of your algorithm. Hardware details and input characteristics can make a huge difference.
And you also need to know your priorities. Sometimes the average case doesn't matter at all, if you have a hard time/memory budget (e.g real-time programming like robotics and audio, or limited memory environments like embedded devices)
Agreed, and yep, totally was an engineer. Internally, you have access to data like who interviewed you, their job and level, etc.
that's still comparing O(n log n) with O(n log n), because if you hit worst case the quicksort will be very slow. A better example is that a common implementation of quick sort switches to an O(n^2) algorithm such as selection sort once the ranges gets small enough. If O(n log n) always was faster than O(n) you wouldn't be doing that and just keep using quicksort until it's fully sorted
Should jave asked the interviewer about Python's default sorting algorithm, then. It's called timsort, which uses the "slower" algorithms on smaller collection sizes and "faster" ones on big collections. I forget which ones, specifically, but it supports your point.
It is really easy to see how this assumption that lower time-complexities doesn't hold up at smaller scales.
Let's say we have an array of 3 elements, defined as x = [1, 2, 3]. We know that x is sorted from smallest to largest, and we need to know what the index of the element marked "3" is.
First we do a simple search from start to end O(n), and the computer does the following:
Checks element 0.
Increment
Checks element 1
Increment
Checks element 2
return 2, since x[2] == 3
Now let's look at a binary search O(log(n)):
Set 0 as the left bound of the search
Set 2 for the right bound of the search
Add right and left bounds together
Divide this by two
Check divided index
Since its smaller, set left bound as that index plus one
Repeat above steps, this time return index since it is equal to 3
Clearly the second method has much more "steps" (even ignoring the fact that division is much more expensive to a CPU than addition/incrementation). Big-O is only a limit as n -> ∞
I know that Bubble sort is slow (O(n^2)), but I was perfectly happy to use it on an Arduino to sort the results of 3..5 thermometers (and use the 2nd largest). Code complexity matters as well.
Yeah, something else a lot of devs don't understand, the more senior you get, the more multidimensional the problem becomes. You're balancing performance, memory, time constraints, maintainability, etc..
My estimation was wrong about 1 thing.
I did see the:
'big-O describes better *at larger inputs* '-issue coming.
And how that was a lesson that was going to be a big part on this video.
However I also had the 'assumption of competency @ google hiring practices'
Therefore assumed (when I clicked on the video), that you would be the one making the mistake, not the other way around.
Having worked at both ends of the size scale for 40+ years I really feel for you! That interviewer simply didn't know what he was talking about.
The only way to really know is to benchmark & profile your code while working with real (or at least realistic) input sizes. Personally I do care a lot about performance so I tend to keep thinking about a problem for sometimes several days just to be sure I understand enough about it, then I'll write my first implementation. During this stage I almost always realized that some of those initial ideas were simply wrong so I then write the second version.
Back to your examples, you can often make a hybrid solution where you switch from n*log n to n*n as soon as n drops below some cut-off value, simply because that will be faster.
A classic example is quicksort where, in my experience, the best approach is to switch at somewhere in the 4-8 range, or you can put the cutoff at 3 and simply use branchless code to sort the 2 or 3 items remaining. It is however far more important to always recurse into the smaller partition first and then handle the larger part with tail recursion, i.e. looping.
Love how this was a long way of saying, "you know that thing years ago, I was right!"
I'm often wrong, so it's important to hang on to the few times I'm right.
brilliant video, thanks for the insight!