Binary Search Algorithm - Computerphile

Computerphile

มุมมอง 153 949

เพิ่มลงใน
- เพลย์ลิสต์ของฉัน
- ดูภายหลัง
แชร์

แชร์

ฝัง

ขนาดวิดีโอ:

แสดงแผงควบคุมโปรแกรมเล่น

เล่นอัตโนมัติ

เล่นใหม่

เผยแพร่เมื่อ 31 ต.ค. 2023
Back to basics as Dr Mike Pound explains a simple but incredibly useful algorithm, binary search.
#algorithm #ComputerScience
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharanblog.com
Thank you to Jane Street for their support of this channel. Learn more: www.janestreet.com

ความคิดเห็น • 264

@craftmechanics6483 6 หลายเดือนก่อน ⁺³⁵²
I love teaching this to my students as a guessing game.
I will think of a number from 1 to 100 and give them 8 guesses, but after each guess tell them if their guess was correct or lower/higher.
After a few rounds of playing, most people figure out that the optimal strategy is guessing somewhere in the middle, and essentially discover the algorithm on their own.
After that, we implement the gussing game on a computer.
I find this a very fun way to learn about binary search.
@DavidLindes 6 หลายเดือนก่อน ⁺¹⁸
i'm imagining doing that in a classroom, and... yeah, sounds fun! Cool!
@Armageddon2k 6 หลายเดือนก่อน ⁺¹⁹
I'm totally stealing this idea!
@mariolol8333 6 หลายเดือนก่อน ⁺¹⁴
imagine your teacher naming himself craftmechanics6483 with a minecraft pfp on youtube XD
@DarrenYung 6 หลายเดือนก่อน ⁺¹
I've been doing this exact same lesson for years too! Once you've introduced variables, conditionals and looping then you have the basic building blocks of algorithms. Binary Searching with the guessing game is a nice easy way to introduce this fundamental algorithm.
@nashsok 5 หลายเดือนก่อน ⁺²
I audited MIT's 6.0001 (Introduction to Computer Science and Programming in Python) course through OCW and the professor did the exact same demo to introduce binary/bisection search!
@emrekantar5003 6 หลายเดือนก่อน ⁺³⁶⁷
These men make me feel as if I am in a coffee house with my best friends talking
@queueeeee9000 6 หลายเดือนก่อน ⁺⁹
Dr. Pound and Dr. Grimes from the numberphile videos are my absolute favorite!
@paulsutherland3813 6 หลายเดือนก่อน ⁺⁸
Dr Pound sounds and acts identically to someone who went to my secondary school in London, cheeky lad in IT class who aced all his assignments while getting his computer to run games we weren't supposed to have access to. So, yes.
@njd9828 6 หลายเดือนก่อน ⁺²
You spelt pub wrong.
@aaabucus3104 6 หลายเดือนก่อน
Anybody who sits around and talks like this has no friends in the first place and that's just science!
@queueeeee9000 6 หลายเดือนก่อน ⁺³
@@njd9828 you spelt pubic wrong
@genelee7869 6 หลายเดือนก่อน ⁺⁷⁵
I'd love to see a video explaining the concepts behind how the different sorting algorithms work (Heap, Merge, Shell, etc)
@JanB1605 6 หลายเดือนก่อน ⁺³
I'd love a video on hash maps!
@Loki- 6 หลายเดือนก่อน ⁺¹
@@JanB1605I loved my prof I had for computer science 2 which was lots of this. He was like this video, plus comedy, and simplified PowerPoint.
@jimwinchester339 5 หลายเดือนก่อน
They exist here on TH-cam. I've seen them.
@user-zl4kk7wi5u 6 หลายเดือนก่อน ⁺³⁴
Dr. Pound is excellent and entertaining as always. I found myself smiling and giggling throughout the whole video due to the way he handles his small mistakes and jokes around. Yet he also imparts wisdom!
@KurtisRader 6 หลายเดือนก่อน ⁺¹³
The point made at 14:30 is arguably the most important part of this video. Specifically, it is less important to know how to write, from scratch, a particular algorithm than it is to know that different algorithms have different tradeoffs. Knowing how to pick the best algorithm (and data structures) for a particular situation is more important than being able to implement an algorithm on a whiteboard. One of my interview questions explores whether an applicant understands that there are many sorting and searching algorithms and the nature of the data being manipulated, and other constraints, is important in deciding which algorithm to use.
@konstantinrebrov675 6 หลายเดือนก่อน ⁺⁹⁸
We need more videos explaining in low level details how various common algorithms work, such as file I/O, B trees, image compression, files representation (PDF, audio, video, archive, ELF), Huffman coding, hashing algorithms, md5sum, dynamic programming. I mean walk through the pseudocode line by line explaining what it does, and draw the memory diagrams, like what is happening to the data, how the bits or bytes are stored, loaded, moved around.
@ross951 6 หลายเดือนก่อน ⁺⁷
Textbooks are your best bet for finding and understanding (easy to quickly reread sentences) that kind of material.
@konstantinrebrov675 6 หลายเดือนก่อน ⁺³
@@ross951 Lectures are better because then we would be able to see exactly how all the bits and bytes are moved around in the memory.
@Aidiakapi 6 หลายเดือนก่อน ⁺⁴
They've done videos on many of the topics you've mentioned.
@chessthecat 6 หลายเดือนก่อน
That's not what Computerphile is. Sounds like you should start your own channel.
@konstantinrebrov675 6 หลายเดือนก่อน
@@chessthecat Who are you to say?
@Gudsfralsare 6 หลายเดือนก่อน ⁺¹⁴³
In these times when AI is all the hype, these are the videos we need. The fundamental algos from the 60's are here to stay and run the world.
@thatchessguy7072 5 หลายเดือนก่อน ⁺²
AI was used to help find an improvement on binary search recently. It’s covered in keynote talk of the 2023 CPP con.
@theblinkingbrownie4654 5 หลายเดือนก่อน
@@thatchessguy7072wHAT?
@philroo1 5 หลายเดือนก่อน
@@thatchessguy7072 🤞 all watched over by machines of loving grace
@vatbub 6 หลายเดือนก่อน ⁺¹⁴
One thing to add: Once the data is sorted, and you need to add new data to it, binary search can actually tell you where you need to insert the new data in order to keep the data sorted. Therefore, the real issue is getting the data sorted in the first place (as mentioned in the video), but once it is sorted, binary search helps you find stuff fast and helps you keep your data sorted even when you modify it.
@NihongoWakannai 6 หลายเดือนก่อน ⁺⁸
Adding a new piece of data to the middle of a giant array would require changing a lot of memory though. Which is why we might use something like a binary tree instead.
@pacoalsal 6 หลายเดือนก่อน
Depending on the problem the performance benefit of keeping the data adjacent in memory might offset the overhead of shifting it around.
@doomanime61 6 หลายเดือนก่อน ⁺⁵
I can sit and hear Dr Mike Pound talking all day
@michaelwilson5742 4 หลายเดือนก่อน ⁺¹
Yeah, everyone on the corridor can sit and hear him talking all day
@Richardincancale 6 หลายเดือนก่อน ⁺¹⁵
One of my favourite books from almost 50 years ago - Sorting and Searching by Donald E Knuth - tells you all you need to know!
@GordonjSmith1 6 หลายเดือนก่อน ⁺⁴
This was great! It really highlighted the need for the development of 'logical' learning, and not just 'proceedual' or 'fact' learning. Actually a very 'key vlog', in all the excellent vlogs that have been made.
@kevinramadhan9871 6 หลายเดือนก่อน
I used to watch you all the time when i was in uni around 3-4 years ago. Just stumbled upon this video and glad you're still going strong! keep up the videos! it's always a treat to watch, even if i'm already familiar with the topic
@traywor1615 6 หลายเดือนก่อน ⁺⁹³
10:25 just wanna note here for anyone trying to implement this algorithm for very large arrays, be aware of the integer overflow. (l + r) // 2 will first calculate l + r and then try to divide it by two. however if your list is lets say 2/3rds of the size of your integer and if you then add l + r and l and r are quite large, then they will overflow. Then your m will be messed up and your binary search will be incorrect. And this is not a contrived scenario, there was this exact kind of bug in the java SDK, for 9 years!
I was actually surprised that Mike didn't mentioned this :O
@jlappala 6 หลายเดือนก่อน ⁺²³
We try to teach the l+(r-l)//2 -method to avoid this.
@solqr6521 6 หลายเดือนก่อน ⁺²²
It was in Python, where there wouldn’t be overflow
@johnvriezen4696 6 หลายเดือนก่อน
@@solqr6521He should still write the code the l + (r-l)//2 way because code will inevitably be found and copied to a platform that has 32 bit integers (e.g. C) and then break.
@raphaelfranzen9623 6 หลายเดือนก่อน ⁺¹
@@solqr6521 Why wouldn't there be overflow in Python ? An integer has only a finite size , so if you add more than it can hold , it should wrap around (overflow).
@krajsyboys 6 หลายเดือนก่อน
@@raphaelfranzen9623 Integer overflows don't happen if you are using just python because they allocate arbitrary amount of data to the variables, so if a number should have overflowed, python allocates more momory to store the variable in
@daveingerslev 5 หลายเดือนก่อน ⁺²
Finding theft (or damage) on CCTV footage is a fantastic real world use for binary search. As long as something visually changes (i.e. goes missing or gets broken), then it turns the task of watching the cause into a 5 min task instead of hours/unjustifiable.
@coolwarfare 6 หลายเดือนก่อน ⁺¹
Thank you this channel is going to save me before GCSEs
@thuokagiri5550 6 หลายเดือนก่อน ⁺¹⁴
prof Pound is the Richard Feynman of Computer science
@olafurfo89 6 หลายเดือนก่อน ⁺⁷
Currently doing my second year in computer science and you made me realize when I'd want to use BST in projects.
Sort in morning, many lookups throughout the day.
Makes so much sense but I hadn't realized it, Well done !!!
@logosimian 6 หลายเดือนก่อน ⁺¹
I already knew everything in this video, but I still watched it.
@agoatmannameddesire8856 6 หลายเดือนก่อน ⁺⁵
More Dr Pound videos on search / sorting!
@chrisogonas 3 หลายเดือนก่อน
Simply and elegantly illustrated! Thanks, folks 👍
@sambenjamin7515 6 หลายเดือนก่อน
Always happy to see a Mike video! Great refresher.
😀
@SparkysWidgets 6 หลายเดือนก่อน ⁺⁵
I changed a linear search on a calibration step for a product on the production line to binary searching. This small change netted millions of minutes of production line weekly(test used to take minutes and we produced over 1million units a week) back to the CM. The main point being is it doesn't have to be a large data set it could be a small known dataset but that each takes time to search or in this case "test".
@robblake8999 6 หลายเดือนก่อน ⁺⁶
It's an excellent point to remember that sorting is O(n log n), wheras a linear search is O(n). Linear search could definitely prevail for a small number of lookups.
@IceMetalPunk 6 หลายเดือนก่อน
That's *if* your data isn't already sorted, of course :)
@turdwarbler 5 หลายเดือนก่อน
@@IceMetalPunkNo, for a small number of items, the exact number is hardware dependant a linear search will beat everything apart from a direct index access. linear searches on arrays are very cache friendly, as you walk the array, the CPU is already pre fetching 64 bytes ahead of you. We do this all the time for fast lookups of small tables.
@IceMetalPunk 5 หลายเดือนก่อน
@@turdwarbler Cache lookups aren't considered in algorithmic runtime analysis.
@turdwarbler 5 หลายเดือนก่อน
@@IceMetalPunk Well thats where theory and reality differ. In HF LL trading we use linear lookups all the time because they are faster period, for small arrays. Even without a cache they would still be faster, no set up time, no complicated logic, simples.
@IceMetalPunk 6 หลายเดือนก่อน ⁺²
Wow, I can't believe there's never been a binary search Computerphile video before now! 🤯
@artofcomputing-art 6 หลายเดือนก่อน
Hell yes! Here comes another great Computerphile video 🎉
@erichobbs4042 6 หลายเดือนก่อน ⁺²
The point about knowing when to use an algorithm is the most important bit of knowledge to take away from this video. I've forgotten how to exactly write plenty of algorithms, but those things can always be looked up.
@gerokatseros 6 หลายเดือนก่อน
I really like the way you explain everything ... making it so simple
@Lion_McLionhead 6 หลายเดือนก่อน
Pounder can make even this basic algorithm sound complicated.
@Chocoholic1337 6 หลายเดือนก่อน ⁺¹
Love the algorithm explanation videos! Mooooore!!
@sandervdrie9182 6 หลายเดือนก่อน
thanks Dr Mike and thanks the Computerphile team, for and good video would, have loved to have had a lesson from Dr Mike
@georgelza 6 หลายเดือนก่อน
love these discussions.
@GilesBathgate 6 หลายเดือนก่อน ⁺¹²
Don't forget you can also use binary search to implement a container that is sorted on insert. (although there are other approaches)
@tiarkrezar 6 หลายเดือนก่อน ⁺¹
Constructing a list that way is really inefficient. Even if the search is O(log(n)), each insertion will be O(n). At that point, just use some kind of a tree data structure instead.
@rmsgrey 6 หลายเดือนก่อน ⁺¹
@@tiarkrezar It depends how your data is stored. For example, if you normally only store data in even-numbered positions, but insert into odd-numbered positions, then redo the entire list when you hit a collision, then insertion is only O(n^0.5).
But, yeah, a naive array implementation of a sorted list will have problems with insertion speed.
@GilesBathgate 6 หลายเดือนก่อน ⁺²
@@tiarkrezar I did say there are other approaches. Insertion sort, Self-balancing BSTs like AVL trees and Red-Black trees, etc...
@felipeabraga 6 หลายเดือนก่อน
Nice, was studying about it last monday... Thanks
@Salara2130 5 หลายเดือนก่อน
Well that was a nice refresher of the basics ^^
@Loki- 6 หลายเดือนก่อน ⁺¹
14:15 My analogy for going through computer science is that we add a bunch of tools to a toolkit and learn when to use which tools.
@sid6645 6 หลายเดือนก่อน ⁺¹
Binary searches over some state space with a non trivial checking condition (like some monotonic equation) is also fascinatingly useful. Also if you rectify the state by changing basis or something. One of my favourite algorithms!
@wybren 6 หลายเดือนก่อน ⁺⁵
I love binary search. If look up times are really low binary search is as good as unbeatable because of computational simplicity.
@-parrrate 6 หลายเดือนก่อน
for most problems, I prefer a strict L
@atkelar 6 หลายเดือนก่อน ⁺³
I love to show these examples in my programming courses (day job); one of the arguments that sometimes is held against more complicated algorithms is that "yes, less steps but each step is significantly more complicated"; but then you do the math and compare 100M steps vs. 23 and see that they would have to be quite a bit more complicated to make the 23 step version perform worse than the linear search one. And I also encounter people who would solve a "find in 10 items" problem with such an algorithm... Cost/Benefit analysis gets quite complicated if you don't have an idea about the number of items involved, or the number of executions required.
@NihongoWakannai 6 หลายเดือนก่อน ⁺¹
I think it's always important to learn that there is more to performance than just big O. With webdev being so popular, people focus a lot on scalability. But there are lots of situations with very predictable dataset sizes thus optimizing functions themselves becomes more important
@KeyboardSourceError 6 หลายเดือนก่อน
Thank you Dr. Pound!
@allenkwan8310 6 หลายเดือนก่อน ⁺³
This is the CS equivalent of asking a band to play the classics. Nice.
@andrewharrison8436 6 หลายเดือนก่อน ⁺¹
Built into COBOL as the SEARCH ALL command - often overlooked but fast if you know about it.
@LuigiElettrico 5 หลายเดือนก่อน
Nice explanation, thanks.
@hashtagPoundsign 6 หลายเดือนก่อน ⁺⁷
There are 10 types of people, those who watch Binary Search videos, and those who don't.
@housellama 6 หลายเดือนก่อน ⁺⁵
There are 10 types of people in the world: people who understand binary, those who don't, and those who know this joke is trinary.
@nosuchthing8 6 หลายเดือนก่อน ⁺¹
Clever
@LupinoArts 6 หลายเดือนก่อน ⁺⁷
The box next to "16" could contain 16 again...
@conorstewart2214 4 หลายเดือนก่อน
This is also how a type of analogue to digital converter (ADC) works, it is successive approximation (SAR). Interestingly when used for ADCs it corresponds directly to binary. The first step where you have a comparator with half the max voltage and the input voltage, the result from the comparison is the most significant bit (MSB) of the value, then when you perform the next comparison it becomes the next bit and that continues on until it reaches the closest that the ADC can get. This method relies on a way to create all the intermediate voltages for the comparison so SAR ADCs contain digital to analogue converters (DAC) too.
Also due to the nature of analogue voltages, similar to real numbers, the search will only very rarely be able to end early, since a comparison between two analogue voltages or between two real numbers is very unlikely to result in them being equal and may be impossible if the comparator only output less than or greater than. This does mean that these ADCs and real numbers will pretty much always take the same number of steps to complete.
@kuro4841 6 หลายเดือนก่อน ⁺¹
would it be an improvement to first check the first and last index to see if the given number is even in range before looking through the boxes?
also would it be advisable to place the first lookup point based on where between highest and lowest it is?
so if its closer to the lower bounds, then maybe starts dividing a little sooner like at 40% of the dataset?
would this add too much performance overhead or would it possibly an improvement?
i know that division is a comparatively costly operation. I would be super interested on someone elses input on this :)
@nosuchthing8 6 หลายเดือนก่อน ⁺²
The phone book analogy is SO powerful
@LucasundAaron 6 หลายเดือนก่อน ⁺¹
in python i prefer using l=0 r=len(arr)
now right is exclusive and points on invalid greater parts.
the only +1 indexing will happen in the main loop now.
you only check the end of valid array is 1 pos further with:
„while l+1
@Bianchi77 4 หลายเดือนก่อน
Nice info, thanks :) 👍
@Lampe2020 6 หลายเดือนก่อน ⁺¹
If the list is in order you can even make it simpler to determine whether you search further by opening the first and last box and if the target number isn't between those in value it's not going to be inbetween those boxes either.
@LiarsPendulum1 6 หลายเดือนก่อน ⁺¹
I am teaching myself programming and was literally practicing writing this kind of code yesterday, working on data sorting algorithms. So glad this is being written in Python. I started with C and this is a great window into how C maps on to Python.
@JuliusUnique 6 หลายเดือนก่อน
simple yet efficient way of searching, doing it myself that way since I basically can think
@mightytitan1719 6 หลายเดือนก่อน
I watched your video of steganography can you make that in detail to see how algorithm works
@MuhammadRaza-yd6sg 6 หลายเดือนก่อน
Been just thinking about this lately
@mjbirdClavdivs 6 หลายเดือนก่อน ⁺¹
I think you get the color on your hands from touching the paper before the ink has dried. How humid are the rooms you're in?
@heliospace 6 หลายเดือนก่อน ⁺¹
This is the first time an algorithm seemed useful
@cscscscss 6 หลายเดือนก่อน ⁺⁴
Great video! I understood it very well.
@akiraincorrect4968 6 หลายเดือนก่อน ⁺¹
Did you watch it on 13x ?
@cscscscss 6 หลายเดือนก่อน
@@akiraincorrect4968 Yes
@MegaDonaldification 6 หลายเดือนก่อน
First of all, thank you for agreeing that the mere fact one is enhancing their skills makes it a career move. Most employers are notorious for despising early beginnings.
@o-hogameplay185 5 หลายเดือนก่อน
would it be a good thing to check the last element of the array, and see if it is bigger than the number we want to serach? because if it is smaller, than the list do not contain the number, so it can just return with false
@JeffKaylin-ft5cx 6 หลายเดือนก่อน
If the search gets REALLY big, I assume the data will be read in a record of fixed length. Would checking the Last data in the record be useful? Like if you're in the library and you may have to switch rooms or floors, should you check the last book so you won't have to come back to that room later when you notice you were SO close?
@ANoBaka 6 หลายเดือนก่อน
If you for some reason have knowledge of the distribution of the values in the collection, perhaps by calculating it while sorting it, would it be worth it to use the distribution to find the number faster by making more educated guesses?
@FalcoGer 5 หลายเดือนก่อน
@7:00 l+r might be an integer overflow. Depending on your language that might be undefined.
@gsestream 6 หลายเดือนก่อน
make it faster by giving ranges for each of the boxes, thats then only one step to get to the right bix, which can be sub-divided to another range boxes, its better than log(n) of binary search, having ranges is better than just having the numbers in order, if you have different boxes for all different numbers, then the search is O(1), its bucket search, essential is that you pre-calculate the ranges and not during the search. works more like heap, which does the processing on the background and not during the searches/updates. if you have unique box for each different number, at a known index, which you can jump to, directly, then it works like radix sort for sorting. radix search for buckets might be super duper fast.
@BrankoDimitrijevic021 6 หลายเดือนก่อน ⁺¹
Interestingly enough, linear search is likely to be faster than binary search for a small array of integers, due to prefetching and branch prediction mechanisms of modern CPUs.
@diagorasofmel0s 6 หลายเดือนก่อน ⁺³
This was so informative and capturing, surprisingly my attention span gets a bit longer when prof Mike Pound speaks lol.
@j7ndominica051 6 หลายเดือนก่อน
A typical table will have entries in added in chronological order, and the primary key doesn't have a particular meaning. Does making presorted copies of the database for every column pay off in practical applications?
@Jet-Pack 6 หลายเดือนก่อน
If you don't just half the array but instead assume an even distribution and do a linear interpolation between the item left, right and the searched number... how much does it speed up the algorithm and what is that called? Interpolation search?
@user-vv3tc7yk7f 5 หลายเดือนก่อน
Ideally, if a student was learning this, even tho it seems that iteration is the most obvious way to implement this, would u encorage them to make a recursive functin?
@bishboria 6 หลายเดือนก่อน ⁺²
I taught this to my 1st year students less than a week ago.
@GTCrais 6 หลายเดือนก่อน ⁺¹
Not only did you swap X for A, but you also called your list variable "lst", which in that font looks like "1st", which makes it even more confusing :D
@bryan69087 6 หลายเดือนก่อน
MORE MIKE POUND
@user-lw7fj4pq9c 6 หลายเดือนก่อน
So I built this with the explanation before I found out he had actual python code in there, and I did it slightly differently. Why use floor division when you can just do a binary shift on "m"? Wouldn't a binary shift be exponentially faster considering that computers aren't really optimized for division?
@pandaqwanda 6 หลายเดือนก่อน
i love Computerphile :)
@Mode_na 6 หลายเดือนก่อน ⁺⁴
Shouldn't you check the first and last item in a sorted list to make sure the number you're searching for is in range? It should be more efficient by potentially circumventing the need of running a search algorithm all together.
@f.f.s.d.o.a.7294 6 หลายเดือนก่อน ⁺²
Only if that is frequently the case.
@ligmoes 6 หลายเดือนก่อน
Indeed. when its out of range it shrinks in a linear fashion until the ranges swap and returns false but also works for when is in range and is not on the list. However, that is a matter of implementation.
Here's same function with your solution:
def binary_search(lst, a):
l = 0
r = len(lst) - 1
if a > lst[r] or a < lst[l]: return print("Out of Range")
while l lst[m]:
l = m + 1
elif a < lst[m]:
r = m - 1
else:
return True
return False
@neilmehra_ 6 หลายเดือนก่อน ⁺¹
It doesn't really impact performance as log time complexity is minuscule.
@alibabascience9406 6 หลายเดือนก่อน
we need to sort array in the first yes?
@CoreyOgburn 6 หลายเดือนก่อน
Fun fact: if you're searching through a list of unknown size (maybe a stream of data) that's ordered then you can do a binary search by checking powers of 2 in a similar way. If the value is greater than m, m *= 2.
@lasersimonjohnson 6 หลายเดือนก่อน ⁺¹
Love a good Pound...ing 😁
@aikumaDK 6 หลายเดือนก่อน
O(log2(n))
I wonder how much this would change, on average, if you start by seeing if your number is smaller than the first entry or bigger than the last entry.
@TwoTeaTee 6 หลายเดือนก่อน ⁺¹
Calculating M can cause overflow. So, a better version could be -> M= FLOOR(L+ (R - L) / 2)
@seapearl3175 27 วันที่ผ่านมา
So maybe reading the r and do a rough percentage estimation might be even faster, right.
@Atrix256 6 หลายเดือนก่อน
Do interpolation search next!
@FinepixF30 5 หลายเดือนก่อน
This looks easy on a sorted array but what about unsorted data? sorting it first in memory takes up computation cycles not considered here.
@peterboccia 6 หลายเดือนก่อน
Great video, but I sincerely appreciate when I studied it in C, more pure to me. But anyway it's the same. Great job
@cannaroe1213 5 หลายเดือนก่อน
If you use the values of l and r (call them L and R), then you can do a weighted binary search which usually performs a little bit better than one that defines m = (l+r)//2
So at 3:13 a weighted binary would say a = 17 , l = 4 (0-based), L = 16, r = 7, R = 9000 (unchecked in video so i made it up)
Because a is very close to L and very far from R, you find m as usual but then bias it more in favour of L than R to the point it shakes out as "5" and so finds a immediately, and you don't have to make L = 10 :P
@JeanNoelAvila 6 หลายเดือนก่อน
What about n-ary search and what is the optimal n?
@silaspoulson9935 6 หลายเดือนก่อน ⁺¹
Interesting video, thanks mike! Surprised python doesn't mark variables as sorted
@traywor1615 6 หลายเดือนก่อน ⁺⁵
I guess there would be lots of overhead and bookkeeping, since every operation on the list could destroy the sorted property.
@goguhu 6 หลายเดือนก่อน
And obvious if you have a type of distribution (random, bell curve, ...) you could estimate the location based on the first/last value in the range to check ...
@jeromethiel4323 6 หลายเดือนก่อน
You don't need to sort all of the time. Before you put new data into the list, you search first, then insert the data where it belongs in the list. So you are using a binary search to insert an item into an already sorted list. Which is much more efficient than adding data, then resorting.
Heck, i implemented this in basic on an Apple ][ back in the day, because sorting routines were just slow. Searching and copying to make a hole for the new data was faster.
@neilmehra_ 6 หลายเดือนก่อน
You don't really want to be using a list if you have a large number of insertions, since what you're describing has a time complexity of O(q(n + logn)). A better approach is using a Red-Black Tree or Balanced BST, for O(log(n)) insertion and query.
@justovision 6 หลายเดือนก่อน
I think you need to follow up with bubble sort. Recursion is maybe the most important thing to learn.
@garcipat 5 หลายเดือนก่อน
If you use a random function it probably will have numbers evenly distributed of order them by the numbers. that means you could also just number them from 1-1000000. it would take the same amount of time more or less i think.
@martijn3151 6 หลายเดือนก่อน ⁺⁹
In theory, yes binary search is faster. In practice a linear search might actually be much much faster. The one thing that makes a big difference is cache. Computers read ahead memory. So it greatly helps if all entries are stored sequentially. If the binary search is implemented in a tree, the entries probably are all over the place in memory, not sequentially, causing cache miss after cache miss. This will severely impact the performance.
Obviously when searching billions of entries, binary is most likely to win. But in a lot of real life situations, the lists are much smaller. Small enough to fit in a cache line. Making the search almost instant.
So never just assume what you learn in computer theory is fact, use different setups for your problem and measure(!) to determine what solution works best.
@Quarky_ 6 หลายเดือนก่อน ⁺⁶
Well, if someone thinks binary search is always faster, then they didn't really learn anything did they? The algorithm clearly gives you a speedup by halving the search space at every step, then obviously it is faster only for large numbers. One might even try to quantify this with: for n = 2^m, if m
@Pyrazahn 6 หลายเดือนก่อน ⁺¹
Another reason why linear search can be faster for small lists is that modern cpus heavily rely on pipelining and branch prediction. Basically for every if-else, the cpu guesses which of the two paths will be executed next and already executes the first instructions of that path in the background. If the guess was correct, then the results are already there for free. But if the guess was wrong, it is much more expensive to take the other path.
In a linear search, if the cpu predicts "element not found, move to next item in list", then that will be correct in all but one case. But in a binary search, the expected results are 50% left and 50% right - so branch prediction fails 50% of the time instead of just once.
@martijn3151 6 หลายเดือนก่อน
@Quarkly_ No need to be pedantic here. Log n is always smaller than n. Also for small numbers. Using binary search on a list of 10 items, worst case is 3 steps. Worst case linear search is 10 steps. Yet, as I pointed out, linear might be much faster in that case even though theory says binary search is at least 3 times faster.
So, at least to me, this isn’t trivial.
@AAjax 6 หลายเดือนก่อน
Interesting! So what would the break-even size, between binary search and linear search? I had a look and the x64 cache line is typically 64 bytes, which wouldn't seem to allow for a very useful amount, but this is all new to me.
@krzychxyz8358 6 หลายเดือนก่อน
Well, the break-even size depends on a lot of things. From your code efficiency, the programming language, the compiler you use all the way to the specifics of the data set (some data types use more computation power to compare than others). But i've never seen a case where that is above 60, so I guess it won't ever get big.
@BeesAndSunshine 6 หลายเดือนก่อน ⁺⁴
One way you could make a presorted list of random numbers is to use randint to increase an int at the index in an array equal to the generated number. Then once you have generated the quantity of integers you want you go through that array to set the values of a new array inserting x 0s where x is the value at index 0, x 1s where x is the value at index 1, etc. This would be slower than generating an unsorted list but is significantly faster than generating an unsorted list and then sorting it, because technically it's still an O(n) algorithm, and no sorting algorithm is O(n).
@Yewtewba 6 หลายเดือนก่อน
Assuming int size 4 bytes, generating ints up to 2 billion, it looks like you're trying to create about a 7.5 gigabyte array. It's infeasible for large numbers. It would run on O(m+n), where m is the size of the largest element, whereas generating and sorting would most likely run on nlog(n), where n is the number of elements.
It's important to recognize you're still essentially just sorting a random list, just with your own custom sorting algorithm.
Edit: I didn't know it had a name, it's called "counting sort", and it's apparently commonly used as part of Radix Sort. Both of those might provide you with an interesting read.
@idonno87 6 หลายเดือนก่อน ⁺¹
My take would be to create a list starting with a random number between 0 and, say 100, then add a number by adding a random number from 0 to 100 to the previous list. Runs in O(n), not great, but hey.
@Yewtewba 6 หลายเดือนก่อน
@@idonno87 add numbers one at a time in the correct order? Should be O(nlogn), binary search to find the insertion point over n elements.
@Tumbolisu 6 หลายเดือนก่อน
No *comparison-based* sorting algorithm is O(n). Radix-Sort does not use comparisons and achieves O(n). It can't be used for every type of data, but it works perfectly on integers.
@dimitrys.6563 6 หลายเดือนก่อน
What about « ponderating » the « middle » by adding two extra steps at the beginning, looking for boundaries values. With that known, instead of taking the middle for next « check » we could check closer from the left or right if the needed number is closest to the given boundary.
Do you think it would gain probability to go faster ? If yes, how much ?
Second idea : to build the random order list, could be as idea to build it sequentially ba pushing new item in list which would be « new pushed item is last one plus a random number between 1 and … let say… 3 ». You would get an ordered list without doubles and spaced from 1 to 3.
@TheKilledDeath 6 หลายเดือนก่อน ⁺¹
The first idea would work as long as the numbers are uniformly distributed among the list. Which probably happens with how this list was produced (lots of random = linear distribution).
But in any case where the list is not uniformly distributed it would get a lot slower. And lots of lists are not uniformly distributed. Take a look at Benford's Law for an example.
@pierreabbat6157 6 หลายเดือนก่อน
How fast is the algorithm in Julia, compared to Python?
@redaragon88 6 หลายเดือนก่อน
13:06 ;-; bit painful to immediately convert this lovely numpy array to a list, to sort it immediately after, instead of using the (much faster) np.sort first haha
@astropgn 6 หลายเดือนก่อน
If you try it, don't convert to list and then sort it, like it was done in this video. Instead, sort the array first using numpy (np.sort()). It is way faster because numpy has less overhead. It always knows the type of every element so it doesn't make type comparisons like python does in the background.
@pettere8429 6 หลายเดือนก่อน ⁺¹
Or just made the random numbers smaller and run a cumulative sum.
@emilemil1 6 หลายเดือนก่อน
Could have made a fake list that just returns a value y(x) when accessing index x. That way you're not memory bound since the values can be calculated as they are accessed.
@gatecreed8384 6 หลายเดือนก่อน
Genius
@mikefochtman7164 2 หลายเดือนก่อน
Somewhat related, "How do you keep the list sorted?" Why, use a binary tree of course! Somewhat like picking up a hand of playing cards one at a time. Find the spot where it belongs using binary search, and insert the new item where it belongs.
Yes, I know there are more sophisticated trees, (i.e. red-black, avt, etc...), but the crux of the matter is, if you insert items in order in the first place, then you never have to sort the list.

ต่อไป

เล่นอัตโนมัติ

Hacking Out of a Network - Computerphile