Tries can't replace hash tables. A Trie is a prefix tree. They don't use the same logic. Yes, two nodes can share the same parent but that doesn't make it a hash table.
@@sameerplaynicals8790 I'm referring to application. EDIT: And in case that that's not clear, when I have a choice between trie and hash (and I always do for my particular application), I choose trie. So, your first statement is only true for certain requirements/uses, and that also applies to the reverse: Hash tables can't replace tries... depending on the requirements/uses.
Excellent description. Since C is now considered an "unsafe" language, it would be interesting to hear your thoughts on the security deficiencies of external chaining (or indeed linear addressing!).
C is not in and of itself an unsafe language, 'safe' languages keep you from shooting yourself in the foot, so to speak. But there is an efficiency cost to do that.
It seems like the best option would be to have a hash function that does provide a unique output for each input rather than trying to deal with collisions.
As a beginner programmer, I very much appreciate the video. Unfortunately the keyboard sounds were a bit much and had me constantly turning the volume up and down, to lower the keyboard sounds but still hear your voice.
Thanks for the comprehensive tutorial, EXTREMELY helpful for understanding hash tables with practice. [QUESTION]: For the hash_table_lookup function, why are you using *hash_table_lookup (the address)? I am a bit confused about that. Otherwise, great work! Looking forward to the next update!
Terrific, Jacob. My first attempt at creating a hash function produced collisions on single letter strings ('a', 'b', etc.), so I figured my algorithm (checksum multiplied by a large prime reduced by the table size) was defective. I then started experimenting with the 'b2sum' utility (published by the FSF) to produce a digest, but I still got collisions after applying the modulus operator to reduce the size of the array. After seeing your approach here, I'm no longer wary of creating a linked list to handle collision overflow. Iterating through a 10-item linked list is cheap computationally. It seems to me if you've got tons of memory like most machines do since the 'naughts, why can't you just have a 64K-element array, and skip the downsizing of the index when computing the hash?
Yes, you can make any size list as the base for a hash table. However, unless you're in an extremely speed-critical path, I would use a hash tree with dynamic buckets. For example, your root list could have 16 entries, and you use the first 4 bits of the hash to index it. On insert, when a linked list would get over a defined size, you replace them with another 16-element list and use the next 4 bits of the hash. This gives you a tree of up to 8 (or 16 for 64-bit hashes) levels of nodes and linked lists as leaves. If your hash function is worth its salt, you get MAX_INT buckets with very small lists in a structure that doesn't start out using max_int * pointer_size bytes.
@@HenryLoenwind Thx. Never occurred to me to create a tree of linked lists. I guess this is why people take a formal course in advanced data structures. Next I will explore how Python and Ruby implement Dictionaries and hashes, because lookup speed is often critical, especially in function memoization.
For those who might get the idea of storing age as a data field literally, never do that! Store the date of birth instead, and calculate the age at the time of the output, otherwise your data will be invalid in less than a year.
thank you for reminding me
It's better to check every frame if the year changed
@@urisinger3412 I'm sure you know that not everyone was born on January 1, right?
@@RustedCroaker then ask for age in years and days
@@urisinger3412... or just date of birth as I said in the first comment.
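A minimal sketch of the "store the date of birth, compute the age on output" advice from this thread. The helper names `age_on` and `age_now` are made up for illustration and are not from the video; dates are passed as plain year/month/day integers.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: whole years between a birth date and a given "today".
   Months are 1-12 and days are 1-31. */
int age_on(int birth_y, int birth_m, int birth_d,
           int today_y, int today_m, int today_d)
{
    assert(birth_m >= 1 && birth_m <= 12 && today_m >= 1 && today_m <= 12);
    int age = today_y - birth_y;
    /* Birthday not reached yet this year: one year younger. */
    if (today_m < birth_m || (today_m == birth_m && today_d < birth_d))
        age--;
    return age;
}

/* Hypothetical wrapper that takes today's date from the system clock. */
int age_now(int birth_y, int birth_m, int birth_d)
{
    time_t t = time(NULL);
    struct tm *now = localtime(&t);
    return age_on(birth_y, birth_m, birth_d,
                  now->tm_year + 1900, now->tm_mon + 1, now->tm_mday);
}
```

The record itself only ever holds the birth date, so nothing stored goes stale; the age is derived at the moment it is printed.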
The fastest guy to solve a segfault on earth x)
Absolutely.
It's not hard lol, you just get used to it after some time in C
@@dimi5862 yeah, now I've reached this level, it's been 10 months since the comment 😂
I'm dead 💀
there is no segfault, only bad code
It's amazing how nicely you explained this! You didn't just dump some code on us, you explained the whole thinking process, making adjustments on the way, teaching us how to think about the hashtable, not just how to copy it. I am so looking forward to watching your other videos, I really hope they will help me improve my data structure implementation skills. Online college classes weren't too favorable for me and I am having a hard time doing my assignments in time.
Glad it was helpful!
He did exactly the opposite. Started off good but messed it up throughout the way. Not the best at teaching I guess, however it was a valiant effort to show people how to be shit at teaching.
Man, I’m surprised you don’t have a million subscribers yet. This is the best channel on programming out there. I am currently reading “the Linux programming interface” which is a massive book and I always find myself coming to your channel if I don’t quite understand a certain topic! I hope a lot more people will recognize the value you provide here!
Good move to go through Linux Programming Interface, which you can always refer to, for Programming on the Linux Platform.
He is held back by his loud keyboard ;)
@@Daniel95221 mario or jacob ? or both !
Probably the best video i've seen about hash tables recently
probably the only channel I watch at speed of .75! Thanks for the great tutorial!
These videos make me fall in love more and more with programming. Although programming was my love at first sight. Great, clear and fun explanation. It was a pleasure to code along. As a beginner it was a miracle to me that you got only 1 segmentation fault (as I am not used to that lol).
best typing sound of all tutorials on youtube
At 11:44, in the strncmp() call, you need to pass MAX_NAME instead of TABLE_SIZE.
Thanks a lot Jacob that was super useful :)
I instantly saw this as well lol.
yeah was confused for a bit there as to why put the table size
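A small sketch of why the bound matters. The struct layout and the values of `MAX_NAME` and `TABLE_SIZE` below are assumptions for illustration, not copied from the video, and `name_matches` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NAME 256   /* assumed size of the name buffer */
#define TABLE_SIZE 10  /* assumed slot count; has nothing to do with name length */

typedef struct {
    char name[MAX_NAME];
    int age;
} person;

/* True when the record's name equals the key. The bound given to strncmp
   is the buffer size (MAX_NAME), not the number of table slots. */
bool name_matches(const person *p, const char *name)
{
    assert(p != NULL && name != NULL);
    return strncmp(p->name, name, MAX_NAME) == 0;
}
```

With `TABLE_SIZE` as the bound, only the first 10 characters would be compared, so two names that share a 10-character prefix (say "ChristopherA" and "ChristopherB") would be treated as the same key.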
i love that you started on data structures thank you so much this is helping me in my courses a lot
Glad I could help.
i dont mean to be off topic but does anyone know of a way to get back into an Instagram account??
I was dumb forgot my password. I would love any assistance you can offer me
Initially, I was looking for a simple explanation for hash tables for my CS50 class; they barely touched on it and it felt a bit ambiguous; yet, you are making it almost water-clear simple, thank you good sir for this great content and deep knowledge.
I actually took an excessive tour in your very informative channel that I completely forgot about the problem set.
Love you
Thanks. I'm glad it helped.
deym! You code really fast. I'm getting movie hacker vibes whenever I hear you typing the code
It’s fastforwarded
@@soroushmasoodian Nope, he utilized the text editor well and has fast hands.
he definitely fastforwards at some points
This is the best video on Hash Tables that i ever encountered. Thank you so much for making it so clear to understand.
what Mario said.. "Amazing no one teaches programming like you do" - Must've taken a lot of hard work on your part.. Love the content.
Thanks.
The other day I read the chapter in CLRS about hash tables and it left me quite confused at some points. Now everything is clearer, thanks a lot!
Glad I could help.
Love the video, but i absolutely love watching you code with that keyboard sound. Its so satisfying
Wow, this is an excellent tutorial. Trying to brush up on my C and this content is exactly what I wanted!
the speed is a little too fast for me, like i wasn't able to keep up with the speed of typing. but i understood the whole concept. so this video is a miracle for me.
This is the best video on YouTube on this topic. Thanks very much sir
I saw your video because I had no other choice for a hashing implementation in C. I was scared of you being fast, so I had to watch it at 0.8x. Now I have implemented my first hash code because of your help. Thank you so much.
God Bless You. And one more thing You are really HandSome.
Wow. There are comments, and then there are comments. I'm glad you enjoyed it. :)
I'm really impressed how casually you're hacking this code. I was always afraid of implementing my own dictionary. Maybe I'll give it a try next time I need a dict. Well done!
Of course you've still got lots of issues:
- dynamic table size
- table memory management (create / destroy / grow / shrink)
- abstract linked list node as struct for carrying arbitrary data
- create hash function for arbitrary data
- ...
I just love how your keyboard sounds
Leaving a comment is like, helping others that need help as this increases the reach of the video as well as the like,
so make a habit of commenting on videos you find helpful even if it's just a period '.'
and also, remind others with that
This mad lad is actually writing his own hash functions! Egads!
My favorite "hash" structure is either an array (fixed size) or (when allocating) a binary tree. If I use an array I add a binary search. Sure, you keep on moving memory when inserting or deleting, but [a] searching is O(log n) and [b] the code is pretty straightforward. Binary trees also feature an O(log n) lookup, but due to the pointers, they require about twice the memory. Both structures hardly suffer from performance degradation.
My favorite hash is FNV-1a. It's got both 32bit and 64bit versions, easy to implement, very fast and collisions are very rare. E.g. costarring collides with liquid, declinate collides with macallums, altarage collides with zinke. I think you catch my drift. ;-)
I did "classical" hash tables, but buckets are simply too much of a hassle - so I can't bother anymore. I find myself using the binary search/hashtable cross-over most of the time. Very often "good is good enough".
Hello Jacob, I am already a subscriber of your channel and I find it very informative every time I watch your videos. Though I have been working in C for the last 16 years, I still learn something new every time I watch your videos. Keep up your good work and keep adding good stuff as you have been doing, for your fans like me.
Thanks. I'm glad it's helped.
Everything is going so quickly I had to slow it down to 0.75x playtime, so then you sounded drunk so I thought I'd have a beer too. Now I'm watching Star Wars drunk.
This is so freaking great! Thank you for this! I'm a python programmer but I am always taking a peek at C because I have unaddressed urges to dabble in low-level programming sometime (maybe for C extensions to optimize my python projects). What a great way to learn C - making hash tables.
You're welcome. Let me know if there are other topics you would like to see on here.
C is amazing!
Scripting language programmers, including users of the unix shell, are a bit spoiled because the interpreters ship with excellent built-in hash-like data structures, like Bash associative arrays, Perl hashes, and Ruby and Python dictionaries. Even Windows Powershell ships with a fast associative array implementation. It's very useful, and practically mandatory, for certain algorithms in C to implement hashtables, and I wonder how the gcrypt library "stacks" up against simpler home-brewed hashing functions. You can't memoize a function without implementing a hashtable in your program.
the more experience i have in c and c++ the less i think about it as low level programming. There should be another term for that. Most of the time when programming in C you are disconnected from cpu architecture.
That was super awesome. Working on a small project that uses all this info. THANK YOU.
I finally get it. For a long time I've been 'accepting' that hash tables have faster look ups than arrays without understanding why that is.
Thanks for this great tutorial. For future videos, please give an additional second or two after writing a function to allow the viewer time to pause to see the code. It's extremely distracting with all the Visual Studio Code popups that cover the code you're writing as well and it's sometimes tough to find a split second to pause the video to see the code before you jump to something else.
I was coding in parallel with him and I had no problem pausing videos. I was surprised I wasn't getting segmentation faults all the time, but that's because every time I wrote code which ended up being different than his, I would rewrite that part of the code lol
You can run the video slower!
Why though? You can just pause the video? Or the play speed.
@@futuza I usually do pause videos, but the cut happens quickly from when the last line of code is written and the next scene begins, so it's more challenging than it should be to pause just in time
He probably wants you to pay for his Patreon to view the code.
This was really helpful for me, thanks! I started my project using arrays for hash table and I could already tell midway I was gonna have a hard time doing anything with the elements.
best introduction video Ive seen in the internet for that matter. thank you!
The best tutorial so far on hash function in C. Thank you. How do we come up with an optimal hash function for the data structure? Is maximum randomness the target?
Usually. If you're trying to minimize collisions, then maximum randomness (and fast) is usually the goal.
Is there a procedure one can follow to find a good/optimal hash function? I usually use something like Effective Java's hashcode impl like the one mentioned in [1] and assume it's good enough.
[1] stackoverflow.com/questions/113511/best-implementation-for-hashcode-method-for-a-collection
FYI: There is a thing called a "perfect hash." This is a hash that is tailored for a specific set of inputs, and will produce distinct values for each of them. If you know *all* the possible inputs in advance -- like when you are parsing language keywords -- then perfect hashes can be useful. (Ask the Duck about "gperf" for a software package that provides these functions.)
For any other scenario, you are looking at doing the best you can with what you've got. Some "cryptographic hash" functions are really good at providing seemingly random output for small changes in input, but they are slow. Now you're trading off speed vs. behavior. Most commonly-used hash functions opt for some simple rules: use 2**n buckets, use prime numbers to multiply, etc.
If you really want to find good hash functions, look at the source code for programming languages that provide "associative array" or "hash" or "dictionary" data types: awk, perl, ruby, python, java, c++. You can find hash functions that have been looked at and tweaked by a lot of people over many years, which hopefully provides a good performance vs. behavior trade-off.
@@austinhastings8793 Just one note: you do not simply use 'prime numbers' to multiply, you need a number that is relatively prime to the table size. If you have a size 100 table, you do not want to use 5 as your multiplier, as it will map 100 objects into 20 spaces, which will give more collisions. Now 3 or 7 in that case will be better.
I never thought about hash functions, or tables, in that way; which is surprising because I used to be quite enthusiastic about brown rocks from Morocco.
Very nice!👍👍👍Thank you! As an undergrad, we mostly used Sedgewick's algorithms book (in Java). Nice to see it done in C!
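The "relatively prime to the table size" point can be checked with a few lines of C. This is only a sketch: `slots_reached` is a hypothetical helper, and the keys are simply the integers 0 to 99.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_SIZE 100

/* Counts how many distinct slots (k * multiplier) % TABLE_SIZE reaches
   for the keys k = 0 .. TABLE_SIZE - 1. */
int slots_reached(unsigned multiplier)
{
    assert(multiplier < 1000u);   /* keeps k * multiplier far from wrapping */
    bool used[TABLE_SIZE] = { false };
    int count = 0;
    for (unsigned k = 0; k < TABLE_SIZE; k++) {
        unsigned slot = (k * multiplier) % TABLE_SIZE;
        if (!used[slot]) {
            used[slot] = true;
            count++;
        }
    }
    return count;
}
```

`slots_reached(5)` gives 20, because 5 shares a factor with 100 and only multiples of 5 are ever hit, while `slots_reached(3)` and `slots_reached(7)` both give 100.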
Are you a professional teacher because you explained this perfectly. Subscriber and waiting for more data structures and C videos.
Thanks. Glad you enjoyed it. And, yes, I'm a professor by day.
I was lately studying data structures, trying to wrap my head around what witchcraft hash tables use, and this video just popped up in my recommendations lol
Man I love the way you teach
Thanks.
It's been a while since I've done hash tables, though I used a few at work. I do prefer external chaining (I think they were called buckets). Quadratic probing is the way to go, though. In large structures you can get bad clustering, so what I used to do was use a variable quadratic step based on the number of objects, with room to add more, and then just rehash when the table filled up too much. The rehashing was inefficient, but it kept the structure performant for lookups and clustering to a minimum. Nice video
Thanks. Glad you enjoyed it. And, thanks for the added perspective.
Hash tables is a favourite. watch out for modulo bias!
As a C++ programmer I use the std::map and std::unordered_map for this purpose but this video will be useful for me if in some situations I would not be allowed to use them. Thanks!
Been bumping across channels looking for... this! Best hash table tutorial vid out there right now. Thank you!
You're welcome. Thanks for the support.
Your thoughts are super fast! lol It's helpful for me to do cs50 problem set 5, thanks!
lmao, I'm also doing it for pset5. What do you think was the hardest pset you've done in cs50 (except for tideman)?
Simply awesome.. C is even more beautiful with your code ..
Thanks. Glad you're enjoying it.
First time i used 0.75x to get smth clear is here.
I appreciate it.
very well explained. the only thing i think could've been clarified better was the optimization with "deleted". it was brushed off as obvious but i had to pause and think why
great explanation, of a hash table by starting out with a simple example and building on it.!!!!
Woot! Was that sped up a bit? Strangely enough it seems like it makes it easier to understand. ( I'll be watching it again )
Wow I don’t even know C and I understand this video perfectly! Clear concise explanation ! Probably also because every language I know is based on C in some shape or form haha.
You've helped me tremendously in my computer science course, thank you.
Welcome. Glad I could help.
Great video, i loved how you avoided using pointers to keep things simple
Not mentioned is that clients using this facility to retrieve & modify any 'person record' must NOT change the value(s) in the field(s) used by the hash function. Correcting the spelling of "Jane" to "June" would likely leave that person's record filed in the wrong location, never to be found again...
Because delete() returns the ptr to the record, to modify safely is to delete, {modify} and re-insert.
The "Open" version mistakenly used "TABLE_SIZE" for strncmp().
Corrected to "MAX_NAME" in "linear probing" version.
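The delete, modify, re-insert pattern described above could look roughly like this. It is a sketch over a tiny externally chained table. The struct, the toy hash and all function names are assumptions for illustration, not the code from the video.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 256
#define TABLE_SIZE 10

typedef struct person {
    char name[MAX_NAME];
    int age;
    struct person *next;
} person;

static person *table[TABLE_SIZE];

/* Toy hash over the name; any hash of the key field works for this demo. */
static unsigned hash(const char *name)
{
    unsigned h = 0;
    for (size_t i = 0; i < MAX_NAME && name[i] != '\0'; i++)
        h = (h * 31 + (unsigned char)name[i]) % TABLE_SIZE;
    return h;
}

void table_insert(person *p)
{
    unsigned i = hash(p->name);
    p->next = table[i];
    table[i] = p;
}

person *table_lookup(const char *name)
{
    person *p = table[hash(name)];
    while (p != NULL && strncmp(p->name, name, MAX_NAME) != 0)
        p = p->next;
    return p;
}

/* Unlinks and returns the record, or NULL if it is not in the table. */
person *table_delete(const char *name)
{
    person **link = &table[hash(name)];
    while (*link != NULL && strncmp((*link)->name, name, MAX_NAME) != 0)
        link = &(*link)->next;
    person *found = *link;
    if (found != NULL) {
        *link = found->next;
        found->next = NULL;
    }
    return found;
}

/* Changing the key safely: take the record out, edit it, file it again. */
bool rename_person(const char *old_name, const char *new_name)
{
    person *p = table_delete(old_name);
    if (p == NULL)
        return false;
    strncpy(p->name, new_name, MAX_NAME - 1);
    p->name[MAX_NAME - 1] = '\0';
    table_insert(p);
    return true;
}
```

Editing `p->name` in place after a lookup would leave the record in the bucket chosen for the old name, which is exactly the "never to be found again" case.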
Not gonna lie I feel so overwhelmed but I think that's because I am still a noob.
Great video you explained everything so well I just have a skill issue, hope I'll get better.
Wow - I mean the typing speed is out of this world
I was looking for this exact thing. Thank you very much for Explaining this concept in the most effective way.
U deserve more popularity n views sir....
it takes time, but someday A LOT of people will start subscribing to u
Super clear explanation! The playback speed is such a great feature.. I normally use it while trying to pick up on a fast guitar lick and here it helps to slow your speedy typing down. Easier to follow ;-) Thanks
This was a good video, and thanks for the effort, but it was by far one of the hardest videos for me to follow. Most of that is on me, but in my opinion a few things could improve future videos. First, please close the file browser tab when it is not needed; the capture area is small, and closing it would make more of the code visible. Second, please scroll more slowly, or slow down the explanation overall, so noobs like me can follow. Last but not least, programming videos without source code are, in my opinion, only half as useful. Of course, you could offer the source as a paid feature, but I think that would reduce the impact of your work.
All in all, thank you so much and keep up the good work.
One more benefit of external chaining is that if your hash function is not random enough, it can be easily diagnosed just by looking at the linked lists' lengths in proportion to the hash table's empty cell count.
Good point. Thanks.
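The diagnostic suggested above could look roughly like this. It is a sketch under assumptions: the node type, the chain_stats struct and measure_chains() are hypothetical names introduced for illustration, and the table is taken to be an array of chain heads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain node; a real record would carry a key and data too. */
typedef struct node {
    struct node *next;
} node;

typedef struct {
    size_t empty_buckets;   /* slots with no chain at all */
    size_t longest_chain;   /* length of the longest linked list */
    size_t total_entries;   /* number of nodes across all chains */
} chain_stats;

/* Walks every chain once and summarizes how evenly entries are spread. */
chain_stats measure_chains(node *const table[], size_t table_size) {
    assert(table != NULL || table_size == 0);
    chain_stats stats = { 0, 0, 0 };
    for (size_t i = 0; i < table_size; i++) {
        size_t length = 0;
        for (const node *n = table[i]; n != NULL; n = n->next) {
            length++;
        }
        if (length == 0) {
            stats.empty_buckets++;
        }
        if (length > stats.longest_chain) {
            stats.longest_chain = length;
        }
        stats.total_entries += length;
    }
    return stats;
}
```

Many empty buckets alongside a few long chains would suggest the hash function is clustering keys rather than spreading them.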
You are an excellent teacher. Thank you for these videos.
You're welcome.
Thanks bro... Love from india
Thank you so much! To me this is the best explanation! It actually helps me a lot with my project at university.
This is just blowing my mind
Thanks for the video, man! I'm doing Harvard's CS50 course, we have to use a hashtable in one of the problems and your explanation is much clearer and more in depth than the one provided in the course =) I have a silly question that is not related to hashtables themselves, but I think it will help me understand more of C in general: why did you use strnlen/strncmp in this video and not the usual strlen and strcmp? What does the 'n' in strnlen stand for? I've read a bit about it online but couldn't understand it well.
Thanks, Joao. Glad it helped. For the question, check out this video about strings and security concerns. th-cam.com/video/7mKfWrNQcj0/w-d-xo.html
Good job noticing the "n" in strnlen! I was also wondering why MAX_NAME was passed to the function and came here to post the question in the comments. Glad I went through them first.
I am also doing CS50 and am trying to go line by line through Jacob's hash table first to better understand the implementation of the data structures. His examples were quite useful throughout the course.
The strnlen() function, though, cannot be used with CS50x's default makefile, since strnlen does not seem to be part of standard C99. You can compile the code differently, of course, but for the sake of playing around in CS50x I'm not sure that is necessary. In real-world implementations strnlen is quite important, as Jacob explains in the video.
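For anyone stuck on a strict C99 build, here is a minimal sketch of what the "n" buys you. strnlen comes from POSIX rather than ISO C, so the bounded_len() helper below is a hypothetical stand-in written with a plain loop; strncmp, by contrast, is standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 256

/*
 * Hypothetical C99 stand-in for POSIX strnlen(): counts characters up to
 * the terminating '\0', but never looks at more than max_len bytes.
 */
size_t bounded_len(const char *s, size_t max_len) {
    assert(s != NULL);
    size_t n = 0;
    while (n < max_len && s[n] != '\0') {
        n++;
    }
    return n;
}

/* Compares two names, looking at no more than MAX_NAME characters. */
int names_equal(const char *a, const char *b) {
    return strncmp(a, b, MAX_NAME) == 0;
}
```

The bound matters when a buffer might be missing its terminator: plain strlen would keep reading past the end of the array, while the bounded version stops at max_len.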
Not a fan of hash tables (I prefer tries), but I really appreciate your information and how you present it. Well done!
Thanks. Tries are coming. 😀
Tries can't replace hash tables. A Trie is a prefix tree. They don't use the same logic. Yes, two nodes can share the same parent but that doesn't make it a hash table.
@@sameerplaynicals8790 I'm referring to application.
EDIT: And in case that's not clear, when I have a choice between trie and hash (and I always do for my particular application), I choose trie. So, your first statement is only true for certain requirements/uses, and that also applies to the reverse: Hash tables can't replace tries... depending on the requirements/uses.
Thanks for sharing, you just earned another subscriber
Awesome, thank you!
Thank you for giving this tutorial...man you are awesome.....
Thanks. Glad it was helpful!
Your videos have taught me a lot !! THANK YOU !
Fantastic explanation Jacob.
You r a BIG TOP G.
i agree
This video is very helpful to me, thanks so much!
Great video, very helpful. One thing that I didn't fully understand is how to determine the size of the hash table. If anyone knows, please do tell :)
Very clear and great example!!!
Very good and clear explanations. Thanks. One small note: please talk more slowly. For non-native English speakers it is way too fast. Thanks again
If you click on the "Gear" icon, there is a "Playback Speed" option. Set it to 0.75 and try listening to it that way.
@@austinhastings8793 Thanks man! I know that option but it is not as clear as the original
You should make an ASMR video. Your typing sound is oddly satisfying. Also, TBH, nice explanation sir!
Excellent description. Since C is now considered an "unsafe" language, it would be interesting to hear your thoughts on the security deficiencies of external chaining (or indeed linear addressing!).
C is not in and of itself an unsafe language; 'safe' languages keep you from shooting yourself in the foot, so to speak. But there is an efficiency cost to doing that.
@@jimnoeth3040 Since in practice we are addicted to shooting ourselves in the foot, same difference.
Super creative! Thank you for opening my mind limits! :D
I will patiently wait for stacks and binary lists
Stacks are coming. Binary Trees are coming.
🎉🎉
Fewer likes because of your fast speech; is someone running after you? 😃
Thanks for your effort btw!
Your videos are great. I need to watch them to do CS50's data structure problem.
One little optimization that can be done is to avoid writing DELETED_NODE if the next table entry is NULL.
Best explanation.....don't need better than this. Thumbs up!!!!
Nice explanation, your video helped me do my assignment!
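That optimization can be sketched for a linear-probing table as below. Treat it as an illustration under assumptions: the toy hash() is made up, and the DELETED_NODE sentinel here is the address of a static dummy record (a substitution chosen for portability, not necessarily how the video defines it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 256
#define TABLE_SIZE 10

typedef struct {
    char name[MAX_NAME];
    int age;
} person;

/* Tombstone sentinel: the address of a dummy record (an assumption). */
static person deleted_marker;
#define DELETED_NODE (&deleted_marker)

person *hash_table[TABLE_SIZE];

/* Toy hash: sum of the name bytes (hypothetical, collides easily). */
unsigned int hash(const char *name) {
    unsigned int sum = 0;
    for (size_t i = 0; i < MAX_NAME && name[i] != '\0'; i++) {
        sum += (unsigned char)name[i];
    }
    return sum % TABLE_SIZE;
}

bool hash_table_insert(person *p) {
    assert(p != NULL);
    unsigned int index = hash(p->name);
    for (unsigned int i = 0; i < TABLE_SIZE; i++) {
        unsigned int slot = (index + i) % TABLE_SIZE;
        if (hash_table[slot] == NULL || hash_table[slot] == DELETED_NODE) {
            hash_table[slot] = p;
            return true;
        }
    }
    return false;                       /* table is full */
}

person *hash_table_lookup(const char *name) {
    unsigned int index = hash(name);
    for (unsigned int i = 0; i < TABLE_SIZE; i++) {
        unsigned int slot = (index + i) % TABLE_SIZE;
        if (hash_table[slot] == NULL) {
            return NULL;                /* a never-used slot ends the probe */
        }
        if (hash_table[slot] != DELETED_NODE &&
            strncmp(hash_table[slot]->name, name, MAX_NAME) == 0) {
            return hash_table[slot];
        }
    }
    return NULL;
}

person *hash_table_delete(const char *name) {
    unsigned int index = hash(name);
    for (unsigned int i = 0; i < TABLE_SIZE; i++) {
        unsigned int slot = (index + i) % TABLE_SIZE;
        if (hash_table[slot] == NULL) {
            return NULL;
        }
        if (hash_table[slot] != DELETED_NODE &&
            strncmp(hash_table[slot]->name, name, MAX_NAME) == 0) {
            person *found = hash_table[slot];
            unsigned int next = (slot + 1) % TABLE_SIZE;
            /* If the next slot is NULL, no probe sequence continues past
               this one, so a tombstone would protect nothing. */
            hash_table[slot] = (hash_table[next] == NULL) ? NULL : DELETED_NODE;
            return found;
        }
    }
    return NULL;
}
```

The tombstone only exists so that lookups keep probing past a removed entry; when the following slot is already empty, a lookup would stop there anyway, so clearing the slot to NULL is safe and keeps later probes shorter.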
This video is SOOOO helpful to me!!! Thank you so much!
Awesome-ly put! Great explanations
It seems like the best option would be to have a hash function that does provide a unique output for each input rather than trying to deal with collisions.
Fantastic coverage of this! Thanks so much!
Phenomenal video. Very clear explanation and code, also, I had a feeling as if I was watching 3blue1brown, but for programming.
Explained well and good example. Thanks!
Mans is a BEAST
As a beginner programmer, I very much appreciate the video.
Unfortunately the keyboard sounds were a bit much and had me constantly turning the volume down to soften the typing and then back up to hear your voice
Yeah, I've tried to improve my audio setup in my more recent videos. Hopefully, they're heading in the right direction.
@@JacobSorber glad to hear it. I feel bad even giving criticism of this video as I greatly appreciate it. But it caused me pain lol
Thanks for the comprehensive tutorial, EXTREMELY helpful for understanding hash tables in practice. [QUESTION]: For the hash_table_lookup function, why are you using *hash_table_lookup (the address)? I am a bit confused about that. Otherwise, great work! Looking forward to the next update!
One comment, I don't know if someone else mentioned it, but for best randomization, the table size should be a prime number.
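A small sketch of the effect behind the prime-size advice, assuming the keys share a common factor with the table size and are reduced with a plain modulus. The buckets_used() helper and MAX_BUCKETS limit are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUCKETS 64   /* illustration-only upper bound on table size */

/* Counts how many distinct buckets a set of keys lands in under key % size. */
size_t buckets_used(const unsigned int *keys, size_t key_count, size_t table_size) {
    assert(table_size > 0 && table_size <= MAX_BUCKETS);
    bool used[MAX_BUCKETS] = { false };
    size_t count = 0;
    for (size_t i = 0; i < key_count; i++) {
        size_t bucket = keys[i] % table_size;
        if (!used[bucket]) {
            used[bucket] = true;
            count++;
        }
    }
    return count;
}
```

With the keys 0, 4, 8, ..., 52 (all multiples of 4), a table of size 8 uses only 2 buckets, while a table of size 7 uses all 7. A hash function that already mixes its bits well makes the choice of size matter less.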
20:33 lol @ "when I'm concerned about performance I use linked lists". worse is better when it comes to caches
I love the way you explain. Thanks a lot. God bless you
Glad I could help.
What keyboard are you using? Great video btw 👍🏻
+1 Also wants to know
Just my laptop keyboard
Kudos to all who can understand and follow along
Terrific, Jacob. My first attempt at creating a hash function produced collisions on single letter strings ('a', 'b', etc.), so I figured my algorithm (checksum multiplied by a large prime reduced by the table size) was defective. I then started experimenting with the 'b2sum' utility (published by the FSF) to produce a digest, but I still got collisions after applying the modulus operator to reduce the size of the array. After seeing your approach here, I'm no longer wary of creating a linked list to handle collision overflow. Iterating through a 10-item linked list is cheap computationally.
It seems to me if you've got tons of memory like most machines do since the 'naughts, why can't you just have a 64K-element array, and skip the downsizing of the index when computing the hash?
Yes, you can make any size list as the base for a hash table. However, unless you're in an extremely speed-critical path, I would use a hash tree with dynamic buckets.
For example, your root list could have 16 entries, and you use the first 4 bits of the hash to index it. On insert, when a linked list would get over a defined size, you replace them with another 16-element list and use the next 4 bits of the hash. This gives you a tree of up to 8 (or 16 for 64-bit hashes) levels of nodes and linked lists as leaves. If your hash function is worth its salt, you get MAX_INT buckets with very small lists in a structure that doesn't start out using max_int * pointer_size bytes.
@@HenryLoenwind Thx. Never occurred to me to create a tree of linked lists. I guess this is why people take a formal course in advanced data structures. Next I will explore how Python and Ruby implement Dictionaries and hashes, because lookup speed is often critical, especially in function memoization.
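The indexing step of the hash tree described above can be sketched like this. It assumes a 32-bit hash consumed 4 bits per level starting from the most significant bits; child_index() and the constants are hypothetical names, and the node replacement logic on insert is left out.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_LEVEL 4u
#define FANOUT (1u << BITS_PER_LEVEL)        /* 16 children per tree node */
#define MAX_LEVELS (32u / BITS_PER_LEVEL)    /* 8 levels for a 32-bit hash */

/*
 * Returns which of the 16 children to follow at the given tree level.
 * Level 0 uses the top 4 bits of the hash, level 1 the next 4, and so on.
 */
unsigned int child_index(uint32_t hash, unsigned int level) {
    assert(level < MAX_LEVELS);
    unsigned int shift = 32u - BITS_PER_LEVEL * (level + 1u);
    return (unsigned int)((hash >> shift) & (FANOUT - 1u));
}
```

For the hash value 0xA1B2C3D4, level 0 selects child 0xA, level 1 selects child 0x1, and the last level (7) selects child 0x4.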