strager had a whole video on hash tables and it turns out that a better hash function based on the understanding of the keys going into the table equals a MUCH faster hash table! 🎉
That video is completely miselading or at the very least it seems it has mislead you. Whats being implemented in this video is a general purpose hash table, what the video you mentioned shows is a perfect hash table that only works with a predefined, hardcoded set of words because he needed that for a JavaScript compiler. Those are two completely different problems and tbh his solution is quite dumb because for such a specific problem and such a limited set of keywords if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table.
@@marcossidoruk8033 Is it really faster tho? If that is the case, then a genera purpose language with good mata-programming features (like D) can easily create a library that does that!
it’s pretty obvious strager didn’t really know what he was talking about and just made the video for the clickbait. Did you see the thumbnail? On his stream today he said he didn’t even go to college! How anybody can take software engineering advice from him is beyond me.
> if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table. You should leave a comment on my video with your suggestion.
Thank you for the helpful video and C videos in general. I encourage and promote the understanding that C has advantages over OO, even though people may not understand that yet (OO is merely a code format, and an inflexible one at that).
If you want to make a generic hash table, you need a lot of helper functions that knows the type that you would need to pass along, that gives a lot of extra parameter. Or you could just pass along a point to a struct with all those extra functions, each of them function pointers for that type. Some of the extra pointers could be print_obj, destroy_obj, initialize_obj, copy_obj, assign_obj etc.
What I really enjoyed programming and found incredibly useful during my 1.5 years of C assignments was to write my own vector implementation. A basic one is only about 50 lines of code. Because my uni also requires us to free() *all* allocated memory manually, I was then able to write void *my_malloc(size_t count, size_t size, char *description): a malloc() wrapper that stores the new address in one of those vectors. I could then call print_allocations() and free_allocations() at the end of my main()! Very nice during debugging.
@@Urre5 I presume it is because in most of our projects we don't have any loops that would force us to free memory. So they just want to make sure we are aware of how to use free() properly. On some systems the OS might not do it for us at the end of the program too.
@@sanderbos4243 yeah I was hoping it's because of the latter part, but they should be explicit, and even in particular teach you not to free on systems which will clean things up, because otherwise you'll waste the users time when exiting the program. Then again if it's a nice arena or something where you have all your allocations it shouldn't take too long anyway
@@Urre5 Totally agree, we pretty much learn to use malloc() and free() however we like, as long as we don't have leaks. We're not told basic performance stuff like it maybe being braindead to use malloc() and free() unnecessarily all over the place, and without telling us about stuff like big O. Every exercise is a PDF, and our school (look up Codam or 42 school) deliberately doesn't have any teachers nor books we have to read, so everyone helps each other, and we spend a ton of time reading up online. It's incredibly freeing since the school is open 24/7 and you aren't required to be there for that many hours per week, but it isn't for everyone, since it's your own responsibility to become an awesome programmer. Oh, and it's completely free. :)
wow thank you for strcspn, i've been looping through my character arrays for a long time to try and format them into null-terminated strings without any return or newlines. Sweet function
I thought you stopped syaing without further ado. hahah. very good video. The reason I am good at c is because I have all your videos I can reference. :D
Just had to implement an open addressing hash table using linear probing and and double hading to reduce clustering - and I validated that, yes, double hasing does reduce clustering and the second has function can be very cheap and practically no cost. In my case the has table is allicated up front to some size and does not have to be increased in size over operational life time. Only keys are stored in the has table so a lookup returns an index. So the data resolved to is kept in a separate array that is of the same max entries size as the hash table itself. So the very same hash table can be used to lookup different data structure values depending on context - something that is the case in my domain.
Is the calloc call at 13:16 not incorrect? I thought it was number of elements then size of t. Right, it still will allocate the same amount of memory, I just would have expected that you would need to cast to make the compiler happy. What am I missing?
@@austinraney From the manpage, I cannot find any evidence that swapping the arguments could ever mess things up. I would however argue that it could potentially hurt performance, because the `calloc` implementation might use that extra bit of information you provide by specifying what's the amount of items and what's the size. For example, it might try to align the memory such that no single item in the array will cross a page boundary. For this, it would need to know what's the element size. Interestingly, I haven seen more people supplying the arguments in the wrong order than doing it correctly.
Update: I checked the libc implementation on my machine, and found that it doesn't care: It multiplies the values, makes sure no overflow happened, and then just goes on to allocate a block of memory as large as the result of the multiplication. Other libc implementations might differ.
@@Uerdue thanks for doing some digging and sharing! It’s much appreciated! I was curious in particular about potential page alignment problems like you mentioned.
Hi there, for a School project i do have to use a double linked list to record frequent items ( for each node i ll put the density and the time it was updated) however the problem is that i have to "organize my double linked list as a hash table using a function H" How is it possible? It also said that i do have to make the pointer to points to the next in term of time of updating
Just to be cheeky, the thing is, you don't have to type text from video anyway, you can use optical character recognition, I like to do `spectacle -r -b -o /tmp/screenshot.png && tesseract /tmp/screenshot.png stdout --psm 6 | xclip -sel clip` set to a keybind, then you can just paste into your favourite editor ;)
Just know that malloc and calloc are implemented using a heap structure, which is much more complex than a hash table. Why not creating this case study without malloc and calloc. Through static memory allocation. The usage of malloc and calloc slow down the logic dramatically. Also, if your hash table size and the structure size can be binary calculated through bit shifts, the key calculation can be made using a rotating random binary calculation. Which will result in blazingly fast key calculation.
I have a question may be relevant or entirely unrelated. I'm not actually sure. In short, I have a problem, the solution to which could well be a hash table. My keys are known to be unique and equal length, that is to say in terms of bytes no more than a few, or not strings, but rather integers, if you so desire. The question the becomes, is there a special category of hash functions (or other method), which can convert these keys into values in a given range, naturally ranging from 0-(n-1) for n items? Mind you I can think of other solutions for doing what I need to do, which is basically a runtime defined and/or modified switch case like functionality, but none I can think of are entirely well suited. Thanks for your efforts.
in my experience, it generally isn't necissary on modern computers. The odds that so much of your computer's total memory is being taken up by other processes, so much so that the program fails to allocate a comparatively small amount of memory, is small enough on modern computers that you can reasonably rule it out. Now, is it good practice to check anyway? Yes, absolutely. But it ultimately doesn't end up making that big of a difference
Those look like ligatures. You need a font with code-oriented ligatures, and you need an editor which supports ligatures. I don't know what font Sorber uses, but Fira Code is a popular font which has ligatures.
What I'd like to see is a vid on the new lattice based encryption algorithm, be one I'd definitely save for later and I'm sure a number of peops here would end up using it in future jobs or existing ones if they have them.
It's not a bad start, but it would help if you made it slightly more generic. You could use _Generic and specialize on a few known types, or you could use a union and associate traits with whatever user defined data gets passed in. It would help to have the user pass in a hashing function and a comparison function and have flags to determine if data should be copied or merely pointed to.
Very interesting video dr. Jacob, I'm glad you decided to expand on the last video.
I'm going to have to watch this 10 times.
(TYSM for the amazing material < 3)
strager had a whole video on hash tables and it turns out that a better hash function based on the understanding of the keys going into the table equals a MUCH faster hash table! 🎉
That video is completely miselading or at the very least it seems it has mislead you.
Whats being implemented in this video is a general purpose hash table, what the video you mentioned shows is a perfect hash table that only works with a predefined, hardcoded set of words because he needed that for a JavaScript compiler.
Those are two completely different problems and tbh his solution is quite dumb because for such a specific problem and such a limited set of keywords if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table.
Folks, look at the number of thumbs up SimGunther received, then look at marcossidoruk ones: welcome to the software industry.
@@marcossidoruk8033 Is it really faster tho? If that is the case, then a genera purpose language with good mata-programming features (like D) can easily create a library that does that!
it’s pretty obvious strager didn’t really know what he was talking about and just made the video for the clickbait. Did you see the thumbnail? On his stream today he said he didn’t even go to college! How anybody can take software engineering advice from him is beyond me.
> if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table.
You should leave a comment on my video with your suggestion.
Thank you for the helpful video and C videos in general. I encourage and promote the understanding that C has advantages over OO, even though people may not understand that yet (OO is merely a code format, and an inflexible one at that).
If you want to make a generic hash table, you need a lot of helper functions that knows the type that you would need to pass along, that gives a lot of extra parameter. Or you could just pass along a point to a struct with all those extra functions, each of them function pointers for that type.
Some of the extra pointers could be print_obj, destroy_obj, initialize_obj, copy_obj, assign_obj etc.
What I really enjoyed programming and found incredibly useful during my 1.5 years of C assignments was to write my own vector implementation. A basic one is only about 50 lines of code. Because my uni also requires us to free() *all* allocated memory manually, I was then able to write void *my_malloc(size_t count, size_t size, char *description): a malloc() wrapper that stores the new address in one of those vectors. I could then call print_allocations() and free_allocations() at the end of my main()! Very nice during debugging.
Did they say why you had to free stuff at the end of main
@@Urre5 I presume it is because in most of our projects we don't have any loops that would force us to free memory. So they just want to make sure we are aware of how to use free() properly. On some systems the OS might not do it for us at the end of the program too.
@@sanderbos4243 yeah I was hoping it's because of the latter part, but they should be explicit, and even in particular teach you not to free on systems which will clean things up, because otherwise you'll waste the users time when exiting the program. Then again if it's a nice arena or something where you have all your allocations it shouldn't take too long anyway
@@Urre5 Totally agree, we pretty much learn to use malloc() and free() however we like, as long as we don't have leaks. We're not told basic performance stuff like it maybe being braindead to use malloc() and free() unnecessarily all over the place, and without telling us about stuff like big O. Every exercise is a PDF, and our school (look up Codam or 42 school) deliberately doesn't have any teachers nor books we have to read, so everyone helps each other, and we spend a ton of time reading up online. It's incredibly freeing since the school is open 24/7 and you aren't required to be there for that many hours per week, but it isn't for everyone, since it's your own responsibility to become an awesome programmer. Oh, and it's completely free. :)
@@Urre5you should still free the memory + im not aware why it should waste time
Thank you Jacob for making these videos.
wow thank you for strcspn, i've been looping through my character arrays for a long time to try and format them into null-terminated strings without any return or newlines. Sweet function
I thought you stopped syaing without further ado. hahah. very good video. The reason I am good at c is because I have all your videos I can reference. :D
10:25, you already typedef'd it, you don't need another typedef, rather that's just asking for compile time errors
A 41 minute Jacob Sorber video? I'm in for a ride
Just had to implement an open addressing hash table using linear probing and and double hading to reduce clustering - and I validated that, yes, double hasing does reduce clustering and the second has function can be very cheap and practically no cost.
In my case the has table is allicated up front to some size and does not have to be increased in size over operational life time.
Only keys are stored in the has table so a lookup returns an index. So the data resolved to is kept in a separate array that is of the same max entries size as the hash table itself. So the very same hash table can be used to lookup different data structure values depending on context - something that is the case in my domain.
Looking forward to part 2
Is the calloc call at 13:16 not incorrect? I thought it was number of elements then size of t. Right, it still will allocate the same amount of memory, I just would have expected that you would need to cast to make the compiler happy. What am I missing?
Having thought about it for a second, I guess the calls would be functionally equivalent. Are there ever cases when they aren’t?
@@austinraney From the manpage, I cannot find any evidence that swapping the arguments could ever mess things up.
I would however argue that it could potentially hurt performance, because the `calloc` implementation might use that extra bit of information you provide by specifying what's the amount of items and what's the size.
For example, it might try to align the memory such that no single item in the array will cross a page boundary. For this, it would need to know what's the element size.
Interestingly, I haven seen more people supplying the arguments in the wrong order than doing it correctly.
Update: I checked the libc implementation on my machine, and found that it doesn't care: It multiplies the values, makes sure no overflow happened, and then just goes on to allocate a block of memory as large as the result of the multiplication.
Other libc implementations might differ.
@@Uerdue thanks for doing some digging and sharing! It’s much appreciated!
I was curious in particular about potential page alignment problems like you mentioned.
Yeah, calloc is typically just a multiply, a malloc call, and a memset (or the equivalent). Sorry if I caused any confusion.
Please make a video about terminal (termios.h) and different terminal modes etc
33:12 strdup()!
Very true. Thanks
Hi there, for a School project i do have to use a double linked list to record frequent items ( for each node i ll put the density and the time it was updated) however the problem is that i have to "organize my double linked list as a hash table using a function H" How is it possible? It also said that i do have to make the pointer to points to the next in term of time of updating
Just to be cheeky, the thing is, you don't have to type text from video anyway, you can use optical character recognition, I like to do
`spectacle -r -b -o /tmp/screenshot.png && tesseract /tmp/screenshot.png stdout --psm 6 | xclip -sel clip`
set to a keybind, then you can just paste into your favourite editor ;)
Just know that malloc and calloc are implemented using a heap structure, which is much more complex than a hash table. Why not creating this case study without malloc and calloc. Through static memory allocation. The usage of malloc and calloc slow down the logic dramatically. Also, if your hash table size and the structure size can be binary calculated through bit shifts, the key calculation can be made using a rotating random binary calculation. Which will result in blazingly fast key calculation.
Have you checked Zig yet ? Seems like a nice language to overview and compare to C.
And Nim! :)
wonderful teaching!!
I have a question may be relevant or entirely unrelated. I'm not actually sure.
In short, I have a problem, the solution to which could well be a hash table. My keys are known to be unique and equal length, that is to say in terms of bytes no more than a few, or not strings, but rather integers, if you so desire.
The question the becomes, is there a special category of hash functions (or other method), which can convert these keys into values in a given range, naturally ranging from 0-(n-1) for n items?
Mind you I can think of other solutions for doing what I need to do, which is basically a runtime defined and/or modified switch case like functionality, but none I can think of are entirely well suited.
Thanks for your efforts.
Hey, thank you for putting out these videos! I'm learning a lot from you :) Why aren't you checking for memory allocation failures?
in my experience, it generally isn't necissary on modern computers. The odds that so much of your computer's total memory is being taken up by other processes, so much so that the program fails to allocate a comparatively small amount of memory, is small enough on modern computers that you can reasonably rule it out. Now, is it good practice to check anyway? Yes, absolutely. But it ultimately doesn't end up making that big of a difference
Considering how you use the allocations should've really just used calloc everywhere to avoid runtime issues
strdup() == malloc() + strcpy()
Hi, can you show us how did you custom your vscode (comparaison operators like `!=`, etc...), thx for the video !
Those look like ligatures. You need a font with code-oriented ligatures, and you need an editor which supports ligatures. I don't know what font Sorber uses, but Fira Code is a popular font which has ligatures.
@@strager_ thanks
What I'd like to see is a vid on the new lattice based encryption algorithm, be one I'd definitely save for later and I'm sure a number of peops here would end up using it in future jobs or existing ones if they have them.
Also, why are you using uint32_t instead of just int?
uint32_t is always gonna be unsigned 32 bit, int size will depend on your machine
When you write "tmp = tmp->next" (e.g. lookup function), don't you have to define what "tmp->next" means? Where was that done?
"next" is a field of the struct "entry". You probably missed that 🙂
@@IBelieveInCode Ahh yeah, probably. Thanks
Good Game 🙂
I've just written my own C "Hash Table" module. It's on my channel. Without sound. My english is badly written, but it's worse when I try to speak.
It's not a bad start, but it would help if you made it slightly more generic. You could use _Generic and specialize on a few known types, or you could use a union and associate traits with whatever user defined data gets passed in. It would help to have the user pass in a hashing function and a comparison function and have flags to determine if data should be copied or merely pointed to.
Jacob, the Subscribe button on your WEB page is inop.
points-- for unnecessary use of feof
Jacob, long video's are fine
strdup() again
In delete you're leaking the key memory.
Did you watch the video? 😀 Yeah, I know. The example is unfinished. See you next week.
@@JacobSorber Yeah... the comment was made as I was watching.