How do dictionaries (hashmaps) actually work?

แชร์
ฝัง

ความคิดเห็น • 113

  • @proloycodes
    @proloycodes 2 ปีที่แล้ว +108

    i know quite a bit abt hashmaps, and i have to say, you did a very good job explaining it.
    except the part abt dealing with hash collisions, which it seems you made it look like that's the only way, when in reality it's only one way of dealing with the problem. for starters, it can be a tree or even not any structure at all: whenever a hash collision occurs, it puts the value in the next free slot, or it can do some moving items around before putting the value.
    in fact, there are many algorithms to decide how to move stuff around and even how to get a free slot to put the value in. see for example, Robin hood hashing.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +20

      Very true, I guess I kind of tunnel visionned into just one possibility in order to finish the video as soon as possible. I'm really gonna need to improve my fact checking, the only video of mine I i think doesn't have holes is the flask one :|
      I'll pin this so others can see the nuance.

    • @proloycodes
      @proloycodes 2 ปีที่แล้ว +7

      @@AByteofCode i don't think the "tunnel visioning" is a problem as long as you make it clear that this is just one possibility.

  • @bash0985
    @bash0985 2 ปีที่แล้ว +59

    This is really helpful! Very reminiscent of Fireship's 100 seconds videos, I can totally see you becoming big like that. Also the animation is great.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +7

      Thanks man! Yeah I'd started this channel after watching his second channel video on how he made content, and I wanted to give that style a shot (omg the ctrl-z trick is hot as f*ck tho).

    • @abdelhakimkhabir
      @abdelhakimkhabir หลายเดือนก่อน

      and i'll see you in the next one 😄😄

  • @endemwone
    @endemwone 2 ปีที่แล้ว +9

    These videos are great. Keep them coming.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      Shall do! Also thanks for the kind words!

  • @AByteofCode
    @AByteofCode  2 ปีที่แล้ว +13

    What can I do to make my future videos better? Any feedback (even negative) is greatly appreciated!

    • @zyansheep
      @zyansheep 2 ปีที่แล้ว +7

      _Blazingly Fast_

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +4

      @@THEROOT1111 Yep, sorted that one out lol. Monad and future will be at least not "whispering and reading an erotic novel at the same time" (some other commenter)

    • @proloycodes
      @proloycodes 2 ปีที่แล้ว

      @@AByteofCode perhaps increase the sound levels. Reducible might be a good reference (speaking for myself, dunno abt others)

    • @proloycodes
      @proloycodes 2 ปีที่แล้ว

      @@zyansheep yeah im not sure its actually _that_ fast in practice

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      @@proloycodes I was speaking pretty quietly in those videos, from monads on I'm gonna be nearly shouting at my mic, it sounds a lot better!

  • @ahjsbkdjhavkcjhvac
    @ahjsbkdjhavkcjhvac 2 ปีที่แล้ว

    these videos are so high quality, im genuinely surprised you dont have more
    clout

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว

      Considering my sub count of two weeks ago (30), I'd say that's getting resolved right now :)

  • @AByteOfPi
    @AByteOfPi 2 ปีที่แล้ว +62

    ah yes, smoking hash to do fast coding

  • @chudchadanstud
    @chudchadanstud 2 ปีที่แล้ว +21

    Hashmaps and Dictionaries are not the same thing. Yes a hashmap is a dictionary but a dictionary is not a hashmap. You can implement a dictionary with just about any data structure that lets you insert and remove data. e.g. an array of key-value structs. You can simply traverse using linear searching find the key you're looking for. It's not efficient but it works and is a dictionary.
    A hashmap on the other hand uses a hash function with the key is a parameter to compute a hashcode. The code is used to decided where to store the data in your chosen data structure.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +4

      That is an excellent point! I had cut the difference out of my script but I should have left it in

  • @damonfernandez3051
    @damonfernandez3051 2 หลายเดือนก่อน

    thank you that was so helpful

  • @5014eric
    @5014eric 2 ปีที่แล้ว +6

    I had normally imagined they were done with trees and therefore O(log n) for both reading and writing. That allows for scaling up without reorganising as it grow bigger.
    The next question then is what size is used when creating the new object? Too large and creating a lot of them uses lots of memory. Too small and it has to rejig itself a few times if more and more elements are added.
    I assume the size used would go up by powers of 2 or 4, putting a limit on the proportion of 'wasted' space. Although that extra space makes collisions less frequent.
    What is the most efficient algo may depend on the ratio of reads & writes, not to mention knowledge of how many elements there are going to be.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว

      I'm actually not too sure about specific implementations but I do know python does the rejigging with different sizes thing

    • @Slackow
      @Slackow ปีที่แล้ว

      I remember learning that prime numbers are specifically good at being the size of a hash map, their lack of factors helps avoid collisions, so it's probably not as simple as powers of 2.

  • @TomtheMagician21
    @TomtheMagician21 ปีที่แล้ว

    Wow I actually understand it!!!

  • @MeNowDealWIthIt
    @MeNowDealWIthIt 2 ปีที่แล้ว +1

    If you have to iterate through all of the keys that collided at the mod of the hash, then isn't accessing still linear time (albeit with a nice small leading constant)? IIRC when I implemented hashmap for my data structures class I had to use a tree at each index of the internal array for logarithmic time, but that's still not the constant time you mentioned.

    • @shaileger
      @shaileger 2 ปีที่แล้ว

      I guess he meant to say that the average complexity is O(1). The case you are describing is the worst case scenario O(n) which shouldn't happen very often if the hashing algorithm is decent and even if its happens the lists wouldn't be that long and that's why is usually considered as O(1)

    • @MeNowDealWIthIt
      @MeNowDealWIthIt 2 ปีที่แล้ว

      Ah, I hadn't accounted for, the internal array will usually have more spaces than the number of total items stored.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      If your hash map is going at O(1) time you're doing something seriously wrong. Most languages either force you to say how big it is or automatically readjust for you when it grows. Needing to loop through the end array is just a contingency plan that even when needed, as Shai Léger said, isn't much overhead.

  • @WolvericCatkin
    @WolvericCatkin 2 ปีที่แล้ว +2

    From my understanding, most practical implementation of hash maps, append additional colliding key-value pairs, as nodes of a linked list...? This has the advantage of not requiring any costly reallocations when inserting a value under a colliding key, whilst still managing to provide a close to `O(1)` time complexity, even with the random memory access...

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      Good point! I didn't do as much research as I should have for this video so I definitively missed a fair amount of details...

  • @user-op5vc9qw6o
    @user-op5vc9qw6o 9 หลายเดือนก่อน

    Thanks Bro, I'm trying for the n+1 time to get into APL

  • @Rudxain
    @Rudxain ปีที่แล้ว +1

    0:12 lmao, I got the meme references (except Mike, who's Mike?)
    0:19 to anyone at this part, please note that he said "interpreters". Compiled langs (like Rust and C) don't need hashmaps to lookup vars, because all the variables that will ever be used are known and declared statically at compile-time, so the compiler can replace vars with raw pointers to memory addresses, completely invalidating the need for a "middle-man".
    Dynamic langs like Py and JS usually require hashmaps, because the halting-problem prevents the bytecode-compiler from predicting assignments and declarations

    • @AByteofCode
      @AByteofCode  ปีที่แล้ว +2

      FINALLY! Out of 8k+ views you're the first person to comment about the meme names! Let's just say Mike's last name is "Oxlong" :)
      Also second point very good and true.

    • @Rudxain
      @Rudxain ปีที่แล้ว +1

      @@AByteofCode thanks lol

  • @drsensor
    @drsensor 2 ปีที่แล้ว

    Just food for thought on your next video: "dictionary vs inline cache"

  • @DevChronicle
    @DevChronicle 2 ปีที่แล้ว +4

    Whens next video, one tip it sounds like you're whispering from behind me lol

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      Definitively this week, I've nearly finished the editing for it! And yes, Im whispering a bit less in this one :p

  • @vikingthedude
    @vikingthedude 10 หลายเดือนก่อน

    Can we deal with hash collisions using a hashmap? Like hashmaps within a hashmap

    • @AByteofCode
      @AByteofCode  10 หลายเดือนก่อน +1

      Generally most hash collisions come from just rounding the hash result, so the solution is to have a bigger table. The odds of two different objects with actually colliding hashes is so low that you're better off with a regular list. However, using a different hash function for subtables is a very fun idea :)

  • @zebraturner7103
    @zebraturner7103 6 หลายเดือนก่อน

    Still have no idea what hashmap is, but very interesting 🧐

  • @PecPur
    @PecPur หลายเดือนก่อน

    nice

  • @erikwg3814
    @erikwg3814 2 ปีที่แล้ว +3

    I'm a bit tired of hearing hashmaps are "O(1)" when any implementation I've seen is dependent on the hash function as well as the collision handling. To my understanding, those two cannot possibly be implemented in O(1) unless you have an internal array large enough to uniquely cover every possible hash value thus avoiding collisions.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +2

      You are absolutely correct! Then again, our computers and all are turing complete, even though they do not have enough memory to actually satisfy the definition. These classifications are mathematical estimates assuming the algorithm is executed perfectly, but the list thing is just to overcome technical limitations.

    • @turboblitz4587
      @turboblitz4587 ปีที่แล้ว +1

      Yeah I stumbled over that one too. It's weird, the computation time is not directly dependent on n but kind of stochastically, although a scenario is possible where it is stochastically 1. I guess one limitation of hash maps is knowing the best size of the internal array beforehand. Too big and you waste space, too small and you decrease runtime. How collisions are handled also play a major role in this.

  • @ouadieelouardy1171
    @ouadieelouardy1171 8 หลายเดือนก่อน

    The Next Fireship channel
    keep up ❤❤

    • @AByteofCode
      @AByteofCode  7 หลายเดือนก่อน

      Thank you!

    • @dan110024
      @dan110024 3 หลายเดือนก่อน

      Yeah the 'blazingly fast' comment made me alert and then the presentation style and memes put me off watching it seeing it's pretty much a direct rip of Fireship. Be unique. Make your own style. Don't blatently rip it off someone else.

  • @LeonardoPinto
    @LeonardoPinto 2 ปีที่แล้ว +2

    It also makes you to subscribe subliminally

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      Shhhh don't let the others know :)

  • @matheosmattsson2811
    @matheosmattsson2811 2 ปีที่แล้ว +1

    Good content. Only improvement I suggest is mic quality. It sounds a bit like you are way too close to the microphone

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว

      Yeah I figured out a bit about how to use the mic I have so should be better in future videos (and the monad one)

  • @Alysia.in.Wonderland
    @Alysia.in.Wonderland 4 หลายเดือนก่อน +2

    Love the ethereal sounding voice

  • @blazingazong
    @blazingazong ปีที่แล้ว

    I mean, if my array is the exact size of the largest prime number known to man, it may take up way more storage than necessary, but I think I can avoid most collisions. On another note, doing that might be catastrophic if hackers can find the array size trivially, but at least it’s free of most collisions, right?

    • @AByteofCode
      @AByteofCode  ปีที่แล้ว

      I don't immediately see the danger of hackers knowing the size of your hash map, but I'd be interested to know if you don't mind sharing :)

    • @blazingazong
      @blazingazong ปีที่แล้ว

      I just mean that is they know the size then they have a clue as to what the key will be since the key is usually related to taking the remainder of any value with the given prime number value, and if the size is the same as the key so that there are’t any sizing issues, if you can somehow find the array allocation size then you also know a big chunk of the key.

    • @AByteofCode
      @AByteofCode  ปีที่แล้ว

      @@blazingazong Worst case scenario, the hacker could know a chunk of the key, which is a hash. Already, if you know a hash, unless the key is tiny or you have a lot of computing power, you won't get anything out of it. So if you only know a chunk of the hash, its pretty much worthless. And that's assuming knowing a key is useful. If you have a password dictionary, the actual password hash would be in the value, not the key, and if you have access to the file anyways, you have all the keys already :)

    • @blazingazong
      @blazingazong ปีที่แล้ว

      @@AByteofCode I see what you’re saying on both of those fronts. I suppose you’d have to run the hash code on the client side in order to see any of the code, and at that point you would be able to see the entire hashing algorithm anyways- and also you’d have the entire array of passwords sent to your client which would be hilarious- but even if you did know the key, you still wouldn’t have access to the array behind the hash map anyways. Thanks for the learning!

    • @AByteofCode
      @AByteofCode  ปีที่แล้ว

      @@blazingazong Absolutely no problem! That's why im here :)

  • @ArchonLicht
    @ArchonLicht 2 ปีที่แล้ว +1

    A dictionary doesn't have to be a hash map. Red-black tree can also be used, for example, as in Java's TreeMap (as opposed to HashMap). Yeah, in Java they call Dictionary a Map - because it's shorter I guess.
    While C#, Python, Objective-C and Swift call Dictionary a Dictionary - some more limited languages like Perl and Ruby just call it hash, because hashtable is the only kind of Dictionary they have built in (and I don't even know what do you call it in those languages when it's a hash function like SHA1 or MD5). But truly special is PHP which calls a Dictionary just Array. That's it, Array. I mean Dictionary is also not a proper term - proper one is Associative Array. But calling Associative Array just Array is like calling JavaScript Java.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      PHP always finding a way to make a non sensical decision smh. But yeah you are right that dictionary != hashmap, although the most efficient dictionary algorithm is the hash map

    • @ArchonLicht
      @ArchonLicht 2 ปีที่แล้ว +1

      @@AByteofCode Well, I guess we expect too much from a technology called PERSONAL Home Page :-)

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      ​@@ArchonLicht i thought it stood for pile of hot poop 🤔

    • @ArchonLicht
      @ArchonLicht 2 ปีที่แล้ว +1

      @@AByteofCode Yeah, that sounds about right :D

  • @TomtheMagician21
    @TomtheMagician21 ปีที่แล้ว

    Hold on I'm a bit confused with the collision aspect of it:
    So you could have like a class Cheese, then have an inherited class called Cheddar, then put it into a hashmap which shows its popularity and maybe with whom it is popular idk. But if Brie and Cheddar collide, idk how to find out which is the correct popularity of each Cheese.
    I tried Googling but as you might've guessed, I can't find anything answering about Cheese-related Hashmaps 😂

    • @AByteofCode
      @AByteofCode  ปีที่แล้ว

      I don't understand your question but basically if two keys get the same hash, their values are stored together as a list of tuples, meaning you can look through and every value will be written next to its original key.
      so if Chedder and Brie both get the location 4, you'd have 4: [(Chedder, 78), (Brie, 52)]

    • @TomtheMagician21
      @TomtheMagician21 ปีที่แล้ว

      @@AByteofCode Ohh ok thank you that mostly makes sense, do tuples only store 2 of the same datatype though or could it store a class and an int? Wait could you not store the whole array as a list of tuples or is that what a dictionary is? Wait nvm that would be O(n) wouldn't it ok thank you 😂.

    • @AByteofCode
      @AByteofCode  ปีที่แล้ว

      @@TomtheMagician21 Tuples (in python at least) can store whatever the f**k you want. Other languages may require certain type consistencies, but considering that classes have their own type, you could have (Class, Int) tuples (im guessing, not too sure)

    • @TomtheMagician21
      @TomtheMagician21 ปีที่แล้ว

      @@AByteofCode Oh that's cool then, is a tuple basically an array that you can't change?

    • @AByteofCode
      @AByteofCode  ปีที่แล้ว

      @@TomtheMagician21 Exactly!

  • @kelton5020
    @kelton5020 2 ปีที่แล้ว +1

    Good video. Only critique is that you skipped the part about how it actually finds the element by its hash value.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      If the problem is I didn't explain the hash function, tbh I don't understand it yet but when I figure it out I'll make a video about that. If the problem is how it turns a hash into an index, note that a hash is just binary and can be read as a number, which can be modulo'd with the length of the list. That gives an index in the list and it uses the native array[index] method. Would it have helped if the video contained a re-write of dictionaries?

    • @kelton5020
      @kelton5020 2 ปีที่แล้ว +1

      @@AByteofCode Thanks for the reply. Yeah I was referring to how the actual lookup happens. How you find the hash without just looping through until you find a match

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      @@kelton5020 Oh the hash is basically just a fancy mathematical function that takes a binary input and turns it into a semi random output of fixed length. I don't know much more for now (will be researching it soon) but its just a ton of modulos and bit magic.

    • @TheDuckPox
      @TheDuckPox 2 ปีที่แล้ว

      ​@@kelton5020 I believe you would just plug the result of the (hash % num_of_indices) into the array index and then search the actual id or hash linearly in the inner array/linkedlist.
      Although, I wish the video also explains the possibility of changing the array size.

  • @bigchungus2667
    @bigchungus2667 7 หลายเดือนก่อน

    Did an exam on hashmap while on hash this afternoon

    • @AByteofCode
      @AByteofCode  7 หลายเดือนก่อน

      Thematically consistent :)

  • @SumanRoy.official
    @SumanRoy.official 2 ปีที่แล้ว +5

    Try speaking a bit louder please, it will help

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +4

      Yep, did that with my monad video and I am very happy with the result so will be doing that in the future!

  • @jeremydepassorio664
    @jeremydepassorio664 หลายเดือนก่อน

    more videos on DSA

    • @AByteofCode
      @AByteofCode  หลายเดือนก่อน

      In the works ;)

  • @abdelhakimkhabir
    @abdelhakimkhabir หลายเดือนก่อน

    hey but here how you've came with the length 0:56, why you've set it to 5
    1. i thought that it's the current lenght + 1, but you have an empty map.
    2. i thought it's just a huge number => lead to collisions maybe, and then we can solve them with the table
    3. i thought that the element just appears out of thin air! 🤣

  • @MaxPicAxe
    @MaxPicAxe 2 ปีที่แล้ว

    I'm getting NoMagic vibes with this channel

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      Idk who that is so I'll watch some of his stuff and get back to you about it

    • @MaxPicAxe
      @MaxPicAxe 2 ปีที่แล้ว +1

      @@AByteofCode haha. That's awesome that you replied. Great channel! I'm looking forward to new content

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว +1

      @@MaxPicAxe I'm looking forward to making new content! Ngl thats a good combo. And thanks for the kind words!

  • @sanjaux
    @sanjaux 9 หลายเดือนก่อน

    brb fixing my multiple (lots of) ui elements highlight & select code now

    • @AByteofCode
      @AByteofCode  9 หลายเดือนก่อน

      wdym?

    • @sanjaux
      @sanjaux 9 หลายเดือนก่อน

      ​@@AByteofCode If I could select elements with a selection box and update their positions when dragging them by looking at a hashmap of their positions rather than looping through an array of all the positions finding the i-th index to update it would probably be faster when dragging a lot of elements at the same time right?

    • @AByteofCode
      @AByteofCode  9 หลายเดือนก่อน

      @@sanjaux I'm starting to see it, but wouldn't a position be an (x,y) coordinate amongst a large number of coordinates, so I'm not sure going through a hash table of every coord in your selection area is faster than checking every object, unless you have an absolutely insane number of objects per pixel. (or maybe I didn't see it)

    • @sanjaux
      @sanjaux 9 หลายเดือนก่อน

      ​@@AByteofCode ​ I meant a hash table that stores the coordinates of each object's position and its name/ID *after* the selection was made, not specifically during the selection surrounding it by the selection's coordinates. I'd store the position of the objects afterward so when I need to access them for a position update it would have lower time complexity, at least that was my thought process 🤔

    • @AByteofCode
      @AByteofCode  9 หลายเดือนก่อน

      ​@@sanjaux I see, but what key would you use for the hash table?

  • @navidjalilian5087
    @navidjalilian5087 2 ปีที่แล้ว

    Great video, low voice volume

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว

      Sorry about that. I fixed it for monads and future though, so shouldn't be an issue anymore!

  • @MaximeFRYSOU
    @MaximeFRYSOU 9 หลายเดือนก่อน

    Great job 👍🏽. Just advice but maybe it's me, you may try to articulate and speak a little louder cause it sounds like you're mumbling

  • @jaxetika
    @jaxetika 9 หลายเดือนก่อน +2

    twink femboy

  • @rithulkamesh
    @rithulkamesh 2 ปีที่แล้ว

    Try using innovate...

  • @marknn3
    @marknn3 2 ปีที่แล้ว

    Why you don't have ANY punctuation marks in your script. So tiring to listen to.

    • @AByteofCode
      @AByteofCode  2 ปีที่แล้ว

      I had edited the spaces out but yeah won't be doing that any more lol