06 - Hash Tables (CMU Databases Systems / Fall 2019)

แชร์
ฝัง
  • เผยแพร่เมื่อ 7 ส.ค. 2024
  • Prof. Andy Pavlo (www.cs.cmu.edu/~pavlo/)
    Slides: 15445.courses.cs.cmu.edu/fall...
    Notes 15445.courses.cs.cmu.edu/fall...
    15-445/645 Intro to Database Systems (Fall 2019)
    Carnegie Mellon University
    15445.courses.cs.cmu.edu/fall...
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 33

  • @marcoq7160
    @marcoq7160 3 ปีที่แล้ว +29

    0:15 DJ Drop Tables
    0:56 Administrivia
    1:45 Course status
    3:04 Data structures
    4:47 Design decisions (data organization, concurrency)
    6:55 Hash tables
    8:38 Constant factors matter a lot
    9:28 Static hash table
    10:54 Static hash table: assumptions
    12:20 Static hash table: perfect hash function
    13:24 Hash table design decisions: hash function, hashing scheme
    16:12 Today's agenda: hash functions, static/dynamic hashing schemes
    16:43 Hash functions
    18:35 Hash functions: CRC-64, {Murmur,City,XX,Farm}Hash
    20:30 Hash function benchmark (XXHash3 was (still is?) the best)
    22:42 A question on hash collision attack
    24:45 Static hashing schemes (linear probe, robin hood, cuckoo)
    26:23 Linear probe hashing (open addressing)
    29:28 Linear probe hashing - deletes (tombstones)
    35:22 Non-unique keys (separate linked list, redundant keys)
    37:30 Robin Hood hashing
    43:01 Cuckoo hashing
    47:22 Observation: resizing, dynamic hash tables
    48:49 Chained hashing
    50:35 Extendible hashing
    56:59 What is the relationship between a hash table and a buffer pool?
    1:00:10 Linear hashing
    1:07:28 Linear hashing - deletes
    1:10:40 Conclusion

    • @luisponce3580
      @luisponce3580 3 ปีที่แล้ว +4

      beast

    • @marcoq7160
      @marcoq7160 3 ปีที่แล้ว +2

      @@luisponce3580 haha, man, you're doing great! Thanks for these comments, they make me smile :) This is the penultimate timestamp comment I made. But you motivate me to get back to it again!

    • @luisponce3580
      @luisponce3580 3 ปีที่แล้ว +1

      @@marcoq7160 lol, appreciated!

  • @ashishacharya4001
    @ashishacharya4001 4 ปีที่แล้ว +23

    No "Hit it" at the end? I'm disappointed.

  • @garfieldnate
    @garfieldnate 3 ปีที่แล้ว +3

    Java checks if the collection's keys implement Comparable, and if so it uses TreeMaps in the buckets instead of simple LinkedLists. In my undergrad data structures class, the first thing we learned about hashes is "this linear probing thing sucks, use linked lists in the buckets instead" (I never heard the term "chained hashing" before this lecture). So it was really eye-opening to hear the lecturer say, "in practice everyone uses linear probing"! I suspect the JVM default is a balanced generic implementation for many disparate use-cases, which has quite different requirements from what we would use in an RDBMS. In general I feel that this lecture filled lots of holes in my knowledge. Thanks for another great lecture!

    • @NaderHGhanbari
      @NaderHGhanbari 2 ปีที่แล้ว

      I believe his statement about linear probing is in the context of databases, not for general purpose programming.

  • @AndersonSilva-dg4mg
    @AndersonSilva-dg4mg 4 ปีที่แล้ว +9

    thank you Andy

  • @hansu7474
    @hansu7474 4 ปีที่แล้ว +38

    SHA256 is not reversible. But I think professor conflated it with, say, RSA, which is reversible. Lecturers have to convey a lot of information so those small mistakes are negligible. But I'm surprised why students who used SHA256 and MD5 didn't point out the mistake so that everyone who are watching the video could benefit from it.

    • @andypavlo
      @andypavlo 4 ปีที่แล้ว +37

      Yes, you are right. I corrected myself about this in the next lecture: th-cam.com/video/JHZFc4hMGhk/w-d-xo.html

    • @zhjwpku
      @zhjwpku ปีที่แล้ว

      One more point, MD5 and SHA256 are digest algorithms(or hash functions), and RSA and AES are encryption algorithms.

  • @willw2596
    @willw2596 2 ปีที่แล้ว

    Nice lecture. For extendible hashing, when the directory needs doubling due to a bucket split, you don't need to re-map all the the directory entries if using the suffix hash bits as the directory index. Just bulk copy the first half of the directory content to the second half of the new directory would work. Basically the two halves of the directory entries both pointing to the same buckets after the doubling.

  • @monitoringtsdb3884
    @monitoringtsdb3884 2 ปีที่แล้ว

    such fire beats

  • @420_gunna
    @420_gunna 4 ปีที่แล้ว

    feeling blessed to hear my professor say the crc hash "sucks ass" in 2020 year of our lord

  • @xuejianpan3020
    @xuejianpan3020 3 ปีที่แล้ว

    Great lecture!
    But I have some questions:
    1)is chained hashing the most popular one?
    2)Extendible hashing and linear hashing might be used when there are a lot of collisions, but do we need that if we have a good hash function? In practice are they used widely?
    3)are static hash tables actually used in some real systems? If so why are they better than chained hashing?
    Thanks!

  • @cacheman
    @cacheman 4 ปีที่แล้ว +3

    12:33 I will admit to some confusion here; this section makes it sound like perfect hashing is only of theoretical interest, but of course there's plenty of software to generate even a minimal mapping (like gperf, or CMPH) that is used in production software.
    19:03 Using CRC for hash tables, yes probably a bad idea. That said, modern CPUs (both AMD64/SSE4.2 and some ARM variants) include CRC32 instructions which should in some sense be "fast", for their intended use-case.
    22:42 Is he saying he's _never_ heard about this before, or that he's heard of it before? Okay kids, having a seed is not enough; see Aumasson & Bernstein on breaking murmur. More generally, look up 'algorithmic complexity attacks'. It's cool to ignore it, but only when you know what you're doing.

  • @paulchiang4752
    @paulchiang4752 4 ปีที่แล้ว

    Thanks, andy. So the tombstone will be considered when we calculate the fill factor(load factor) since it is a kind of entry?

    • @theUnorthodoxxx
      @theUnorthodoxxx 4 ปีที่แล้ว

      I do not think so. Because we have used it for our easiness only.It is just a special empty entry.

    • @paulchiang4752
      @paulchiang4752 4 ปีที่แล้ว

      I understand what you mean, but in this video, from 30:15 to 31:00, Professor said that "the tombstone is wasting space and contributing to the fill factor". I am not quite sure and thus want to confirm.

    • @AshishNegi1618
      @AshishNegi1618 3 ปีที่แล้ว

      @@paulchiang4752 Think what happens if you add N keys and delete N keys where hash table size is N. Should you consider hash table completely filled ?

    • @zhenghanghu2430
      @zhenghanghu2430 2 ปีที่แล้ว

      Yes, tombstones are considered for the fill factor.

  • @abdelrhmanahmed1378
    @abdelrhmanahmed1378 3 ปีที่แล้ว

    In Robin hode what if we want to move the richer one down by one but the place is already taken ?

    • @abdelrhmanahmed1378
      @abdelrhmanahmed1378 3 ปีที่แล้ว

      i have example where we will have the number of jumps = 3 ,
      if we have A[0],B[0],C[1],D[2] , and now we want to insert M and its location is where A at .
      so M[0] == A[0] , so next B[0]

  • @alfin3644
    @alfin3644 4 ปีที่แล้ว +6

    SHA-256 is not reversible

    • @andypavlo
      @andypavlo 4 ปีที่แล้ว +9

      Ah you are correct! I got this very wrong. I will correct this in the next lecture. Thanks!
      My point about that we don't care about leaking information is the more important part though.

    • @anarionzuo1425
      @anarionzuo1425 4 ปีที่แล้ว

      @@andypavlo It seems some chinese guy has made this reversible...

  • @jimwichen7978
    @jimwichen7978 ปีที่แล้ว

    beep

  • @zeevtarantov
    @zeevtarantov 4 ปีที่แล้ว

    Why is the a lesson on basics of hash functions in a "how to build a RDBMS" course?

    • @AshishNegi1618
      @AshishNegi1618 4 ปีที่แล้ว +3

      First 5 mins of this lecture talks about this.

  • @zeevtarantov
    @zeevtarantov 4 ปีที่แล้ว +1

    SHA-256 is not an asymmetric cryptographic primitive, and it is not reversible, and the author should bleep out the audio of that part if they don't want to take down the whole video.