Faster database indexes (straight from the docs)

แชร์
ฝัง
  • เผยแพร่เมื่อ 5 มิ.ย. 2024
  • Creating faster composite indexes via hashed columns! To learn more about PlanetScale, visit us at planetscale.com.
    Link to the MySQL docs: dev.mysql.com/doc/refman/8.0/...
    📚 Learn more about PlanetScale at planetscale.com/youtube.
    ------------------
    00:00 MySQL documentation
    00:58 Addresses table
    02:17 Hash columns
    04:13 concat vs concat_ws
    05:40 Generated columns
    07:31 The UNHEX function
    09:34 Searching via hash
    11:22 MD5 hash collisions
    12:46 Functional indexes
    ------------------
    💬 Follow PlanetScale on social media
    • Twitter: / planetscaledata
    • Discord: / discord
    • TikTok: / planetscale
    • Twitch: / planetscale
    • LinkedIn: / planetscale
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 319

  • @PlanetScale
    @PlanetScale  8 หลายเดือนก่อน +13

    If you enjoyed this, we also have a free MySQL for Developers course with 50+ videos. Thanks for watching! planetscale.com/learn/courses/mysql-for-developers/introduction/course-introduction

  • @aegif
    @aegif 8 หลายเดือนก่อน +113

    As a student that has taken a dbms course in mysql and hated every part of it, I can absolutely tell that I would love it had it been even a bit like this video and series in general. Thank you Aaron, this content is genuinely incredible and is making me want to actually learn the details of mysql.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +19

      I can't tell you how happy this makes me. It's such a shame to me that so many interesting things are taught in such an uninteresting way.

    • @ahmad-murery
      @ahmad-murery 8 หลายเดือนก่อน +9

      @@PlanetScale Teaching is a skill that not every expert possesses (assuming they were expert in the first place)

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +4

      @@ahmad-murery Extremely good point

    • @godonholiday
      @godonholiday 8 หลายเดือนก่อน +1

      Aaron, this approach is refreshing. Hope you get to some focused use case series, like SQL for Product Managers!

    • @reikooters
      @reikooters 8 หลายเดือนก่อน

      You'll have to learn it one day. There's no system in the world that doesn't use data - it's the entire point of why we build applications. I used to fall asleep in my database classes when I was in university (15 odd years ago) and my mate used to wake me up when the class finished because the lecturer was terrible. But after graduating I spent a lot of time teaching myself about databases for the reason I mentioned earlier.

  • @remedix
    @remedix 8 หลายเดือนก่อน +57

    I don’t usually post on videos but the quality and the details on these is amazing! Thank you Aaron! Keep them coming!

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      Thank you so much, I really appreciate you saying that. We're definitely gonna keep em coming 🫡

    • @andynn6691
      @andynn6691 8 หลายเดือนก่อน +1

      Agreed! I don't even use or like MySQL very much, but the content is often more broadly applicable.

  • @transcendtient
    @transcendtient 8 หลายเดือนก่อน +24

    I appreciate you talking about collisions because the one case that you do have where it happens would be a nightmare if you didn't plan for it no matter how small the chance may be.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +5

      Totally. Especially if you use CRC32 which is going to have massive amounts of collisions!

    • @Pilikio
      @Pilikio 8 หลายเดือนก่อน +4

      Also when using non cryptographic hash function you have to assume that a malicious user may enter data to ptoduce hash collisions on purpose.
      In best case this will lead to an annoying bug, in worst case a severe security vurnability.

    • @sealoftime
      @sealoftime 8 วันที่ผ่านมา

      @@Pilikio It can very-very theoretically cause Hash DoS attack

  • @aoe4_kachow
    @aoe4_kachow 8 หลายเดือนก่อน +18

    I wish the results were rendered in a monospace font so that the md5 hashes lined up neatly

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +7

      You and me both 🫠

  • @reikooters
    @reikooters 8 หลายเดือนก่อน +16

    You missed one big thing. When you add the generated column, you need to also specify the "stored" keyword. This means when inserts/updates are made, the generated column will be computed and stored in the table at the time of writing. The default behaviour is "virtual", which means the generated column is not stored with the data and instead must be computed at the time of reading. This means every time you select from the table, mysql needs to recompute the hash for every row in order to find the one you want, which kinda defeats the purpose of the whole exercise.

    • @monad_tcp
      @monad_tcp 8 หลายเดือนก่อน

      It all depends on how fast is hashing vs disk I/O.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +10

      In this particular example the data is stored even though it is a virtual column. Because we put an index on it, those values are calculated and then written into the btree. Calculating the hash is pretty cheap, but if you have an expensive generated column it might be worth declaring it stored instead of virtual.

  • @simonhartley9158
    @simonhartley9158 8 หลายเดือนก่อน +13

    Lossy compound indexes seems such a good idea, that it should become a standard built-in feature at some point, e.g. a particular index type. You could then have index types like Postgres' GIST which will handle the deduping automatically.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +5

      Yeah that's an interesting idea. It would be neat to see that built in

  • @ayeshk5997
    @ayeshk5997 8 หลายเดือนก่อน +1

    I've been using MySQL for ages now, and yet you still come up with new things and ideas (like this one and the Geo box combined with haverseine calculation). Keep it up!

  • @alexeybalandin4676
    @alexeybalandin4676 8 หลายเดือนก่อน +1

    I'm coming from your planetscale mysql db course and just gotta say - it's really amazing. Just the perfect level of depth, engineering knowledge and ease of use. I've been mostly the ml engineer guy through my career, now I'm switching to full-stack role and your course is really amazing for me to delve into optmizing the database shenanigans :) Thank you very much!

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      I'm so glad to hear that! Thank you for letting me know. I put a lot of work into that course so I'm pleased that you've enjoyed it

  • @ridass.7137
    @ridass.7137 8 หลายเดือนก่อน

    So well explained with the right pace and example, its incredible. Thank you!

  • @ihzakarunia2408
    @ihzakarunia2408 8 หลายเดือนก่อน

    As developer that always use mysql, you very did a great job explaining this topic. Thank you very much

  • @saadkhan3401
    @saadkhan3401 8 หลายเดือนก่อน

    My man this is is very greate and the niche you choose to post this video perticularly is very great the quality and everything I hope you make this as a series as this will help everyone a lot ❤

  • @Hislodin
    @Hislodin 8 หลายเดือนก่อน

    I can see how this could be very useful in optimizations. Thank you for your practical guide on deeply doc-founded knowledge.

  • @n4trojan
    @n4trojan 8 หลายเดือนก่อน

    That's was definitely a great piece of knowledge I got in just 13 minutes. Thank you!

  • @anderskozuch7838
    @anderskozuch7838 8 หลายเดือนก่อน +2

    You are awesome man:) I am self learning web development and you are such a nice source for wholesome and humouristic learning!

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +2

      I love to hear that! Thank you for letting me know. And good luck on your web dev journey!

  • @gazsi72
    @gazsi72 6 หลายเดือนก่อน

    I haven't learned such a useful trick from a youtube video in a long time, thx

  • @mibennettc
    @mibennettc 7 หลายเดือนก่อน

    I saw this video and it literally solved a problem I had at work the next day. Stellar job Aaron!

    • @PlanetScale
      @PlanetScale  7 หลายเดือนก่อน

      Oooo no way! Can you give me any more hints? What'd you end up hashing?

  • @StingSting844
    @StingSting844 8 หลายเดือนก่อน +1

    As someone who is just getting started with databases, this is incredible! Thank you A Aron

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      A A Ron 😂😭 You're welcome!

  • @postnick
    @postnick 5 หลายเดือนก่อน

    Content like this I love - I hope I can find a useful way to do this in my position but my indexes are always Material + location + date usually.

  • @budidarmawan6959
    @budidarmawan6959 8 หลายเดือนก่อน

    i didn't even use MySQL, yet i was pleasantly entertained by your video (i believe some of the knowledge is applicable to other databases as well)
    thank you, keep up the good vid 💪🏻

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Thank you so much!

  • @jimothyus
    @jimothyus 8 หลายเดือนก่อน +5

    These videos are really high quality. You are doing an excellent job

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      Thank you so much for saying so

    • @JohnRoux
      @JohnRoux 8 หลายเดือนก่อน

      Yeah so much this!
      When I see a new video on my feed, I watch it before all my usual content because honestly, this is just so good!

  • @thereasonableprogrammer4921
    @thereasonableprogrammer4921 4 หลายเดือนก่อน

    As someone who likes reading the MySQL docs, printing them out and binding them is hardcore 😂 props Aaron 🎉

  • @pranaygoel
    @pranaygoel 8 หลายเดือนก่อน

    Hi from India!
    It's the best thing I came across today.

  • @jamaly87
    @jamaly87 8 หลายเดือนก่อน +1

    Dude your content is 🔥.

  • @rickmyers7627
    @rickmyers7627 5 หลายเดือนก่อน

    Excellent Aaron! This technique can also be used to create a Cache key to store data in memory (or on disk), if you need to "roll-you-own" caching mechanism. Basically, sort/uppercase your arguments, then MD5 them together, and then use a hash array to map that MD5 to the response from the server. We do that with address lookups so we aren't constantly going to Google for geocode data (and eventually getting charged). Once we see an address we cache Google's geocode response and we never have to do the lookup again, thus saving money down the road.

  • @kaprikornz
    @kaprikornz 8 หลายเดือนก่อน

    Quality videos man! Keep it up

  • @pixtweaks393
    @pixtweaks393 8 หลายเดือนก่อน

    Thanks for sharing this amazing idea, very well explained, thank you!

  • @user-cf9ir4gw2c
    @user-cf9ir4gw2c 7 หลายเดือนก่อน

    Super. I remember dozens of usecases, where this implementation would of saved the day.

  • @addanametocontinue
    @addanametocontinue 8 หลายเดือนก่อน +1

    Great content. Got yourself a new subscriber!

  • @BMRG14
    @BMRG14 8 หลายเดือนก่อน

    So many great tips! Thank you very much!

  • @nawaahkh
    @nawaahkh 6 หลายเดือนก่อน

    Just 😲, a huge thank you for all these information in all videos.

    • @PlanetScale
      @PlanetScale  6 หลายเดือนก่อน +1

      You're welcome 🤗

  • @k1chko
    @k1chko 8 หลายเดือนก่อน +1

    You do great work! Tough to find solid material to point developers to.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      I appreciate that very much

  • @youssefhossam9077
    @youssefhossam9077 8 หลายเดือนก่อน

    I really learned something today, thank you man.

  • @MonokelJohn
    @MonokelJohn 8 หลายเดือนก่อน +4

    There are some pitfalls when using concatenated rows as the key, especially when there is no separator, but also when columns are nullable.
    What I'm thinking about is a row with column values 123 and 45 and another row with values 12 and 345, which would end up having the same concatenated value (when using just concat), and therefore the same hash. You could still end up in a similar situation when using concat_ws when some of the columns are nullable. We saw that the null values are completely omitted, so two sequential nullable columns could end up with the same hash, when a value is present in either one or the other column. There are even more ways where null values in some columns could result in the same input for the hash calculation. If the null values were converted to an empty string, which would create a separator representing every column, which would solve that problem, it would however be hashed the same as an empty string, or any other string you would use to represent null, but it's the best solution I can think of.
    That's why I think it's really critical to do the full comparisons described at 11:50.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +2

      Yup this is a great argument for adding the extra conditions

    • @lyuboslavilov
      @lyuboslavilov 8 หลายเดือนก่อน

      Rule of thumb - absolutely always do use separators when concatenating multiple values and producing whatever fingerprint you are producing

  • @CRBarchager
    @CRBarchager 7 หลายเดือนก่อน

    Awesome video. Now I just need to find the T-SQL version of your channel and I'm complete :D

  • @oguznsari
    @oguznsari 8 หลายเดือนก่อน

    this is great, thank you aaron

  • @andyvirus2300
    @andyvirus2300 8 หลายเดือนก่อน +2

    Thank you for talking about collision, I was wondering about it even if I knew this wasn’t really an issue !

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +3

      It's always fun to look up collision probabilities!

    • @andyvirus2300
      @andyvirus2300 8 หลายเดือนก่อน +1

      @@PlanetScale not as much as your videos, and I’m learning a ton.
      Please keep up the good work !

  • @languagelearningexperience6814
    @languagelearningexperience6814 8 หลายเดือนก่อน

    Excellent video!

  • @rrd_webmania
    @rrd_webmania 5 หลายเดือนก่อน

    A gem again. Thanks

  • @TuralAsgar
    @TuralAsgar 5 หลายเดือนก่อน

    Good for you, Aaron!

  • @simonnjoroge933
    @simonnjoroge933 7 หลายเดือนก่อน

    Amazing video, great explanation and also, nice sql tool you have there. More videos 🌟

    • @PlanetScale
      @PlanetScale  7 หลายเดือนก่อน

      Thank you! Got a big one coming out tomorrow on a similar topic 🤐

  • @domanjie
    @domanjie 8 หลายเดือนก่อน

    Quality content👌🏾, Thank you.

  • @jhechtf
    @jhechtf 8 หลายเดือนก่อน +2

    "remember if you hash passwords with md5, straight to jail" the delivery got me to laugh out loud, take your like.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Got em! 😂 Glad you liked that part

  • @kyleridolfo
    @kyleridolfo 8 หลายเดือนก่อน

    Truly excellent video

  • @JohanNordberg
    @JohanNordberg 8 หลายเดือนก่อน

    I really like your presentation style. Clear and explanatory and good video length.
    I never use MySQL, so one improvement for me would be if you could mention if this would be relevant for MSSQL and Postgres as well and if the syntax is different, just make a short note of that.
    I assume most concept are similar.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Thanks Johan! I'm not as familiar with the other databases so I feel hesitant to speak on them

  • @treesitter
    @treesitter 8 หลายเดือนก่อน

    This video screams high quality content!

  • @rahulxcr
    @rahulxcr 8 หลายเดือนก่อน

    Thanks. This is really awesome trick.

  •  8 หลายเดือนก่อน

    You have enough friends and I'm proud to be your friend 👍🏻

  • @joshhickman77
    @joshhickman77 5 หลายเดือนก่อน

    The MD5 hash collision isn't the reason you add the WHERE, it's because sometimes your data can have the separator in it and "123|abc", "def" gets combined to the exact same string as "123", "abc|def" even though the values don't match.

  • @yakirgb
    @yakirgb 8 หลายเดือนก่อน

    Very good video!

  • @TroyNiemeier
    @TroyNiemeier 7 หลายเดือนก่อน

    You got my sub for the "Instead of hanging out with friends, ....we continue to read " line. :)

    • @PlanetScale
      @PlanetScale  7 หลายเดือนก่อน

      Haha glad you liked that one 😂

  • @119_shrenikshah7
    @119_shrenikshah7 8 หลายเดือนก่อน

    Great quality content

  • @standardio8270
    @standardio8270 8 หลายเดือนก่อน

    This is great content.

  • @rasheedatanda9927
    @rasheedatanda9927 8 หลายเดือนก่อน

    Thank you for this video. It provides more insight into why some rule of thumb exists in the database world.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      You're welcome!

    • @rasheedatanda9927
      @rasheedatanda9927 8 หลายเดือนก่อน

      @@PlanetScale In short, all your videos I've watched made lots of sense to me considering my side quest on building a database. I'm now able to see things from a perspective different from the usual "do this don't do that because of so and so..." which is quite rampant in the engineering world.

  • @nerdswhosell
    @nerdswhosell 8 หลายเดือนก่อน +1

    Love this. Is there any tips on thinking about this when building/designing the database schemas ?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Thanks! I think the most common use cases would be 1) when you're trying to index something that's not easily indexable, you can reach for this or a functional index 2) when you're trying to "paper over" some weirdness in your data model as a convenience or 3) when you're trying to create an ID for an entity that doesn't already have a natural ID. In this case the address didn't have a natural ID, so calculating a hash of the parts gives us a nice ID that we can use for deduping, uniqueness, or indexed lookups from the application side

  • @SlimNateKC
    @SlimNateKC 8 หลายเดือนก่อน

    I love this, I’m curious why this isn’t just the default behavior for indices and what the crossover point is for when this is faster than the traditional index

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      It has pretty severe tradeoffs, like the fact that it only works for strict equality lookups

  • @kirayamato6128
    @kirayamato6128 8 หลายเดือนก่อน

    dude u are awesome always!! thank you

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      🫡 thank you

  • @vasilisthemelis9017
    @vasilisthemelis9017 8 หลายเดือนก่อน

    You can have collisions due to the separator being included in the value stored in the columns as well

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      Totally! If that's possible with your data, that'd be a great reason to use the redundant conditions (mentioned at the end of the video) for deduping.

  • @MelroyvandenBerg
    @MelroyvandenBerg 8 หลายเดือนก่อน +1

    This is so great! I hope this all works on MariaDB as well?? And now I also need to print this "book" ;)

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Theoretically it should work the exact same! Smaller = faster. Give it a go and let me know

  • @ongong8209
    @ongong8209 5 หลายเดือนก่อน

    thanks you so much, your videos so helpul for my job. Can you do more video about EXPLAIN ANALYZE and how to optimize execution plan in MYSQL

  • @soofgolan
    @soofgolan 8 หลายเดือนก่อน

    Awesome video! one small nitpick, since you're dropping nulls when joining columns into a string with the "|" separator, the following rows will be converted into the same string before hashing, even if values came from different columns.
    All these:
    "A", NULL, "B"
    NULL, "A", "B"
    "A", "B", NULL
    Map to "A|B"
    I suggest using a more resilient value concat'ing scheme

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      True! Unlikely to happen with this data, but that would be a good reason to use the conditions shown at the end of the video

  • @JohnnyBigodes
    @JohnnyBigodes 7 หลายเดือนก่อน

    We use this to search for duplicates on the database. A thing that I would personally change is making all strings to LOWER(), so that if the user uses a big letter instead of a smaller one, it still sees it as a duplicate.

    • @PlanetScale
      @PlanetScale  7 หลายเดือนก่อน +1

      Nice! That's probably a good addition. Good thinking

  • @devcunha
    @devcunha 5 หลายเดือนก่อน

    Very nice!

  • @prasanthomanakuttan8391
    @prasanthomanakuttan8391 8 หลายเดือนก่อน

    Earned a Sub

  • @nick-khan
    @nick-khan 8 หลายเดือนก่อน

    Great content up to the point! What MySQL version are you using and reading docs of?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      The docs here were version 8, but this pattern will work in any version!

  • @rafaelacosta5374
    @rafaelacosta5374 8 หลายเดือนก่อน

    awesome!, I had no idea about this, I have a scenario where to use it. Thanks

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Love to hear that! Please report back how it goes

  • @adampatterson
    @adampatterson 7 หลายเดือนก่อน

    This is really cool, the only thing I was curious about is the hashing performance?
    I know the hash is only created in insert and update.
    It doesn't look like MySQL supports any of the newer hashes like xxh3 which would be my choice since it's so incredibly fast, typically faster than the computers memory bandwidth. It's not cryptographically ( is that a word ) strong, but it's randomness is extremely high

    • @PlanetScale
      @PlanetScale  7 หลายเดือนก่อน

      MD5 is incredibly fast also, which is actually one of the reasons it shouldn't be used for cryptography! I haven't benchmarked it, but I think it should be fast enough for most needs. You could try CRC32 but you'd have a ton of collisions so you'd definitely need to add the additional conditions mentioned at the end of the video. Worth a few experiments to see, I suppose!

  • @yangsui1301
    @yangsui1301 8 หลายเดือนก่อน

    The generated column got me thinking if there is some places I can actually do better on my work, thanks Aaron!!

    • @yangsui1301
      @yangsui1301 8 หลายเดือนก่อน

      One more thing, should we ignore the cost of concat, md5 or any pre-processing stuff here? I mean, I'm not a MySQL expert so I'm not sure if we do this calcuation in database level is good or not. And when should will consider using a generated column instead of a composite index? Should we always make a experiment or if there is any instuction we can follow?

    • @yangsui1301
      @yangsui1301 8 หลายเดือนก่อน

      BTW the generated column is a really nice idea, if there is a column referencing or calculating by the others that our apps will have to keep it correct, it's struggling because we are afarid of there is some place in our app that doesn't handle that calculation correctly, but if this job is done by the database, at least we can avoid lots of code changes and just trust the database will keep the consistency as usual.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      Generated columns are so fun. You can use them in so many places. I would probably ignore the cost of a concat or MD5, but if you were to do something very expensive in a generated column, you might consider declaring it STORED so it's not calculated over and over. I like having the database maintain these calculated values so that no matter where the insert or update is coming from, the generated column is always going to be correct and present.

    • @yangsui1301
      @yangsui1301 8 หลายเดือนก่อน

      @@PlanetScale Fair enough! Will definitely try it, thanks Aaron!!

  • @luizhrocha
    @luizhrocha 8 หลายเดือนก่อน

    This is gold

  • @thmater-x9786
    @thmater-x9786 8 หลายเดือนก่อน +1

    Whish there is a channel like this for mongodb :(
    Awesome content ❤

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Guess you'll have to switch to MySQL 😏😏

  • @VuTuanIT
    @VuTuanIT 8 หลายเดือนก่อน +1

    This is so useful that I feel stupid that didn’t knowing this after 6 years working with mysql 😂

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      Nah don't feel stupid, we should all be learning new things constantly!

  • @Laflamablanca969
    @Laflamablanca969 8 หลายเดือนก่อน

    Another brilliant fucking video! Keep them coming 😂

  • @nebbsie
    @nebbsie 6 หลายเดือนก่อน

    Nicely explained! What software are you using for querying?

    • @PlanetScale
      @PlanetScale  6 หลายเดือนก่อน

      TablePlus! th-cam.com/video/7V_CJBPZPes/w-d-xo.html

  • @konserv
    @konserv 8 หลายเดือนก่อน +1

    I would rather choose smaller hash and worry about hash collisions than picking wider hash and hoping there will be no collisions.
    Additionally to that, our CPUs and memory works better with 32/64 bit data that with 128 bit data. And, in this case, I would pick CRC32.
    Also, MD5 was designed as a cryptographic hash, you don't need cryptograpy overhead here. Simpler hash is better here.
    It would be nice to see a benchmark of those 3 approaches. Wide index, MD5, CRC32.
    Anyway, thanks for the video. I learned about auto-generated columns. And it was interesting.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +2

      CRC32 plus the additional conditions to dedupe hashes would work just fine!

  • @thegrumpydeveloper
    @thegrumpydeveloper 6 หลายเดือนก่อน

    Honestly this video series was a big part of me switching back from neondb.

    • @PlanetScale
      @PlanetScale  6 หลายเดือนก่อน

      🤐
      this makes me happy

  • @jayantrawat5961
    @jayantrawat5961 8 หลายเดือนก่อน +1

    First of all that's a great video and thanks a lot for that.
    I have a question regarding this, as we have 6 columns hashed as index and I need to search in 3 columns instead of six, will it be helpful in that case too?? Like I want to search in address, zip code and state column, will it work??

    • @LucasSouza1
      @LucasSouza1 8 หลายเดือนก่อน

      10:50

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Unfortunately @LucasSouza1 is correct. It's no longer possible because you've destroyed quite a bit of that data by hashing it.

  • @visualapproach7155
    @visualapproach7155 8 หลายเดือนก่อน

    Ok, wait a minute. How is it possible I haven't come across your content before? You actually know your stuff. Moreover, I can't even play out the jokes in my head before you say them out loud. GET OUT OF MY HEAD. Also, keep producing amazing content. Deal? Deal.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      You drive a hard bargain but you got yourself a deal

  • @diavolokelevra4795
    @diavolokelevra4795 8 หลายเดือนก่อน

    Worrying about collision of md5 maybe a bit paranoid but one other collision I do worry about is concat_ws doesn't entirely solve the null problem, that null values simply disappear. For example, using your table schema, there will be collision if there is an address with street_name HL with no street_suffix, and another with no street_name and street_suffix HL. Boom, collision. I have seen too many weird real life data to be at ease about this.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Totally! In that case adding the extra conditions makes a lot of sense

  • @shamimi123
    @shamimi123 8 หลายเดือนก่อน

    Using that alongside hstore is pretty

  • @k98killer
    @k98killer 8 หลายเดือนก่อน

    Sounds pretty intuitive

  • @alvaroaquije2572
    @alvaroaquije2572 8 หลายเดือนก่อน

    This man needs to be more viral; question: you mentioned strict equality, but will this approach also work with statements like ‘is like’?

    • @PablumMcDump
      @PablumMcDump 8 หลายเดือนก่อน +1

      The hash function obliterates LIKE because there's no substring any more to match on.
      The MD5 hash is only suitable for exact matching.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Unfortunately it only works for strict equality

  • @sholtronicsaaa101010
    @sholtronicsaaa101010 8 หลายเดือนก่อน

    Really informative video. Just a question. Would a row with values 1, null and null, 1 not hash to the same value in this case using the concat_ws method?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Yes! It would. If that's possible with your data, that'd be a great reason to use the redundant conditions (mentioned at the end of the video) for deduping.

    • @NuncNuncNuncNunc
      @NuncNuncNuncNunc 8 หลายเดือนก่อน

      You can use IFNULL to replace null with empty string which will produce a separator at the appropriate location, i.e. 1| vs |1. You could also prefix each value with the column name as a way to produce unique concatenated strings, i.e. col1:1|col2: vs col1:|col2:1. I don't think much is gained by using column name prefixes except if you ever need to see the pre-hashed text.

  • @SebastianMares
    @SebastianMares 8 หลายเดือนก่อน

    Just wondering: Is an index on a generated column like in your example identical performance-wise and storage-wise to an index on an ULID or optimized UUID column? I assume the fragmentation would be worse for such a generated hash column due to the randomness of the values. (And yeah, I know the two serve totally different purposes, no question in that.)

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +2

      It'll be about the same as any other secondary index! The bad thing about random UUIDs as *primary* keys is that the entire clustered index has to be rebalanced along with all the data. That's why you (rightly) here people go on and on about ULIDs and time-ordered UUIDs, so that that rebalancing doesn't have to happen. With a secondary index, it's constantly being broken and rebalanced, but it's so much cheaper because it's only the indexed data, not the entire table's worth of data as is the case with the clustered index. (Note this is accurate for MySQL only, I know very little about the inner workings of other databases.) Hope that makes sense slash answers your question!

    • @SebastianMares
      @SebastianMares 8 หลายเดือนก่อน

      @@PlanetScale Thanks!

  • @nathanoy_
    @nathanoy_ 8 หลายเดือนก่อน

    What program do you use to execute the queries and view the results?
    EDIT: Nvm, a few seconds after posting the comment I reached 8:37 where you mention that it is TablePlus.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +2

      Bah you beat me to it! Or I guess in a way I beat me to it. Regardless... TablePlus 😂

  • @vasiovasio
    @vasiovasio 6 หลายเดือนก่อน

    Well Aaron, here is where the Real Black Magic happens!
    Actually, this is really logical for some specific cases, not all of course.
    Keep introducing more of these hidden gems!

  • @alexandersnow5867
    @alexandersnow5867 8 หลายเดือนก่อน

    In theory, couldn’t you create the same concat string from various permutations of the column space, since the returned value doesn’t encode anything about the location of the null values?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      It is theoretically possible, yes! In that case adding the redundant conditions (mentioned at the end of the video) would be useful.

  • @emilk2729
    @emilk2729 8 หลายเดือนก่อน +4

    "Straight to jail!"🤣

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      It's true, don't try it!

  • @dave6012
    @dave6012 8 หลายเดือนก่อน +2

    Any concern with performance for tables where rows are frequently updated?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Hashes like this will be pretty cheap to calculate, so it's no concern to me.

  • @danielkurz7806
    @danielkurz7806 8 หลายเดือนก่อน

    What's the program you're using for interacting with mysql? Looks neat!

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      Table Plus!

    • @danielkurz7806
      @danielkurz7806 8 หลายเดือนก่อน

      thanks! great video btw

  • @andrewtran9870
    @andrewtran9870 8 หลายเดือนก่อน

    Amazing video, I'm a sql newbie, well newbie in any aspect of software engineering. But I was wondering, what's the downside of having an identity column as a primary key? Or rather, in what scenario would this concept be used over using identity?

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      I'd probably still have a primary key that's an integer, personally. Since these hashes would be randomly ordered, I wouldn't want them as PKs. I'd use them to check for existence or additionally throw a unique constraint on the hash to enforce uniqueness

    • @andrewtran9870
      @andrewtran9870 8 หลายเดือนก่อน

      @@PlanetScale Thanks for the response. So for checking existence, would I be correct to assume this method would be faster than checking "WHERE ... AND ... AND ..."?
      And as for the unique constraint, its possible to apply a unique constraint over multiple columns. So would applying the unique constraint to only the hash column have any benefits over multiple?

  • @rashshawn779
    @rashshawn779 8 หลายเดือนก่อน

    Sidenote: If you use a more efficiently indexed binary uuid, you need hex, cannot use the built in bin2uuid. Need to use hex/unhex directly.

  • @LuisSieira
    @LuisSieira 8 หลายเดือนก่อน

    Just being pedantic here, but you would have collisions if two columns can be null, and contain the same data, which is perfectly possible with user fed data, like "city=null, state=texas" and "city=texas, state=null" would have the same hash with you example. Just being pedantic as I said. Great video.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      You're totally right! Worth adding the extra conditions shown at the end if that's the case (and it may well be!)

  • @arcanernz
    @arcanernz 8 หลายเดือนก่อน

    Great technique but I’m not sure why you would have a large composite index (that’s not unique) since in my experience the optimizer only uses the first or at most the first and second columns from the index.
    I suppose you can use the latter columns to reduce a read to the main table but only if the query reads columns only from the index. But in that case a hash index wouldn’t be a viable solution.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      In the original case a unique constraint was added, which is a nice further use of a hash.

    • @arcanernz
      @arcanernz 8 หลายเดือนก่อน

      @@PlanetScale makes sense thanks

  • @npc73x
    @npc73x 8 หลายเดือนก่อน +1

    Can you discuss your thoughts on ULID

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Yes! It's on my list

  • @mahdikarimi6467
    @mahdikarimi6467 8 หลายเดือนก่อน

    Love you Aaron

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน +1

      🤗 Thanks Mahdi! Made my day

  • @rid9
    @rid9 8 หลายเดือนก่อน

    How about compressing it into an even smaller hash that may even collide from time to time, and if it does collide, then use the explicit WHERE clases you added? The smaller hash could just be, for example, half of the MD5 output, so just 8 bytes.

    • @PablumMcDump
      @PablumMcDump 8 หลายเดือนก่อน

      The more you match, the more you'll have to scan. Your trade-off will be I/O, but whether or not that's better will be situational. You'll be trading index scan for scanning the matching table rows.

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Totally doable and would depend on your data! I'm not sure at what point that would be better or worse, but it'd be fun to experiment with

  • @DirkSchut
    @DirkSchut 6 หลายเดือนก่อน

    You said that MD5 is just fine for this specific purpose. Wouldn't it be highly preferable though? Because it's cheap computationally?

  • @conceptrat
    @conceptrat 8 หลายเดือนก่อน

    You are very entertaining. Great content. Thanks 👍
    Straight to jail! 😂

  • @FailMachineInc
    @FailMachineInc 8 หลายเดือนก่อน

    thanks

  • @damaradiprabowo
    @damaradiprabowo 8 หลายเดือนก่อน +1

    Anyone have any idea of what SQL client he is using? It looks neat

    • @PlanetScale
      @PlanetScale  8 หลายเดือนก่อน

      Table Plus