Faster database indexes (straight from the docs)

แชร์
ฝัง
  • เผยแพร่เมื่อ 22 ธ.ค. 2024

ความคิดเห็น • 318

  • @PlanetScale
    @PlanetScale  ปีที่แล้ว +13

    If you enjoyed this, we also have a free MySQL for Developers course with 50+ videos. Thanks for watching! planetscale.com/learn/courses/mysql-for-developers/introduction/course-introduction

  • @aegif
    @aegif ปีที่แล้ว +115

    As a student that has taken a dbms course in mysql and hated every part of it, I can absolutely tell that I would love it had it been even a bit like this video and series in general. Thank you Aaron, this content is genuinely incredible and is making me want to actually learn the details of mysql.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +20

      I can't tell you how happy this makes me. It's such a shame to me that so many interesting things are taught in such an uninteresting way.

    • @ahmad-murery
      @ahmad-murery ปีที่แล้ว +10

      @@PlanetScale Teaching is a skill that not every expert possesses (assuming they were expert in the first place)

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +4

      @@ahmad-murery Extremely good point

    • @godonholiday
      @godonholiday ปีที่แล้ว +1

      Aaron, this approach is refreshing. Hope you get to some focused use case series, like SQL for Product Managers!

    • @reikooters
      @reikooters ปีที่แล้ว

      You'll have to learn it one day. There's no system in the world that doesn't use data - it's the entire point of why we build applications. I used to fall asleep in my database classes when I was in university (15 odd years ago) and my mate used to wake me up when the class finished because the lecturer was terrible. But after graduating I spent a lot of time teaching myself about databases for the reason I mentioned earlier.

  • @remedix
    @remedix ปีที่แล้ว +57

    I don’t usually post on videos but the quality and the details on these is amazing! Thank you Aaron! Keep them coming!

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Thank you so much, I really appreciate you saying that. We're definitely gonna keep em coming 🫡

    • @andynn6691
      @andynn6691 ปีที่แล้ว +1

      Agreed! I don't even use or like MySQL very much, but the content is often more broadly applicable.

  • @transcendtient
    @transcendtient ปีที่แล้ว +24

    I appreciate you talking about collisions because the one case that you do have where it happens would be a nightmare if you didn't plan for it no matter how small the chance may be.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +5

      Totally. Especially if you use CRC32 which is going to have massive amounts of collisions!

    • @Pilikio
      @Pilikio ปีที่แล้ว +4

      Also when using non cryptographic hash function you have to assume that a malicious user may enter data to ptoduce hash collisions on purpose.
      In best case this will lead to an annoying bug, in worst case a severe security vurnability.

    • @sealoftime
      @sealoftime 6 หลายเดือนก่อน

      @@Pilikio It can very-very theoretically cause Hash DoS attack

  • @MonokelJohn
    @MonokelJohn ปีที่แล้ว +4

    There are some pitfalls when using concatenated rows as the key, especially when there is no separator, but also when columns are nullable.
    What I'm thinking about is a row with column values 123 and 45 and another row with values 12 and 345, which would end up having the same concatenated value (when using just concat), and therefore the same hash. You could still end up in a similar situation when using concat_ws when some of the columns are nullable. We saw that the null values are completely omitted, so two sequential nullable columns could end up with the same hash, when a value is present in either one or the other column. There are even more ways where null values in some columns could result in the same input for the hash calculation. If the null values were converted to an empty string, which would create a separator representing every column, which would solve that problem, it would however be hashed the same as an empty string, or any other string you would use to represent null, but it's the best solution I can think of.
    That's why I think it's really critical to do the full comparisons described at 11:50.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +2

      Yup this is a great argument for adding the extra conditions

    • @lyuboslavilov
      @lyuboslavilov ปีที่แล้ว +1

      Rule of thumb - absolutely always do use separators when concatenating multiple values and producing whatever fingerprint you are producing

  • @ayeshk5997
    @ayeshk5997 ปีที่แล้ว +1

    I've been using MySQL for ages now, and yet you still come up with new things and ideas (like this one and the Geo box combined with haverseine calculation). Keep it up!

  • @alexbalandi
    @alexbalandi ปีที่แล้ว +1

    I'm coming from your planetscale mysql db course and just gotta say - it's really amazing. Just the perfect level of depth, engineering knowledge and ease of use. I've been mostly the ml engineer guy through my career, now I'm switching to full-stack role and your course is really amazing for me to delve into optmizing the database shenanigans :) Thank you very much!

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      I'm so glad to hear that! Thank you for letting me know. I put a lot of work into that course so I'm pleased that you've enjoyed it

  • @aoe4_kachow
    @aoe4_kachow ปีที่แล้ว +19

    I wish the results were rendered in a monospace font so that the md5 hashes lined up neatly

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +7

      You and me both 🫠

  • @simonhartley9158
    @simonhartley9158 ปีที่แล้ว +13

    Lossy compound indexes seems such a good idea, that it should become a standard built-in feature at some point, e.g. a particular index type. You could then have index types like Postgres' GIST which will handle the deduping automatically.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +5

      Yeah that's an interesting idea. It would be neat to see that built in

  • @jimothyus
    @jimothyus ปีที่แล้ว +5

    These videos are really high quality. You are doing an excellent job

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Thank you so much for saying so

    • @JohnRoux
      @JohnRoux ปีที่แล้ว

      Yeah so much this!
      When I see a new video on my feed, I watch it before all my usual content because honestly, this is just so good!

  • @reikooters
    @reikooters ปีที่แล้ว +16

    You missed one big thing. When you add the generated column, you need to also specify the "stored" keyword. This means when inserts/updates are made, the generated column will be computed and stored in the table at the time of writing. The default behaviour is "virtual", which means the generated column is not stored with the data and instead must be computed at the time of reading. This means every time you select from the table, mysql needs to recompute the hash for every row in order to find the one you want, which kinda defeats the purpose of the whole exercise.

    • @monad_tcp
      @monad_tcp ปีที่แล้ว

      It all depends on how fast is hashing vs disk I/O.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +11

      In this particular example the data is stored even though it is a virtual column. Because we put an index on it, those values are calculated and then written into the btree. Calculating the hash is pretty cheap, but if you have an expensive generated column it might be worth declaring it stored instead of virtual.

  • @anderskozuch7838
    @anderskozuch7838 ปีที่แล้ว +2

    You are awesome man:) I am self learning web development and you are such a nice source for wholesome and humouristic learning!

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +2

      I love to hear that! Thank you for letting me know. And good luck on your web dev journey!

  • @mibennettc
    @mibennettc ปีที่แล้ว

    I saw this video and it literally solved a problem I had at work the next day. Stellar job Aaron!

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Oooo no way! Can you give me any more hints? What'd you end up hashing?

  • @ihzakarunia2408
    @ihzakarunia2408 ปีที่แล้ว

    As developer that always use mysql, you very did a great job explaining this topic. Thank you very much

  • @StingSting844
    @StingSting844 ปีที่แล้ว +1

    As someone who is just getting started with databases, this is incredible! Thank you A Aron

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      A A Ron 😂😭 You're welcome!

  • @jhechtf
    @jhechtf ปีที่แล้ว +2

    "remember if you hash passwords with md5, straight to jail" the delivery got me to laugh out loud, take your like.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Got em! 😂 Glad you liked that part

  • @rickmyers7627
    @rickmyers7627 11 หลายเดือนก่อน

    Excellent Aaron! This technique can also be used to create a Cache key to store data in memory (or on disk), if you need to "roll-you-own" caching mechanism. Basically, sort/uppercase your arguments, then MD5 them together, and then use a hash array to map that MD5 to the response from the server. We do that with address lookups so we aren't constantly going to Google for geocode data (and eventually getting charged). Once we see an address we cache Google's geocode response and we never have to do the lookup again, thus saving money down the road.

  • @saadkhan3401
    @saadkhan3401 ปีที่แล้ว

    My man this is is very greate and the niche you choose to post this video perticularly is very great the quality and everything I hope you make this as a series as this will help everyone a lot ❤

  • @andyvirus2300
    @andyvirus2300 ปีที่แล้ว +2

    Thank you for talking about collision, I was wondering about it even if I knew this wasn’t really an issue !

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +3

      It's always fun to look up collision probabilities!

    • @andyvirus2300
      @andyvirus2300 ปีที่แล้ว +1

      @@PlanetScale not as much as your videos, and I’m learning a ton.
      Please keep up the good work !

  • @gazsi72
    @gazsi72 ปีที่แล้ว

    I haven't learned such a useful trick from a youtube video in a long time, thx

  • @budidarmawan6959
    @budidarmawan6959 ปีที่แล้ว

    i didn't even use MySQL, yet i was pleasantly entertained by your video (i believe some of the knowledge is applicable to other databases as well)
    thank you, keep up the good vid 💪🏻

  • @postnick
    @postnick ปีที่แล้ว

    Content like this I love - I hope I can find a useful way to do this in my position but my indexes are always Material + location + date usually.

  • @k1chko
    @k1chko ปีที่แล้ว +1

    You do great work! Tough to find solid material to point developers to.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      I appreciate that very much

  • @pranaygoel
    @pranaygoel ปีที่แล้ว

    Hi from India!
    It's the best thing I came across today.

  • @ridass.7137
    @ridass.7137 ปีที่แล้ว

    So well explained with the right pace and example, its incredible. Thank you!

  • @Hislodin
    @Hislodin ปีที่แล้ว

    I can see how this could be very useful in optimizations. Thank you for your practical guide on deeply doc-founded knowledge.

  • @thereasonableprogrammer4921
    @thereasonableprogrammer4921 10 หลายเดือนก่อน

    As someone who likes reading the MySQL docs, printing them out and binding them is hardcore 😂 props Aaron 🎉

  • @Mika-s6e
    @Mika-s6e ปีที่แล้ว

    Super. I remember dozens of usecases, where this implementation would of saved the day.

  • @TroyNiemeier
    @TroyNiemeier ปีที่แล้ว

    You got my sub for the "Instead of hanging out with friends, ....we continue to read " line. :)

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Haha glad you liked that one 😂

  • @nawaahkh
    @nawaahkh ปีที่แล้ว

    Just 😲, a huge thank you for all these information in all videos.

  • @n4trojan
    @n4trojan ปีที่แล้ว

    That's was definitely a great piece of knowledge I got in just 13 minutes. Thank you!

  • @jamaly87
    @jamaly87 ปีที่แล้ว +1

    Dude your content is 🔥.

  • @addanametocontinue
    @addanametocontinue ปีที่แล้ว +1

    Great content. Got yourself a new subscriber!

  • @simonnjoroge933
    @simonnjoroge933 ปีที่แล้ว

    Amazing video, great explanation and also, nice sql tool you have there. More videos 🌟

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Thank you! Got a big one coming out tomorrow on a similar topic 🤐

  • @nerdswhosell
    @nerdswhosell ปีที่แล้ว +1

    Love this. Is there any tips on thinking about this when building/designing the database schemas ?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Thanks! I think the most common use cases would be 1) when you're trying to index something that's not easily indexable, you can reach for this or a functional index 2) when you're trying to "paper over" some weirdness in your data model as a convenience or 3) when you're trying to create an ID for an entity that doesn't already have a natural ID. In this case the address didn't have a natural ID, so calculating a hash of the parts gives us a nice ID that we can use for deduping, uniqueness, or indexed lookups from the application side

  • @dave6012
    @dave6012 ปีที่แล้ว +2

    Any concern with performance for tables where rows are frequently updated?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Hashes like this will be pretty cheap to calculate, so it's no concern to me.

  • @youssefhossam9077
    @youssefhossam9077 ปีที่แล้ว

    I really learned something today, thank you man.

  • @yangsui1301
    @yangsui1301 ปีที่แล้ว

    The generated column got me thinking if there is some places I can actually do better on my work, thanks Aaron!!

    • @yangsui1301
      @yangsui1301 ปีที่แล้ว

      One more thing, should we ignore the cost of concat, md5 or any pre-processing stuff here? I mean, I'm not a MySQL expert so I'm not sure if we do this calcuation in database level is good or not. And when should will consider using a generated column instead of a composite index? Should we always make a experiment or if there is any instuction we can follow?

    • @yangsui1301
      @yangsui1301 ปีที่แล้ว

      BTW the generated column is a really nice idea, if there is a column referencing or calculating by the others that our apps will have to keep it correct, it's struggling because we are afarid of there is some place in our app that doesn't handle that calculation correctly, but if this job is done by the database, at least we can avoid lots of code changes and just trust the database will keep the consistency as usual.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Generated columns are so fun. You can use them in so many places. I would probably ignore the cost of a concat or MD5, but if you were to do something very expensive in a generated column, you might consider declaring it STORED so it's not calculated over and over. I like having the database maintain these calculated values so that no matter where the insert or update is coming from, the generated column is always going to be correct and present.

    • @yangsui1301
      @yangsui1301 ปีที่แล้ว

      @@PlanetScale Fair enough! Will definitely try it, thanks Aaron!!

  • @JohanNordberg
    @JohanNordberg ปีที่แล้ว

    I really like your presentation style. Clear and explanatory and good video length.
    I never use MySQL, so one improvement for me would be if you could mention if this would be relevant for MSSQL and Postgres as well and if the syntax is different, just make a short note of that.
    I assume most concept are similar.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Thanks Johan! I'm not as familiar with the other databases so I feel hesitant to speak on them

  • @pixtweaks393
    @pixtweaks393 ปีที่แล้ว

    Thanks for sharing this amazing idea, very well explained, thank you!

  • @adampatterson
    @adampatterson ปีที่แล้ว

    This is really cool, the only thing I was curious about is the hashing performance?
    I know the hash is only created in insert and update.
    It doesn't look like MySQL supports any of the newer hashes like xxh3 which would be my choice since it's so incredibly fast, typically faster than the computers memory bandwidth. It's not cryptographically ( is that a word ) strong, but it's randomness is extremely high

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      MD5 is incredibly fast also, which is actually one of the reasons it shouldn't be used for cryptography! I haven't benchmarked it, but I think it should be fast enough for most needs. You could try CRC32 but you'd have a ton of collisions so you'd definitely need to add the additional conditions mentioned at the end of the video. Worth a few experiments to see, I suppose!

  • @joshhickman77
    @joshhickman77 ปีที่แล้ว

    The MD5 hash collision isn't the reason you add the WHERE, it's because sometimes your data can have the separator in it and "123|abc", "def" gets combined to the exact same string as "123", "abc|def" even though the values don't match.

  • @CRBarchager
    @CRBarchager ปีที่แล้ว

    Awesome video. Now I just need to find the T-SQL version of your channel and I'm complete :D

  • @rasheedatanda9927
    @rasheedatanda9927 ปีที่แล้ว

    Thank you for this video. It provides more insight into why some rule of thumb exists in the database world.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      You're welcome!

    • @rasheedatanda9927
      @rasheedatanda9927 ปีที่แล้ว

      @@PlanetScale In short, all your videos I've watched made lots of sense to me considering my side quest on building a database. I'm now able to see things from a perspective different from the usual "do this don't do that because of so and so..." which is quite rampant in the engineering world.

  • @toki.mp3
    @toki.mp3 ปีที่แล้ว

    Do you have any plans to add foreign key support?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      We do in fact 🤐 Follow us here or on Twitter to stay up to date, but things are ✨happening✨

    • @toki.mp3
      @toki.mp3 ปีที่แล้ว

      @@PlanetScale bro I tried it with drizzle orm before it did not work. foreign key is a big thing for my workflow.When did you release it?

  • @vasilisthemelis9017
    @vasilisthemelis9017 ปีที่แล้ว

    You can have collisions due to the separator being included in the value stored in the columns as well

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Totally! If that's possible with your data, that'd be a great reason to use the redundant conditions (mentioned at the end of the video) for deduping.

  • @jayantrawat5961
    @jayantrawat5961 ปีที่แล้ว +1

    First of all that's a great video and thanks a lot for that.
    I have a question regarding this, as we have 6 columns hashed as index and I need to search in 3 columns instead of six, will it be helpful in that case too?? Like I want to search in address, zip code and state column, will it work??

    • @LucasSouza1
      @LucasSouza1 ปีที่แล้ว

      10:50

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Unfortunately @LucasSouza1 is correct. It's no longer possible because you've destroyed quite a bit of that data by hashing it.

  • @ongong8209
    @ongong8209 ปีที่แล้ว

    thanks you so much, your videos so helpul for my job. Can you do more video about EXPLAIN ANALYZE and how to optimize execution plan in MYSQL

  • @BMRG14
    @BMRG14 ปีที่แล้ว

    So many great tips! Thank you very much!

  • @damaradiprabowo
    @damaradiprabowo ปีที่แล้ว +1

    Anyone have any idea of what SQL client he is using? It looks neat

  • @rrd_webmania
    @rrd_webmania ปีที่แล้ว

    A gem again. Thanks

  • @treesitter
    @treesitter ปีที่แล้ว

    This video screams high quality content!

  • @arcanernz
    @arcanernz ปีที่แล้ว

    Great technique but I’m not sure why you would have a large composite index (that’s not unique) since in my experience the optimizer only uses the first or at most the first and second columns from the index.
    I suppose you can use the latter columns to reduce a read to the main table but only if the query reads columns only from the index. But in that case a hash index wouldn’t be a viable solution.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      In the original case a unique constraint was added, which is a nice further use of a hash.

    • @arcanernz
      @arcanernz ปีที่แล้ว

      @@PlanetScale makes sense thanks

  • @oguznsari
    @oguznsari ปีที่แล้ว

    this is great, thank you aaron

  • @kaprikornz
    @kaprikornz ปีที่แล้ว

    Quality videos man! Keep it up

  • @soofgolan
    @soofgolan ปีที่แล้ว

    Awesome video! one small nitpick, since you're dropping nulls when joining columns into a string with the "|" separator, the following rows will be converted into the same string before hashing, even if values came from different columns.
    All these:
    "A", NULL, "B"
    NULL, "A", "B"
    "A", "B", NULL
    Map to "A|B"
    I suggest using a more resilient value concat'ing scheme

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      True! Unlikely to happen with this data, but that would be a good reason to use the conditions shown at the end of the video

  • @npc73x
    @npc73x ปีที่แล้ว +1

    Can you discuss your thoughts on ULID

  • @andrewtran9870
    @andrewtran9870 ปีที่แล้ว

    Amazing video, I'm a sql newbie, well newbie in any aspect of software engineering. But I was wondering, what's the downside of having an identity column as a primary key? Or rather, in what scenario would this concept be used over using identity?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      I'd probably still have a primary key that's an integer, personally. Since these hashes would be randomly ordered, I wouldn't want them as PKs. I'd use them to check for existence or additionally throw a unique constraint on the hash to enforce uniqueness

    • @andrewtran9870
      @andrewtran9870 ปีที่แล้ว

      @@PlanetScale Thanks for the response. So for checking existence, would I be correct to assume this method would be faster than checking "WHERE ... AND ... AND ..."?
      And as for the unique constraint, its possible to apply a unique constraint over multiple columns. So would applying the unique constraint to only the hash column have any benefits over multiple?

  • @visualapproach7155
    @visualapproach7155 ปีที่แล้ว

    Ok, wait a minute. How is it possible I haven't come across your content before? You actually know your stuff. Moreover, I can't even play out the jokes in my head before you say them out loud. GET OUT OF MY HEAD. Also, keep producing amazing content. Deal? Deal.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      You drive a hard bargain but you got yourself a deal

  • @languagelearningexperience6814
    @languagelearningexperience6814 ปีที่แล้ว

    Excellent video!

  • @TuralAsgar
    @TuralAsgar 11 หลายเดือนก่อน

    Good for you, Aaron!

  •  ปีที่แล้ว

    You have enough friends and I'm proud to be your friend 👍🏻

  • @nebbsie
    @nebbsie ปีที่แล้ว

    Nicely explained! What software are you using for querying?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      TablePlus! th-cam.com/video/7V_CJBPZPes/w-d-xo.html

  • @AntonyXavier-v9j
    @AntonyXavier-v9j 7 หลายเดือนก่อน

    what if you update one of the columns that is used for indexing, will the hash also be updated ?

  • @MelroyvandenBerg
    @MelroyvandenBerg ปีที่แล้ว +1

    This is so great! I hope this all works on MariaDB as well?? And now I also need to print this "book" ;)

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Theoretically it should work the exact same! Smaller = faster. Give it a go and let me know

  • @konserv
    @konserv ปีที่แล้ว +1

    I would rather choose smaller hash and worry about hash collisions than picking wider hash and hoping there will be no collisions.
    Additionally to that, our CPUs and memory works better with 32/64 bit data that with 128 bit data. And, in this case, I would pick CRC32.
    Also, MD5 was designed as a cryptographic hash, you don't need cryptograpy overhead here. Simpler hash is better here.
    It would be nice to see a benchmark of those 3 approaches. Wide index, MD5, CRC32.
    Anyway, thanks for the video. I learned about auto-generated columns. And it was interesting.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +2

      CRC32 plus the additional conditions to dedupe hashes would work just fine!

  • @SlimNateKC
    @SlimNateKC ปีที่แล้ว

    I love this, I’m curious why this isn’t just the default behavior for indices and what the crossover point is for when this is faster than the traditional index

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      It has pretty severe tradeoffs, like the fact that it only works for strict equality lookups

  • @daniellionel01
    @daniellionel01 ปีที่แล้ว

    What's the program you're using for interacting with mysql? Looks neat!

  • @bogdanta7931
    @bogdanta7931 ปีที่แล้ว

    from how many indexed tables does it start to make sense?

  • @nathanoy_
    @nathanoy_ ปีที่แล้ว

    What program do you use to execute the queries and view the results?
    EDIT: Nvm, a few seconds after posting the comment I reached 8:37 where you mention that it is TablePlus.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Bah you beat me to it! Or I guess in a way I beat me to it. Regardless... TablePlus 😂

  • @DJMariush
    @DJMariush ปีที่แล้ว

    So this technique could be used to hash string column with uuid values that are a key, to avoid indexing uuid column?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      There is actually a uuid_to_bin column to compress a uuid value, which you could use to create a smaller index!

  • @JohnnyBigodes
    @JohnnyBigodes ปีที่แล้ว

    We use this to search for duplicates on the database. A thing that I would personally change is making all strings to LOWER(), so that if the user uses a big letter instead of a smaller one, it still sees it as a duplicate.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Nice! That's probably a good addition. Good thinking

  • @sholtronicsaaa101010
    @sholtronicsaaa101010 ปีที่แล้ว

    Really informative video. Just a question. Would a row with values 1, null and null, 1 not hash to the same value in this case using the concat_ws method?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Yes! It would. If that's possible with your data, that'd be a great reason to use the redundant conditions (mentioned at the end of the video) for deduping.

    • @NuncNuncNuncNunc
      @NuncNuncNuncNunc ปีที่แล้ว

      You can use IFNULL to replace null with empty string which will produce a separator at the appropriate location, i.e. 1| vs |1. You could also prefix each value with the column name as a way to produce unique concatenated strings, i.e. col1:1|col2: vs col1:|col2:1. I don't think much is gained by using column name prefixes except if you ever need to see the pre-hashed text.

  • @diavolokelevra4795
    @diavolokelevra4795 ปีที่แล้ว

    Worrying about collision of md5 maybe a bit paranoid but one other collision I do worry about is concat_ws doesn't entirely solve the null problem, that null values simply disappear. For example, using your table schema, there will be collision if there is an address with street_name HL with no street_suffix, and another with no street_name and street_suffix HL. Boom, collision. I have seen too many weird real life data to be at ease about this.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Totally! In that case adding the extra conditions makes a lot of sense

  • @DirkSchut
    @DirkSchut ปีที่แล้ว

    You said that MD5 is just fine for this specific purpose. Wouldn't it be highly preferable though? Because it's cheap computationally?

  • @mateuszmalik1531
    @mateuszmalik1531 ปีที่แล้ว

    One question. Would it be possible to send a hashed value to MySQL over unencrypted network at use some built in MySQL function to unhash this data to separate columns?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Hashing (like shown here) is a one-way function, so it's not reversible. I'm not sure exactly what you're going for, to be honest!

    • @mateuszmalik1531
      @mateuszmalik1531 ปีที่แล้ว

      @@PlanetScale In general I have two questions. 1st. You showed that MySQL can (un)hash data and keep it private or concatenate larger information, it can even do so in realtime based on information changed in the fields of a table. Can it also do it in the other way, update fields in the table based on changed data in the hashed field in real time? 2nd. Maybe this is a topic for different time/video. Synchronise two distant databases. An offline version periodically connects to an online and upon connection a synchronisation takes place. Is there a built in MySQL function to perform something like this?

  • @ikhwanmaftuh
    @ikhwanmaftuh ปีที่แล้ว

    Niw i wonder what if the we add a cokumn from other table

  • @john-doe-r3h
    @john-doe-r3h ปีที่แล้ว

    Is there a hash function in MySQL, such that if I run a query with "order by f(a,b,c)", the result set will be in the same order, as for the query with "order by a, b, c"?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Well now that's an interesting question isn't it. I can't think of a lossless way to do that off the top of my head, but the question will haunt me so I'll let you know if I come up with anything.

  • @alexandersnow5867
    @alexandersnow5867 ปีที่แล้ว

    In theory, couldn’t you create the same concat string from various permutations of the column space, since the returned value doesn’t encode anything about the location of the null values?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      It is theoretically possible, yes! In that case adding the redundant conditions (mentioned at the end of the video) would be useful.

  • @nick-khan
    @nick-khan ปีที่แล้ว

    Great content up to the point! What MySQL version are you using and reading docs of?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      The docs here were version 8, but this pattern will work in any version!

  • @VuTuanIT
    @VuTuanIT ปีที่แล้ว +1

    This is so useful that I feel stupid that didn’t knowing this after 6 years working with mysql 😂

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Nah don't feel stupid, we should all be learning new things constantly!

  • @TightyWhities94
    @TightyWhities94 ปีที่แล้ว

    Will this not hurt write performance though? I don't see how this is different than UUID indexes? Now if the hashes are sequential that's another story.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      A UUID index is normally a primary key. This would replace a composite secondary key. It would have the same insert performance of any secondary key. The hash calculation is de minimus

  • @kyleridolfo
    @kyleridolfo ปีที่แล้ว

    Truly excellent video

  • @119_shrenikshah7
    @119_shrenikshah7 ปีที่แล้ว

    Great quality content

  • @thmater-x9786
    @thmater-x9786 ปีที่แล้ว +1

    Whish there is a channel like this for mongodb :(
    Awesome content ❤

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Guess you'll have to switch to MySQL 😏😏

  • @standardio8270
    @standardio8270 ปีที่แล้ว

    This is great content.

  • @iWhacko
    @iWhacko ปีที่แล้ว

    is it possible to add a trigger on select statements to do the concat automatically? and then make it search on that?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      I don't think a trigger is necessary here, as you can use a generated column to do the concat automatically. Perhaps I've misunderstood the question though!

    • @iWhacko
      @iWhacko ปีที่แล้ว

      @@PlanetScale yeah to generate it yes. But once you added the column. You said in order to search with that indext you can concat it in the select query and search on the generated column. Bit instead of concating in your select query, could you add a trigger that takes the un concatenated fields to concat it for you and then search on the generated column. Just in case you forget to concat in your query or older parts of the software don’t implement that yet

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      @@iWhacko Ah gotcha. I actually don't know! Maybe so? I haven't worked with select triggers very much. I'd probably opt against that personally, but it might work!

  • @veds-art-world
    @veds-art-world ปีที่แล้ว

    Can we add Hex values for all characters values and then use it in application i.e. php, Java, etc?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      Hmm I'm not sure I totally understand the question!

    • @veds-art-world
      @veds-art-world ปีที่แล้ว

      I mean to say if address table has multiple varchar columns, instead of storing varchar data, should I change column type to binary column for storing data? Is it good idea?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      @@veds-art-world Ah I see. Not that's probably not great because in that case you actually do need/want to store the characters as they semantically mean something. With an MD5 hash the characters are meaningless, the sequence of bytes is all that matters.

  • @Pptruenoz
    @Pptruenoz 11 หลายเดือนก่อน

    Mysql docs are blocked for my country Venezuela

  • @SebastianMares
    @SebastianMares ปีที่แล้ว

    Just wondering: Is an index on a generated column like in your example identical performance-wise and storage-wise to an index on an ULID or optimized UUID column? I assume the fragmentation would be worse for such a generated hash column due to the randomness of the values. (And yeah, I know the two serve totally different purposes, no question in that.)

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +2

      It'll be about the same as any other secondary index! The bad thing about random UUIDs as *primary* keys is that the entire clustered index has to be rebalanced along with all the data. That's why you (rightly) here people go on and on about ULIDs and time-ordered UUIDs, so that that rebalancing doesn't have to happen. With a secondary index, it's constantly being broken and rebalanced, but it's so much cheaper because it's only the indexed data, not the entire table's worth of data as is the case with the clustered index. (Note this is accurate for MySQL only, I know very little about the inner workings of other databases.) Hope that makes sense slash answers your question!

    • @SebastianMares
      @SebastianMares ปีที่แล้ว

      @@PlanetScale Thanks!

  • @domanjie
    @domanjie ปีที่แล้ว

    Quality content👌🏾, Thank you.

  • @yakirgb
    @yakirgb ปีที่แล้ว

    Very good video!

  • @rahulxcr
    @rahulxcr ปีที่แล้ว

    Thanks. This is really awesome trick.

  • @Vinicius7cordas
    @Vinicius7cordas ปีที่แล้ว

    Is this MySQL GUI from PlanetScale?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      It's TablePlus, actually!

  • @alvaroaquije2572
    @alvaroaquije2572 ปีที่แล้ว

    This man needs to be more viral; question: you mentioned strict equality, but will this approach also work with statements like ‘is like’?

    • @PablumMcDump
      @PablumMcDump ปีที่แล้ว +1

      The hash function obliterates LIKE because there's no substring any more to match on.
      The MD5 hash is only suitable for exact matching.

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว

      Unfortunately it only works for strict equality

  • @luizhrocha
    @luizhrocha ปีที่แล้ว

    This is gold

  • @dragonwave2652
    @dragonwave2652 ปีที่แล้ว

    What app are you using to query request mysql?

  • @rashshawn779
    @rashshawn779 ปีที่แล้ว

    Sidenote: If you use a more efficiently indexed binary uuid, you need hex, cannot use the built in bin2uuid. Need to use hex/unhex directly.

  • @j4ggi
    @j4ggi ปีที่แล้ว

    Are generated columns in mysql persisted by default or does it allow you to create an index on non-persisted value? both options seem kinda weird/interesting

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +4

      By default they are virtual, meaning they are calculated at run time. You can create them with the STORED keyword and they'll be written to disk. Interestingly, if you create a virtual generated column and then index it, the value is in fact written to disk, because the index is written to disk.

  • @lapulapucityrider3227
    @lapulapucityrider3227 ปีที่แล้ว

    does mysql have trigger?

    • @PlanetScale
      @PlanetScale  ปีที่แล้ว +1

      It does! I don't use them very often, especially in cases where generated columns will suffice.

    • @lapulapucityrider3227
      @lapulapucityrider3227 ปีที่แล้ว

      @@PlanetScale keep making content I enjoy the way you teach. I always watch you when Im going to sleep and I like your relaxing voice