The Problem With UUIDs

แชร์
ฝัง
  • เผยแพร่เมื่อ 6 พ.ค. 2024
  • THANK YOU PLANETSCALE FOR SPONSORING THIS VIDEO
    UUID's have a time and place, but I rarely see them used correctly. I've wanted to do this rant for awhile and I'm happy I did because CUID2 is NOT a good option either 🙃
    SOURCE
    planetscale.com/blog/the-prob...
    Check out my Twitch, Twitter, Discord more at t3.gg
    S/O Ph4se0n3 for the awesome edit 🙏
  • วิทยาศาสตร์และเทคโนโลยี

ความคิดเห็น • 965

  • @einrealist
    @einrealist หลายเดือนก่อน +620

    UUID versions are NOT supposed to replace each other. They are just different implementations that address different problem domains. Once humans become a serious space-faring civilization, we probably need yet another UUID version for problems like time dillation. But such a UUID V99 is not supposed to replace V4, but to complement it. V4 can still be used within certain bounds.

    • @vikingthedude
      @vikingthedude หลายเดือนก่อน +8

      So its like newton’s laws

    • @TheOriginalBlueKirby
      @TheOriginalBlueKirby หลายเดือนก่อน +46

      ​@@vikingthedudeTerrible analogy.

    • @Imperial_Squid
      @Imperial_Squid หลายเดือนก่อน +64

      Calling them something like flavours rather than versions is probably more fitting

    • @steamer2k319
      @steamer2k319 หลายเดือนก่อน +23

      ​​@@Imperial_Squid
      Yeah. I've never been confused by the UUID types but I could see how one could be misled into expecting obviation/revision from a numerical sequence of "versions".

    • @vikingthedude
      @vikingthedude หลายเดือนก่อน +4

      Newtons laws are still useful even though we’ve got general relativity

  • @flymetothemoon5138
    @flymetothemoon5138 29 วันที่ผ่านมา +382

    It's amazing watching somebody talk shit about something he obviously doesn't understand beyond his sponsors blog post

    • @pchris
      @pchris 21 วันที่ผ่านมา +67

      He talked up the video at the start like he's gone through many iterations of it so I was kind of shocked when the entire video was just him reading somebody else's blog post. Did he really have so few thoughts on it himself, or were they all about GUID and made obsolete as he read the UUID article?

    • @hellowill
      @hellowill 20 วันที่ผ่านมา +41

      this frontend guy needs to stfu about backend and databases lol

    • @aguenter
      @aguenter 20 วันที่ผ่านมา +13

      @@pchris Skimmed right over ULID too, which fits his stated requirements/wants.

    • @DavidOtto82
      @DavidOtto82 17 วันที่ผ่านมา +8

      lol i saw this top comment and thought "lets see how long it takes until i can call bullsh*t"...took 2 minutes. btw: former frontend dev here xD

    • @tahvohck
      @tahvohck 16 วันที่ผ่านมา +4

      Thanks for saving me some time.

  • @averageemogirl
    @averageemogirl 18 วันที่ผ่านมา +67

    "hey guys! today i will be talking about something i clearly don't understand" is a great summary of this video

  • @eamonburns9597
    @eamonburns9597 29 วันที่ผ่านมา +172

    17:58 Just so you know, "I'm not telling you the difference between them, that's your problem", takes a lot more time to say than "1 byte is 8 bits"

    • @nArgari
      @nArgari 24 วันที่ผ่านมา

      Nice... So no difference between an octet and a Byte !? Or may be...
      The terms "octet" and "byte" are often used interchangeably in common usage, but there is a technical distinction between the two.
      A byte is a unit of digital information that consists of 8 bits. A bit is the smallest unit of data in a computer, and a byte is a group of 8 bits that can represent a single character of text or a small amount of numerical data. Bytes are commonly used to represent characters in text, encode instructions for a computer program, store data in memory, and transmit data over networks.
      An octet, on the other hand, is specifically defined as a unit of digital information that consists of 8 bits. In networking and data communication contexts, the term "octet" is often used to emphasize the size of the unit (8 bits) without assuming any particular character encoding or representation. The use of the term "octet" in networking standards helps to avoid confusion that may arise from different character encodings that use different numbers of bits per character.
      In summary, an octet is a specific term for a group of 8 bits, while a byte is a more general term for a group of 8 bits that can represent a character or other data. In practice, however, the two terms are often used interchangeably, especially in everyday language.

    • @vorrnth8734
      @vorrnth8734 24 วันที่ผ่านมา +20

      @@nArgari Actually a byte is the smallest directly accessible piece of memory. It does not need to be 8Bit wide. Historically there were other sizes too.

    • @uncaboat2399
      @uncaboat2399 23 วันที่ผ่านมา +1

      @@nArgari I thought an "octet" was a baby octopus? 😝

    • @PhilHibbs
      @PhilHibbs 22 วันที่ผ่านมา +2

      Anyone that deep into the video who doesn't know that is in the wrong place.

    • @eamonburns9597
      @eamonburns9597 22 วันที่ผ่านมา +1

      @@PhilHibbs I mean, if they are watching this channel, they are probably JavaScript developers, so....

  • @cherubin7th
    @cherubin7th หลายเดือนก่อน +934

    I propose v9, it is the same as v4, but instead of a 4 it has a 9. This changes everything.

    • @mx-kd2fl
      @mx-kd2fl หลายเดือนก่อน +16

      lmao

    • @DavidLindes
      @DavidLindes หลายเดือนก่อน +23

      vibes of the IPv9 (April Fools) RFC. :)

    • @johnblyberg4801
      @johnblyberg4801 หลายเดือนก่อน +3

      Genius!

    • @follantic
      @follantic หลายเดือนก่อน

      It's incremented. Good for perormance. Then we can proceed with the A-F versions.

    • @follantic
      @follantic หลายเดือนก่อน

      Also, Tom, is that you?

  • @xdaniels13
    @xdaniels13 หลายเดือนก่อน +747

    Oh I am still waiting on the video about PlanetScale removing the free plan and the cheaper price twice as much as competitors...

    • @portalpacific4578
      @portalpacific4578 หลายเดือนก่อน

      We need to stop watching compromised influencers pushing businesses despite not having any support from the dev community.
      I cant even hear the info cuz im so annoyed hes shilling their BS and taking their money.. Fck you Theo.

    • @portalpacific4578
      @portalpacific4578 หลายเดือนก่อน +145

      Theo = sell out.

    • @yiannis_p
      @yiannis_p หลายเดือนก่อน

      @@portalpacific4578do you ever Google anything before calling people names? Name me one other company offering a hosted vitess cluster with a https proxy, backups, branches and online ddl that is cheaper than planetscale and then I will agree with you.

    • @yiannis_p
      @yiannis_p หลายเดือนก่อน

      You do realise planetscale offers something no other company offers right ?
      The only other way to get a hosted vitess cluster with online ddl, backups and an http proxy is to make it yourself, and unless you value your time at zero dollars, it would be a lot more expensive…

    • @tanotive6182
      @tanotive6182 หลายเดือนก่อน +16

      He already made a video about that

  • @tarunpai2121
    @tarunpai2121 หลายเดือนก่อน +273

    Dude literally just read an article and nothing more. Offer some POV at least.
    - I don't understand if this is really an issue with mysql or with postgres too?
    - Is this even as big a problem? or is it just clickbait?
    - What's the actual performance impact? there's a lot of "perf issues" thrown around with no(?) benchmarks.

    • @urmom8322
      @urmom8322 หลายเดือนก่อน

      What do you mean? He was also able to get paid to shill for planetscale. Amazing content

    • @intentionally_obscure
      @intentionally_obscure 29 วันที่ผ่านมา +63

      Thanks for saving me half an hour

    • @hum6le
      @hum6le 29 วันที่ผ่านมา +69

      thats kinda this whole channel

    • @AxioMATlC
      @AxioMATlC 27 วันที่ผ่านมา +1

      Highly variable, but consider where an auto-incrementing 32bit int index is 16GB. UUIDs could be as large as 4x or 64GB. Now you are talking about the difference from a cheap server to a more expensive one unless you degrade to searching within an index based on drive speed instead of RAM. ~100x slower for index usage

    • @pchasco
      @pchasco 25 วันที่ผ่านมา +1

      @@AxioMATlCOK but you wouldn’t use cuid in a scenario where you can use a 32-bit int, either. You haven’t pointed to a problem with UUID here. This is a problem of choosing the wrong solution to a specific problem.

  • @rikschaaf
    @rikschaaf หลายเดือนก่อน +174

    3:49 To my knowledge, not all versions in UUID should be seen as "the next version". Some are more like variants than versions. That's why both v1 and v4 are still used often. You could say that variant 5 is a new version of variant 3 though, because we now have better hashing algs that might have not been considered in v3. Variant 6 similarly is a new version of v1, where they reordered the date to be sortable (with v7 being a variant of that). You shouldn't think of every new --"version"-- variant being better than the previous, but that they complement each other.

    • @workmad3
      @workmad3 หลายเดือนก่อน +13

      Exactly this. Going back to RFC-4122 (2005 one detailing the 5 currently standard variants), it says this about the variant bits:
      > The variant field determines the layout of the UUID. That is, the
      interpretation of all other bits in the UUID depends on the setting
      of the bits in the variant field. As such, it could more accurately
      be called a type field; we retain the original term for
      compatibility.
      The idea is you pick the type that suits your needs. If you don't have any particular needs, v4 (the 'random' type) is a good choice, because you simply want a long, random value.

    • @aredrih6723
      @aredrih6723 หลายเดือนก่อน +4

      It's unfortunate that the article didn't mention "version" 0 (that only contains the nil UUID), it would have made that point more obvious.

    • @davidmartensson273
      @davidmartensson273 หลายเดือนก่อน +1

      Yes, they are intended to solve different problems and all come with different trade offs to achieve that, a user bound one is good if you need to be able to identify the user over values stored in many many systems to not have to add multiple extra columns.
      I see some problems with this, but I can se the reason.
      And while you sometimes do want UUIDs that are sortable, especially if you want to use them as primary keys to avoid having to inject things in pages, sometimes you really want more random with no traceable time component.
      This feels very much like a commercial for planetbase and not informational.

    • @rikschaaf
      @rikschaaf หลายเดือนก่อน +11

      @@davidmartensson273 I don't think that the issue here was planetscale's article. It was very informative about the consequences of using those UUIDs in keys/indexes. I think that the issue here mostly came from Theo's prejudice and inability to see them as variants, rather than versions. To be fair, Wikipedia also talks about versions rather than variants, so Theo isn't alone in that regard. It would be good if official documentation and knowledge bases would be updated to reflect this. So, who's gonna change the word version to variant on wikipedia? 😄

    • @onoff5604
      @onoff5604 29 วันที่ผ่านมา +6

      sounds like 'versions' of raid storage...

  • @chepossofare
    @chepossofare หลายเดือนก่อน +528

    The proble with UUIDs is that you are not using PostgreSQL.

    • @mvnkycheez
      @mvnkycheez หลายเดือนก่อน

      What's the difference?

    • @KangoV
      @KangoV หลายเดือนก่อน +134

      In PostgreSQL a UUID is stored in 16 bytes. Indexes on these are super fast as they are performed numerically.

    • @spicynoodle7419
      @spicynoodle7419 หลายเดือนก่อน +15

      When using PostgreSQL instead of MySQL, your whole DB is much slower rather than only the UUIDs xD

    • @milanmiljus823
      @milanmiljus823 หลายเดือนก่อน +195

      @@spicynoodle7419 cringe

    • @landsman737
      @landsman737 หลายเดือนก่อน

      hehe exactly

  • @workmad3
    @workmad3 หลายเดือนก่อน +80

    Thinking about the different UUID variants as 'versions' that replace previous versions isn't quite right... the idea is that the different variants are used in different circumstances, so a deterministic variant is useful in some circumstances. Which is why there's a 'variant' number in it, so different variants can be checked differently.

  • @supersat
    @supersat หลายเดือนก่อน +23

    V1 was a pretty good idea because collisions were basically guaranteed to never occur. But that was in a world where MAC addresses were guaranteed to be unique, and VMs basically killed that.

  • @ChrysusTV
    @ChrysusTV 29 วันที่ผ่านมา +47

    "I spent years researching this to make sure I got it right. Turns out, I don't need to. Here's what my sponsor has to say." Sheeesh, DarkViper's video accurately notes a lot of key issues, like this one. Something always felt off about this channel and getting a different perspective really made things clear for me. This channel really is just regurgitation with, as you admit yourself in the first few seconds, little research.

    • @raphaelmorgan2307
      @raphaelmorgan2307 17 วันที่ผ่านมา +3

      also in the video shows us himself posting someone else's joke on Twitter lmao

  • @danhorus
    @danhorus หลายเดือนก่อน +117

    This channel does read a lot of blog posts

    • @Dekutard
      @Dekutard หลายเดือนก่อน

      i rather listen than read.

    • @thomassynths
      @thomassynths หลายเดือนก่อน +20

      Reading blog posts is fine and dandy. Regurgitating sponsor speak isn't though.

  • @steveoc64
    @steveoc64 29 วันที่ผ่านมา +101

    Javascript programmer using MySQL gives us a tech talk about the importance of performance at scale
    lol

    • @romankoncek150
      @romankoncek150 26 วันที่ผ่านมา +8

      Probably more experienced with real projects at scale than 99.9% of people here posting useless opinionated statements having nothing better to do

    • @JohnSmith-xv1tp
      @JohnSmith-xv1tp 23 วันที่ผ่านมา +19

      @@romankoncek150 As someone who has actually worked at scale, Theo should definitely have stayed in his lane on this one. He clearly doesn't know enough about databases nor UUID for this conversation.

    • @iPuls3
      @iPuls3 17 วันที่ผ่านมา

      Don't mind me, the Node.js TypeScript developer using MariaDB...

  • @dijikstra8
    @dijikstra8 หลายเดือนก่อน +91

    Using a char(36) to store a uuid just seems like the worst idea every, does anyone actually do this? Postgres has a specific datatype for uuid, does MySQL still not have that?

    • @mehmeterencelik594
      @mehmeterencelik594 หลายเดือนก่อน +16

      Yes. For my opinion, this video only shows how better the postgres than other relational dbs.

    • @nathanalberg
      @nathanalberg หลายเดือนก่อน +10

      Im moving our app to postgres because of the UUIDs alone. UUIDs in mysql is basically roll-your-own.

    • @SPeeSimon
      @SPeeSimon หลายเดือนก่อน +4

      Yes, they should have used varchar(36). Barely joking, because i actually saw a team use nvarchar(36) on an Oracle DB for their PK. Not really a shocker, because the NIH disorder was really high. So they did not use the same raw(16) type as everywhere else in the system and did not reuse existing code.

    • @Lexaire
      @Lexaire หลายเดือนก่อน +2

      @@SPeeSimon Codebase I work on uses varchar(40) and some uuids have hyphens and some don't. Thanks MySQL.

    • @johnbernardlambe8582
      @johnbernardlambe8582 26 วันที่ผ่านมา +1

      If someone wants hexadecimal, they could at least omit the hyphens, giving 32 characters. Even worse is the 38-character format, with added chain brackets.

  • @unknown.profile.handle
    @unknown.profile.handle 26 วันที่ผ่านมา +9

    btw it is an RFC, not just a proposal anymore.
    But there were many wrong assumptions in this video.

  • @loganyt8818
    @loganyt8818 หลายเดือนก่อน +162

    instead of PlanetScale sponsoring Theo, they should have used that money to keep reasonable free plan

    • @sexymeal2427
      @sexymeal2427 หลายเดือนก่อน

      lmao

    • @__Brandon__
      @__Brandon__ หลายเดือนก่อน

      Except now 100 thousand new people know about planet scale and probably 5% of those people are likely to use their service. 5k new users is a lot of users

    • @epajarjestys9981
      @epajarjestys9981 29 วันที่ผ่านมา +11

      @@__Brandon__ "probably 5%", lol. that's a completely ridiculous and extremely improbable number. it's just as probable as theo's sponsorship money being enough to finance the free plan.

  • @falkkyburz
    @falkkyburz หลายเดือนก่อน +12

    Why are we mixing "version" and "variant"?

  • @NicolasTroncoso
    @NicolasTroncoso หลายเดือนก่อน +26

    I believe there is some confusion about sortability.
    All primary keys are sortable, regardless of the value. They will fit in the B+ Tree pages with different packing sizes, and with the balancing issues described in the video.
    What some people want is that records are sorted by creation time. Which is what the CUID post talks about. Don't worry about the record being naturally sorted by creation time. Use a random key to avoid hot spots, and use a createAt (or reverse CreatedAt) index if you need to scan the table the naturally created order.

    • @rainmannoodles
      @rainmannoodles 27 วันที่ผ่านมา +1

      Exactly.
      Locating a given CUID is still efficient with the tree. Internally, the tree structure just won’t follow the creation time order. There is still a defined sort order, it just doesn’t sort based on criteria that a human would consider “useful”.

    • @PatrikTrefil
      @PatrikTrefil 26 วันที่ผ่านมา

      I am glad that someone cleared this up in the comments.
      There is still one thing that is unclear to me though. How is it any different if you use ids that are not sorted based on time creation and use an index for createdAt? The secondary index for createdAt is again a B+ Tree (or any other datastructure) which again causes hotspots. In this case the hotspots are just in a different index. It seems like the performance issues are still there. Could anyone explain if and how this solves the problem?

    • @pacifico4999
      @pacifico4999 26 วันที่ผ่านมา +2

      But what about page fragmentation?

    • @TricksterRad
      @TricksterRad 13 วันที่ผ่านมา

      CUID's rationale behind why you don't need to worry about performance is "your db is fast enough anyway".
      The examples of security issues given are quite literally examples of bad security practices that won't be solved by just using a random ID instead of a sequential one.
      Here are the security issues raised btw:
      Unauthorized PW reset via guessed ID
      Unauthorized access to private GitLab issues
      Unauthorized PW reset via guessed GUID
      In case 1 and 3, the solution isn't random IDs, but signed password reset tokens with an expiry.
      In case 2, the solution isn't random IDs, but proper access controls on resources.
      In all cases, this is not a database design issue - it's a security practices issue. Designing your way out of this by degrading your database performance is a solution, but I hope we can all agree it is not the optimal solution.

  • @benmeehan1968
    @benmeehan1968 หลายเดือนก่อน +31

    Curious why there is no evidence regarding performance, only vague handwaving that storage space is a factor. What this does show is that MySQL has an incrementing index type and an indexing mechanism that is optimised for incrementing indexes. My (naive) understanding was that with a B-Tree, if the index values were random, the tree will likely tend towards being balanced (at least over time), and that B-Trees don't perform well with sequential index values, hence the need for an optimisation. Sequential values don't scale horizontally, hence the need for a larger 'namespace'. Sequential 32 bit values and random 128 bit values are apples and oranges, and meet different requirements. Choose your index keys appropriately, if horizontal scalability is a likely to be a factor (which is often a guess when we don't know just how large or distributed a storage system is going to be). Clearly a UUID can't compete with a 32bit sequential when you have a single key generator, but a 32bit sequential can't even function when there are multiple key generators (without segmenting the address space). Horses for courses.

    • @workmad3
      @workmad3 หลายเดือนก่อน +15

      Yeah, the main thing I got from this was "MySQL sucks for UUID keys", because they've optimised their b-tree index page generation around sequential keys, and don't have the ability to change index type easily.
      Postgres, on the other hand, has a native uuid type that can automatically cast from strings, so you don't need to mess around with conversion functions, has a different optimization for b-trees that, while still has uuid impacts, isn't as severe from what I can tell (it's mostly a cache issue in postgres, as uuids are more likely to be spread over multiple pages, so if you've loaded a page into cache, it's not as likely to be hit again), and also gives the option of other index types that could be better depending on exact db use (like the hash index type, which only supports equality operations)

    • @7th_CAV_Trooper
      @7th_CAV_Trooper หลายเดือนก่อน +4

      You don't need evidence. This is btree 101. Left inserts are slow. Go watch a video to find out why right-only append is faster.

    • @benmeehan1968
      @benmeehan1968 หลายเดือนก่อน +3

      @@7th_CAV_Trooper I absolutely agree, but there is no quantification of how big the effect is, and that might mean any argument for optimisation is not relevant to any particular use case.

    • @7th_CAV_Trooper
      @7th_CAV_Trooper หลายเดือนก่อน +1

      @@benmeehan1968 worst case is every pk inserts at index zero and forces full right shift of the array each time. I think DB perf is quantitized in terms of O notation rather than ms.

    • @7th_CAV_Trooper
      @7th_CAV_Trooper หลายเดือนก่อน +1

      @@benmeehan1968 good point about perf not relevant for every use case

  • @Kazyek
    @Kazyek หลายเดือนก่อน +32

    What I don't see mentioned here and should be mentioned is that your Database IDs, which you use for indexing, and referencing relationships, doesn't HAVE to be the same value you use to reference records from the API's PoV.
    You can just use an integer pirmary key, then add a binary UUID field with a unique key constraint, which you don't use for joining / referencing in the database but is used at api-surface-level to identify the records.

    • @TheCryn
      @TheCryn 26 วันที่ผ่านมา +2

      Yes but I don't see how this improves things. Consider your primary key being an autoincrement integer, that you never expose via api and an object id that is some sort of UUID that you do expose. If you expose an object via the object id (UUID), you probably want to have an index on the object id, as you now need to access the object via object id (UUID), as you don't know the primary key on an api level. So you are at the rebalancing problem again, this time "just" for the index and you probably don't win anything performance wise. If your dataset is so small that you don't index, then the performance hit for using a UUID PK probably won't matter.

    • @HenryLoenwind
      @HenryLoenwind 23 วันที่ผ่านมา +3

      @@TheCryn It improves things as you now only have one table with one index that has UUIDs. And this table will, in many cases, be relatively small, as the things you expose via ID often are short-lived (sessions, password reset tokens, email verification tokens, ...). That's way better than having a UUID primary key on each and every table---that could easily be hundreds of them.
      In any case, using a UUID as an external ID for a limited number of externally addressable objects is different from using it as the primary key for every table in the system.

    • @TheCryn
      @TheCryn 23 วันที่ผ่านมา +3

      @@HenryLoenwind I think most real world applications I have worked on had their PK for most tables exposed in one way or another (depending on where you draw the line on exposure).
      I have worked on a legacy system that dates back before the introduction of UUIDs there the original designers have put much effort into not using autoincrement PKs but to generate something time based (for sorting purposes) but also random (to prevent congestion on DB pages, if a lot of IDs/objects are added to the DB).
      You should also consider that a lot of IDs you expose (i.e. order IDs for a web shop) can give away insights you often don't want out there (see: German tank problem).
      So you probably still want most API IDs to not be predictable / autoincrement and need an ID-table for almost each table and an additional join for most queries (that can have a noticeable performance impact itself).

  • @dragons_advocate
    @dragons_advocate หลายเดือนก่อน +22

    "Planetscale is not very liked atm..." -- "Thank you planetscale for sponsoring this video"

  • @tabletuser123
    @tabletuser123 หลายเดือนก่อน +89

    “this video is sponsored by planetscale” is a good way to make sure we know the video is not credible

    • @xelspeth
      @xelspeth หลายเดือนก่อน +13

      I mean he's just reading a blogpost and giving his own opinions, I don't see how this video would be credible in the first place lol
      You're supposed to form your own oppinion anyway

  • @titbarros
    @titbarros หลายเดือนก่อน +32

    ULIDs resolve most, if not all of the problems mentioned. Is my go to now

  • @worldadmin9811
    @worldadmin9811 หลายเดือนก่อน +26

    RIP ULIDs got no mention even when shown in article

    • @JohnDoe-jc4xp
      @JohnDoe-jc4xp หลายเดือนก่อน

      Right ;(

    • @xelspeth
      @xelspeth หลายเดือนก่อน

      23:47 ULID mentioned

    • @QuentinDurot
      @QuentinDurot หลายเดือนก่อน +3

      We have to remember that he is a react dev and forgive him

  • @Techman83
    @Techman83 หลายเดือนก่อน +10

    This article appears to be entirely misunderstanding UUID as incremental versions, along with MySQL not handling them as well as something like PostgreSQL. UUID4 for guaranteed random, the rest have different, but valid use cases.

    • @iPuls3
      @iPuls3 17 วันที่ผ่านมา +1

      If used properly in MySQL, it's stored as binary.

  • @ckpioo
    @ckpioo หลายเดือนก่อน +126

    did he really just say storing 20x more is no big deal casually?, all computer engineers biting their nails rn

    • @Luxalpa
      @Luxalpa หลายเดือนก่อน +36

      The 20x figure is highly misleading, because it depends on what is multiplicated. As he said in the video, your records already contain a lot of data, in that case it wouldn't be 20x, more like 1.05x.

    • @ckpioo
      @ckpioo หลายเดือนก่อน +12

      ​@@Luxalpa its not misleading instead you're misunderstanding it, the 20x figure is taking into account just the primary key, and youre also considering the *useful* data stored with the primary key, which you shouldn't be doing because the point is that if you consider the keys as useless data (because its mostly not used by the end user in an app) then what the 20x figure says is that as your data scales up, a larger % of the data will be "useless" data compared to another database which has the same data but is using a incremental primary key. the whole point of the key is just for locating the useful data, so if you are using a system which requires 20x more space to do just that then its basically just a big waste of storage space AND compute.

    • @Sharess
      @Sharess หลายเดือนก่อน +6

      @@ckpioo you are correct, but overall, that 20+ x increase can be disregarded, unless you have an insane amount of data where reducing the storage and compute requirements would result in significant monetary savings

    • @DavidLindes
      @DavidLindes หลายเดือนก่อน +5

      @@Sharess and @Luxalpa: the thing is, 20x _storage_ (long-term) for just the primary keys may not be a big deal in terms of overall table size, but it _is_ (or at least can be) for the _index size_ -- which does indeed have compute implications, as Theo discusses. And, ya know, it's nice when indexes are actually fast -- I've seen complex queries go from something absurd like 30 minutes (memory hazy on this, so grain of salt, but a long long time) to I think sub-1-second just by indexing a column that should have been indexed in the first place. But if those indexes are 20x slower (which may not strictly be the case here, not sure, but still)... ??

    • @danhorus
      @danhorus หลายเดือนก่อน +5

      Keep in mind that these primary keys also appear as foreign keys in several other tables. It is not uncommon to have multi-million row transaction tables with dozens of foreign keys. This shit adds up.
      I'm also disappointed the video doesn't mention the performance loss in JOIN operations, which is also a big deal for applications with lots of users.

  • @Denominus
    @Denominus หลายเดือนก่อน +45

    UUIDv7 is suitable for all the relational dbs, except MSSQL which would require V8 and a custom generator.
    Our largest table is many billions of rows and UUIDv7 still performs well in Postgres.

    • @paulie-g
      @paulie-g หลายเดือนก่อน +1

      Out of curiosity, what stops you using int autoincrement as primary key and uuids as an extra field if you need them for some reason? Is it some sort of roll-your-own-sharding thing?

    • @Denominus
      @Denominus หลายเดือนก่อน +10

      @@paulie-g An earlier version of the system did actually have that, but it didn’t buy us much (if anything). It’s a big distributed system, so to say 99%+ of queries were done by the uuid (external_id as we called it) would be an underestimate.
      It also wrecks one of the benefits of using uuids as identifiers, that is you will need to do round trips to the DB on insert to get the ID for FKs.

    • @paulie-g
      @paulie-g หลายเดือนก่อน

      @@Denominus I always assumed people would use stored procedures or equivalent to avoid the round trip.

    • @uttula
      @uttula หลายเดือนก่อน +3

      @@paulie-gwhen you already have a (for most practical purposes) unique key you can use, adding a autoincrement field to use as your synthetic primary key serves zero purpose aside the fact that meatbag humans like sequential integers for some weird reason.

    • @paulie-g
      @paulie-g หลายเดือนก่อน

      @@uttula UUIDs are synthetic as well. Integer primary keys are an implementation detail. And they're an implementation detail for a reason that has nothing to do with human preference.

  • @MeriaDuck
    @MeriaDuck หลายเดือนก่อน +8

    As others also pointet out, they are not versions, they are _variants_ with purposes per variant.

  • @username7763
    @username7763 29 วันที่ผ่านมา +6

    UUIDs were not designed to be database primary keys. They were designed to solve problems with distributed networking, not data storage. There is no problem with UUIDs if used as designed. Kids these days see UUIDs and immediately think databases. Ha!

  • @disjustice
    @disjustice 29 วันที่ผ่านมา +6

    I generally use an integer sequence as the primary key and have an associated UUID in a unique index. The primary key is only ever used internally in the DB model and in application code. When exposing the entity in public references, the UUID is used. In addition to the issues touched on in the article, this saves us from another drawback: having to use that same 128bit UUID in foreign keys all over the DB. This goes for any globally unique identifier scheme. Because these unique identifiers need to be globally unique, they are generally so long that they are unwieldy to use as primary keys, even if they are sortable.

  • @debasishraychawdhuri
    @debasishraychawdhuri หลายเดือนก่อน +62

    MySQL does care about a distributed database, so they are interested mostly in the performance of a single B-Tree. Imagine how hard it would be to create an auto-incrementing ID in a distributed insertion. If you use an auto-incrementing ID, you can only insert in sequence. This is why people use UUID. Theo needs to learn about databases. Everyone writing enterprise software needs to learn about databases.

    • @scragar
      @scragar หลายเดือนก่อน +4

      Percona solved distributed ID insertions by having each master get a sequence, then each master only inserts IDs matching it's sequence mod the number of masters.
      So if you have 3 master servers you'll have one only inserting 1,4,7,10; another only inserting 2,5,8,11; and a third inserting 3,6,9,12; this ensures they never conflict on the ID.
      This does however mean the auto increment doesn't retain consistency between servers.

    • @workmad3
      @workmad3 หลายเดือนก่อน +7

      @@scragar it also means that if you need to add a new master into the mix, you have more work to do adjusting every node, especially if you're using the id%n to identify the master for some purpose as well

    • @keithjohnson6510
      @keithjohnson6510 หลายเดือนก่อน

      @@workmad3 A variant of @scragar option you just use a compound primary key of RecordID(seq), ClusterID, . ClusterID could even be a single byte giving you 256 nodes, U32 for the RecordID, then every node can have 4294967295 records. Guaranteed no collision with 5 bytes. Synchronising clusters is also very simple. Note also because is a sequence, RecordID, ClusterID, performance is also a added bonus as B-Tree's balance better than UUID.

    • @Novascrub
      @Novascrub หลายเดือนก่อน +4

      @@scragar We used to do something like this on the client side. Each client (web sever) would claim a block of ids by inserting a claim into an ids table (i claim id's 90-99), and then assign them client side. Combed uuids are just better. We knew about the page splits problem and solved it in 2003.

    • @paulie-g
      @paulie-g หลายเดือนก่อน +2

      You need to learn a bit more than you know to get you to the level you think you are at. Lamport timestamps or equivalent are necessary in a distributed system anyway for serializability and you get your unique, monotonically increasing id for free.

  • @fifty6737
    @fifty6737 หลายเดือนก่อน +7

    UUIDs event account for Time Travel but only as back as 1568, apparently, in the 1567 they were using Integers

  • @Viviko
    @Viviko หลายเดือนก่อน +15

    Why do we need to sort by ID? Why not sort through other fields, like a user’s name, creation date, etc…?
    Point of the ID is… as the name implies, identify some data. We don’t need to sort it using that field.

    • @7th_CAV_Trooper
      @7th_CAV_Trooper หลายเดือนก่อน +3

      Because the clustered index is sorted by the PK.

    • @itsbazyli
      @itsbazyli หลายเดือนก่อน +5

      As explained in the video - it helps to improve performance of retrieving a single record by ID. It's not related to your application logic's sorting, just to how databases access records by ID.

    • @7th_CAV_Trooper
      @7th_CAV_Trooper หลายเดือนก่อน +1

      Sorted data makes it possible to perform a binary search. The ID isn't the offset into the data file. So even a point query requires a search.

    • @everton000rsc
      @everton000rsc 19 วันที่ผ่านมา +1

      When the database gets to huge sizes any index created is costly. I have a use case in my company where the ordering by ids is used, if we had to order by createdAt we'd need an index in that column, and guess what would be the size of that index in disk in our DB? Almost 1TB (and it takes weeks to create)... The primary key is almost always in memory (cached), almost always the first and fastest to load, so using it for ordering is much less expensive compared to creating another index just for that. But I understand that use cases vary, depends on what kind of system you're working with and it's business rules and requirements.

  • @TFayas
    @TFayas 19 วันที่ผ่านมา +2

    Solution I've always used is have an incrementing int id primary key and a guid for external reference (and in some cases a unique human reasonable code name).
    Id for performance and internal joining, code name if you need to reference it in code, and guid for matching across environments in staging situations where you push items from one environment to another.

  • @Chamassa1210
    @Chamassa1210 หลายเดือนก่อน +4

    The bad part about the size comparison, is that where it matters you are probably using bigint already, making binary(16) only twice as big as bigint, without collisions

  • @yuwei2522
    @yuwei2522 29 วันที่ผ่านมา +10

    Sounds more like a problem with MySQL. Some non-relational databases would actually benefit from the randomness of UUIDs.

    • @jongeduard
      @jongeduard 23 วันที่ผ่านมา

      Very true, I don't remember all the details, but for example SQL Server has solved this problem because of the way it sorts and stores the rows in a sorted order, while it generates the ids based on timestamps at the same time.
      Although it's still important how you configure things exactly, there are certain details which need to be done right.

    • @everton000rsc
      @everton000rsc 19 วันที่ผ่านมา

      Unfortunately not all systems can/should/are built with non-relational DBs, it has some really bad drawbacks for some use cases

    • @amadensor
      @amadensor 7 วันที่ผ่านมา

      Most relational databases also benefit from randomness. It keeps the trees balanced without having to do splits and tree balancing. You don't need to leave a gap for every potential value.

  • @ericf6964
    @ericf6964 21 วันที่ผ่านมา +3

    Another problem with UUIDs is logging. First, logs almost always store UUIDs as text, which as you pointed out is very inefficient at 38 bytes per. Second, since they are designed to be unique, they tend to be relatively uncompressable compared to normal text. The time based UUIDs may be less bad in that respect. But the bottom line is that if you have a requirement to log your identifiers, then using UUIDs will practically guarantee huge storage costs for those logs.

    • @everton000rsc
      @everton000rsc 19 วันที่ผ่านมา +1

      Good point, at massive scale any extra byte matters

    • @someed3702
      @someed3702 18 วันที่ผ่านมา

      @@everton000rsc If we're talking extra bytes on a record, not really. When we're talking about extra bytes on every index record, Absolutely.
      It's easy to think that the index files are storing one copy of the primary key per record in the database. If your index is just for equality matches, that's even correct. But if your index handles sub string searches or most kinds of approximate match that's very much not the case. Note that I'm not just talking about the index for the primary key, which I would expect to be an equality index. I'm talking about any other columns on that record, so some of them could reasonably be indexed for sub string searches.

  • @user-qq7yc1qp8z
    @user-qq7yc1qp8z หลายเดือนก่อน +56

    Why even care about planetscale? For production, you have AWS RDS, for small projects you have Supabase.

    • @MrDadidou
      @MrDadidou หลายเดือนก่อน +7

      how does it invalid the whole discussion around uuids ?
      It's like having a guy screenshare his IDE when talking about a complex programing question and your only input is "dude why even use the gruvbox theme in your IDE? you have Catpuccin and DoomOne available."

    • @carlosmspk
      @carlosmspk หลายเดือนก่อน +6

      @@MrDadidou I like that you tried to show how silly his comment was, but ended up providing a valid comment for a video. It's perfectly valid for you to comment on whatever shows up in the video. He could even just go "Damn Theo, your hair is looking mighty fine today" and it would be a valid comment.

    • @jeffreyjdesir
      @jeffreyjdesir หลายเดือนก่อน +1

      @@MrDadidou I just wish the discussion wasn't based on faulty premises (like UUID variants serve to succeed their formers) cause it suggests biases I'm sure we're all aware of

    • @everton000rsc
      @everton000rsc 19 วันที่ผ่านมา

      RDS gets really expensive when you get to the point you need horizontal scaling. Even planetscale is too expensive, my company choose to deploy our own vitess cluster manually to be able to save almost 2/3 in database costs. I get it that people don't like that planetscale removed the free plans but after I suffered so much in the migration from mysql/mariadb to vitess I understand the real value of what they are delivering. And they're almost the only maintainers of vitess atm so i hope they don't go out of business anytime soon...

    • @raphaelmorgan2307
      @raphaelmorgan2307 17 วันที่ผ่านมา

      "why even care about planetscale?" because they're paying him money, obviously! (ty for posting other options)

  • @paulurban2
    @paulurban2 หลายเดือนก่อน +3

    Having sequential IDs create contention hotspots in the B-Tree. No good for horizontal scalability or vertical scalability.

  • @XCanG
    @XCanG หลายเดือนก่อน +6

    I used snowflakes a lot and like them slightly more, than other type of identifiers, especially considering that each of them have datetime baked in.

    • @2penry2
      @2penry2 หลายเดือนก่อน

      Yeah snowflake IDs are nice, I like being able to sort via time on them

  • @johnredberg
    @johnredberg หลายเดือนก่อน +4

    So... "The Problem with InnoDB".

  • @DonAlcohol
    @DonAlcohol หลายเดือนก่อน

    also for a start date/time to begin counting i have a good one :P , theres this relly weird but consistent thing the eath does , its like a wobble, and the whole thing lasts about 26000 years (25800) but as a consequence of this the sun rises in a different constellation on the spring equinox every 2160 years... so if we pick the beginning of one of those as the starting point to begin counting time wouldnt that be a great idea ? coincedentally the last time we shifted (from Ram to Pisces (its reverse form the horoscope) , its at the same time we shift from BC to AD so 1 jan of the year 1 seems like a nice spot to start counting time not?

  • @thomaseichinger1717
    @thomaseichinger1717 หลายเดือนก่อน

    What do you think about to use UUIDvTimebased for Businessobjects and snowflakes for provessingIdentifyers what are also not lived "long".
    integer/bigint/.. are amazingly fast because of the processor structure and also because of the storage structure.

  • @michaelutech4786
    @michaelutech4786 หลายเดือนก่อน +4

    You can always use UUIDs (or any stable data whatsoever) to compare them orderly. The problem with non-ordered ID's that that when inserting data, they are not growing according to that order. Sequential IDs do that, the next ID will always be greater than the previous one. When inserting random IDs, they tend to fill pages better (it's random on which side of a node the tree grows). Not all indexes are necessarily trees.
    The problem with such statements is that there is no A is better than B in all generality. You have to actually look at what your systems do with your data.
    If you need distributed keys, e.g. keys where there are multiple authorities (instances) that generate synthetic keys (keys that are not data driven), then UUIDs and other mechanisms make the handling of these authorities easy. That has a price. If you don't want to pay this price, you need to pay another one. Maybe you need to use data storage that works efficiently with your keys. Maybe you organize your key-namespaces or scopes manually such that you can use the most efficient local storage and key management and save runtime performance but then have to invest in deployment configuration.
    Any attempt to find a silver bullet can at best work for were wolfs and vampires but will fail for mosquitos or blue whales.
    These generalizations and over-simplifications are just stupid. UUIDs are perfectly fine and battle proven data types for certain applications. When used in the wrong context or in the wrong way (version, ...), they fail, just like any other mechanism does, 32 bit numbers or 640K were proven not to always be enough, despite timely expectations.

  • @mmathijs
    @mmathijs หลายเดือนก่อน +7

    Who needs standards, its all about the vibes. UUID v8 is by far the best

    • @Zeragamba
      @Zeragamba หลายเดือนก่อน

      variant 8 still uses the same formatting of the 128bits so it's compatible with any system that can store UUIDs, but leaves the specific way to fill those bits up to the user/system

  • @paulmoore3755
    @paulmoore3755 27 วันที่ผ่านมา +2

    Some reference to the collisions that included the context around them would be appreciated

  • @d3fau1thmph
    @d3fau1thmph หลายเดือนก่อน

    I use internal sequence as PKEY and UUID as "public id". PK is never used in the code or visible in any API, it is only used on the database level for relational stuff.
    That aside, are UUIDs equally problematic in PostgreSQL?

  • @TizzyT455
    @TizzyT455 หลายเดือนก่อน +3

    Cuid2 sounds unnecessary. Its just a seeded sha3 from my reading. What is wrong with just combining a sequential counter and any secure prng? What I tend to use is an N-bit sequential part and an X-bit random number (usually 64/448). This allows to generate sequential numbers up to N, while having collision resistance of X). If you really wanted to, you could just use SHA3 to generate the X bits yourself if you want the same randomness as cuid2. Of course adjust as needed.

  • @peppybocan
    @peppybocan หลายเดือนก่อน +4

    I haven't heard of CUID, but I have heard of the ULID. I don't need the 128bit of random data. I am fine with the 64-80 bits of random data. It doesn't even need to be cryptographically secure, to be honest.

  • @MrL22
    @MrL22 22 วันที่ผ่านมา

    Can someone explain what is wrong with storing number based id's but then using an encrypted version of that id for anywhere visible by the user? I have been using red-explosion/laravel-sqids

  • @MikkoRantalainen
    @MikkoRantalainen 29 วันที่ผ่านมา +2

    15:20 I don't understand this argument. If you're happy to store 32 bit integer in binary form, you should be happy to store the 128 bit UUID in binary form, too. The database can convert from binary value to human readable value on the fly, just the way it happens for binary 32 bit integers, too!
    With the same logic you could argue that 128 bit binary UUID is actually better format than 32 bit integer stored as base-10 number as a string.
    That said, if your database doesn't support UUID column type, maybe you should switch database.

  • @sd_pjwal
    @sd_pjwal หลายเดือนก่อน +15

    My first startup I worked for, the main DB admin insisted on using primary keys made up of real user data or composite keys made up of real information. It was so insane. Social Security # as primary ID for a user? This was a benefits enrollment company. Shocker, some of our customers had illegals that shared SS #'s.

    • @User_2
      @User_2 หลายเดือนก่อน +3

      > Illegal argument exception 🤭

    • @7th_CAV_Trooper
      @7th_CAV_Trooper หลายเดือนก่อน +4

      SSN isn't unique by law and person can get a new one. They are not identifiers. Don't use natural keys.

    • @Spiker985Studios
      @Spiker985Studios หลายเดือนก่อน

      @@7th_CAV_Trooper SSNs + Birth Date is unique, and it's why they are associated together. You will never see an SSN field without a DOB field, unless they're looking up your DOB by some other means

    • @7th_CAV_Trooper
      @7th_CAV_Trooper หลายเดือนก่อน

      @@Spiker985Studios yes, but OP suggested using SSN alone as PK, unless I misunderstood.

    • @RickYorgason
      @RickYorgason หลายเดือนก่อน

      In Canada, I believe it's illegal to use social security numbers as database keys.

  • @Mike-zr9wq
    @Mike-zr9wq หลายเดือนก่อน +3

    You'd have to be crazy to pay PlanetScale today

  • @Danny1986il
    @Danny1986il หลายเดือนก่อน +2

    I think in UUID the v stands for variant, not version

  • @TapetBart
    @TapetBart หลายเดือนก่อน

    I work a mot with time series data from measuring devices, and my biggest problem with UUIDs (each device has an uuid as a PK) is that you can basically not compress them.
    You can compress a 8 byte int by only storing the difference between each int.

  • @klaudyw3
    @klaudyw3 หลายเดือนก่อน +12

    Is the versioning done by the USB forum? It sure looks like it might be.

    • @KangoV
      @KangoV หลายเดือนก่อน

      I think you've misunderstood what version means (it's actually the type).

    • @Spiker985Studios
      @Spiker985Studios หลายเดือนก่อน +5

      @@KangoV I think that you missed the fact that the USB spec is horrendous, and that if using the version marker on UUID as an actual version, and not a variant - they are comparable to how horrible they are to follow

  • @ed.puckett
    @ed.puckett หลายเดือนก่อน +3

    I avoid sequential ids because they make it too easy to accidentally reference an incorrect record. With uuids, there is no chance of this.

  • @rafaelmesaglio1399
    @rafaelmesaglio1399 26 วันที่ผ่านมา

    So how do NanoIds (and other alternate ids) solve the sortability/paging issue?
    Not to mention those likely won't be natively handled by the database and language of choice, i don't think it's a great idea to go dependency heavy either

  • @MonsterWriter
    @MonsterWriter หลายเดือนก่อน +1

    As I understand, the database does not need to rebalance the B-Tree very often if the indexed value is sequential (the id is sortable). If the ID is random, the tree deforms much faster.

    • @BruceWilsonTMB
      @BruceWilsonTMB 17 วันที่ผ่านมา

      That's backwards. If the indexed value is sequential, then all the new rows will be added at the end, and the tail of the index will get very heavy. The existing part of the tree will be fast, the newly added records will be slower to access until it rebalances. With a random ID, the values are distributed equally across the index, and the index stays mostly in balance for a long time.
      He mentioned page splitting. Page splitting happens with both sequential and random IDs, it's just a question of where (a lot in one place, or a little bit everywhere). He mentioned the worst case of 50% full pages, but failed to mention that most databases tend toward 75% full over time.

  • @dfs-comedy
    @dfs-comedy หลายเดือนก่อน +6

    Using MySQL is the first mistake.
    FWIW, PostgreSQL has a native UUID type that occupies 16 byes of space. But yeah... I don't like UUIDs as primary keys.

    • @nicolaicornelis2853
      @nicolaicornelis2853 หลายเดือนก่อน +1

      binary(16) is the same as a native UUID type though (storage wise).

  • @dahahaka
    @dahahaka หลายเดือนก่อน +3

    Is indexes nowadays an accepted form of indices? When did that happen 😭
    Edit: typo

    • @ceigey-au
      @ceigey-au หลายเดือนก่อน +5

      Well, it's been more acceptable than spelling "accepted" as "excepted" for a while, but both are understood colloquially _irregardless_ ;-) (sorry, couldn't let those opportunities pass by...)

    • @dahahaka
      @dahahaka หลายเดือนก่อน

      @@ceigey-au omfg how did that happen, well that's embarrassing XD

    • @dahahaka
      @dahahaka หลายเดือนก่อน +1

      @@ceigey-au no worries, I would've done the same 💀

  • @winchesterdev
    @winchesterdev หลายเดือนก่อน +2

    UUIDv7 can have counters which makes them sequential as well.
    The UUIDv7 that could be shipped with Postgres 17 has a randomly-seeded counter.

  • @nicolaicornelis2853
    @nicolaicornelis2853 หลายเดือนก่อน +2

    Nobody should use UUIDs as "non-guessable identifiers", so what the cuid(1) post is saying is weird - they're not meant to be cryptographically secure, they're meant to identify rows. If you want a secure key, generate a separate random column and put an index on that, using a truly random value. It's two entirely different problems.

    • @MurtagBY
      @MurtagBY 19 วันที่ผ่านมา

      They are sort of secure - people inspecting your business can not see how many users have registered, actions made, bills paid

  • @DominikZogg
    @DominikZogg หลายเดือนก่อน +21

    Can we all agree that mysql is more of a pain in the ass than a decent database? So it's not a relevant metric for any solution.

    • @aldi_nh
      @aldi_nh หลายเดือนก่อน +2

      no

    • @DominikZogg
      @DominikZogg หลายเดือนก่อน +3

      @@aldi_nh if you burned your fingers often enough you'll change your mind. Postgres besides its ugly user management is superior. Or switch to non relational database cause most uses cases are better represented by documents anyway.

    • @aldi_nh
      @aldi_nh หลายเดือนก่อน +3

      @@DominikZogg oh please, i beg to differ, mongodb are even harder to mantain. i dont know where you're coming from but from my experience, very little use cases are suitable with documents

    • @DominikZogg
      @DominikZogg หลายเดือนก่อน

      @@aldi_nh web applications, websites

    • @byuwur
      @byuwur หลายเดือนก่อน +4

      My dude doesn't know how a relationship works.

  • @TheOmfg02
    @TheOmfg02 หลายเดือนก่อน +25

    5:00 complains about 1568 as an implementation detail… proceeds to use 1970 instead.

    • @linminsu3443
      @linminsu3443 หลายเดือนก่อน +5

      1970 is the unix epoch, which is already used everywhere

    • @mzg147
      @mzg147 หลายเดือนก่อน +1

      complains that years 1568-1990 will never be used, therefore it's wasteful

    • @TheOmfg02
      @TheOmfg02 หลายเดือนก่อน

      @@linminsu3443 i think you missed the point of my comment 😅

    • @KangoV
      @KangoV หลายเดือนก่อน

      Anyone who's done anything with dates should know what the Gregorian calendar is and when it started. I knew it straight away :) Try converting dates to other calendars like Chinese and JIS and you'll see how important this is!

    • @gbjbaanb
      @gbjbaanb หลายเดือนก่อน +3

      UNIX epoch time is as arbitrary a standard as Gregorian or any other time standard. You just have to pick a start point and a storage size. UNIX time isn't necessarily the best, as we will find out in 2038.

  • @GunarBastos
    @GunarBastos 26 วันที่ผ่านมา

    Love these kind of videos, help me to shift the way I see technology and it's uses

  • @ConverseMidas
    @ConverseMidas 28 วันที่ผ่านมา

    I remember reading some back and forth about this, but, for most projects --- does it make much of a tangible difference, performance wise? I use Postgres when I can so perhaps UUIDs aren't as clunky as it appears they are in MySQL. Also, I read that non-incrementing PKs make your DB info a little less obvious to prying eyes, though this is probably not important for blog posts. Also, a UUID blog post URL would be pretty hideous I guess. I've never heard of snowflakes or ULIDs so grateful for the new (to me) info!

  • @ziv132
    @ziv132 หลายเดือนก่อน +4

    I personally like ULID's, not just because they are incremental, but also the fact that I can just swap to using them in the same BINARY(16) column that was used for UUIDs (probably not the best idea to switch and have mixed content, but eh)

  • @robertluong3024
    @robertluong3024 หลายเดือนก่อน +5

    4am, waiting for my medical procedure to start and I'm spending it learning about UUIDs.

  • @rirajojo
    @rirajojo 21 วันที่ผ่านมา

    I'm not sure about the rebalancing impact from UUID 4, because it is random, shouldn't that keep the database from having to (full) rebalance a lot? Because new values should be evenly balanced in the tree. While auto incrementation is always on the top of the tree. However I have not tested this.

  • @ShinigamiZone
    @ShinigamiZone หลายเดือนก่อน +1

    UUID v4 do not only have the "4" statically at index 12, but also there is a "8, 9, a or b" at index 16 (starting index by 0)
    So its /^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i

  • @SatoshiCommentorto
    @SatoshiCommentorto 28 วันที่ผ่านมา +2

    click baited into a blog screenreader again

  • @trapfethen
    @trapfethen หลายเดือนก่อน +3

    Couple Things. UUIDS can be stored as a regular column with an index for fast lookup while using an integer for the primary key. The UUID is used for externally representing the resource, while the integers are used internally for performance (it is more resource intensive to do a 128-bit compare than a 32 or 64 bit compare on most CPUs and that is if you use the binary format). This approach even allows for distributed systems to coordinate via message passing, only using the UUIDs instead of the internal integers which are treated as an implementation detail. This approach is less attractive than just storing the UUIDs as the primary key which makes coordination and replication trivially simple by comparison. It is an option worth exploring based on your needs.
    The second thing is that the rebalancing problem introduced by random UUIDs is more severe than communicated here. While it is true that random UUIDs can cause databases to not utilize a larger portion of page sizes, the real performance hit is the stochastic nature of B-tree rebalances. B-trees are a compromise between the theoretically optimal B-tree and the reality of modern computers where data retrieval is severely slower than multiple comparisons. B-trees lean into splitting up the tree more aggressively at each level and requiring more value comparisons over-all in order to reduce the number of costly data-fetches per look-up. This causes each rebalance to be more expensive than in a comparably unbalanced binary search tree and require touching more nodes than in a BST. In particular, unlike in a BST. b-tree balance procedures requires frequent changing of the node's content rather than just changing the links between nodes. Coupled with the ripple effect that can occur where b-trees might have to traverse all the way back to the root node in order to rebalance, the performance hit is worse than it first seems and is slightly random itself. When b-tree insertions are roughly ordered, these copies and extra comparisons are more akin to amortized costs like we see with auto-scaling Vectors. When the insertions are random, the frequency of multi-level rebalancing operations increases. If you support key removals, even more so.

    • @nouveu1
      @nouveu1 หลายเดือนก่อน

      But primary key is also an (unique) index and all indexed columns is what causes pressure when writing stuff. More indices means bigger penalty hence having both sequential id and uuid doesn't help and only make things worse. Not to mention that under hood index probably don't store whole record but pair uuid+id hence you have to make more lookups to get whole record.

  • @RTomassi
    @RTomassi 12 วันที่ผ่านมา

    Thanks, this has been very educational for me. I will admit my OCD kicked in when, at 11:00, you fixed all the arrows except for the green one. xD

  • @SpykePorcupine
    @SpykePorcupine หลายเดือนก่อน

    What will they do when they run out of versions?

  • @sturdyfool103
    @sturdyfool103 หลายเดือนก่อน +4

    I always love the large discussion around databases, because they’re nearly useless to me, as in, in fifteen years I’ve never once used a database, and only three of the thousands of my applications would’ve been an ok use of a database, with the rest just needing a config file

  • @Hypn0s2
    @Hypn0s2 หลายเดือนก่อน +24

    Jebus. I couldn't last the full video. I don't know how you followed this all.
    I need to wash my brain now with something else and pretend this isn't real.

    • @diamondkingdiamond6289
      @diamondkingdiamond6289 หลายเดือนก่อน

      It isn’t those aren’t even the main reasons of why you shouldn’t use uuid in MySQL. Using uuids in MySQL is bad for performance in MySQL but it’s fine in postgresql

    • @KangoV
      @KangoV หลายเดือนก่อน +6

      Perfectly easy to understand. Versions in this instance are actually Variants. Each variant does not supersede the last.

    • @denisblack9897
      @denisblack9897 หลายเดือนก่อน +1

      Like in Loki?😅

  • @ustav_o
    @ustav_o หลายเดือนก่อน

    ive been first using uuid's, cuid's and then nanoid for quite a long time. really nice video because i didnt know how things really worked.

  • @JacobP81
    @JacobP81 16 วันที่ผ่านมา +1

    Sounds like the UUIDv7 would make the most sense if you are using a UUID as a primary key because it combines the date/time with a random number, which to me actually sounds perfect if you need a primary key and just an incrementing value won't do. I also really like that it uses the UNIX Epoch for the time.

  • @fgregerfeaxcwfeffece
    @fgregerfeaxcwfeffece หลายเดือนก่อน +4

    The guy who came up with the 1568 wins a paddlin. That's a paddlin well deserved if I ever seen that. Unixtime was already the established standard and this new one does nothing better.

    • @guguludugulu
      @guguludugulu 25 วันที่ผ่านมา +2

      First UUIDs were used in 1980 and were counting time since 1980-01-01 and then were standardized around 2005 to use a gregorian calendar.
      While you could argue that unixtime was an established standard in 2005 i don't think it was "the" established standard back then since the programming landscape was very different and most of the programmers would use windows (well, arguably they still are). Windows times starts at 1601 btw.
      Either way that's a non-issue. If you have at least enough of a brain to carefully read the RFC and know how to count you won't have any problem implementing UUID.

    • @fgregerfeaxcwfeffece
      @fgregerfeaxcwfeffece 17 วันที่ผ่านมา

      @@guguludugulu Fair point.

  • @MichaelKire
    @MichaelKire หลายเดือนก่อน +6

    Regarding the sortability being a potential security: Security by obscurity is not actual security.

    • @jwueller
      @jwueller หลายเดือนก่อน +3

      Technically it is if the obscurity is big enough. All of cryptography is technically just very hard to guess, not impossible.

    • @itsbazyli
      @itsbazyli หลายเดือนก่อน +4

      There are a number of features where obscurity is the default mode of security. Think about situations like "share via link". You can do that to an unlisted TH-cam video. The only thing you need to know to see the contents is the link itself. If TH-cam's video ID was sequential, or guessable even just in part, you would be able to enumerate videos one-by-one, and discover every unlisted video on the platform. If the video ID is fully random though, your chance of "stumbling upon" an unlisted video is significantly lower. Not zero, but combined with rate limiting and other anti-bot security measures, becomes a pretty decent barrier from a bad actor doing just that.

    • @ivanfilhoz
      @ivanfilhoz หลายเดือนก่อน

      This is a good motto for source code or other implementation details, but it shouldn't be taken generally. Exposing less information is part of security. The whole point of secrets, like keys, tokens or passwords, is to keep them safe. But they're not the only data that should be hidden, are they?
      Your application must only expose information it is supposed to, and if the sorting is guessable by your IDs, you are leaking data. Timestamps are not always sensitive data, but they often are, such as in bank transactions or medical history.

  • @NickMoline
    @NickMoline 28 วันที่ผ่านมา

    I'm curious your opinion on ULIDs which were mentioned only once in the article (at the end as one of the other options) and you just glossed over in the video. The string version of ULIDs is only 26 characters as opposed to the 36 of UUIDs, stored with the first 10 characters as the UNIX timestamp including milliseconds encoded as base32 (using the digits and most of the upper case alphabet excluding I, L, O, and U) which works for the next 8,860+ years without running out, and then 16 characters of randomness also stored base 32, which allows up to 1.21 * 10^24 (2^80) ULIDs created within the same millisecond

  • @TerranVisitor
    @TerranVisitor หลายเดือนก่อน +1

    @Theo : Funny. About 4 Decades ago I designed a UID that incorporated brute force and accident aspects - such as mentioned herein - for incorporation into (truck) weigh scale software. The software which utilized this UID was smaller than 1.44MByte, including graphics, credit card processing, remote access and update, and recording of transactions. The software was stored on an EEPROMish type device which emulated, as you MAY have guessed, a floppy disk. For those curious, the OS was (a form of) DOS and it was programmed in BASIC and assembly.

  • @hansdietrich1496
    @hansdietrich1496 หลายเดือนก่อน +3

    And all that hazzle, just because incrementing an integer is too boring for some people ...

    • @nnnik3595
      @nnnik3595 หลายเดือนก่อน +1

      Seems like you do not understand basic programming so let me explain:
      Certain systems become so large and distributed it is difficult and slow to get definite answers.
      In those systems it is difficult to just "increment an integer".
      These systems are also fairly common due to the scale of computing we use in every day lives.
      As such we just generate an ID we know will be unique for all of our use cases.

    • @Luxalpa
      @Luxalpa หลายเดือนก่อน +1

      Incrementing requires a mutation, so it requires one to wait for an authority to give out those IDs. That's a nightmare in distributed systems.

    • @hansdietrich1496
      @hansdietrich1496 หลายเดือนก่อน

      @@Luxalpa No. You just need to give the initial value to the one creating its IDs (and only check for collision on insert). And for all creating authorities, you have their own IDs. which become part of the ID, e.g. 1_000_000_000_000 * CreatorID + generatedID. All easier than that UUID stuff.

    • @KangoV
      @KangoV หลายเดือนก่อน

      @@hansdietrich1496 We have between 20-100 nodes that are created/destroyed based upon load. These are split across multiple availability zones and regions and are stateless. Now, how would you handle the creation of IDs across that scenario? We're talking trillions (dns queries).

    • @hansdietrich1496
      @hansdietrich1496 หลายเดือนก่อน

      @@KangoV Not that complicated. Each of your upcoming nodes has a unique ID. 20-100, so one byte would be enough for that. Be generous and reserve two bytes. So each of your 2^16 max parallel hosts has its unique ID. Then you have two bytes of host-id and six bytes of whatever-id, and you got your IDs sorted within a single long int. If a single host runs out of IDs, just give him a new one of the probably superflous host IDs. And if in doubt, take some int128 and split it up accordingly.

  • @DominikZogg
    @DominikZogg หลายเดือนก่อน +3

    Autoincrement (number) cannot be used by developers who care that an entity should have an id before being persisted, so for me any argument pro them is irrelevant.

    • @gbjbaanb
      @gbjbaanb หลายเดือนก่อน

      You pre-allocate the id in such cases. Most DBs will let you do that.
      But frankly you're probably doing it wrong if you want an entity to be tracked by the DB before you've even added it to the DB.

  • @zebraforceone
    @zebraforceone หลายเดือนก่อน +2

    Of course they hurt performance, they will continue to do so until 128-bit architecture is a thing.
    They're not supposed to match integers in terms of performance.

  • @akak5128
    @akak5128 หลายเดือนก่อน +1

    hey you should have posted the other videos! we love your takes

  • @tinymouse2
    @tinymouse2 หลายเดือนก่อน +9

    I just use v4 and write my code with the idea of "can this collide? Oh it can? I'll check before generating it"
    Adding a read before generating and a write is a worthwhile step to ensure I don't run into a collision

    • @NubeBuster
      @NubeBuster หลายเดือนก่อน +11

      You're writing a check for UUID4 collision?
      If you gave every person on the world 500 million uuids, there would be only 50% chance that there is 1 collision.

    • @tinymouse2
      @tinymouse2 หลายเดือนก่อน +5

      ​@@NubeBuster What can I say, I'm generous with my UUID's.
      It's seriously overkill I know, but in the systems I've helped implement, the result of a collision could mean personal user data gets sent to the wrong user and that's not acceptable to me.

    • @jameslouizbuilds
      @jameslouizbuilds หลายเดือนก่อน +2

      Realistically how likely are you going to get a collision? Are you operating at a huge scale?

    • @tinymouse2
      @tinymouse2 หลายเดือนก่อน

      @@jameslouizbuilds I added some logging for when we have a collision, and so far we've had none.
      The scale for the biggest thing I worked on with this is still relatively small, but we're generating maybe a few hundred UUID's a day, and doing a read that many times a day is a pretty small overhead for the absolute guarantee that we don't pipe sensitive information to the wrong party.
      If it was for something where security didn't matter I probably wouldn't have added the collision check honestly.

    • @NubeBuster
      @NubeBuster หลายเดือนก่อน

      @@tinymouse2 tell the global news outlets when you get a collision. This will be world news.

  • @NeoChromer
    @NeoChromer หลายเดือนก่อน +5

    imagine using mysql in 2024 holy..

  • @mikekent8148
    @mikekent8148 หลายเดือนก่อน

    I'd be interested to hear your take on how surrealDB handles ids

  • @speedytruck
    @speedytruck 18 วันที่ผ่านมา +1

    Regardless of the video's contents, what Firefox extension did you use for the tabs?

  • @coolemur976
    @coolemur976 หลายเดือนก่อน +5

    "The calculated probability of at least one collision occurring when generating 1 billion UUIDv4 identifiers is effectively 0. "
    Are collisions even real ?
    Ok, so you might get an error when creating few billions of records, so what ? Try, catch, retry.
    And about performance - if your UUIDv4 is too slow for your billions of records, perhaps it's time to archive the data ?
    I mean everyone was using UUIDv4 for a long time. And suddently it's a problem ? How so ? Or is it just someone wants to push their own agenda while introducing their own type of IDs?

    • @IceTank
      @IceTank หลายเดือนก่อน

      Yeah, Wikipedia says you need to generate one billion v4 UUIDs per second for 86 years to have a 50% chance of collision.

  • @notangelmario
    @notangelmario หลายเดือนก่อน +3

    classic theo use someone else's work when you don't want to do it. man i love your work but you got to stop being so obvious about it

    • @philadams9254
      @philadams9254 หลายเดือนก่อน

      Do you mean because he copied Josh tried coding's video? Or something else?

    • @notangelmario
      @notangelmario หลายเดือนก่อน

      @@philadams9254 well i didn't know about that. but yeah sure, that and most of the videos. i am quite sad that these days i just click on the video and look up the article he is reading or the video he is watching and just do that instead of watching him

  • @JimCraddock
    @JimCraddock 22 วันที่ผ่านมา +1

    35 years as a data architect. Inevitably, I run into uuid's being used by people that do not understand databases. Use an incrementing integer for your clustered index, preferably make it your primary key. The primary key is included in every index on a table, so if you are using the uuid as the primary key you just increased the size of every single index by a lot compared to an integer.

    • @everton000rsc
      @everton000rsc 19 วันที่ผ่านมา

      Yes, I initially believed that UUIDs are excellent and didn't care much about what ppl said about it's dawbacks, but after working on backend doing some query optimizations and understanding better how relational databases work I now understand how naive I was

  • @RobertShane
    @RobertShane 19 วันที่ผ่านมา

    What character is FF7 and 10 overlap???

  • @lennysmileyface
    @lennysmileyface หลายเดือนก่อน +1

    What characters overlap from FF7 and FF10? The names?

    • @Funcijej
      @Funcijej หลายเดือนก่อน +2

      Other than Cid, I have no idea what he is talking about

    • @ukyoize
      @ukyoize หลายเดือนก่อน

      There is character shinra in X-2,which shares name with corporation in 7 and they have a small cameo in 7r

    • @Tr1gg3rZ
      @Tr1gg3rZ หลายเดือนก่อน

      Biggs and Wedge

  • @jeanysergeimezarodriguez8629
    @jeanysergeimezarodriguez8629 หลายเดือนก่อน +1

    What about mongodb ObjectIds?

  • @BradleyWeston92
    @BradleyWeston92 หลายเดือนก่อน +1

    Say for example, the consumer of an API is the one generating an identifier, a UUID is required.
    So the table can have the auto incrementing ID still, and have a UUID that is unique index. The application layer would only reference the row using the uuid, but having multiple indexes are still smaller as using the AI ID not the UUID.

  • @landmanland
    @landmanland หลายเดือนก่อน +2

    The original purpose of UUID was to create RPC functions, not random numbers. This was before cryptography became important.