The Problem With UUIDs… And How To Solve It

แชร์
ฝัง
  • เผยแพร่เมื่อ 21 ต.ค. 2024

ความคิดเห็น • 98

  • @MilanJovanovicTech
    @MilanJovanovicTech  2 หลายเดือนก่อน +3

    Want to master Clean Architecture? Go here: bit.ly/3PupkOJ
    Want to unlock Modular Monoliths? Go here: bit.ly/3SXlzSt

    • @iteospace
      @iteospace 2 หลายเดือนก่อน +1

      Milan, How about "long" Snowflake Id next?

  • @P1n3apqlExpr3ss
    @P1n3apqlExpr3ss 2 หลายเดือนก่อน +7

    I'd highly recommend Jeff Moden's Guid fragmentation talk that dives deeper into this topic looking at page splits, fill factors, and gradually evolving data. It offers a perspective that is much different to the high count one-off insert/read demo shown here, and should be closer to the reality of many production systems

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน

      Link it here?

    • @P1n3apqlExpr3ss
      @P1n3apqlExpr3ss 2 หลายเดือนก่อน

      @@MilanJovanovicTech th-cam.com/video/rvZwMNJxqVo/w-d-xo.htmlsi=TJ3KhHMFwows0C_4

    • @EddieDemon
      @EddieDemon 2 หลายเดือนก่อน

      @@MilanJovanovicTech th-cam.com/video/rvZwMNJxqVo/w-d-xo.html

    • @dotnetfanboy
      @dotnetfanboy 2 หลายเดือนก่อน

      @@MilanJovanovicTech th-cam.com/video/qfQtY17bPQ4/w-d-xo.html

    • @goldenwraft5362
      @goldenwraft5362 หลายเดือนก่อน

      I believe it is this one th-cam.com/video/qfQtY17bPQ4/w-d-xo.htmlsi=Ls6A110rSMKR5Qbn @MilanJovanovicTech

  • @WolfspiritMagic
    @WolfspiritMagic 2 หลายเดือนก่อน +14

    While everything in the video is true, there is one thing that somehow nobody mentions when talking about these time based ids: Security. In my opinion there are cases where exposing the creation Date/Time of a database record can lead to security issues. For example with these kind of IDs users will always be able to figure out if they have been created before or after another user if they get their IDs. That's sometimes the reason why you use GUIDs instead of auto incrementing numbers in the first place.

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน +1

      Makes sense, but a timestamp leaks less information than an INT/LONG id.

    • @Whojoo
      @Whojoo 2 หลายเดือนก่อน +1

      Asume I know too little about security. Why would it matter if I as a user know if my account is created before or after an other user?

    • @WolfspiritMagic
      @WolfspiritMagic 2 หลายเดือนก่อน

      @@Whojoo Before/After might not a big issue but it's still data that you might not want to leak. For example if you have a game and User A invites User B to your game, might even spend some money for User B to give them a starting credit and then figures out based on the ID that User B was already member of that game before User A.

    • @WolfspiritMagic
      @WolfspiritMagic 2 หลายเดือนก่อน

      @@MilanJovanovicTech Actually I'm not sure about that. I think it depends and a timestamp might in some cases be even worse. Lets say you have a database of patients for a doctor treating a specific illness. Knowing the timestamp could give people insight into when that person actually got that illness.
      Or lets say you're looking for another job and join a job site for that. Your current employer finds you on that website. With an ID in itself of int/long (and no other IDs to compare) they won't have much informations (unless the website provides that to them) and you could be member of that size for years. With an ID based on a timestamp they will be able find out that you just recently started looking for a new job.
      Also most of randomness is based on time. Having the exact creation time in milliseconds with the entry might in some cases help an attacker to limit the amount of bruteforce required. That's for example how they recently hacked a crypto wallet.

    • @JohnathonSmith03
      @JohnathonSmith03 2 หลายเดือนก่อน

      @@WolfspiritMagicstill don’t see how making it harder for users to lie and deceive each other would be a security issue… not to mention it isn’t even reliable at all short scale anyway otherwise time based keys around be guid anyway. So you get a small amount of ordering info which is can think of very few circumstances being an actual security issue. Just makes it harder to lie about sequencing.

  • @robadobdob
    @robadobdob 2 หลายเดือนก่อน +5

    I think using UUIDs in databases is one of those topics which probably doesn't have a measurable impact to most of us at the scales we work at.

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน +1

      Probably true

    • @AdonikamVirgo
      @AdonikamVirgo 2 หลายเดือนก่อน

      We've had issues in a 1mil+ row events table; Jeff Moden's talk is verbose, but boils down to not using the defaults - 90% fill factor for random ids causes expensive page splits, clustered PK is a waste and sequential inserts are not the enemy UNTIL you update and grow the row in some way. th-cam.com/video/rvZwMNJxqVo/w-d-xo.html

  •  2 หลายเดือนก่อน +6

    This video is really great, and there are so many bad videos on this topic. The only thing I don't think you mentioned (or I might have missed it), is that SQL Server saves and order a UNIQUEIDENTIFIER in a different byte order than C# (and some other databases). That means even with UUID 7, if we use this the UNIQUEIDENTIFIER datatype, there will be fragmentation issues. Storing as string or binary will not have the same type off issues though.

    • @RobertMcLaws
      @RobertMcLaws 2 หลายเดือนก่อน +3

      That only happens when you have a clustered UNIQUEIDENTIFIER. Create the ID column as NONCLUSTERED and create a CLUSTERED INDEX on the DateCreated instead, and your database will have almost the exact same performance as INT IDs.

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน +3

      "there are so many bad videos on this topic" - I'm not sure why some videos focused on the how fast the ID creation time is. 🤷‍♂️

    • @dcernach
      @dcernach 2 หลายเดือนก่อน +1

      Never thought about it. Our DBAs complain a lot about our using GUID as a PK. If I understood correctly, you meant to create a table with an ID as a UNIQUEIDENTIFIER and add a column like CreatedOnUtc. Then, create a CLUSTERED INDEX using both columns, or create a clustered index on CreatedOnUtc and a non-clustered index on ID?

    • @daveanderson8348
      @daveanderson8348 2 หลายเดือนก่อน +2

      @@RobertMcLaws How about using that ID column as a Foreign Key in other tables? What is the impact of that?

    • @daveanderson8348
      @daveanderson8348 2 หลายเดือนก่อน +1

      @@dcernach The GUID type column as PK should be non-clustered index. Otherwise the fragmentation of GUID would have negative performance impact when your table has at least millions of rows.

  • @ucretsiztakipci6612
    @ucretsiztakipci6612 2 หลายเดือนก่อน +2

    Adding Postgress to this benchmark would be interesting. Wher is the repo?

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน

      Was planning to, but didn't have time to add a Postgres benchmark

  • @ahmettarksahin3777
    @ahmettarksahin3777 2 หลายเดือนก่อน

    When we use ULID or UUID as the primary key in the database, should we check whether ULID or UUID has been created before when adding data to the database?

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน +2

      No need, they're random enough that you're unlikely to see a collision. And even then, you'd get an exception for duplicate key values.

  • @jodainemoore8300
    @jodainemoore8300 2 หลายเดือนก่อน

    Great Videos, Please can you do one on Composition and Or with Strategy Pattern, best struggling with these for months.

  • @FranciscoCastellanodePablo
    @FranciscoCastellanodePablo 2 หลายเดือนก่อน

    Be carefully with int, it can only support database generation. I have faced problems when your application (not planned in case) began to support external applications that has there own database like mobile applications that need to work with no connection.

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน

      There's also the HiLo strategy

    • @FranciscoCastellanodePablo
      @FranciscoCastellanodePablo 2 หลายเดือนก่อน

      @@MilanJovanovicTech good point, never the less in my experience its better to go with UUID, this Hilo and others strategies that could exists could be needed if UUID creates other issues like performance as mention in the video. But for the majority of the cases I presume UUID it is the better option.

  • @RomanTurovskyy
    @RomanTurovskyy 2 หลายเดือนก่อน +1

    If using GUIDs introduces less then 2x slowdown I do not see this as an issue. 10x performance penalty would have been a problem, but 2x can be easily covered by the modern hardware for the benefits GUIDs provide.

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน +1

      True. Most problems in engineering are amplified by scale.

  • @AhmedKhaled-g8t
    @AhmedKhaled-g8t 2 หลายเดือนก่อน

    what is the best order to study clean arc

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน

      There's a million sources, watch them all. But the source of truth is Uncle Bob's book.

  • @thedacian123
    @thedacian123 2 หลายเดือนก่อน

    Nice,but about database sequences?

  • @kenwarner
    @kenwarner 2 หลายเดือนก่อน

    It's probably just me but having the SSMS toolbars on two rows instead of one stuck out like a sore thumb. I always combine to a single row whenever I have to install it. 😬

  • @casperhansen826
    @casperhansen826 2 หลายเดือนก่อน

    The problem with ULID it is not compatible with UUID/GUID so it will require a lot of changes to make it work in an existing system
    If you let the SQL server database create the guid it will be generated in sorted order and work just as well

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน

      SQL Server isn't the only database people commonly use, so we should look at this from a broader perspective.
      I'll upload a Postgres benchmark soon!

    •  2 หลายเดือนก่อน +1

      If you use NEWID and not NEWSEQUENTIALID, then you should have the same problem if MSSQL generates it. NEWSEQUENTIALID is closer to what UUIDv7 does, but compatible with the UNIQUEIDENTIFIER type without getting to much fragmentation. ULID is 128 bits, same as UUID so not sure what you mean with "not compatible", there is always the little-big-endian issues when working with UUID/GUID/ULID, so if your working on a bit-level, you can get confused really fast... For the text representation, they are not presented in the same way. But if you store it as CHARs and not BINARY or UNIQUEIDENTIFIER, then you have about doubled the size already... if that's important for your usecase.

  • @theeusftn
    @theeusftn 2 หลายเดือนก่อน

    But why I need to use timestamp in my uuid? i dont got it

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน

      Clustered index

    • @theeusftn
      @theeusftn 2 หลายเดือนก่อน

      @@MilanJovanovicTech interesting

  • @vijayarajan-bt5fk
    @vijayarajan-bt5fk 2 หลายเดือนก่อน

    15.26 where to get this benchmark...? Shell script;

  • @DavidSmith-ef4eh
    @DavidSmith-ef4eh 2 หลายเดือนก่อน

    do they slow down queries and db in general? an eleven length unsigned int seems much more preferable than a 32-length char entry lol. I assume the index size and look up speed are much slower on the latter, on huge databases.

    • @DavidSmith-ef4eh
      @DavidSmith-ef4eh 2 หลายเดือนก่อน

      although, on the other hand, you don't need a created_at column anymore. I can see msyelf using them, for things that require unique ids accross a system

    • @Rob_III
      @Rob_III 2 หลายเดือนก่อน +3

      You shouldn't store a ULID (or (G/U)UID for that matter) as (var)char but as (G/U)UID. And, yes, a ULID can be stored as a (G/U)UID.

    • @RobertMcLaws
      @RobertMcLaws 2 หลายเดือนก่อน

      Using the NONCLUSTERED UniqueIdentifer + Clustered DateCreated is nearly as fast as integers and has been tested on huge databases with negligible performance impact.

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน +1

      They have an impact on table/index size, for sure

    • @casperhansen826
      @casperhansen826 2 หลายเดือนก่อน +1

      It's 16 bits and the main benefit is that it can be generated on the client and written to the database.
      The problem with GUID and UUID is that the order of them is more or less random so writing it to the database as a clustered index will cause some heavy manipulation to make it work. ULID should solve that problem, still generating unique values in almost sorted order making it possible to just place the new record at the end of the table

  • @marikselazemaj3428
    @marikselazemaj3428 2 หลายเดือนก่อน

    10:45 What are those icons before class names?

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน +1

      ReSharper detects they're entities in an EF DbContext

  • @sunzhang-d9v
    @sunzhang-d9v 2 หลายเดือนก่อน +2

    SnowFlake ldGenerator

    • @DavidSmith-ef4eh
      @DavidSmith-ef4eh 2 หลายเดือนก่อน

      lol, thats an unfortunate name. does it contain the pronoun of the id as well?

    • @Rob_III
      @Rob_III 2 หลายเดือนก่อน +1

      @@DavidSmith-ef4eh Snowflake predates woke stuff by at least a decade; Twitter introduced it in 2010.

    • @DavidSmith-ef4eh
      @DavidSmith-ef4eh 2 หลายเดือนก่อน

      @@Rob_III yeah, read upon it. seems to be used across all the major socials, apparently. must be fastest I guess.

    • @MilanJovanovicTech
      @MilanJovanovicTech  2 หลายเดือนก่อน

      Cool, will see if I can include it in some future discussions

    • @sunzhang-d9v
      @sunzhang-d9v 2 หลายเดือนก่อน

      @@MilanJovanovicTech what? feel ,long long ago

  • @sunzhang-d9v
    @sunzhang-d9v 2 หลายเดือนก่อน

    kubernetes ,Waiting