How does the database guarantee reliability using write-ahead logging?

  • Published on 6 Jan 2025

Comments • 40

  • @MaxPicAxe
    @MaxPicAxe months ago +2

    Wow, when I saw how the log sequence number was done, I was shocked by the attention to detail.
    Really good job explaining everything; you're getting a sub.

  • @manurajsinghrathore2950
    @manurajsinghrathore2950 3 months ago

    By learning development, I want to learn the implementation and architecture of systems,
    not just how to use them with different languages/libraries.
    And that's what you are teaching.
    Love your channel.

  • @jaskiratwalia
    @jaskiratwalia 11 months ago +2

    Amazing content as always. The clarity you have on such important concepts is amazing!

  • @atlicervantes6187
    @atlicervantes6187 2 years ago +2

    Excellent explanation!

  • @pranjalagnihotri6072
    @pranjalagnihotri6072 2 years ago +10

    Hi Arpit, totally loving these videos.
    I had a few questions on WAL:
    1- Suppose we configure the flush frequency as 2 seconds, and an update operation happens that gets appended to the log file but is not yet flushed to disk. If another process tries to read the same row in that window, it will get stale data, right?
    2- When we append a record to the log file, it will reply with some kind of acknowledgement saying the log was successfully written, right? What happens if this process crashes and the acknowledgement is not sent: will it write the same log twice (due to a retry)? If so, how do we ensure that duplicates are discarded while flushing logs to disk?

    • @AsliEngineering
      @AsliEngineering  2 years ago +12

      1. The updates are first made to the disk blocks that the database has fetched into memory. An update might fail because of some constraint, so writing it to the log file without first checking whether it is even possible is futile. Hence the flow is: fetch the disk blocks to be updated into memory, update the blocks in memory, write to the WAL, and flush it to disk.
      My bad: I should have explained this in the video.
      2. To be honest, I am not sure. But my guess is that because writing to a log file is not a network call, it is not as unpredictable. Hence you will always get an ACK for the write to the log file, so there is no need to retry. Again, my best guess.
      It would be awesome if we could deep dive into it sometime. If you find any resource, let me know.
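
A minimal Python sketch of the order described in this reply, assuming a toy key-value "page" model. The class TinyWALStore, its NOT NULL check, and the JSON-lines log format are hypothetical illustrations, not the database's actual design: the block is fetched into memory, validated and updated there, and the change is appended to the WAL and fsync'ed before success is reported. Writing the data pages back to disk happens later and is not shown.

```python
import json
import os
import tempfile


class TinyWALStore:
    """Hypothetical toy store: fetch block -> update in memory -> append to WAL -> fsync WAL.

    Writing the updated data pages back to disk happens later and is not shown.
    """

    def __init__(self, wal_path: str) -> None:
        self.disk: dict = {}    # stand-in for the on-disk data file: page_id -> {key: value}
        self.pages: dict = {}   # in-memory copies of the fetched disk blocks
        self.wal = open(wal_path, "a", encoding="utf-8")

    def _fetch(self, page_id: int) -> dict:
        # Load the block into memory on first access.
        if page_id not in self.pages:
            self.pages[page_id] = dict(self.disk.get(page_id, {}))
        return self.pages[page_id]

    def update(self, page_id: int, key: str, value) -> str:
        page = self._fetch(page_id)
        # Hypothetical constraint check: an update that cannot succeed never reaches the log.
        if value is None:
            raise ValueError("NOT NULL constraint violated")
        page[key] = value
        # Append the change to the WAL and force it to stable storage before acknowledging.
        self.wal.write(json.dumps({"page": page_id, "key": key, "value": value}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        return "committed"

    def close(self) -> None:
        self.wal.close()


if __name__ == "__main__":
    wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
    store = TinyWALStore(wal_path)
    print(store.update(1, "name", "alice"))  # committed
    print(store.disk)                        # {}  (data page not written back yet)
    store.close()
```

The point of the ordering is that the log only ever contains changes that already passed validation in memory, and a commit is acknowledged only once its log record is on stable storage.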

    • @AbhishekYadav-fb3uh
      @AbhishekYadav-fb3uh 1 year ago +4

      Ans 2)
      In order to ensure that duplicate log records are not written to the log file during a retry, most databases use a technique called "idempotent logging". In idempotent logging, each log record is assigned a unique identifier, and the database uses this identifier to check if the log record has already been written to the log file. If the log record has already been written, the database simply ignores the duplicate record and does not write it again.
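
A minimal sketch of the idea in this reply, assuming each record's identifier is a monotonically increasing number (such as an LSN) that starts at 1, and that a single writer appends records in order. LogRecord and IdempotentLog are hypothetical names, and whether a particular database deduplicates exactly this way is not confirmed here. Because IDs only grow, the writer only needs to remember the highest ID it has written; a retried append carrying an ID at or below it is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    record_id: int   # assumed unique, starting at 1, and monotonically increasing (e.g. an LSN)
    payload: str


class IdempotentLog:
    """Toy in-memory log that drops a record whose ID was already written,
    so a retry after a lost acknowledgement does not create a duplicate."""

    def __init__(self) -> None:
        self.records: list[LogRecord] = []
        self.last_id = 0     # highest record_id appended so far (0 = nothing yet)

    def append(self, rec: LogRecord) -> bool:
        if rec.record_id <= self.last_id:
            return False     # already written (e.g. a retried append): ignore it
        self.records.append(rec)
        self.last_id = rec.record_id
        return True


log = IdempotentLog()
r1 = LogRecord(1, "UPDATE accounts SET balance = 90 WHERE id = 7")
print(log.append(r1))   # True
print(log.append(r1))   # False: the retry is ignored
```

Keeping only the highest ID (instead of a set of every ID seen) relies on the monotonic-ID assumption; with arbitrary unique IDs, the log would need to remember all of them.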

    • @harvendrasinghrathore2848
      @harvendrasinghrathore2848 8 days ago

      After completing this tutorial, this is the first question that struck my mind: what will happen if this update has taken an exclusive lock on the row and will only be committed after a second? Will the next transaction have to wait because it is not committed yet?

  • @jayesha.6194
    @jayesha.6194 2 years ago +1

    Thanks Arpit for these videos.

  • @sanjitselvan5348
    @sanjitselvan5348 1 year ago

    Nice explanation! I learnt a lot. Thank you!

  • @AshwaniSharma0207
    @AshwaniSharma0207 1 year ago +1

    In the WAL file, if we don't keep the actual data with the Insert/Update commands, how will we get the data into a fresh DB by applying the WAL file?

  • @shadyabhi
    @shadyabhi 21 days ago

    Thanks for the video. I'm unable to find the reference to "sync mode" while opening files. From what I can see, you can always call "fsync" after "write", but that won't be an atomic operation.
    Did you mean O_DIRECT mode? Can you share more details? Thanks!

  • @vinitsunita
    @vinitsunita 1 year ago +1

    Writing to the WAL means we are persisting changes to disk, which could also be a slow operation. What is the way to make it faster?

  • @dipankarkumarsingh
    @dipankarkumarsingh 1 year ago +1

    ✅ finished .... 👌.... ❤

  • @viren24
    @viren24 2 years ago

    Awesome content. I am loving it, Arpit. It is also useful in the case of a master-slave configuration.

  • @ziyinyou938
    @ziyinyou938 8 months ago

    This is just AWESOME

  • @lakshaysharma8144
    @lakshaysharma8144 2 years ago +20

    Netflix for developers.

    • @sanketh768
      @sanketh768 6 months ago +1

      nice analogy

    • @saurabhsuman4960
      @saurabhsuman4960 2 months ago

      Very true; with Arpit, learning is fun.

  • @ramyakmehra312
    @ramyakmehra312 1 year ago +1

    If we are flushing the data every 1 minute, let's say, how is consistency ensured in case a read happens before the changes are made? It will fetch the data from disk, but the disk doesn't have the latest changes yet.

    • @Avinashkk360
      @Avinashkk360 1 year ago +2

      Until it is flushed, it stays in memory. And reads are served from memory/cache first, so it won't go to disk to fetch it.
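
A minimal sketch of the behaviour described in this reply, using a hypothetical BufferPool class with plain dicts standing in for memory and disk: writes change the cached page and mark it dirty, reads check the cache before going to disk, and dirty pages are written back later by flush_dirty(). In a real engine that write-back is done by a background writer or checkpoint, and only after the matching WAL records are durable; this sketch leaves the WAL out entirely.

```python
class BufferPool:
    """Toy page cache: reads check memory before disk, writes mark pages dirty,
    and flush_dirty() writes them back to 'disk' later."""

    def __init__(self, disk: dict) -> None:
        self.disk = disk            # dict standing in for the data files on disk
        self.cache: dict = {}       # page_id -> page contents held in memory
        self.dirty: set = set()     # pages changed in memory but not yet written back

    def read(self, page_id):
        if page_id in self.cache:   # served from memory, even if not yet flushed
            return self.cache[page_id]
        value = self.disk.get(page_id)
        self.cache[page_id] = value
        return value

    def write(self, page_id, value) -> None:
        # The WAL record for this change is assumed to be made durable elsewhere.
        self.cache[page_id] = value
        self.dirty.add(page_id)

    def flush_dirty(self) -> None:
        for page_id in sorted(self.dirty):
            self.disk[page_id] = self.cache[page_id]
        self.dirty.clear()


disk = {1: "old"}
pool = BufferPool(disk)
pool.write(1, "new")
print(pool.read(1), disk[1])   # new old
pool.flush_dirty()
print(disk[1])                 # new
```

So a reader never sees the stale on-disk copy of a page that is already cached: the lag between memory and disk only matters after a crash, which is exactly what replaying the WAL covers.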

  • @ShubhamKumar-fi1kp
    @ShubhamKumar-fi1kp 2 years ago

    Hi Arpit Bhaiya, I am not able to understand what happens when the DB has to update a million rows. Can you please give a brief overview of that?

  • @abhinavsingh4221
    @abhinavsingh4221 2 years ago +2

    Nice video! I had one question. You mentioned that committed data is not put straight into the database memory; instead it is appended to the WAL file, and these changes are then applied to memory asynchronously. But suppose I write some data in a commit and that query is appended to the WAL but not yet written to memory. If I read from the database in the meantime, will I get the old data? How is this situation handled?

    • @vishalbhopal
      @vishalbhopal 2 years ago

      I think the write operation happens in main memory, which is fast. It will be written to disk asynchronously.

    • @harvendrasinghrathore2848
      @harvendrasinghrathore2848 8 days ago +1

      I think it has updated the disk block in memory and the transaction is considered committed, so when the next transaction asks for the data, it will be served from memory, not from disk.

  • @pratprop
    @pratprop 1 year ago

    Hi Arpit, great explanation of WAL. Do you know what would be a great follow-up to this? ARIES for database recovery.

    • @AsliEngineering
      @AsliEngineering  1 year ago

      That's an excellent topic. Thanks for suggesting.

  • @VikramKumar-lp7wv
    @VikramKumar-lp7wv 2 years ago +1

    Hi Arpit, great explanation, man🙌
    Just one question: you said that while adding a new entry to the WAL file, we generate a CRC, first add that to the WAL file, and then store the corresponding SQL command/data as a new entry. Isn't it possible that after writing the CRC for the new entry, while we are writing the SQL command text, our system crashes and this latest entry is incorrect? When the system is fixed and up again, we'll read this latest entry as valid, because a CRC was assigned to it and we consider an entry valid because of that.
    Isn't it better to first add the SQL command/data as a new entry in the WAL page and then, if it is written successfully, assign a CRC to it? This way, in whatever scenario our system goes down, we'll be able to correctly figure out whether the latest entry was written correctly or not!

    • @kalyanben10
      @kalyanben10 1 year ago +1

      So you want to first read the entire record data and then read the CRC code? Don't you think that's not optimal? You first read the CRC code, then keep reading a fixed number of bytes and keep doing the check; this way you don't overuse memory. The CRC check is done sequentially on a file, so the point is to read only a fixed number of bytes of the actual record, apply the check, discard whatever record data you read, load the next chunk of bytes, and repeat. If your record is very large, you are unnecessarily burdening the system by reading the entire record.
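
A minimal sketch of why writing the CRC first does not by itself cause a torn entry to be accepted, assuming a hypothetical record layout of [CRC-32 of payload][payload length][payload] (the format in the video may differ). The CRC is computed over the payload, so during recovery it is recomputed from the bytes actually present: a record cut short by a crash, or one with corrupted bytes, fails the check and the scan stops there. For brevity this sketch checks whole records in memory rather than streaming fixed-size chunks as the reply above describes; zlib.crc32 also accepts a running value as its second argument, which is how a chunked check could be done.

```python
import struct
import zlib

# Hypothetical header: (CRC-32 of payload, payload length), both unsigned 32-bit little-endian.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(crc, len(payload)) + payload


def read_valid_records(log: bytes) -> list[bytes]:
    """Scan the log from the start and return payloads up to the first incomplete
    or corrupt record; everything from that point on is treated as torn."""
    records, offset = [], 0
    while offset + HEADER.size <= len(log):
        crc, length = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or (zlib.crc32(payload) & 0xFFFFFFFF) != crc:
            break  # crash mid-write or corruption: stop replaying here
        records.append(payload)
        offset = start + length
    return records


log = encode_record(b"SET a=1") + encode_record(b"SET b=2")
torn = log + encode_record(b"SET c=3")[:-2]   # simulate a crash mid-payload
print(read_valid_records(torn))               # [b'SET a=1', b'SET b=2']
```

Because validity comes from matching the stored CRC against the bytes on disk, rather than from the CRC merely being present, the CRC can be written before or after the payload without changing what recovery accepts.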

  • @akashagarwal6390
    @akashagarwal6390 2 years ago

    What if a dirty read is made of the same data while it is being written concurrently? How do we deal with that?

    • @RahulPal-mz4oj
      @RahulPal-mz4oj 4 days ago

      I think a dirty read is a race condition and has to be avoided. There are different methods to deal with that. I don't think it will interfere with the WAL.

  • @kushagraverma7855
    @kushagraverma7855 2 years ago

    Awesome content. It would be great to have a Discord/Reddit page for each of the videos for further discussion.

  • @protyaybanerjee5051
    @protyaybanerjee5051 1 year ago

    TL;DR - Crash recovery guarantees in ALMOST all DBs are enabled by the judicious use of WAL.

  • @sakshamgoyal4079
    @sakshamgoyal4079 2 years ago +1

    😍

  • @zdevpro
    @zdevpro 2 months ago

    Are you Gujarati, bhaiya?

    • @AsliEngineering
      @AsliEngineering  2 months ago +1

      Yes.

    • @zdevpro
      @zdevpro 2 months ago

      @@AsliEngineering Great, I got it from your surname. By the way, were you born and brought up in Gujarat? Are you Kathiyawadi?

    • @zdevpro
      @zdevpro 2 months ago

      @@AsliEngineering I like your in-depth videos on every system and concept.

  • @kumarprateek1279
    @kumarprateek1279 2 years ago

    😍