YouTube Short worked!!!
Thinking of taking up your masterclass. Scouting some more videos before taking the call. So far so good! Finally someone who sees past the interviews and is a true advocate of CS!
Thanks. I love CS to the core, and hopefully I'll see you in my future cohort :)
Thanks for making such quality content and making it available to everyone free of cost Arpit. The most valuable resource any human can have is time and someone like you spending your personal time to help others improve, is a significant and noble sacrifice. Thank you Sir!!
17:07 bruh ☠️
😂😂
Would you not do "INSERT INTO users SELECT * FROM users_old ORDER BY id DESC"? It is more probable that our application reads the recent writes more than the old ones.
Yeah for sure. Users can start seeing the recent items quicker. Thanks.
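A minimal sketch of that backfill, assuming MySQL and a simplified users schema with just id and name (the column list is illustrative), copying newest rows first so recently active users reappear sooner:

-- backfill newest rows first; explicit ids from users_old are preserved
INSERT INTO users (id, name)
SELECT id, name
FROM users_old
ORDER BY id DESC;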
Approximately how much time would it take to copy 2 billion rows to the new table?
@@karanverma9197 Depends on the number of columns, etc. But we ran a migration for 100 million records, each row being around 3000 bytes, on a 16-core machine with 64 GB RAM. This took us around 10 minutes.
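A rough back-of-the-envelope from those numbers, assuming the copy scales roughly linearly:
100,000,000 rows x 3,000 bytes ≈ 300 GB copied in about 10 minutes, i.e. roughly 0.5 GB/s.
The 2 billion rows asked about above would be about 20x that data, so on the order of a few hours rather than minutes.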
Use the negative range of INT till you fix your code: create a trigger that hands out ids from the negative range on every insert.
FYI, this is a quick workaround and you will have enough time to fix the issue.
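A rough sketch of that workaround, assuming MySQL and that users.id is the exhausted signed INT column. The counter table and trigger names are made up for illustration; the trigger assigns a negative id whenever the application does not supply one:

CREATE TABLE users_neg_seq (next_id INT NOT NULL);
INSERT INTO users_neg_seq VALUES (0);

DELIMITER //
CREATE TRIGGER users_before_insert_neg_id
BEFORE INSERT ON users
FOR EACH ROW
BEGIN
  -- only step in when no explicit id was supplied (MySQL passes 0 for an
  -- omitted AUTO_INCREMENT column in a BEFORE INSERT trigger)
  IF NEW.id IS NULL OR NEW.id = 0 THEN
    UPDATE users_neg_seq SET next_id = LAST_INSERT_ID(next_id - 1);
    SET NEW.id = LAST_INSERT_ID();
  END IF;
END//
DELIMITER ;

The LAST_INSERT_ID(expr) update is the usual MySQL counter-table trick, so concurrent inserts do not hand out the same negative id.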
Well said, Arpit. We learn more from the dissection as it's more practical. Thanks for sharing the approaches.
Absolutely amazing Dissection and easy to understand as well.
Also Arpit, I think a quick code change would help while the data is being migrated: if the userid is less than 2^31, select from the old table, else select from the new table.
Yeah. That's a solid hack to apply 🤘
This is the best 🙌
Bro, you are a GEM !
2nd Approach is beauty
Thanks for covering this in great detail!
Too many ads though; it breaks the attention span.
Awesome explanation 👌🏻 Thanks Arpit. Waiting for next outage😂
Very informative video. Thanks for sharing.
Good explanation.
Really faced this issue
Hi, I have one doubt. At 22:55, if you have already altered the auto-increment to start from 2147483648, then when we execute the last line to copy the old data, won't it start adding from the new auto-increment value, i.e. 2147483648, 2147483649, etc., and there will be no values from 0 to 2147483647? Please explain.
There would be no data at the time of table creation, but data will gradually be copied in from the users_old table, so later on we will have data from 0 to 2147483647 as well.
No, auto-increment will always take the larger value. If you insert any id below it, the insert will go through unless it hits a duplicate key.
No, because you are also inserting the existing ids from users_old into the new table; auto-increment only acts as a default when no id is given.
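A small illustration of the two replies above, assuming MySQL and a simplified schema (name is a made-up column): explicitly supplied ids are stored as-is, and AUTO_INCREMENT only kicks in when the id is omitted, starting from the bumped counter.

CREATE TABLE users (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255)
) AUTO_INCREMENT = 2147483648;

INSERT INTO users (id, name) VALUES (42, 'copied from users_old');  -- keeps id 42
INSERT INTO users (name) VALUES ('brand new signup');               -- gets id 2147483648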
Would love to see more like this. Very informative video.
Good info Arpit! I think one question remains: how will the 'old_users' and 'new_users' tables work together, since the records would now be split across them (in case both are required)? Creating a view named 'users' fetching from both tables should be good, right?
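A hedged sketch of that idea, assuming MySQL and identical column lists in both tables; the view name and the id boundary are illustrative, and the boundary avoids returning a row twice once it has been backfilled into the new table:

CREATE VIEW users_combined AS
  SELECT * FROM users_old                       -- every pre-outage row
  UNION ALL
  SELECT * FROM users WHERE id >= 2147483648;   -- only rows created after the fix

SELECT * FROM users_combined WHERE id = 123456; -- example lookup during the backfill

One caveat: a view containing UNION cannot use MySQL's MERGE algorithm, so lookups through it may be slower than hitting the right base table directly.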
Great video Arpit. I mean, as a developer, no one will expect us to fix this as it's mostly done by DBAs, but yeah, knowing the concept was insightful.
So next time if Duplicate Key exception arises 🤣🤣🤣 I’ll have this scenario in my mind as well.
Hey Arpit, a few queries:
1. Since we are duplicating billions of rows from one table to another in the second approach, what if the server does not have the disk space to store duplicate data for both tables?
2. How do we handle this in the sharded-database case? Will the same approach be able to take care of sharded tables, e.g. the distribution of rows on different machines?
Want to know: when this happens, can we move to sharding? Will it take more time in this case?
Can you create a new table as a UNION of the existing and new tables and rename it appropriately to solve the issue quickly?
What if we have references to the user ID in other tables?
Awesome, thank you
I am wondering what will happen when the insertion from users_old to users is in progress and there is a query for data which hasn't been migrated yet at that moment. IMO, the 2nd approach only holds good if data from the users table (in this scenario) is not queried at a high rate for SELECT operations; otherwise there is another outage to handle, with data "not found" even though it's there, just sitting in another table (users_old) 😊
You first create a blank table with a new name, migrate the data, and then rename it to the original table name.
My bad, I should have used the name users_new instead of users_old. Would have been much clearer.
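A minimal sketch of that sequence, assuming MySQL and ignoring writes that land while the copy is running; the table names here are illustrative:

CREATE TABLE users_new LIKE users;                                -- same schema, empty
ALTER TABLE users_new MODIFY id BIGINT NOT NULL AUTO_INCREMENT;   -- widen the id column

INSERT INTO users_new SELECT * FROM users;                        -- migrate existing rows

RENAME TABLE users TO users_old, users_new TO users;              -- atomic name swap

The swap itself is a metadata-only operation, which is why the rename is near-instant even on a table with billions of rows.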
Hi Arpit,
I really enjoyed the tutorial. Just out of curiosity: the data migration from users_old to the users table could potentially take several hours to days, and in between there may be multiple insert operations. So, does it maintain the sequence over the ID column in the table?
Can we use negative ids ?
Silly question: if it's a signed int whose min value is < 0, why did the first row insertion start at 1?
Buddy, we can't run ALTER TABLE on such huge tables. For that there is a Percona tool for online alters: pt-online-schema-change, which applies the alter in the background.
I know, and this is just a demo. Also, Percona's tool just creates a shadow table and switches.
How long will it take to copy 2 million records from users_old to the users table?
Because my doubt is: if it takes a long time, it might cause data inconsistency, right? The application code would search for some record in the newly created users table while that record from users_old has not yet been migrated.
Migrate first & then renaming would solve this, I guess
I think renaming will only work if you have no foreign key references on the table.
Yes. Most companies at scale do not have them, precisely to allow no-downtime migrations.
Most handle referential integrity at a transactional level in the application.
How will reads work if the copy takes days? Please explain this part too.
You do an in-place migration.
What would happen if, after 100, we try to add a row with id 95?
Would MySQL allow it, or should only values greater than the last row be added?
It will allow it. If 95 is not already present in the table, then the insert will go through.
Beekeeper Studio is amazing. I like Sequel Ace though.
Don't you think the users to users_old rename can also slow things down, considering it already had 2B records?
It's just a rename operation. Lightning fast. And not dependent on the amount of data.
Great video. Small query: when we are altering the table from INT to BIGINT, the application-level code will also require changes, correct? Otherwise failures will be imminent.
Why should there be any change in application code, so long as your programming language can handle that integer range? Can you elaborate on the possible errors you are concerned about?
@@AsliEngineering for example if we are using `int` for ids in Java shouldn't we change `id` data type now to `BigInteger` corresponding to the SQL schema changes?
@@prashantshubham I assumed you were using Python. But if you are using a type not compatible with the new column type, then the application needs an alteration.
Taking UUIDs into consideration could solve this issue well.
UUIDs index very poorly, hence they are avoided at scale.
If we consider using UUIDs after this outage, then a lot of application code and other peripheral systems would need to change if they assumed integer IDs.
But I see the point in moving to an ID with a larger width and solving the problem.
@@AsliEngineering I understand your point, thanks for correcting me.
I would also like to know how we can optimize UUIDs for better performance.
Please share your thoughts.
@@subhaspaul495 You cannot optimise it. UUIDs are massive compared to integers, 4x as big, so they bloat up the index by a factor of 4,
making your DB less performant. Personal experience 😀
@@AsliEngineering got it thanks
so how would we store ids uniquely across all DBs and also for better performance ??
my thoughts would be to store 2ids
one(_id: INT) is for PK and another as a unique identifier (use UUID here) and use this uuid for relation mapping
is this a good way to do it??
@@subhaspaul495 As soon as you store a big ID in the index, it will hamper performance. The best way to do this is to use a normal integer ID for local indexing and then use the partition key to mathematically find which DB the entry resides in.
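A hedged sketch of that reply's idea, assuming MySQL, 16 shards, and a made-up tenant_id partition key: each shard keeps a compact local integer primary key, and the owning shard is derived arithmetically in the application.

CREATE TABLE users (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- local, compact index
  tenant_id BIGINT UNSIGNED NOT NULL,                   -- partition key
  name VARCHAR(255)
);

-- routing rule applied in the application layer:
SELECT 1234567 MOD 16 AS shard_no;  -- this tenant's rows live on shard 7

Globally, a row is then identified by the pair (shard, local id) rather than by one wide identifier.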
Very informative video. Thanks for sharing.
Is this a feasible solution: add an additional column and make id plus that newly added column a composite key? I think this will affect the application layer as well, but I would love to know your thoughts.
Application layer affected for sure.
In either case your table is altered, be it the ID widened or another column added, so the time taken remains the same. No potential benefit; rather, this requires you to change your app code.
@@AsliEngineering Makes sense. Thanks!
I’m giving my name to you❤
Interesting