Anyone else shook by the lack of a "The name...." outro?
extremely shook
Shooked 😲
Don't worry, already unsubscribed. The cheek !!
Don't know why, but that felt like a spoiler. Lol
Anyone got admin rights? If so, someone block me for a week or so 😅 jk not kidding
The trillion+ ledger entries are driver+rider receipts which need to be stored for N+3 years. Uber does around 10B trips a year and averages about 300k concurrent trips globally. That produces a minimum of roughly 20B receipts a year, but the actual number is much higher since many rides have more than one rider and they also have to break out surge (rider and driver) and tolls. They have been at this scale for at least 5 years.
And of course no existing on-premises relational DB would have cut it. Nor any DB that supports partitioning and LSM trees. Nor any time-series DB, big-data filesystem, and so on and so on. No, Silicon Valley needs to reinvent the DB! Uber has so much money to burn anyway :D
@@monolith-zl4qt Good point, Cassandra and relational are too cumbersome at scale. They did open-source their time-series DB, M3DB. I wish they would open-source Schemaless and this ledger DB they just built :P
So CSVs on a relatively small stack of hard drives would have been all they needed ;P
You know, I always thought Uber made things too complicated and put too much engineering into it, but 10B trips is intense.
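Rough sanity check of the volume math in this thread. The 10B trips/year figure comes from the comment above; the two-receipts-per-trip floor and the ~15-minute average trip length are my assumptions, not Uber numbers.

```typescript
const tripsPerYear = 10e9;                  // ~10B trips/year (from the comment)
const secondsPerYear = 365 * 24 * 60 * 60;  // 31,536,000

// At least one rider receipt plus one driver receipt per trip (assumed floor).
// Multi-rider trips and separate surge/toll line items push it higher.
const minReceiptsPerYear = tripsPerYear * 2; // 20 billion

// Concurrency check: trip starts per second times an assumed ~15-minute
// average trip length lands near the ~300k concurrent trips quoted above.
const tripStartsPerSecond = tripsPerYear / secondsPerYear; // ~317
const assumedAvgTripSeconds = 15 * 60;                     // assumption
const concurrentTrips = tripStartsPerSecond * assumedAvgTripSeconds; // ~285k

console.log({ minReceiptsPerYear, concurrentTrips });
```

So the receipt count and the concurrency figure in the comment are at least mutually consistent under those assumptions.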
Uber is crashing down and they're pivoting to become Oracle while the cash is still coming in, how did he miss it?
Except Uber is a profitable business?
@@kieronwiltshire1701 in 2023 and 2024. During a brutal inflationary period. Their finances aren't rock solid.
@@kieronwiltshire1701 they were profitable last year with ~$2 billion, but they had been losing tens of billions in the years prior. Too early to say they are profitable.
@@kieronwiltshire1701 not this recent quarter
@@kieronwiltshire1701 how is Oracle not profitable…?
Ledger store’s written data can never be updated because the 20 engineers that built it will be gone and no one can understand the code anymore. Someone wanted some job security or to play with some toy ideas.
I started watching this channel around 60k subs and it has given me such a fun mix of knowledge and ridiculous silly jokes, what an awesome channel ❤
The 6 million figure almost definitely includes the cost of new employees required to maintain the new data store as well as any infrastructure costs. The way you're thinking of it is like saying I saved 100% of my monthly food costs by switching from store X to store Y.
Put a dollar sign in the title man it’s absolutely insane otherwise
Only bc that number uhhh commonly appears in another context
6 Million records is nothing
I'm one of those 6 million saved. Thank you Uber!
6 million trees. They going green.
Isn't Uber a German company? Ain't no way they saving 6 million 💀
You also need a shit ton of engineers to run a cloud solution, except they are usually working with a black box with very little control. Sometimes it pays to know a tool from the ground up. At least now they have control and governance.
You can also hire about 600 Indian devs for that money
and they will fuck everything up and in the long term you lose even more money
When calculating Visa you missed a factor of 60: 40k per second * 60s * 60m * 24h * 365d
For everything else: There's Mastercard (we are your new masters)
You, me, and that one commenter picked up on it.
PRIIIIIIIIIIIIIIIIME (but not Amazon)
He also misread the number he calculated in the first place (said 21 trillion when the calculated number was only 21 billion). This worked in his favor a bit, since he ended up being off by only one order of magnitude from the real answer (40k * 60 * 60 * 24 * 365 = ~1.3 trillion).
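Spelling the corrected math out in case anyone wants to run it; the 40k/s rate is the figure used in the video, not an official Visa capacity number.

```typescript
// 40k transactions/second sustained for a full year (assumed rate from the video).
const txnPerSecond = 40_000;
const secondsPerYear = 60 * 60 * 24 * 365; // 31,536,000
const txnPerYear = txnPerSecond * secondsPerYear;
console.log(txnPerYear); // 1,261,440,000,000 -> roughly 1.26 trillion per year
```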
Can't talk about data transfer, but I can talk about a certain data transform job that is on month one of two (maybe three) because it was either "batch it live and slowly" or "two weeks of downtime". Not a joke. The backing system is dog slow on certain operations, and so convoluted that only it is reliable enough to do the job; raw-dogging the DB always ends in tears...
Ain't no way we got OF girls in these comments...
Spambots know no bounds
Ofc. They are after virgins
Balls
Where’d everyone go we’s was talking here b’s jk I’m leaving
Not to get political but OF is a safer alternative to IRL sex work. Literally just a girl putting a camera in her room
I say this regardless of the subject: they are criminals, and the entire board of directors will experience what it's like to be in prison.
2:39 If he talks about transactions: no one is sure how much Visa CAN do (their tests say 56k, but no one can reproduce it for obvious reasons), but in 2023, at the peak, it was 757 million daily transactions, or somewhere around 8-9k transactions per second.
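Back-of-envelope on that daily figure; the 757M/day number is the one quoted in the comment above, so treat it as an assumption.

```typescript
// Convert a peak-day transaction count into an average per-second rate.
const peakDayTxns = 757e6;          // 757 million transactions in one day
const secondsPerDay = 24 * 60 * 60; // 86,400
const avgTps = peakDayTxns / secondsPerDay;
console.log(Math.round(avgTps));    // ~8,761 txn/s; intra-day peaks run higher
```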
You would assume the term "saved 6 million" would mean the cost of the replacement has been factored in.
How many of those 1 trillion records could instead be purged, saving even more?
Ah, if it were not for the pesky regulation of having to keep all the records for 5/7/10 years, depending on the market. Those complexities were not considered when they decided to save the money they were giving Stripe and 'make their own theme park'...
@@zwerko Where do those regulations actually exist? So far the worst I've come across was 2 years.
@@sealoftime Canada 6 years, Netherlands 5 years (last I checked), New York State 7 years (according to Google)
@@acmethunder never worked with either, thanks for the info!
@@sealoftime yeah, regulations are all over the place.
Some engineers earned black belts in job security at Uber for the next few years.
You missed a 60 in that multiplication formula. 40,000 txn/s * 60 s/m * 60 m/h * 24 h/day * 365 day/year = 1.26144×10¹², or roughly 1.2 trillion txn/year.
I assumed the cost of the new solution was factored in and the 'savings' is what's left after implementing the in-house solution. At least that's how all the P&L estimates I've ever seen worked.
In a large company it completely makes sense to build your own… especially if it's ultimately cheaper to build and maintain, and reliability is ensured.
It's the maintenance and reliability that suffer; at least in my experience, it never gets the resources needed for upkeep since it's not the core business.
@@HarpreetSingh-xg2zm Probably because non-software-educated people are in the management hierarchy making the decisions. Software costs and maintenance are no different from running a lab or a factory… known ongoing costs… fall short of covering them or stop keeping up with maintenance, and the lab/factory will fail; same with software. It's actually a pretty simple concept, and a simple cost-benefit analysis: do we outsource or do we do it internally? It shouldn't matter that it's "more complex", because as long as you can hire the right people and the cost is lower than outsourcing, you're more profitable.
@@HarpreetSingh-xg2zm If you are a big company at Uber's scale, economics matter. And it's not that big of an issue to maintain a custom-built DB if you know what you are doing.
I've heard Uber is Splunk's biggest customer globally. If you're going to try and save money as Uber, throw those engineers at that!
I love these videos. I studied IT and worked 11-something years, but now I've achieved financial freedom, and after a year I wanna go back to doing some actual work...
If I could snap my fingers, this kind of purpose built back end stuff is exactly what I would love to do.
I really like the source materials he reacts to. Does anyone else just click through to read them yourself and save 30 mins of yapping time?
I’ve usually read the stuff he reviews.
Just do what we did and dump it into a bunch of parquet files. Lol
I see why more companies are writing their own solutions instead of using open source. No one wants to build their entire infrastructure on open source only to have the project pull the plug at the last minute. I'm talking to you, Terraform and Redis.
FYI, VISA does 400k transactions per second on Black Friday... 40k is normal operations... peak is 10x that...
The main takeaway isn't that there are other companies taking in money... they all do... Uber needs blobs, and that's more than just numbers and characters in a DBMS... that's system-level storage that starts to drive up the baseline cost of financially regulated datasets. Their transactions are actually blobs of data.
Being able to store those ever-increasing blobs and transactions, and take the cost as a business expense rather than get seriously penalized by providers who can't handle their storage demands, is the point.
Uber probably knows all their vendors are just as evil as they are. There's some substantial risk reduction in replacing a single-source solution with something you own. Is there anything existing out there where they don't risk serious vendor lock-in because of their scale?
This is the correct answer.
When you have that much data, it's best to make your own database. It's not as hard as people think. Just make sure your filesystem type and kernel are capable of concurrent reads and writes. You can almost copy old database code and your job is about three-quarters done. You can also turn your company into a DB and software company that sells services. I might need to buy Uber stock.
You calculated Visa's per-year stats wrong. Chat was right, you missed a factor of 60: 40,000 requests per second is roughly 1.26 trillion requests per year.
Some talented intermediate: "Let's write our own DB, it will be awesome! In and out, 15 minute adventure!"
_5 years later:_
1:42 I'm never going to complain about any programmer who falls for NIH; it's our job to create software, that's what we do. Tough luck for the salesman trying to sell crap we could have built in-house. But one might say it's not good for the company; yes, and who cares.
Uber 2024 revenue: $38.589B. And they will be saving $6M / year. Sweet.
In adding up all the engineers and effort at Uber, nobody commented that DynamoDB sounds like it had no problems (it sounds like Uber's only gripe was the cost). In other words, a 2012-era Java-based solution designed to solve this problem at a larger scale in an efficient manner was humming along... And not just humming along with one teeny tiny little Uber's snail fart worth of data, but tens to hundreds of thousands of Ubers' worth, with plenty of capacity for more to jump on board at a moment's notice, ready for anyone with a high enough limit on their credit card... :P
Sweet looks like my comments no longer show that’s good. Now I can actually talk. The curations of the philosopher so strange how it occurs
That's interesting... I wonder if it would be more efficient to let the indexes consume the log and update themselves; at least the ones not providing constraints. A request could lock the required indexes at the latest sequence, so they would all need to catch up and snapshot at that point before the request could be fulfilled. IDK; interesting stuff.
I need to go back and read the Tango white paper. Can't recall if they discussed indexes in there.
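A minimal sketch of the idea in the comment above, assuming an append-only log with monotonically increasing sequence numbers; the names and structure here are illustrative, not LedgerStore's or Tango's actual design.

```typescript
// Each index tails the log and applies entries in order; a read notes the
// latest log sequence when it arrives and waits until the index has caught
// up to that point before answering.
interface LogEntry {
  seq: number;   // monotonically increasing log sequence number
  key: string;
  value: unknown;
}

class LogBackedIndex {
  private appliedSeq = 0;
  private entries = new Map<string, unknown>();

  // Called by the log consumer, strictly in sequence order.
  apply(entry: LogEntry): void {
    this.entries.set(entry.key, entry.value);
    this.appliedSeq = entry.seq;
  }

  // Read `key` as of log position `seq`: wait (here, by polling) until the
  // index has applied everything up to that sequence, then serve the read.
  async readAt(key: string, seq: number): Promise<unknown> {
    while (this.appliedSeq < seq) {
      await new Promise((resolve) => setTimeout(resolve, 10));
    }
    return this.entries.get(key);
  }
}
```

A real implementation would presumably snapshot and notify waiters instead of polling, but the catch-up-to-a-sequence-then-read ordering is the interesting part.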
To put it in perspective, Uber's total operating costs are over $33B. Not sure how they counted that $6M/year or how much money they spent getting there, but it just doesn't sound like a huge success money-wise, so maybe it's more of an operationally significant improvement? Fun project for the devs though, that's for sure.
For $6M a year you can probably hire half the IT workforce in some countries, but just 8 people in the US. In Poland you could hire 100 senior engineers for $6M.
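Taking the comment's headcounts at face value, here's what they imply per engineer (fully loaded annual cost, very rough assumption):

```typescript
const annualBudget = 6_000_000;
const usHeadcount = 8;        // comment's US estimate
const polandHeadcount = 100;  // comment's Poland estimate
console.log(annualBudget / usHeadcount);     // $750,000 per US engineer
console.log(annualBudget / polandHeadcount); // $60,000 per Polish senior
```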
Rittz at interview “so you rap” ya I’m real good at it too… “next”
But what is the point of having an immutable DB under a single POA? They can modify it anyway.
I think it was implemented in Go?
I'm not a data engineer, but I wonder: couldn't they just set up a Hadoop cluster for this?
Uber somehow manages to teeter between losing money and being barely profitable while taking half of the money from drivers. It's the engineers!
All this talk about how $8M would get Uber like 8 engineers, but you could hire a small army of them in India, the Balkans, or Eastern Europe for that kind of money.
And get garbage code that doesn't work and would require those 8 engineers to make it work again
@@erosnemesis Not at all true. That's only the case with shitty service companies.
@@erosnemesis Everybody writes garbage code, and all code devolves into garbage with time. They just deliver garbage faster.
@@erosnemesis It took them 40 engineers to reinvent the database, so I'm pretty sure those overpaid Uber devs aren't Jesus either.
Squeaking mic arm: The name.... Is the squeakagen!
Or they could use S3 Glacier Deep Archive, dunno.
Stay awesome Prime! Based prime!
Seems like they're using Hyperledger Fabric.
Bro, when Netflix and Uber made a lot of these decisions, most off-the-shelf solutions weren't nearly as mature or an obvious choice.
6 mil is nothing for Uber tho, Gojek pays upwards of 40 mil for maps.
If they stopped at "We saved 6 million dollars by moving off DynamoDB", that would have been enough.
why not just use quickbooks?
Should I go with a Java stack for backend or MERN? About to finish JS. What do you advise, chat?
GOTH stack
I guess it depends on your country, but in mine, Poland, Java is the #1 server language. There are always jobs for Java devs; they are stable and pay very well. Modern Java is kinda nice to write, very similar to TypeScript. Spring Boot is a complex beast, but the standardization that comes with it makes it easier to jump into an existing big project and understand what's going on. And the dealbreaker for long-running projects: while the TS ecosystem changes all the time, Java is very, very stable.
@@ooijaz6063 Oh, and is it too hard to learn? It's harder than MERN, and MERN takes way less time to learn. I just wanna know, is it too hard?
If you want to use Java with a frontend framework, you can use a Spring Boot OpenAPI project and generate types from it using openapi-ts. I may even like it more than tRPC, because it still gives you full type safety and there is a strong separation between frontend (public code) and backend (private code). Also, you don't have to go hard on OOP. In my career I've seen crazy OOP projects and they are always hard to work with. You can write simple, mostly procedural code in Java as well, with a few functional features such as the amazing Java Streams API. This leads to clean (meaning easy to understand and work with) code.
@@ooijaz6063 Hmm, thank you very much for the advice. I was also considering Java to be the better option because, if needed, it's easier to go from Java to MERN than vice versa.
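For what the generated-types workflow described above buys on the frontend, here's a tiny illustrative sketch. The User shape and endpoint path are made up for the example; in the real setup they would come out of the Spring Boot OpenAPI spec via a generator such as openapi-ts rather than being hand-written.

```typescript
// Pretend this interface was generated from the backend's OpenAPI spec
// (hand-written here purely for illustration).
interface User {
  id: number;
  name: string;
}

// Thin typed wrapper so callers get a typed result instead of `any`.
async function getUser(id: number): Promise<User> {
  const res = await fetch(`/api/users/${id}`); // hypothetical endpoint
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return (await res.json()) as User;
}

// The compiler now catches typos like `user.nmae` at build time, which is
// the "full type safety" being described above.
getUser(1).then((user) => console.log(user.name));
```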
Back in the day IBM wasn't interested in saving 6 million... what, too soon?
Nah man financial databases are dog shit. You're just going to get an ERP with a thick client built on top of a SQL database with 9000 tables. And it will store all the ERP code in the database.
S3, the most cost effective, stable, scalable AWS service… rolled their own. K
INDENT is a typo; it's INTENT.
Visa is 4k/second at peak
Hi ThePrimeagen, I have been coding for 3 years and I'm really passionate about this area, building optimized and modern applications. I have started learning DSA from scratch and watching your DSA course. At this point, I am afraid of AI. Should I still keep improving at programming or not? I earn money from this area.
'peta' is 10^15, or 1,000,000,000,000,000. If you had 300,000,000 processes acting in parallel at one task per second non-stop around the clock, it would take ~38.5 days to go through 10^15 tasks. If you had 500,000,000 processes acting in parallel you could go through 10^15 tasks in ~33 minutes.
I think that's right...
NOTE: Thanks to @LtdJorge for pointing out that my math was off by a lot...500M reduces the ~38.5 days to ~23.3 days. Peer review rocks!!
That makes the 10M problem seem like a joke
How does going from 300k to 500k workers reduce the time from 38 days to 33 minutes?
@@LtdJorge you're right, I didn't pay attention, it goes from 38 days to 23 days
@@LtdJorge Thank you for checking me! This is what I get when quickly posting without dbl-checking my own math. I'll edit and add a note....
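Spelling out the arithmetic from this thread with the correction applied:

```typescript
const tasks = 1e15;                 // one quadrillion ("peta") tasks
const secondsPerDay = 24 * 60 * 60; // 86,400

const daysWith300M = tasks / 300e6 / secondsPerDay; // ~38.6 days
const daysWith500M = tasks / 500e6 / secondsPerDay; // ~23.1 days, not ~33 minutes
console.log(daysWith300M.toFixed(1), daysWith500M.toFixed(1));
```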
I definitely don't trust Uber. That number is way too high. According to my math, they'd be saving $200,000 AT BEST
Could someone please tell me this individual's name?
They should focus on their GPS, which sucks beyond... from users to users... f
SQL ... squirrel ... squirrel-lite ...
Why don't people just say S-Q-L?
Abbreviations(?) are not meant to be pronounced as a word.
EA = eay?
PHP = pehpe?
JS = jayes?
Why?
Because we can.
So now Uber is into DLTs
But instead of using an existing open technology they made THEIR OWN DAMN THING???????
Shoulda just used COBOL.
Hmm, Uber has a problem (new competing rideshare companies)
Save 6 million??
Promotion Driven Architecture.
what is your name tho
wtf is TigerBeetle... just learned some sht #thx
no a gen ;(
Why not just read and react to the original article by Uber, which has all the background info explaining why it made sense and what they wanted to accomplish? 🤷‍♂️
# NASA 4 president
AGEN
Please wait, I have examples "where" on my phone, but I need 10k to have Apple recover the data. "Next." Sorry if that's too much for society's delicate sensitivities.
As soon as you said "Uber", I wasn't trusting it.
Your math is bogus - you did requests per second but used 60 minutes instead of 60 seconds in your calculations.
idk how much they saved when that cab driver self-immolated at city hall
.......lol.. ledgers.. they're focused on the $ but not the core value? yeah.. why I use Lyft.. f Uber
If each car literally had its own DB (not a partition), one could then 'sync' encrypted SQLite files to the blockchain. Maybe use another DB to centralize only the data relevant for bookkeeping etc., but the blockchain would be able to preserve all the data a car would record (locations, hair color, irises of customers, their conversations, etc.). LOL. You could then buy and trade 'cars' or micros with all the data.
First hook up the return #+- rolled up indo stick to burn- I’m about to press the point ,ant, no need to ,act concerned - filling to find some smoke; you could ,say I’m, on the burn- if you don’t know ,what, mean then, you really odds learn/ #another… my altitude like living in Iraq, fucking with me you gonna get that whole stack
all this just to save 6 mill?
This guy is ranting about a solution when he has no clue what the problem and constraints are. YouTubers at their finest.
Is this a dog whistle?
garbage company, garbage service
in...hyper.. lollolol
Don’t have a nap day have a sponge day/ be it nap time or sponge time- phone died had to put on paper. c around 😅
Um what’s the data density of paper per square inch 😅 can dream sure can dream
I’m looking for a calculator is anyone a calculator here 🤚
I amuse myself
Put your phone down and immediately make your way to a hospital. You're currently experiencing a head injury.
@@alexandrep4913 nope dropped out the pussy like this
DDD - Dijkstra Driven Development
#HaloMentioned 😇
Who do I catch… prob both 🦾 strong arming it’s really weird that. Prospect alarms