AWS re:Invent 2018: How AWS Minimizes the Blast Radius of Failures (ARC338)
- Published on Oct 6, 2024
- At AWS, we obsess over operational excellence. We have a deep understanding of system availability, informed by over a decade of experience operating the cloud and our roots of operating Amazon.com for nearly a quarter-century. One thing we've learned is that failures come in many forms, some expected, and some unexpected. It's vital to build from the ground up and embrace failure. A core consideration is how to minimize the "blast radius" of any failures. In this talk, we discuss a range of blast radius reduction design techniques that we employ, including cell-based architecture, shuffle-sharding, availability zone independence, and region isolation. We also discuss how blast radius reduction infuses our operational practices.
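Shuffle sharding, one of the techniques named above, can be illustrated with a short sketch. This is not AWS's implementation. The fleet of 8 workers, the shard size of 2, the customer names, and the SHA-256-seeded selection are all illustrative assumptions. The sketch gives each customer a deterministic, pseudo-random pair of workers and then counts how many customers lose all their capacity when one customer's pair fails.

```python
import hashlib
import random
from math import comb


def shuffle_shard(customer_id: str, workers: list[str], shard_size: int) -> list[str]:
    """Pick `shard_size` distinct workers for a customer, deterministically."""
    # Python's built-in hash() is salted per process, so derive a stable
    # seed from SHA-256 instead; the same customer always gets the same shard.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(workers, shard_size))


def fully_affected(failed_shard: list[str], shard: list[str]) -> bool:
    """True if every worker in `shard` is also in the failed shard."""
    return set(shard) <= set(failed_shard)


if __name__ == "__main__":
    # Hypothetical fleet of 8 workers, with 2 assigned to each customer.
    workers = [f"worker-{i}" for i in range(8)]
    print("distinct shards:", comb(len(workers), 2))  # 28

    customers = [f"customer-{n}" for n in range(1000)]
    shards = {c: shuffle_shard(c, workers, 2) for c in customers}

    # Suppose customer-0 sends a "poison" request that takes down both of its workers.
    failed = shards["customer-0"]
    hit = sum(fully_affected(failed, s) for s in shards.values())
    print(f"customers with no healthy worker left: {hit} of {len(shards)}")
```

With plain sharding into four fixed pairs, a customer that takes down its pair would leave roughly a quarter of all customers without capacity. With 28 possible pairs, only customers who drew exactly the same pair lose both workers (about 1 in 28, assuming an even spread). Customers who share just one worker with the failed pair still have a healthy worker.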
Cell-based architecture starts at 21:07.
An absolutely clear overview of how AWS limits blast radius. Very helpful for reviewing for the SAA exam.
If you want to learn more about shuffle sharding, start at 32:55.
How do you handle database consistency if you have a completely decoupled database in each cell?
Revisit after the incident...
The talk was too slow and had me yawning a bit, but the content is really good.
Set the playback speed to 1.5x or higher and enjoy.
Kinesis :)
This video hasn't aged well :)
Wow, this is one boring topic. I struggled to stay interested and engaged through the talk.
It is tricky. You will like it more if you have experienced scaling pain points.