Big fan of Amin! Amazing talk, and if you don't have time to listen to all of it, his closing thoughts starting at 45:49 are a must-listen!
Thank you, that's my daddy.
Very inspiring!
Not sure if you actually check the comments, but just in case:
Do you have a sense of what the requirements for distributed parallel storage will be in the fifth epoch? Given the criticality of a balanced system as you extolled, I assume that high-performance storage will also have to evolve to balance the improvements in other areas as well.
- I/O latency in the tens of microseconds?
- Integration of different sorts of storage media (HDD, SSD, persistent memory, etc.) into pools within the storage system, where individual jobs/cliques can compose just the mix they need of each to balance their overall system? (E.g., a compute job or clique with more small-file I/O or low I/O latency requirements may benefit from a storage system that includes pmem, whose low latency and byte-addressability help it perform better with that workload, while another job/clique that does primarily large streaming writes may be better balanced with just SSDs.)
- An ability to integrate domain-specific optimizations into your data layout, etc., versus treating all data as a blob and laying it all out the same way, as is done with today's general-purpose storage interfaces (POSIX, S3, etc.)?
- Flexible data protection and availability mechanisms that would let you dynamically specify how many replicas an individual piece of data needs, versus a static definition for the whole system?
- Does computational storage have a role to play here? It seems like potentially another form of performance gain that might now be harder to ignore as hardware growth has slowed. Perhaps another form of tighter coupling within the infrastructure?
It's brilliant!
Visionary.