Your humor gets me every time.
Thank you for this very useful video!
Cool videos. Could you please make a video on Arrow?
Huh, interesting idea. I would like to get to this at some point, but basically it seems like it's a way to operate directly on Parquet data in memory without having to deserialize it. It will be a while before I do this, but perhaps one day!
One question, please help me understand: HFiles are stored in a column-oriented format. When we retrieve them using Avro serialization, does Avro work in a row-based manner because it deals with an abstraction over the HFiles rather than the actual stored files? Is that the reasoning?
I'm not quite sure what you mean here, but the gist is that you can pass in Avro-serialized data to create Parquet files, and then use the same writer schema to read the objects back out of them!
@jordanhasnolife5163 thanks. I get it now.
A large share of data is stored in databases, yet we mostly carry it around in files: CSV, XML, JSON. Wouldn't it be better to create a "standard" type in which size, access speed, and the special use case of search are all optimized?
Thank you!
this is cool, thank you!
super