Google SWE teaches systems design | EP44: Apache Parquet

  • Published on 11 Sep 2024

Comments • 9

  • @higiniofuentes2551
    @higiniofuentes2551 several months ago +1

    Thank you for this very useful video!

  • @sibasisbhattacharjee2212
    @sibasisbhattacharjee2212 6 months ago +1

    One question. Please help me understand:
    Since HFiles are stored in a column-oriented format, when we retrieve them using Avro serialization, does Avro work in a row-based manner here because it deals with an abstraction over the HFiles and not the actual stored file? Is that the reasoning?

    • @jordanhasnolife5163
      @jordanhasnolife5163  6 months ago +1

      I'm not quite sure what you mean here, but the gist is that you can pass in Avro-serialized data to create Parquet files, and then use the same writer schema to read objects back out of them!

    • @sibasisbhattacharjee2212
      @sibasisbhattacharjee2212 6 months ago

      @jordanhasnolife5163 Thanks. I get it now.

    • @higiniofuentes2551
      @higiniofuentes2551 several months ago

      Most data is stored in databases, yet it is mainly carried around in files: CSV, XML, JSON. Wouldn't it be better to have a "standard" type in which the size, the access speed, and the special use case of search are all optimized?
      Thank you!
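
To make the Avro-to-Parquet reply in this thread a bit more concrete, here is a minimal sketch of the general idea: row-oriented records go in, get pivoted into per-column storage, and the same schema is used to back the records out again. This is a toy illustration only. Plain dicts stand in for Avro records and plain lists stand in for Parquet column chunks; no real Avro or Parquet library is used, and `SCHEMA`, `rows_to_columns`, and `columns_to_rows` are hypothetical names invented for the example.

```python
# Toy illustration only: dicts stand in for Avro records and lists stand in
# for Parquet column chunks. All names here are hypothetical.

# Assumed writer schema: ordered (field name, Python type) pairs.
SCHEMA = [("user_id", int), ("country", str), ("score", float)]


def rows_to_columns(rows, schema):
    """Pivot row records into one list of values per field (columnar layout)."""
    columns = {name: [] for name, _ in schema}
    for row in rows:
        for name, typ in schema:
            value = row[name]
            # Reject values that do not match the schema's declared type.
            if not isinstance(value, typ):
                raise TypeError(f"{name}: expected {typ.__name__}")
            columns[name].append(value)
    return columns


def columns_to_rows(columns, schema):
    """Rebuild row records from the columns, using the same schema."""
    names = [name for name, _ in schema]
    column_lists = [columns[name] for name in names]
    # zip(*column_lists) walks all columns in lockstep, one row at a time.
    return [dict(zip(names, values)) for values in zip(*column_lists)]


rows = [
    {"user_id": 1, "country": "US", "score": 9.5},
    {"user_id": 2, "country": "TH", "score": 7.0},
]
columns = rows_to_columns(rows, SCHEMA)
restored = columns_to_rows(columns, SCHEMA)
```

In this sketch, a query that only needs `country` reads `columns["country"]` and never touches the other fields, which is the usual motivation for a columnar layout, while `restored` shows that the row objects can still be recovered when needed.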

  • @raj_kundalia
    @raj_kundalia 10 months ago +1

    this is cool, thank you!

  • @steephengeorge
    @steephengeorge 5 months ago +1

    Cool videos. Could you please include a video on Arrow?

    • @jordanhasnolife5163
      @jordanhasnolife5163  5 months ago +1

      Huh, interesting idea. I would like to get to this at some point, but basically it seems like it's a way to operate directly on Parquet data in memory without having to deserialize it. It will be a while before I do this, but perhaps one day!

  • @user-ur2en1zq4f
    @user-ur2en1zq4f a year ago +1

    super