Apache CarbonData: An Indexed Columnar File Format for Interactive Query by Jacky Li/Jihong Ma

  • Published on Oct 19, 2024
  • Real-time analytics over large datasets has become an increasingly widespread demand. Over the past several years, the Hadoop ecosystem has evolved continuously: even complex queries over large datasets can now run interactively on distributed processing frameworks such as Apache Spark. New storage paradigms have been introduced to support these frameworks as well: Apache Parquet and ORC provide fast scans over columnar data, and Apache HBase offers fast ingest and millisecond-scale random access.
    In this talk, we will outline Apache CarbonData, a new addition to the open source Hadoop ecosystem. It is an indexed columnar file format that aims to bridge the remaining gap and fully enable real-time analytics. It is deeply integrated with Spark SQL and dramatically accelerates query processing through efficient encoding and compression and through effective predicate pushdown via CarbonData's multi-level index technique.
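
The idea of predicate pushdown through an index can be illustrated with a minimal sketch. This is not CarbonData's implementation or API: it assumes a single sorted integer column, a hypothetical `Block` type that stores min/max statistics per block, and a hypothetical `scan_equals` helper that skips any block whose range cannot contain the requested value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Block:
    """Hypothetical storage block: column values plus min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(values: Iterable[int], block_size: int) -> List[Block]:
    """Sort the column and split it into fixed-size blocks with min/max stats."""
    ordered = sorted(values)
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        blocks.append(Block(chunk, chunk[0], chunk[-1]))
    return blocks


def scan_equals(blocks: List[Block], target: int) -> Tuple[List[int], int]:
    """Evaluate `column == target`, reading only blocks that may match."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # The predicate is checked against the index first; blocks whose
        # [min, max] range excludes the target are never scanned.
        if target < block.min_value or target > block.max_value:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


blocks = build_blocks(range(1000), block_size=100)
matches, blocks_read = scan_equals(blocks, 742)
print(matches, blocks_read, len(blocks))
```

In this toy setup only one of the ten blocks is read for the lookup. A multi-level index applies the same pruning idea at more than one granularity, which is where the acceleration described in the talk abstract would come from.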

Comments • 3

  • @parneetisood2728 5 years ago

    what is MDK ?? If anyone can help...

  • @AviralSrivastava2809 5 years ago

    Highly unclear what she is speaking. Even the annotations could not help.

  • @amitbaderia4194 5 years ago

    She is not a good speaker. Explanation is not clear