How to Performance-Tune Apache Spark Applications in Large Clusters

  • Published Oct 27, 2024

Comments • 7

  • @catchritesh2007 • 3 years ago • +2

    This is an enterprise-level explanation which is highly useful. Great work Omkar!!

  • @oldschoolwreak • 4 years ago

    Probably the best talk so far, citing real-life issues faced and their solutions.

  • @thomsondcruz5456 • 3 years ago

    Loved this talk. Just one comment at 8:36 (referring to the example of 100 rows): Parquet is not purely columnar. It is actually hybrid, where the rows are divided into RowGroups and each RowGroup is stored in a columnar format. This hybrid format actually helps in row reconstruction. Also, with Apache Delta becoming more mainstream (which also uses Parquet but with a commit log) there is little reason to use pure Parquet :)
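    The hybrid layout the commenter describes can be sketched in plain Python (no Parquet library; `to_row_groups` and the sample data are illustrative, not part of any real format implementation): rows are split into fixed-size row groups, and within each group values are stored per column, which is what makes reconstructing a single row cheap.

    ```python
    # Illustrative sketch of Parquet's hybrid layout: rows are chunked into
    # row groups, and each row group stores its values column-wise.
    def to_row_groups(rows, group_size):
        groups = []
        for i in range(0, len(rows), group_size):
            chunk = rows[i:i + group_size]
            # within a row group, values are laid out per column (columnar)
            groups.append({col: [r[col] for r in chunk] for col in chunk[0]})
        return groups

    rows = [{"id": n, "val": n * 10} for n in range(5)]
    groups = to_row_groups(rows, group_size=2)

    # row reconstruction only touches one row group, not the whole file:
    first_row = {col: groups[0][col][0] for col in groups[0]}
    ```

    A pure column store would scatter one row's values across the entire file; here they stay together inside a single row group.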

  • @BuvanAlmighty • 3 years ago

    Very useful ideas from real-life scenarios.

  • @maheshkolhe1638 • 28 days ago

    How to set columnar compression?
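    A config sketch, not from the talk: Spark exposes the Parquet codec through the documented `spark.sql.parquet.compression.codec` setting (the job script name below is a placeholder).

    ```shell
    # Set the Parquet columnar compression codec for a job.
    # Documented values include: uncompressed, snappy (default), gzip, lz4, zstd.
    # my_job.py is a hypothetical application script.
    spark-submit \
      --conf spark.sql.parquet.compression.codec=zstd \
      my_job.py
    ```

    The same setting can also be applied per session via `spark.conf.set(...)` before writing Parquet output.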

  • @MrTigerman6 • 4 years ago

    @omkar thanks for your talk, and just to let you know, we are facing the YARN memory-overhead issue with Spark 2.4 as well when we are doing Spark SQL joins.
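    For the issue this commenter mentions, a common mitigation is raising the executor's off-heap headroom via the documented `spark.executor.memoryOverhead` setting (available since Spark 2.3; its default is the larger of 384 MiB or 10% of executor memory). A config sketch, with assumed sizes and a placeholder script name:

    ```shell
    # Sketch: give each executor more off-heap headroom so YARN does not
    # kill containers for exceeding their memory limit during shuffles/joins.
    # The 8g/2g values and my_join_job.py are assumptions for illustration.
    spark-submit \
      --conf spark.executor.memory=8g \
      --conf spark.executor.memoryOverhead=2g \
      my_join_job.py
    ```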

  • @shubhamshingi4657 • 3 years ago

    I am new to Spark. Can anyone please tell me exactly which operations form the 5 stages in the left diagram and the 2 stages in the right diagram?
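    Without the slides it is hard to say which operations produced those exact stages, but the general rule is that Spark starts a new stage at every shuffle (wide) dependency, while narrow transformations stay in the same stage. A toy sketch of that counting rule (the set of wide operations lists common RDD API names; the function is illustrative, not Spark's actual scheduler):

    ```python
    # Illustrative rule of thumb: one initial stage, plus one more stage
    # for each wide (shuffle-inducing) transformation in the lineage.
    WIDE = {"groupByKey", "reduceByKey", "join", "repartition", "sortByKey"}

    def count_stages(transformations):
        return 1 + sum(1 for t in transformations if t in WIDE)

    # narrow, narrow, wide -> the shuffle splits the job into 2 stages
    two = count_stages(["map", "filter", "reduceByKey"])
    # two shuffles -> 3 stages
    three = count_stages(["map", "join", "groupByKey", "map"])
    ```

    So a plan with five stages would contain four shuffle boundaries, and one with two stages a single shuffle.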