ORC vs Parquet file format | Hive Interview questions and answers | Session 2 - Trendytech

  • Published on 31 Jan 2025

Comments • 13

  • @hellohelloronin · 3 years ago · +7

    I hope your students go back with these pointers and delve into the official documentation. This video could have been better if it had explained, for instance, why Parquet is better with nested data, or why ORC works better for Hive! Also, on your point regarding compression: if you are referring to the comparison conducted by Hortonworks, it is heavily biased towards ORC and does not give the full picture. Personally, I would not select candidates who recite these differences for the sake of it if they cannot back up their claims!

    • @amanahmed6057 · 2 years ago · +4

      Then give the full answer instead of writing this comment,
      so that people can get help from it.

  • @MrManish389 · 4 years ago

    Perfectly explained! Thank you very, very much, sir. Please explain accumulators and broadcast variables.

  • @mayankmalhotra4672 · 4 years ago · +3

    Thanks, Sumit, for the perfect explanation. How does Parquet perform better for nested data? I mean, is there any reason behind this? Can you please help?

    • @MrManish389 · 4 years ago

      Yes, I want to know too, sir. Please tell us. Your explanation is perfect and to the point.
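One commonly cited reason, offered here as background rather than as the video's own explanation: Parquet stores nested fields with the record-shredding scheme from Google's Dremel paper. Each leaf field becomes its own flat column, and every value carries a repetition level (at which repeated field it repeats) and a definition level (how many optional or repeated fields on its path are actually present), so the nesting can be rebuilt from that one column without touching sibling columns. The sketch below hand-shreds a single leaf column for a hypothetical, simplified schema (a `Person` with a repeated `contacts` group holding an optional `email`); the function names are illustrative only and this is not code from any Parquet library. ORC also supports nested types (struct, list, map), so whether this makes Parquet "better" depends on the workload.

```python
from typing import Any, Dict, List, Optional, Tuple

# One column entry: (value, repetition_level, definition_level).
Entry = Tuple[Optional[str], int, int]

# Hypothetical, simplified schema (assumption, legacy repeated-group style):
#   message Person {
#     repeated group contacts {
#       optional binary email (UTF8);
#     }
#   }
# Max repetition level = 1 (contacts is repeated).
# Max definition level = 2 (contacts is repeated, email is optional).


def shred_contacts_email(records: List[Dict[str, Any]]) -> List[Entry]:
    """Flatten the leaf column contacts.email into (value, r, d) entries."""
    column: List[Entry] = []
    for record in records:
        contacts = record["contacts"]
        if not contacts:
            # No contacts: emit one null placeholder so the record is not lost.
            column.append((None, 0, 0))
            continue
        for i, contact in enumerate(contacts):
            # r=0 starts a new record; r=1 means "another contacts element".
            rep = 0 if i == 0 else 1
            email = contact.get("email")
            if email is None:
                column.append((None, rep, 1))  # contact present, email missing
            else:
                column.append((email, rep, 2))  # fully defined value
    return column


def assemble_contacts_email(column: List[Entry]) -> List[Dict[str, Any]]:
    """Rebuild the nested records from the flat column alone."""
    records: List[Dict[str, Any]] = []
    for value, rep, defn in column:
        if rep == 0:
            records.append({"contacts": []})  # a new top-level record begins
        if defn >= 1:
            # defn == 1: the contact exists but has no email.
            records[-1]["contacts"].append({} if defn == 1 else {"email": value})
    return records


if __name__ == "__main__":
    people = [
        {"contacts": [{"email": "a@x.com"}, {}]},
        {"contacts": []},
        {"contacts": [{"email": "b@y.com"}]},
    ]
    flat = shred_contacts_email(people)
    print(flat)  # [('a@x.com', 0, 2), (None, 1, 1), (None, 0, 0), ('b@y.com', 0, 2)]
    print(assemble_contacts_email(flat) == people)  # True
```

The round trip shows the design point: a query that only needs `contacts.email` can read that single column and still know which record and which list element each value belongs to.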

  • @deepikakumari5369 · 4 years ago

    wonderful explanation :)

  • @19967747 · 4 years ago · +1

    Very good explanation 👍

  • @mahesh.h1b339 · 3 years ago

    Hi bro, I request you to please do a video on common Hive errors faced in real-time projects. Thanks in advance.

  • @vikashlal5290 · 4 years ago

    Nice

  • @joerokcz · 4 years ago

    What is nested data? What do you mean by it?

    • @vishalbhardwaj969 · 3 years ago

      Nested data means data inside data.
      Suppose you have created country-based folders.
      Then, inside the country-based folders,
      your data is stored based on the bucketing column.

    • @amanahmed6057 · 2 years ago

      Check an example of a JSON file.
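To make "nested data" concrete: a nested record is one where a field's value is itself a structure (an object, struct, or map) or a list, as in the JSON document below. A columnar format such as ORC or Parquet stores each leaf field as its own column, identified by its path through the nesting. This small stdlib sketch lists those leaf paths for one sample record; the `leaf_paths` helper and the sample data are hypothetical, written only for illustration, and are not part of Hive, Spark, ORC, or Parquet.

```python
import json
from typing import Any, List


def leaf_paths(value: Any, prefix: str = "") -> List[str]:
    """Return the dotted path of every scalar (leaf) field in a parsed JSON value."""
    paths: List[str] = []
    if isinstance(value, dict):
        # Struct-like object: descend into each field, extending the path.
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths.extend(leaf_paths(child, child_prefix))
    elif isinstance(value, list):
        # All elements of a list map to the same column path, so de-duplicate.
        for item in value:
            for path in leaf_paths(item, prefix):
                if path not in paths:
                    paths.append(path)
    else:
        # Scalar: string, number, bool, or null.
        paths.append(prefix)
    return paths


record = json.loads("""
{
  "id": 1,
  "name": "Asha",
  "address": {"city": "Pune", "zip": "411001"},
  "orders": [
    {"item": "pen", "qty": 2},
    {"item": "book", "qty": 1}
  ]
}
""")

if __name__ == "__main__":
    print(leaf_paths(record))
    # ['id', 'name', 'address.city', 'address.zip', 'orders.item', 'orders.qty']
```

Here `address` (a struct) and `orders` (a list of structs) are the nested parts; the flat record would only have paths like `id` and `name`.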

  • @nandakishore5198 · 4 years ago

    Similarity:
    ORC and Parquet are both column-based (columnar) formats.
    Differences:
    1. ORC: takes less storage than Parquet.
    2. Parquet: better for nested data in terms of storage.
    3. ORC: suitable for Hive.
       Parquet: suitable for Spark.
    4. Parquet: more generic in nature.
       ORC: specifically designed for Hive, but maturing to become more generic.
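The shared point in the list above, that both formats are columnar, is also what sits behind the storage claims: values of one column are stored together, so they share a type and often repeat, which lets them compress well. A toy sketch, assuming rows are plain Python dicts: the hypothetical `to_columns` helper pivots rows into per-column lists, and `run_length_encode` collapses repeated runs. Real ORC and Parquet writers use more elaborate encodings (dictionary, run-length, bit-packing) plus a general-purpose compression codec, so this only illustrates the idea; which format ends up smaller depends on the data and the writer settings.

```python
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column (names taken from the first row)."""
    if not rows:
        return {}
    columns: Dict[str, List[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, values in columns.items():
            values.append(row.get(name))  # a missing field becomes None
    return columns


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    encoded: List[Tuple[Any, int]] = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            last_value, count = encoded[-1]
            encoded[-1] = (last_value, count + 1)
        else:
            encoded.append((value, 1))
    return encoded


if __name__ == "__main__":
    rows = [
        {"country": "IN", "sales": 10},
        {"country": "IN", "sales": 12},
        {"country": "IN", "sales": 9},
        {"country": "US", "sales": 7},
    ]
    cols = to_columns(rows)
    print(cols["country"])                     # ['IN', 'IN', 'IN', 'US']
    print(run_length_encode(cols["country"]))  # [('IN', 3), ('US', 1)]
```

In a row layout the repeated `"IN"` values would be interleaved with `sales` values, so runs like this would not appear.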