I hope your students take these pointers and dig into the official documentation. This video would have been better if it had explained, for instance, why Parquet is better with nested data, or why ORC works better with Hive. Also, on your point about compression: if you are referring to a comparison conducted by Hortonworks, it is heavily biased towards ORC and does not give the overall picture. Personally, I would not select candidates who recite these differences for the sake of it if they cannot back up their claims!
Then give the full answer instead of just writing this comment, so that people can actually get help from it.
Perfectly explained. Thank you very much, sir. Please explain accumulators and broadcast variables.
Thanks, Sumit, for the perfect explanation. How does Parquet perform better for nested data? I mean, is there a reason behind this? Can you please help?
Yes, I want to know too, sir. Please tell us. Your explanation is perfect and to the point.
wonderful explanation :)
Very good explanation 👍
Hi bro, could you please do a video on common Hive errors in real-time? Thanks in advance.
Nice
What is nested data? What do you mean by it?
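Nested data means data inside data: a field whose value is itself a record, a list, or a map.
Note that this is different from folder layout. Suppose you have created country-based folders, and inside them your data is stored based on a bucketing column. That is partitioning and bucketing, not nested data.
For an example of nested data, check a JSON file.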
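To make that concrete, here is a small Python sketch of a nested record as it might appear in a JSON file. The field names (`name`, `address`, `orders`) and values are made up for illustration and are not from the video.

```python
import json

# Hypothetical record: "address" is a record inside the record,
# and "orders" is a list of records inside the record.
raw = """
{
  "name": "Asha",
  "address": {"city": "Pune", "country": "IN"},
  "orders": [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 99.5}
  ]
}
"""

record = json.loads(raw)

# Reaching a nested value means walking down through the levels.
city = record["address"]["city"]
order_amounts = [order["amount"] for order in record["orders"]]

print(city)
print(order_amounts)
```

A flat record, by contrast, would only have top-level fields such as `name`, with no field containing another record or list.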
Similarity:
ORC and Parquet are both column-based formats.
Differences:
1. ORC: usually takes less storage than Parquet (this depends on the data and the compression codec).
2. Parquet: handles nested data better in terms of storage.
3. ORC: suitable for Hive. Parquet: suitable for Spark.
4. Parquet: more generic in nature. ORC: specifically designed for Hive, but maturing to be more generic.
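On point 2, here is a rough sketch of the general idea behind storing nested data in a columnar way: every leaf field gets its own column, so a query that only needs `address.city` does not have to read `orders`. This is a simplified illustration in plain Python and not Parquet's actual encoding. Parquet also stores repetition and definition levels so the nesting can be rebuilt, which this sketch leaves out. The function names `leaf_columns` and `shred` and the sample records are hypothetical, and the sketch does not by itself show that one format is better than the other.

```python
def leaf_columns(value, prefix=""):
    """Yield (column_path, leaf_value) pairs for one nested record."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from leaf_columns(child, path)
    elif isinstance(value, list):
        # Every element of a list lands in the same column path.
        for item in value:
            yield from leaf_columns(item, prefix)
    else:
        yield prefix, value


def shred(records):
    """Group leaf values from all records by their column path."""
    columns = {}
    for record in records:
        for path, leaf in leaf_columns(record):
            columns.setdefault(path, []).append(leaf)
    return columns


# Made-up sample data, only for illustration.
records = [
    {"name": "Asha", "address": {"city": "Pune"},
     "orders": [{"amount": 250.0}, {"amount": 99.5}]},
    {"name": "Ravi", "address": {"city": "Delhi"},
     "orders": [{"amount": 40.0}]},
]

columns = shred(records)
for path, values in columns.items():
    print(path, values)
```

Note that once the values are grouped like this, you can no longer tell which `orders.amount` values belong to which record. That lost information is what the extra level metadata in a real format is there to preserve.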