How Meta Models Big Volume Event Data - Full 4 Hour Course - DataExpert.io Free Boot Camp Week 2

  • Published Dec 17, 2024

Comments • 80

  • @zaraf-zk5rp
    @zaraf-zk5rp 24 days ago +9

    Thanks for this Zach! Kudos for sticking to your promise of releasing this one big consolidated video on Saturday.

    • @EcZachly_
      @EcZachly_ 24 days ago +4

      Wish it was a little longer but had to pull some of the content since it wasn't quality enough to release

  • @adilbekbulatov1713
    @adilbekbulatov1713 18 days ago +4

    Just finished watching the week 2 video and doing the labs! Thank you for the content, Zach! It's so informative and helpful, even with my little experience in Postgres and window functions with CTEs.

  • @elrodlej
    @elrodlej 16 days ago +3

    Thanks for the content! I applied some of these concepts in my job and it actually made a difference, excited to get to the spark section of the bootcamp!

  • @dadoll1660
    @dadoll1660 23 days ago +4

    solid 4 hrs Zach!!! love it!

  • @Moezm3007
    @Moezm3007 23 days ago +2

    Thanks Zach! Your effort is clear in the content you produce.

  • @ACarolinaLedezmaCarrizalez
    @ACarolinaLedezmaCarrizalez 16 days ago +1

    In my dreams I can already hear Zach's voice talking about data; I am fascinated by this intense and amazing course.

  • @MathiasEngineer
    @MathiasEngineer 21 days ago +1

    Thank you, this makes intuitive sense: make all data as compressed as possible using built-in techniques. I am a fan of the logical AND gate, pretty neat!

  • @HuyNguyen-wc9kh
    @HuyNguyen-wc9kh 16 days ago +1

    Reporting in from Australia. Also commenting "for the algorithm" so it knows this should be higher in the trending chart

  • @Guysthar
    @Guysthar 20 days ago +1

    Thank you Zach. I've carefully watched and listened to this course. It was amazing. It's also inspiring to see how much you can give to the community. I really hope that the business you are generating is rewarding too

  • @nrminhuseynli4183
    @nrminhuseynli4183 24 days ago +2

    Your punctuality is awesome. Thank you, teacher

    • @EcZachly_
      @EcZachly_ 24 days ago +4

      It was a lot to get this out today but I'm glad we pulled it off!

  • @johnnote7
    @johnnote7 10 days ago +1

    It's great, Zach! I just learned something new from you.

  • @muhammadzakiahmad8069
    @muhammadzakiahmad8069 21 days ago +1

    I never imagined my Digital Logic Design Knowledge is going to help me understand this lecture🙂.

  • @ananyvaishnav
    @ananyvaishnav 24 days ago +1

    This is the best, Zach. All the materials are consolidated in one video👍

  • @MrMe77-u4y
    @MrMe77-u4y 24 days ago +3

    Thank you, I am "slowly changing" my catching up status!

  • @sauravsinghsisodiya8627
    @sauravsinghsisodiya8627 23 days ago +1

    Blessed to find this. Thanks, Zack, for leveling me up

  • @thesungodzzz
    @thesungodzzz 19 days ago +1

    Very helpful. This video course is among the most insightful for learning data engineering. Thanks Zach!

  • @mosindichidozie4584
    @mosindichidozie4584 23 days ago +2

    All the way from nigeria, God bless you sir.😂😂😂😂

  • @madudka777
    @madudka777 21 days ago +1

    It's really interesting and useful.
    Thank you, Zach!
    Keep up the good work!

  • @top10tweets
    @top10tweets 17 days ago +1

    Data is the gold of the 21st century

  • @kpicsoffice4246
    @kpicsoffice4246 23 days ago +5

    Come on Zach. Listen to the masses. We need a Black Friday discount for dataexpert subscription pls!!

    • @EcZachly_
      @EcZachly_ 23 days ago +2

      Coming soon

  • @Sfgarcia
    @Sfgarcia 20 days ago +1

    Hey Zach, thanks for another great video! When do you use the prefix dim for column names in your tables, and when don't you? I didn't find that convention clear from the video.

  • @jeffersonmedeiros1830
    @jeffersonmedeiros1830 23 days ago +2

    Thank you man! One more subscriber from Brazil!

  • @RonyMoralesm
    @RonyMoralesm 24 days ago +1

    Great job Zach !!! Keen to binge ❤❤

  • @mayravaldes89
    @mayravaldes89 23 days ago +2

    This is amazing content. Thank you!!

  • @tiendatphan1740
    @tiendatphan1740 13 days ago +1

    Thanks for the great insights. I just wonder what the best practices are for maintaining those large-volume tables. The table is supposed to be updated, with more data coming in regularly. Do you compact the table regularly to avoid fragmentation, given that compaction is one of the most costly operations? Do you have to adjust the Spark config frequently as the table size increases (scaling up the cluster, increasing the max bytes per partition, etc.)?
    Besides, I am really curious about data governance in a corporation of Meta's scale. I bet they have to have access control and data protection (column masking, row filtering, etc.). More importantly, for the data lifecycle, data subject rights management (a user requesting access to their specific data, or deletion of all the data that has been collected about them, etc.) would be one of the most complicated topics to add to data modeling at large scale.

    • @EcZachly_
      @EcZachly_ 13 days ago

      Daily partitioning fam

    • @tiendatphan1740
      @tiendatphan1740 12 days ago

      @@EcZachly_ That is what I expected. But it comes with its own performance issues when regular queries filter on other columns.

  • @mateofleitas4536
    @mateofleitas4536 20 days ago +1

    Thanks for this awesome knowledge!

  • @ansonnn_
    @ansonnn_ 4 days ago

    50:45 For hourly deduping with microbatches, according to what you said, it only dedupes within the day; sadly it doesn't compare and dedupe against yesterday's data then? Deduping for the day only still seems less than ideal for certain snapshots where we are only taking the latest or earliest row per user_id, for example.
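
The hourly microbatch pattern discussed around 50:45 can be sketched in plain Python (the names and row shape below are my own illustration, not the course code): dedupe each hourly batch on user_id, then merge batches pairwise, keeping the earliest event. As the question notes, this collapses duplicates within the day's partition; deduping against yesterday would still require a separate comparison with the prior day's partition.

```python
from functools import reduce

def dedupe_batch(rows):
    """Keep the earliest event per user_id within one hourly batch."""
    earliest = {}
    for uid, ts in rows:  # row = (user_id, ISO timestamp string)
        if uid not in earliest or ts < earliest[uid]:
            earliest[uid] = ts
    return earliest

def merge(a, b):
    """Merge two already-deduped batches, again keeping the earliest timestamp."""
    out = dict(a)
    for uid, ts in b.items():
        if uid not in out or ts < out[uid]:
            out[uid] = ts
    return out

# 24 hourly batches would reduce pairwise into one deduped day;
# two batches shown for brevity (ISO timestamps compare lexically).
hourly = [
    dedupe_batch([("u1", "2024-12-17T00:30"), ("u1", "2024-12-17T00:05")]),
    dedupe_batch([("u1", "2024-12-17T01:10"), ("u2", "2024-12-17T01:20")]),
]
daily = reduce(merge, hourly)
```

Because `merge` is associative, the hourly results can be combined in a tree rather than one long chain, which is what makes the microbatch approach cheap at scale.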

  • @BI-Rahul
    @BI-Rahul 17 days ago +1

    At 40:00, is bucketing the same as partitioning?

    • @EcZachly_
      @EcZachly_ 17 days ago +1

      Nope. Partitioning is putting things in a folder. Bucketing is making new files within a folder
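
The folder-vs-file distinction in the reply above can be made concrete with a small sketch (the path layout and hash function here are illustrative assumptions, not Spark's or Hive's actual internals): the partition column chooses the directory, while the bucket key hashes to a fixed file number inside it.

```python
import hashlib

NUM_BUCKETS = 4  # fixed when the table is created

def hash_key(key):
    # deterministic stand-in for the engine's bucketing hash
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def output_path(row):
    """Partitioning picks the folder; bucketing picks the file inside it."""
    folder = f"ds={row['event_date']}"               # partition column -> directory
    bucket = hash_key(row["user_id"]) % NUM_BUCKETS  # bucket key -> file number
    return f"{folder}/part-{bucket:05d}.parquet"
```

Because the same user_id always hashes to the same bucket number, two tables bucketed the same way on the same key can be joined bucket-by-bucket without a shuffle.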

  • @alessiotucci0
    @alessiotucci0 24 days ago +2

    Hey Zach, if you add a 0:00 timestamp, the chapters will appear 👍🏻

    • @EcZachly_
      @EcZachly_ 24 days ago +2

      Thanks for the tip!

  • @lawalexlaw
    @lawalexlaw 5 days ago

    Thanks for the great course Zach. In the Day 3 lecture, for the Facebook long-period analysis example that you gave, did data engineers come up with that kind of analysis? Or was it the data scientists, with the engineers doing the implementation and optimisation?

  • @arunattnj
    @arunattnj 24 days ago +3

    How's this free? Fantastic course when I was trying to refresh on fact data modelling. Thank you.

    • @EcZachly_
      @EcZachly_ 24 days ago +2

      It’s coming down on January 31st

  • @AkshatAsthana-p1c
    @AkshatAsthana-p1c 16 days ago +1

    why should we name the columns starting with "dim" in the fact tables?

  • @ThilinaKariyawasam
    @ThilinaKariyawasam 10 days ago

    @EcZachly_ , isn't the CROSS JOIN you are doing to build the date list of bits an expensive operation, which bloats the dataset by 31 times? Could you please explain that?

    • @EcZachly_
      @EcZachly_ 10 days ago +1

      CROSS JOINs are only expensive when you match every row to every other row. In this case, it does that but there’s

    • @ThilinaKariyawasam
      @ThilinaKariyawasam 10 days ago

      @@EcZachly_ Thanks for the quick reply. Doesn't that mean that, for example, a 1 billion record table will end up with 31 billion rows? Is that a concern, or can we neglect it compared to the size of the data?

    • @EcZachly_
      @EcZachly_ 10 days ago +1

      @@ThilinaKariyawasam the original activity data would be 31 billion rows right?
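
For context on why the 31x blow-up in this thread is transient: the point of the CROSS JOIN against the date list is to collapse a month of activity rows into one bit-packed integer per user, after which an activity check is a single AND, no join at all (this is the "logical AND gate" idea from the lecture). A pure-Python sketch of that encoding (the month, bit layout, and names are illustrative):

```python
from datetime import date

MONTH_START = date(2024, 12, 1)

def pack_datelist(active_dates, month_start=MONTH_START, days=31):
    """Collapse one row per active day into a single 31-bit integer.
    Bit i is set when the user was active on month_start + i days."""
    bits = 0
    for d in active_dates:
        offset = (d - month_start).days
        if 0 <= offset < days:
            bits |= 1 << offset
    return bits

def was_active(bits, d, month_start=MONTH_START):
    """Membership is one AND against the packed integer."""
    return bits & (1 << (d - month_start).days) != 0
```

So the 31-billion-row intermediate exists only inside the aggregation; what lands on disk is one integer column per user per month.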

  • @retenim28
    @retenim28 9 days ago +1

    Can someone explain to me the difference between a fact table and an ordinary table in an OLTP database? As far as I understand, both contain facts or events, but a fact table is built for analytical purposes while a transactional DB is for transactional data. So is a fact table built from the transactional DB with some ETL processes? Sorry, I just got confused.

    • @EcZachly_
      @EcZachly_ 9 days ago

      Fact tables are optimized for “whole table analysis”
      OLTP transactions are meant for “single user” analysis.
      Think about the WHERE clause here

  • @domelorinczy2674
      @domelorinczy2674 10 days ago

    Why was a broadcast join no longer possible with IPv6? Did the size of the to-be-broadcast dataset change as well? I don't get why the change only inhibits the broadcast. Thanks for the explanation!

    • @EcZachly_
      @EcZachly_ 10 days ago

      IPv4's search space is DRAMATICALLY smaller than IPv6's.
      There are about 4 billion IPv4 addresses. There are 340,282,366,920,938,463,463,374,607,431,768,211,456 IPv6 addresses.
      You can compress an IPv4 address set into a trie data structure. You cannot do the same for IPv6.
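
The trie mentioned in the reply above can be sketched in pure Python (the structure is illustrative; production IP-geo lookups use far more compact representations): each CIDR prefix costs only prefix-length nodes, and lookup is a longest-prefix walk over at most 32 bits, which is why a full IPv4 mapping can fit in memory on the broadcast side of a join.

```python
import ipaddress

def build_trie(cidr_to_value):
    """Binary trie keyed on address bits; a /n prefix needs only n nodes."""
    root = {}
    for cidr, value in cidr_to_value.items():
        net = ipaddress.ip_network(cidr)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = root
        for b in bits:
            node = node.setdefault(b, {})
        node["value"] = value
    return root

def lookup(trie, ip):
    """Longest-prefix match: walk the bits, remember the last value seen."""
    node, best = trie, None
    for b in format(int(ipaddress.ip_address(ip)), "032b"):
        if "value" in node:
            best = node["value"]
        if b not in node:
            return best
        node = node[b]
    return node.get("value", best)
```

For IPv6 the same walk would be over 128 bits and the prefix table itself is vastly larger, so the broadcast-sized in-memory structure no longer fits.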

    • @chandarayi5673
      @chandarayi5673 4 days ago +1

      I think changing from IPv4 to IPv6 would increase the length of the data type, so the dataset is bigger?

    • @EcZachly_
      @EcZachly_ 1 day ago

      @@domelorinczy2674 yes. IPv6 search space is much bigger

  • @iansaura7777
    @iansaura7777 21 days ago +1

    This is gold

  • @youssefboulazrag4363
    @youssefboulazrag4363 24 days ago +1

    Thank you zach ❤

  • @_Nova_Prime_
    @_Nova_Prime_ 23 days ago +2

    quality content

  • @Teja-b7m
    @Teja-b7m 6 days ago

    @zach - Would you mind breaking down the lengthy video into multiple small videos? I think it would make it easier to watch and help increase your viewership as well.

    • @EcZachly_
      @EcZachly_ 2 days ago

      No

    • @EcZachly_
      @EcZachly_ 2 days ago

      There are timestamps at the bottom. Longer videos are better

  • @cherryblosoom898
    @cherryblosoom898 22 days ago

    Off topic, but I have heard quantum computing will replace these languages. So should I start with quantum computing or with your training?

    • @EcZachly_
      @EcZachly_ 22 days ago

      Who told you that?

    • @cherryblosoom898
      @cherryblosoom898 19 days ago

      @@EcZachly_ Saw news somewhere!!

  • @lucidboy9436
    @lucidboy9436 23 days ago

    Is this the fourth video in the playlist, or is this the first week 2 video, with the rest of the videos in the playlist being from week 1?

    • @EcZachly_
      @EcZachly_ 23 days ago

      Correct. This is week 2 all at once. The rest are week 1.

  • @saiananth5857
    @saiananth5857 22 days ago

    @EcZachly If I complete the homework after 1 month, will I get the certificate, or do I need to complete it by Dec 31st?

    • @EcZachly_
      @EcZachly_ 22 days ago +2

      You need to complete by January 31st not December 31st

  • @Daily_Code_Challenge
    @Daily_Code_Challenge 23 days ago +1

    Hi Zach, I am from India and I have a question: after completing the bootcamp and creating a project, how do I apply for a summer 2025 internship? Whom should I contact? There are no internships anywhere, even on LinkedIn. How should I start?

    • @EcZachly_
      @EcZachly_ 23 days ago

      Ask in discord plz

  • @kushkl2k6
    @kushkl2k6 19 days ago +1

    brilliant

  • @top10tweets
    @top10tweets 17 days ago

    What do you think about Elon not sharing Twitter data? Retrieving up to 1M posts per month costs $5,000/month.

  • @jaisahota4062
    @jaisahota4062 24 days ago +1

    thanks

  • @aminetaoufik3244
    @aminetaoufik3244 23 days ago +1

    thank u

  • @VinayKumar-qv9tq
    @VinayKumar-qv9tq 12 days ago +1

    NWT- Not with the Team.
    DNP- Did not play
    DND- Did not dress

  • @VinayKumar-qv9tq
    @VinayKumar-qv9tq 7 days ago +1

    Learnt something useful that I didn't know.

  • @YogeshSunilSaswade
    @YogeshSunilSaswade 23 days ago

    Spark does not support writing data to a Hive bucketed table because it uses a different hash function. Additionally, it does not provide any major performance benefits when joining two Hive bucketed tables on the bucket key. How did you solve this problem?

    • @EcZachly_
      @EcZachly_ 23 days ago +1

      Meta allows Spark to use Hive's hash function. You can override these things.

    • @EcZachly_
      @EcZachly_ 23 days ago

      Yes it does provide tons of benefits. No shuffle is a huge deal. It has to be configured properly though

    • @YogeshSunilSaswade
      @YogeshSunilSaswade 23 days ago

      @@EcZachly_ I agree, but what I meant was that open-source Spark 3.3 currently does not utilise the advantages of Hive bucketed tables and instead treats them as regular Hive tables during read or join operations.
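
The hash-function mismatch in this thread is concrete: Hive's bucketing hash for string keys is Java's String.hashCode, while open-source Spark's native bucketing uses a different hash (Murmur3), so files bucketed by one engine don't line up with the other's expectations unless the hash is overridden, as the reply says Meta does. A Python sketch of the Hive side, following Hive's (hash & Integer.MAX_VALUE) % numBuckets convention:

```python
def java_string_hashcode(s):
    """Java's String.hashCode: h = 31*h + char, in signed 32-bit arithmetic."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def hive_bucket(key, num_buckets):
    """Hive bucket id for a string key: (hashCode & Integer.MAX_VALUE) % n."""
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_buckets
```

If the reader hashed the same key with a different function, the bucket id would generally differ, which is why a bucketed join across the two layouts silently degrades to a shuffled join instead of a file-aligned one.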

  • @TalkingData247
    @TalkingData247 24 days ago +1

    second comment from Kenya

  • @lucidboy9436
    @lucidboy9436 23 days ago +1

    W

  • @AugustineNguyenLe
    @AugustineNguyenLe 24 days ago +3

    first cmt

    • @TalkingData247
      @TalkingData247 24 days ago +1

      let's do this, folks