Apache Iceberg on AWS with S3 and Athena [FULL COURSE IN 30MIN]

  • Published 22 Nov 2024

Comments • 34

  • @mchaves7663 · months ago · +1

    Best tutorial ever on Iceberg and AWS services. Thank you very much for that.

  • @DataEngUncomplicated · 2 years ago · +5

    Great job, Johnny! I'm excited about the potential of Iceberg on AWS too!

  • @____prajwal____ · 2 years ago · +7

    Great vid. Please make one for Hudi.

  • @tieduprightnowprcls · 1 year ago · +4

    Iceberg tables are better suited to the transformed or curated layer than to the raw data layer, am I right?

  • @fancystacy · 4 months ago

    Thanks, that was fast and quite easy to understand. If you added cross-links to your other videos, like the one about Glue, this would be even better!

  • @lucasgambi · 11 months ago

    U are the AWS GOAT!!

  • @nic-tf5dx · 1 year ago

    Love it! Looking forward to more Apache Iceberg, maybe in combination with Dremio.

  • @BenOgorek · 1 year ago · +2

    Great video! Just a heads-up that the timestamps are in UTC, so most of us will have to do the offset calculation (UTC is 4 hours ahead of US Eastern time during daylight saving time and 5 hours ahead during standard time). Maybe there's an easier way to specify that.
    Also, I'm really curious about the distinction between Avro and Parquet. I noticed that Avro files were used for the metadata but Parquet files were used for the data. I heard Iceberg can also store data in Avro and was wondering if there are advantages to using only Avro.
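The offset arithmetic in this comment can be scripted. Below is a minimal Python sketch. It assumes fixed offsets of UTC-4 (EDT) and UTC-5 (EST) instead of a time-zone database, and it uses a hypothetical table name. The `FOR TIMESTAMP AS OF TIMESTAMP '... UTC'` form follows Athena's engine-version-3 time-travel syntax.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets for US Eastern time instead of a tz database.
EDT = timezone(timedelta(hours=-4))  # daylight saving time
EST = timezone(timedelta(hours=-5))  # standard time


def local_to_athena_utc_literal(local_naive: datetime, tz: timezone) -> str:
    """Convert a naive local wall-clock time into an Athena UTC timestamp literal."""
    aware = local_naive.replace(tzinfo=tz)    # attach the local offset
    utc = aware.astimezone(timezone.utc)      # shift the instant to UTC
    return f"TIMESTAMP '{utc:%Y-%m-%d %H:%M:%S} UTC'"


def time_travel_query(table: str, local_naive: datetime, tz: timezone) -> str:
    """Build a time-travel SELECT for a (hypothetical) Iceberg table."""
    literal = local_to_athena_utc_literal(local_naive, tz)
    return f"SELECT * FROM {table} FOR TIMESTAMP AS OF {literal}"
```

For example, 10:30 on 1 July 2024 in EDT becomes `TIMESTAMP '2024-07-01 14:30:00 UTC'`.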

  • @flaviolanfranco · 1 year ago

    Nice tutorial! I love how you share your knowledge! Thanks!

  • @alecbg919 · 8 months ago · +1

    Around 26 minutes in, after you queried the deleted data, it said it scanned 5.76 MB. That seems like a lot for just metadata!

  • @faingtoku · 1 year ago

    Great video! It would be great to have one on streaming from Kinesis to Iceberg, e.g. Kinesis + EMR + Glue Catalog + Iceberg.

  • @RahulSinghPatel-st6yb · 6 months ago · +1

    line 3:5: mismatched input 'SYSTEM_TIME'. Expecting: 'TIMESTAMP', 'VERSION'
    I'm getting this error while running the timestamp query. Can you please tell me why?
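This error is consistent with running the older Athena engine-version-2 syntax (`FOR SYSTEM_TIME AS OF`) on engine version 3. Version 3 expects `FOR TIMESTAMP AS OF` and `FOR VERSION AS OF`, which are the two keywords the parser lists. Below is a small Python sketch that rewrites the old clauses. The function name is hypothetical, and the function does plain text substitution rather than real SQL parsing.

```python
import re

# Engine v2 clause -> engine v3 clause (matched case-insensitively).
_REWRITES = [
    (re.compile(r"\bFOR\s+SYSTEM_TIME\s+AS\s+OF\b", re.IGNORECASE), "FOR TIMESTAMP AS OF"),
    (re.compile(r"\bFOR\s+SYSTEM_VERSION\s+AS\s+OF\b", re.IGNORECASE), "FOR VERSION AS OF"),
]


def to_engine_v3_time_travel(sql: str) -> str:
    """Rewrite engine-v2 Iceberg time-travel clauses into engine-v3 syntax."""
    for pattern, replacement in _REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql
```

The timestamp or snapshot ID that follows the clause is left untouched. Only the keyword phrase changes.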

  • @federicomanueldlouky5231 · 1 year ago

    great explanations! love your videos!!! thanks! 🙂

  • @swapnilbhoite902 · 1 year ago

    What a fantastic video. Great learning :)

  • @mickyman753 · 5 months ago

    Johnny, does the speed come from the partition column we specify when creating the table? If I partitioned by a different column instead of the date and then ran date-related queries, would it still be faster or not?
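One way to picture this question is a toy scan planner in Python where each data file records the value of the partition column. A filter on that column lets the planner skip files using metadata alone. In this model, a filter on any other column cannot skip anything. This is a simplification: real Iceberg also keeps per-file column min/max statistics that can skip files for non-partition filters. The column name `event_date` is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List

PARTITION_COLUMN = "event_date"  # hypothetical column the table is partitioned by


@dataclass(frozen=True)
class DataFile:
    path: str
    partition_value: date  # every row in this file shares this event_date


def plan_scan(files: List[DataFile], filter_column: str, filter_value: Any) -> List[DataFile]:
    """Toy scan planning that prunes only on the partition column."""
    if filter_column != PARTITION_COLUMN:
        return list(files)  # no partition information to prune with: read every file
    return [f for f in files if f.partition_value == filter_value]


files = [
    DataFile("data/a.parquet", date(2024, 11, 20)),
    DataFile("data/b.parquet", date(2024, 11, 21)),
    DataFile("data/c.parquet", date(2024, 11, 22)),
]
```

Filtering on `event_date = 2024-11-21` plans one file. Filtering on an unrelated column such as `customer_id` plans all three.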

  • @deepg6139 · 6 months ago

    For a very large dataset (around 15 billion rows overall), will Iceberg give good performance for SELECT/DELETE/UPDATE?

  • @terri1258 · 4 months ago

    so useful!

  • @sungkim1830 · 1 year ago

    Hello Johnny Chivers. Is there a way to create an Iceberg table from existing metadata and data using Athena or Glue?
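A possible answer, sketched under assumptions: Athena and Glue treat a Glue table as Iceberg when its parameters include `table_type` = `ICEBERG` and a `metadata_location` pointing at the current `*.metadata.json` file. Registering existing metadata then amounts to creating a Glue table with those parameters. Spark users can instead call Iceberg's `register_table` procedure. The helper below only builds the `TableInput` dictionary for boto3's Glue `create_table`. The table names and S3 paths are hypothetical, and this has not been verified against every Athena version.

```python
from typing import Any, Dict


def build_iceberg_table_input(table_name: str, table_location: str,
                              metadata_location: str) -> Dict[str, Any]:
    """Build a Glue TableInput that points at an existing Iceberg metadata file."""
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {"Location": table_location},
        "Parameters": {
            "table_type": "ICEBERG",                 # marks the Glue table as Iceberg
            "metadata_location": metadata_location,  # current metadata JSON in S3
        },
    }
```

The result would then be passed as `boto3.client("glue").create_table(DatabaseName="my_db", TableInput=...)`.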

  • @naveenkumarmurugan1962 · 5 months ago

    thank you

  • @harivigsp7934 · 6 months ago

    Can we create an Iceberg table in S3 using a Multi-Region Access Point?

  • @jesper6988 · 1 year ago

    Love your vids, really appreciate the work you do!

  • @danilomenoli · 6 months ago

    You are amazing❤

  • @HariPrasadEluri · 1 year ago

    Is there any way to stop it from creating random prefixes when inserting the partitioned data at 18:10?

  • @tieduprightnowprcls · 1 year ago

    I failed to create a nested year/month/day partition for an Iceberg table in Athena. How can I accomplish this?
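Iceberg uses hidden partitioning, so nested year/month/day folders are usually unnecessary. Partitioning by `day(ts)` (Athena accepts `PARTITIONED BY (day(ts))`) already determines the year and month. Iceberg's Java implementation also refuses multiple time transforms on the same source column as redundant, which likely explains the failure. Below is a Python sketch of the year/month/day transforms as the Iceberg spec defines them (ordinals counted from the 1970 epoch). It shows that a month filter maps to a contiguous range of day partitions. The function names are illustrative only.

```python
from datetime import date
from typing import Tuple

EPOCH = date(1970, 1, 1)


# Pass ts.date() for timestamp values; these helpers take plain dates.
def year_transform(d: date) -> int:
    return d.year - 1970  # years since 1970


def month_transform(d: date) -> int:
    return (d.year - 1970) * 12 + (d.month - 1)  # months since 1970-01


def day_transform(d: date) -> int:
    return (d - EPOCH).days  # days since 1970-01-01


def day_range_for_month(year: int, month: int) -> Tuple[int, int]:
    """First and last day-partition ordinals covering one calendar month."""
    first = date(year, month, 1)
    next_first = date(year + month // 12, month % 12 + 1, 1)
    return day_transform(first), day_transform(next_first) - 1
```

Because day ordinals increase with time, a query filtering on one month can prune down to a single contiguous block of day partitions.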

  • @gregf9160 · 1 year ago

    Great intro to Iceberg, Johnny. Quick question: as well as DELETE, can it support TRUNCATE? Deletes are fine for a relatively small number of rows (this is also true in traditional DBMSs), but on millions of rows DELETE takes forever compared with TRUNCATE. With Iceberg updating all those manifests as it deletes each row, would that not also be a bit of a bottleneck, or is that offset somewhat by the compute resources of AWS?

  • @thiagoa1851 · 2 years ago · +2

    After running the SQL DELETE, can Iceberg still query the deleted data with the time travel feature?

    • @JohnnyChivers · 2 years ago · +1

      Yes, the snapshots are still present.
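This reply can be modeled with a small Python toy. Each write commits a new immutable snapshot, so a DELETE produces a new current snapshot while earlier snapshots stay readable until they are expired (for example by Athena's `VACUUM`). The class is hypothetical and ignores Iceberg's real metadata and file layout.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str]


@dataclass
class ToyTable:
    """Append-only list of immutable snapshots; a stand-in for Iceberg metadata."""
    snapshots: List[Tuple[int, Tuple[Row, ...]]] = field(default_factory=list)

    def commit(self, rows: Iterable[Row]) -> int:
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, tuple(rows)))
        return snapshot_id

    def current(self) -> Tuple[Row, ...]:
        return self.snapshots[-1][1] if self.snapshots else ()

    def delete_where(self, predicate: Callable[[Row], bool]) -> int:
        # A delete never edits old snapshots; it commits a new one without the matching rows.
        return self.commit(r for r in self.current() if not predicate(r))

    def as_of(self, snapshot_id: int) -> Tuple[Row, ...]:
        # Time travel: read any snapshot that has not been expired.
        for sid, rows in self.snapshots:
            if sid == snapshot_id:
                return rows
        raise KeyError(snapshot_id)
```

After `s1 = t.commit([(1, "a"), (2, "b")])` and `t.delete_where(lambda r: r[0] == 1)`, the current view holds only `(2, "b")`, while `t.as_of(s1)` still returns both rows.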

  • @viewermm1588 · 5 months ago

    Hi all, when creating an Iceberg table in Athena, I get "Exception encountered when executing query, this query ran against ...... database, unless qualified by the query . please post the error message on our forum .....". Does anyone know the solution?

  • @wuerikehenriquedasilvacava928 · 1 year ago

    After populating the Iceberg table at 18:10, why does it create a folder with random characters before each partition folder? I'd like the partition folders to sit directly under the data folder.

    • @xorlop · 9 months ago

      Ideally, you should not have to deal with this yourself. The idea of Iceberg is that it handles things like that for you.

  • @jeffschroeder8875 · 9 months ago

    Can you write me a snippet of code that moves an Iceberg column to a different position? I cannot for the life of me get it to work based on the AWS documentation. Thanks.
    Tried several variants similar to:
    ALTER TABLE database.table_name CHANGE field1 string AFTER field2

    • @jeffschroeder8875 · 9 months ago · +1

      ALTER TABLE database.table_name CHANGE field1 field1 string AFTER field2
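The working form in this reply repeats the column name. The CHANGE syntax takes an old name, a new name, and a type before the optional FIRST/AFTER position. Passing the same name twice therefore only reorders the column. Below is a hypothetical Python helper that builds such a statement for a pure reorder. It does no quoting or validation of identifiers.

```python
from typing import Optional


def change_column_position(table: str, column: str, column_type: str,
                           after: Optional[str] = None) -> str:
    """Build an Athena CHANGE statement that moves a column without renaming it.

    after=None moves the column to the first position.
    """
    position = f"AFTER {after}" if after else "FIRST"
    # Old name and new name are identical, so only the position changes.
    return f"ALTER TABLE {table} CHANGE {column} {column} {column_type} {position}"
```

Calling `change_column_position("database.table_name", "field1", "string", after="field2")` reproduces the statement from the reply.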