Best tutorial ever on iceberg and aws services. Thank you very much for that.
Great job Johnny! I'm excited about the potential of Iceberg on AWS too!
Great vid. Please make one for Hudi.
Would be nice for hudi
Added to the list!
An Iceberg table is suitable for the transformed or curated layer rather than the raw data layer, am I right?
U are the AWS GOAT!!
What a fantastic video. Great learning :)
Nice tutorial! I love how you share your knowledge! Thanks!
great explanations! love your videos!!! thanks! 🙂
Love it! Looking forward to more Apache Iceberg. Maybe in connection with Dremio
Love your vids, really appreciate the work you do!
Thanks, that was fast and quite easy to understand. But if you added cross-links to your other videos, like the one about Glue, this would be even greater!
Around the 26-minute mark, after you queried the deleted data, it said it scanned 5.76MB. That seems like a lot for just metadata!
line 3:5: mismatched input 'SYSTEM_TIME'. Expecting: 'TIMESTAMP', 'VERSION'
I'm getting this error while running the timestamp query. Can you please tell me why?
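If it helps: this looks like Athena engine version 3, where the older FOR SYSTEM_TIME AS OF syntax from engine version 2 was replaced. Something like this should parse (a sketch; my_db.my_table is a placeholder):

SELECT *
FROM my_db.my_table
FOR TIMESTAMP AS OF TIMESTAMP '2023-01-01 00:00:00 UTC';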
Great video! Just a heads up that the timestamps are in UTC, so most of us will have to do the offset calculation (UTC is 5 hours ahead of EST, or 4 hours ahead of EDT during daylight saving). Maybe there's an easier way to specify that.
Also, I'm really curious about the distinction between Avro and Parquet. I noticed that Avro files were used for the metadata but Parquet files for the data. I heard Iceberg can accept Avro and was wondering if there are advantages to using only Avro.
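On the UTC point: rather than doing the offset math by hand, you can let the engine convert with AT TIME ZONE, which also handles daylight saving for you. A sketch against the Iceberg history metadata table (table name and zone are placeholders):

SELECT made_current_at,
       made_current_at AT TIME ZONE 'America/New_York' AS made_current_eastern
FROM "my_db"."my_table$history";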
Great video! It would be great to see one using streaming from Kinesis to Iceberg, e.g. Kinesis + EMR + Glue catalog + Iceberg.
so useful!
Is there any way to stop it creating random prefixes while inserting the partitioned data at 18:10?
You are amazing❤
After populating the Iceberg table, at 18:10, why does it create a folder with random chars before each partition folder? I'd like to have the partition folders right after the data folder.
Ideally, you should not have to deal with this yourself. The idea of Iceberg is that it handles things like that for you.
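For what it's worth, the random prefix looks like Iceberg's object-storage file layout, which hashes paths to spread load across S3 prefixes. In Iceberg generally it's controlled by a table property; I haven't verified that Athena honors it, so treat this as a sketch with placeholder names:

ALTER TABLE my_db.my_table
SET TBLPROPERTIES ('write.object-storage.enabled' = 'false');

Note that disabling it can reduce S3 throughput on heavily written tables, which is why the hashing exists in the first place.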
Johnny, does the speed come from the column we partition by when creating the table? Like, if I used a different column instead of date and then ran the date-related queries, would it still be faster or not?
I failed to create nested y/m/d partitions for an Iceberg table in Athena. How do I accomplish this?
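In case it helps: Iceberg uses hidden partitioning, so you don't build nested y/m/d folders yourself; a single day() transform on a timestamp column already gives day granularity, and year- or month-range queries still prune against it. A minimal Athena sketch with placeholder names:

CREATE TABLE my_db.events (
  id bigint,
  event_ts timestamp
)
PARTITIONED BY (day(event_ts))
LOCATION 's3://my-bucket/events/'
TBLPROPERTIES ('table_type' = 'ICEBERG');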
For a very large dataset (around 15 billion rows overall), is it going to give good performance if we use Iceberg to select/delete/update?
Can we create an Iceberg table on S3 using a Multi-Region Access Point?
After running the SQL delete, can Iceberg still query the data with the time travel feature?
Yes, the snapshots are still present.
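If anyone wants to try it, a quick sketch (names and the snapshot id are placeholders): list the snapshots from the history metadata table, then query one with FOR VERSION AS OF. The deleted rows stay queryable until those snapshots are expired (e.g. by VACUUM).

SELECT snapshot_id, made_current_at
FROM "my_db"."my_table$history";

SELECT *
FROM my_db.my_table
FOR VERSION AS OF 949530903748831860;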
Hi all, when creating an Iceberg table in Athena, I get "Exception encountered when executing query. This query ran against the ...... database, unless qualified by the query. Please post the error message on our forum .....". Anyone know the solution?
thank you
Hello Johnny Chivers. Is there a way to create an Iceberg table from existing metadata and data using Athena or Glue?
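I don't think Athena does this directly, but if you can run Spark (a Glue job or EMR) against the Glue catalog, Iceberg ships a register_table procedure that points a new catalog entry at an existing metadata file. A hedged sketch; the catalog name, table name, and metadata path are placeholders:

CALL glue_catalog.system.register_table(
  table => 'my_db.my_table',
  metadata_file => 's3://my-bucket/my_table/metadata/00003-example.metadata.json'
);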
Great intro to Iceberg, Johnny. Quick question: as well as delete, can it support Truncate? Deletes are fine for a relatively small number of rows (in traditional DBMSs this is also true), but on millions of rows, Delete takes forever compared with Truncate. With Iceberg updating all those manifests as it deletes each row, would that not also be a bit of a bottleneck, or is that offset somewhat by the compute resources of AWS?
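My understanding (not from the video): Iceberg deletes aren't applied row by row against the manifests. When the WHERE clause lines up with partition boundaries, the engine can drop whole data files in a metadata-only commit, which behaves much more like a truncate; only files straddling the predicate need rewrites or delete files. A sketch, assuming a table partitioned by day(event_ts) with placeholder names:

DELETE FROM my_db.events
WHERE event_ts >= TIMESTAMP '2023-01-01 00:00:00'
  AND event_ts <  TIMESTAMP '2023-01-02 00:00:00';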
Can you write me a snippet of code that moves an Iceberg column to a different column position? I cannot for the life of me get it to work based on the AWS documentation. Thanks.
Tried several variants similar to:
ALTER TABLE database.table_name CHANGE field1 string AFTER field2
ALTER TABLE database.table_name CHANGE field1 field1 string AFTER field2
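In case anyone else hits this: I think the first variant fails because CHANGE expects both the old and the new column name before the type, so 'string' gets parsed as the new name. The second variant matches the Athena CHANGE COLUMN syntax as I read the docs (names are placeholders); repeat the column name when you only want to move it, and note it needs Athena engine version 3:

ALTER TABLE my_db.my_table
CHANGE COLUMN field1 field1 string AFTER field2;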
This seems really, really slow. Painfully slow.
6.6 seconds for 5MB works out to over 20 minutes per GB.
Is this the real performance?
Currently with Athena we can scan tens of GB of data in a couple of seconds.