1. COALESCE: Returns the first non-NULL value from a list of arguments.
2. CARDINALITY: Returns the number of elements in an array.
3. ::: Type cast operator to convert a value from one data type to another.
4. UNNEST: Expands an array into individual rows.
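A tiny self-contained sketch (my own toy query, not from the lab) that shows all four at once; note that the UNNEST makes the whole result two rows:
```sql
SELECT
    COALESCE(NULL, 'fallback')      AS coalesce_demo,    -- first non-NULL argument wins
    CARDINALITY(ARRAY[10, 20, 30])  AS cardinality_demo, -- 3 elements in the array
    '42'::INTEGER                   AS cast_demo,        -- text value cast to integer
    UNNEST(ARRAY['a', 'b'])         AS unnest_demo;      -- expands the array into rows
```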
At 23:50, Zach was indeed very close to what he wanted. We don't really need the CTE, the shorter query is:
SELECT
PLAYER_NAME,
(UNNEST(SEASON_STATS)::SEASON_STATS).*
FROM
PLAYERS
WHERE
CURRENT_SEASON = 2001
AND PLAYER_NAME = 'Michael Jordan';
Thank you!
This tutorial exploded my brain, love it. I'd never seen `UNNEST` until this video, amazing.
I've been working as a junior data engineer for nearly a year, so exciting to see how much more there is to learn with everybody here.
CREATE TYPE season_stats AS (
season INTEGER,
pts REAL,
ast REAL,
reb REAL,
weight INTEGER
);
CREATE TYPE scoring_class AS
ENUM ('bad', 'average', 'good', 'star');
DROP TABLE players;
CREATE TABLE players (
player_name TEXT,
height TEXT,
college TEXT,
country TEXT,
draft_year TEXT,
draft_round TEXT,
draft_number TEXT,
seasons season_stats[],
scorer_class scoring_class,
is_active BOOLEAN,
current_season INTEGER,
PRIMARY KEY (player_name, current_season)
);
Select * from player_seasons
WHERE player_name = 'Michael Jordan'
Select * from players
WHERE player_name = 'Michael Jordan'
and current_season=2001
---------------------------------------------------------------------------------------------------------------------------------------
WITH last_season AS (
SELECT * FROM players
WHERE current_season = 2000
), this_season AS (
SELECT * FROM player_seasons
WHERE season = 2001
)
INSERT INTO players
SELECT
COALESCE(ls.player_name, ts.player_name) as player_name,
COALESCE(ls.height, ts.height) as height,
COALESCE(ls.college, ts.college) as college,
COALESCE(ls.country, ts.country) as country,
COALESCE(ls.draft_year, ts.draft_year) as draft_year,
COALESCE(ls.draft_round, ts.draft_round) as draft_round,
COALESCE(ls.draft_number, ts.draft_number)
as draft_number,
COALESCE(ls.seasons,ARRAY[]::season_stats[])
||
CASE WHEN ts.season IS NOT NULL
THEN ARRAY[ROW(ts.season,ts.pts,ts.ast,ts.reb, ts.weight)::season_stats]
ELSE ARRAY[]::season_stats[] END as seasons,
CASE
WHEN ts.season IS NOT NULL THEN
(CASE WHEN ts.pts > 20 THEN 'star'
WHEN ts.pts > 15 THEN 'good'
WHEN ts.pts > 10 THEN 'average'
ELSE 'bad' END)::scoring_class
ELSE ls.scorer_class END as scoring_class,
ts.season IS NOT NULL as is_active,
2001 AS current_season
FROM last_season ls
FULL OUTER JOIN this_season ts
ON ls.player_name = ts.player_name
---------------------------------------------------------------------------------------------------------------------------
SELECT player_name, * FROM players
SELECT player_name, UNNEST(seasons) -- CROSS JOIN UNNEST LATERAL VIEW EXPLODE
FROM players
WHERE current_season = 1998
AND player_name = 'Michael Jordan';
WITH UNNESTED AS(
SELECT player_name, UNNEST(seasons)::season_stats AS season_stats
FROM players
WHERE current_season = 1998 )
SELECT player_name, (season_stats::season_stats).* from UNNESTED
SELECT
player_name,
(seasons[CARDINALITY(seasons)]::season_stats).pts/
CASE WHEN (seasons[1]::season_stats).pts = 0 THEN 1 ELSE (seasons[1]::season_stats).pts END
FROM players
WHERE current_season = 2001
AND scorer_class = 'star'
Thought of just setting up Postgres on my machine and doing it the easy way, but figured why not do it Zach's way and set it up on Docker. But I had zero knowledge of setting up Docker.
Three days into the struggle, now I know how to set up and run dockerized containers, do port mapping, set up a bash script to load the SQL files into the pg container, and persist the data by setting up volumes.
There's no easy way without a struggle. All thanks to your initiative, Zach!
Oh cool! @archray9938? Let me try it out too.
Which extension or service are you using to connect to the Docker Postgres container, and how are you doing it?
How do you set this up? Can you help me with it?
same same - the struggle was real 😅
Hey, I need your help, please give some step-by-step instructions.
This is really a gem of a video. I haven't seen anyone else making videos on cumulative dimensions. Appreciate it!!
This is a compelling insight into cumulative table design and temporal cardinality! None of the courses or educational content I've come across so far has offered such a deep understanding of how beautifully tables can be modeled for efficient querying. I can't thank you enough for sharing this knowledge. Time to dive into the homework now- more power to you, Zach!
Zach, you're on a whole different level! The way you explain cumulative data modeling is simply out of this world. Your approach makes complex concepts feel so intuitive. Thanks for sharing such amazing content-it’s truly inspiring for beginners like me!
I have explored a lot of content over the last 2 years, but this is truly the most awesome and unique knowledge content on the internet. You are a fabulous guru in the data engineering domain.
I know we never communicate but you inspire me to love my data engineering field. Please continue the hard work. God bless
Since I am new to DE, I had to watch this lecture twice and code along while watching the second time. Thanks Zach
Thanks Zach! I am still a student but I hope I can give back to the community like you one day.🙂
This is crazy interesting. Thinking about moving from Backend Engineer to Data. Thank you Zach
Just finished this exercise! It’s incredibly well-curated and will be immensely helpful for my upcoming projects. Thanks a lot Zach
I don't have data in the tables, can you help me with this? I did \i data.dump but still all the tables are empty.
Need to watch it 3 times to digest everything in here . Thanks mate 🤝
Just finished the exercise. It was magic. Thanks zach.
Amazing! Now I know how to properly organize historical data for the analytical end consumers so they can work on their cumulative analysis, and without killing the cluster with complex queries 😃
Thanks for sharing! Never saw people using those functions here. Learning a lot with you!!!
Thanks Zach, Just what I needed for Christmas!! ❤
This is truly a Gold mine of knowledge. Thanks Zach!!
Thanks a bunch mate, you're really doing god's work.
Teaching us everything from scratch, we owe you a big one^^.
Amazing. Love both conceptual and lab.
Man, this stuff is so far beyond me right now. But watching you create all these scripts live is like watching an orchestra. Thank you
Definitely said you need to know SQL before doing this boot camp
You deserve a million subscribers man!!
Zach, you are a star-level player! In which circumstances do you feel like cumulative dimensions are something to be avoided?
This content is pure Gold , need to start saving to join the full course !!!!!
I am really enjoying these tutorials, you are a guru amazing videos!!!
Hi Zach, thanks so much for posting this! I was finally able to understand what kind of structure a cumulative dimension table should take on, but I am still wondering how does the accumulation actually occur? Do you run a scheduled script to regularly run full outer join on historical / current? Would there be a way to make a backfill cumulative table?
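In practice the incremental step would typically run on a schedule, one season at a time. For a backfill, one hedged sketch (my assumption, not Zach's actual pipeline) is to wrap the lab's FULL OUTER JOIN insert in a PL/pgSQL loop over the season range, starting from an empty players table; the 1996 start year is assumed, so adjust it to the earliest season in player_seasons:
```sql
DO $$
DECLARE
    yr INT;
BEGIN
    FOR yr IN 1996..2001 LOOP  -- assumed season range; adjust to your data
        INSERT INTO players
        SELECT
            COALESCE(ls.player_name, ts.player_name),
            COALESCE(ls.height, ts.height),
            COALESCE(ls.college, ts.college),
            COALESCE(ls.country, ts.country),
            COALESCE(ls.draft_year, ts.draft_year),
            COALESCE(ls.draft_round, ts.draft_round),
            COALESCE(ls.draft_number, ts.draft_number),
            COALESCE(ls.seasons, ARRAY[]::season_stats[])
              || CASE WHEN ts.season IS NOT NULL
                      THEN ARRAY[ROW(ts.season, ts.pts, ts.ast, ts.reb, ts.weight)::season_stats]
                      ELSE ARRAY[]::season_stats[] END,
            CASE WHEN ts.season IS NOT NULL THEN
                     (CASE WHEN ts.pts > 20 THEN 'star'
                           WHEN ts.pts > 15 THEN 'good'
                           WHEN ts.pts > 10 THEN 'average'
                           ELSE 'bad' END)::scoring_class
                 ELSE ls.scorer_class END,
            ts.season IS NOT NULL,           -- is_active
            yr                               -- current_season for this pass
        FROM (SELECT * FROM players WHERE current_season = yr - 1) ls
        FULL OUTER JOIN (SELECT * FROM player_seasons WHERE season = yr) ts
            ON ls.player_name = ts.player_name;
    END LOOP;
END $$;
```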
Very good lecture. I'm using DBeaver, which works fine.
DATA ENGINEERING FOR MEN! WHAT A MASTERPIECE! THANK YOU FROM BRAZIL!!!
This is for the brokies like me. Thank you Zack. It’s gold.
It’s okay to be poor. I’ll make you rich by December 31st. At least rich in knowledge
@ absolutely Zach. I am on God’s side and I am at peace. As long as I put on work.. the rest will follow. Thank you.
Thanks Zach! I don't seem to see this video on your platform. Much appreciated!
It’s on the platform for sure too. Everything is on both
Can you please add the link to the setup required for the lab exercise?
Excellent explanation, tysm Zach
Awesome learning. Since this is for analytics, why did you choose Postgres, an OLTP database, over an OLAP one like Redshift or BigQuery?
Because it’s free and local?
Thank you for the video. Instead of a cumulative table design, couldn't we use an upsert/merge statement: match then update, else insert? I understand the compression part, but I'm not sure about the uses of a cumulative table.
This is gold. Thanks Zach.
Hi Zach, thanks for the video! Could you pls explain why you didn’t use INT for “draft_year” column with NULL meaning “undrafted”? Thank you in advance!
That’s a good addition to make things better!
amazing video, thanks Zach!
When is the last day that I can sign up for the Jan bootcamp?
wow, this is such a powerful modeling technique!! Especially for someone like me who lowkey hates GROUP BY hahaha. Just need to recognize when to appropriately apply it 🤓
I understood the point of cumulative table design, but I still can't understand why the rows are ordered by player_name at 26:30. Our initial table player_seasons wasn't ordered by player_name. Then we nest and unnest it, and somehow it becomes ordered? 🤯 Is this some kind of inner optimization mechanism?
The cumulation process sorts it implicitly by how we add values to the array. There's no ORDER BY, but the way we do the array concat is how it sorts itself.
@@EcZachly_ thank you Zach! You replied faster than I put all inputs into ChatGPT! 😄
Do you think this answer of ChatGPT makes sense?
"The players table has a primary key defined as (player_name, current_season). This means PostgreSQL stores the data in an order that optimizes lookups based on this key.
When you query the table without an ORDER BY clause, PostgreSQL often retrieves rows in the order they are stored in the table, which in this case is likely sorted by player_name due to the primary key."
Mind blown by this nested and unnested level of info. Thanks! I am going to post this to the Discord chat too, but can you loop through the years (in this instance) to store all the metrics?
Zach, can you tell whether this cumulative table relates to DW dimensional design or is separate from it? Thanks
@zach, my question is simple. At the 26:55 timestamp, you drilled down into the stats columns, i.e. gp, pts, reb, ast, but in players we are inserting them as an array column, right? So how did PostgreSQL automatically fetch the columns as they are? We never stated it anywhere in the DDL; within the SQL, while inserting, we collected the data and loaded it. Can you please help me in this regard?
In general you are frequently getting data (either via CDC or something else) from an OLTP db, so in this case player_seasons would probably be 2 or more tables (one for players and one for player_year_statistics) in 3NF. In this case, would you have a prior step where you extract the data into, say, Parquet copies, then build players directly from that? Or would you build something else?
Question: why do we need a record for every season per player? Since stats are accumulating, one record with an array of structs could be enough. Is it to track changes in other columns that are not part of the struct?
Thanks Zach
Great video Zach. Quick question: this design has current season as a column. What if we want to do aggregates across seasons? How would that change the table design?
What are the prerequisites for the course @data with Zach
Hello Zach. There is something that confuses me. You have the player name and season as the primary key, right? I don't get how the table is updated if you are always doing an insert, not an update.
That's the same question I had. I was under the impression that with each insert we are only maintaining a single row for each player, but after the last query update, he was able to query both by season = 2000 and season = 2001. Not sure how that happened?
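For what it's worth, a small hedged check (assuming the table is built exactly as in the lab SQL shared above): because the primary key is (player_name, current_season), the 2001 run inserts a brand-new row next to the 2000 one rather than updating it, so both seasons stay queryable.
```sql
SELECT player_name, current_season, CARDINALITY(seasons) AS seasons_recorded
FROM players
WHERE player_name = 'Michael Jordan'
  AND current_season IN (2000, 2001);
-- Expect two rows: one per current_season, with the 2001 row carrying the longer array.
```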
Where do you download the existing DB? I see the queries in the Git repo but no table scripts or anything like that. Am I missing something, or are we supposed to just randomly generate data ourselves to work with?
please use data.dump and restore it in your postgres db in your 1-dimensional-data-modeling folder
you could use this command from the same folder if your postgres is installed
pg_restore -U postgres -d postgres data.dump
the bottom section of the dimensional data modelling page explains what to do if the tables don't download right away
I had issues with Windows, so I downloaded Postgres 13 instead of the Docker image,
and then ran this. Please watch where I am, and the exact path from which I call pg_restore.exe:
C:\Users\yourusername\Downloads\data-engineer-handbook\bootcamp\materials\1-dimensional-data-modeling> & 'C:\Program Files\PostgreSQL\13\bin\pg_restore.exe' -U postgres -d postgres data.dump
I was within the folder 'C:\Users\yourusername\Downloads\data-engineer-handbook\bootcamp\materials\1-dimensional-data-modeling', where the data.dump is.
1. Create a New Database (If Needed) : Using pgAdmin Interface
Right-click on Databases under your connected server and select Create > Database...
Database: Enter a name for your new database (e.g., my_database).
Click Save.
2. Navigate to the Target Database:
In the Browser panel, expand Servers > Your Server > Databases > my_database.
Right-click on the target database (my_database) and select Restore...
Navigate to /XXXX/XXXX/XXXX/XXXX/XXXX/data.dump and select the data.dump file.
Click the Restore button to begin the process.
Monitor the Process:
A dialog will display the progress. Wait until it completes.
3. Once the restore process is complete:
Refresh the Browser panel by right-clicking on Databases and selecting Refresh.
Expand your database (my_database) to view Schemas, Tables, Views, etc., ensuring that your data and schema objects are present.
@@emrahe468 You helped me so much! I had been running the pg_restore command in the wrong directory in Terminal, not the one containing data.dump.
Thank you for your advice and very good luck!
It's good that you are showing these features in PostgreSQL. But when we try to do it in SQL Server it's a little different, as we don't have UNNEST, CARDINALITY, or the ability to create new data types. But thanks for showing these features.
Thanks for the launch! So aren't NoSQL databases a better fit for big data that contains array info? With MongoDB or Cosmos DB you would be able to skip the compile and decompile steps, and the data is already stored compressed in BSON format, so for big data applications wouldn't SQL be a big slowdown and resource hog?
Run an analytical query on those and suffer and you'll see why complex data types in SQL might be better
The below worked. Thank you for existing Zach, you are making a lot of impact!
SELECT player_name,
(UNNEST(season_stats)::season_stats).*
FROM players
Thank you very much for the video. I might be wrong here, but won't the data (specifically the division) be skewed if a player only plays a few years, say just 2001, when we are checking which player improved the most through the seasons by pts?
```sql
SELECT
player_name,
(season_stats[CARDINALITY(season_stats)]::season_stats).pts/
CASE WHEN (season_stats[1]::season_stats).pts = 0 THEN 1 ELSE (season_stats[1]::season_stats).pts END
FROM players
WHERE current_season = 2001
```
Unless I am wrong, would we want to modify this so everyone has the same number of seasons?
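One possible tweak (my assumption, not something shown in the video) is to skip players with only one recorded season so the first-vs-latest ratio is meaningful; the column name follows the snippet above:
```sql
SELECT
    player_name,
    (season_stats[CARDINALITY(season_stats)]::season_stats).pts /
    CASE WHEN (season_stats[1]::season_stats).pts = 0 THEN 1
         ELSE (season_stats[1]::season_stats).pts END AS improvement_ratio
FROM players
WHERE current_season = 2001
  AND CARDINALITY(season_stats) >= 2;  -- hypothetical cutoff: ignore single-season players
```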
really well done
Why is losing compression a problem?
So you are going to post all the videos here, right?
Yep indeed
I'm stuck with the setup. Can someone ping the Discord details?
Hey Zach, I'm trying to follow you in the video and I've finished setting up my Docker and Postgres, but I don't see the player_seasons table. Please let me know how I can access it.
Join the discord
@ just did thank you!
Thanks Zach, you are clearly an expert, but you need to slow down and explain your goal from the beginning: what's the meaning of the arrays
and the data types you used, and why are you doing that? It would take maybe 15 minutes per video, but it would keep everyone on the same page.
By the way, I found only the script to create the table, but I didn't find the script to insert the data.
Thanks again
Exactly. Need the script to insert data for hands-on practice.
Guys, after installing the Postgres server on my local machine, loading the data dump, and installing the DBeaver client, the tables in the db are empty, with no data in them. I tried to rerun the command postgres=# \i data.dump as mentioned in the tutorial, but it doesn't help...
UPD: I figured it out. Thank you everyone for commenting here on your data visibility issues
Would really appreciate it if you could make a complete setup video of how to do this. Anyway, is it possible to end up doing the setup in a Snowflake or Databricks free trial?
Definitely not unless they sponsor me
I want to know where I can get the table contents from?
Love that you start from the basics.
Launch video
Is the dataset available on your GitHub? I saw the code but not the NBA data.
Yes. All the instructions are there
Hi Zach, where can I find the dataset that you are using in this video and in future videos?
in the GitHub repository, there are SQL commands only
There’s a data.dump file. Follow the instructions on how to stand up the Postgres instance
I really don't understand the concept of cumulative tables here when you have to run the query manually. PLEASE EXPLAIN WHAT I AM MISSING HERE.
This was awesome
Please how can I join the discord channel for the free data boot camp? I am starting late, and I am a little confused by everything. Thanks
bootcamp.techcreator.io will get you an invite
THANK YOU ZACH
I really enjoyed this video. However, it's been days and I don't know how to access the assignment. It's been difficult to log in to my Discord account.
bootcamp.techcreator.io is where you can get the assignments
Very well said.
We can see the YouTube video link on the dataexpert website; will the video be uploaded there later in order to count the watch minutes?
If you watch on the website, you get credit correct
@@EcZachly_ yes, but I can see the same YouTube video player there. Earlier it was a different video player, and I watched the first 2 videos and they got counted.
But for lecture 2 there is a YouTube link only.
🎉Thank you Zack
Guys, does anyone know why the enum class, and why not just go for "text" and do the CASE WHEN?
Probably would do text in a data lake environment
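A rough sketch of the text-based alternative (the table name here is made up, and this is not the lab's DDL): plain TEXT with a CHECK keeps the same four values without defining a new type.
```sql
-- Hypothetical variant: TEXT + CHECK instead of the scoring_class enum
CREATE TABLE players_text_variant (
    player_name    TEXT,
    scorer_class   TEXT CHECK (scorer_class IN ('bad', 'average', 'good', 'star')),
    current_season INTEGER,
    PRIMARY KEY (player_name, current_season)
);
```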
can someone help me with the env setup🙃
Join the boot camp: bootcamp.TechCreator.io and we’ll send you the discord invite via email. Then the community can help you
Please let me know where I can get the dataset.
I don't have data in the tables, can someone help me with this? I did \i data.dump but still all the tables are empty.
Join the discord to troubleshoot
Where do I get the dataset?
Amazing, thanks for the video!
Can someone tell me where I can get all of these table records to start the lab?
Data engineer handbook repo
Hi Zack, I am loving your bootcamp ;). I got stuck at the installation part, exactly at step 3. I followed everything but I am getting this error: ERROR: permission denied for table teams
SQL state: 42501
Please help me
Join the discord
What's the Discord channel name, if anyone has joined and made friends?
Did you get?
Join bootcamp.TechCreator.io
Please, can anyone explain to me how I can set up the environment?
Where is the dataset for this lab?
Google data engineer handbook
@@EcZachly_
Got it, thanks.
Hey, I am stuck at the point where the season stats are accumulated over the years.
Join the Discord. We will help you!
discord.gg/qvJa9N8F
I am facing this error, can anyone please resolve it? At recording time 29:00:
ERROR: column "current_season" is of type integer but expression is of type scoring_class
LINE 33: case
^
HINT: You will need to rewrite or cast the expression.
SQL state: 42804
Character: 877
Join the discord!
Hello guys, I spun up the Postgres DB using the Docker image on Windows.
Accessed it using DBeaver, but can't find any tables in the public schema.
Has anyone faced the same issue? Need help here please!!!
On my local machine I moved the DB objects from data.dump to my local_db using the pg_restore command:
pg_restore -U postgres -d my_local_db /path/to/my_dump.dump
In the handbook, the hint was given in a section called " Tables not loading!? 🚨"
The "psql" command given in the handbook didn't work for me and kept throwing the "ERROR: syntax error at or near "toc" LINE 1: toc.dat actor text"
Instead I treated it as custom format and used the following -> it's opening fine in DBeaver now:
"pg_restore --username="$POSTGRES_USER" --dbname="$POSTGRES_DB" --no-owner /docker-entrypoint-initdb.d/data.dump"
Hope that helped - it took me hours to get that far 🙃💪
Join the discord at bootcamp.techcreator.io
@@seaniam Thanks 👬Mate
Thank you 🙏
Can LLMs do these complex SQL queries?
I have a video on that near the end of the boot camp
Does anyone have the insert statements?
Has anyone successfully connected to Postgres and gotten the tables?
Join the free boot camp bootcamp.techcreator.io and we'll get you sorted.
Struggling. Working on it too.
I got the tables, but there is no data in them. Tried to rerun postgres=# \i data.dump, but it didn't help...
Hi guys, can anyone help me load the data and complete the setup so that I can start the lab?
Join the discord bootcamp.techcreator.io
Sure
@Prakhar537 let me know how and when we can connect
One of the weird things about your course is you actually focus on the data modeling part. For some reason it seems like there's been a push to just say that DE should be focused on pushing tons of "however it looks in the source" into parquet lake tables, then "solve" it with putting everything in "one big table" without any thought to design. Cool.
I think part of this comes from not thinking about logical/conceptual vs physical/implementation. Regardless of whether you're creating a Microsoft SSAS cube or flattened tables, you still need to think about things like grain, persistent entities, facts, etc.
Focusing on data modeling is where I saw the biggest impacts in my big tech career.
My PgAdmin does not read the password anymore. Did anyone have the same error?
Join the discord discord.gg/K4faeNy43w
I watched the whole video on the website but it says 67.3% watched.
💎💎💎
I feel like you are rushing because it's free?
It’s because I have ADHD and I needed to make sure the lab fit into a 1 hour time slot. I filmed this live a year ago in front of 150 people
Please zoom out a little bit; the text is so big and you scrolled a lot, which makes it hard to follow.