End to End Data Analytics Project (Python + SQL)
- Published on 30 Sep 2024
- In this video we will do an end-to-end data analytics project using Python and SQL. We will use the Kaggle API to download the dataset, do the data processing and cleaning with pandas, and load the data into SQL Server. Lastly, we will answer some interesting questions using SQL.
github link:
github.com/ank...
Data Analytics high quality content: www.namastesql...
Zero to hero(Advance) SQL Aggregation:
• All About SQL Aggregat...
Most Asked Join Based Interview Question:
• Most Asked SQL JOIN ba...
Solving 4 Tricky SQL problems:
• Solving 4 Tricky SQL P...
Data Analyst Spotify Case Study:
• Data Analyst Spotify C...
Top 10 SQL interview Questions:
• Top 10 SQL interview Q...
Interview Question based on FULL OUTER JOIN:
• SQL Interview Question...
Playlist to master SQL :
• Complex SQL Questions ...
Rank, Dense_Rank and Row_Number:
• RANK, DENSE_RANK, ROW_...
#sql #dataengineer #datanalysis #dataanalytics
Please like the video as it takes lots of effort to record these videos.
Checkout my high quality data analytics courses :
www.namastesql.com/
Hi Ankit, can you please help with the requirement below?
create table #temp ( DepartmentId int, Name varchar(255),Hiredate date,Sal float,Ruleid int)
Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Sai','2021-10-23',5500)
Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Sairam','1999-10-23',6000)
Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Saikrishna','2002-10-23',3000)
Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Sair','2021-10-23',5000)
Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'Raj','2015-10-23',8000)
Insert into #temp (DepartmentId,Name,Hiredate,Sal) values ( 10,'SRK','2021-10-23',5000)
Create table #Rules (Departmentid int,Ruleid int, Condition Varchar(2000))
Insert into #Rules values ( 10,1,'Name like ''Sai%''')
Insert into #Rules values ( 10,2,'Hiredate >= ''2000-01-01''')
Insert into #Rules values ( 10,3,'sal >= 5000')
Output:
Departmentid  Name        Hiredate    sal   ruleid
10            Sai         10/23/2021  5500  2,10,30
10            Sairam      10/23/1999  6000  2,30
10            Saikrishna  10/23/2002  3000  2,10
10            Sair        10/23/2021  5000  2,10,30
10            Raj         10/23/2015  8000  10,30
10            SRK         10/23/2021  5000  10,30
DepartmentId is used to join #temp and #Rules.
Check each condition dynamically from #Rules against the corresponding rows in #temp.
If a condition is satisfied, concatenate the Ruleid value from #Rules to the existing Ruleid in #temp.
@revathigangisetty9065 Send the problem to sql.namaste@gmail.com
@ankitbansal6 I shared the details in the email, Ankit. It's an urgent requirement, please kindly help me. I will be eagerly waiting for your input.
I shared the problem with you via email and tried multiple times, but I did not get the exact solution. Please kindly help, Ankit.
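For the rule-matching request in this thread, one possible approach is to run each stored condition as its own query and collect the matching Ruleid values per row. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's standard sqlite3 module as a stand-in for SQL Server, renames the tables to emp and rules (the # temp-table prefix is SQL Server specific), and treats the conditions as trusted text because they are spliced into the SQL. It uses the Ruleid values 1, 2 and 3 from the inserts, so its labels differ from the sample output in the comment.

```python
import sqlite3


def match_rules(conn):
    """Return {Name: 'ruleid,ruleid,...'} listing every rule each row satisfies.

    Assumes names are unique; a real table would key on a proper id column.
    """
    matched = {name: [] for (name,) in conn.execute("SELECT Name FROM emp ORDER BY rowid")}
    rules = conn.execute(
        "SELECT Departmentid, Ruleid, Condition FROM rules ORDER BY Ruleid"
    ).fetchall()
    for dept_id, rule_id, condition in rules:
        # The condition is inserted as raw SQL, so it must come from a trusted table.
        query = f"SELECT Name FROM emp WHERE DepartmentId = ? AND ({condition})"
        for (name,) in conn.execute(query, (dept_id,)).fetchall():
            matched[name].append(str(rule_id))
    return {name: ",".join(ids) for name, ids in matched.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (DepartmentId INT, Name TEXT, Hiredate TEXT, Sal REAL)")
conn.execute("CREATE TABLE rules (Departmentid INT, Ruleid INT, Condition TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", [
    (10, "Sai", "2021-10-23", 5500),
    (10, "Sairam", "1999-10-23", 6000),
    (10, "Saikrishna", "2002-10-23", 3000),
    (10, "Sair", "2021-10-23", 5000),
    (10, "Raj", "2015-10-23", 8000),
    (10, "SRK", "2021-10-23", 5000),
])
conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", [
    (10, 1, "Name like 'Sai%'"),
    (10, 2, "Hiredate >= '2000-01-01'"),
    (10, 3, "sal >= 5000"),
])

result = match_rules(conn)
for name, rule_ids in result.items():
    print(name, rule_ids)
```

In SQL Server itself the same idea would likely be expressed with dynamic SQL (for example sp_executesql inside a loop over #Rules) and STRING_AGG for the concatenation; that translation is not shown here.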
Bro, I think you made a mistake in the top 10 highest revenue generating products query. You should use SUM(orders_quantity*selling_price), because the sum of selling price alone does not give the revenue; it should be multiplied by quantity.
While watching the video I thought the same. Could you please clarify, @ankitbansal6?
Hi Ankit, you made this video at the right time; I was looking for something like this. Could you also make a project on end-to-end problems when loading data from CSV into MySQL? I recently had an issue where only part of the data loaded, even though I had done null treatment. Please make a video on it and also post it on your LinkedIn once it is done so that I get notified.
Sure
Can't add the ODBC Sql Server even after having the MySQL that was installed for working on your SQL Course. I am getting an error message saying TEST FAILED. Can you please share any link that can help us to get that server name present in the list of servers available to connect or better a follow up video showing how to do it on your system. Thanks in advance
Anaconda3\lib\site-packages\sqlalchemy\exc.py", line 258 except Exception, e:
how to fix sqlalchemy error?
Did you get the solution?😅
@DA_Guy123 No, but instead I inserted the records using pandas - df_orders_table_create=("""create table if not exists df_orders( )
cur.execute(df_orders_table_create)
conn.commit()
How do you document your findings? Please make a video on how you put everything together for a portfolio showcasing your skills in Python and SQL.
There is no folder called .kaggle..
Create it
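Creating the folder by hand works; it can also be scripted. The sketch below assumes the token was downloaded as kaggle.json and that the Kaggle client looks for it in a .kaggle folder under the user's home directory (for example C:\Users\<username>\.kaggle on Windows). The helper name install_kaggle_token and the sample paths are hypothetical, and the demo runs against a temporary directory so the real home folder is not touched.

```python
import shutil
import tempfile
from pathlib import Path


def install_kaggle_token(downloaded_token, home=None):
    """Copy kaggle.json into <home>/.kaggle, creating the folder if it is missing."""
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(downloaded_token, target)
    # Restrict access to the owner; this has little effect on Windows.
    target.chmod(0o600)
    return target


# Demo in a throwaway directory with a fake token file.
with tempfile.TemporaryDirectory() as tmp:
    fake_download = Path(tmp) / "Downloads" / "kaggle.json"
    fake_download.parent.mkdir()
    fake_download.write_text('{"username": "example", "key": "example"}')
    installed = install_kaggle_token(fake_download, home=Path(tmp) / "home")
    installed_ok = installed.exists()
    installed_parent = installed.parent.name
    print(installed_parent, installed_ok)
```

For real use, call install_kaggle_token with the path of the downloaded file and leave home unset.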
First of all, thank you very much for the project. But there are a couple of concerns which you may have overlooked -
1. In every query where we’re doing analysis based on sales, I guess it’s better to consider sale_price*quantity as sales rather than sale_price only
2. In the last problem, profit should have been taken into consideration rather than sale_price
But again, the above are just some modifications we can make to get the best business answers. Otherwise, the video shows the approach for handling an end-to-end project, and from that point of view it's absolutely fine. We students can make the necessary changes.
Thanks for the feedback. Appreciate it 🙂
I noticed the same thing, but overall a good video for learning.
I assumed the sale_price is the result of this calculation, which would make the way he does it correct.
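Whether SUM(sale_price) or SUM(sale_price * quantity) is right depends on whether sale_price is a per-unit price or already a line total, which the thread does not settle. The sketch below only shows how the two readings can rank products differently. It uses SQLite as a stand-in for SQL Server, and the three rows are made-up sample data, not values from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df_orders (product_id TEXT, quantity INT, sale_price REAL)")
conn.executemany("INSERT INTO df_orders VALUES (?, ?, ?)", [
    ("P1", 1, 100.0),   # hypothetical rows for illustration only
    ("P1", 1, 100.0),
    ("P2", 10, 30.0),
])

# Reading 1: sale_price is already the total for the order line.
per_row = dict(conn.execute(
    "SELECT product_id, SUM(sale_price) FROM df_orders GROUP BY product_id"
).fetchall())

# Reading 2: sale_price is a unit price, so revenue needs the quantity.
per_unit = dict(conn.execute(
    "SELECT product_id, SUM(sale_price * quantity) FROM df_orders GROUP BY product_id"
).fetchall())

top_per_row = max(per_row, key=per_row.get)
top_per_unit = max(per_unit, key=per_unit.get)
print(top_per_row, top_per_unit)
```

Checking a few source rows, or the dataset description, against these two readings is probably the quickest way to decide which query matches the data.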
Hi Ankit sir, I'm facing continuous issues connecting Jupyter Notebook to SQL Server Management Studio (SSMS). I've installed the necessary drivers and tried all the solutions, but I'm still unable to connect. As a workaround, I saved the updated file from Jupyter Notebook and manually loaded it into SSMS, changing all instances of df_orders to updated_orders in my queries. Could you please advise me on which drivers and software are needed? I've watched several YouTube videos, but they often lead to lengthy errors.
Hello Bhaiya!
Thank You.
If I need help in my career, will you help me with your knowledge and experience?
I would be very grateful.
Very nice video. Just a little doubt. In the last question we have to calculate subcategory with highest growth by profit but in the video highest growth by sales is calculated. So the one given in the video is right or not?
Can't we directly open the CSV file in Python using read_csv? No need for the Kaggle API.
Thank you Ankit, it's really helpful.
Can you make a practical video on A/B testing, please?
Sure
Jesus is the only way to healing, restoration and salvation to all souls. Please turn to him and he will change your life, depression into delight, soul heading from hell to heaven all because of what he did on the cross
“Whoever calls upon the name of the Lord shall be saved” Romans 10:13
Sir, when you solved the first question, why didn't you multiply the quantity column by the sale price for total revenue by product ID?
Yes, I think likewise. Revenue should have been qty*sale_price
Showing error in changing order date to datetime
create table df_orders(
[order_id] int primary key
,[order_date] date
,[ship_mode] varchar(20)
,[segment] varchar(20)
,[country] varchar(20)
,[city] varchar(20)
,[state] varchar(20)
,[postal_code] varchar(20)
,[region] varchar(20)
,[category] varchar(20)
,[sub_category] varchar(20)
,[product_id] varchar(50)
,[quantity] int
,[discount] decimal(7,2)
,[sale_price] decimal(7,2)
,[profit] decimal(7,2))
Lopez Charles Allen Joseph Hernandez Kenneth
Hi Ankit, where can I get these types of queries for hands-on practice? Can you please tell me?
Hi Ankit, I am trying to connect and load the data to my local mysql workbench client. Continuously getting error. Any help?
same here with me
You are the torchbearer for many Ankit, trust me!
Cheers!
Thank you for the valuable content that i found so helpful for my profile .
I have been going through a problem where after transforming the data in Jupyter and loading it to SSMS , the code runs in the notebook with no error but I can't find that data file in SSMS, can you clarify Please.
Thanks.
You need to commit
After cleaning, i.e. after dropping those 3 columns, can we connect this Jupyter notebook to MySQL Workbench?
If yes, how do I do that? Please help me! If not, just tell me the reason why it can't be connected to Workbench.
What is the difference between MySQL Workbench and MS SQL Server Management Studio?
You can connect to MySQL. Just change the connection string
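To make "change the connection string" concrete, the sketch below builds the two URL shapes side by side using only the standard library. It assumes SQLAlchemy's dialect+driver://user:password@host:port/database URL format, the pyodbc driver for SQL Server and the PyMySQL driver for MySQL; the helper names, the root user, the password and the orders_db database are all hypothetical placeholders.

```python
from urllib.parse import quote_plus


def mssql_url(server, database, driver="ODBC Driver 17 for SQL Server"):
    """SQLAlchemy-style URL for SQL Server via pyodbc with Windows authentication."""
    return f"mssql+pyodbc://{server}/{database}?driver={quote_plus(driver)}"


def mysql_url(user, password, host, database, port=3306):
    """SQLAlchemy-style URL for MySQL via PyMySQL; the password is URL-escaped."""
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


sql_server = mssql_url(r"LENOVO\SQLEXPRESS", "master")
mysql = mysql_url("root", "p@ss/word", "localhost", "orders_db")
print(sql_server)
print(mysql)
```

Either string would then be passed to sqlalchemy.create_engine(...), with the matching driver package (pyodbc or pymysql) installed. The rest of the pandas code, including df.to_sql, should not need to change.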
How can I log into the MS SQL Server database when following along with a Mac, not Windows?
I've installed Azure Data Studio but it seems the steps in connecting to the db engine are different.
Try this, when connecting to MS SQL Server in Azure Data Studio in Mac
import pyodbc
conn = pyodbc.connect('Driver={ODBC Driver 18 for SQL Server};'
                      'Server=localhost;'
                      'Database=master;'
                      'UID=your username;'
                      'PWD=your password;'
                      'TrustServerCertificate=yes;')
I'm 40 and I am trying to switch career to Data Analytics from a completely different background. I learned PowerBI + SQL. Can I get entry into this field with these two modules? Can I learn Python later have a good career? That is my plan over the next 5 years.
Yes that works. Make sure your SQL is strong .
@ankitbansal6 Definitely. I was recommended your channel for SQL by a youtuber and actually made a lot of progress in the past two months from your channel. Thanks is very small word for the effort you put into your content. But Thanks anyways 👍😊
Brother I'm 30 and trying to get into data analytics from the Mechanical domain.
Many times I get frustrated and lose hope but your comment restored my faith in the process.
I wish you all the best for your endeavours 👍.
Great video! Loved the easy explanation of the full ETL process and data analysis. Keep up the good work!
Hi Ankit , What should I do If I don't see ODBC driver for SQL server in ODBC data source administrator ?
For first question, Don't we need to multiply sale_price with the quantity for revenue generated for each product?
We can do that. I assumed it was total sales in the sale price.
@ankitbansal6 ok sir
same question. my answer is
SELECT top 10 product_id, SUM(sale_price * quantity) AS total_revenue
FROM df_orders
GROUP BY product_id
ORDER BY total_revenue DESC;
Hi, in my C drive I don't have a .kaggle folder, so do we need to create a new folder to save the kaggle.json file?
Yes
How to load the data in to MySQL?
Hello and congratulations on the course! In the first part, where I have to put the json file in the .kaggle folder, this folder doesn't seem to exist. So what can I do?
Create one folder
Hello Ankit, when I am importing data from Python to SQL and running the code in my Jupyter Notebook, I am getting the error below:
AttributeError: 'Connection' object has no attribute 'cursor'
Please let me know
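This AttributeError is commonly reported when the installed pandas and SQLAlchemy versions do not support each other, in which case pandas can treat the SQLAlchemy connection as a plain DBAPI connection and call .cursor() on it. That diagnosis is an assumption here, since the comment does not list any versions. A first step is to print what is actually installed; the sketch below does that with the standard library only, and the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["pandas", "SQLAlchemy", "pyodbc"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

If the two versions turn out to be far apart, upgrading both together in the same environment is a reasonable next thing to try.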
with t1 as (
    select month(order_date) as month1, sum(sale_price) as sales1
    from orders
    where year(order_date) = 2022
    group by 1
),
t2 as (
    select month(order_date) as month2, sum(sale_price) as sales2
    from orders
    where year(order_date) = 2023
    group by 1
)
select month1 as months, sales1 as 22sales, sales2 as 23sales
from t1
inner join t2 on t1.month1 = t2.month2
order by 1 asc;
another way for the year on year query
Hi Sir! I am not able to connect the SQL Server at 22:10. I have used the same syntax because the driver name is same still getting errors. Please help
Bro, I'm facing the same problem! Have you resolved it?
@Hustler19 Hey, were you able to solve this? I'm having the same issue as well.
please share the link for Microsoft Sql server management studio installation.
Great work sir. I have one problem: I'm using MySQL, and for the third question my output shows only 2 rows for the same query. Why is that?
Brother, I am new to data science. I want to know which language is good to learn for data analysis. I have learned Python and NumPy and am currently learning pandas.
You sir, are a wonderful teacher! I am currently learning Data Engineering. This video has enabled me to catch up and understand some core concepts that I found challenging because I missed live classes. Thank you!
Glad to help!
Hello Ankit, all these queries can be done with the pandas library. Can you tell us why you created a SQL Server database and used SQL for the queries?
"for each category which month had highest sales"
The above question has a different output in your YouTube video than in the SQL queries that you provided in the SQL file.
Very Helpful video Ankit! Thanks a lot for the efforts you have put for creating this guided project. Looking forward for many more guided projects :)
Liked your video, I am from Australia.
Thanks for watching!
Sir please upload more SQL projects with datasets from kaggle.. Thank you for this..
In the last question of sql
Should we calculate growth by profit change or sale change because the question is for profit change
Also in total revenue quantity of each order is not included
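The difference between growth by sales and growth by profit is easy to see on a toy table. The sketch below is an illustration only: SQLite stands in for SQL Server (strftime replaces year()), growth is assumed to mean the absolute 2023 minus 2022 difference, and the four rows are made-up sample data chosen so that the two metrics disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE df_orders (order_date TEXT, sub_category TEXT, sale_price REAL, profit REAL)"
)
conn.executemany("INSERT INTO df_orders VALUES (?, ?, ?, ?)", [
    ("2022-03-01", "Chairs", 100.0, 10.0),   # hypothetical rows
    ("2023-03-01", "Chairs", 300.0, 12.0),
    ("2022-05-01", "Phones", 200.0, 20.0),
    ("2023-05-01", "Phones", 220.0, 50.0),
])


def top_growth(metric):
    """Return (sub_category, growth) with the largest 2023 minus 2022 change in metric."""
    if metric not in ("sale_price", "profit"):
        raise ValueError("metric must be 'sale_price' or 'profit'")
    query = f"""
        WITH yearly AS (
            SELECT sub_category,
                   SUM(CASE WHEN strftime('%Y', order_date) = '2022' THEN {metric} ELSE 0 END) AS y2022,
                   SUM(CASE WHEN strftime('%Y', order_date) = '2023' THEN {metric} ELSE 0 END) AS y2023
            FROM df_orders
            GROUP BY sub_category
        )
        SELECT sub_category, y2023 - y2022 AS growth
        FROM yearly
        ORDER BY growth DESC
        LIMIT 1
    """
    return conn.execute(query).fetchall()[0]


by_sales = top_growth("sale_price")
by_profit = top_growth("profit")
print(by_sales, by_profit)
```

If the question asks for growth by profit, swapping sale_price for profit in the aggregation is the only change needed.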
Bro, can you please help me? When I download the dataset in VS Code and run it, it shows an error.
Can I use MongoDB as an alternative?
Yes
Hi all, i am not able to see my odbc drivers details for sql.can anybody please help me out
Do i need to learn python to start doing Data analytics project?
Yes that would be better
Sir kindly provide the dataset kaggle link
Description box
Hi Ankit, wonderful explanation. Can I have those DDL statements for SQL Server? I am not able to find them on GitHub.
Can you explain how to connect to SQL Server on a Mac with Jupyter?
Hello sir, I got an error in this line, can you help me out? df['order_date']=pd.to_datetime(df['order_date'],format="%Y-%m-%d")
What's the error ?
@ankitbansal6
df.drop(columns=['list_price','cost_price','discount_percent'],inplace=True)
ERROR = errors: str = "raise",
4816 ):
4817 """
4818 Drop specified labels from rows or columns.
4819
...
-> 6644 raise KeyError(f"{list(labels[mask])} not found in axis")
6645 indexer = indexer[~mask]
6646 return self.delete(indexer)
KeyError: "['list_price', 'cost_price', 'discount_percent'] not found in axis"
@ankitbansal6 df.drop(columns=['list_price','cost_price','discount_percent'],inplace=True)
In this code the error is "['list_price', 'cost_price', 'discount_percent'] not found in axis"
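A KeyError like this usually means the listed columns are not in the DataFrame at that moment, for example because the notebook cell was run a second time after the columns had already been dropped. That cause is a guess, since the comment does not show the earlier cells. The sketch below reproduces the situation on a small made-up DataFrame and shows the errors="ignore" option of DataFrame.drop, which skips labels that are missing.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2],            # hypothetical sample data
    "list_price": [100, 200],
    "cost_price": [80, 150],
    "discount_percent": [5, 10],
})
to_drop = ["list_price", "cost_price", "discount_percent"]

# First run: the columns exist and are removed.
df.drop(columns=to_drop, inplace=True)

# Re-running the same cell would raise KeyError; errors="ignore" makes it a no-op.
df.drop(columns=to_drop, inplace=True, errors="ignore")

remaining = list(df.columns)
print(remaining)
```

Printing df.columns just before the failing line is a quick way to confirm which names are really present.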
Where is the SQL dataset, and how do I download it?
Hi Ankit
Can you please also show us
How to add triggers
To update the data every month on database
Thanks
Brother, you gave us the experience of how work gets done in a company ❤ Loved it, man.
Thanks for this awesome project.
Thankyou so much Ankit sir For this project👍
I thoroughly enjoyed this video and followed along with you. Thanks for this. Please keep posting more of such end to end analysis problems. Thanks a ton for taking the effort to make these videos so that we keep learning :)
More to come!
Hi Ankit, can you please explain the hierarchy query below, or make a video on it in your own style?
I was unable to crack this and need your support. Thanks in advance ❤❤✌✌
CREATE TABLE company
(
employee varchar(10) primary key,
manager varchar(10)
);
INSERT INTO company values ('Elon', null);
INSERT INTO company values ('Ira', 'Elon');
INSERT INTO company values ('Bret', 'Elon');
INSERT INTO company values ('Earl', 'Elon');
INSERT INTO company values ('James', 'Ira');
INSERT INTO company values ('Drew', 'Ira');
INSERT INTO company values ('Mark', 'Bret');
INSERT INTO company values ('Phil', 'Mark');
INSERT INTO company values ('Jon', 'Mark');
INSERT INTO company values ('Omid', 'Earl');
SELECT * FROM company;
/* Given graph shows the hierarchy of employees in a company.
Write an SQL query to split the hierarchy and show the employees corresponding to their team.*/
WITH RECURSIVE cte_teams AS (
SELECT mng.employee,
CONCAT('Team', ROW_NUMBER() OVER (ORDER BY mng.employee)) AS teams
FROM company root
JOIN company mng ON root.employee = mng.manager
WHERE root.manager IS NULL
),
cte AS (
SELECT c.employee, c.manager, t.teams
FROM company c CROSS JOIN cte_teams t
WHERE c.manager IS NULL
UNION
SELECT c.employee, c.manager, COALESCE(t.teams, cte.teams) AS teams
FROM company c JOIN cte ON cte.employee = c.manager
LEFT JOIN cte_teams t ON t.employee = c.employee )
SELECT teams, GROUP_CONCAT(employee, ', ') AS members
FROM cte
GROUP BY teams
ORDER BY teams;
Please send the question and expected output on SQL.namaste@gmail.com
Hi Ankit hope you received my mail waiting for your response
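One way to read the hierarchy query in this thread: every direct report of the top manager leads a team, teams are numbered by the lead's name, and each team contains the top manager, the lead, and everyone below the lead. The Python sketch below mirrors that reading with a breadth-first walk. It is an interpretation of the posted SQL rather than an explanation from the video, and the function name split_into_teams is hypothetical.

```python
from collections import defaultdict, deque

company = [
    ("Elon", None), ("Ira", "Elon"), ("Bret", "Elon"), ("Earl", "Elon"),
    ("James", "Ira"), ("Drew", "Ira"), ("Mark", "Bret"),
    ("Phil", "Mark"), ("Jon", "Mark"), ("Omid", "Earl"),
]


def split_into_teams(rows):
    """Group employees under each direct report of the root; the root joins every team."""
    reports = defaultdict(list)
    root = None
    for employee, manager in rows:
        if manager is None:
            root = employee
        else:
            reports[manager].append(employee)

    teams = {}
    # Number teams by the lead's name, like ROW_NUMBER() OVER (ORDER BY mng.employee).
    for number, lead in enumerate(sorted(reports[root]), start=1):
        members = [root]
        queue = deque([lead])
        while queue:                       # breadth-first walk down the lead's subtree
            person = queue.popleft()
            members.append(person)
            queue.extend(reports[person])
        teams[f"Team{number}"] = members
    return teams


teams = split_into_teams(company)
for team, members in teams.items():
    print(team, ", ".join(members))
```

In the SQL, cte_teams plays the role of the numbered leads, and the recursive cte does the downward walk, carrying the team label from manager to employee through COALESCE.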
Hi sir, how can I add this to my resume, and what exact project description should I write for this project? Please help, so that I can write the same kind of thing myself for other projects.
Create a GitHub profile and put it there and use that link in your resume .
Hello Ankit, I followed your video and it is great to learn from. I have a scenario where I have two databases, one Postgres and the other MySQL. There are 25 lakh products in the MySQL product table with pricing and inventory, which I want to update in the Postgres database based on the sku column, which is the same in both. I used a query and an API, and it takes 4 to 5 hours to update. Can we do something at the database level to update pricing and inventory with a database-to-database procedure? Please suggest.
Using python you can move data from MySQL to postgres and then run update on postgres
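The suggestion above can be sketched as two steps: bulk-copy the fresh rows into a staging table on the target, then run one set-based UPDATE keyed on sku instead of one statement per product. The example below is a minimal illustration in which two in-memory SQLite databases stand in for MySQL and Postgres; the table layout and sample rows are assumptions, since the comment only names the sku, pricing and inventory columns.

```python
import sqlite3

source = sqlite3.connect(":memory:")   # stands in for MySQL
target = sqlite3.connect(":memory:")   # stands in for Postgres

ddl = "CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL, inventory INT)"
source.execute(ddl)
target.execute(ddl)
source.executemany("INSERT INTO product VALUES (?, ?, ?)",
                   [("A1", 9.5, 40), ("B2", 20.0, 0)])
target.executemany("INSERT INTO product VALUES (?, ?, ?)",
                   [("A1", 10.0, 50), ("B2", 20.0, 5), ("C3", 7.0, 9)])

# Step 1: copy the fresh values into a staging table on the target side.
rows = source.execute("SELECT sku, price, inventory FROM product").fetchall()
target.execute("CREATE TEMP TABLE staging (sku TEXT PRIMARY KEY, price REAL, inventory INT)")
target.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

# Step 2: a single UPDATE joined on sku; rows without a staging match are left alone.
target.execute("""
    UPDATE product
    SET price = (SELECT s.price FROM staging s WHERE s.sku = product.sku),
        inventory = (SELECT s.inventory FROM staging s WHERE s.sku = product.sku)
    WHERE sku IN (SELECT sku FROM staging)
""")
target.commit()

updated = target.execute(
    "SELECT sku, price, inventory FROM product ORDER BY sku"
).fetchall()
print(updated)
```

On a real Postgres target the second step would more likely be written as UPDATE ... FROM staging, and the first step would use a bulk loader such as COPY; with millions of rows, an index on sku in both tables matters more than the exact syntax.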
Ankit bhai no words.. new things learned today..❤
Glad to know 😊
Ankit sir i am using same step but facing this error please guide me
import sqlalchemy as sql
engine = sql.create_engine('mssql://LENOVO\SQLEXPRESS/master?driver=ODBC+DRIVER+17+FOR+SQL+SERVER')
conn = engine.connect()
conn
df.to_sql("df_data", con=conn, index=False, if_exists="replace")
AttributeError: 'Connection' object has no attribute 'cursor'
Try chatgpt
Hi sir
I have a doubt
Top 5 highest selling products should be in terms of quantity right?
sales.
Hey Ankit, great content, thanks for the video! How did you obtain the list of column names with the data types and memory allocation to create the new empty table?
You can right click on the table name in the browser and choose create to
Thanks Ankit for this wonderful end to end course,I have gone through it and cleared many of the doubts.
Thank you so much!! You are the motivation and this is the first step for me!! It means a lot thank you so much Ankit🙏🙏
Keep it up💪
can i use my pdf
instead of kaggle
hi , kaggle.json file is not found in home directory . please advise on this
Create yourself
Thank you bro for the best explanation i’ve ever seen about this topic.
Thank you very much. The explanation is very helpful.
You are welcome!
Great video. Thanks for sharing
Hey Ankit, my PC isn't showing any .kaggle folder under Users > username. How do I get that? I have installed kaggle with pip install kaggle.
Create yourself
Got it
Bro, can you show us a project where you do the analysis in Python, explain why you did it, store the results in an S3 bucket and so on? Can you make a video on PySpark as well?
Sure
I dont have .kaggle folder in my directory... What to do?
Create manually
Thank You so much for your content
mssql server connect code does not work
thanks bro and kindly make video on ETL process
Hello Ankit Sir, I am a fresher and I am looking forward to starting my career in the data analytics or data engineering field. This video is awesome and very helpful for me. Thank you so much for making end-to-end videos like this. The way you explain each and every thing is very nice and clear. Keep it up Sir
Keep it up 👍
thanks , i was waiting for this ....honestly
Thank you very much. Never believed I could really enjoy it as much as I did. I really really appreciate your effort
Awesome 👍
I liked the way you explained every query... Keep it up..
Thank you so much 🙂
Can this be done using VScode?
Sir, shouldn't we multiply the quantity by the sale price to get the overall sale price of a particular product and then sum it?
We can do that. I assumed it was total sales in the sale price.
@ankitbansal6 ok sir🙌
Sir will we have a PowerBI course like tableau ?
Yes After Tableau
To insert or load data frames into sql what is the maximum number of rows that can be inserted
Bro, there is no limitations.
hi i am not getting .kaggle folder in my pc why so
You can create it .kaggle
Not able to understand the Kaggle API.
Only today I was searching for data analytics and just saw your video. Thank you so much. Please post more such content on data analytics, and a course on this as well, please 😊
You're a GOAT 🐐 in the data field, sir.
I didn't understand why we used the Kaggle API. We could have directly downloaded the dataset from Kaggle and read the file using pandas, right?
Yes but how we will learn to use API 😄
@ankitbansal6 Can you make a small video on setting up a MySQL server and how to create a server in it?
😂😅 @ankitbansal6
Brother, you don't know how to teach at all, sorry to say.
How do I find the length of a feature in df to create an identical column in the database table?
df.dtypes
Sir, how do I know the maximum length of a feature, including decimal positions, so that there is no data loss while uploading data to the table?
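df.dtypes gives the type of each column but not the widths. One possible way to size varchar(n) and decimal(p, s) columns is to measure the longest text value and the largest number of integer and fractional digits actually present. The sketch below does this on a small made-up DataFrame; the helper name decimal_shape is hypothetical, and it assumes finite numbers that have already been rounded to the intended number of decimals, since floating-point noise would inflate the scale.

```python
from decimal import Decimal

import pandas as pd

df = pd.DataFrame({
    "ship_mode": ["Second Class", "Standard Class", "First Class"],   # sample data
    "sale_price": [1234.5, 7.25, 250.0],
})

# Longest text value in the column, to choose n in varchar(n).
max_len = int(df["ship_mode"].astype(str).str.len().max())


def decimal_shape(series):
    """Return (precision, scale) wide enough for every non-null number in the column."""
    int_digits, scale = 1, 0
    for value in series.dropna():
        _, digits, exponent = Decimal(str(value)).as_tuple()
        scale = max(scale, -exponent)
        int_digits = max(int_digits, len(digits) + exponent)
    return int_digits + scale, scale


precision, scale = decimal_shape(df["sale_price"])
print(max_len, precision, scale)
```

Leaving some headroom above the measured values, for example varchar(20) for a measured length of 14, guards against longer values arriving in later loads.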
where can i find dataset ankit ?
Excellent tutorial, thank you!
You're very welcome!
Thanks a lot bhai love nd support from berhampur❤
Thank you so much for this video, Ankit.
Thank you so much for the video! great video👍
Can we expect something related to cloud as well?
Here you go
th-cam.com/video/52CWagk3-jw/w-d-xo.html
This was really helpful for me as a beginner. Thanks a lot, and we need more and more videos like this.
really love this kind of content, please make more video like this.