Pandas is a monster for data wrangling. One of its key advantages is that it manipulates data in two dimensions, whereas SQL is effectively one-dimensional, i.e., record-based rather than field-based. Here is just one example: computing a rolling mean over 10 periods and inserting it into a new column. Pandas can do this in one short line of code. In SQL you have to ALTER the table, ADD the column, and then derive the calculation in a much more complex way with multiple statements.
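For anyone who wants to see the one-liner, here is a minimal sketch. The DataFrame and the column names "value" and "rolling_mean_10" are made up for illustration; only the rolling(...).mean() call is the point.

```python
import pandas as pd

# Hypothetical data: 12 consecutive values in a column named "value".
df = pd.DataFrame({"value": range(1, 13)})

# The one-liner: 10-period rolling mean stored in a new column.
# The first 9 rows are NaN because a full 10-row window is not yet available.
df["rolling_mean_10"] = df["value"].rolling(window=10).mean()

print(df)
```

Passing min_periods=1 to rolling() would fill the early rows with partial-window means instead of NaN, if that is what you want.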
Interesting video, but I feel that the comparison examples between SQL and pandas are unfair. For example, in the "age>30" filtering comparison, you said that SQL is better than pandas because in pandas you first need to import the library and then add the line "df=pd.read_csv('customers.csv')" before filtering by age>30. But I'm pretty sure that in SQL you also have to import the table into a database first. From a first look at the SQL documentation, I understand that the complete code for SQL would be something like:
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(255),
age INT
);
LOAD DATA LOCAL INFILE 'customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
and then, finally
SELECT * FROM customers
WHERE age > 30;
that vs:
import pandas as pd
df = pd.read_csv('customers.csv')
df[df['age'] > 30]
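To make the comparison concrete, here is a rough end-to-end sketch that runs both sides on the same data. It is only an illustration under some assumptions: the three sample rows are invented, an in-memory string stands in for customers.csv, and SQLite (via Python's built-in sqlite3 module) is swapped in for the MySQL-style LOAD DATA above so that it runs without a database server.

```python
import csv
import io
import sqlite3

import pandas as pd

# Hypothetical stand-in for customers.csv (made-up rows).
csv_text = "id,name,age\n1,Ana,25\n2,Ben,31\n3,Chloe,42\n"

# pandas side: load, then filter.
df = pd.read_csv(io.StringIO(csv_text))
over_30_pandas = df[df["age"] > 30]

# SQL side: create the table, load the rows, then filter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
over_30_sql = conn.execute(
    "SELECT name FROM customers WHERE age > 30 ORDER BY id"
).fetchall()
conn.close()

print(over_30_pandas)
print(over_30_sql)
```

Either way there is a load step before the filter, which is the point: the one-line SELECT only looks shorter if you ignore how the table got there.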
Super helpful! Thank you
But will it blend?
Always do pandas..