With pandas 0.23.0 and Python >= 3.6, the order of columns specified in the dictionary is preserved.
We do not need to specify the additional columns parameter.
Thanks for the great series btw.
Great point, thank you!
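A quick way to check the point about column order, as a minimal sketch. The data is made up for illustration, and it assumes pandas 0.23.0 or later running on Python 3.6 or later:

```python
import pandas as pd

# Made-up data; the keys are deliberately not in alphabetical order.
data = {'id': [100, 101, 102], 'color': ['red', 'blue', 'red']}

# No `columns` argument: the column order follows the dictionary's key order.
df = pd.DataFrame(data)
print(list(df.columns))

# The `columns` argument is still useful when you want a different order.
df_reordered = pd.DataFrame(data, columns=['color', 'id'])
print(list(df_reordered.columns))
```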
Great video! I'm currently learning Python and this was exactly what I needed to do. Thanks again for sharing.
Awesome! Great to hear!
Great video!
It gives a FutureWarning now when we concatenate (using pd.concat), saying "Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default". We can pass sort=True to bypass the warning.
Thanks for sharing, much appreciated!
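For anyone wondering what the `sort` argument actually changes, here is a small sketch with two made-up DataFrames whose columns do not line up. As far as I know, passing either `sort=True` or `sort=False` explicitly should silence that warning in pandas 0.23; the two options differ only in how the non-concatenation axis (here, the columns) is ordered:

```python
import pandas as pd

# Made-up frames whose column labels only partly overlap.
df1 = pd.DataFrame({'b': [1, 2], 'a': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'c': [7, 8]})

# sort=False keeps the columns in order of first appearance.
unsorted = pd.concat([df1, df2], sort=False)
print(list(unsorted.columns))

# sort=True sorts the column labels.
sorted_ = pd.concat([df1, df2], sort=True)
print(list(sorted_.columns))
```

In both cases the cells with no matching column are filled with NaN.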
Excellent explanation. I like your teaching style. Thank you.
You're welcome!
I LOVE YOU, U SAVE MY FINAL PROJECT TUT
Awesome! Hope your project went well 👍
Nice work. Now to extend this further consider this question:
How do you assign values to 1) columns, 2) rows and 3) cells?
Here I am hinting at using `apply` and `applymap` for columns and tables, then using the .ix (deprecated), .iloc, .loc, .at, and .iat accessors to assign specific cells.
Example: `df.loc[3, "State"] = "California"`
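To make the three cases concrete, here is a minimal sketch on a made-up table (the column names and values are assumptions for illustration only). It leaves out `.ix`, since that accessor was removed in pandas 1.0:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'City': ['Austin', 'Denver', 'Boise', 'Fresno'],
                   'State': ['Texas', 'Colorado', 'Idaho', 'Unknown']})

# 1) A whole column: assign a new Series (built here with apply).
df['CityUpper'] = df['City'].apply(str.upper)

# 2) A whole row, selected by its index label.
df.loc[2] = ['Salem', 'Oregon', 'SALEM']

# 3) A single cell: .loc and .at take labels, .iloc and .iat take integer positions.
df.loc[3, 'State'] = 'California'
df.at[0, 'State'] = 'TX'
df.iat[1, 1] = 'CO'   # row position 1, column position 1 ('State')

print(df)
```

`.at` and `.iat` only accept a single cell, which is why they are usually preferred over `.loc` and `.iloc` for scalar access.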
Hi! Thank you so much for this video. I'm not much of a 'Pythoner', and I am looking for a way to subtract the trend from a computed time series NumPy array x(t) and process the remainder after the subtraction. I think I need to pack the array into a pandas DataFrame, and your video tells me exactly how to proceed. But can you suggest a pandas function to get the trend of the x values, now packed into the DataFrame, as a function of time t?
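One possible sketch for this, under assumptions that may not match your data: the series below is synthetic, the trend is assumed to be a straight line (fitted with NumPy's `polyfit`, not a pandas function), and a centered rolling mean is shown as a pandas-only alternative. The window size of 11 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# Synthetic series for illustration: a linear trend plus an oscillation.
t = np.arange(100, dtype=float)
x = 0.5 * t + 3.0 + np.sin(t / 5.0)
df = pd.DataFrame({'t': t, 'x': x})

# Option 1: fit a straight line x ~ slope * t + intercept and subtract it.
slope, intercept = np.polyfit(df['t'].to_numpy(), df['x'].to_numpy(), deg=1)
df['trend'] = slope * df['t'] + intercept
df['detrended'] = df['x'] - df['trend']

# Option 2: treat a centered rolling mean as the trend.
# The first and last few rows are NaN because the window is incomplete there.
df['rolling_trend'] = df['x'].rolling(window=11, center=True).mean()
df['rolling_detrended'] = df['x'] - df['rolling_trend']

print(df.head(10))
```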
How can I use a conditional statement in a DataFrame?
can you please make some videos for more advanced topics like stack, melt, multi index, and more advanced time series methods?
Or let's say a complete series for intermediate/advanced users?
Thanks for the suggestion! It's definitely under consideration :)
I just released a video about the pandas MultiIndex: th-cam.com/video/tcRGa2soc-c/w-d-xo.html
You might also like my "best practices with pandas" series: th-cam.com/play/PL5-da3qGB5IBITZj_dYSFqnd_15JgqwA6.html
Can you show how to load JSON data from an API into a pandas DataFrame?
Thank you!
You're welcome!
Great videos!
Thanks!
Great vid!
Thanks!
Thank you so much. ^^
You're welcome 😊
Hi Sir... Hope you are doing well. Love from INDIA
This might be of help to you: www.dataschool.io/launch-your-data-science-career-with-python/
I have a question about the output you get at 2:21. How can I replace the 0, 1, 2 index with a timestamp? I am getting real-time sensor data on a serial port.
It depends on the exact format of the time sensor data. This video might be helpful to you: th-cam.com/video/yCgJGsg0Xa4/w-d-xo.html
Hope that helps!
Is there a way in which we can add a column called grades as a third column and give the grades according to certain criteria, like between 70 and 80 it's a B, between 95 and 102 it's an A, something like that?
Hi! have you found the answer to your question? Cause I'm facing the same problem. Please help and thanks!
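One way this could be done is with `pd.cut`, which assigns each value to a labeled bin. This is only a sketch: the student data is made up, and the grade boundaries below are hypothetical, not the ones from the question, so swap in your own criteria:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'student': ['Ann', 'Bob', 'Cat', 'Dan'],
                   'score': [72, 98, 85, 64]})

# Hypothetical grade boundaries. With right=False each bin includes its
# left edge and excludes its right edge: [0, 60), [60, 70), [70, 80), ...
bins = [0, 60, 70, 80, 90, 101]
labels = ['F', 'D', 'C', 'B', 'A']

# The new column is appended after the existing ones, so it becomes the third column.
df['grades'] = pd.cut(df['score'], bins=bins, labels=labels, right=False)

print(df)
```

Scores that fall outside every bin get NaN, which is a handy way to spot values your criteria do not cover.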
I'm trying to concat two DataFrames with 2 columns of the same name and same type. I used both the concat and append methods and got the same result. The format is correct, though there are NaN values being shown and both of my DataFrames have no missing values.
I'm sorry, it's hard to diagnose this problem without seeing your code and data. If you can post it as a notebook on GitHub Gist, I'd be happy to take a quick look and see if I can spot the problem.
Thanks for this!!
You're very welcome :)
I've watched 30 of your videos so far so thank you so much for the help. Could you make a video about emailing pandas dfs? Is there an easy way to format pandas dfs to a neat table in email? Does this require html knowledge?
I'm not sure the best way to do this, sorry!
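One possible approach, offered as a sketch and not a tested recipe: `DataFrame.to_html` produces an HTML table as a string, which can be attached as the HTML part of a message built with the standard library's `email` package. The addresses and subject are placeholders, and actually sending the message (for example with `smtplib`) is left out because it depends on your mail server:

```python
from email.message import EmailMessage

import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'id': [100, 101], 'color': ['red', 'blue']})

# Render the DataFrame as an HTML <table>; no HTML has to be written by hand.
html_table = df.to_html(index=False)

msg = EmailMessage()
msg['Subject'] = 'Report'                 # placeholder
msg['From'] = 'sender@example.com'        # placeholder
msg['To'] = 'recipient@example.com'       # placeholder

# Plain-text fallback first, then the HTML version as an alternative.
msg.set_content(df.to_string(index=False))
msg.add_alternative(html_table, subtype='html')

print(msg.get_content_type())
```

Some HTML knowledge only becomes necessary if you want to style the table beyond the default output.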
Hey, first of all, thanks for your tips, they helped me so much.
I've got a question for you and I think you can help me.
I've got two DataFrames like this:
Order Company STATUS
1.1 TOTAL OPEN
1.3 CARRIES OPEN
1.4 SPEED FAIL
1.5 TOTAL OPEN
1.6 SPEED IN TRANSIT
1.7 TOTAL OPEN
Order Company STATUS
1.1 TOTAL FINISHED
1.3 CARRIES OPEN
1.4 SPEED FAIL
1.5 TOTAL OPEN
1.6 SPEED FINISHED
1.7 TOTAL FINISHED
I want to filter by company and then update the STATUS, something like a VLOOKUP in Excel, so that it gets written back to the first DataFrame.
I'm trying to do it but I'm kind of lost.
A vlookup is essentially a merge in pandas. I don't have a public video about merging, but I do have a 30-minute lesson video on merging that is available to members of the Data School Insiders community. Click "December 19" on this page if you want to join Data School Insiders and check out the lesson: www.patreon.com/posts/master-list-of-25133912
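As a rough sketch of the merge idea using the two tables from the question: the variable names are made up, the Order values are stored as strings to avoid matching on floats, and the choice to update only the TOTAL company is an assumption about what "filter by company" means here:

```python
import pandas as pd

orders = pd.DataFrame({
    'Order': ['1.1', '1.3', '1.4', '1.5', '1.6', '1.7'],
    'Company': ['TOTAL', 'CARRIES', 'SPEED', 'TOTAL', 'SPEED', 'TOTAL'],
    'STATUS': ['OPEN', 'OPEN', 'FAIL', 'OPEN', 'IN TRANSIT', 'OPEN'],
})
updates = pd.DataFrame({
    'Order': ['1.1', '1.3', '1.4', '1.5', '1.6', '1.7'],
    'Company': ['TOTAL', 'CARRIES', 'SPEED', 'TOTAL', 'SPEED', 'TOTAL'],
    'STATUS': ['FINISHED', 'OPEN', 'FAIL', 'OPEN', 'FINISHED', 'FINISHED'],
})

# Keep only the company of interest in the lookup table.
total_updates = updates.loc[updates['Company'] == 'TOTAL', ['Order', 'STATUS']]

# Left merge: every row of `orders` is kept, and matching rows gain STATUS_new.
merged = orders.merge(total_updates, on='Order', how='left', suffixes=('', '_new'))

# Use the new status where a match was found, otherwise keep the old one.
merged['STATUS'] = merged['STATUS_new'].fillna(merged['STATUS'])
merged = merged.drop(columns='STATUS_new')

print(merged)
```

With this filter, order 1.6 stays IN TRANSIT because it belongs to SPEED, not TOTAL.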
Thank you..
You're welcome!
With a for loop I'm iterating over the rows of a DataFrame for filtering (each row is a Series).
Here is my question:
How can I create a new DataFrame with the filtered rows, in the loop or after the loop?
It's rare that you need to use a for loop with a DataFrame, including for filtering. Instead, you can filter using a condition. Check out this video and let me know if that helps: th-cam.com/video/2AFGPdNn4FM/w-d-xo.html
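A minimal sketch of filtering with a condition and no loop, on made-up data (the column names and thresholds are assumptions for illustration):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd'], 'value': [5, 12, 7, 20]})

# A comparison on a column returns a boolean Series, one True/False per row.
mask = df['value'] > 6

# Indexing with that Series keeps only the rows where it is True.
# .copy() makes the result an independent DataFrame.
filtered = df[mask].copy()

# Conditions can be combined with & (and) and | (or); each needs parentheses.
combined = df[(df['value'] > 6) & (df['name'] != 'd')]

print(filtered)
print(combined)
```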
I watched it again and filled in the missing stones of the wall to solve my new problem :D
Sir, I came across the stack() function on the DataFrame object. Can you please help me understand it?
I don't have a short explanation of that function, but I'll consider covering it in a future video!
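In the meantime, a tiny sketch that may help, on a made-up table: `stack()` moves the column labels into an inner level of the row index, giving one value per (row label, column label) pair, and `unstack()` reverses that:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({'math': [90, 80], 'art': [70, 60]}, index=['Ann', 'Bob'])

# Wide to long: the result is a Series with a two-level MultiIndex.
stacked = df.stack()
print(stacked)

# A single value is looked up with a (row label, column label) tuple.
print(stacked.loc[('Ann', 'math')])

# Long back to wide: the inner index level becomes the columns again.
restored = stacked.unstack()
print(restored)
```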
Hi Kevin, I'm getting an error when concatenating df and s.
I'm using Python 3.6.6 (Conda) and Pandas 0.23.4.
I get an error: ValueError: Shape of passed values is (3, 5), indices imply (3, 3)
My code, after creating both the DF and the Series is: pd.concat([df, s], axis=1, sort=False)
I've been looking over Stack Overflow, and it seemed that in newer versions sort=False would do the trick. Unfortunately, I keep getting the same error as with sort=True or sort=None.
Not including sort=x gets me FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. The ValueError: Shape of passed values is (3, 5), indices imply (3, 3) remains the same.
On the other hand, if I concatenate on axis=0 I get a 5x3 df with 7 NaN values.
Would you mind sharing your thoughts on this?
Many thanks in advance! :)
Are you sure you don't have any typos? I verified the code in that version. You can see all of the code here: nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/pandas.ipynb
Hi,
if my dictionaries have different lengths, I'm getting an error: 'arrays must all be same length'
Actually, I had expected to see NaN in the table for the missing data.
Is there any simple way to do that? :/
This is the one way of handling the problem that I have found for now:
pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
but it doesn't look very nice.
I'm sorry, I don't understand what you mean by "my dictionaries have different lengths". I'm asking because all of the examples I show in the video contain only one dictionary, and you've indicated you're using multiple dictionaries.
Could you share the code sample that is resulting in the error?
Sorry for the missing description of the problem. In your first example, we formed an id-color table, and we used two dictionaries of the same size (each has 3 elements) and passed them to pd.DataFrame():
'id': [101, 102, 103]
'color': ['red', 'blue', 'red']
If some information is missing, for instance the color for id = 102 --> (( 'color': ['red', , 'red'] ))
then pd.DataFrame() returns an error: 'arrays must all be same length'
So do my dictionaries have to be perfect to create a DataFrame? :)
Got it, thanks for clarifying! Here's what you need to do:
import numpy as np
import pandas as pd
pd.DataFrame({'id':[100, 101, 102], 'color':['red', np.nan, 'red']})
'nan' stands for "not a number", and is how you can explicitly denote missing values.
By the way, that is actually a single dictionary of length 2, meaning it has 2 key-value pairs. The dictionary values are lists of length 3. Just wanted to clarify some terminology... hope that helps!
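To put the two approaches from this thread side by side, here is a short sketch. The `ragged` dictionary is a made-up example of lists with different lengths:

```python
import numpy as np
import pandas as pd

# Approach 1: an explicit placeholder, so every list has the same length
# and the missing value sits next to the right id.
df = pd.DataFrame({'id': [100, 101, 102], 'color': ['red', np.nan, 'red']})
print(df.isna().sum())

# Approach 2: lists of different lengths. Wrapping each list in a Series
# lets pandas align them by position and pad the shorter one with NaN.
ragged = {'id': [100, 101, 102], 'color': ['red', 'red']}
df_padded = pd.DataFrame({k: pd.Series(v) for k, v in ragged.items()})
print(df_padded)
```

The padding in the second approach always lands at the end, so it cannot tell you which id is actually missing its color. If that matters, the explicit `np.nan` placeholder is the safer choice.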
Thanks Kevin. Terminology will continue to be a problem for a while,
all freshmen have similar problems :D
Hi,
I wanted to execute the correlated query below in pandas... I have searched many videos on the net but can't find a way to solve this in pandas.
The SQL query below doesn't work with DataFrames, but it works with tables. (Basically, I want to update one column of the first DataFrame with the average values of a column from the second DataFrame, based on some condition.)
Do you have any hints for solving this type of query?
update dataframe1 set
dataframe1.average_x = ( select avg(dataframe2.rank) from dataframe2
where
dataframe1.id=dataframe2.id &
dataframe2.date >= dataframe1.date &
dataframe2.date
I'm sorry, I won't be able to help you with this! Maybe check out resource 5 on this page? www.dataschool.io/best-python-pandas-resources/
What if the index was duplicated in both tables?
I'm not sure I understand your question. However, the index does not have to be unique in a DataFrame.
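A small sketch of what a non-unique index looks like in practice, assuming the question is about stacking two made-up tables that both use the default 0, 1 index:

```python
import pandas as pd

# Two made-up tables that share the same default index labels.
df1 = pd.DataFrame({'value': [1, 2]})
df2 = pd.DataFrame({'value': [3, 4]})

# Concatenating rows keeps the original labels, so they repeat.
stacked = pd.concat([df1, df2])
print(stacked.index.is_unique)

# Selecting a repeated label returns every matching row.
print(stacked.loc[0])

# ignore_index=True discards the old labels and renumbers the rows.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.is_unique)
```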
Could you please explain how to get a nested dictionary (like the one below) into a DataFrame? Thanks in advance!
nested_dict = { 'dictA': {'key_1': 'value_1', 'key_2': 'value_2', 'key_3': 'value_3' },
'dictB': {'key_1': 'value_4', 'key_2': 'value_5', 'key_3': 'value_6'},
..., ..., ...}
I'm sorry, I don't have an immediate answer for you - I would need to play around with some code to figure it out. Good luck!
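One thing that may be worth trying, as a sketch: it uses only the two inner dictionaries spelled out in the question and makes no guess about the elided ones. Passing the nested dictionary straight to `pd.DataFrame` treats the outer keys as columns, while `DataFrame.from_dict` with `orient='index'` treats them as rows:

```python
import pandas as pd

nested_dict = {
    'dictA': {'key_1': 'value_1', 'key_2': 'value_2', 'key_3': 'value_3'},
    'dictB': {'key_1': 'value_4', 'key_2': 'value_5', 'key_3': 'value_6'},
}

# Outer keys become the columns, inner keys become the index.
df_cols = pd.DataFrame(nested_dict)
print(df_cols)

# Outer keys become the rows, inner keys become the columns.
df_rows = pd.DataFrame.from_dict(nested_dict, orient='index')
print(df_rows)
```

Which layout is better depends on whether each inner dictionary describes one record (use rows) or one variable (use columns).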
You rock! You say k (ok) too much though.
Ha! Well, everyone has their own unique way of speaking, and I have mine! :)