It's rare to find exactly what you're looking for when it comes to Pandas but this was it.
Great to hear! 😄
Life saver video! I had been panicking for 2 days, read many pieces and was still confused. This helped me out! Thank you!
You're very welcome!
The slice(None) trick was what I looked everywhere for, and I found it here. Thank you so much!
You're very welcome! :)
slice(None) instead of : was a very important input. I struggled with it for quite a while before watching this video. Thank you.
You're very welcome!
I couldn't find the Corey Schafer video on MultiIndex so I'm here.
And I'm not disappointed... subscribed!
Welcome to the channel! 👋
24 hours struggling to get the answer, and the answer was just a single line: the *unstack()* method. Thank you so much, I'll subscribe immediately!
Great to hear!
You made this complicated taboo topic so simple to understand. Worth the half hour I spent.
Thanks for your kind words!
Great explanation! Even though I used pandas a lot, I never really understood how multi-indexing worked. Thank you so much for this tutorial, this helped me out a lot!
Glad it was helpful!
Hello from Hungary, Kevin! After a period away from the technicalities of Python, working mostly with Google Analytics and CRO techniques, I'm glad to be back to your videos, playing around in Jupyter. I really appreciate the work you do on Data School, so here is a bit of my honest appreciation: thank you and keep it up! John Ostrowski :)
Thanks so much John! :)
Thank you for sharing your knowledge and experience beautifully. I am new to Python and Pandas, was struggling to understand how to select a subset of the data in a MultiIndex scenario. You made it a piece of cake by putting it all in one video, and by comparing and contrasting between Series and Data Frame.
That's awesome to hear! 🙌
I've been stuck trying to understand this by reading through the documentation for ages. 10 mins and I get it. This vid is great, thanks
Awesome to hear!
Thanks for helping us out with multiindexing in pandas...Cleared my daunting confusions about it.
Happy to help!
19:50 "slice(None)", I never would have figured that out..... thank you!
You're welcome!
Your lesson is sooo simple and clear. I found everything I wanted to know. Thank you
Great to hear!
Thanks for making this multi-index topic so easy for all of us. I understood pivot tables perfectly after this video.
I am having difficulty with the following: I want to select all symbols and calculate the difference between the closing price of every symbol from the 4th to the 5th.
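One way to get this, sketched on a small illustrative stand-in for the video's stocks data (the Close values below are placeholders, and "the 4th to the 5th" is assumed to mean 2016-10-04 and 2016-10-05), is to unstack the Date level so each date becomes a column, then subtract the two columns:

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

# Move the Date level into the columns: one row per symbol, one column per date
close_wide = stocks["Close"].unstack()

# Column arithmetic now covers every symbol at once
change = close_wide["2016-10-05"] - close_wide["2016-10-04"]
print(change)
```

The result is a Series indexed by Symbol, so it extends unchanged to any number of symbols.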
Brilliant tutorial. This is the 2nd of yours that I have watched and it covered what I need in my new job. Thank you very much.
You're very welcome!
You saved my life with slice(None) (19:45). Thank you so much.
You're welcome!
This is really a value adding video for data specialists. Thanks a lot for this brother.
Glad it was helpful!
Finallllly, I understand how the multi-index works. Thanks man.
You're welcome!
Thanks a lot. Including slice(None) was very helpful.
Great to hear - you're very welcome!
As always, great explanation. You are a gifted teacher, so surely well worth the Patreon membership for you.
Thank you, Caroline, both for your kind words and for your support through Data School Insiders! :)
Thanks for the excellent tutorial, I have been confused by this issue for a long time.
Glad it helped!
Extremely informative, thanks for making it so simple - really good job
Thank you! 🙏
This was so cool! I am already thinking of a dozen ways to use multi-indexing in my newbie projects and adding them to my little presentation next week. I owe a lot of my comfort with pandas to you. I hope your holidays went well, Kevin!
Awesome, thanks so much for your kind comments, Adam! 👌
Thank you sir. I will shoot you a donation if/when my quant journey presents some alpha. You are a lifesaver.
Thank you so much!
u r the bestest ever teacher to me.......damn lucid ua guideliness......thanks a tonne sir
Thank you! :)
Thanks a lot for this simplified Explanation 😍
You're very welcome!
pandas is awesome and you are too! I can't believe that pandas is free and this video is free. Thank you.
Thank you so much! 😄
Thank you so much. You are such a talented teacher. Where can I find out how I can structurally learn courses from you? Thank you and greetings!
Thanks! This post might be helpful to you: www.dataschool.io/launch-your-data-science-career-with-python/
Thanks for the video! I have a question. How to access a particular index without naming the index, like say without naming 'AAPL'
Very good explanation of a confusing topic. Thanks you.
Glad it was helpful to you!
Thank you. Nice explanation. how to get max values with the dates
Thank you gazillion, you are the best! I finally understand the functionality of the Groupby method!
Great to hear!
Thank you, this helped me out so much! I have one question however, I hope you can answer for me here:
I have a MultiIndex with ID as the first level and Day as the second. Now I want to select all IDs that have more than 6 days of data. How can I pass this logical statement to the selector?
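A minimal sketch, assuming one row per day so that counting rows per ID counts days (the ID/Day values are hypothetical): count rows on the ID level, keep the IDs above the threshold, then build a boolean mask from the ID level.

```python
import pandas as pd

# Hypothetical data: ID 1 has 8 days, ID 2 has 3 days, ID 3 has 7 days
ids = [1] * 8 + [2] * 3 + [3] * 7
days = list(range(1, 9)) + list(range(1, 4)) + list(range(1, 8))
df = pd.DataFrame({"ID": ids, "Day": days, "value": range(len(ids))}).set_index(["ID", "Day"])

# Number of rows (assumed to equal number of days) per ID, grouped on the outer level
days_per_id = df.groupby(level="ID").size()
keep = days_per_id[days_per_id > 6].index

# Boolean mask on the ID level keeps every row belonging to those IDs
result = df[df.index.get_level_values("ID").isin(keep)]
print(result)
```

If a one-liner is preferred, df.groupby(level="ID").filter(lambda g: len(g) > 6) should return the same rows.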
Would you normally go through all of this? It seems like one can just stick to groupby() without complicating the syntax for selection or changing the shape of the dataframe. In other words, have you encountered multi-indexing often in the field?
Great question! Personally, I try not to use the MultiIndex very much. That being said, it is still useful to be able to work with it when you have to.
Very helpful. I will be back with questions about machine learning.
Great! I've got a free course: courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn
This is the solution I was looking for. Thanks!
You're welcome!
Thanks. This topic had been bugging me but you make it seem really simple.
It is complex, but glad you are feeling good about it now!
thank you so much, finally I could change from a Dataframe to a multiindex
You're very welcome!
Clear explanation with examples. This is exactly what I needed to know. You have yourself a new subscriber.
Awesome! Thanks :)
At about 19:55: when it comes to selecting rows according to values of deeper levels of a MultiIndex, we can also do it this way:
(assumption: 'stocks' has ['Symbol', 'Date'] MultiIndex set up)
selection by one value:
stocks[stocks.index.get_level_values(1) == '2016-10-04']
and selection by values from list:
stocks[stocks.index.get_level_values(1).isin(['2016-10-03', '2016-10-04'])]
and when the MultiIndex levels have their names (like here: 'Symbol', 'Date'), we can use these names in the 'get_level_values' method instead of numbers, e.g.:
stocks.index.get_level_values('Date')
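The approach in this comment, run against a small illustrative stand-in for the video's stocks data (placeholder values, dates stored as strings as in the comment) and using the level name instead of the position:

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

date_level = stocks.index.get_level_values("Date")  # same as get_level_values(1)

# One value on the inner level
one_day = stocks[date_level == "2016-10-04"]

# Several values on the inner level
two_days = stocks[date_level.isin(["2016-10-03", "2016-10-04"])]
print(one_day)
print(two_days)
```

Both results keep the full Symbol/Date MultiIndex.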
Thanks so much for sharing!
@dataschool - and Reuven Lerner, in his video 'Retrieving from a multi-index in Pandas' (at 9 min 35 sec), shows one more way, with the 'xs' method:
stocks.xs('2016-10-04', level='Date')
but I think my way is the most flexible one; for example, we can build logical expressions like this one (it corresponds to Reuven's dataframe):
df[
df.index.get_level_values('Year').isin([1993, 1920, 2010]) &
df.index.get_level_values('Sport').isin(['Archery', 'Judo']) &
(df.Age < 24.0)
]
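A small comparison of the two approaches on an illustrative stand-in for the stocks data (placeholder values). One difference worth knowing: xs drops the selected level by default, while the boolean mask keeps the full MultiIndex; passing drop_level=False to xs keeps it as well. The mask can also be combined with a condition on an ordinary column, as in the comment above.

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

# xs: cross-section on the Date level; the Date level is dropped from the result
via_xs = stocks.xs("2016-10-04", level="Date")

# xs keeping the Date level, for a like-for-like comparison with the mask
via_xs_kept = stocks.xs("2016-10-04", level="Date", drop_level=False)

# Boolean mask on the Date level (keeps both levels)
via_mask = stocks[stocks.index.get_level_values("Date") == "2016-10-04"]

# Masks combine with & (parenthesize comparisons), including column conditions
combined = stocks[
    stocks.index.get_level_values("Symbol").isin(["AAPL", "MSFT"])
    & (stocks["Close"] > 100)
]
print(via_xs)
print(combined)
```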
Perfect video for what I needed. Thanks!
You're welcome!
The most helpful video I've ever seen about Python. Thanks a lot!
Awesome! Glad it was helpful to you :)
Thank you so much. 'unstack()' has alleviated my headache.
You're welcome!
u make things very simple....easy to understand ....thanks man
You're welcome!
I am trying to merge from a pivot table, and although it succeeds, it gives a warning:
merging between different levels can give an unintended result (1 level on the left,2 on the right)
warnings.warn(msg, UserWarning)
I checked whether it is caused by the multi-indexing, and it isn't. I'm not sure how to resolve it.
Thanks so much for this video. Really great to see a range of options with such clear advice.
You're very welcome!
man thank you soooooooooooooo much, I had been stuck on one thing for hours and finally figured it out!! !
You're so very welcome!
Well explained. A video about data visualization with MultiIndex data structures would also be great.
Nice video, anyway I would have liked a final section in which you could have explained the usage of logic over the selection on multi-indexes. Eg: what if I wanted to select data for each symbol and for each date except one? Is it possible to use the same logic as .loc with simple indexes? Thanks :)
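A sketch of two ways to select every symbol for every date except one, on an illustrative stand-in for the stocks data (placeholder values): invert the condition on the Date level, or drop that label from the Date level only.

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

# Option 1: boolean mask with != on the Date level (also usable as stocks.loc[mask])
mask = stocks.index.get_level_values("Date") != "2016-10-04"
all_but_one = stocks.loc[mask]

# Option 2: drop the label from the Date level only
dropped = stocks.drop("2016-10-04", level="Date")
print(all_but_one)
```

The mask version is the same logic as with a simple index, just applied to one level of the MultiIndex.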
Thanks for your suggestion!
Great content man! Exactly what I needed!
Great to hear! 🙌
Thanks a lot. Very useful. Can you teach how to use multiple groupby in a single expression?
Excellent - yes a few 💡 moments indeed. Very good
Great to hear!
Thank you for sharing your knowledge!
My pleasure!
Why slice(None)... omg. Thank you for your amazing video! Subscribed
Thanks!
Thank you for sharing this information. I would like to ask if I wanted to get this to do portfolio optimization so calculating expected returns and covariance on multi-index dataframe how would you set this up?
Great video! Is there a way to multi-index the columns so there is an outer column and an inner column? And if so, how do you sort that? Because I know sort_column is not a callable function on the dataframe.
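A sketch with a hypothetical frame: pd.MultiIndex.from_tuples builds two-level columns (an outer "metric" and an inner "symbol"; both names are made up here), and there is no sort_column method, but sort_index(axis=1) sorts the columns, optionally by a chosen level.

```python
import pandas as pd

# Hypothetical two-level columns: outer level "metric", inner level "symbol"
columns = pd.MultiIndex.from_tuples(
    [("Volume", "MSFT"), ("Close", "MSFT"), ("Close", "AAPL"), ("Volume", "AAPL")],
    names=["metric", "symbol"],
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# axis=1 sorts the columns: outer level first, then inner
by_both = df.sort_index(axis=1)

# level= sorts by one level first; the remaining levels are sorted after it by default
by_inner = df.sort_index(axis=1, level="symbol")
print(by_both)
print(by_inner)
```

Reshaping with unstack(), as in the video, is another common way to end up with MultiIndex columns; the same sort_index(axis=1) call applies.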
Thanks a lot, really a good series for pandas and easy to learn.
I would request if you can make a video on Iteration in pandas like iterrows, itertuples and iteritems.
Many Thanks In advance.
I think I cover it in this video: th-cam.com/video/B-r9VuK80dk/w-d-xo.html
Excellent Tutorial ! Thank You!
You're welcome!
Thank you ! Your video is very helpful!
You're welcome!
Thanks for additional topics. Thanks for your time.
You're welcome!
Hi Mark! Great video on a topic as syntax-confusing as MultiIndex. Thank you very much!
You're very welcome!
I learned something from your video 👌 but I have some doubts. Kindly explain: for example, we are having issues with our computers on different dates. We need to filter the output (O/P) by computer type, computer model, Windows 10/7/8, and the dates, in sequence, on which each fault occurred.
it's so helpful!!! thank you so much!
You're so welcome!
Nice presentation Kevin well done. How about accessing a range of dates for say apple, date:date ?
I'm not sure if that can be done using the colon, I'd have to check...
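For what it's worth, a date range for one symbol can be written either with slice() inside the row tuple or, after selecting the outer level, with an ordinary colon. Sketch on an illustrative stand-in for the stocks data (placeholder values); label slicing like this expects a sorted index, hence sort_index():

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

# slice() with real bounds on the inner level (both endpoints are included)
aapl_a = stocks.loc[("AAPL", slice("2016-10-03", "2016-10-04")), :]

# Or select the outer level first; the remaining Date index accepts a plain colon range
aapl_b = stocks.loc["AAPL"].loc["2016-10-03":"2016-10-04"]
print(aapl_a)
print(aapl_b)
```

The first form keeps the Symbol level; the second drops it because "AAPL" was selected on its own.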
15:06 indeed… you are an eye-opener… Also, I'm wondering: can we update a cell based on a condition on a MultiIndex value?
unstack() + reset_index() for the win!
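A sketch of that combination on an illustrative stand-in for the stocks data (placeholder values): unstack() moves the inner Date level into the columns, and reset_index() turns the remaining Symbol index into an ordinary column, leaving a flat table.

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

# Symbol rows x Date columns
wide = stocks["Close"].unstack()

# Symbol becomes a regular column, so no MultiIndex remains
flat = wide.reset_index()
print(flat)
```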
Great explanation, very clear, thank you!
Thanks!
As clear as day now, thank you
You're welcome!
Thanks for the video, I found it very useful and enjoyable to watch :)
Great to hear!
Very good explanation of MultiIndexes in this video. Could you please explain why we need multiple levels of columns and multiple levels of row labels, and also the droplevel option?
The short answer is that you should use a MultiIndex if it helps you to represent the structure of the data.
How can I use range selection (:) when the inner and outer row labels are inside tuples? I used slice, which probably doesn't allow range selection. Thank you!
Sorry, it's hard for me to say off-hand, good luck!
Thanks for the video, but I have a question: how can I merge columns and indexes so that there are no blank spaces in the headers in my HTML view?
Thanks for the thorough explanation. Very helpful.
Great to hear!
Hoping a new version of pandas will remove the need for slice(None) and accept : instead. This is the kind of complexity we can spend a lot of time on before finding the solution.
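Worth noting: pandas already provides pd.IndexSlice, which lets you write : (and label ranges) inside a MultiIndex selection in place of slice(None). A sketch on an illustrative stand-in for the stocks data (placeholder values), with the index sorted because label slicing expects that:

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

idx = pd.IndexSlice

# idx[:, [...]] plays the role of (slice(None), [...])
two_dates = stocks.loc[idx[:, ["2016-10-03", "2016-10-04"]], :]

# A colon range on the inner level works as well
date_range = stocks.loc[idx[:, "2016-10-03":"2016-10-04"], :]
print(two_dates)
```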
Extremely clear, thank you
You're welcome!
Thank you. I have learned so much from you already...
Great to hear!
Exactly what I was looking for. Thanks :)
You're welcome!
This is awesome, thanks for making this Video Kevin.
You are very welcome!
Top shelf content right here
Thank you!
Thanks Kevin. That's a brilliant explanation.
Awesome! Thanks for your kind words :)
Hi, thanks for the video, I learned a lot. I'm currently struggling with the concat, join and merge functions in pandas; do you have any videos to help me understand them better? Thanks
I cover concat in this video: th-cam.com/video/15q-is8P_H4/w-d-xo.html
But you should watch this video first: th-cam.com/video/OYZNk7Z9s6I/w-d-xo.html
Thanks man! You helped me a lot!
Great to hear!
Thank you for the videos! I have an additional question on the slice(None) object you made. Let's say you have 100 more symbols and I don't want all of the symbols but a specific slice of them (for example, all symbols after "CSCO"). How would I do that?
stock.loc[("CSCO": , ['2016-10-03', '2016-10-04']), : ] does not work :(
Slice(some)
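A sketch of what "slice(some)" presumably means: give slice() real bounds instead of None. slice("CSCO", None) selects from "CSCO" through the last symbol (label slices include the start label), assuming the index is sorted. Illustrative placeholder data again:

```python
import pandas as pd

# Illustrative stand-in for the video's stocks data (placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
}).set_index(["Symbol", "Date"]).sort_index()

# Outer level: from "CSCO" to the end; inner level: two specific dates
after_csco = stocks.loc[(slice("CSCO", None), ["2016-10-03", "2016-10-04"]), :]
print(after_csco)
```

To exclude "CSCO" itself, one option is to drop it afterwards, e.g. after_csco.drop("CSCO", level="Symbol").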
Great video and many thanks. Greetings from Amsterdam
Thanks for watching!
hello Kevin, great video, got 2 questions for you.
data looks like this
"date","time","open","high","low","close"
04/21/82,10:01,528.55,528.55,528.45,528.45
04/21/82,10:03,528.40,528.45,528.40,528.45
04/21/82,10:04,528.40,528.40,528.35,528.40
04/21/82,10:06,528.45,528.45,528.40,528.40
I am trying to do a daily HIGH/LOW study on 1-min S&P futures data.
I have 2 questions.
1) How could I print the day HIGH / LOW for each day
2) How could I print the day HIGH / LOW for the first hour of trading
I've tried the following:
df.groupby(['date']).high.max()
and got this
date
1982-04-21 529.80
1982-04-22 530.60
1982-04-23 531.95
1982-04-26 532.80
I am kind of stuck here. I want to have the output displayed all in 2 lines day by day like the following.
Date, Time, High
Date, Time, Low
Could you please help ? thanks a lot in advance. I appreciate your time.
data type
date datetime64[ns]
time datetime64[ns]
open float64
high float64
low float64
close float64
dtype: object
ptcm2011@gmail.com
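A possible starting point, offered as a sketch: group by date and use idxmax/idxmin to find the row holding each day's extreme, which keeps that row's time next to the price. This uses only the four rows shown in the comment, keeps time as text (the comment's data uses datetime64), and the 11:00 first-hour cutoff is an assumption about when the session opens.

```python
import io
import pandas as pd

# The four sample rows from the comment; a real file would cover many days
csv_text = """"date","time","open","high","low","close"
04/21/82,10:01,528.55,528.55,528.45,528.45
04/21/82,10:03,528.40,528.45,528.40,528.45
04/21/82,10:04,528.40,528.40,528.35,528.40
04/21/82,10:06,528.45,528.45,528.40,528.40
"""
df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")


def daily_extremes(frame):
    # idxmax/idxmin give the row label of the first max/min within each date
    hi_labels = frame.groupby("date")["high"].idxmax().to_numpy()
    lo_labels = frame.groupby("date")["low"].idxmin().to_numpy()
    highs = frame.loc[hi_labels, ["date", "time", "high"]]
    lows = frame.loc[lo_labels, ["date", "time", "low"]]
    return highs, lows


highs, lows = daily_extremes(df)

# Assumed first-hour cutoff; zero-padded "HH:MM" strings compare correctly as text
first_hour_highs, first_hour_lows = daily_extremes(df[df["time"] < "11:00"])
print(highs)
print(lows)
```

Each day then yields one "date, time, high" row and one "date, time, low" row, which matches the two-line layout asked for.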
Nice work! ...but why don't you add this video to the 'Data analysis in Python with pandas' playlist? Is this a separate topic?
Great idea! I have added it there as video 31: th-cam.com/play/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y.html
Thanks. Any advice on how to plot a MultiIndex?
Thanks. Do you know how to change the background color in dataframe.plot graphs ( the background on which there are: labels and graph name). In subplots you can use 'facecolor' param, but it doesn't work in plots generated from multiindex dataframes.
Great videos! May I request that you show how to use .iloc with a MultiIndex?
Thanks for your suggestion!
Thank you very much for this explanation ❤️
You're welcome 😊
great tutorial
Thank you!
MultiIndex is very similar to the functionality of the xarray package. And the pandas creators recommend using xarray for multiple dimensions, especially >3. Does anyone have experience with pandas MultiIndex vs xarray? Which would be the better choice?
I haven't used xarray, sorry!
Very useful as always. Many thanks!
My pleasure!
Thanks. How to filter multiindex?
Thanks. The aggregation function of the pivot table can be specified as follows:
df = stocks.pivot_table(values='Close', index='Symbol', columns='Date', aggfunc=min)
df = stocks.pivot_table(values='Close', index='Symbol', columns='Date', aggfunc=max)
df = stocks.pivot_table(values='Close', index='Symbol', columns='Date', aggfunc='mean') # the need for the quotes confused me
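On the quotes: min and max work unquoted because they are Python built-ins, whereas mean is not a built-in name, so it has to be given as the string 'mean' (or as a function object such as numpy.mean). Recent pandas versions also emit a FutureWarning when built-in min/max are passed and suggest the string forms, so strings are the safer habit. Strings can also be combined in a list, which produces MultiIndex columns. Sketch on illustrative placeholder data in long format:

```python
import pandas as pd

# Illustrative long-format data (one row per Symbol/Date, placeholder Close values)
stocks = pd.DataFrame({
    "Symbol": ["AAPL"] * 3 + ["CSCO"] * 3 + ["MSFT"] * 3,
    "Date": ["2016-10-03", "2016-10-04", "2016-10-05"] * 3,
    "Close": [112.52, 113.00, 113.05, 31.50, 31.35, 31.59, 57.42, 57.24, 57.64],
})

# A list of aggfunc strings gives two-level columns: (function name, value column)
summary = stocks.pivot_table(values="Close", index="Symbol", aggfunc=["min", "max", "mean"])
print(summary)
```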
Thanks for sharing!
thanks! can u make a plot with multiindex?
What's the difference between pivot() and pivot_table()? I tried both and they worked fine.
pivot_table is used when you have multiple values for each row and column combination. Those multiple values can be aggregated (e.g. mean, median). pivot is used when you have only one value for each row and column combination. The default aggfunc for pivot_table is mean, and if you have one value per row and column combination, both pivot_table and pivot return the same result.
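A small demonstration of that difference, with hypothetical data in which one Symbol/Date pair appears twice: pivot_table averages the duplicates (default aggfunc "mean"), while pivot refuses to reshape and raises a ValueError.

```python
import pandas as pd

# Hypothetical long data: ("AAPL", "2016-10-03") appears twice
prices = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "AAPL", "MSFT"],
    "Date": ["2016-10-03", "2016-10-03", "2016-10-04", "2016-10-03"],
    "Close": [112.0, 114.0, 113.0, 57.0],
})

# pivot_table aggregates duplicates (mean by default); missing combinations become NaN
table = prices.pivot_table(values="Close", index="Symbol", columns="Date")

# pivot needs unique index/column pairs, so it fails here
try:
    prices.pivot(index="Symbol", columns="Date", values="Close")
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(table)
print("pivot raised ValueError:", pivot_failed)
```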
@funnybunnysunny thank you very much for the explanation, sir
5:30 - How to combine 2 multiindex columns in one but in different rows?
So first row will be AAPL(combined) - Close - Volume. Second row AAPL 2016-10-03 - Close Volume. Third row AAPL 2016-10-04 - Close - Volume and so on???
Hi, thanks for the great videos. I have a question: I have many CSV files in one folder, each file holding the data frame for one stock symbol (APPL.csv, BA.csv, CSCO.csv, MSFT.csv, ...), but I only want to pick a few of them, not all. Calling pd.read_csv one by one is slow and manual, so I would like to use a for loop, but I still haven't found the way. Would you please help me? Thank you
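One way to do this with a loop (here a dict comprehension), sketched with hypothetical file names and contents written to a temporary folder so the example runs on its own; load_symbols is a made-up helper name. pd.concat with a dict uses the keys as an outer Symbol level, which ties back to the MultiIndex topic:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_symbols(folder, symbols):
    """Hypothetical helper: read only the CSVs for the given symbols and stack them."""
    frames = {sym: pd.read_csv(Path(folder) / f"{sym}.csv") for sym in symbols}
    # Dict keys become the outer index level, named "Symbol"
    return pd.concat(frames, names=["Symbol", "row"])


# Demo with throwaway files (hypothetical contents)
with tempfile.TemporaryDirectory() as folder:
    for sym, close in [("AAPL", 112.52), ("BA", 131.0), ("CSCO", 31.50), ("MSFT", 57.42)]:
        pd.DataFrame({"Date": ["2016-10-03"], "Close": [close]}).to_csv(
            Path(folder) / f"{sym}.csv", index=False
        )
    picked = load_symbols(folder, ["AAPL", "MSFT"])

print(picked)
```

Only the listed files are opened, so the cost scales with the number of symbols you pick rather than the size of the folder.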