If you enjoyed 😄, please subscribe and check out my full "Introduction to Seaborn" playlist: th-cam.com/play/PLtPIclEQf-3cG31dxSMZ8KTcDG7zYng1j.html
Just learned it at college today, didn't figure it out in class, but here it helps a lot. Thank you!
Feeling honored because we are getting knowledge from a Ph.D. professional. Thank you so much, dude!
You are most welcome - cheers!
@KimberlyFessel Thanks!
Thank you so much for such a clear and detailed video
You are most welcome - cheers!
God bless you, professor
Thank you!
You look like a scientific version of Emma Watson 😅 love your videos Professor ❤️
Great visualisation, it helps to understand :) I have also just started tutorials but on ML in Python, yours are already very professional, looking forward to what's next.
Thank you -- glad the visuals were helpful!
Subscribed! Please keep going with such videos.
Thanks for the support -- will do!
Thanks for this video. I am confused about how to read the violin plot. Are there some cases where boxplots are better than Violinplots?
Hi Kimberly, this is a really interesting and clear comparison between box and violin plots. I have a question about the whiskers of violin plots. Do they also show the minimum and maximum of the samples excluding the outliers, as the box plot does? Another question is about the upper and lower bound values of the violin plots: do they represent the minimum and maximum values of the samples including the outliers? Thanks
Great questions! By default, the inner "whiskers" of the violin plot are the same ones that you will see for the boxplot (min/max excluding outliers). You can change what's displayed in the center of your violin plot, however, by setting its "inner" argument. The violin plot typically extends beyond what you see for a box plot though, because its borders are KDE plots. The KDE plot is built by adding up little kernels (typically a small Gaussian centered about each point) for your data. Even the points farthest from the center get their own kernels (Gaussians centered at the point), and each of those kernels has tails on both sides, which makes the violin boundaries extend even further than the maximum or minimum values.
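To make the kernel idea concrete, here is a minimal sketch in plain Python of a Gaussian KDE. The data values and the bandwidth of 2.0 are made up for illustration, and this is not seaborn's own implementation; it only shows why the estimated density is still positive past the largest and smallest observations.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(points)
    # Each kernel is a normal density with standard deviation = bandwidth,
    # and the estimate is the average of all the kernels.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

# Hypothetical sample (not from the video's dataset)
data = [18.0, 21.0, 22.5, 24.0, 24.5, 27.0, 31.0]
kde = gaussian_kde(data, bandwidth=2.0)

# The kernels centered on the extreme points spill past the data range,
# so the density is still positive beyond the min and the max.
print(kde(min(data) - 2.0) > 0)
print(kde(max(data) + 2.0) > 0)
```

If you would rather have the violin stop at the observed range, seaborn's violinplot has a "cut" argument; setting cut=0 limits the violin to the range of the data.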
@KimberlyFessel Thanks, Kimberly, excellent explanation!
Thank you, your video helps me a lot
good video
good explanation
please create more videos :)
Thanks very much, and I will definitely create more videos! Any requests?
Please make more videos, with different visualization libraries!
Thanks much! I'm currently working on some matplotlib content. Also considering Plotly and Altair.
thank you
Most welcome!
I really enjoy your videos, informational and to the point explanations!
Glad you are enjoying my videos!
Thank you
also, you look the cutest with your hair like this
Thanks much!
This channel is so underrated. You are worthy of at least 2 million subscribers, to be honest. Keep up the good work
awesome.. such a clear explanation. thank you soooooooooo much!
So cheerful explanation. Thank you, ma'am!
Great explanation mam 😄
thank you so much
I see that the left and right whiskers in the violin plot are not the same length, but in the box plot they are the same length. How?
Typically, the box in both the violin plot and the box plot marks the 25th to 75th percentiles, and the whiskers extend from the box to the most extreme points that are not flagged as outliers; note that in the violin plot these are drawn in black and usually sit inside the violin. The two whiskers do not have to be the same length in either plot, since each one depends on how the data are spread on that side. The violin plot will often extend beyond the whiskers because it shows the entire KDE and does not truncate or show outliers like the box plot does. The left vs right whisker should show the same thing in the violin plot and the box plot, but again, the violin plot whiskers are marked by the black line inside the violin.
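As a rough illustration of how the box and whisker positions can be computed, here is a small sketch using only the Python standard library. It assumes the common convention of whiskers reaching the most extreme points within 1.5 times the interquartile range of the box; the sample values and the helper name box_stats are hypothetical, and plotting libraries may compute quartiles slightly differently.

```python
import statistics

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

# Hypothetical sample with one large value
sample = [10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 19.0, 40.0]
stats = box_stats(sample)
print(stats)

# The two whiskers generally have different lengths
lower_length = stats["q1"] - stats["whisker_low"]
upper_length = stats["whisker_high"] - stats["q3"]
print(lower_length, upper_length)
```

For this sample the lower whisker is longer than the upper one, and the value 40 falls outside the upper fence, so a box plot would draw it as a separate outlier point.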
Thank you Dr. Fessel :) My proteomics figures are going to be fire now!
🔥 Nice!! 🔥
Good, it helps us learn easily... thanks for the videos
Glad you enjoyed!
Hi kimberly, Thanks for this wonderful tutorial. At 7:36, when you say " violin plot is actually doing is first splitting our data into four and six cylinders and then scaling by count", I do not fully understand what is happening here. It would be helpful if someone explains this.
Glad you are enjoying the tutorials! What I meant by that is that seaborn first groups by cylinder, then scales the widths of the violin by the proportion of each origin. In this example, there were 130 4-cylinder cars, roughly 50% from each origin, so the widths of the two components (blue and orange) are roughly the same. There were only 10 6-cylinder cars: 60% from Japan and 40% from Europe, so for the second violin, the blue side is wider, occupying 60% of the total width. Right after this part though, I say this is a bit misleading, because the width didn't tell us that there were 130 4-cylinder cars vs only 10 6-cylinder cars: both violins were normalized to the same total width. Setting scale_hue=False allows us to scale over all the counts in all segments, making the second violin (with only 10 observations) much narrower.
Bro, in Tamil please
It would be better if you used CSV file data to show those graphs.
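Here is a small sketch of the difference between the two kinds of scaling, in plain Python. It assumes that widths are proportional to counts and are normalized by the largest count in whichever set is being compared; that normalization and the helper name relative_widths are assumptions for illustration and not seaborn's actual code. The 65/65 split for 4-cylinder cars is also an assumption standing in for "roughly 50% from each origin".

```python
# Counts per (cylinders, origin); the 4-cylinder split is assumed to be even
counts = {
    ("4", "japan"): 65,
    ("4", "europe"): 65,
    ("6", "japan"): 6,
    ("6", "europe"): 4,
}

def relative_widths(counts, per_group):
    """Width of each half-violin relative to the largest one it is compared to.

    per_group=True  -> compare only within the same cylinder group
    per_group=False -> compare across every segment in the plot
    """
    widths = {}
    for (group, hue), n in counts.items():
        if per_group:
            reference = max(c for (g, _), c in counts.items() if g == group)
        else:
            reference = max(counts.values())
        widths[(group, hue)] = n / reference
    return widths

within = relative_widths(counts, per_group=True)
overall = relative_widths(counts, per_group=False)

for key in counts:
    print(key, round(within[key], 3), round(overall[key], 3))
```

With per-group scaling the 6-cylinder violin looks as wide as the 4-cylinder one and only the 6-to-4 ratio between its halves is visible; with scaling over all segments, its halves shrink to a small fraction of the 4-cylinder widths, which reflects the 10 vs 130 observations.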
Thanks for the feedback!
Thank you! Very good explanation
You're very welcome! Glad you found my explanation useful. 😄
It was awesome, thanks Kimberly
Very glad to hear that - most welcome!
Thanks
Welcome!