What do you think? Besides the critiques I give of the jitter plot in the video, what do you see as the pros and cons of these plots?
Really great episode! An alternative to "random" jitter, which can cause overlapping data points, could be ggbeeswarm that might make it a bit easier to look at the distribution of the data. github.com/eclarke/ggbeeswarm
@RasmusKirkegaard Thanks! In the next episode I'll talk about adding some summary layers to the plot and the tradeoff of having way too much stuff going on when you have more than a handful of taxa that you are looking at.
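For anyone who wants to try the beeswarm suggestion above, here is a minimal sketch. It assumes the ggbeeswarm package is installed, and it uses the built-in iris data as a stand-in for the taxa data from the video (the column names and the width/alpha values are illustration only, not the video's code).

```r
library(ggplot2)
library(ggbeeswarm)  # assumption: installed, e.g. install.packages("ggbeeswarm")

# iris is a stand-in dataset; swap in your own grouping and value columns
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
  # geom_quasirandom() offsets points according to local density,
  # so the spread hints at the shape of the distribution
  geom_quasirandom(width = 0.3, alpha = 0.6, show.legend = FALSE) +
  theme_classic()

# print(p) renders the plot
```

Swapping geom_quasirandom() for geom_beeswarm() gives the stricter non-overlapping layout, which may get crowded with many points per group.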
Hi man, thank you for your guide! I've a tip for you to get the median line with a specific color. If you set "fill=variable, color=variable" and then in the stat_summary you add color="red", you keep all the lines and they change colour.
Great - thanks!
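A rough sketch of the colour tip above, as I understand it: map both fill and color to the grouping variable in aes(), then set color="red" as a constant inside stat_summary so the median lines are all red while the points keep their group colours. The iris data and the widths here are assumptions for illustration.

```r
library(ggplot2)

# iris stands in for the data from the video
p <- ggplot(iris, aes(x = Species, y = Sepal.Length,
                      fill = Species, color = Species)) +
  geom_jitter(width = 0.2, alpha = 0.5, show.legend = FALSE) +
  # fun, fun.min and fun.max are all the median, so the crossbar
  # collapses to a single line; the constant color overrides the mapping
  stat_summary(fun = median, fun.min = median, fun.max = median,
               geom = "crossbar", width = 0.5,
               color = "red", show.legend = FALSE)

# print(p) renders the plot
```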
Excellent video
Excellent! I love how jittered strip plots show the actual data points. What if I have millions of data points and don't want to use box plots? Can heat-map-like charts help? Do you have any experience with these?
Eh I’m not a big fan of heat map like things. Perhaps a violin plot? I think a boxplot is really your best bet
How do I add error bars for the standard deviation when using the mean in a crossbar?
You should check out the documentation for the stat_summary function. They have several examples. The default is to use the mean and standard error - ggplot2.tidyverse.org/reference/stat_summary.html
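One possible way to get a mean crossbar with standard deviation error bars is sketched below. The helper mean_sd() is hypothetical (written here for illustration, not part of ggplot2), and iris again stands in for the real data.

```r
library(ggplot2)

# Hypothetical helper: mean plus/minus one standard deviation,
# returned in the y/ymin/ymax shape that fun.data expects
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_jitter(width = 0.2, alpha = 0.3) +
  # crossbar collapsed to a single line at the mean
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.5) +
  # error bars spanning mean - sd to mean + sd
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2)

# print(p) renders the plot
```

If the Hmisc package is installed, ggplot2's mean_sdl with fun.args = list(mult = 1) should give the same interval without a custom helper.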
Nice plot. Did you reject box plots or violin plot because you lose sample size information?
Thanks @russtin1! I covered those types of plots in other episodes. I'm not a fan of violin plots because I think they emphasize the mode rather than the median of the distribution and when you have a pretty uniform distribution, all you get are rectangles. For this much data, I'd prefer to use stat_summary with geom="pointrange" and fun.data=median_hilow. I cover this type of plot (and box plots) in the next episode of the series - th-cam.com/video/7TaGcHsoQpM/w-d-xo.html
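A minimal sketch of the pointrange approach mentioned above, assuming the Hmisc package is installed (median_hilow wraps an Hmisc function and needs it when the plot is drawn). The conf.int value and the iris data are assumptions for illustration, not the settings used in the video.

```r
library(ggplot2)

# iris stands in for the data from the video
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
  # point at the median, range covering the central 50% of the data;
  # without fun.args the default interval is the central 95%
  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5),
               geom = "pointrange", show.legend = FALSE) +
  theme_classic()

# print(p) renders the plot (this is the step that requires Hmisc)
```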
You R my hero!.. hahah got it? (-.-) :D
Hilarious! You’re not a dad are you? Because that was 💯 dad joke material 😂