Powerful R Functions Every Data Analyst Should Know
- Published on Jul 23, 2024
- In this video I show you how you can quickly calculate summary statistics in R for one or more categorical variables of the diamonds dataset.
The summarize() function can be used to calculate many different results for one continuous variable. group_by() allows you to replicate these results for different levels of a categorical variable. And with pivot_wider() you can calculate a result of interest for the intersections of the levels of two categorical variables. With the geom_tile() function I show you how to produce a visualization of these results.
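The three steps described above could look roughly like the sketch below. It assumes the `diamonds` dataset that ships with ggplot2 and uses `price`, `cut` and `color` as the variables; the choice of statistics and the object names are illustrative and not necessarily the ones used in the video.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # loaded here only because it provides the diamonds dataset

# Several summary statistics for one continuous variable (price)
price_summary <- diamonds %>%
  summarize(
    n            = n(),
    mean_price   = mean(price),
    median_price = median(price),
    sd_price     = sd(price)
  )

# The same kind of statistics, repeated for each level of one categorical variable
price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarize(
    n            = n(),
    mean_price   = mean(price),
    median_price = median(price),
    .groups      = "drop"
  )

# One statistic for every cut/color combination, spread into a wide table:
# one row per cut, one column per color
price_cut_color <- diamonds %>%
  group_by(cut, color) %>%
  summarize(mean_price = mean(price), .groups = "drop") %>%
  pivot_wider(names_from = color, values_from = mean_price)
```

Setting `.groups = "drop"` returns an ungrouped tibble, so later steps are not affected by leftover grouping.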
⏱ Time Stamps ⌚
0:00 - Intro
1:20 - exploring the diamonds data set
3:02 - summarize() for summary statistics of price
5:23 - group_by() for 1 category
6:36 - pivot_wider() for 2 categories
8:30 - dealing with missing data
11:30 - heatmap to visualize the results
Link to heatmap video:
• How to Create Heatmaps...
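As a rough sketch of the heatmap step, the mean price per cut/color combination can be mapped to the fill color of `geom_tile()`. Note that ggplot2 expects the long format (one row per combination), so the summary is plotted before any `pivot_wider()` call. The color scale, labels and object names below are assumptions for illustration, not taken from the video.

```r
library(dplyr)
library(ggplot2)

# Long-format summary: one row per cut/color combination
mean_price_long <- diamonds %>%
  group_by(cut, color) %>%
  summarize(mean_price = mean(price), .groups = "drop")

# One tile per combination, filled according to the mean price
heatmap_plot <- ggplot(mean_price_long,
                       aes(x = color, y = cut, fill = mean_price)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(mean_price)), size = 3) +
  scale_fill_gradient(low = "lightyellow", high = "red") +
  labs(x = "Color", y = "Cut", fill = "Mean price")
```

Calling `print(heatmap_plot)` (or just typing `heatmap_plot` at the console) draws the figure.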
Aggregation and pivoting are super central concepts in analysis. Neat summary!
Your videos are always very useful. Thank you!
Thank you, that's the purpose. Let me know if you want specific topics covered or if you find something that I could have explained better. I always try to get better :)
Amazing video, thanks!
Thank you. I am glad you liked it. I will do more videos on data manipulation, base R and dplyr functions, and data exploration in the near future.
@TheDataDigest I will stay tuned
Awesome! That is a great trick, piping into table() and converting the output into a data frame. I always wanted to use pivot_wider() to get proportions similar to base R prop.table() or a crosstab. I got stuck, but now, where you summarize and get the mean, I could instead compute percentage = n/sum(n)*100 and then proceed to pivot_wider() to get the proportion table. Am I right?
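The idea in this comment can be sketched as follows, assuming the `diamonds` dataset from ggplot2 with `cut` and `color` as the two categorical variables (the object names are illustrative). One detail to watch: the count column `n` should be dropped before `pivot_wider()`, otherwise it is kept as an identifying column and the wide table fills up with `NA`s.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides the diamonds dataset

pct_table <- diamonds %>%
  count(cut, color) %>%                      # n per cut/color combination (ungrouped result)
  mutate(percentage = n / sum(n) * 100) %>%  # share of the overall total
  select(-n) %>%                             # drop n so each cut collapses to one row
  pivot_wider(names_from = color, values_from = percentage)

# Base R equivalent for comparison
base_pct <- prop.table(table(diamonds$cut, diamonds$color)) * 100
```

Because `count()` returns an ungrouped tibble, `sum(n)` here is the grand total, which matches `prop.table()` without a `margin` argument. For row-wise percentages, adding `group_by(cut)` before `mutate()` and `ungroup()` afterwards should give the equivalent of `prop.table(..., margin = 1)`.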
As always I find your tutorials informative and surely commendable.