Powerful R Functions Every Data Analyst Should Know

  • Published on Jul 23, 2024
  • In this video I show you how you can quickly calculate summary statistics in R for one or more categorical variables of the diamonds dataset.
    The summarize() function can be used to calculate many different results for one continuous variable. group_by() allows you to replicate these results for different levels of a categorical variable. And with pivot_wider() you can calculate a result of interest for the intersections of the levels of two categorical variables. With the geom_tile() function I show you how to produce a visualization of these results.
    ⏱ Time Stamps ⌚
    0:00 - Intro
    1:20 - exploring the diamonds data set
    3:02 - summarize() for summary statistics of price
    5:23 - group_by() for 1 category
    6:36 - pivot_wider() for 2 categories
    8:30 - dealing with missing data
    11:30 - heatmap to visualize the results
    Link to heatmap video:
    • How to Create Heatmaps...
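
    The workflow described above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: it assumes the dplyr, tidyr, and ggplot2 packages are installed, and the choice of cut and color as the two categorical variables, as well as all object and column names (price_summary, mean_long, mean_price, and so on), are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides the diamonds dataset and geom_tile()

# Several summary statistics for one continuous variable (price)
price_summary <- diamonds %>%
  summarize(
    n            = n(),
    mean_price   = mean(price),
    median_price = median(price),
    sd_price     = sd(price)
  )

# The same kind of statistics for each level of one categorical variable.
# (cut is an assumed choice; any categorical column would work.)
price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarize(
    n            = n(),
    mean_price   = mean(price),
    median_price = median(price),
    .groups      = "drop"
  )

# One statistic at the intersections of two categorical variables (long format)
mean_long <- diamonds %>%
  group_by(cut, color) %>%
  summarize(mean_price = mean(price), .groups = "drop")

# Spread the levels of color into columns: one row per cut, one column per color
mean_wide <- mean_long %>%
  pivot_wider(names_from = color, values_from = mean_price)

# Heatmap of the long-format result: one tile per cut x color combination
heatmap_plot <- ggplot(mean_long, aes(x = color, y = cut, fill = mean_price)) +
  geom_tile()
```

    Note that geom_tile() works on the long-format table (mean_long), while pivot_wider() produces the wide table that is easier to read as a cross-tabulation.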

Comments • 7

  • @floriansitte-kratzsch7355 1 year ago +2

    Aggregation and pivoting are super central concepts in analysis. Neat summary!

  • @forrestoakley4882 1 year ago +2

    Your videos are always very useful. Thank you!

    • @TheDataDigest 1 year ago +1

      Thank you, that's the purpose. Let me know if you want specific topics covered or if you find something that I could have explained better. I always try to get better :)

  • @luisacuna1729 1 year ago +4

    Amazing video, thanks!

    • @TheDataDigest 1 year ago +2

      Thank you. I am glad you liked it. I will do more videos on data manipulation, base R and dplyr functions, and data exploration in the near future.

    • @luisacuna1729 1 year ago +2

      @TheDataDigest I will stay tuned

  • @osoriomatucurane9511 1 year ago +1

    Awesome! That is a great trick, piping into table() and converting the output into a data frame. I always wanted to use pivot_wider() to get proportions similar to base R prop.table() or a crosstab. I got stuck, but now I see that where you summarize and get the mean, I could instead compute percentage = n/sum(n)*100 and then proceed to pivot_wider() to get the proportion table. Am I right?
    As always, I find your tutorials informative and commendable.
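
    The idea in this comment can be sketched as follows. This is a hedged sketch under assumptions, not code from the video: it uses dplyr's count() to obtain n, picks cut and color as the two categorical variables, and the names prop_wide, prop_base, and percentage are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # loaded only for the diamonds dataset

# Count rows per cut x color cell, convert to a percentage of all rows,
# then spread color into columns to get a proportion table
prop_wide <- diamonds %>%
  count(cut, color) %>%                       # returns an ungrouped table with column n
  mutate(percentage = n / sum(n) * 100) %>%   # sum(n) is the grand total here
  select(-n) %>%
  pivot_wider(names_from = color, values_from = percentage)

# Base R counterpart for comparison
prop_base <- prop.table(table(diamonds$cut, diamonds$color)) * 100
```

    Because count() returns an ungrouped result, sum(n) is the grand total and the cells add up to 100. For row-wise percentages (comparable to prop.table(..., margin = 1)), one option would be to add group_by(cut) before the mutate() step.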