Three Important Lessons You can Learn from Anscombe's Datasets

  • Published 8 Jan 2025

Comments • 13

  • @darrylmorgan · 1 year ago · +2

    Really interesting and helpful tutorial. Thank you!

    • @TheDataDigest · 1 year ago

      Thanks for the comment, I am glad you liked it.

  • @trident3652 · 1 year ago · +2

    Again, a great video - thanks for all the hard work!

    • @TheDataDigest · 1 year ago · +1

      Glad you liked it and left a comment. It is hard, but also enjoyable work, especially if others appreciate it. So thanks again for the comment.

  • @Zoyfad · 1 year ago · +2

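    Thanks for the video. There were a few things I did not know.
    BTW, here is a slightly different approach:
    library(tidyverse)
    # `anscombe` ships with base R (datasets package), so no extra data package is needed

    # Standard deviation of every y column
    anscombe %>%
      select(contains("y")) %>%
      summarise_all(~ sd(.x))

    # Lookup table: one description per dataset (placeholder labels)
    dictionary <- tibble(dataset = 1:4, description = paste("description", 1:4))

    # Reshape to one row per (dataset, observation), attach descriptions, average x
    anscombe %>%
      pivot_longer(everything()) %>%
      extract(name, into = c("variable", "dataset"), regex = "(x|y)(\\d)", convert = TRUE) %>%
      left_join(dictionary, by = "dataset") %>%
      mutate(
        id = rep(1:11, each = 8),  # anscombe has 11 rows and 8 columns
        dataset = paste("dataset", dataset)
      ) %>%
      pivot_wider(names_from = variable) %>%
      group_by(description) %>%
      summarise(mean = mean(x))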

    • @TheDataDigest · 1 year ago · +1

      Thanks for the additional code. I had to try it out right away. I was aware of the contains() function within select(), but I haven't used summarise_all() much yet, especially not with an expression like "~sd(.x)". It works quite well, and I really like the dictionary and left_join approach. These techniques are powerful and helpful, and they can replace long case_when() write-ups. I need to learn more about extract() and regular expressions. Glad to have you as a follower, and thanks again for the good comment. I hope others see it as well and learn from the code.
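To illustrate the point about a lookup table replacing a long case_when(), here is a minimal dplyr sketch. It is not code from the video or from the comment above; the table names `lookup` and `obs` and the four description labels are hypothetical, chosen only for illustration. Both approaches attach the same label to each row.

```r
library(dplyr)

# Hypothetical labels for Anscombe's four datasets (illustration only)
lookup <- tibble(
  dataset = 1:4,
  description = c("linear", "curved", "outlier in y", "outlier in x")
)

# A few rows that need a label attached
obs <- tibble(dataset = c(1L, 3L, 4L, 2L))

# Approach 1: hard-code every mapping inside case_when()
via_case_when <- obs %>%
  mutate(description = case_when(
    dataset == 1 ~ "linear",
    dataset == 2 ~ "curved",
    dataset == 3 ~ "outlier in y",
    dataset == 4 ~ "outlier in x"
  ))

# Approach 2: keep the mapping as data and join it on the key column
via_join <- obs %>%
  left_join(lookup, by = "dataset")

via_join
```

With the join, adding or renaming a label means editing a row in `lookup` rather than editing the pipeline itself.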

    • @Zoyfad · 1 year ago · +1

      @TheDataDigest Glad to be here.
      `~` is the tidyverse shorthand for a lambda (anonymous) function in R.
      I use `summarise_all(~mean(is.na(.x)))` to quickly find the proportion of missing values in every column of a data frame.
      Or, if I have a large table with 100 columns where every column containing "pct" or "loc" should be divided by the population column to get a relative value, I use:
      mutate(across(contains(c("pct", "loc")), ~ .x / population))
      You can also use this to turn every numeric column stored as a percentage (e.g. 45) into a proportion (0.45):
      mutate(across(where(is.numeric), ~ .x / 100))
      Example:
      iris %>%
        mutate(across(contains("Sepal"), ~ .x / Petal.Width))
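A runnable sketch of the tricks above, written with across(), which current dplyr documents as the replacement for the superseded summarise_all(). It uses the built-in `airquality` and `iris` data; the `towns` table is hypothetical, made up to show the "pct"/"loc" pattern. The last two lines show that the `~ .x / 100` formula behaves like an ordinary `function(x) x / 100` (in R 4.1 and later, `\(x) x / 100` works too).

```r
library(dplyr)

# Proportion of missing values in every column (airquality ships with base R)
na_share <- airquality %>%
  summarise(across(everything(), ~ mean(is.na(.x))))
na_share

# Hypothetical table: columns whose names contain "pct" or "loc" become per-capita values
towns <- tibble(
  name = c("A", "B"),
  population = c(100, 200),
  crime_pct = c(10, 50),
  loc_count = c(5, 20)
)
per_capita <- towns %>%
  mutate(across(contains(c("pct", "loc")), ~ .x / population))
per_capita

# The ~ formula is shorthand for an anonymous function; both lines give the same result
p1 <- iris %>% mutate(across(where(is.numeric), ~ .x / 100))
p2 <- iris %>% mutate(across(where(is.numeric), function(x) x / 100))
```

Note that `contains()` accepts a character vector and selects the union of the matches, which is why one call covers both "pct" and "loc" columns.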

    • @TheDataDigest · 1 year ago · +1

      @Zoyfad Thanks for clarifying the code even further. I think it would make a very useful video if I could collect time-saving "hacks" like these that experienced R programmers like you have found and use during their analyses. I will probably ask around on Twitter as well, do some online research, and then try to compile a top-10 list. :) We'll see; there are so many topics one could cover...

  • @Iguan059 · 1 year ago · +1

    Great video, thanks a lot !

    • @TheDataDigest · 1 year ago

      Thanks for leaving a comment. I am glad you liked it. 😀

  • @ElGeneral106 · 1 year ago · +1

    Great Video

  • @shuchisingh237 · 4 months ago

    Sir, please explain Anscombe's quartet in a video.

    • @TheDataDigest · 4 months ago

      Hi Shuchi, what do you mean? I thought the video above explained Anscombe's quartet?
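For readers who want to check the quartet's central point themselves, here is a minimal base R sketch (not the code used in the video). It computes, for each of the four datasets in the built-in `anscombe` data frame, the mean and variance of x and y, their correlation, and the least-squares regression line, then draws the four scatter plots. The summaries come out nearly identical, while the plots look very different.

```r
# anscombe ships with base R (datasets package): columns x1..x4 and y1..y4, 11 rows
quartet_stats <- t(sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)  # simple linear regression of y on x
  c(mean_x = mean(x), var_x = var(x),
    mean_y = mean(y), var_y = var(y),
    cor_xy = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]))
}))
rownames(quartet_stats) <- paste("dataset", 1:4)
round(quartet_stats, 2)

# The summaries agree, but the scatter plots do not
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("dataset", i))
  abline(lm(y ~ x), col = "red")  # the same fitted line appears in every panel
}
par(op)
```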