Tidy Data and tidyr -- Pt 2 Intro to Data Wrangling with R and the Tidyverse

  • Published on Oct 15, 2024
  • Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. Keep your code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
    tidyr.tidyverse...
    ----------------
    Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup • What is data wrangling...
    /01:44 Intro and what’s covered
    Ground Rules
    /02:40 What’s a tibble
    /04:50 Use View
    /05:25 The pipe operator: `%>%`
    /07:20 What do I mean by data wrangling?
    Pt. 2: Tidy Data and tidyr • Tidy Data and tidyr --...
    00:48 Goal 1 Making your data suitable for R
    01:40 `tidyr` “Tidy” Data introduced and motivated
    08:10 `tidyr::gather`
    12:30 `tidyr::spread`
    15:23 `tidyr::unite`
    15:23 `tidyr::separate`
    Pt. 3: Data manipulation tools: `dplyr` • Data Manipulation Tool...
    00:40 Setup
    /02:00 `dplyr::select`
    /03:40 `dplyr::filter`
    /05:05 `dplyr::mutate`
    /07:05 `dplyr::summarise`
    /08:30 `dplyr::arrange`
    /09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
    /11:45 `dplyr::group_by`
    /15:00 `dplyr::group_by`
    Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins • Working with Two Datas...
    Combining two datasets together
    /00:42 `dplyr::bind_cols`
    /01:27 `dplyr::bind_rows`
    /01:42 Set operations
    `dplyr::union`, `dplyr::intersect`, `dplyr::setdiff`
    /02:15 joining data
    `dplyr::left_join`, `dplyr::inner_join`, `dplyr::right_join`, `dplyr::full_join`
    ______________________________________________________________
    Cheatsheets: www.rstudio.co...
    Documentation:
    `tidyr` docs: tidyr.tidyverse.org/reference/
    `tidyr` vignette: cran.r-project...
    `dplyr` docs: dplyr.tidyverse...
    `dplyr` one-table vignette: cran.r-project...
    `dplyr` two-table (join operations) vignette: cran.r-project...
    ______________________________________________________________

Comments • 29

  • @MikeDolanFliss 4 years ago +8

    Looking forward to an update on this for the new pivot_longer() and pivot_wider() grammar!

  • @chamodhperera2485 4 years ago +3

    Can't download EDAWR from github.
    Error: Failed to install 'EDAWR' from GitHub:
    (converted from warning) cannot remove prior installation of package ‘backports’

  • @eliebordron5599 4 years ago

    I learned just so much by watching this. I regret I wasn't able to download the datasets, I don't know if it's me or the venerable age of the video

  • @My-NaMeS_jEfF 3 years ago

    Do we really want to pivot_wider pollution?
    pollution %>%
    ggplot(aes(city, amount, group = size))+
    geom_bar(aes(fill = size), stat = 'identity', position = 'dodge')

  • @comditek4264 5 years ago +2

    Great video! nicely explained and well delivered with graphics!

  • @johnsonmshiu4837 4 years ago +1

    Hi, is it possible to use function separate() to separate more than one column using the pipe operator or any other method? thanks
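One way to approach this question is to chain one `separate()` call per column with the pipe. The sketch below is only illustrative: the `visits` table, its column names, and its values are hypothetical and do not come from the video.

```r
library(tidyr)
library(dplyr)  # loaded here for the %>% pipe

# Hypothetical data: two columns that each pack several values into one string
visits <- tibble::tibble(
  id     = 1:2,
  date   = c("2019-01-15", "2020-06-30"),
  region = c("US_east", "FR_north")
)

# Each separate() call splits one column; the pipe chains the calls
tidy_visits <- visits %>%
  separate(date, into = c("year", "month", "day"), sep = "-") %>%
  separate(region, into = c("country", "area"), sep = "_")

print(tidy_visits)
```

Each call handles a single source column, so splitting more columns means adding one more step to the chain.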

  • @tekoeko 3 years ago

    So are gather and spread replaced by pivot_longer and pivot_wider?
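For context: `gather()` and `spread()` still work, but the tidyr documentation marks them as superseded by `pivot_longer()` and `pivot_wider()`, which arrived in tidyr 1.0.0. A minimal sketch of the newer grammar follows; the table name and values are made up for illustration and are not the video's data.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year
cases_wide <- tibble::tibble(
  country = c("A", "B"),
  `2011`  = c(10, 20),
  `2012`  = c(11, 21)
)

# Newer counterpart of gather(): year columns become (year, n) pairs
cases_long <- pivot_longer(cases_wide, cols = -country,
                           names_to = "year", values_to = "n")

# Newer counterpart of spread(): (year, n) pairs go back to one column per year
cases_back <- pivot_wider(cases_long, names_from = year, values_from = n)

print(cases_long)
print(cases_back)
```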

  • @shibukalidhasan5815 6 years ago +1

    Nicely presented - short and succinct

  • @SensiStarToaster 4 years ago +1

    Out of date! Please post update with *pivot_* functions, scoped variables and something on non-standard evaluation pleeeeze....

  • @eyadha1 2 years ago

    thank you very much for this helpful video

  • @tamal_sen 6 years ago +2

    @Garret : Please advise how to import the data sets? I have installed "devtools" package, but unable to install package "EDAWR". Looking for your help. thank you .

    • @williambiggs2308 5 years ago +1

      package ‘EDAWR’ is not available (for R version 3.4.1)

    • @VercingetoR3x 5 years ago

      @williambiggs2308 Using anaconda, how does one create an environment with an older version of base-r (3.5.1)? Is base-r 3.4.1 needed to access the EDAWR package?

    • @FancyTreer032 5 years ago +1

      Hi, you can create them yourself.
      country

    • @amendez721 4 years ago

      @FancyTreer032 thank you very much!
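When EDAWR will not install, building a small stand-in table by hand is one workaround, as this thread suggests. The sketch below assumes a wide country-by-year layout; the object name `cases`, the country labels, and every value are invented placeholders, not the contents of the EDAWR package.

```r
library(tibble)

# Stand-in table typed in by hand; all values are invented placeholders
cases <- tribble(
  ~country, ~`2011`, ~`2012`, ~`2013`,
  "A",          100,     110,     120,
  "B",          200,     210,     220,
  "C",          300,     310,     320
)

print(cases)
```

A table like this is enough to follow along with the reshaping steps, even though the numbers will not match the ones shown on screen.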

  • @AkshayRasal10 4 years ago

    Great Video - Well explained and Easy to understand

  • @sadiafarzana6208 4 years ago

    Excellent presentation

  • @YouTube_ZMS 5 years ago

    I would prefer more coding examples. 8 minutes in before tidyr package is even introduced. Lets goooooooo

  • @ゴリラ-w3h 5 years ago +2

    Very easy to understand!

  • @kvs123100 3 years ago

    12:23 you gave life to me!

  • @InsiderMiner 3 years ago

    Looking at your first use of gather, it seems that you have not properly assessed what an observation is, have you? I would think an observation here would best be defined as a country. Then, the columns, should be country name, count for 2011,count for 2012,count for 2013, shouldn't it? The way you have it, Country, Year, N; what are the observational units? A year-country? Why not make it a country, as I have suggested?
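One way to weigh the two layouts discussed here is to treat year as a variable, so that each country-year pair is a row. The same grouping code can then summarise by either dimension. This is a sketch under that assumption, with invented values and hypothetical object names; the one-row-per-country layout may suit other tasks better.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table with one row per country (values invented)
cases_wide <- tibble::tibble(
  country = c("A", "B"),
  `2011`  = c(10, 20),
  `2012`  = c(12, 24)
)

# Long form: one row per country-year, with year as an ordinary column
cases_long <- pivot_longer(cases_wide, cols = -country,
                           names_to = "year", values_to = "n")

# The same group_by/summarise pattern now works for either variable
by_year    <- cases_long %>% group_by(year) %>% summarise(total = sum(n))
by_country <- cases_long %>% group_by(country) %>% summarise(total = sum(n))

print(by_year)
print(by_country)
```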

  • @vincenzo4259 2 years ago

    Thanks

  • @musicspinner 3 years ago

    very helpful

  • @InsiderMiner 3 years ago +1

    nice presentation but the audio is pretty poor. the concept of observation is key

  • @jamespaz4333 3 years ago

    He looks like Marty Mcfly Senior!!!