Going deeper with dplyr: New features in 0.3 and 0.4 (tutorial)

  • Published on Oct 15, 2024

Comments • 93

  • @JohnMKaya-lm1ry 4 years ago +2

    Thank you very much for the great tutorial!
    By the way, slice function has new features. This is from the dplyr version history:
    slice() gains a new set of helpers:
    slice_head() and slice_tail() select the first and last rows, like
    head() and tail(), but return n rows per group.
    slice_sample() randomly selects rows, taking over from sample_frac()
    and sample_n().
    slice_min() and slice_max() select the rows with the minimum or
    maximum values of a variable, taking over from the confusing top_n().
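
The helpers listed above can be tried on a small grouped table. This is only a sketch: it assumes dplyr 1.0.0 or later, and the `scores` table and its values are made up for illustration.

```r
library(dplyr)

# Toy data: two groups of three rows each (hypothetical values).
scores <- tibble(
  team  = c("a", "a", "a", "b", "b", "b"),
  score = c(3, 9, 5, 7, 1, 8)
)

by_team <- scores %>% group_by(team)

# First row of each group, like head() applied per group.
first_rows <- by_team %>% slice_head(n = 1)

# Last row of each group, like tail() applied per group.
last_rows <- by_team %>% slice_tail(n = 1)

# Row with the largest score in each group (the job top_n() used to do).
best_rows <- by_team %>% slice_max(score, n = 1)

# Row with the smallest score in each group.
worst_rows <- by_team %>% slice_min(score, n = 1)

# Two randomly chosen rows per group (taking over from sample_n()).
random_rows <- by_team %>% slice_sample(n = 2)
```

Note that `n` has to be passed by name in all of these helpers.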

    • @dataschool 4 years ago

      Thanks for sharing!

  • @ishansgyan8665 7 years ago +1

    Really great pair of Dplyr videos. Solved 90% of my task of exploring how to perform data manipulation tasks. It will be really helpful if you make similar videos on Functions and loops in R with some good use cases.

    • @dataschool 7 years ago

      Glad you liked the dplyr videos, and thanks for your suggestion!

  • @anthonystaines 9 years ago

    Really useful pair of tutorials - I'm an experienced R user trying to get my head around the dplyr way of thinking, and these were helpful.
    Thanks again,

    • @dataschool 9 years ago

      Anthony Staines Awesome! Thanks for the kind compliment!

  • @RootedTango1212 8 years ago +6

    Thank you very much! Can't emphasise how much these two tutorials have helped me :)

    • @dataschool 8 years ago

      +hima sagar You're very welcome! So glad to hear! :)

  • @MarcTelesha 9 years ago +3

    Great video on more advanced dplyr in R. This makes manipulating data so smooth, readable and fast. HIGHLY recommended.

    • @dataschool 9 years ago

      Marc Telesha Thanks Marc, glad the video was helpful to you! I agree that readability is so important, which is one reason I'm a big fan of dplyr.

    • @MarcTelesha 9 years ago

      Data School Can I give a slight criticism???
      Please put a SPACE after every %>% and avoid lines over 80 characters with three or four %>% in them. It is driving me CRAZY! I know alt+enter is a nice shortcut but ctl-shift-P also works.
      Smiles

    • @dataschool 9 years ago +1

      Marc Telesha Ha! I used nicer formatting in my previous dplyr tutorial, but this time, I decided to write my tutorial code the way I write my real code :) However, I'll consider changing back for my next tutorial!

    • @MarcTelesha 9 years ago

      Data School Thank you if you do :) Even if you don't, it is a really good run-through of dplyr. It will be interesting to see whether Jupyter (AKA IPython) Notebook might work better for your tutorials.

    • @dataschool 9 years ago

      Marc Telesha Definitely worth considering! I teach some of my data science classes using IPython notebook, but up to now have not used them for R code. Thanks for the idea!

  • @BmyofMe 4 years ago

    Best data manipulation tutorial I've ever found on YouTube. Keep up the good work, you are a great tutor.

  • @cherub6958 7 years ago

    Intense and to the point, but very informative. I have learned a lot from the pair of dplyr videos, and I feel very comfortable with dplyr now.

  • @gksujay3465 7 years ago

    Very clear and concise explanation in both the parts. Thank you so much for making the learning awesome !! Please continue to make more such videos on new concepts in R

    • @dataschool 7 years ago

      Glad it was helpful to you! Currently, I'm only making videos on Python, I'm sorry!

  • @mosesotieno1629 4 years ago +1

    You go step by step for newbies like me! I love your teaching approach. You are a really great tutor!

    • @dataschool 4 years ago

      Thanks!

  • @deepak39754 7 years ago

    Easy to understand with your series of videos on dplyr ...Thanks for lucid explanations

    • @dataschool 7 years ago

      You're very welcome!

  • @asneogy 8 years ago

    Very nice pair of tutorials, love this package! Quick thing about matching of the columns 'color'= 'col' - you need to keep them in the order of the tables a and b in the join statement. Meaning, if you did 'col'= 'color' it would give an error, since a does not have 'col' and b does not have 'color'.

    • @dataschool 8 years ago

      Great point, thanks!
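
The ordering point about `'color' = 'col'` can be checked with a small sketch. The tables `a` and `b` below are made-up stand-ins for the ones in the video; the only thing taken from the comment is that `a` has a `color` column and `b` has a `col` column.

```r
library(dplyr)

# Two toy tables whose key columns have different names (hypothetical data).
a <- tibble(color = c("green", "yellow", "red"), num = 1:3)
b <- tibble(col = c("green", "yellow", "pink"), size = c("S", "M", "L"))

# The name on the left of "=" is looked up in the first table (a),
# the name on the right in the second table (b).
joined <- inner_join(a, b, by = c("color" = "col"))

# Swapping the names asks for a$col and b$color, neither of which exists,
# so the join fails. try() captures the error instead of stopping the script.
swapped <- try(inner_join(a, b, by = c("col" = "color")), silent = TRUE)
```

The joined result keeps the key column under the first table's name, `color`.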

  • @endalealtaye3147 9 years ago

    Very precise, clear and quite helpful. Thanks

    • @dataschool 9 years ago

      Endale Altaye You're very welcome! Thanks for your kind words.

  • @jasontarimo3997 5 years ago

    Amazing videos. Are you going to do any more videos on R? It is such an amazing tool for getting your daily data wrangling done very fast. I have a request: how could you do a map() with a data frame in R, like giving different names to the values in a column? Is there any function in dplyr for this?

    • @dataschool 5 years ago

      Glad you like the video! No, I'm not planning to do any more videos with R - I'm sorry! I work in Python now, and I haven't used R in years. I like both languages, but I prefer to get as good as possible in one language.

  • @RockMonkeyLV 8 years ago +1

    You refer to objects created by data_frame as local data frames and to ones created by data.frame as not local. What do you mean by local vs. not local?

    • @dataschool 8 years ago +1

      +Glenn Z Great question! I just answered this on Stack Overflow: stackoverflow.com/a/35605110/1636598

  • @j7andrew 5 years ago

    What if you're looking to find a combination of sequences that occurred? For example, I want to know how many times X then Y occurred

    • @dataschool 5 years ago

      Sorry, I'm not sure I fully understand your question!

  • @tribibpal 5 years ago

    Awesome man . Now you own me officially.

    • @dataschool 5 years ago

      Ha! Maybe you would like to support Data School on Patreon: www.patreon.com/dataschool

  • @MR-eg6np 4 years ago

    this was a great video, thank you! Any chance you will make other R videos again?

    • @dataschool 3 years ago

      Not any time soon, sorry!

  • @BurningR 9 years ago

    Great tutorial, thank you! Coming from Stata, this makes R data manipulation seem much less confusing.

    • @dataschool 9 years ago

      Emil Begtrup-Bright Awesome, glad it was helpful to you! Welcome to the world of R :)

  • @toddc1021 7 years ago

    Hi, thank you very much for the tutorial. This is very helpful. I have one question though. With dplyr 0.5.0, I could not get the same result as yours at 10:40 of the video. The arrange function sorts all the rows from the highest dep_delay to the lowest, which messed up the order created by group_by(day, month). Is there an alternative? Thank you.

    • @dataschool 7 years ago

      I'm sure there's an alternative, though I haven't used the latest version of dplyr and so I can't say for sure. Let me know if you figure it out!

    • @roshanmr2011 7 years ago +2

      Todd Cho use all 3 variables in arrange(month, day, desc(dep_delay))

    • @toddc1021 7 years ago

      Thank you Roshan that's the answer!
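
The fix suggested in this thread can be sketched on a toy table. `flights_small` and its values are invented stand-ins for the flights data in the video. The second option uses the `.by_group` argument of `arrange()`, which is available in current dplyr releases but may not exist in the older version mentioned in the question.

```r
library(dplyr)

# Toy stand-in for the flights data used in the video (hypothetical values).
flights_small <- tibble(
  month     = c(1, 1, 1, 2, 2, 2),
  day       = c(1, 1, 2, 1, 1, 1),
  dep_delay = c(5, 40, 12, 3, 60, 25)
)

# Option 1: list the grouping columns first, as suggested in the reply above.
sorted_explicit <- flights_small %>%
  group_by(month, day) %>%
  arrange(month, day, desc(dep_delay))

# Option 2: ask arrange() to sort by the grouping columns first.
sorted_by_group <- flights_small %>%
  group_by(month, day) %>%
  arrange(desc(dep_delay), .by_group = TRUE)
```

Both options keep each month/day group together and sort delays from highest to lowest within it.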

  • @dheerajkura5914 7 years ago

    Excellent! Such a good explanation.
    Really helpful for understanding RStudio and munging data.

    • @dataschool 7 years ago

      Thanks for the kind comment!

  • @KreshnikMorina 4 years ago

    Thank you very much, wonderful explanation!

  • @shivibhatia1613 9 years ago

    Perfect video for data manipulation in R. The second video is also great in continuation to the first one. Please upload more tutorials on R.
    Just one question though. When I run the same code:
    june %>% group_by(type, city) %>% top_n(3, frieght)
    Here june is the name of my Excel file, grouped by type and city and filtered to the top 3 by freight. This gives the correct output, but as I have 28 columns, the R console only shows 6 of them and lists the rest as variables not shown.
    Is it possible to show all (or as many as possible) columns in the console, so that I could take the result to the business user? Or is there any alternative you could suggest?
    Thanks again for the videos. ROCKS!!!!

    • @dataschool 9 years ago

      Shivi Bhatia Glad you are enjoying the videos! To answer your question, you can indeed show more columns. Check out the "Viewing more output" section of this document: rpubs.com/justmarkham/dplyr-tutorial-part-2
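
For the truncated-columns question, one approach that works with current tibble-based versions of dplyr is sketched here. The `wide` table is made up, and this is not necessarily the exact method described in the linked document.

```r
library(dplyr)

# A wide toy table: 28 numeric columns, echoing the column count in the question.
wide <- as_tibble(as.data.frame(matrix(1:56, nrow = 2)))

# Print every column, wrapping across lines instead of truncating.
print(wide, width = Inf)

# Or convert to a plain data.frame, whose print method shows all columns.
wide_df <- as.data.frame(wide)
print(wide_df)
```

For handing results to someone else, writing the table out with `write.csv()` is another option.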

  • @tonkouts 9 years ago

    Great video. Thanks for sharing Kevin!
    At the moment I'm interested in replacing "for" loops when possible, using dplyr package and the "do" command.
    I have the following script :
    ## split initial dataset based on a grouping variable/column
    ## and save each (new) dataset as a different .csv file
    data.frame(mtcars) %>%
    group_by(cyl) %>%
    do(d=data.frame(.)) %>%
    do(write.csv(.$d, paste0("data_cyl_",.$cyl,".csv")))
    Seems to work, as I can see the .csv files created in my workspace, but it also returns the following error:
    Error: Results are not data frames at positions: 1, 2, 3
    Any ideas or thoughts?

    • @dataschool 9 years ago

      tonkouts I'm sorry, I wish I could help but I'm not that familiar with the do() function!

    • @tonkouts 9 years ago

      Data School No problem. Do function is very helpful. Especially when it's combined with "broom" package to create tidy models (model outputs).

    • @dataschool 9 years ago

      tonkouts Neat! I'll have to check out "broom", thanks!

    • @tonkouts 9 years ago +1

      Data School Really keeps (and influences) you thinking from a data frame point of view all the time... cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html

    • @dataschool 9 years ago

      tonkouts Wow, great vignette!
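
On the split-and-write script at the top of this thread: the error appears to come from the second, unnamed `do()`, which expects each group's result to be a data frame, while `write.csv()` returns `NULL`. A possible alternative is sketched below, assuming a dplyr version that provides `group_walk()` (0.8.0 or later). The `out_dir` location is a placeholder chosen for illustration.

```r
library(dplyr)

out_dir <- tempdir()  # hypothetical output location; change as needed

# group_walk() calls the function once per group, purely for its side effect.
# .x holds the group's rows (without the grouping column by default),
# .y is a one-row tibble holding the group key.
mtcars %>%
  group_by(cyl) %>%
  group_walk(
    ~ write.csv(.x, file.path(out_dir, paste0("data_cyl_", .y$cyl, ".csv")))
  )

# Paths of the three files the call above should have produced.
written <- file.path(out_dir, paste0("data_cyl_", c(4, 6, 8), ".csv"))
```

Because the function is only called for its side effect, nothing needs to be returned as a data frame.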

  • @rupeshmohanasundaram6718 6 years ago +1

    Hi, I like this video, which is also very useful to me. I need your help: if we pass column names of a df as arguments to a function, how do we use those variables in dplyr verbs like select, arrange, distinct, and summarise? Kindly reply as soon as possible.

    • @dataschool 6 years ago

      Sorry, I don't quite understand your question. Good luck!
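
If the question is about writing functions that take column names as arguments, here is a sketch of two common patterns, assuming dplyr 1.0.0 or later. The helper names `mean_by` and `mean_by_name` are made up for illustration and are not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: the caller passes bare column names, and {{ }}
# forwards them into the dplyr verbs.
mean_by <- function(df, group_col, value_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(avg = mean({{ value_col }}), .groups = "drop") %>%
    arrange(desc(avg))
}

# Hypothetical helper for column names held as strings:
# all_of() for selecting columns, .data[[ ]] inside summarise().
mean_by_name <- function(df, group_name, value_name) {
  df %>%
    group_by(across(all_of(group_name))) %>%
    summarise(avg = mean(.data[[value_name]]), .groups = "drop") %>%
    arrange(desc(avg))
}

res_bare   <- mean_by(mtcars, cyl, mpg)
res_string <- mean_by_name(mtcars, "cyl", "mpg")
```

The first form suits interactive-style calls with unquoted names; the second suits cases where the names arrive as character values, for example from a config file.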

  • @rogerwilcoshirley2270 4 years ago

    Nice job, very helpful !

  • @luisvalesilva8931 9 years ago

    Great video! Lots of useful tricks.

    • @dataschool 9 years ago

      ***** Thank you!

  • @stewartli5395 6 years ago

    Great tips. Thank you very much.

  • @picasso1334 7 years ago

    Great video. So helpful, thank you

  • @pranavsatbhai4489 6 years ago

    my_db

    • @dataschool 6 years ago

      Sorry, I'm not sure why you are getting this error!

  • @asdfghjkl12904 5 years ago

    Thank you for the nice tutorial! :)

    • @dataschool 5 years ago

      You're very welcome!

  • @asbcllc 9 years ago

    Spread the power of dplyr and R

    • @dataschool 9 years ago

      Alex Bresler Ha, dplyr is indeed awesome!

  • @IndianYJ 9 years ago

    I have tweeted it! Awesome!!!

    • @dataschool 9 years ago

      Amit Ugle Thanks for sharing! :)

    • @IndianYJ 9 years ago

      can you please create a tutorial on shiny apps? You are an awesome teacher!

    • @dataschool 9 years ago

      Amit Ugle I will definitely consider it! Shiny does have some great written tutorials: shiny.rstudio.com/tutorial/

  • @chengPin 6 years ago

    Great!

  • @harishmehra5956 6 years ago

    Awesome

  • @13statistician13 5 years ago

    You speak waaaay too slow for my liking. I'd recommend speeding up the speech in your next videos. Other than that, the content is good! Thanks.

    • @dataschool 5 years ago +1

      Glad you like the content!

  • @hplcdadong 8 years ago

    Great tutorials. Thanks a lot.