How to write a simple regular expression in R using sub and str_replace (CC183)

  • Published on Sep 28, 2024

Comments • 39

  • @mitchdobbs6296 2 years ago +1

    Pat, this is awesome -- I'm just getting to work on regular expressions, and this video was the next puzzle piece for me. You rock!

    • @Riffomonas 2 years ago +1

      Fantastic! Thanks Mitch. Thursday will have another regex episode with some more advanced concepts. Let me know if there’s anything you’re wondering about and maybe we could keep it going 😊

    • @mitchdobbs6296 2 years ago

      @Riffomonas Heck yeah - can’t wait!

  • @piyushchauhan1955 2 years ago +1

    This helped a lot. Thank you so much.

    • @Riffomonas 2 years ago +1

      Wonderful! I’m so glad to be helpful

  • @russtin1 2 years ago +3

    Regex is great, but you can really pull your hair out trying to figure it out

  • @elcheff 2 years ago +1

    thank you very much

    • @Riffomonas 2 years ago

      My pleasure - thanks for watching!

  • @AndreaDalseno 2 years ago +1

    Hi, actually I'm much more comfortable with Python, but I'd like to improve my skills with R (the other side of the moon), and your videos are simply awesome.
    This is how I would solve the task in Python (where df is a DataFrame, after importing pandas as pd and re):
    (pd.DataFrame(df["samples"]
                  .map(lambda y: re.match(r'(\w)(\d+)(\w)(\d+)', y).group(1, 2, 4))
                  .to_list(),
                  columns=['gender', 'sample_n', 'day'],
                  index=df["samples"])
     .assign(gender=lambda x: x["gender"].map({'F': 'Female', 'M': 'Male'}),
             sample_n=lambda x: x["sample_n"].astype(int),
             day=lambda x: x["day"].astype(int)))
    In R, I would do ("translate into") something like this:
    dist_tbl %>%
      select("samples") %>%
      mutate(as_tibble(str_match(matrix(unlist(samples)), "(?<gender>\\w)(?<sample>\\d*)(\\w)(?<day>\\d*)")[, c(2, 3, 5)])) %>%
      mutate(day = as.integer(day)) %>%
      mutate(sample = as.integer(sample)) %>%
      mutate(gender = ifelse(gender == "F", "Female", "Male"))
    It should work fine. The good part of regular expressions is that you can solve complex tasks in just one command.
    PS: The group numbers differ between the two. In Python, group(0) is the whole match, so the capture groups I want are 1, 2, and 4; in R, str_match puts the whole match in column 1, so the same groups are columns 2, 3, and 5. In R I gave names to the groups, while in Python it's easier to name the columns afterwards since the names, AFAIK, are not returned but only used inside the RE.
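The group extraction in the comment above can also be sketched with Python's standard `re` module alone, without pandas. This is only an illustration: the sample names (`F3D0`, `M1D141`) are assumed from the shape of the regular expression, and `parse_sample` is a hypothetical helper. It uses named groups, which `re` does return through `groupdict()`.

```python
import re

# Named groups: one letter, a sample number, a separator letter, a day number.
# The separator is matched but not captured.
SAMPLE_PATTERN = re.compile(r"(?P<gender>\w)(?P<sample_n>\d+)\w(?P<day>\d+)")

GENDER_LABELS = {"F": "Female", "M": "Male"}


def parse_sample(name):
    """Split a sample name (assumed to look like 'F3D0') into its fields."""
    match = SAMPLE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name!r}")
    fields = match.groupdict()
    return {
        "gender": GENDER_LABELS[fields["gender"]],
        "sample_n": int(fields["sample_n"]),
        "day": int(fields["day"]),
    }


parsed = [parse_sample(name) for name in ["F3D0", "M1D141"]]
print(parsed)
```

Using `fullmatch` rather than `match` means a name with trailing characters raises an error instead of being silently truncated.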

    • @Riffomonas 2 years ago +1

      Thanks for watching - I’ll talk about groups in the next episode!

  • @CoachPegasus 2 years ago +1

    In a date column, I need to change ' 04-04-2020' to ' 04/04/2020'
    and then convert it to datetime.
    I did it with stringr,
    but after printing it shows all NaN.

    • @Riffomonas 2 years ago

      With the 04-04-2020 format, try using the mdy or dmy functions from lubridate, depending on whether it’s month-day or day-month.
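The same idea, parsing the dashed date directly by stating the field order instead of first swapping `-` for `/`, can be sketched in Python's standard library. This is not lubridate; `datetime.strptime` stands in for `dmy`/`mdy` here, and the input value is a made-up example with a leading space like the one in the question.

```python
from datetime import datetime

raw = " 03-04-2020"  # hypothetical value; note the leading space

# Strip stray whitespace, then state the field order explicitly.
day_first = datetime.strptime(raw.strip(), "%d-%m-%Y")    # read as 3 April
month_first = datetime.strptime(raw.strip(), "%m-%d-%Y")  # read as 4 March

print(day_first.date(), month_first.date())
```

The two results differ, which is why the field order has to be known before parsing.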

  • @ahmed007Jaber 2 years ago +1

    Great topic. Any tips or resources for grabbing a certain piece of text, e.g., xxxx-xxx, from string content? Just to grab this pattern and ignore anything else?

    • @Riffomonas 2 years ago

      Wrap the text you want in parentheses, put .* on both sides, and then use \\1 as the replacement value.
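A minimal sketch of the capture-group trick from the reply above, written with Python's `re` module to match the Python example earlier in the thread. The input string and the `\w{4}-\w{3}` pattern are assumptions made for illustration. In R the replacement is written `"\\1"`; in a Python raw string it is `r"\1"`.

```python
import re

# Hypothetical input: only the xxxx-xxx style token is wanted.
text = "order ref abcd-123 shipped on Monday"

# .* on both sides swallows everything around the parenthesized group,
# and \1 puts back only what the group captured.
pattern = r".*\b(\w{4}-\w{3})\b.*"
token = re.sub(pattern, r"\1", text)

print(token)
```

If the pattern does not match, `re.sub` returns the input unchanged, so it is worth checking for a match before trusting the result.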

  • @alexw5126 3 days ago

    Great teacher! This is one of the things I could never get my head around, similar to the taxonomy levels :D Thank you!

    • @Riffomonas 2 days ago

      Wonderful - thanks for watching! 🤓

  • @sven9r 2 years ago +3

    Today I struggled with regex, and now this video. This channel is so underrated. Thank you so much, Pat.

    • @Riffomonas 2 years ago +1

      Wonderful! I’ll have another regex episode on Thursday. Let me know if you have any regex-related questions and we can keep them going 😊

    • @sven9r 2 years ago

      @Riffomonas You already helped me so much with the matrix problem the other day! Today I nested 350 matrices with the help you provided in under 4 hours. Such a good feeling.

  • @jamesleleji6984 1 year ago

    How do you find and replace a string in different column names?

  • @tstratton1 2 years ago +2

    I think these are my favorite youtube videos of all time. Even if I already know what he's doing.

    • @Riffomonas 2 years ago

      Lol thanks! I’ll do another regular expression video later in the week. Let me know if you have ideas on other things you’d like to see

    • @dariushghasemi6476 2 years ago

      @Riffomonas I'm very enthusiastic about dynamic programming, e.g., running multiple linear regressions for a bunch of features, or even running mediation analysis! Thanks, Patrick :)

  • @spencermartin4846 2 years ago +1

    And suddenly it makes sense

  • @haraldurkarlsson1147 2 years ago +2

    Speaking of naming samples, NASA has a great system for naming meteorite samples. Take the sample "ALH84001". It was collected in Antarctica in the Allan Hills region during a collecting mission in 1984 (hence the 84), and it was the first sample collected (hence the 001). This is a pretty famous meteorite since it is of Martian origin, and NASA scientists thought at one point that they had discovered fossil bacteria in it.

    • @Riffomonas 2 years ago

      Very cool! I remember reading the “bacterial fossil” paper in a journal club.

    • @haraldurkarlsson1147 2 years ago

      @Riffomonas I worked on water in Martian meteorites as an NRC fellow at NASA for a couple of years. The "big" discovery came the year after I left, and my sponsor at NASA was one of the authors. Talk about timing...

  • 2 years ago +2

    Excellent! Regular expressions were always a difficult topic to implement in R. Thanks for the video!

    • @Riffomonas 2 years ago

      My pleasure! There will be another regular expression video out later this week. Let me know if there’s anything else you’d like to learn about regular expressions

    • @jyotikataria129 11 months ago

      @Riffomonas Where is this data table? I'm not able to download it.

  • @ErionMaxhari 2 years ago +1

    Excellent job. In fact, you can use substr to extract fixed-length characters. That is especially useful for extracting female or male, since it's always the first character.
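The fixed-position approach in the comment above needs no regular expression at all. A short Python sketch, using string indexing where R would use `substr(x, 1, 1)`; the sample names are assumed for illustration, and the F/M labels follow the mapping used earlier in the thread.

```python
# Hypothetical sample names whose first character encodes the gender.
samples = ["F3D0", "M1D141"]

GENDER_LABELS = {"F": "Female", "M": "Male"}

# Index 0 is always the gender code, so plain indexing is enough.
genders = [GENDER_LABELS[name[0]] for name in samples]

print(genders)
```

This only works while the field has a fixed position and width; the sample and day numbers vary in length, which is where a regular expression becomes the simpler tool.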

  • @roymccormick5328 2 years ago +1

    sooooo super helpful for what I was stuck with today thx😎

    • @Riffomonas 2 years ago

      Wonderful! Glad it was helpful 🤓

  • @jameswhitaker4357 1 year ago

    This is easily the best explanation of regex. Idk how I’ve made it this far without really utilizing them, but some new projects are looking like I’m going to have to use it. You’re a godsend

  • @tlange5091 2 years ago +1

    This is really, really helpful! Thank you

    • @Riffomonas 2 years ago

      Thanks for watching!