Reformat and clean Excel data in minutes with Regular Expressions

แชร์
ฝัง
  • เผยแพร่เมื่อ 8 ก.ย. 2024
  • 👨‍💻 I use Sublime text in this video, which you can find here: www.sublimetex...
    📓 An alternative that I’ve been using for years is Notepad++ (notepad-plus-p... - Windows only).
    🍎 Macs come with TextEdit, which appears to come with a pattern matching find and replace feature similar to regex, but possibly more beginner friendly. From what I’ve read it should work for most standard use cases. Macmost have a guide on how to use it here: macmost.com/te...
    📈 You can find the data set here:
    drive.google.c...
    🕶 Regex cheat sheet:
    www.rexegg.com...
    cheatography.c...

ความคิดเห็น • 8

  • @productivityexchange
    @productivityexchange  3 ปีที่แล้ว

    I have used this to clean large datasets (particularly of stray or excessive characters in weird spots), to reformat data into new shapes or formats, to take text copied from websites (like CMSs) and get it into a tabular format I could use in Excel, and to target specific issues from a list in a Jira query for bulk changes. I use regular expressions multiple times a week in a variety of contexts and it has made working with Excel easier, as well as managing large projects.
    What do you see yourself using Regular Expressions for?

  • @mjscpr
    @mjscpr ปีที่แล้ว

    That was very helpful, thanks!

  • @vrbaac1641
    @vrbaac1641 2 ปีที่แล้ว +2

    hi... thank you for the very informative video... just want to ask, what can I do in case there are random blank entries in some columns/rows? how could I skip them? thank you ^^

    • @productivityexchange
      @productivityexchange  2 ปีที่แล้ว +1

      Hi VR Baac,
      The good news is that blank rows or anything that doesn't match your pattern will get ignored. You can find these afterwards with a separate regular expression by looking for anywhere you have two newline characters with zero or more tabs between them (either
      \t*
      or
      \t*
      depending on the program you use) - which you can replace with nothing when you replace or replace all. You may need to run this a couple of times in a row to get any instances of multiple blank rows in a row
      Empty cells in columns, however, can be dealt with in a number of ways:
      * The simplest is to do several regular expression find and replacements as multiple steps, to capture different patterns separately.
      * In the case where there is nothing in a column, you can use .* where . means any character and * means 0 or more of them. Be careful with this as . can also match to a tab character and can cause some weird behaviour in your pattern matching if you aren't careful.
      * you can get fancy with operations like or using | (pipe) and ? for lazy matches or ^ and $ for start and end of line respectively. It starts getting really messy so I would avoid doing anything too fancy as much as possible because it becomes a nightmare to test and debug if you come back to it at a later stage.

    • @vrbaac1641
      @vrbaac1641 2 ปีที่แล้ว

      @@productivityexchange thank you for taking your time in explaining ^^ I greatly appreciate it...

  • @chidiogwuniru6120
    @chidiogwuniru6120 2 ปีที่แล้ว +1

    Hi, Lazy man's request here. How do I reverse this using regex. i have a dataset that looks like your result but would like to spread across columns.

    • @productivityexchange
      @productivityexchange  2 ปีที่แล้ว +1

      Hi Chidi,
      You'll still want to capture the text bits in a similar way to what I've done, ignoring bits like "Name: ", but instead of going to the next column's data with \t for tab you'll need to go to the new line with
      or
      (this will depend on the editor you're using). You should be able to capture multi-line input that way. Then, for the replacement string you'll want to use almost what I had, but with \t instead of
      .

    • @chidiogwuniru6120
      @chidiogwuniru6120 2 ปีที่แล้ว

      @@productivityexchange thank you very much will apply and give a feed back