This is 100% EXACTLY what I was looking for, down to the blank rows as separators! Astounding as always.
Great video!!
lambda alternative: VUNSTACK(v) Vector Unstack
=LAMBDA(v,LET(a,(v="")*SEQUENCE(ROWS(v)),b,FILTER(a,a),c,VSTACK(0,DROP(b,-1)),d,b-c-1,e,SEQUENCE(ROWS(b))^0+SEQUENCE(,MAX(d))-1,IF(e
- no need to input nr. of fields, no refresh, calc time instant
Thank you for sharing your knowledge so generously. Super helpful for me!
Great!! I have used a mix of your two unstack videos to solve a big problem here. Thanks a lot!!
Man, this is gold. Chandeep, I have managed to apply most of these steps to an irregularly formatted PDF. I have the entire page of each PDF in a single row. Now I need to extract from each page the date of an event and the event number, which has a specific alphanumeric format, and also handle cases where there are multiple event numbers in the same format on the same date. I'm thinking of using the split functions, but I'm getting stuck here - maybe you can consider this for future videos😅
Love the clear way you explain power query. Great teaching method backed up by very intelligent animations! Great job!
Glad you like our work Alexandru!
Thank you for the video - I have been able to adapt it to my particular problem. Like your example, mine has over 700K lines; the technique has largely cleaned the data and, most importantly, placed the appropriate data on a single line.
Super helpful as always Chandeep! Thank you for sharing!
Glad it was helpful Yami!
You have a new follower! Thank you!
Your automated column-renaming process is fantastic, simple, and useful in many scenarios. Pity it gets hidden at the end of this video, as it deserves a video of its own.
This is perfect timing! I am working on a dirty data set with this exact problem right now!
Your videos are the best Chandeep!
Glad it was helpful Ted!
The challenge with uneven columns in real life is slightly different: you might have Name and Number as the only two fields available on a record. Your solution is pretty amazing, but if you can address this case it will be a real help!!!
In any case, you need a reliable indicator for where one data record ends and the next begins. In Chandeep's example it was a null cell. If that is not true in your case, then the only way is to identify whether each value is a name or a number (no text inside). That could be done by checking whether the value can be converted to a number or not.
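The "can it be converted to a number" check from the reply above could be sketched roughly like this. The sample values and the column names `Data` and `Kind` are made up for illustration, and `Number.From` depends on the current culture, so treat this as a starting point rather than a finished solution.

```powerquery
let
    // Hypothetical sample: names and numbers stacked in one column, no null separators
    Source = Table.FromColumns({{"Ann", "31", "Bob", "Rome", "27"}}, {"Data"}),
    // try ... returns a record; its HasError field is true when the conversion fails
    Tagged = Table.AddColumn(
        Source,
        "Kind",
        each if (try Number.From([Data]))[HasError] then "text" else "number"
    )
in
    Tagged
```

Once each row is tagged, a change from "number" back to "text" could serve as the record boundary, provided every record starts with a text value.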
Awesome, thank you so much.
Just an addition: instead of creating an Excel table for the column names, we can create a nested list within Power Query and use it to rename the columns. It saves us an extra connection to a table.
Limitation: it won't be as flexible as an Excel table for new users, but advanced users can do it pretty easily.
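A minimal sketch of the nested-list idea, assuming the unstacked table has default column names `Column1` to `Column3`; the new names are invented for the example.

```powerquery
let
    // Hypothetical unstacked result with default column names
    Source = Table.FromRows({{"Ann", "Paris", 31}}, {"Column1", "Column2", "Column3"}),
    // Nested list of {old name, new name} pairs, typed directly in the query
    Renames = {{"Column1", "Name"}, {"Column2", "City"}, {"Column3", "Age"}},
    Renamed = Table.RenameColumns(Source, Renames)
in
    Renamed
```

If you only want to type the new names, `List.Zip({Table.ColumnNames(Source), {"Name", "City", "Age"}})` builds the same pairs, as long as both lists have the same length.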
Superb..Hats off to you
Great video... I was working on this issue with a client's PDF report that I imported into Power Query. Had to do a manual transfer. Wish this video had been created sooner. Power Query is a BIG game changer for data analysis. I have gotten many jobs just from my limited (but growing) knowledge. 😂
Very good man... Useful! Thank you!!!
Incredible Goodly !
Thanks Ismael!
Great examples of how to transform lists into tables and vice versa! Thank you 😊.
Brilliant stuff Chandeep 👏
Thanks Paul!
Perfect. Thank you for sharing.
Glad you like this Raimundo!
Chandeep - the Power Query GOAT!
You're a wizard sir !
Thanks Hicham!
super awesome indeed! you're a nerd man...
My life feels fulfilled after watching this video. Thank you very, very much.
How the heck did this guy come up with this logic??? It's just awesome. Thanks Chandeep
Great and very good solution
Even if we want to have all steps in one step without using Index and Fill Up we can do it in one step as below:
= Table.Combine(Table.Group(Source, {"Data"}, {{"GpData", each Table.FromRows({List.RemoveNulls([Data])})}}, GroupKind.Local, (x, y) => Number.From(y[Data] = null))[GpData])
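To try the one-step version above without the video's workbook, here is a self-contained sketch with made-up sample data. With `GroupKind.Local`, the comparer starts a new group whenever it returns a non-zero value, so each null row opens a new bucket.

```powerquery
let
    // Hypothetical stacked data: records separated by a null row, with uneven field counts
    Source = Table.FromColumns(
        {{"Ann", "Paris", 31, null, "Bob", "Rome", null, "Cy", "Oslo", 45, "555-0101"}},
        {"Data"}
    ),
    Grouped = Table.Group(
        Source,
        {"Data"},
        // Each bucket becomes a one-row table once its leading null is dropped
        {{"GpData", each Table.FromRows({List.RemoveNulls([Data])})}},
        GroupKind.Local,
        // 1 (start a new group) when the current row is null, 0 (same group) otherwise
        (x, y) => Number.From(y[Data] = null)
    ),
    // Combining the one-row tables fills missing trailing fields with null
    Result = Table.Combine(Grouped[GpData])
in
    Result
```

This should return one row per record. Consecutive or trailing null rows would produce empty buckets, so it may be worth cleaning those up first.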
Now that is impressive! And if the headers are put in the first record of the data you can simply promote the headers…
But Chandeep's "long" way is visually simpler to understand.
And after some practice we could understand complex code like yours.
I created a custom index to do my grouping, which performs about the same as your code, but yours is much simpler. I can see you are using the result of the logical test (0 or 1) to signal a change in the column being grouped on, so that GroupKind.Local creates the buckets. I didn't know you could do that.
let
    Source = Excel.CurrentWorkbook(){[Name="LargeData"]}[Content],
    // Zero-based positions of every null separator row
    Pos = List.Buffer( List.PositionOf(Source[Data], null, Occurrence.All) ),
    Custom1 =
        [
            Co = List.Count(Source[Data]),
            // St / En: first and last row position of each record between separators
            St = List.Buffer({0} & List.RemoveLastN(List.Transform(Pos, each _ + 1), 1) & (if Lst < Co then {Lst} else {})),
            En = List.Buffer({ Pos{0} - 1 } & List.Transform(List.Skip(Pos, 1), each _ - 1) & (if Lst < Co then {Co - 1} else {})),
            Lst = List.Last(Pos) + 1,
            GetPos = List.Transform(List.Zip({St, En}), each {_{0} .. _{1}}),
            // Grp: one group id (the record's start position) per non-null row
            Grp = List.Combine(List.Transform(List.Buffer(GetPos), each List.Repeat({_{0}}, List.Count(_))))
        ],
    // Renames is not defined in this snippet; presumably a list of {old, new} name pairs from another query
    Custom2 = Table.RenameColumns(
        Table.Combine(
            Table.Group(
                Table.FromColumns({List.RemoveNulls(Source[Data]), Custom1[Grp]}),
                {"Column2"},
                {"Record", each Table.FromRows({[Column1]})}
            )[Record]
        ),
        Renames
    )
in
    Custom2
That was really great. I'd been trying to do something similar and getting in a bit of a mess. I was trying to use List.Transform and then Table.FromList, not my favorite function.
Thanks for all your hard work.
Super Awesome Video❤
So love your solutions and approach. Any special prices on your Power Query course?
At 4:40, any idea why I keep getting this error?
Expression.Error: The column 'Data' of the table wasn't found
{{"Data", each [Data]}})
It happens in the line below (I have no null rows):
= Table.Group(#"Inserted Integer-Division", {"Integer-Division"}, {{"Data", each [Data]}})
Thanks
Amazing 👏
Amazing
Love you!
very nice, zabar10...
Glad you like this zahoor
Excellent video. Thanks for sharing. A query: what if I have a multiple-column unstacking problem (for example, the data is in multiple rows, with each row having a different combination of id, course, grade, and university)? Is it possible to unstack it and have the data reorganized under the column headings id, course, grade, and university? Thanks,
Thanks for the video
Thank you Chandeep, the way you explain is amazing! One question: how can we do it when we have two columns instead of one? The second column has repeated headers.
Very good technique, short and sweet... If multiple columns have uneven records, can we apply the same logic? Please make a clip with multiple uneven rows and more than 10 columns of uneven data.
Hi Chandeep... Great video as always..
Can you create a video on connecting Power BI to ServiceNow and Jira? Adding to that, there is a complex procedure to set up a data refresh in a workspace... If you can make a video on that, it will help many. Trust me
Thanks!
Thanks for the Tip Alex!
Hi Chandeep, what if the columns missing from a record are not at the end but in between? For instance, if age is missing in one record while all other values are available.
Amazing Goodly! May I ask how the UNICODE function works in Power Query?
This is another great video, but how would you deal with this little change in scenario? The same data set, but the employees don't have all 5 pieces of data, and you don't know that the ones they have are always the first 3 or 4, etc. What I mean is that some incomplete records might be (Name, City and Age) and others might be (Name, Age, Phone). Do you see what I mean? I think this is a more likely scenario. All you can be sure of is that the data will always be supplied with the fields in a specific order, if they exist. I'd love to see this. Thanks
**I see this was answered previously - apologies **
5:20 How do I keep the Custom column? In my case I am using a Date as that custom column to group by.
fantastic
Great Sir
Such a wonderful video....But I have a query.....What if I do not have an identifier (space in between each item) as in your example...Could you help me out
I have a PDF containing a table that spans four pages. Every page has the same headers, but when I load it into Power Query the column values from the last page end up in different cells. How do I get rid of that? Could you help me?
Hello Goodly. I have been watching your videos and they are really helpful. I have a situation that is the inverse of what you did in this video. I have a table with ID, name, and email. Some rows have 1 email, others 2, and some have 10 emails. I want to generate a list or a table with the ID and the individual emails one on top of the other, so that I can submit it to an email hygiene pass to eliminate hard bounces and invalid emails. It would be really cool if you could help me out. I am fairly new to Power Query and sometimes it gets a bit overwhelming and freezes me up. Best regards
What if you have more than 3 nulls (2 separated by a text row, and 1 before the next set)?
The Filled Up step did not come out right.
Any ideas are appreciated. Thanks.
Good video. I just wish it did not assume that everyone understands simpler steps like writing if statements, etc. I got all the way to the index column, but after that I got lost. It just assumes that everyone understands every step and that some things can be skipped.
Please make a video on column sorting (2 columns at the same time).
What if my condition is a cell with an uppercase letter? What should I use as the conditional?
Is there a solution for unpivoting as well?
I have 1M records on which I want to apply Group By to get the max value of each category, and then show that same value against each category in a separate custom column... I tried it, but it took almost 2 hours and eventually the system hung... any suggestions, please?
Hi Goodly
I have a column in which there is the employee ID and, on a different row, the bank name. I want these 2 pieces of information in 2 separate columns. Please help
Thks
Hi @goodly, how come the columns get aligned so nicely? I was not expecting this. Seeing some interesting replies too that need to be studied.
What if I want to replicate the same process but with additional columns expressing the same data?
Hi Chandeep...great video. I was following successfully until I hit 4:49 to 4:50 after you edited the formula converting a Table to a list. I get Error for every row instead of a list despite the formula being exactly the same. I noticed you cut the video at that exact moment and continued....did you encounter a similar error or why might my Data Column give me Error?
Send me your data and query
goodly.wordpress@gmail.com
I solved my own question: The original column had to be renamed to 'Data' since that is the column being grouped. I had it named Column1. I was confused because the original column and the new column were both named 'Data'. To clear things up I would ensure your original column is renamed Data and the new column is Data2. New code:
= Table.Group(#"Renamed Columns", {"Custom"}, {{"Data2", each _[Data]}})
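For anyone hitting the same "column 'Data' wasn't found" error, here is a rough end-to-end sketch of an index / fill-up / group sequence. It is one possible set of steps with made-up sample data and step names, not necessarily the exact steps from the video. The point is that `_[Data]` inside `Table.Group` refers to the source column, which therefore has to be called `Data`.

```powerquery
let
    // Hypothetical sample loaded with the default column name
    Source = Table.FromColumns({{"Ann", "Paris", null, "Bob", "Rome", "41", null}}, {"Column1"}),
    // Rename first, so that _[Data] below can find the column
    Renamed = Table.RenameColumns(Source, {{"Column1", "Data"}}),
    Indexed = Table.AddIndexColumn(Renamed, "Index", 0, 1),
    // Put the index only on separator rows, then fill it upwards as a record id
    Marked = Table.AddColumn(Indexed, "Custom", each if [Data] = null then [Index] else null),
    FilledUp = Table.FillUp(Marked, {"Custom"}),
    Filtered = Table.SelectRows(FilledUp, each [Data] <> null),
    // New column gets a different name (Data2) to avoid confusion with the source column
    Grouped = Table.Group(Filtered, {"Custom"}, {{"Data2", each _[Data]}})
in
    Grouped
```

Each row of the result holds a list of that record's values in `Data2`.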
I also had my source data in two columns, so I used concatenate with a comma , separator and cleaned up the data using Text to Columns once it was back in Excel. To do the same, select both columns in PQ and go to Transform> Text Column> look for a function that allows to merge columns. Once that was done, I was able to follow the steps in the video, but remember to change the column names in the formula to the 'Merged Column name'
Thanks dear, please help me with how to export data from Power Query into multiple workbooks.
You have the data which follows an order. What if we have name & age or name & phone number. What happens then.. how could the data be organized in a proper manner?
Yes, I have the same question
One way to do that would be to detect the column's data type and then define its name. But it becomes very difficult to solve the problem when you have two values of the same data type. For example, Power Query won't understand that Peter is a name and Paris is a city, since both are text values.
Great video!!! But what would happen if the phone number were missing in some groups of data, while the age or some other field were missing in others?
I saw this trick somewhere: you have data and a record #. You group by the record # and SUM the data value, which causes an error. In the group step, manually change the List.Sum function to Text.Combine and set your delimiter, then split the value and you are done.
Could be this one
www.google.com/url?sa=t&source=web&rct=j&url=m.youtube.com/watch%3Fv%3DjLpgt-wptH4&ved=2ahUKEwiblMfs0Nr-AhVX7LsIHUggD2AQtwJ6BAgeEAI&usg=AOvVaw1uRBemalphbDdtFVi5e3h8
Can you please create content on Alteryx... 🙏🏻
Sir, when I try to load data with more than 5000 rows into Power Query, it only loads 33 rows. What may be the reason?
Rather than splitting the data into four or five separate columns, how can we combine each field (name, contact, etc.) into a single column for each record, using a space or comma as the separator?
I saw this trick somewhere: you have data and a record #. You group by the record # and SUM the data value, which causes an error. In the group step, manually change the List.Sum function to Text.Combine and set your delimiter, then split the value and you are done.
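The `List.Sum` to `Text.Combine` trick might look something like the sketch below. The column names `RecordNo` and `Data` and the sample rows are assumptions, and `Text.Combine` only accepts text, so numeric values would need converting with `Text.From` first.

```powerquery
let
    // Hypothetical input: one value per row plus a record number
    Source = Table.FromRecords({
        [RecordNo = 1, Data = "Ann"],
        [RecordNo = 1, Data = "Paris"],
        [RecordNo = 2, Data = "Bob"],
        [RecordNo = 2, Data = "Rome"]
    }),
    // Group By > Sum generates List.Sum([Data]); swapping in Text.Combine joins the text instead
    Grouped = Table.Group(
        Source,
        {"RecordNo"},
        {{"Combined", each Text.Combine([Data], ", "), type text}}
    ),
    // Optional: split the combined text back out into separate columns
    Split = Table.SplitColumn(Grouped, "Combined", Splitter.SplitTextByDelimiter(", "), {"Name", "City"})
in
    Split
```

Stopping at the `Grouped` step gives the single comma-separated column asked about above.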
Nice
Hi Chandeep, there is some problem with downloading the file. Can you please verify it? Thank you very much.
Great work,
Please, can you build your query with us step by step rather than showing the applied steps?
How do I do this if there are NO blank rows between each set of data?
Hi, could you share the M code so that we don't have to type it in manually?
Yes, the file used in this video is available in the description, and the M code can be found there.
Really good insight, however it didn’t work for me 😢 Had to trim the data manually 😢 Thankfully there were only 1,300 entries
I'm getting a Network Error when I try to download the file whether or not I subscribe.
Since I couldn't get the file, I entered data manually, but the video didn't show that there was a fifth field, so when I tried to do the Rename Columns I got an error because there was no fifth column. Luckily I remembered that Table.RenameColumns has a third option which defines how to handle missing fields. They are:
MissingField.Error (default if no third option present), MissingField.Ignore (which left out the 5th column for me), and MissingField.UseNull (which creates the column and fills it with nulls).
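As a small illustration of that third argument, the sketch below renames four real columns while the rename list also mentions a fifth that does not exist. The table and names are invented for the example.

```powerquery
let
    // Hypothetical table with only four columns
    Source = Table.FromRows(
        {{"Ann", "Paris", 31, "555-0101"}},
        {"Column1", "Column2", "Column3", "Column4"}
    ),
    // The rename list expects a fifth column that is not there
    Renames = {
        {"Column1", "Name"}, {"Column2", "City"}, {"Column3", "Age"},
        {"Column4", "Phone"}, {"Column5", "Email"}
    },
    // Without the third argument this step raises an error for Column5
    Renamed = Table.RenameColumns(Source, Renames, MissingField.Ignore)
in
    Renamed
```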
Can we instead do this without M, just using the user interface 😢? I am finding it hard to learn M.
Thanks sir,
I want to transpose data like this format
Heading
A
B
C
Heading1
A
B
C
D
Heading2
A
B
C
D
Heading3
A
B
C
D
E
Now I need the data in the format below. Is that possible, sir? If yes, could you please help?
There are no blank rows here; this is just example data. I actually have 1 lakh (100,000) rows and I want to transpose them into this format:
Heading A B C
Heading1 A B C D
Heading2 A B C D
Heading3 A B C D E
❤
1:02 remove empty rows☺
Good trick! btw, in the step of
= Table.Group(#"Filtered Rows", {"Custom"}, { { "Data", each _[Data] } } )
why not just use the following syntax?
=Table.Group(#"Filtered Rows", {"Custom"}, { { "Data", each Table.Transpose( _[[Data]] ) } })
With this and Table.Combine, you only need one or two more steps to get the query done🫡
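A quick sketch of how the `Table.Transpose` variant could finish the query, using a made-up stand-in for the `#"Filtered Rows"` step. `_[[Data]]` keeps `Data` as a one-column table, and transposing it turns each group into a single row.

```powerquery
let
    // Hypothetical stand-in for the "Filtered Rows" step: a record id plus one value per row
    Filtered = Table.FromRecords({
        [Custom = 2, Data = "Ann"], [Custom = 2, Data = "Paris"],
        [Custom = 6, Data = "Bob"], [Custom = 6, Data = "Rome"], [Custom = 6, Data = "41"]
    }),
    // Each group becomes a one-row table with columns Column1, Column2, ...
    Grouped = Table.Group(Filtered, {"Custom"}, {{"Data", each Table.Transpose(_[[Data]])}}),
    // Stack the one-row tables; shorter records get null in the extra columns
    Combined = Table.Combine(Grouped[Data])
in
    Combined
```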