Nonchalant brilliance. Love the video mate. Very helpful
Glad you enjoyed it and found it helpful! That was the whole point of sharing. Cheers!
Hey man! I just got the best ever kudos at my job because of this! Thank you so much and you are so awesome!
Great to hear! We share stuff like this in hopes of helping people so cheers to your success!
Firstly, and respectfully, sir, you have a very beautiful masculine voice. Secondly, thank you for the approach
Firstly, thank you for the kind words and I'm glad it was helpful. :-)
Great video. I'm actually in an Excel file right now and used Word to make a word cloud out of the text in the spreadsheet. Will have to try this. Good speaking voice too, by the way.
Appreciate the kind words and I'm glad it was helpful!
Clever thinking. Appreciate the content!
Glad you liked it. If you have questions about other Excel related analyses, let us know and we'll make a video to help!
THANK YOU SO MUCH!
Nice video and really helpful. Wish the font size had been a little bigger; it would have been easier to read.
Great feedback, thanks. I'll adjust accordingly next time. 🙂
omg this is so useful
Great to hear!
It is a great trick to analyse our data... If the data is too large, it will not fit in a single cell... Is there any option to enter n number of words in a single cell?
According to this article (support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3), as of today Excel can only handle 32,767 characters in an individual cell. If you have a little more than this, you could use another tool such as Microsoft Word to do a character count, figure out where your limit is, and then do a couple of repeated cut & paste passes from Word to Excel. However, if you're dealing with that large a volume of data, Excel probably isn't the best platform for analyzing text. You might want to look into Python (if you're the coding type), or maybe Rapid Miner or SAS if you prefer a slightly more UI-based approach. Hope that helps!
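For anyone who hits the limit and is comfortable with a little scripting, here's a rough sketch of the "split the text into cell-sized pieces" idea in Python. The 32,767 figure comes from Microsoft's documented spec above; the function name and the word-boundary splitting approach are illustrative, not from the video.

```python
# Split long text into chunks that each fit Excel's documented
# 32,767-character cell limit, breaking only at word boundaries.
EXCEL_CELL_LIMIT = 32767

def split_for_excel(text, limit=EXCEL_CELL_LIMIT):
    chunks, current, length = [], [], 0
    for word in text.split():
        # +1 accounts for the space that joins words within a chunk
        added = len(word) + (1 if current else 0)
        if length + added > limit:
            chunks.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += added
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be pasted into its own cell without tripping the limit.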
Well explained, thanks for sharing. Just one downside: cell capacity is limited to 32,767 characters.
Thanks for the comment and for sharing info on Excel's cell character cap. This tutorial was targeted at the average user who likely isn't dealing with a ton of data. Excel definitely has its limitations on data volume and processing constraints.
How do I do this with grouping 2 words?
This uses space-delimited separation of words. I'm interested in doing this with 2 or 3 words. Themes.
Notes:
Word Cloud
Data tab, Text to Columns, transpose, and pivot table.
The most effective way I've seen that done is to use a "lookup table"/"reference table" that contains a list of all of the unique phrases you're interested in (this is typically used with localities such as "New York", "San Francisco", etc.), and then some type of COUNTIF function that looks through the array and counts the number of times each phrase is found. Hope that helps!
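For anyone curious what that lookup-table-plus-COUNTIF idea looks like outside of Excel, here's a rough Python sketch. The phrase list plays the role of the reference table; the function name and sample sentence are just illustrative.

```python
# Count how many times each phrase from a reference list appears
# in the text (a rough COUNTIF analogue, case-insensitive).
def count_phrases(text, phrases):
    lowered = text.lower()
    return {p: lowered.count(p.lower()) for p in phrases}

counts = count_phrases(
    "I flew from New York to San Francisco, then back to New York.",
    ["New York", "San Francisco"],
)
# counts -> {'New York': 2, 'San Francisco': 1}
```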
@@adaptalytics Thanks. That seems like a good option.
I took the list of words in my article and made a big column, like you did. The rows were all sequential, so I grabbed each word, concatenated (&) a " ", and then & the next word, which gives me a two-word column. Then I just pivot tabled it like you did.
Seems like I need to learn more about making databases from webpages. Maybe I need to study web scraping and data structures.
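The commenter's "join each word with the next" trick is essentially building bigrams. Here's a rough sketch of the same idea in Python (names and the sample sentence are illustrative), which you could then count like any other word list:

```python
# Build two-word phrases ("bigrams") by pairing each word with the
# next, then count them - the same shape as the pivot-table step.
from collections import Counter

def bigrams(words):
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

words = "the quick brown fox the quick dog".split()
pairs = bigrams(words)
# pairs -> ['the quick', 'quick brown', 'brown fox', 'fox the',
#           'the quick', 'quick dog']
top = Counter(pairs).most_common(1)
# top -> [('the quick', 2)]
```

Swapping `words[1:]` for a two-step offset (`words[2:]` zipped in as a third column) would extend this to three-word phrases.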
Thank you for posting this. How do I do this with sentences in each row and something like a million data records? I need a paragraph per cell, but the cell character limit kicks in after about 250 rows.
If you've got a million records then Excel probably isn't the best tool for you. I'd recommend something like Rapid Miner (UI-based) or Python (code-based).
What is the formula you use around 4:36 in the video? And afterwards, what did you change in the formula?
The formula at 4:36 was an error. :-( The intent was to create two calculations - the first (column C) is a running cumulative that sums all values in the pivot table as it moves down the page. The second (column D) divides the cumulative on each row by the absolute total at the bottom of the pivot table. This second calculation is what computes the value needed to create the Pareto principle: "X% of the dataset makes up Y% of the total" - this is what I reference at 5:44 and eventually get to by 6:32 - "there are 1,000 unique words and 117 of them constitute half (50%) of total words in the data set". This is the Pareto principle. The Pareto principle is easiest understood when analyzing tax dollars paid into any government - it's often that 5% of the taxpayers pay 80% of the total tax dollars.
Hope that helps clarify things!
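The two columns described above can be sketched numerically. This is an illustrative example with made-up counts, not the video's actual data: column C is the running cumulative, column D is that cumulative divided by the grand total.

```python
# Running cumulative (column C) and cumulative share of the grand
# total (column D), as described for the pivot table.
word_counts = [500, 300, 120, 50, 20, 10]  # sorted high to low
total = sum(word_counts)                   # 1000

cumulative, running = [], 0
for count in word_counts:
    running += count
    cumulative.append(running)

share = [c / total for c in cumulative]
# share -> [0.5, 0.8, 0.92, 0.97, 0.99, 1.0]
# Reading: the top 2 of 6 unique words make up 80% of all words -
# the "X% of the dataset makes up Y% of the total" Pareto view.
```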
@@adaptalytics Thanks is not enough but thank you so very much for your amazing explanation. Its help me a lot and your video just save me. Thanks a lot. 😇
Hi! Very useful. I tried doing the same, but I ended up with a lot of words like "a, the, and, but", etcetera. Do you have a speedy way of excluding these words so only the useful words stay in the Pareto or word cloud?
Sure. Those are called "stop words". The easiest/fastest way to handle them is to find a list of stop words that someone has already compiled (rather than compiling them yourself). Once you find that list (Google search, ask a friend, whatever), you can bump your entire word list up against the stop word list (if using Excel, the MATCH function is easiest) and throw out any word in your list that was found in the stop word list. Oftentimes the stop word list isn't complete (at least for your dataset), so you usually have to add a couple more to it. Hope that helps!
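As a rough sketch of that MATCH-style filter in Python (the tiny stop-word set here is illustrative; in practice you'd use a fuller compiled list as suggested above):

```python
# Keep only words NOT found in the stop-word list - the same idea
# as matching against a lookup list and discarding the hits.
STOP_WORDS = {"a", "the", "and", "but", "of", "to", "in"}

def remove_stop_words(words):
    return [w for w in words if w.lower() not in STOP_WORDS]

kept = remove_stop_words("the quick brown fox and the lazy dog".split())
# kept -> ['quick', 'brown', 'fox', 'lazy', 'dog']
```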
When I paste the last row to a new sheet it doesn’t paste all the words from the text analysis. Did I miss a step or something? Any tips would helpful!
You probably forgot to "paste values". Pasting "normally" would try to paste the same formula, which would error out since the data isn't there in the new workbook. Look at timestamp 2:45 - I did it quickly and used a shortcut so I'll take the blame for why you missed it. :-( Try copying, going to the new workbook, right-clicking in the cell, selecting "Paste Special" and then selecting "values". If that doesn't make sense, just do a quick internet search for "how to paste values in Excel" or something similar and you'll figure it out. 🙂 Thanks for watching!
This was new to me; glad I came across the video. But the stop words make it ... useless. I'm in dire need of having those removed to make sense of all my text. I've done a few searches, but it's not easy/straightforward. There goes another afternoon of research, trial and error.
The simplest way I've done this in the past is to create a secondary list of stop words and write a formula that looks at this list and, if the word is found, returns a Boolean value. For example, create an attribute column named "Is Stop Word" that returns a Y if the word is found in your list of stop words and an N if it's not found. To create this initial list of stop words, sort the total words high to low by volume (as I've done around the 4:00 mark), then scroll down with your human eyeballs and type a Y by any of the words that you'd like to exclude. Once you've done this for the top 100 words or so, copy this list elsewhere in the workbook (I usually have a tab named "Lookups" for odds and ends like this), filter to ONLY the words you've flagged as Y, and delete the other words - this is your initial list of stop words. Go back to your dataset and reference this list, and then once you've got the new stop word column created, use it to filter OUT the stop words in your pivot table.
Another way to do this is to start by looking at short words using the LEN function. It's not perfect but it's a start. Most stop words are 2 to 4 characters, whereas most other words are longer, so you can filter your initial list to only those words that are 4 characters or less and see if that yields a high volume of stop words.
Hope that helps!
What is the formula you use around 49 seconds into the video?
It's an old-school form of concatenation. I'm just concatenating the first two rows, separated by a space (in double quotes). The more formal way to do this would be =CONCATENATE(AH5," ",AH6). The second formula though, is slightly different, as I concatenate the first formula with the next row (rather than just the prior row with the next row). Hope that helps!
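For readers following along, the cascading concatenation described in that reply can be sketched outside of Excel too. This is an illustrative Python version: the first step joins rows 1 and 2, and each later step joins the previous result with the next row, so the text accumulates into one string.

```python
# Cascading concatenation: each step is prior result & " " & next row,
# mirroring =CONCATENATE(prior, " ", next) dragged down a column.
rows = ["alpha", "beta", "gamma", "delta"]

combined = rows[0]
history = []
for nxt in rows[1:]:
    combined = f"{combined} {nxt}"
    history.append(combined)
# history[-1] -> 'alpha beta gamma delta'
```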
@@adaptalytics Thanks that seemed to work. My intuition told me to use concatenate but -- because of the small print -- I wasn't able to read your innovation.
Awesome m8, ty!
Happy to help! Let us know if you're stuck on something and we'll see if we can help out with another video.
Can you please share the dataset?
Unfortunately no, but there are many sample data sets out there to download for free. Try looking on Kaggle. Best of luck, cheers!
Please share the formula you used at 6:00.
If you're talking about how I quickly counted the number of unique words, then that wasn't a formula, but was a native Excel behavior that's available if you right click the bottom task bar (called the status bar) and make sure that "Count" is selected. Doing so will count any content highlighted at any time - it's very useful! Here's a video I found that explains it in more detail: th-cam.com/video/xBNIv14rynQ/w-d-xo.html - hope that helps!