How to Create an LDA Topic Model in Python with Gensim (Topic Modeling for DH 03.03)
- Published on Oct 8, 2024
- Notebook: github.com/wjb...
In this video, we use Gensim and Python to create an LDA Topic Model. As with other text analysis methods, most time is spent preparing the data and getting it into a form readable by the ML system.
If you enjoy this video, please subscribe. I provide all my content at no cost. If you want to support my channel, please donate via
PayPal: www.paypal.com...
Patreon: / wjbmattingly (it's my www.themedievalworld.com account as well).
If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
You can follow me at:
/ wjb_mattingly
Notebook: github.com/wjbmattingly/topic_modeling_textbook/blob/main/03_03_lda_model_demo.ipynb
First: thank you VERY MUCH for this video.
I was still getting an error message saying:
module 'pyLDAvis' has no attribute 'gensim'
This happens because the module was renamed.
You now have to write:
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
pyLDAvis.enable_notebook()
and then:
# feed the LDA model into the pyLDAvis instance
lda_viz = gensimvis.prepare(ldamodel, corpus, dictionary)
(source: script_kitty, stackoverflow.com/questions/66759852/no-module-named-pyldavis)
Hey man, this Jupyter notebook apparently isn't working. It gives the error below.
Unreadable Notebook: C:\Users\PT\Downloads\topic_modelling.ipynb NotJSONError("Notebook does not appear to be JSON: '\
Literally one of the most underrated channels for NLP, keep up the good work!!!
You are too kind! Thanks! And I will.
I usually watch tutorial videos at 1.5x speed, but this man speaks at 2x XD, thanks for the video
Haha! No problem. I spent a lot of time trying to learn to speak this slowly. =)
Really!!! One of the many underrated channels for NLP on YouTube. Keep up your good work, the rewards will follow. Thank you
Thanks for the kind words!
Good day, Dr. I highly appreciate your time and effort to create these videos and make them available on YouTube! Best wishes
Thanks for the great tutorial on topic modeling in general, very valuable material here.
In this particular video, have you forgotten to exclude the stop_words? Skimming through the code, but can't find the place in which those were used.
Keep up the good work (Y)
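On the stop-word question, here is a minimal sketch of one way to drop stop words after tokenizing. The tiny `STOP_WORDS` set and the `tokenize` and `remove_stopwords` helpers are made up for illustration and are not taken from the notebook; in practice a fuller list from a library such as NLTK or spaCy would normally replace the hard-coded set.

```python
import re

# Illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was"}

def tokenize(text):
    """Lowercase the text and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]

docs = ["The camp was liberated in the spring of 1945.",
        "A survivor speaks to the museum."]
cleaned = [remove_stopwords(tokenize(doc)) for doc in docs]
print(cleaned)
```

The filtered token lists are what would then be passed on to the dictionary and bag-of-words steps.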
Thank you for this great tutorial!
Wow so easy to understand. Tq so much.
Very clear tutorial...easily understood 🙂
Thanks! Glad you found it useful.
After identifying topics, how do we assign them to each record in the DF? We identified 2 clusters and their relevant words. How do we identify which document belongs to cluster 1 and which to cluster 2?
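For assigning a topic to each record, a minimal sketch: in Gensim, `lda_model.get_document_topics(bow)` returns a list of `(topic_id, probability)` pairs for one document, and the highest-probability pair can be taken as that document's topic. The `doc_topics` values below are made-up stand-ins for that output, and `dominant_topic` is a hypothetical helper name.

```python
def dominant_topic(topic_probs):
    """Return the (topic_id, probability) pair with the highest probability."""
    if not topic_probs:
        return None
    return max(topic_probs, key=lambda pair: pair[1])

# Made-up per-document output, shaped like lda_model.get_document_topics(bow).
doc_topics = [
    [(0, 0.91), (1, 0.09)],
    [(0, 0.20), (1, 0.80)],
    [(1, 0.99)],
]

# One topic id per document, in corpus order.
labels = [dominant_topic(probs)[0] for probs in doc_topics]
print(labels)
```

Because the list is in the same order as the corpus, it can be attached to a DataFrame as a new column, for example `df["topic"] = labels` (the column name is an assumption).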
Great tutorial overall, but there are just a few things I would love clarification on: as a few others have mentioned, where are you removing the stopwords, and what is "glob" used for and when is it used?
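On `glob`: it is a standard-library module that expands a wildcard pattern into a list of matching file paths, which is convenient when a corpus is spread over many files. A small sketch; the temporary folder and file names are invented for the demo.

```python
import glob
import os
import tempfile

# Build a throwaway folder with two .txt files and one .json file.
folder = tempfile.mkdtemp()
for name in ("doc_a.txt", "doc_b.txt", "notes.json"):
    with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
        f.write("placeholder")

# glob returns every path matching the wildcard pattern.
txt_files = sorted(glob.glob(os.path.join(folder, "*.txt")))
print([os.path.basename(path) for path in txt_files])
```

Each returned path can then be opened in a loop to load one document per file.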
I am trying to do LDA where my texts, almost 100 words per line, become one single document when passed to the model. Is there a tutorial anywhere that uses a pandas DataFrame where the texts are cells in a column? Please help...
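A likely cause of everything collapsing into one document is passing a single string (or one flat list of words) instead of a list of token lists, one per record. A sketch assuming one text per line; `raw` is made-up sample data. With pandas, the list of texts could come from something like `df["text"].tolist()`, where the column name is an assumption.

```python
raw = """first record about trains and stations
second record about ships and harbours

third record about planes"""

# One document per non-empty line, each tokenized into its own list.
texts = [line.split() for line in raw.splitlines() if line.strip()]

print(len(texts))   # number of documents
print(texts[0])
```

A list of token lists shaped like `texts` is what `gensim.corpora.Dictionary(texts)` expects, with `doc2bow` then applied to each inner list.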
wow... very good!
Thanks !
@17:10 doc2bow should give the frequency of words in the doc, not the corpus. Please confirm this.
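To illustrate the point: `doc2bow` is called on one document at a time and returns `(token_id, count)` pairs for that document only. The sketch below imitates that behaviour in plain Python to make the per-document counting visible; it is not Gensim's implementation, and `token2id` is a hand-made mapping.

```python
from collections import Counter

# Hand-made vocabulary (an assumption for the demo).
token2id = {"camp": 0, "train": 1, "winter": 2}

def bow(tokens, token2id):
    """Count known tokens in ONE document; return sorted (id, count) pairs."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

docs = [["camp", "train", "camp"], ["winter", "camp"]]
corpus = [bow(doc, token2id) for doc in docs]
print(corpus)
```

"camp" occurs three times across the two documents, but each document's entry records only its own count.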
Could you please explain, after identifying topics, how do we assign them to each record in the DF?
Very helpful
Amazing tutorial!!
My output shows one huge cluster and many tiny clusters. What does that mean?
Note: First cluster clearly represents the topic. All other clusters have the other irrelevant words with frequency generally 1. Shall I remove all those tokens from the corpus?
Very good video!
It seems like import pyLDAvis.gensim does not work; I changed it to import pyLDAvis.gensim_models
Thanks for updating us! I have liked this so others can see it higher in the comments.
I need to create a file as a matrix in which every row corresponds to a topic and every column corresponds to a word. The value at each row and column is the probability that the word belongs to the topic. Any help on how to do this?
Thank you
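For a topic-by-word probability matrix, Gensim's `lda_model.get_topics()` returns an array with one row per topic and one column per vocabulary term. A sketch of writing such a matrix to CSV with the standard library; the `matrix` and `vocab` values are invented stand-ins, and `write_topic_word_csv` is a hypothetical helper.

```python
import csv
import io

def write_topic_word_csv(matrix, vocab, handle):
    """Write one row per topic and one column per word to an open file."""
    writer = csv.writer(handle)
    writer.writerow(["topic"] + list(vocab))
    for topic_id, row in enumerate(matrix):
        writer.writerow([topic_id] + [f"{p:.4f}" for p in row])

# Invented stand-ins for the model's vocabulary and topic-term matrix.
vocab = ["camp", "train", "winter"]
matrix = [[0.7, 0.2, 0.1],
          [0.1, 0.3, 0.6]]

buffer = io.StringIO()   # swap for open("topics.csv", "w", newline="")
write_topic_word_csv(matrix, vocab, buffer)
print(buffer.getvalue())
```

With a real model, the column order follows the dictionary's token ids, so the header should be built from the dictionary in id order. The resulting CSV opens directly in a spreadsheet program.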
Hey, many thanks for this tutorial! But I have a question:
How can I export the final result (viz) to an Excel file?!
BR,
Hamidreza
Hey!! I stumbled upon this video while looking for NLP methods to analyze text PDFs.
I want to analyze over 2000 files; is there a fast way to do so?
The clustering would help me analyze topics within one archive. Is there a way to do it on all 2000 files automatically?
After trying data = load_data("data/ushmm_dn.json")["texts"]
an error occurs: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Could anyone tell me how to solve this?
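`Expecting value: line 1 column 1 (char 0)` means the parser found no valid JSON at the very start of the file, which typically happens when the file is empty or is actually something else (for example, an HTML page saved instead of the raw file). A sketch of a loader that reports what it found; `load_json_checked` is a hypothetical helper, not the notebook's `load_data`.

```python
import json
import os
import tempfile

def load_json_checked(path):
    """Load JSON, raising a descriptive error if the file is not JSON."""
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    if not content.strip():
        raise ValueError(f"{path} is empty")
    try:
        return json.loads(content)
    except json.JSONDecodeError as err:
        preview = content[:40].replace("\n", " ")
        raise ValueError(
            f"{path} is not valid JSON; it starts with: {preview!r}"
        ) from err

# Demo: an HTML page saved under a .json name.
path = os.path.join(tempfile.mkdtemp(), "fake.json")
with open(path, "w", encoding="utf-8") as f:
    f.write("<!DOCTYPE html><html>...</html>")

try:
    load_json_checked(path)
except ValueError as err:
    message = str(err)
    print(message)
```

If the preview shows HTML, downloading the file again through the repository's raw view is worth trying.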
Can someone please help me. I am getting this error at the end when I try to show the vis:
TypeError: Object of type complex is not JSON serializable
Does anyone know how to fix this? This is my last line of code:
import pyLDAvis
import pyLDAvis.gensim_models
pyLDAvis.enable_notebook()
vis=pyLDAvis.gensim_models.prepare(lda_model,corpus,id2word)
vis
I think it has to do with how the JSON file was read and formatted?
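The error itself comes from Python's `json` module, which cannot encode `complex` numbers, so it is reproducible without pyLDAvis and is probably unrelated to how the input JSON file was read. A commonly suggested workaround is to pass `mds="mmds"` to `prepare`, though that is not verified here. The sketch only demonstrates the failure and a generic `default=` hook that keeps the real part; whether discarding the imaginary part is acceptable for a given visualization is an assumption.

```python
import json

payload = {"x": [1.0, complex(0.5, 0.0)]}

# Plain json.dumps raises the same TypeError as in the comment above.
try:
    json.dumps(payload)
    error = None
except TypeError as err:
    error = str(err)
print(error)

def complex_to_real(obj):
    """json 'default' hook: encode a complex number by its real part."""
    if isinstance(obj, complex):
        return obj.real
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

encoded = json.dumps(payload, default=complex_to_real)
print(encoded)
```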
May I ask which Python version we should use for this? I keep getting the ModuleNotFoundError...
At minute 6:48, how can I make it work if I have a txt file and not a JSON one?
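For a plain-text source, one option is to read the file and split it into documents yourself. A sketch assuming documents are separated by blank lines; that separator, the file name, and the `load_txt_documents` helper are assumptions for illustration.

```python
import os
import re
import tempfile

def load_txt_documents(path):
    """Read a text file and split it into documents on blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    chunks = re.split(r"\n\s*\n", content)
    # Collapse internal line breaks and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]

# Demo file with two blank-line-separated documents.
path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("First testimony,\nspread over two lines.\n\nSecond testimony.\n")

data = load_txt_documents(path)
print(data)
```

The resulting list of strings can stand in for the list loaded from the JSON file, as long as the later steps only need one string per document.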
❌ import pyLDAvis.gensim
✅ import pyLDAvis.gensim_models
can you make a video on Guided LDA or Corex LDA for Semi Supervised LDA?
Hey! What version of spaCy are you using?
Hi Dr. Mattingly, is the ushmm_dn.json file available as well through the GitHub repo ...?
I have a meeting today in which I will be asking permission to share that file. Thanks for reminding me.
@python-programming Can I get that file?
Hey man I'm having an issue with this line
vis = prepare(lda_model,corpus,id2word,mds="mmds",R=20)
When I run it, it says
TypeError: prepare() missing 2 required positional arguments: 'vocab' and 'term_frequency'
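That error is consistent with `prepare` having been imported from the top-level `pyLDAvis` package, whose generic `prepare` expects topic-term distributions, document-topic distributions, document lengths, `vocab`, and `term_frequency`. The Gensim-specific wrapper that accepts `(lda_model, corpus, id2word)` lives in `pyLDAvis.gensim_models` (or `pyLDAvis.gensim` in older releases). A sketch that picks whichever is installed; `load_gensim_helper` is a hypothetical name.

```python
import importlib

def load_gensim_helper():
    """Return pyLDAvis's Gensim helper module, or None if unavailable."""
    for name in ("pyLDAvis.gensim_models", "pyLDAvis.gensim"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

gensimvis = load_gensim_helper()
if gensimvis is None:
    print("pyLDAvis (with Gensim support) is not installed")
else:
    print("using", gensimvis.__name__)
    # With a trained model in scope, the call from the comment becomes:
    # vis = gensimvis.prepare(lda_model, corpus, id2word, mds="mmds", R=20)
```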
If ushmm_dn.json is not available can you point us towards data that we can use?
I got the okay to share it. Here it is: github.com/wjbmattingly/topic_modeling_textbook/tree/main/data
Can you please help me? I am getting this error: ModuleNotFoundError: No module named 'pyLDAvis.gensim'
import pyLDAvis.gensim_models
Any suggestions on visualizing the output in an IDE such as PyCharm? It seems like there are some issues using pyLDAvis in PyCharm
Unfortunately, I do not know of any. PyLDAvis was designed with Jupyter in mind, I believe. You can, however, save it as an html and open it externally. Here's how => stackoverflow.com/questions/41936775/export-pyldavis-graphs-as-standalone-webpage
@python-programming Yeah, that's what I have been reading as well, thanks!
@alexwinquist8092 No problem! Wish I had better news.
When I do the visualization part, it says "module 'pyLDAvis' has no attribute 'gensim'". Not sure how to deal with it.
Try gensim_models. If I got that right, they changed it in a later version, so it no longer works with only gensim
import pyLDAvis
import pyLDAvis.gensim_models
pyLDAvis.enable_notebook()
vis=pyLDAvis.gensim_models.prepare(lda_model,corpus,id2word)
Using final and new as variable names makes me a little angry as a Java developer :D
Haha! I know it is such a bad habit, I forgot about the final keyword in Java.
Uhmm. I think you forgot to remove the stopwords.
As far as I know, "simple_preprocess" doesn't remove stopwords.
Anyway, please correct me if I'm wrong.
"No module named gensim"
pip install gensim
Thanks 😊