Beautiful lecture on topic modeling. Thanks Prof Blei and the University of Edinburgh for making this lecture available.
link for pdf of the presentation - www.cs.columbia.edu/~blei/talks/Blei_User_Behavior.pdf
A very informative session
Very beautiful! Thanks for sharing.
If LDA has any semantic meaning, I think it's because of the Gibbs sampling step, which tries to push a word into a topic that its neighboring words are already in. In a broader sense, what Gibbs sampling computes is P(selected word | each topic) x P(neighboring words | each topic).
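That "pull toward the topics the document's other words use" can be seen directly in the collapsed Gibbs update for LDA. Below is a minimal toy sketch (not the lecture's code; the corpus, K, alpha and beta are made-up illustrative values): each token's topic is resampled in proportion to how popular the topic is in its document times how much the topic likes the word.

```python
import numpy as np

rng = np.random.default_rng(0)

docs = [[0, 1, 1, 2], [2, 3, 3, 0]]   # documents as lists of word ids (toy data)
V = 4                                  # vocabulary size
K = 2                                  # number of topics
alpha, beta = 0.1, 0.01                # Dirichlet hyperparameters (assumed values)

# Random initial topic assignment for every token.
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]

# Count matrices maintained by the sampler.
ndk = np.zeros((len(docs), K))         # topic counts per document
nkw = np.zeros((K, V))                 # word counts per topic
nk = np.zeros(K)                       # total tokens per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1
        nkw[k, w] += 1
        nk[k] += 1

def gibbs_sweep():
    """Resample every token's topic from its full conditional."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove this token's current assignment from the counts.
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # P(topic | rest) is proportional to
            # (how much this doc uses the topic) * (how much the topic likes this word)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(50):
    gibbs_sweep()

print(ndk)  # per-document topic counts after burn-in
```

The first factor in `p` is exactly the neighbor effect described above: words whose document-mates are concentrated in a topic get pulled into that topic.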
Trivial contrivance
I think the weakness of LDA is that it conflates semantics with words. Meaning arises via the relations between words, which entirely escape LDA's analysis. All LDA is good for is estimating word proximity between documents; it's effectively incapable of extracting precise topics from documents, only generic ones.
It's good enough if you have to deal with hundreds of documents containing thousands of words each.
Sure, but what is it good at? What is the semantic value of the (let's call it Cartesian) distance between two LDA signatures? I know what I'm talking about: I worked for a couple of years on an LDA-based classification project, and the semantic value of the topics extracted from the documents was too general to be truly useful. I think Blei et al. have found an interesting statistical method and a cool idea, but what they fail to express in this entire approach is precisely in what way their metric, and the methods by which they choose words, yields any meaningful insight into the analyzed texts. I find the whole thing very superficial. Without connecting your word net to some semantic ontology, you are doing nothing but an arbitrary match; arbitrary in the sense that meaning in language occurs in more complex ways than through individual nouns, verbs and adjectives.
I'm a noob at this, a few weeks into NLP, and I'm trying to solve a use case and hitting exactly this issue. Ultimately LDA just gives me a bunch of topic ids with words that don't mean anything together. I read that I have to name the topics myself! So I landed here looking for a 'solution'... hmm, I'm not the only one. Meanwhile I found something interesting, though I don't know its worth: ieeexplore.ieee.org/document/6405699/. It introduces the term 'concept' between topic and word. I could not find any implementations as yet.
Pritish N I applied LDA to public speeches and was able to compare the results to manual ones (i.e. people read the speeches and identified the main topics), and LDA performed rather well, discovering 12 out of 15 distinct topics. For instance, the health care topic had words like health, care, afford, insurance, cost at the top, so you won't confuse it with anything else. I also have a few topics that are hard to interpret, but it gave me the main topics I needed across 6,000 documents. I should mention that in addition to stopwords I had to exclude about 30 other words that were frequent but uninformative, such as year, state, always, because, etc. These will depend on your area, of course, but they pollute the results, and excluding them helped a lot.
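For anyone trying to reproduce this, the extra-exclusion step is just a second stopword list applied before fitting LDA. Here is a minimal sketch; the word lists and example sentence are illustrative placeholders, not the commenter's actual lists.

```python
# Generic English stopwords (abbreviated here for illustration).
GENERIC_STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "will"}

# ~30 corpus-specific words that are frequent but uninformative,
# found by inspecting the corpus (illustrative sample).
DOMAIN_STOPWORDS = {"year", "state", "always", "because", "people"}

STOPWORDS = GENERIC_STOPWORDS | DOMAIN_STOPWORDS

def preprocess(speech: str) -> list[str]:
    """Lowercase, tokenize on whitespace, and drop both stopword lists."""
    tokens = speech.lower().split()
    return [t for t in tokens if t not in STOPWORDS]

doc = "The state will always expand health care because insurance cost is rising"
print(preprocess(doc))
# The filtered token lists can then be fed to any LDA implementation.
```

The point is that the domain list is corpus-dependent: you find it by looking at the highest-frequency words that survive generic stopword removal and judging which carry no topical signal.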
The problem is that the whole concept of the "topic" is grossly inflated. It has very shallow semantic value. A topic is a broad and ambiguous category.
16:36 add one for Tomotopy
Does anyone know where his other talk is that describes how to perform inference? 16:12
How do you make the graph at 2:30 in R?
Hello professor, can LDA be used to categorize documents into strict categories? Your video suggests otherwise, but I wanted to confirm.
I think you should use a hard clustering algorithm like k-means or hierarchical clustering for strict topics, because topic modelling is a soft clustering approach.
Thank you Manish for the reply, but could you elaborate further on what is meant by soft and hard techniques?
@@HarpreetKaur-qq8rx To my understanding, hard clustering assumes each document in a corpus exhibits exactly one topic, and all the words in that document are assumed to express that topic. Soft clustering assumes each document has its own probabilities of exhibiting each of the topics, so a document is a mixture of all the topics rather than belonging to just one. Each word in a document is assigned to one of the topics, and different words in the same document may be assigned to different topics.
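The distinction can be shown with a tiny made-up example (the numbers below are illustrative, not output from any real model): a hard clusterer emits one label per document, while a soft model like LDA emits a probability distribution over all K topics, which can be collapsed to a hard label via argmax at the cost of the mixture information.

```python
import numpy as np

# Hard clustering (e.g. k-means): each document gets exactly one cluster id.
hard_assignment = {"doc1": 0, "doc2": 2, "doc3": 0}

# Soft clustering (e.g. LDA): each document gets a probability
# distribution over all K topics (here K = 3, values made up).
soft_assignment = {
    "doc1": np.array([0.70, 0.25, 0.05]),
    "doc2": np.array([0.10, 0.15, 0.75]),
    "doc3": np.array([0.55, 0.30, 0.15]),
}

# A hard label can always be recovered from a soft one by taking the
# most probable topic, but the mixture information is thrown away.
hardened = {d: int(p.argmax()) for d, p in soft_assignment.items()}
print(hardened)  # each document collapsed to its single most likely topic
```

This is why LDA answers "strict category" questions only after an explicit argmax step, and why two documents with the same argmax topic can still have very different topic mixtures.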