Introducing Socratica COURSES
www.socratica.com/collections
Fantastic video, great to see another Socratica video in my feed
So cool, being able to find similarities in books from neighboring time periods was fascinating.
It really makes us curious about a lot of the more recent writers. Can you use this to find out which older writers influenced them?
Keep 'em coming, the courses are looking good too.
This is phenomenal! Here I was, thinking we were just going to talk about the small-angle approximation (cos a ≈ 1) from trig. Bonus!
It was a fun surprise to learn about this technique 💜🦉
@Socratica It's fun when you hear of similar analysis being done to uncover ghostwriters or shared authorship. Shakespeare, Rowling, and The Federalist Papers all come to mind.
Very useful info, and the approach was excellent, very fun too
Thank you
I don't like removing "stop words" from the statistics, because their frequency is still meaningful. Even though everybody uses the word "the" frequently, some authors use it much more than others, and that is a characteristic that should not be ignored.
Instead, I would suggest performing some kind of normalization, like dividing each word count by the average occurrence rate of that particular word in natural language.
Rather than raw word counts, the vector coordinates would then be the relative use rate of each word in the book compared to its average use rate in general language.
That would make a much more precise comparison, because it's not just stop words that are very common: some words are inherently much more common than others.
Although I did not run the experiment, I suspect that this way everything would have a much lower cosine similarity.
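For anyone who wants to try it: a minimal sketch of that normalization in Python. All the word counts and baseline rates below are made-up placeholders, not real corpus data.

```python
import math

# Hypothetical word counts for two books (placeholder numbers, not real data)
book_a = {"the": 4200, "whale": 310, "sea": 150}
book_b = {"the": 3900, "love": 280, "sea": 40}

# Assumed baseline: average occurrence rate of each word in general language,
# expressed per million words (made-up values for illustration)
baseline_per_million = {"the": 60000, "whale": 5, "sea": 90, "love": 120}

def normalized_vector(counts, baseline):
    """Divide each word's in-book rate by its average rate in general language."""
    total = sum(counts.values())
    return {w: (c / total) / (baseline[w] / 1_000_000) for w, c in counts.items()}

def cosine_similarity(u, v):
    """Standard cosine similarity over sparse dict vectors."""
    words = set(u) | set(v)
    dot = sum(u.get(w, 0) * v.get(w, 0) for w in words)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

print(cosine_similarity(normalized_vector(book_a, baseline_per_million),
                        normalized_vector(book_b, baseline_per_million)))
```

With this weighting, a shared rare word like "sea" pulls the vectors together far more than the shared "the" does, which is exactly the effect the normalization is after.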
That's a very intuitive and helpful explanation, thank you. But pray tell, prithee even, isn't some relationship between words in individual sentences what we would prefer (smaller angles)? It seems odd to me that when creating embeddings we're focused on these huge arcs rather than the smaller arcs that build understanding on a more basic level. The threshold for AI in GPT-3 seems to have been a huge amount of text, but isn't there some way to make that smaller? For most of us, that's the only way we can even contribute, since we just don't have the computer hardware.
Pretty good, but the visualization of the results could have been done in something other than a table. That way, you wouldn't have to explain why the diagonal is all 1s, or that every number appears twice (mirrored across the diagonal). You'd end up with just 45 rather than 100 data points, and could then compare the top 10 across the different measurements. This would be much easier to follow.
Interesting!! We'd love to see a sketch of what you have in mind!
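@Socratica Roughly something like this: a minimal sketch in Python that keeps only the 45 unique pairs above the diagonal and plots the top 10 as a bar chart. The `sim` DataFrame, the book labels, and the random similarity values are all placeholder assumptions, not the video's actual numbers.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder 10x10 similarity matrix: symmetric, with 1s on the diagonal
rng = np.random.default_rng(0)
m = rng.uniform(0.3, 0.9, (10, 10))
m = (m + m.T) / 2
np.fill_diagonal(m, 1.0)
books = [f"Book {i + 1}" for i in range(10)]
sim = pd.DataFrame(m, index=books, columns=books)

# Keep only the 45 unique pairs above the diagonal, then take the top 10
upper = sim.where(np.triu(np.ones(sim.shape, dtype=bool), k=1))
pairs = upper.stack().sort_values(ascending=False)  # index is (book, book) pairs
top10 = pairs.head(10)

labels = [f"{a} / {b}" for a, b in top10.index]
plt.barh(labels[::-1], top10.values[::-1])
plt.xlabel("Cosine similarity")
plt.title("Top 10 most similar book pairs")
plt.tight_layout()
plt.show()
```

Ranking the pairs this way also makes it easy to put two similarity measures side by side and see whether they agree on the top 10.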
@Socratica Please upload more videos on AI and machine learning.