Why do this by masking and unmasking whole tokens or words? Why not pretrain some kind of latent space for each token/word and then do the diffusion in the latent space? Then the diffusion becomes much simpler. Of course you still need to convert from the latent space into the best token/word after that, but that should be relatively straightforward as well.
Useful latent spaces for text have remained an extremely challenging problem for years, starting from the fact that no one ever really got text VAEs or GANs to work. We have nice ways of mapping text to embeddings for tasks like retrieval, but those spaces don't have the kind of regularized structure that would let something like Gaussian diffusion work well. I certainly agree that if it were easy to do this, it would make sense to run standard diffusion.
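For concreteness, here is a minimal sketch (not from the paper or this thread) of what "Gaussian diffusion in a pretrained latent space" would involve; `encode` is a hypothetical sentence encoder, and the hard, unstated part is the final step of mapping the denoised latent back to tokens.

```python
# Minimal sketch of the forward (noising) step Gaussian diffusion in a text
# latent space would rely on. `encode` is a hypothetical pretrained encoder
# mapping a sentence to a fixed-size latent vector; the point of the reply
# above is that real text embedding spaces are not regular enough for the
# reverse (denoising) process to recover coherent text.
import torch

def forward_noise(z0: torch.Tensor, alpha_bar_t: float) -> torch.Tensor:
    """Standard DDPM forward step: z_t = sqrt(a_bar)*z_0 + sqrt(1-a_bar)*eps."""
    eps = torch.randn_like(z0)
    return alpha_bar_t ** 0.5 * z0 + (1 - alpha_bar_t) ** 0.5 * eps

# z0 = encode("some sentence")             # hypothetical pretrained latent
# zt = forward_noise(z0, alpha_bar_t=0.5)  # partially noised latent
# A denoiser would then predict z0 from zt, and a decoder would have to map
# the final latent back to tokens -- the step that turns out to be hard.
```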
Making this process discrete seems very strange to me. Why not noise the token embeddings themselves? E.g., at pure noise a given token embedding is a blend of the embeddings of all tokens, and at zero noise it is a one-hot vector as normal. As you run the diffusion you can update this token-probability space, since you have the logits.
After n inference steps you'll probably end up with positions that don't converge to a single token but instead map to some subset of tokens that are all roughly equivalent in semantic space, so you can just sample from that distribution based on the final logits. Tokens you've already generated will be one-hot; noised tokens will be blended as described.
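A toy sketch of the blending scheme described above, with a made-up vocabulary size, embedding matrix, and a placeholder model call; it only illustrates the interpolation between one-hot distributions and a noise prior, not any published method.

```python
# Each position carries a probability distribution over the vocabulary, the
# input embedding is the expected embedding under that distribution, and
# denoising sharpens the distribution toward one-hot. Names are illustrative.
import torch
import torch.nn.functional as F

vocab_size, dim, seq_len = 1000, 64, 8
embedding = torch.randn(vocab_size, dim)  # token embedding matrix

def blend(probs: torch.Tensor) -> torch.Tensor:
    """Expected embedding under per-position distributions: (seq, V) -> (seq, dim)."""
    return probs @ embedding

def noise_step(probs: torch.Tensor, noise_level: float, prior: torch.Tensor) -> torch.Tensor:
    """Interpolate each position's distribution toward a noise prior."""
    return (1 - noise_level) * probs + noise_level * prior

# Pure noise: every position is pushed to the prior (uniform here; see the
# follow-up comment for using a frequency-weighted prior instead).
prior = torch.full((vocab_size,), 1.0 / vocab_size)
one_hot = F.one_hot(torch.randint(vocab_size, (seq_len,)), vocab_size).float()
probs = noise_step(one_hot, noise_level=1.0, prior=prior)

# One "denoising" update: a model (omitted) maps blended embeddings to logits,
# and the per-position distribution is replaced with softmax(logits).
# logits = model(blend(probs))
# probs = F.softmax(logits, dim=-1)

# Final step: sample a concrete token per position from the last distribution.
# tokens = torch.multinomial(probs, num_samples=1).squeeze(-1)
```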
Surprisingly, many people have tried this, with lots of fancy approaches. So far nothing is really close to autoregressive models. Many things go wrong, but the last step in particular, mapping back to specific words, seems to be tricky.
@srush_nlp Hmm... that is very surprising to me. The mapping seems like the most straightforward part. Do you know what this technique is called in academia / if there are any papers published on this idea?
Also, after thinking about it, I'm pretty sure you don't want to uniformly smear across all tokens, but instead make the embedding the average unconditional embedding (e.g. count all tokens in your dataset, and make the pure noise regime equal to the average token).
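A small sketch of that alternative, assuming a hypothetical `token_counts` tensor of corpus frequencies: the fully noised embedding becomes the frequency-weighted (unigram) average of the token embeddings rather than a uniform smear.

```python
# Instead of a uniform noise prior, use corpus token frequencies, so the
# pure-noise embedding is the average unconditional token embedding.
# `token_counts` is a hypothetical (vocab_size,) tensor of corpus counts.
import torch

def unigram_prior(token_counts: torch.Tensor) -> torch.Tensor:
    """Empirical token distribution p(v) from corpus counts."""
    return token_counts.float() / token_counts.sum()

def average_token_embedding(token_counts: torch.Tensor, embedding: torch.Tensor) -> torch.Tensor:
    """Pure-noise embedding = sum_v p(v) * E[v], the average unconditional embedding."""
    return unigram_prior(token_counts) @ embedding
```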
I made a retrieval-based chatbot from scratch (I'm not a professional), and the main component was compressing the vocabulary with synonyms and training the model on the compressed vocabulary so it groks faster. I have a feeling that approach would allow for very small and intelligent models. What do you think about compressing the vocabulary?
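For illustration only, a rough sketch of the vocabulary-compression idea with a hypothetical hand-written synonym table; how the synonym groups are actually built (WordNet, embedding clustering, ...) is the real question.

```python
# Map each word to a canonical representative of its synonym group before
# training, shrinking the effective vocabulary. `synonym_groups` is a
# hypothetical, hand-written table used purely for illustration.
synonym_groups = {
    "large": ["big", "huge", "large"],
    "small": ["little", "tiny", "small"],
}
canonical = {word: head for head, words in synonym_groups.items() for word in words}

def compress(tokens: list[str]) -> list[str]:
    """Replace every token with its canonical synonym, if it has one."""
    return [canonical.get(t, t) for t in tokens]

# compress(["a", "huge", "tiny", "dog"]) -> ["a", "large", "small", "dog"]
```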
Really cool stuff! It’s a shame it’s not quite at the level of autoregressive models (especially for DNA), but I’m excited about future work in the field. Love the explanation; it made reading the paper much more digestible.
Did you guys just recently read the original BERT paper, add random masking and a few repeats, and call it done?
Yeah! Just read it over the summer. Good paper.
Really good video. I have to improve on my math, but I get the general idea. Will try to implement it.
@srush_nlp Great explanation! How do you think discrete diffusion models should be modified to enable long context sequence generation comparable to LLMs?
See 4.2 in the paper! It talks about how to use MDLM for autoregressive modeling, which results in text of arbitrary length
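As a rough illustration of how block-wise generation can yield arbitrary-length text (not the paper's exact algorithm or API), with `sample_block` standing in for the masked-diffusion sampling loop:

```python
# Repeatedly run the masked-diffusion sampler on a new block of masked
# positions, conditioned on everything generated so far. `sample_block` is a
# placeholder for the model's denoising loop, not the paper's interface.
MASK = -1    # placeholder mask token id
BLOCK = 128  # tokens generated per block

def generate(sample_block, prompt: list[int], num_blocks: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(num_blocks):
        block = [MASK] * BLOCK
        # sample_block unmasks the new block over several diffusion steps,
        # conditioned on the already-generated context, and returns token ids.
        tokens += sample_block(context=tokens, block=block)
    return tokens
```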
Really liked it! This could work better on ARC.