DALL·E 2 Explained - model architecture, results and comparison

AI Bites

Views: 7,255

Published on 19 Jun 2024
DALL·E 2, also known as unCLIP, is an image generation model that uses diffusion models to generate images from text via CLIP embeddings. This video explains the DALL·E 2 paper: specifically, the model architecture, the results, and a comparison with other state-of-the-art models such as GLIDE.
Paper Title
Hierarchical Text-Conditional Image Generation with CLIP Latents
Paper Abstract
Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.
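To make the two-stage structure in the abstract concrete, here is a minimal toy sketch in plain Python. It only mirrors the data flow (caption, then text embedding, then a prior that produces an image embedding, then a decoder that produces an image). Every function name, the embedding size, and the noise levels are assumptions made for illustration; the real prior and decoder are learned models, and none of this code comes from the paper.

```python
import math
import random

EMBED_DIM = 8  # toy size chosen for illustration; real CLIP embeddings are much larger


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def toy_text_encoder(caption):
    """Hypothetical stand-in for a frozen CLIP text encoder."""
    rng = random.Random(caption)  # same caption always gives the same vector
    return normalize([rng.gauss(0, 1) for _ in range(EMBED_DIM)])


def toy_prior(text_embedding, rng):
    """Stage 1: sample an image embedding given the text embedding.

    The real prior is a learned autoregressive or diffusion model;
    this stand-in just perturbs the text embedding with noise.
    """
    noisy = [x + 0.3 * rng.gauss(0, 1) for x in text_embedding]
    return normalize(noisy)


def toy_decoder(image_embedding, rng, size=4):
    """Stage 2: produce an 'image' conditioned on the image embedding.

    The real decoder is a diffusion model; here the image is a small
    grid of floats whose mean depends on the embedding.
    """
    bias = sum(image_embedding) / len(image_embedding)
    return [[bias + 0.1 * rng.gauss(0, 1) for _ in range(size)]
            for _ in range(size)]


def generate(caption, seed=0):
    """Run caption -> text embedding -> prior -> decoder."""
    rng = random.Random(seed)
    z_text = toy_text_encoder(caption)
    z_image = toy_prior(z_text, rng)
    return toy_decoder(z_image, rng)


image = generate("a corgi playing a trumpet", seed=1)
```

Because both stages are stochastic, calling `generate` with different seeds for the same caption yields different outputs, which is the toy analogue of the image diversity the abstract mentions.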
Paper Link
arxiv.org/abs/2204.06125
Website
openai.com/dall-e-2
Video Outline
0:00 - Introduction
1:05 - Method / Model of CLIP
1:53 - Method / Model of unCLIP
3:02 - Decoder Architecture
4:04 - Prior Architecture
5:59 - Image Manipulation
7:49 - Image Interpolation
8:26 - Language-Guided Manipulation
9:14 - Importance of the Prior
10:07 - Results / Human Evaluation
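The image interpolation and language-guided manipulation chapters both rest on spherical interpolation (slerp) between CLIP embeddings. The sketch below shows slerp on plain Python lists, plus a normalized "text diff" direction of the kind the paper describes for text-guided edits. It assumes unit-length inputs; the function names and the two-dimensional example vectors are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def slerp(a, b, t):
    """Spherically interpolate between unit vectors a and b, 0 <= t <= 1."""
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    omega = math.acos(dot)          # angle between the two vectors
    s = math.sin(omega)
    if s < 1e-8:
        # Nearly parallel or opposite vectors: the arc is degenerate,
        # so return the nearer endpoint instead of dividing by ~0.
        return list(a) if t < 0.5 else list(b)
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def text_diff(z_new_caption, z_old_caption):
    """Normalized direction from the old caption embedding to the new one."""
    return normalize([n - o for n, o in zip(z_new_caption, z_old_caption)])


# Interpolating between two (toy) image embeddings.
z_a = [1.0, 0.0]
z_b = [0.0, 1.0]
halfway = slerp(z_a, z_b, 0.5)

# Nudging an image embedding toward a (toy) text-diff direction.
direction = text_diff([0.0, 1.0], [1.0, 0.0])
edited = slerp(z_a, direction, 0.25)
```

Unlike straight linear interpolation, slerp keeps the result on the unit sphere, so intermediate points stay at the same scale as the embeddings the decoder was conditioned on.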
AI Bites
YouTube: / aibites
Twitter: / ai_bites
Patreon: / ai_bites
Github: github.com/ai-bites
Vision Transformers (ViT): • Vision Transformer (Vi...
Data Efficient Image Transformer (DeiT): • DeiT - Data-efficient ...
📚 📚 📚 BOOKS I HAVE READ, REFER TO AND RECOMMEND 📚 📚 📚
📖 Deep Learning by Ian Goodfellow - amzn.to/3Wnyixv
📙 Pattern Recognition and Machine Learning by Christopher M. Bishop - amzn.to/3ZVnQQA
📗 Machine Learning: A Probabilistic Perspective by Kevin Murphy - amzn.to/3kAqThb
📘 Multiple View Geometry in Computer Vision by R Hartley and A Zisserman - amzn.to/3XKVOWi
Music: www.bensound.com
#machinelearning #deeplearning #aibites
Science & Technology

Comments • 6

@oc1655 a year ago ⁺⁴
this is excellent. i just can't understand why companies (e.g., openai in this case) cannot add small notational hints to make their method more understandable. your diagram at around 2:50 is better than the 2 page gibberish actual paper presents. great work!
@rezarawassizadeh4601 2 years ago ⁺²
Thank you for this easy to understand, to my understanding CLIP is not separate from Prior. Prior includes the frozen CLIP model that constructs image embedding.
@liji9354 a year ago ⁺¹
Thank you so much! this is super helpful!
@salomeshunamon4737 a year ago
How would you say DALLE2 compares to Stable Diffusion architecturally? Would you consider Stable Diffusion a latent diffusion model, denoising diffusion model or something else?
@salomeshunamon4737 a year ago
Another question :) I see in the table that humans evaluated the output and rated the photos by photorealism and prompt accuracy, but what is diversity?
@AIBites a year ago ⁺¹
so diversity is how different the output images look. For example, if you want images of winter, then the generated images should not always show hibernating trees without leaves but also show snow
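As a rough illustration of "how different the output images look", one could score a batch of generated images by the average pairwise cosine distance between their embeddings. This is only an assumed proxy for the sake of example: the table discussed in the comment above reports human ratings, not this metric, and the function names and toy vectors here are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(embeddings):
    """Average cosine distance over all unordered pairs of embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0  # fewer than two samples: nothing to compare
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


# Toy "winter" samples: three near-identical bare-tree images
# versus a batch that also contains a snowy scene.
bare_trees_only = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
trees_and_snow = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

low_diversity = mean_pairwise_distance(bare_trees_only)
higher_diversity = mean_pairwise_distance(trees_and_snow)
```

A batch where every sample looks alike scores near zero, while a batch that mixes bare trees with snowy scenes scores higher.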
