OUTLINE:
0:00 - Intro & Overview
4:25 - Sponsor: Weights & Biases
6:15 - Problem Setup & Contributions
8:50 - Recap: Straight-Through Estimator
13:25 - Encoding the discrete problem as an inner product
19:45 - From algorithm to distribution
23:15 - Substituting the gradient
26:50 - Defining a target distribution
38:30 - Approximating marginals via perturb-and-MAP
45:10 - Entire algorithm recap
56:45 - Github Page & Example
Paper: arxiv.org/abs/2106.01798
Code (TF): github.com/nec-research/tf-imle
Code (Torch): github.com/uclnlp/torch-imle
Our Discord: discord.gg/4H8xxDF
This is the type of paper I would normally gloss over and skip. Thanks for taking the time to explain it!
awesome explanation with multiple careful recaps, thank you!!
I check your channel for new videos every day. Keep it up!
Hahaha so many layers 🤣
Thanks for yet another explaino, Yannic. 🤲
i am confused >
Because of how the problem is encoded as an inner product between z and theta
@YannicKilcher Ah ok, in fact this is very clearly explained, but I had to watch it a second time because I am dumb. Very clever!
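To make the inner-product encoding concrete, here is a minimal NumPy sketch with a hypothetical toy graph of four edges (all numbers made up): the cost of a discrete solution z is just a dot product with the edge weights theta, so the solver can be viewed as an argmin of <z, theta> over feasible paths.

```python
import numpy as np

# Hypothetical toy graph with 4 directed edges; theta holds the edge costs
# predicted by the first network, z is a binary indicator vector marking
# which edges the solver (e.g. Dijkstra) selected for the shortest path.
theta = np.array([1.5, 0.2, 0.7, 2.0])   # edge costs (problem statement)
z     = np.array([0.0, 1.0, 1.0, 0.0])   # chosen edges (discrete solution)

# The cost of the chosen path is just the inner product <z, theta>,
# which is why the solver can be viewed as  argmin_z <z, theta>
# over the feasible set of paths.
path_cost = np.dot(z, theta)
print(path_cost)  # 0.9
```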
Here's the other video about the paper :) th-cam.com/video/hb2b0K2PTxI/w-d-xo.html
I don't fully get the gist here, but if the middle is an explicit algorithm, wouldn't that need a well-defined input? That would mean the output of the first net is a well-defined feature set, so the first net is defined and static and wouldn't need to be part of a full backprop chain. Or is the middle part somehow also a general learning algo compatible with backprop??
That's the point: the middle part is a general algorithm, but you can still use backpropagation
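As a rough illustration of "a general algorithm in the middle, but backprop still flows", here is a minimal PyTorch sketch of the straight-through estimator that the video recaps around 8:50; the one-hot "cheapest edge" solver and the tensor shapes are hypothetical stand-ins, not the paper's actual code.

```python
import torch

def blackbox_solver(theta: torch.Tensor) -> torch.Tensor:
    # Stand-in for a discrete solver (e.g. Dijkstra): pick the single
    # cheapest edge. No gradient is defined for this operation.
    z = torch.zeros_like(theta)
    z[theta.argmin()] = 1.0
    return z

class StraightThroughSolver(torch.autograd.Function):
    """Straight-through baseline: run the discrete solver in the forward
    pass, pretend it was the identity in the backward pass so the gradient
    still reaches the first network."""

    @staticmethod
    def forward(ctx, theta):
        return blackbox_solver(theta)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass the incoming gradient through unchanged

theta = torch.tensor([1.5, 0.2, 0.7], requires_grad=True)
z = StraightThroughSolver.apply(theta)
loss = (z - torch.tensor([0.0, 0.0, 1.0])).pow(2).sum()
loss.backward()
print(theta.grad)  # a gradient arrives at theta despite the discrete solver
```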
I could be wrong, but it sounds to me like they use an analytical perturbative method to get around the fact that a distribution divides the two neural networks.
Oh man, what a trip. It's pretty sneaky of them, all just to avoid computing the gradient of the solver.
what happened to the constraints C during MAP?
Makes sense so far.
I know it's an example, but it is unclear to me why you would need a second neural network after the shortest path to get the size of the path. This whole approach seems to depend on the fact that we infer new information from the solution. But without that 2nd network, it is impossible to improve the first graph generation network.
Hm, would it be possible to use that method without the last layers? Without the last layers, MLE( - label ) is the loss, but then you're stuck right after the discrete part.
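On the question of dropping the last layers: if the loss sits directly on the solver output, its gradient with respect to z can be written down by hand, and that gradient is exactly the quantity that gets subtracted from theta in the target-distribution step discussed below. A hedged NumPy sketch with made-up vectors:

```python
import numpy as np

# Hypothetical case: the loss sits directly on the solver output z,
# e.g. a squared / Hamming-style loss against the gold path z_label.
z       = np.array([0.0, 1.0, 1.0, 0.0])  # solver output
z_label = np.array([0.0, 1.0, 0.0, 1.0])  # ground-truth path

loss   = np.sum((z - z_label) ** 2)
grad_z = 2.0 * (z - z_label)  # gradient w.r.t. z, no second network needed
# grad_z is what would be subtracted from theta in the target-distribution trick.
print(loss, grad_z)
```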
Pretty good video :)
Basically the takeaway is that if we formulate a problem as a dot product of the solution and the problem statement, then we can bypass taking gradients of the discrete solver, because we know that the gradient w.r.t. the output of the solver can be subtracted from the input of the solver and it would still converge.
.
So we are perturbing the problem before running it through Dijkstra. Then we do the subtract-the-z-gradient-from-theta trick to get a new problem, which should be solved more easily with Dijkstra. Then we run this new problem through Dijkstra again and look at the difference between the solutions of the two problems. The trick is then to say that the difference (gradient) in the solutions of the problem is actually the same as the difference (gradient) in the problem definitions.
.
My intuition fails me. I think this would only work for a very small subset of problems.
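To make the "subtract the gradient, run the solver again, take the difference of solutions" description concrete, here is a hedged NumPy sketch of that gradient substitution. The one-hot MAP solver, the value of lambda, and all numbers are hypothetical stand-ins for the real combinatorial solver, and this reflects a reading of the comment above rather than the authors' exact code.

```python
import numpy as np

def map_solver(theta):
    # Stand-in for the MAP solver: returns the binary indicator z that
    # maximizes <z, theta> (here: pick the single highest-scoring component).
    # For a shortest-path cost minimization the scores would be negated costs.
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def gradient_substitution(theta, grad_z, lam=1.0):
    """Approximate dL/dtheta without differentiating the solver:
    1) solve the original problem,
    2) subtract the z-gradient (scaled by lambda) from theta to get a
       perturbed 'target' problem,
    3) solve again and take the difference of the two discrete solutions."""
    z        = map_solver(theta)
    z_target = map_solver(theta - lam * grad_z)  # the subtract-the-gradient trick
    return (z - z_target) / lam                  # used in place of dL/dtheta

theta  = np.array([1.5, 0.2, 0.7])
grad_z = np.array([1.0, 0.0, -0.5])  # pretend gradient coming from the loss
print(gradient_substitution(theta, grad_z))  # pushes theta toward the target solution
```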
36:00 I am also wondering how it is guaranteed that the target distribution created by subtracting the z gradient from theta has, in expectation, a lower loss than the original distribution. This is totally dependent on the black-box algorithm, isn't it? Shortest paths could be totally different with tiny changes in theta. And if it were guaranteed, why not use the z gradient directly on theta and skip all the target-distribution stuff?
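For the perturb-and-MAP marginal approximation mentioned in the outline, here is a hedged sketch of the general idea: add random noise to theta and solve the perturbed MAP problem repeatedly, so the average of the discrete solutions approximates the marginals. Plain Gumbel noise and the toy argmax solver are simplifications chosen here for illustration; the paper's actual noise distribution may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def map_solver(theta):
    # Stand-in MAP solver: one-hot argmax of <z, theta>.
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def perturb_and_map_marginals(theta, num_samples=1000, scale=1.0):
    """Approximate the marginals of the discrete distribution by adding
    random noise to theta and solving the perturbed MAP problem many times,
    then averaging the resulting one-hot solutions."""
    samples = [map_solver(theta + rng.gumbel(scale=scale, size=theta.shape))
               for _ in range(num_samples)]
    return np.mean(samples, axis=0)

theta = np.array([1.5, 0.2, 0.7])
print(perturb_and_map_marginals(theta))  # higher-theta components appear more often
```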
My back prop blew up around 40:00 😂
Mixed-MLP in the reverse looks like this: 36:53. :D
Hi Yannic, let's have some papers from the video modality.
🖐
I might be totally wrong, but I think generally people can't/won't watch a 40 minute video every 3 days.
I didn't think so either when I started this, but here we are 🤷🏽♀️
@@YannicKilcher Acquired audience
Possibly! But you and other subscribers are not like other people :)
I legit love these though
Such papers are very complicated and, without a deep mathematical understanding, almost impossible to follow; good explanations are therefore necessary.