DataScienceCastnet
United States
Joined 18 Aug 2020
Paper walkthrough: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper: arxiv.org/abs/2501.04519
1,083 views
Videos
Reading a paper + running an experiment: how do LLMs 'connect the dots' in this contrived example?
437 views · 21 days ago
Notebook for those who want to see the code: colab.research.google.com/drive/1RSNOL1kkE2m0x2etgBlLyo49k1lK1IjX?usp=sharing In this video I wanted to show what 'raw curiosity driven research' looks like, as I tinker with a paper I'd been discussing with some friends. I don't know that I do a great job narrating, since my focus is on the code. And I don't do a great job coding since I'm at least ...
Min P Sampling: Balancing Creativity and Coherence (paper explanation + code)
326 views · 5 months ago
A look at min_p sampling Paper: arxiv.org/abs/2407.01082 Code: gist.github.com/johnowhitaker/2d14cfed0d54c20e3299ce94d52857c4
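For readers who want the gist before watching: min-p keeps only tokens whose probability is at least a fraction (min_p) of the top token's probability, so the cutoff adapts to how confident the model is. A minimal PyTorch sketch of that filter (not the gist's exact code):

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    # The cutoff scales with the model's confidence: tokens whose probability
    # falls below min_p * p(top token) are masked out before sampling.
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

logits = torch.randn(1, 32000)  # stand-in for a model's next-token logits
filtered = torch.softmax(min_p_filter(logits, min_p=0.1), dim=-1)
next_token = torch.multinomial(filtered, num_samples=1)
```

When the model is confident the threshold is high and sampling is nearly greedy; when the distribution is flat, many tokens survive, which is where the creativity/coherence balance comes from.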
Mixed-Modal Early-Fusion Foundation Models: Paper run-throughs for 'Chameleon' and 'MoMa'
536 views · 5 months ago
Text-only LLMs are great, and we've seen people pasting on some image support here and there, but the future it seems is multi-modal. What does it take to train models from scratch that take in both images and text (and more)? In this video we look at two key papers from FAIR at Meta, introducing their Chameleon approach and making it more efficient with mixture of experts.
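As a rough mental model of early fusion (the token ids and vocab sizes below are made up for illustration): the image is quantized into discrete codebook ids, shifted past the text vocabulary, and spliced into one sequence, so a single transformer trains autoregressively over both modalities at once.

```python
TEXT_VOCAB = 65_536                      # assumed text vocabulary size
BOI, EOI = TEXT_VOCAB, TEXT_VOCAB + 1    # begin/end-of-image sentinel tokens

def fuse(text_ids: list[int], image_codes: list[int]) -> list[int]:
    # Shift image codebook ids past the text vocab and the two sentinels,
    # then splice them into the token stream as one flat sequence.
    image_ids = [EOI + 1 + c for c in image_codes]
    return text_ids + [BOI] + image_ids + [EOI]

print(fuse([12, 345, 6789], [0, 511, 1023]))
```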
LLM Steganography: Hiding Messages in Text
630 views · 7 months ago
Code: github.com/johnowhitaker/llm_steganography This idea has been knocking around in my head for years (inspired by some early LLM watermarking papers) but I was finally motivated to do it after seeing this masterpiece: th-cam.com/video/Y65FRxE7uMc/w-d-xo.html Please comment on this video in our secret way ;)
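One way such a scheme can work (a toy sketch of the general idea, not necessarily the repo's exact method): at each generation step, let the next bit of the hidden message choose between the model's top-2 candidate tokens. A receiver with the same model and prompt can replay generation deterministically and read the bits back. `get_logits` here is a hypothetical function returning next-token logits for a token sequence.

```python
import torch

def embed_bits(get_logits, prompt_ids: list[int], bits: list[int]) -> list[int]:
    ids = list(prompt_ids)
    for bit in bits:
        top2 = torch.topk(get_logits(ids), k=2).indices
        ids.append(int(top2[bit]))  # bit 0 -> most likely token, bit 1 -> runner-up
    return ids

def recover_bits(get_logits, prompt_ids: list[int], full_ids: list[int]) -> list[int]:
    ids, bits = list(prompt_ids), []
    for tok in full_ids[len(prompt_ids):]:
        top2 = torch.topk(get_logits(ids), k=2).indices.tolist()
        bits.append(top2.index(tok))  # which candidate did the sender pick?
        ids.append(tok)
    return bits
```

The cover text still reads as plausible model output, since every emitted token was among the most likely candidates.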
Zotero Cleanup (chatting informally while closing my open papers)
265 views · 8 months ago
I decided to record while I do my semi-regular cleanout of the papers I have open in Zotero. This video is even less educational than usual, let me know if you enjoy this format. Our library if you want to find the papers mentioned easily: www.zotero.org/groups/5004697/llms_ai_answers
With the author: Readout Guidance (plus Diffusion Hyperfeatures)
586 views · 9 months ago
In this video, we look at a series of papers I really enjoyed, with Grace Luo - one of the authors of these papers! The theme running through all three is what features diffusion models learn and what we can do with those features. Enjoy :) Main links: Readout Guidance: readout-guidance.github.io/ Diffusion Hyperfeatures: diffusion-hyperfeatures.github.io/ Shape-Guided Diffusion with Inside-Out...
Paper deep dive: Evolutionary Optimization of Model Merging Recipes
3.7K views · 10 months ago
Sakana AI has a great new paper exploring evolutionary approaches to model merging, showing how to find ways of combining existing models into new ones with impressive new skills. In this video, we dive into the paper and along the way spend some time learning about model merging in general, evolutionary algorithms, and more.
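The simplest merge the video builds from is a weighted average of two checkpoints in parameter space; Sakana's contribution is to search over such mixing recipes (and data-flow merges) with an evolutionary algorithm rather than hand-tuning them. A toy sketch:

```python
import torch
import torch.nn as nn

model_a, model_b = nn.Linear(4, 4), nn.Linear(4, 4)  # toy stand-ins for two models

def merge_linear(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    # Weighted average of two checkpoints that share an architecture.
    # In the paper, mixing weights like alpha (per layer) are what evolution tunes.
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

merged = nn.Linear(4, 4)
merged.load_state_dict(merge_linear(model_a.state_dict(), model_b.state_dict(), 0.6))
```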
Paperathon #1
1.2K views · 1 year ago
An experimental livestream reading through papers. I'll hopefully add chapter headings and a link to the completed notes soon. Paper notes (tidied and expanded from the live notes we made): docs.google.com/document/d/1weSVlVfVUufOesEmMB2_TlmM_I250ug-WIEZu9xEpVw/edit?usp=sharing Topics/papers: 01:00 - Intro Plan 06:00 - Orca 2 (and Orca 1) 20:00 - Emu EDIT 32:20 - TULU V2 49:00 - QLoRA 1:01:00 -...
ZipLoRA: Any Subject in Any Style (deep dive and paper explanation)
1.6K views · 1 year ago
In this video we talk about merging LoRAs - the difficulties with a naive approach and the benefits of the new "ZipLoRA" technique. Paper: arxiv.org/abs/2311.13600 Colab shown in the video: colab.research.google.com/drive/1lo5OTSzsj1KCwfw8l_dNhfY_UcA8udbg?usp=sharing ziplora-pytorch implementation: github.com/mkshing/ziplora-pytorch/tree/main
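At a shape level the contrast looks like this (a toy sketch; the actual training objective for the merger coefficients is in the paper and the ziplora-pytorch repo):

```python
import torch

d_out, d_in, r = 8, 8, 4
W0 = torch.randn(d_out, d_in)                          # frozen base weight
A1, B1 = torch.randn(r, d_in), torch.randn(d_out, r)   # subject LoRA
A2, B2 = torch.randn(r, d_in), torch.randn(d_out, r)   # style LoRA
delta1, delta2 = B1 @ A1, B2 @ A2                      # each LoRA's weight update

# Naive merge: just add both deltas; columns both LoRAs touch interfere.
W_naive = W0 + delta1 + delta2

# ZipLoRA-style merge: learn per-column coefficients that downweight columns
# where the deltas clash (trained to preserve each LoRA's own outputs while
# minimizing cosine similarity between the scaled delta columns).
m1 = torch.ones(d_in, requires_grad=True)
m2 = torch.ones(d_in, requires_grad=True)
W_zip = W0 + delta1 * m1 + delta2 * m2  # broadcasting scales each column
```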
Evaluating Diffusion Models with PickScore
1K views · 1 year ago
Setting the scene for some future videos where I'll explore ways to improve diffusion models through various tricks. Here we learn why evaluating diffusion models is hard, that user preference is the gold standard, and that preference models like PickScore give us an approximation we can work with. Code on github: github.com/johnowhitaker/dm_fun PickScore on GitHub: github.com/yuvalkirstain/Pic...
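Scoring generations with PickScore is a few lines with transformers; the sketch below follows the pattern in the PickScore README (check the repo for the canonical version):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval().to(device)

@torch.no_grad()
def pick_scores(prompt: str, images: list[Image.Image]) -> torch.Tensor:
    image_inputs = processor(images=images, return_tensors="pt").to(device)
    text_inputs = processor(text=prompt, padding=True, truncation=True,
                            max_length=77, return_tensors="pt").to(device)
    img = model.get_image_features(**image_inputs)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = model.get_text_features(**text_inputs)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return model.logit_scale.exp() * (txt @ img.T)[0]  # higher = preferred
```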
How I 'monetized' an AI demo
920 views · 1 year ago
See the 'finished product' at hallowhatnow.johnowhitaker.repl.co/ Template for those wanting to try something like this: replit.com/@johnowhitaker/AIAppTemplate?v=1 In this video, I take you through the process I followed to take a generative AI workflow and turn it into a 'product', where users upload a picture and pay to have it transformed into a gallery of themed Halloween costume ideas. It...
Gaussian Splatting explorations
27K views · 1 year ago
Let's dive into Gaussian Splatting: what is it, how are scenes represented, and what fun things can we do with it? This is a fairly informal and code-heavy video - let me know if you like this format! GS website (with links to paper): My lesson on optimizing things with fun losses such as CLIP: johnowhitaker.github.io/tglcourse/generators_and_losses.html My Twitter, if you want updates on this ...
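To make "a scene is just a bag of Gaussians" concrete, here is a deliberately stripped-down 2D toy (isotropic Gaussians, no opacity, no depth sorting, no spherical-harmonic colors, all of which the real method has): render by summing Gaussian-weighted colors over a pixel grid.

```python
import torch

H, W, N = 64, 64, 100
means = torch.rand(N, 2) * torch.tensor([W, H]).float()  # splat centers (x, y)
scales = torch.rand(N) * 5 + 1                           # isotropic std devs
colors = torch.rand(N, 3)                                # RGB per splat

ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
pix = torch.stack([xs, ys], dim=-1).float()              # (H, W, 2) pixel coords
d2 = ((pix[None] - means[:, None, None]) ** 2).sum(-1)   # (N, H, W) sq. distances
weights = torch.exp(-d2 / (2 * scales[:, None, None] ** 2))
image = (weights[..., None] * colors[:, None, None]).sum(0).clamp(0, 1)  # (H, W, 3)
```

Because the whole render is differentiable, you can optimize the means, scales, and colors against a loss, which is the heart of the training loop (and of the fun CLIP-loss experiments in the video).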
LLM basics #4 with the LLM Science Exam Kaggle Competition - Retrieval
979 views · 1 year ago
In this (delayed) final video of the series, we take a lightning look at one more useful technique to add to your submission arsenal: document retrieval. Finding the closest matches among a collection of documents is an extremely useful tool for all sorts of LLM applications, and this intro shows how easy it can be to get started. Notebook link: www.kaggle.com/johnowhitaker/embedding-documents-...
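The whole pipeline fits in a few lines once you have an embedding model; here is a minimal sketch with sentence-transformers (the model choice and brute-force indexing are my assumptions; the Kaggle notebook's details may differ):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Wikipedia passage one ...", "Wikipedia passage two ..."]
doc_emb = model.encode(docs, normalize_embeddings=True)  # (n_docs, dim)

def top_k(query: str, k: int = 3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q  # cosine similarity, since embeddings are unit-norm
    idx = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in idx]
```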
LLM basics #3 with the LLM Science Exam Kaggle Competition - Training a task-specific model for MCQs
2.1K views · 1 year ago
LLM basics #2 with the LLM Science Exam Kaggle Competition - Generating Synthetic Data
2.7K views · 1 year ago
LLM basics #1 with the LLM Science Exam Kaggle Competition - Zero-Shot approaches
4.1K views · 1 year ago
InstructPix2Pix Explained - Edit Images with Words!
3.8K views · 1 year ago
Stable Diffusion Deep Dive Notebook Run-through
12K views · 2 years ago
Building DistilHN: Using ML to Summarize News Articles
985 views · 2 years ago
HuggingFace Class, Unit 2 - Fine-tuning and Guidance (casual notebook walkthrough)
3.3K views · 2 years ago
HuggingFace Diffusion Model Class, Unit 1 (casual notebook walkthrough)
12K views · 2 years ago
Editing Images with Diffusion Models (lit review / overview of different approaches)
6K views · 2 years ago
This is great, I learned a lot. Thank you very much for sharing this!
Intermediate steps for coding could be each of the commits in turn that lead to an accepted PR. Often there are even multiple PRs in open projects and one is accepted but not another. Usually if this is the case there would be cross-links between the issues.
Fresher should try ml?
banger as always, thanks for doing those king
Take a shot every time I mix up 'policy', 'process' and 'preference' :D
Loved seeing the thought process of how you iterate quickly on research ideas. I did wonder what effect it would have if "CITY" was replaced by a dummy word. In that case, it should initially have very low similarity with "Paris", but the trained model should see it converge. Additionally, I wonder if LoRA had a larger effect on the values. What would happen if you tried LoRA on the smaller models?
why does he move the screen every 3 seconds? there's no need to do this
thank you, very informative
GNN model merge please
great explanation!
I love these paper deep dives, can we also do more toy projects here :)
great video! can you cover the other sampling method that Yannic already talked about? maybe also with the code implementation and side-by-side comparisons
this is not working on updated stable diffusion webui and forge
You’re my favourite YouTuber!
thanks man
keep doing it
Thank you for highlighting the latest top papers
welcome back keep em coming
thank you for this video. I hope to one day be as capable as you in describing these papers. these are fascinating pieces of technology
Where is this? "GS website (with links to paper):"
Could you please explain the loss function in a bit more detail? Thanks
@datasciencecastnet Loving these videos! Is there going to be coverage of units 3 and 4 at any point?
Can you please share the git repo for all your code, it would be great to follow along and see the results on my end.
Thank you for this nice getting-started video. I could learn a lot from it. One question: did you write the function JSON schema yourself, or did you use some function to generate the schema?
the video is laggy :(
That was an interesting video about steganographic techniques. The first thing I thought of was the movie Sneakers, where they used a cassette tape as the steganographic device. The movie is about 20+
Hi, really good video. Could you share the Jupyter notebook you showed in this video? I would be so grateful!
very interesting, thank you! Can you also share your notebook code from this video?
isn't the OpenAI API a paid feature?
Yeah, it's valuable. I've just added one to my read list. I don't think anyone else goes over papers in a quick-fire method as you've just done, where similar (also dissimilar) papers are compared at a high level, with why you picked them and some intuition of how they work.
Yaaay Johno!
Nice work haha i don't clean up zotero often enough
This is a really interesting idea. With regards to guidance, do you know if anyone has tried to train models that predict those noise deltas you would get from pushing the gradient backwards, but doing it in a forward direction? For example, you could train a model to take the features from the "up" layers in Stable Diffusion, and then predict a secondary noise delta to try and correct the regular Stable Diffusion noise in the right direction - i.e. estimate what the delta from the backpropagated gradient would need to be. I'm not sure if that would actually save compute during inference, compared to having to do a backwards pass and hold the whole graph while you generate images, because I assume that model would need to be reasonable in size. But it might also allow larger steps in prediction than you might get from a single gradient backwards pass. And it would remove the need for an internal RGB image stage at all, because you would only need to use that model during training. Although it would likely break the awesome part of this method - that it requires very few samples to get good results - at the cost of shifting work to the inference stage.
Thank you very much for such great work and explanation. Hats off!
thanks a lot
hello, I have a confusion regarding one topic in diffusion; can I get your contact info, Jonathan, like your mail ID or any other contact info please? I am working on my last-year project, anything would help
very great introduction! I can see a lot of efforts have been put into this video! It helps a lot to understand the paper! thank you for sharing!!
However, I have one small question about the overfitting part at the end of this video. Is it that the test set, translated into Japanese, might have been learned by (or fine-tuned into) the math 7B LLM?
Oh my god, man, you don't understand how happy I am about your storytelling of the timeline of how the idea of model merging developed up to this point: where it started, how it went, and how people were thinking about the reasons why it works, etc. I want to get into this so that I understand the main ideas and can start working on these as well, but it's so hard to get to the root of things; it requires a huge amount of time to read and digest everything and slowly put the pieces together. So boy, do I mean it when I say thank you!
Can you share the code please?
Hi Johno, at the beginning you said you're somewhat skeptical of model merging. IIUC, your criticism is only about iterative merging for a given goal, which leads to overfitting. Or are you skeptical of the general concept of model merging? Thanks!
That was an excellent overview of not just Sakana's evolutionary methods to identify good merge candidates, but also the popular techniques TIES, DARE and Passthrough/Frankenmerge. Appreciate it as usual, Johno!
Deep dive: model merging m.th-cam.com/video/cvOpX75Kz4M/w-d-xo.html
very helpful!
It would've been cool if you could visualize which point in the scene you are showing the spherical harmonics for
Thanks for hosting this
when approaching a manifold, what would happen if the approach was aligned with the normal of the manifold surface?
Can you share this code you have made for this video ?
Practical and useful, thank you!
Subscribed, rare to see someone tackle a new category of competition on Kaggle, nice work
enjoying it :)