DataScienceCastnet
United States
Joined 18 Aug 2020
Paper walkthrough: rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper: arxiv.org/abs/2501.04519
1,083 views
Videos
Reading a paper + running an experiment: how do LLMs 'connect the dots' in this contrived example?
437 views · 21 days ago
Notebook for those who want to see the code: colab.research.google.com/drive/1RSNOL1kkE2m0x2etgBlLyo49k1lK1IjX?usp=sharing In this video I wanted to show what 'raw curiosity driven research' looks like, as I tinker with a paper I'd been discussing with some friends. I don't know that I do a great job narrating, since my focus is on the code. And I don't do a great job coding since I'm at least ...
Min P Sampling: Balancing Creativity and Coherence (paper explanation + code)
326 views · 5 months ago
A look at min_p sampling Paper: arxiv.org/abs/2407.01082 Code: gist.github.com/johnowhitaker/2d14cfed0d54c20e3299ce94d52857c4
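For readers who want the gist before watching: min-p keeps only tokens whose probability is at least a fraction (min_p) of the top token's probability, so the cutoff adapts to how confident the model is. A minimal PyTorch sketch of that filter (not the gist's exact code):

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    # The cutoff scales with the model's confidence: tokens whose probability
    # falls below min_p * p(top token) are masked out before sampling.
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

logits = torch.randn(1, 32000)  # stand-in for a model's next-token logits
filtered = torch.softmax(min_p_filter(logits, min_p=0.1), dim=-1)
next_token = torch.multinomial(filtered, num_samples=1)
```

When the model is confident the threshold is high and sampling is nearly greedy; when the distribution is flat, many tokens survive, which is where the creativity/coherence balance comes from.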
Mixed-Modal Early-Fusion Foundation Models: Paper run-throughs for 'Chameleon' and 'MoMa'
536 views · 5 months ago
Text-only LLMs are great, and we've seen people pasting on some image support here and there, but the future it seems is multi-modal. What does it take to train models from scratch that take in both images and text (and more)? In this video we look at two key papers from FAIR at Meta, introducing their Chameleon approach and making it more efficient with mixture of experts.
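As a rough mental model of early fusion (the token ids and vocab sizes below are made up for illustration): the image is quantized into discrete codebook ids, shifted past the text vocabulary, and spliced into one sequence, so a single transformer trains autoregressively over both modalities at once.

```python
TEXT_VOCAB = 65_536                      # assumed text vocabulary size
BOI, EOI = TEXT_VOCAB, TEXT_VOCAB + 1    # begin/end-of-image sentinel tokens

def fuse(text_ids: list[int], image_codes: list[int]) -> list[int]:
    # Shift image codebook ids past the text vocab and the two sentinels,
    # then splice them into the token stream as one flat sequence.
    image_ids = [EOI + 1 + c for c in image_codes]
    return text_ids + [BOI] + image_ids + [EOI]

print(fuse([12, 345, 6789], [0, 511, 1023]))
```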
LLM Steganography: Hiding Messages in Text
630 views · 7 months ago
Code: github.com/johnowhitaker/llm_steganography This idea has been knocking around in my head for years (inspired by some early LLM watermarking papers) but I was finally motivated to do it after seeing this masterpiece: th-cam.com/video/Y65FRxE7uMc/w-d-xo.html Please comment on this video in our secret way ;)
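One way such a scheme can work (a toy sketch of the general idea, not necessarily the repo's exact method): at each generation step, let the next bit of the hidden message choose between the model's top-2 candidate tokens. A receiver with the same model and prompt can replay generation deterministically and read the bits back. `get_logits` here is a hypothetical function returning next-token logits for a token sequence.

```python
import torch

def embed_bits(get_logits, prompt_ids: list[int], bits: list[int]) -> list[int]:
    ids = list(prompt_ids)
    for bit in bits:
        top2 = torch.topk(get_logits(ids), k=2).indices
        ids.append(int(top2[bit]))  # bit 0 -> most likely token, bit 1 -> runner-up
    return ids

def recover_bits(get_logits, prompt_ids: list[int], full_ids: list[int]) -> list[int]:
    ids, bits = list(prompt_ids), []
    for tok in full_ids[len(prompt_ids):]:
        top2 = torch.topk(get_logits(ids), k=2).indices.tolist()
        bits.append(top2.index(tok))  # which candidate did the sender pick?
        ids.append(tok)
    return bits
```

The cover text still reads as plausible model output, since every emitted token was among the most likely candidates.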
Zotero Cleanup (chatting informally while closing my open papers)
265 views · 8 months ago
I decided to record while I do my semi-regular cleanout of the papers I have open in Zotero. This video is even less educational than usual, let me know if you enjoy this format. Our library if you want to find the papers mentioned easily: www.zotero.org/groups/5004697/llms_ai_answers
With the author: Readout Guidance (plus Diffusion Hyperfeatures)
586 views · 9 months ago
In this video, we look at a series of papers I really enjoyed, with Grace Luo - one of the authors of these papers! The theme running through all three is what features diffusion models learn and what we can do with those features. Enjoy :) Main links: Readout Guidance: readout-guidance.github.io/ Diffusion Hyperfeatures: diffusion-hyperfeatures.github.io/ Shape-Guided Diffusion with Inside-Out...
Paper deep dive: Evolutionary Optimization of Model Merging Recipes
3.7K views · 10 months ago
Sakana AI has a great new paper exploring evolutionary approaches to model merging, showing how to find ways of combining existing models into new ones with impressive new skills. In this video, we dive into the paper and along the way spend some time learning about model merging in general, evolutionary algorithms, and more.
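The simplest merge the video builds from is a weighted average of two checkpoints in parameter space; Sakana's contribution is to search over such mixing recipes (and data-flow merges) with an evolutionary algorithm rather than hand-tuning them. A toy sketch:

```python
import torch
import torch.nn as nn

model_a, model_b = nn.Linear(4, 4), nn.Linear(4, 4)  # toy stand-ins for two models

def merge_linear(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    # Weighted average of two checkpoints that share an architecture.
    # In the paper, mixing weights like alpha (per layer) are what evolution tunes.
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

merged = nn.Linear(4, 4)
merged.load_state_dict(merge_linear(model_a.state_dict(), model_b.state_dict(), 0.6))
```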
Paperathon #1
1.2K views · 1 year ago
An experimental livestream reading through papers. I'll hopefully add chapter headings and a link to the completed notes soon. Paper notes (tidied and expanded from the live notes we made): docs.google.com/document/d/1weSVlVfVUufOesEmMB2_TlmM_I250ug-WIEZu9xEpVw/edit?usp=sharing Topics/papers: 01:00 - Intro Plan 06:00 - Orca 2 (and Orca 1) 20:00 - Emu EDIT 32:20 - TULU V2 49:00 - QLoRA 1:01:00 -...
ZipLoRA: Any Subject in Any Style (deep dive and paper explanation)
1.6K views · 1 year ago
In this video we talk about merging LoRAs - the difficulties with a naive approach and the benefits of the new "ZipLoRA" technique. Paper: arxiv.org/abs/2311.13600 Colab shown in the video: colab.research.google.com/drive/1lo5OTSzsj1KCwfw8l_dNhfY_UcA8udbg?usp=sharing ziplora-pytorch implementation: github.com/mkshing/ziplora-pytorch/tree/main
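At a shape level the contrast looks like this (a toy sketch; the actual training objective for the merger coefficients is in the paper and the ziplora-pytorch repo):

```python
import torch

d_out, d_in, r = 8, 8, 4
W0 = torch.randn(d_out, d_in)                          # frozen base weight
A1, B1 = torch.randn(r, d_in), torch.randn(d_out, r)   # subject LoRA
A2, B2 = torch.randn(r, d_in), torch.randn(d_out, r)   # style LoRA
delta1, delta2 = B1 @ A1, B2 @ A2                      # each LoRA's weight update

# Naive merge: just add both deltas; columns both LoRAs touch interfere.
W_naive = W0 + delta1 + delta2

# ZipLoRA-style merge: learn per-column coefficients that downweight columns
# where the deltas clash (trained to preserve each LoRA's own outputs while
# minimizing cosine similarity between the scaled delta columns).
m1 = torch.ones(d_in, requires_grad=True)
m2 = torch.ones(d_in, requires_grad=True)
W_zip = W0 + delta1 * m1 + delta2 * m2  # broadcasting scales each column
```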
Evaluating Diffusion Models with PickScore
1K views · 1 year ago
Setting the scene for some future videos where I'll explore ways to improve diffusion models through various tricks. Here we learn why evaluating diffusion models is hard, that user preference is the gold standard, and that preference models like PickScore give us an approximation we can work with. Code on github: github.com/johnowhitaker/dm_fun PickScore on GitHub: github.com/yuvalkirstain/Pic...
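Scoring generations with PickScore is a few lines with transformers; the sketch below follows the pattern in the PickScore README (check the repo for the canonical version):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval().to(device)

@torch.no_grad()
def pick_scores(prompt: str, images: list[Image.Image]) -> torch.Tensor:
    image_inputs = processor(images=images, return_tensors="pt").to(device)
    text_inputs = processor(text=prompt, padding=True, truncation=True,
                            max_length=77, return_tensors="pt").to(device)
    img = model.get_image_features(**image_inputs)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = model.get_text_features(**text_inputs)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return model.logit_scale.exp() * (txt @ img.T)[0]  # higher = preferred
```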
How I 'monetized' an AI demo
920 views · 1 year ago
See the 'finished product' at hallowhatnow.johnowhitaker.repl.co/ Template for those wanting to try something like this: replit.com/@johnowhitaker/AIAppTemplate?v=1 In this video, I take you through the process I followed to take a generative AI workflow and turn it into a 'product', where users upload a picture and pay to have it transformed into a gallery of themed Halloween costume ideas. It...
Gaussian Splatting explorations
27K views · 1 year ago
Let's dive into Gaussian Splatting: what is it, how are scenes represented, and what fun things can we do with it? This is a fairly informal and code-heavy video - let me know if you like this format! GS website (with links to paper): My lesson on optimizing things with fun losses such as CLIP: johnowhitaker.github.io/tglcourse/generators_and_losses.html My Twitter, if you want updates on this ...
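To make "a scene is just a bag of Gaussians" concrete, here is a deliberately stripped-down 2D toy (isotropic Gaussians, no opacity, no depth sorting, no spherical-harmonic colors, all of which the real method has): render by summing Gaussian-weighted colors over a pixel grid.

```python
import torch

H, W, N = 64, 64, 100
means = torch.rand(N, 2) * torch.tensor([W, H]).float()  # splat centers (x, y)
scales = torch.rand(N) * 5 + 1                           # isotropic std devs
colors = torch.rand(N, 3)                                # RGB per splat

ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
pix = torch.stack([xs, ys], dim=-1).float()              # (H, W, 2) pixel coords
d2 = ((pix[None] - means[:, None, None]) ** 2).sum(-1)   # (N, H, W) sq. distances
weights = torch.exp(-d2 / (2 * scales[:, None, None] ** 2))
image = (weights[..., None] * colors[:, None, None]).sum(0).clamp(0, 1)  # (H, W, 3)
```

Because the whole render is differentiable, you can optimize the means, scales, and colors against a loss, which is the heart of the training loop (and of the fun CLIP-loss experiments in the video).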
LLM basics #4 with the LLM Science Exam Kaggle Competition - Retrieval
979 views · 1 year ago
In this (delayed) final video of the series, we take a lightning look at one more useful technique to add to your submission arsenal: document retrieval. Finding the closest matches among a collection of documents is an extremely useful tool for all sorts of LLM applications, and this intro shows how easy it can be to get started. Notebook link: www.kaggle.com/johnowhitaker/embedding-documents-...
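The whole pipeline fits in a few lines once you have an embedding model; here is a minimal sketch with sentence-transformers (the model choice and brute-force indexing are my assumptions; the Kaggle notebook's details may differ):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Wikipedia passage one ...", "Wikipedia passage two ..."]
doc_emb = model.encode(docs, normalize_embeddings=True)  # (n_docs, dim)

def top_k(query: str, k: int = 3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q  # cosine similarity, since embeddings are unit-norm
    idx = np.argsort(-scores)[:k]
    return [(docs[i], float(scores[i])) for i in idx]
```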
LLM basics #3 with the LLM Science Exam Kaggle Competition - Training a task-specific model for MCQs
2.1K views · 1 year ago
LLM basics #2 with the LLM Science Exam Kaggle Competition - Generating Synthetic Data
2.7K views · 1 year ago
LLM basics #1 with the LLM Science Exam Kaggle Competition - Zero-Shot approaches
4.1K views · 1 year ago
InstructPix2Pix Explained - Edit Images with Words!
3.8K views · 1 year ago
Stable Diffusion Deep Dive Notebook Run-through
12K views · 2 years ago
Building DistilHN: Using ML to Summarize News Articles
985 views · 2 years ago
HuggingFace Class, Unit 2 - Fine-tuning and Guidance (casual notebook walkthrough)
3.3K views · 2 years ago
HuggingFace Diffusion Model Class, Unit 1 (casual notebook walkthrough)
12K views · 2 years ago
Editing Images with Diffusion Models (lit review / overview of different approaches)
6K views · 2 years ago
This is great, I learned a lot. Thank you very much for sharing this!
Intermediate steps for coding could be each of the commits in turn that lead to an accepted PR. Often there are even multiple PRs in open projects and one is accepted but not another. Usually if this is the case there would be cross-links between the issues.
Fresher should try ml?
banger as always, thanks for doing those king
Take a shot every time I mix up 'policy', 'process' and 'preference' :D
Loved seeing the thought process of how you iterate quickly on research ideas. I did wonder what effect it would have if "CITY" was replaced by a dummy word. In that case, it should initially have very low similarity with "Paris", but the trained model should see it converge. Additionally, I wonder if LoRA had a larger effect on the values. What would happen if you tried LoRA on the smaller models?
why does he move the screen every 3 seconds? there's no need to do this
thank you, very informative
GNN model merge please
great explanation!
I love these paper deep dives, can we also do more toy projects here :)
great video! can you cover the other sampling method that Yannic already talked about? maybe also with the code implementation and side-by-side comparisons
this is not working on updated stable diffusion webui and forge
You’re my favourite YouTuber!
thanks man
keep doing it
Thank you for highlighting the latest top papers
welcome back keep em coming
thank you for this video. I hope to one day be as capable as you in describing these papers. these are fascinating pieces of technology
Where is this? "GS website (with links to paper):"
Could you please explain the loss function in a bit more detail? Thanks
@datasciencecastnet Loving these videos! Is there going to be coverage of units 3 and 4 at any point?
Can you please share the git repo for all your code, it would be great to follow along and see the results on my end.
Thank you for this nice getting-started video. I could learn a lot from it. One question: did you write the function JSON schema yourself, or did you use some function to generate the schema?
the video is laggy :(
That was an interesting video about steganographic techniques. The first thing I thought of was the movie Sneakers, where they used a cassette tape as the steganographic device. The movie is about 20+
Hi, really good video. Could you share the Jupyter notebook you showed in this video? I would be so grateful!
very interesting, thank you! Can you also share your notebook code from this video?
isn't the OpenAI API a paid feature?
Yeah, it's valuable. I've just added one to my read list. I don't think anyone else goes over papers in a quick-fire method as you've just done, where similar (also dissimilar) papers are compared at a high level, with why you picked them and some intuition of how they work.
Yaaay Johno!
Nice work haha i don't clean up zotero often enough
This is a really interesting idea. With regards to guidance, do you know if anyone has tried to train models that predict those noise deltas you would get from pushing the gradient backwards, but doing it in a forward direction? For example, you could train a model to take the features from the "up" layers in Stable Diffusion, and then predict a secondary noise delta to try and correct the regular Stable Diffusion noise in the right direction - i.e. estimate what the delta from the backpropagated gradient would need to be. I'm not sure if that would actually save compute during inference, compared to having to do a backwards pass and hold the whole graph while you generate images, because I assume that model would need to be reasonable in size. But it might also allow larger steps in prediction than you might get from a single gradient backwards pass. And it would remove the need for an internal RGB image stage at all, because you would only need to use that model during training. Although it would likely break the awesome part of this method - that it requires very few samples to get good results - at the cost of shifting work to the inference stage.
Thank you very much for such great work and explanation. Hats off!
thanks a lot
hello, I have a confusion regarding one topic in diffusion; can I get your contact info, Jonathan, like your mail ID or any other contact info please? I am working on my last-year project, anything would help
very great introduction! I can see a lot of efforts have been put into this video! It helps a lot to understand the paper! thank you for sharing!!
However, I have one small question about the overfitting part at the end of this video. Is it that the test set, translated into Japanese, might have been learned by (or fine-tuned into) the math 7B LLM?
Oh my god, man, you don't understand how happy I am about your storytelling of the timeline of how the idea of model merging developed up to this point: where it started, how it went, and how people were thinking about the reasons why it works, etc. I want to get into this so that I understand the main ideas and can start working on these as well, but it's so hard to get to the root of things; it requires a huge amount of time to read and digest everything and slowly put the pieces together. So boy, do I mean it when I say thank you!
Can you share the code please?
Hi Johno, at the beginning you said you're somewhat skeptical of model merging. IIUC, your criticism is only about iterative merging for a given goal, which leads to overfitting. Or are you skeptical of the general concept of model merging? Thanks!
That was an excellent overview of not just Sakana's evolutionary methods to identify good merge candidates, but also the popular techniques TIES, DARE and Passthrough/Frankenmerge. Appreciate it as usual, Johno!
Deep dive: model merging m.th-cam.com/video/cvOpX75Kz4M/w-d-xo.html
very helpful!
It would've been cool if you could visualize which point in the scene you are showing the spherical harmonics for
Thanks for hosting this
when approaching a manifold, what would happen if the approach was aligned with the normal of the manifold surface?
Can you share this code you have made for this video ?
Practical and useful, thank you!
Subscribed, rare to see someone tackle a new category of competition on Kaggle, nice work
enjoying it :)