ML Interpretability: feature visualization, adversarial example, interp. for language models
- Published on 20 Sep 2024
- In this video, I will be introducing Machine Learning Interpretability, a vast topic that seeks to understand the inner mechanisms by which machine learning models make their predictions, so that we can debug them and make them more transparent and trustworthy.
I will start by reviewing deep learning and the back-propagation algorithm, which are necessary for understanding adversarial example generation and feature visualization for computer vision classification models. In the second part, I will show how we can leverage the knowledge built in the first part of the video and apply it to language models. In particular, we will see how we can get insights on the bias of a language model by generating a prompt that maximizes the likelihood of the next token being a certain concept of our choice. This allows us to answer questions like:
"What does my language model think of women?"
"What does my language model think of minorities?"
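Adversarial example generation, feature visualization and prompt generation all rest on the same idea: compute the gradient of some objective with respect to the input rather than the weights, then change the input along that gradient. The sketch below is a minimal illustration and not the code from the video. It assumes a toy logistic-regression classifier with made-up weights (w, b), a made-up input x, and a single signed-gradient step in the style of the fast gradient sign method, chosen here only because it is short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x.

    For logistic regression this has the closed form (p - y) * w,
    so no automatic differentiation is needed in this toy case.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def signed_gradient_step(w, b, x, y, eps):
    """Move each input feature by eps in the direction that increases the loss."""
    grad = input_gradient(w, b, x, y)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical weights, input and label, for illustration only.
w = [2.0, -3.0, 1.0]
b = 0.5
x = [1.0, 0.2, 0.5]
y = 1
eps = 0.6

x_adv = signed_gradient_step(w, b, x, y, eps)

print("clean prediction:      ", round(predict(w, b, x), 3))
print("adversarial prediction:", round(predict(w, b, x_adv), 3))
```

With these toy numbers the classifier's probability for class 1 drops below 0.5 after the step, even though no feature moved by more than eps. Stepping in the opposite direction, to lower the loss for a chosen class, is the basic move behind feature visualization; for language models the input is a prompt rather than a vector of pixels, which needs extra machinery that this sketch does not cover.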
This video was made in collaboration with Leap Labs - an AI research lab that works on machine learning interpretability and built the Leap Labs Interpretability Engine, which gives insights into how computer vision models work and how to improve them by generating prototypes, isolating features and understanding entanglement between classes.
Leap Labs: www.leap-labs....
Leap Labs Tutorials: docs.leap-labs...
As usual, the code and PDF slides are available at the following links:
- PDF slides: github.com/hkp...
- Adversarial Example Generation (tricking a classifier): github.com/hkp...
- Generate inputs for language models: github.com/jes...
You are one incredibly underrated youtuber
Agreed
Yup, just found him a few days ago. Definitely underrated.
And one hell of a teacher
Can't understand why this channel is free! Thanks a lot for all the content, keep it flowing.
I am very thankful for your quality content! 😊
One of the best videos on youtube. Please do IJEPA next. And keep on publishing videos and code.
Andrej Karpathy liked a tweet where some dude said your video on diffusion models was incredibly underrated, you are going to make it far!
You are a great teacher.
Bruh, I was just looking for this topic & got the notification of this video. Thanks dude
Keep up the amazing work!
Thanks, could you talk about flash attention?
As always, I salute you for this awesome video, keep up the good work 👍
Always interesting topics. Thank you so much
Hello! I was wondering if the blogger might be interested in Microsoft's recently released Graph RAG algorithm. I'm hoping you could do a video explaining it; your explanations are always so excellent!
"Blogger" is a mistranslation 😁 I will consider it
Amazing video! Could you also include a training script for the video you made about the transformer model, for general LLM tasks? The earlier one was about translation only.
Thank you for this video!
How do you stay up to date on Data science research papers?
Amazing, interesting topic.
any plan for 'Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models' video? It'd be awesome with your explanation on the math ~.~
I thought for a moment that Satya was coming to explain ;)
🤓🤓🤓
great videos!
Thanks a lot for this! A request to add the training script to the Stable Diffusion repository, it would be of great help!! Thank you!
thank you for this!
God bless you, you are amazing
Your videos are fantastic, blessings from South Africa 🙌
Thank you very much for your support!
Can you make a tutorial video on a model like Perplexity that uses live web search?
Please sir make a complete course for LLM engineering 😊
cool tutorial❤
Hello sir, can you please make a tutorial on PyTorch so we can follow along with your PyTorch projects? Thank you in advance
Good vid boss
Does leap labs provide open-source libraries?
You can play with the LLM interpretability notebook, which is open source. Link in the description
The architecture is kind of similar to Stable Diffusion. Stable Diffusion generates an image from text. I am not saying it is exactly the same, but it is kind of similar: both generate an image or features from noise.
Thanks
When is the new video coming?
Working on it ;)
@umarjamilai Very much excited❤️
Thank you!