mildlyoverfitted
Switzerland
Joined Aug 31, 2020
Creating educational content with a focus on Machine Learning, Deep Learning and Python.
If you have any video suggestions or you just wanna chat feel free to join the discord server: discord.gg/a8Va9tZsG5
BentoML SageMaker deployment
In this video, we are going to discuss the basics of BentoML and then go through a hands-on example of taking a Scikit-learn model and deploying it on SageMaker with the help of BentoML. (A minimal sketch of the service definition follows the chapter list below.)
The code + sketches from the video can be found here: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/bentoml
00:00 Intro
00:52 [diagram] Ideas behind BentoML
03:07 [diagram] Step by step procedure
03:21 [code] Creating a model
06:50 [code] Creating a bento - service.py
14:31 [code] Creating a bento - bentofile.yaml
16:53 [code] bentoctl init
19:34 [code] Inspecting terraform files
21:10 [code] Containerization + pushing to ECR
23:15 [code] Deployment via terraform
25:13 [code] Sending request and running inference
27:41 [code] Destroying resources
29:05 Outro
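To make the "Creating a bento" chapters above more concrete, here is a minimal sketch of what a service.py might look like, assuming BentoML 1.x and a Scikit-learn model already saved to the local model store under the hypothetical name "iris_clf":

```python
# service.py -- minimal BentoML 1.x service sketch (model name is illustrative)
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# load the previously saved sklearn model from the local bento store
model_ref = bentoml.sklearn.get("iris_clf:latest")
runner = model_ref.to_runner()

svc = bentoml.Service("sklearn_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # delegate inference to the runner
    return runner.predict.run(input_array)
```

The bentofile.yaml then points at this service (service: "service:svc") and lists the Python dependencies to bake into the bento.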
Views: 1,459
Videos
Retrieval augmented generation with OpenSearch and reranking
4.9K views · 1 year ago
In this video, we are going to be using OpenSearch and Cohere's Reranker endpoint to implement a minimal Retrieval augmented generation system that is able to perform question answering. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/rag-rerank/mini_tutorials/rag_with_reranking Cohere blogpost: txt.cohere.com/rerank/ 00:00 Intro 00:52 RAG with embeddings (semantic search) 03:16 ...
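As a rough sketch of the retrieve-then-rerank flow described above (not the exact code from the video), assuming a local OpenSearch instance with a made-up index "docs" holding a "text" field, plus a Cohere API key:

```python
# Sketch: BM25 retrieval from OpenSearch, then reranking with Cohere
import cohere
from opensearchpy import OpenSearch

query = "What is product quantization?"

# 1) lexical retrieval of candidate passages from OpenSearch
os_client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
hits = os_client.search(
    index="docs",
    body={"query": {"match": {"text": query}}, "size": 25},
)["hits"]["hits"]
documents = [hit["_source"]["text"] for hit in hits]

# 2) rerank the candidates with Cohere's rerank endpoint
co = cohere.Client("YOUR_API_KEY")
reranked = co.rerank(
    query=query, documents=documents, top_n=3, model="rerank-english-v2.0"
)

# 3) the top passages would then be placed into the LLM prompt for answering
for result in reranked.results:
    print(round(result.relevance_score, 3), documents[result.index][:80])
```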
Named entity recognition (NER) model evaluation
2.9K views · 1 year ago
In this video we are going to talk about different ways one can evaluate an NER (named entity recognition) model. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/ner_evaluation github.com/chakki-works/seqeval 00:00 Intro 00:31 Mispredictions 02:31 IOB2 notation 04:03 Evaluation approaches 07:38 [code] HF evaluate seqeval 14:36 [code] Entity-level fro...
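For reference, entity-level evaluation with seqeval boils down to a few lines; the toy IOB2-tagged sequences below are made up:

```python
# Toy example: entity-level NER evaluation with seqeval (IOB2 tags)
from seqeval.metrics import classification_report, f1_score

y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

# seqeval scores whole entities rather than single tokens:
# the PER span counts as correct, the LOC span as missed
print(classification_report(y_true, y_pred))
print("entity-level F1:", f1_score(y_true, y_pred))
```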
Asynchronous requests and rate limiting (HTTPX and asyncio.Semaphore)
2.8K views · 1 year ago
Today we are going to talk about how to use HTTPX to send requests asynchronously, and also how to perform rate limiting. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/httpx_rate_limiting/ 00:00 Intro 01:15 [Code] Implement async requests WITHOUT rate limiting 07:20 [Code] Trying it out 08:48 [Code] Implement async requests WITH rate lim...
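The core pattern from the video is roughly the sketch below; the URLs are placeholders and the limit of 5 concurrent requests is arbitrary:

```python
# Sketch: async requests with HTTPX, capped via asyncio.Semaphore
import asyncio
import httpx

semaphore = asyncio.Semaphore(5)  # at most 5 requests in flight at once

async def fetch(client: httpx.AsyncClient, url: str) -> int:
    async with semaphore:  # waits while 5 requests are already running
        response = await client.get(url)
        return response.status_code

async def main() -> None:
    urls = ["https://example.com"] * 20  # placeholder URLs
    async with httpx.AsyncClient() as client:
        statuses = await asyncio.gather(*(fetch(client, url) for url in urls))
    print(statuses)

asyncio.run(main())
```

As a commenter points out further down, the semaphore caps concurrency rather than requests per second, so it is not a strict rate limit.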
Few-shot text classification with prompts
4K views · 1 year ago
In this video, I will talk about a possible way to perform few-shot text classification using prompt engineering and the OpenAI API. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/fewshot_text_classification Inspiration for the video: github.com/explosion/prodigy-openai-recipes/tree/main Chat Completion API from OpenAI: platform.openai.com/docs/guides/g...
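The idea is to put a handful of labelled examples into the prompt and let the model complete the label for a new text. A minimal sketch assuming the v1-style openai Python package and an API key in the environment; the labels and examples are invented:

```python
# Sketch: few-shot text classification via the OpenAI chat completions API
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = """Classify the text as POSITIVE or NEGATIVE.

Text: I loved this video!
Label: POSITIVE

Text: The audio was terrible.
Label: NEGATIVE

Text: Really clear explanation, thanks!
Label:"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=2,   # the label is a single word
    temperature=0,  # keep the classification as deterministic as possible
)
print(response.choices[0].message.content.strip())  # e.g. "POSITIVE"
```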
OpenAI function calling
2.9K views · 1 year ago
In this video we will go through the new feature "Function calling" of the OpenAI API (see more info here: openai.com/blog/function-calling-and-other-api-updates). First, I talk about the concepts and then I code up a small example where we implement a "financial analyst" bot. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/openai_function_calling/example.py...
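Conceptually: you describe your functions as JSON schema, the model answers with a function name plus arguments, you execute the function yourself and feed the result back. A minimal sketch using the newer tools-style parameters (the video uses the original `functions` argument); get_stock_price is a made-up stand-in for the video's financial-analyst helpers:

```python
# Sketch: OpenAI function calling via the tools-style chat API
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # hypothetical helper we implement ourselves
        "description": "Get the latest price for a stock ticker",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is Apple trading at?"}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name)                   # "get_stock_price"
print(json.loads(call.function.arguments))  # model-generated, e.g. {"ticker": "AAPL"}
```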
Deploying machine learning models on Kubernetes
20K views · 1 year ago
In this video, we will go through a simple end-to-end example of how to deploy an ML model on Kubernetes. We will use a pretrained Transformer model on the task of masked language modelling (fill-mask) and turn it into a REST API. Then we will containerize our service and finally deploy it on a Kubernetes cluster. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials...
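The serving part of such a setup can be as small as the sketch below (FastAPI wrapping a fill-mask pipeline); the Dockerfile and Kubernetes manifests are left out, and the model name is just a common default, not necessarily the one from the video:

```python
# Sketch: REST API around a fill-mask Transformer model
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # assumed model

class Payload(BaseModel):
    text: str  # must contain the [MASK] token

@app.post("/predict")
def predict(payload: Payload):
    # returns the top candidate tokens for the [MASK] position
    return fill_mask(payload.text)
```

From there: containerize the app, push the image to a registry, and expose it on the cluster with a Deployment plus a Service.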
Haiku basics (neural network library from DeepMind)
3.5K views · 2 years ago
In this video, we will go through the basic concepts of Haiku, a deep learning library created by DeepMind. Official repo: github.com/deepmind/dm-haiku Official docs: dm-haiku.readthedocs.io/en/latest/ Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/haiku_basics Chapters: 00:00 Intro 00:35 Cloning the repo + setting things up 01:52 Parameters: hk.transform...
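The central Haiku concept, turning a pure function into an (init, apply) pair with hk.transform, looks roughly like this sketch (layer sizes picked arbitrarily):

```python
# Sketch: the hk.transform init/apply pattern at the heart of Haiku
import haiku as hk
import jax
import jax.numpy as jnp

def forward(x):
    # modules may only be instantiated inside a transformed function
    mlp = hk.nets.MLP([64, 10])
    return mlp(x)

forward_t = hk.transform(forward)

rng = jax.random.PRNGKey(42)
x = jnp.ones((1, 28 * 28))

params = forward_t.init(rng, x)           # create the parameters
logits = forward_t.apply(params, rng, x)  # run the forward pass with them
print(logits.shape)  # (1, 10)
```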
Product quantization in Faiss and from scratch
7K views · 2 years ago
In this video, we talk about a vector compression technique called product quantization. We first explain conceptually what the main ideas are and then show how one can use an existing implementation of it from Faiss (IndexPQ). Finally, we also implement the algorithm from scratch. Last but not least, we run some experiments and compare different methods. Paper: lear.inrialpes.fr/pubs/2011/JDS...
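On the Faiss side, IndexPQ splits each vector into m subvectors and quantizes every subspace with its own small codebook; a toy sketch on made-up random data:

```python
# Sketch: product quantization with Faiss IndexPQ on random vectors
import faiss
import numpy as np

d, m, nbits = 128, 8, 8  # dimension, number of subvectors, bits per sub-code
xb = np.random.rand(10_000, d).astype("float32")

index = faiss.IndexPQ(d, m, nbits)
index.train(xb)  # learn the per-subspace codebooks (k-means)
index.add(xb)    # vectors are stored compressed: m * nbits / 8 = 8 bytes each

distances, ids = index.search(xb[:5], k=3)  # approximate nearest neighbours
print(ids)
```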
GPT in PyTorch
11K views · 2 years ago
In this video, we are going to implement the GPT2 model from scratch. We are only going to focus on the inference and not on the training logic. We will cover concepts like self attention, decoder blocks and generating new tokens. Paper: openai.com/blog/better-language-models/ Code minGPT: github.com/karpathy/minGPT Code transformers: github.com/huggingface/transformers/blob/0f69b924fbda6a442d7...
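The part that usually needs the most care is the masked (causal) self-attention inside each decoder block; a condensed sketch with arbitrary hyperparameters, not a line-by-line copy of the video's code:

```python
# Sketch: GPT-style causal self-attention in PyTorch
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int = 768, n_head: int = 12, block_size: int = 1024):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # joint q, k, v projection
        self.proj = nn.Linear(n_embd, n_embd)     # output projection
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        # forbid attending to future positions
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```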
The Lottery Ticket Hypothesis and pruning in PyTorch
9K views · 3 years ago
In this video, we are going to explain how one can do pruning in PyTorch. We will then use this knowledge to implement a paper called "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". The paper states that feedforward neural networks have subnetworks (winning tickets) inside of them that perform as well as (or even better than) the original network. It also proposes a ...
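PyTorch's pruning utilities, which the video builds on, attach a binary mask to a parameter; a minimal sketch on a made-up feedforward network:

```python
# Sketch: L1 unstructured pruning with torch.nn.utils.prune
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # zero out the 20% of weights with the smallest magnitude; this adds
        # `weight_orig` and `weight_mask` and recomputes `weight` on the fly
        prune.l1_unstructured(module, name="weight", amount=0.2)

sparsity = float((model[0].weight == 0).float().mean())
print(f"layer-0 sparsity: {sparsity:.2%}")  # ~20%

# the lottery-ticket procedure then rewinds surviving weights to their initial
# values and retrains; prune.remove() makes the pruning permanent
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```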
The Sensory Neuron as a Transformer in PyTorch
3K views · 3 years ago
In this video, we implement a paper called "The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning" in PyTorch. It proposes a permutation invariant module called the Attention Neuron. Its goal is to independently process local information from the features and then combine the local knowledge into a global picture. Paper: arxiv.org/abs/2109.02869 O...
Integer embeddings in PyTorch
2.4K views · 3 years ago
In this video, we implement a paper called "Learning Mathematical Properties of Integers". Most notably, we use an LSTM network and an Encyclopedia of integer sequences to train custom integer embeddings. At the same time, we also extract integer sequences from already pretrained models - BERT and GloVe. We then compare how good these embeddings are at encoding mathematical properties of intege...
PonderNet in PyTorch
2.3K views · 3 years ago
In this video, we implement the PonderNet that was proposed in the paper "PonderNet: Learning to Ponder". It is a network that dynamically decides on the size of its forward pass. We are going to implement it and experiment with it a little bit on the so-called ParityDataset. Note that the implementation is based on the labml.ai implementation (see link below). I made some modification though s...
Mixup in PyTorch
3.4K views · 3 years ago
In this video, we implement the (input) mixup and manifold mixup. They are regularization techniques proposed in the papers "mixup: Beyond Empirical Risk Minimization" and "Manifold Mixup: Better Representations by Interpolating Hidden States". We investigate how these two schemes compare against more mainstream regularization methods like dropout and weight decay. Paper (Input mixup): arxiv.or...
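For reference, input mixup itself is only a few lines; a sketch assuming one-hot (or soft) labels and a Beta-distributed mixing coefficient:

```python
# Sketch: input mixup on a batch (labels assumed one-hot encoded)
import numpy as np
import torch

def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Mix the batch with a randomly shuffled copy of itself."""
    lam = np.random.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    perm = torch.randperm(x.size(0))    # random pairing of samples
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]
    return x_mixed, y_mixed
```

Manifold mixup applies the same interpolation to hidden activations instead of the raw inputs.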
Differentiable augmentation for GANs (using Kornia)
2.6K views · 3 years ago
Growing neural cellular automata in PyTorch
4.9K views · 3 years ago
torch.nn.Embedding explained (+ Character-level language model)
36K views · 3 years ago
Gradient with respect to input in PyTorch (FGSM attack + Integrated Gradients)
9K views · 3 years ago
NumPy equality testing: multiple ways to compare arrays
1.9K views · 3 years ago
Mocking neural networks: unit testing in deep learning
2.4K views · 3 years ago
Visualizing activations with forward hooks (PyTorch)
15K views · 3 years ago
hey bruh, you're so underrated. Happy that I got your video recommended in my feed.
How can we perform self-supervised pretraining using DINOv2 with our own dataset?
permission to learn sir. thank you
Cheers mate!
This is nice until it turns out you don't explain each line of the code, at least briefly 😢
Can we use this code for change detection between two satellite images?
What font and color theme are you using? Looks really nice!
excellent!! I'm curious why my search always shows garbage and videos like this never come up. This was suggested by Gemini when I asked a question about ML model deployment.
The reason you got ". , ?" as the output for [MASK] is that you didn't end your input request with a full stop. BERT masking models should be prompted that way: "my name is [MASK]." should have been your request.
Thank you for sharing this, I was actually looking for results of DINO on smaller compute/data so this is so helpful
It's printing "Original prediction: 293". How can I check the values or names of this predicted class?
I am using custom tags, such as InvoiceNumber and GrossTotal. To work on entity level, does seqeval need tags in the format B- and I-?
Hello authors, thank you for your video. It helped me a lot. However, I have one question about your code. In the original mixup, which is from the link you provided, the author mixed the loss function instead of mixing the label. But I noticed you mixed the label. Could you please explain the reason for this difference in operation? Looking forward to your reply
Really helpful for building a foundation in MLOps.
Glad to hear that!
Is this method good if we want to search for list of products rather than chat-liked response?
Sure:) If you have text descriptions of the products then Elasticsearch/Opensearch + reranking is definitely a great option:)
You are incredible man. -You go at a good pace. -Each project feels well planned. -Nice formatting style. -Good explanation. I've just started really digging into this machine learning space; any recommendation on learning all the different layer types and problem types?
Thanks a ton! ML has changed quite a lot over the past few years. I guess one architecture you should be familiar with nowadays is the transformer:) But I guess you have heard about it by now:D Good luck with your learning!
Great example. Thanks for the information
My pleasure!
hi man, do you offer some training or mentorship?
What to do if you want the encoding made by OpenSearch directly?
I concur with what everyone is saying - best video on function calling for sure. I really like the laid back nature of the tutorial - seriously simplifying function calling - even to the uninitiated! Only one suggestion: Please move inset video to top right so output can be seen in its entirety. Obviously not for this video, but for future awesome videos you produce.
Glad it was helpful! And thank you for the constructive feedback:)
This is the best video on this topic. Thank you!
Appreciate your comment!
what's the font you use?
Not sure. I am using this vim theme: github.com/morhetz/gruvbox so maybe you can find it somewhere in their repo.
Is there any way of re-running SSL on a pretrained DINO?
Thank you so much for helping me to understand ViT!! Great work
Happy to help!
Great video ! Good explanation. Thanks for all your efforts in making detailed video along with code !
You are welcome!
wow, this is dangerous xd
Hehe
Where did you include positional encoding? Or is it not needed when using convolutions for patching and embedding?
Great video; as a student, thank you so much! I will say a few lines didn't feel very well explained, though I'm sure to someone with a bit more knowledge than me it would be clearer. Overall 10/10, tysm!
Great point actually:) Appreciate your feedback:)
I'm a huge fan of implementing algorithms from scratch by myself and watched this video with a great pleasure. Thanks for your work, it deserves more attention.
Thank you for the message!
Great video! Can I run the code on a Mac with an M1 chip as-is?
Thanks! Yes, you can:)
Name of the font?
So the theme I am using is here: github.com/morhetz/gruvbox . The README talks about the fonts I believe.
Doing ML in vim is absolutely gigachad
Hahaha:D
Amazing video. Just curious, what keyboard are you using?
Glad you enjoyed it! Logitech MX Keys S
"mildly overfitted" is how I like to keep my underwear so I don't get the hyena.
Haha:) Made me laugh:D
Really nice video. Would you see any benefit of using the deployment in a single node with an M1 chip? I'd say somehow yes, because an inference might not be taking all the CPU of the M1 chip, but how about scaling the model in terms of RAM? One of those models might take 4-7GB of RAM, which makes up to 21GB of RAM only for 3 pods. What's your opinion on that?
Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.
@mildlyoverfitted got it. So, very likely more requests could be resolved at the same time, but with very limited scalability and probably with performance loss. By the way, what are those fancy combos with the terminal? Is it tmux?
@davidpratr interesting:) yes, it is tmux:)
When starting out, would you recommend just using embeddings and vector search, or should you also consider the hybrid case of OpenSearch & vector search? In the video it looks like you should go all in on vector search.
I would recommend just doing Opensearch + reranking. No embeddings (=vector search). Assuming you wanna have something minimal really quickly as demonstrated in the video:)
Isn't this a concurrency limit, not a rate limit (i.e. a limit per second)?
I think you are right:) The video title is definitely misleading. Sorry about that!
novelty explained in just over 6 minutes. 🙇
Hope you liked it:)
Hi, I'm getting this error: "'sagemaker_service:svc' is not found in BentoML store <osfs '/home/bentoml/bentos'>, you may need to run `bentoml models pull` first". Any idea? Thanks a lot
Hmmm, if the problem still persists you can create an issue here: github.com/jankrepl/mildlyoverfitted/issues describing exactly what you did, and I can try to help!
@mildlyoverfitted Solved it. The problem came from the bentoml version; installing bentoml==1.1.11 solved the problem for me.
Thank you so much, your example helped me to solve some problems :)
Happy to help!
Why is the 2nd dim of the MLP input shape n_patches + 1? Isn't the MLP just applied to the class token?
So the `MLP` module is used inside of the Transformer block and it takes a 3D tensor as input. See this link for the only place where the CLS is explicitly extracted: github.com/jankrepl/mildlyoverfitted/blob/22f0ecc67cef14267ee91ff2e4df6bf9f6d65bc2/github_adventures/vision_transformer/custom.py#L423-L424 Hope that helps:)
@mildlyoverfitted thanks, yeah, I confused the MLP inside the block with the MLP at the end for classification.
Fantastic video, just a quick note: at 16:01 you say that "none of the operations are changing the shape of the tensor", but isn't this wrong, since when applying fc2 the last dim should be out_features, not hidden_features? So the shapes are also wrongly commented.
Nice find and sorry for the mistake:)! Somebody already pointed it out a while ago:) Look at the pinned errata comment:)
Ah I see, my bad :D @mildlyoverfitted
Which frameworks would you recommend if you had to scale to 1000+ models? I am looking at custom FastAPI and MLflow with AWS Lambda, where each inference request will load the model from object storage and call .predict. The models are generally lightweight and predictions would only have to be made on an hourly basis, so I don't think it's necessary to serve them in memory.
If you are not experiencing a cold start (or you don't care) then Lambda is definitely a great solution:)
Thanks for the nice video explanation! Could you please tell me what modifications I can make to get the output in a certain format? Say I want it to output only the label value with no other text?
Thank you! The current template should lead to you only getting the label. However, feel free to prompt engineer it if you are not getting the expected result. You can also request it to give you a valid JSON which you can then easily parse:) Just an idea. Hope that helps:)
@mildlyoverfitted thanks, it really helped me a lot. I achieved perfect results by restricting my response token limit, so it focuses on outputting the digit label (in flexible forms), from which I can extract it using a simple regex. The JSON method seems very clean too.
Thank you for the video! I have a question: If I need to make updates to an existing service, do I have to go through the entire process again, or is there a more efficient way? bentoctl build seems quite time-consuming. Appreciate your help!
Appreciate your comment! If the change is inside of your ML model or the serving logic (service.py) you will have to rebuild the image. However, the second time around some layers should be cached (docs.docker.com/build/guide/layers/ ) so in theory it should be faster (it depends though). Another thing you can do is to build the image in some virtual machine rather than locally. A common setup is that you build it + upload to ECR in your CI (e.g. GitHub actions) Just some ideas:)
Is there a method you can use to rate limit by time? I'm interacting with an API that limits me to no more than 20 requests a minute, and I've been struggling with a way to handle that. Right now I keep track of the time of the last call, and if I made a request within the last 3 seconds I wait until 3 seconds have passed, then send out the next request. I have multiple API keys I can utilize, and each key has a set limit, so I cycle through them, but it feels like there must be a faster way.
One alternative solution is to use some open source package (e.g. github.com/florimondmanca/aiometer ). I don't really know much about it but maybe it can help:)
what keyboard are you using?
Logitech MX Keys :)
Great video, thanks a lot, really liked the explanation!!!
Glad it was helpful!
Great video. How do I run a text-generation model? I tried running a GPT2 model with the code below.
Creating the API: transformers-cli serve --task=text-generation --model=gpt2
Calling the API: curl -X POST localhost:8888/forward -H "accept: application/json" -H "Content-Type: application/json" -d '{"inputs":"What is Deep Learning","parameters":{"max_new_tokens":20}}'
But I am getting this error in the response: {"detail":[{"type":"json_invalid","loc":["body",0],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}
terminal and theme name please
tmux + gruvbox