mildlyoverfitted
BentoML SageMaker deployment
In this video, we are going to discuss the basics of BentoML and then go through a hands-on example of taking a Scikit-learn model and deploying it on SageMaker with the help of BentoML.
The code + sketches from the video can be found here: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/bentoml
00:00 Intro
00:52 [diagram] Ideas behind BentoML
03:07 [diagram] Step by step procedure
03:21 [code] Creating a model
06:50 [code] Creating a bento - service.py
14:31 [code] Creating a bento - bentofile.yaml
16:53 [code] bentoctl init
19:34 [code] Inspecting terraform files
21:10 [code] Containerization + pushing to ECR
23:15 [code] Deployment via terraform
25:13 [code] Sending request and running inference
27:41 [code] Destroying resources
29:05 Outro
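For reference, a minimal sketch of the "Creating a bento - service.py" step from the chapters above, assuming the BentoML 1.x API used in this workflow; the model tag "my_sklearn_model:latest" is a placeholder that would come from an earlier bentoml.sklearn.save_model call, not a value from the video.

# service.py - minimal sketch, assuming BentoML 1.x; model name is hypothetical.
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Load a reference to a previously saved Scikit-learn model and wrap it in a runner.
model_ref = bentoml.sklearn.get("my_sklearn_model:latest")
runner = model_ref.to_runner()

svc = bentoml.Service("my_sklearn_service", runners=[runner])


@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Delegate inference to the runner, which wraps the stored model.
    return runner.predict.run(input_array)

The bento itself is then described in bentofile.yaml and built with bentoml build, after which bentoctl and terraform take over as in the chapters above.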
Views: 1,459

Videos

Retrieval augmented generation with OpenSearch and reranking
4.9K views · 1 year ago
In this video, we are going to be using OpenSearch and Cohere's Reranker endpoint to implement a minimal Retrieval augmented generation system that is able to perform question answering. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/rag-rerank/mini_tutorials/rag_with_reranking Cohere blogpost: txt.cohere.com/rerank/ 00:00 Intro 00:52 RAG with embeddings (semantic search) 03:16 ...
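A minimal sketch of the retrieve-then-rerank step described above, assuming the opensearch-py and cohere packages; the index name, field name, and rerank model string are placeholders rather than values from the video, and the exact response fields may differ between Cohere SDK versions.

# Retrieve candidates from OpenSearch, then rerank them with Cohere - sketch only.
import cohere
from opensearchpy import OpenSearch

question = "What is retrieval augmented generation?"

# 1) Lexical (BM25) retrieval from OpenSearch.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
response = client.search(
    index="documents",  # placeholder index name
    body={"query": {"match": {"text": question}}, "size": 25},
)
docs = [hit["_source"]["text"] for hit in response["hits"]["hits"]]

# 2) Rerank the candidates with Cohere's rerank endpoint.
co = cohere.Client("YOUR_API_KEY")
reranked = co.rerank(query=question, documents=docs, top_n=3, model="rerank-english-v2.0")

# Keep the top reranked passages as context for the generation step.
# (.results / .index follow the Cohere SDK of that era and may differ in newer versions.)
context = [docs[result.index] for result in reranked.results]
print(context)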
Named entity recognition (NER) model evaluation
2.9K views · 1 year ago
In this video we are going to talk about different ways one can evaluate an NER (named entity recognition) model. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/ner_evaluation github.com/chakki-works/seqeval 00:00 Intro 00:31 Mispredictions 02:31 IOB2 notation 04:03 Evaluation approaches 07:38 [code] HF evaluate seqeval 14:36 [code] Entity-level fro...
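A minimal sketch of entity-level evaluation with seqeval on IOB2 tags, as referenced in the chapters above; the toy tag sequences are made up.

# Entity-level NER evaluation with seqeval - sketch with made-up tags.
from seqeval.metrics import classification_report, f1_score

y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "I-ORG", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O"], ["O", "B-ORG", "O", "O"]]

# seqeval groups IOB2 tags into entities and scores exact entity matches.
print(classification_report(y_true, y_pred))
print("entity-level F1:", f1_score(y_true, y_pred))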
Asynchronous requests and rate limiting (HTTPX and asyncio.Semaphore)
2.8K views · 1 year ago
Today we are going to talk about how to use HTTPX to send requests asynchronously and how to perform rate limiting. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/httpx_rate_limiting/ 00:00 Intro 01:15 [Code] Implement async requests WITHOUT rate limiting 07:20 [Code] Trying it out 08:48 [Code] Implement async requests WITH rate lim...
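A minimal sketch of the pattern described above, assuming httpx and asyncio.Semaphore; the URLs and the concurrency cap of 5 are placeholders. (As a commenter below points out, a semaphore caps concurrency rather than requests per second.)

# Async requests with a concurrency cap via asyncio.Semaphore - sketch only.
import asyncio
import httpx

URLS = [f"https://httpbin.org/get?i={i}" for i in range(20)]  # placeholder URLs
semaphore = asyncio.Semaphore(5)  # at most 5 requests in flight at once


async def fetch(client: httpx.AsyncClient, url: str) -> int:
    async with semaphore:
        response = await client.get(url)
        return response.status_code


async def main() -> None:
    async with httpx.AsyncClient() as client:
        statuses = await asyncio.gather(*(fetch(client, url) for url in URLS))
    print(statuses)


asyncio.run(main())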
Few-shot text classification with prompts
4K views · 1 year ago
In this video, I will talk about a possible way to perform few-shot text classification using prompt engineering and the OpenAI API. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/fewshot_text_classification Inspiration for the video: github.com/explosion/prodigy-openai-recipes/tree/main Chat Completion API from OpenAI: platform.openai.com/docs/guides/g...
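A minimal sketch of prompt-based few-shot classification, assuming the pre-1.0 openai package contemporary with the video's Chat Completion API; the labels, examples, and prompt template are made up.

# Few-shot classification via prompting - sketch, pre-1.0 openai package assumed.
import openai

openai.api_key = "YOUR_API_KEY"

PROMPT_TEMPLATE = """Classify the text into one of: positive, negative, neutral.

Text: I loved this product.
Label: positive

Text: It broke after one day.
Label: negative

Text: {text}
Label:"""


def classify(text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(text=text)}],
        temperature=0,
    )
    # The few-shot examples nudge the model to answer with just the label.
    return response["choices"][0]["message"]["content"].strip()


print(classify("Delivery was fine, nothing special."))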
OpenAI function calling
2.9K views · 1 year ago
In this video we will go through the new feature "Function calling" of the OpenAI API (see more info here: openai.com/blog/function-calling-and-other-api-updates). First, I talk about the concepts and then I code up a small example where we implement a "financial analyst" bot. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/openai_function_calling/example.py...
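A minimal sketch of function calling, assuming the pre-1.0 openai package and the 2023 "functions"/"function_call" parameters; the get_stock_price helper is hypothetical, standing in for the video's "financial analyst" bot.

# OpenAI function calling - sketch, pre-1.0 openai package assumed.
import json
import openai

openai.api_key = "YOUR_API_KEY"


def get_stock_price(ticker: str) -> float:
    # Hypothetical stub; a real implementation would call a market-data API.
    return 123.45


functions = [
    {
        "name": "get_stock_price",
        "description": "Get the latest price for a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What is AAPL trading at?"}],
    functions=functions,
    function_call="auto",
)
message = response["choices"][0]["message"]

# If the model decided to call the function, parse its JSON arguments and run it.
if message.get("function_call"):
    args = json.loads(message["function_call"]["arguments"])
    print(get_stock_price(**args))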
Deploying machine learning models on Kubernetes
20K views · 1 year ago
In this video, we will go through a simple end-to-end example of how to deploy an ML model on Kubernetes. We will use a pretrained Transformer model on the task of masked language modelling (fill-mask) and turn it into a REST API. Then we will containerize our service and finally deploy it on a Kubernetes cluster. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials...
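A minimal sketch of the REST API part only, assuming FastAPI and the transformers fill-mask pipeline; the model name and route are placeholders, and the containerization and Kubernetes manifests from the video are not shown here.

# Fill-mask REST API - sketch; run locally with: uvicorn <module_name>:app
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # placeholder model


@app.get("/predict")
def predict(text: str):
    # Example: text = "The capital of France is [MASK]."
    # Returns the top candidate tokens with their scores.
    return fill_mask(text)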
Haiku basics (neural network library from DeepMind)
3.5K views · 2 years ago
In this video, we will go through the basic concepts of Haiku, a deep learning library created by DeepMind. Official repo: github.com/deepmind/dm-haiku Official docs: dm-haiku.readthedocs.io/en/latest/ Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/haiku_basics Chapters: 00:00 Intro 00:35 Cloning the repo + setting things up 01:52 Parameters: hk.transform...
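A minimal sketch of the hk.transform pattern covered in the chapters above; the toy MLP and shapes are made up.

# Haiku's transform pattern - sketch with a toy MLP.
import haiku as hk
import jax
import jax.numpy as jnp


def forward(x):
    mlp = hk.nets.MLP([32, 1])
    return mlp(x)


# hk.transform turns the impure forward function into a pair of pure functions.
net = hk.transform(forward)

rng = jax.random.PRNGKey(42)
x = jnp.ones((4, 8))

params = net.init(rng, x)        # create the parameters
out = net.apply(params, rng, x)  # run the forward pass with explicit params
print(out.shape)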
Product quantization in Faiss and from scratch
7K views · 2 years ago
In this video, we talk about a vector compression technique called Product quantization. We first explain conceptually what the main ideas are and then show how one can use an existing implementation of it from Faiss (IndexPQ). Finally, we also implement the algorithm from scratch. Last but not least, we run some experiments and compare different methods. Paper: lear.inrialpes.fr/pubs/2011/JDS...
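A minimal sketch of using Faiss's IndexPQ as described above; the dimensionality, number of sub-quantizers, and random data are arbitrary.

# Product quantization with Faiss's IndexPQ - sketch on random data.
import faiss
import numpy as np

d = 64       # vector dimensionality
m = 8        # number of sub-quantizers (d must be divisible by m)
nbits = 8    # bits per sub-quantizer -> 256 centroids per sub-space

xb = np.random.rand(10_000, d).astype("float32")  # database vectors
xq = np.random.rand(5, d).astype("float32")       # query vectors

index = faiss.IndexPQ(d, m, nbits)
index.train(xb)  # learn the codebooks
index.add(xb)    # compress and store the database vectors

distances, indices = index.search(xq, 5)  # approximate 5-nearest-neighbor search
print(indices)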
GPT in PyTorch
11K views · 2 years ago
In this video, we are going to implement the GPT2 model from scratch. We are only going to focus on the inference and not on the training logic. We will cover concepts like self attention, decoder blocks and generating new tokens. Paper: openai.com/blog/better-language-models/ Code minGPT: github.com/karpathy/minGPT Code transformers: github.com/huggingface/transformers/blob/0f69b924fbda6a442d7...
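A minimal sketch of the causal self-attention block at the heart of a GPT decoder; this is a generic implementation, not the exact code from the video or minGPT.

# Causal (masked) self-attention - generic sketch in PyTorch.
import math
import torch
import torch.nn as nn


class CausalSelfAttention(nn.Module):
    def __init__(self, embed_dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = embed_dim // n_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, n_heads, seq_len, head_dim)
        q, k, v = (
            z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v)
        )
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        # Causal mask: each position may only attend to itself and the past.
        mask = torch.tril(torch.ones(t, t, dtype=torch.bool, device=x.device))
        att = att.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        out = (att @ v).transpose(1, 2).reshape(b, t, c)
        return self.proj(out)


x = torch.randn(2, 10, 64)
print(CausalSelfAttention(64, 8)(x).shape)  # torch.Size([2, 10, 64])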
The Lottery Ticket Hypothesis and pruning in PyTorch
9K views · 3 years ago
In this video, we are going to explain how one can do pruning in PyTorch. We will then use this knowledge to implement a paper called "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". The paper states that feedforward neural networks have subnetworks (winning tickets) inside of them that perform as well as (or even better than) the original network. It also proposes a ...
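A minimal sketch of magnitude pruning with torch.nn.utils.prune; the tiny network and the 50% sparsity level are arbitrary. For lottery-ticket experiments, the surviving weights would additionally be rewound to their original initialization before retraining.

# Magnitude pruning with torch.nn.utils.prune - sketch on a toy network.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 2))

# Prune 50% of the weights with the smallest L1 magnitude, layer by layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Pruning adds a weight_mask buffer and reparametrizes module.weight on the fly.
layer = model[0]
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity of first layer: {sparsity:.2f}")

# prune.remove makes the pruning permanent (folds the mask into the weight tensor).
prune.remove(layer, "weight")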
The Sensory Neuron as a Transformer in PyTorch
3K views · 3 years ago
In this video, we implement a paper called "The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning" in PyTorch. It proposes a permutation invariant module called the Attention Neuron. Its goal is to independently process local information from the features and then combine the local knowledge into a global picture. Paper: arxiv.org/abs/2109.02869 O...
Integer embeddings in PyTorch
2.4K views · 3 years ago
In this video, we implement a paper called "Learning Mathematical Properties of Integers". Most notably, we use an LSTM network and an Encyclopedia of integer sequences to train custom integer embeddings. At the same time, we also extract integer sequences from already pretrained models - BERT and GloVe. We then compare how good these embeddings are at encoding mathematical properties of intege...
PonderNet in PyTorch
2.3K views · 3 years ago
In this video, we implement the PonderNet that was proposed in the paper "PonderNet: Learning to Ponder". It is a network that dynamically decides on the size of its forward pass. We are going to implement it and experiment with it a little bit on the so-called ParityDataset. Note that the implementation is based on the labml.ai implementation (see link below). I made some modification though s...
Mixup in PyTorch
3.4K views · 3 years ago
In this video, we implement the (input) mixup and manifold mixup. They are regularization techniques proposed in the papers "mixup: Beyond Empirical Risk Minimization" and "Manifold Mixup: Better Representations by Interpolating Hidden States". We investigate how these two schemes compare against more mainstream regularization methods like dropout and weight decay. Paper (Input mixup): arxiv.or...
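A minimal sketch of input mixup on a batch with soft (one-hot) labels; alpha = 0.2 is a common choice, not necessarily the value used in the video. Mixing the labels this way is equivalent to mixing the two cross-entropy losses, a point that also comes up in a comment below.

# Input mixup on a batch - sketch with made-up data.
import torch


def mixup_batch(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Mix each sample with a randomly chosen other sample from the batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.shape[0])
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]  # equivalently, mix the two losses
    return x_mixed, y_mixed


x = torch.randn(8, 3, 32, 32)
y = torch.nn.functional.one_hot(torch.randint(0, 10, (8,)), num_classes=10).float()
x_mixed, y_mixed = mixup_batch(x, y)
print(x_mixed.shape, y_mixed.shape)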
DINO in PyTorch
14K views · 3 years ago
MLP-Mixer in Flax and PyTorch
4.4K views · 3 years ago
Differentiable augmentation for GANs (using Kornia)
2.6K views · 3 years ago
Growing neural cellular automata in PyTorch
4.9K views · 3 years ago
SIREN in PyTorch
5K views · 3 years ago
Vision Transformer in PyTorch
86K views · 3 years ago
torch.nn.Embedding explained (+ Character-level language model)
36K views · 3 years ago
Gradient with respect to input in PyTorch (FGSM attack + Integrated Gradients)
9K views · 3 years ago
NumPy equality testing: multiple ways to compare arrays
1.9K views · 3 years ago
Custom optimizer in PyTorch
7K views · 3 years ago
Mocking neural networks: unit testing in deep learning
2.4K views · 3 years ago
Visualizing activations with forward hooks (PyTorch)
15K views · 3 years ago

Comments

  • @tensorthug6802 · 10 days ago

    hey bruh, you're so underrated. Happy that I got your video recommended in my feed.

  • @helaluddinmullah2869 · 26 days ago

    How can we perform self-supervised pretraining with DINOv2 on our own dataset?

  • @ahmadkelixo7243 · 28 days ago

    permission to learn sir. thank you

  • @SmiteLax · 1 month ago

    Cheers mate!

  • @jimmykhawand1315 · 1 month ago

    This is nice until it turns out you don't explain each line of the code, at least briefly 😢

  • @PriyaDas-he4te · 3 months ago

    Can we use this code for change detection in two satellite images?

  • @tirthasg · 3 months ago

    What font and color theme are you using? Looks really nice!

  • @bpac90 · 3 months ago

    excellent!! I'm curious why my search always shows garbage and videos like this never come up. This was suggested by Gemini when I asked a question about ML model deployment.

  • @SunilSamson-w2l · 5 months ago

    The reason you got . , ? as the output for [MASK] is that you didn't end your input with a full stop. BERT masking models should be passed input that way: "my name is [MASK]." should have been your request.

  • @JorgeGarcia-eg5ps · 5 months ago

    Thank you for sharing this, I was actually looking for results of DINO on smaller compute/data so this is so helpful

  • @krishsharma4507 · 6 months ago

    It's printing "Original prediction: 293". How can I check the values or names of this predicted class?

  • @Saevires · 6 months ago

    I am using custom tags, such as InvoiceNumber and GrossTotal. To work on entity level, does seqeval need tags in the format B- and I-?

  • @Huawei_Jiang · 6 months ago

    Hello authors, thank you for your video. It helped me a lot. However, I have one question about your code. In the original mixup, which is from the link you provided, the author mixed the loss function instead of mixing the label. But I noticed you mixed the label. Could you please explain the reason for this difference in operation? Looking forward to your reply

  • @shivendrasingh9759 · 7 months ago

    Really helpful foundation for MLOps

  • @larrymckuydee5058 · 7 months ago

    Is this method good if we want to search for a list of products rather than a chat-like response?

    • @mildlyoverfitted · 7 months ago

      Sure:) If you have text descriptions of the products then Elasticsearch/Opensearch + reranking is definitely a great option:)

  • @Larmbs · 7 months ago

    You are incredible, man. -You go at a good pace. -Each project feels well planned. -Nice formatting style. -Good explanation. I've just started really digging into this machine learning space, any recommendation on learning all the different layer types and problem types?

    • @mildlyoverfitted · 7 months ago

      Thanks a ton! ML has changed quite a lot over the past few years. I guess one architecture you should be familiar with nowadays is the transformer:) But I guess you have heard about it by now:D Good luck with your learning!

  • @humanity-indian · 7 months ago

    Great example. Thanks for the information

  • @lucianobatista6295 · 7 months ago

    hi man, do you offer some training or mentorship?

  • @paolobarba1782 · 7 months ago

    What to do if you want the encoding made by OpenSearch directly?

  • @akk2766 · 7 months ago

    I concur with what everyone is saying - best video on function calling for sure. I really like the laid back nature of the tutorial - seriously simplifying function calling - even to the uninitiated! Only one suggestion: Please move inset video to top right so output can be seen in its entirety. Obviously not for this video, but for future awesome videos you produce.

    • @mildlyoverfitted · 7 months ago

      Glad it was helpful! And thank you for the constructive feedback:)

  • @Munk-tt6tz · 8 months ago

    This is the best video on this topic. Thank you!

    • @mildlyoverfitted · 8 months ago

      Appreciate your comment!

  • @swk9015 · 8 months ago

    what's the font you use?

    • @mildlyoverfitted · 8 months ago

      Not sure. I am using this vim theme: github.com/morhetz/gruvbox so maybe you can find it somewhere in their repo.

  • @mmazher5826 · 8 months ago

    Is there any way of re-running SSL on a pretrained DINO?

  • @danielasefa8087 · 8 months ago

    Thank you so much for helping me to understand ViT!! Great work

  • @PrafulKava · 8 months ago

    Great video! Good explanation. Thanks for all your efforts in making a detailed video along with code!

  • @leeuw6481 · 8 months ago

    wow, this is dangerous xd

  • @prajyotmane9067 · 8 months ago

    Where did you include positional encoding? Or is it not needed when using convolutions for patching and embedding?

  • @neiro314 · 8 months ago

    Great video as a student, thank you so much! I will say a few lines didn't feel very well explained; however, I'm sure to someone with a bit more knowledge than me it would be clearer. Overall 10/10, tysm

    • @mildlyoverfitted · 8 months ago

      Great point actually:) Appreciate your feedback:)

  • @КириллКлимушин · 9 months ago

    I'm a huge fan of implementing algorithms from scratch by myself and watched this video with great pleasure. Thanks for your work, it deserves more attention.

    • @mildlyoverfitted · 9 months ago

      Thank you for the message!

  • @danieltello8016 · 9 months ago

    Great video. Can I run the code on a Mac with an M1 chip as is?

  • @iamragulsurya · 9 months ago

    Name of the font?

    • @mildlyoverfitted · 9 months ago

      So the theme I am using is here: github.com/morhetz/gruvbox . The README talks about the fonts I believe.

  • @navins2246 · 9 months ago

    Doing ML in vim is absolutely gigachad

  • @harrisnisar5345 · 9 months ago

    Amazing video. Just curious, what keyboard are you using?

    • @mildlyoverfitted · 9 months ago

      Glad you enjoyed it! Logitech MX Keys S

  • @jeffg4686 · 10 months ago

    "mildly overfitted" is how I like to keep my underwear so I don't get the hyena.

  • @davidpratr · 10 months ago

    Really nice video. Would you see any benefit of using the deployment on a single node with an M1 chip? I'd say somehow yes, because an inference might not be taking all the CPU of the M1 chip, but how about scaling the model in terms of RAM? One of those models might take 4-7GB of RAM, which makes up to 21GB of RAM only for 3 pods. What's your opinion on that?

    • @mildlyoverfitted · 10 months ago

      Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.

    • @davidpratr · 10 months ago

      @mildlyoverfitted Got it. So, very likely more requests could be resolved at the same time, but with very limited scalability and probably with performance loss. By the way, what are those fancy combos with the terminal? Is it tmux?

    • @mildlyoverfitted · 10 months ago

      @davidpratr interesting:) yes, it is tmux:)

  • @woutderijck5389 · 10 months ago

    When starting out, would you recommend just using embeddings and vector search, or should you also consider the hybrid case of OpenSearch & vector search? In the video it looks like you should go all in on vector search.

    • @mildlyoverfitted · 10 months ago

      I would recommend just doing Opensearch + reranking. No embeddings (=vector search). Assuming you wanna have something minimal really quickly as demonstrated in the video:)

  • @Ldmp807 · 10 months ago

    Isn't this a concurrency limit, not a rate limit (i.e. a limit per second)?

    • @mildlyoverfitted · 10 months ago

      I think you are right:) The video title is definitely misleading. Sorry about that!

  • @vidinvijay · 10 months ago

    novelty explained in just over 6 minutes. 🙇

  • @kascesar · 10 months ago

    Hi, I'm getting this error: "'sagemaker_service:svc' is not found in BentoML store <osfs '/home/bentoml/bentos'>, you may need to run `bentoml models pull` first." Any idea? Thanks a lot

    • @mildlyoverfitted · 10 months ago

      Hmmm, if the problem still persists you can create an issue here: github.com/jankrepl/mildlyoverfitted/issues describing exactly what you did and I can try to help!

    • @kascesar · 10 months ago

      @mildlyoverfitted Solved, I did it. The problem came from the bentoml version; installing bentoml==1.1.11 solved the problem for me.

  • @yuricastro522 · 10 months ago

    Thank you so much, your example helped me to solve some problems :)

  • @macx7760 · 11 months ago

    Why is the shape of the MLP input n_patches + 1 in the 2nd dim? Isn't the MLP just applied to the class token?

    • @mildlyoverfitted · 10 months ago

      So the `MLP` module is used inside of the Transformer block and its input is a 3D tensor. See this link for the only place where the CLS is explicitly extracted: github.com/jankrepl/mildlyoverfitted/blob/22f0ecc67cef14267ee91ff2e4df6bf9f6d65bc2/github_adventures/vision_transformer/custom.py#L423-L424 Hope that helps:)

    • @macx7760 · 10 months ago

      @mildlyoverfitted thanks, yeah, I confused the MLP inside the block with the MLP at the end for classification.

  • @macx7760 · 11 months ago

    Fantastic video, just a quick note: at 16:01 you say that "none of the operations are changing the shape of the tensor", but isn't this wrong, since when applying fc2 the last dim should be out_features, not hidden_features? So the shapes are also wrongly commented.

    • @mildlyoverfitted · 10 months ago

      Nice find and sorry for the mistake:)! Somebody already pointed it out a while ago:) Look at the pinned errata comment:)

    • @macx7760 · 10 months ago

      @mildlyoverfitted ah I see, my bad :D

  • @TwenTV · 11 months ago

    Which frameworks would you recommend if you had to scale to 1000+ models? I am looking at custom FastAPI and MLflow with AWS Lambda, but where each inference request will load the model from object storage and call .predict. The models are generally lightweight and predictions would only have to be made on an hourly basis, so I don't think it's necessary to serve them in memory.

    • @mildlyoverfitted · 10 months ago

      If you are not experiencing a cold start (or you don't care) then Lambda is definitely a great solution:)

  • @noedie4973 · 11 months ago

    Thanks for the nice video explanation! Could you please tell me what modifications I can make to get the output in a certain format? Say I want it to output only the label value with no other text?

    • @mildlyoverfitted · 11 months ago

      Thank you! The current template should lead to you only getting the label. However, feel free to prompt engineer it if you are not getting the expected result. You can also request it to give you a valid JSON which you can then easily parse:) Just an idea. Hope that helps:)

    • @noedie4973 · 11 months ago

      @mildlyoverfitted thanks, it really helped me a lot. I achieved perfect results by restricting my response token limit, so it focuses on outputting the digit label (in flexible forms), which I can then extract using a simple regex. The JSON method seems very clean too.

  • @idoronen9497 · 11 months ago

    Thank you for the video! I have a question: if I need to make updates to an existing service, do I have to go through the entire process again, or is there a more efficient way? bentoctl build seems quite time-consuming. Appreciate your help!

    • @mildlyoverfitted · 11 months ago

      Appreciate your comment! If the change is inside of your ML model or the serving logic (service.py) you will have to rebuild the image. However, the second time around some layers should be cached (docs.docker.com/build/guide/layers/ ) so in theory it should be faster (it depends though). Another thing you can do is to build the image in some virtual machine rather than locally. A common setup is that you build it + upload to ECR in your CI (e.g. GitHub actions) Just some ideas:)

  • @Lithdren · 11 months ago

    Is there a method you can use to rate limit by time? I'm interacting with an API that limits me to no more than 20 requests a minute, and I've been struggling with a way to handle that. Right now I keep track of the time of the last call, and if I made a request within the last 3 seconds I wait until 3 seconds have passed, then send out the next request. I have multiple API keys I can utilize, and each key has a set limit, so I cycle through them, but it feels like there must be a faster way.

    • @mildlyoverfitted · 11 months ago

      One alternative solution is to use some open source package (e.g. github.com/florimondmanca/aiometer ). I don't really know much about it but maybe it can help:)

  • @gunabalang9543 · 11 months ago

    what keyboard are you using?

  • @aditya_01 · 11 months ago

    Great video, thanks a lot, really liked the explanation!!!

  • @nandakishorejoshi3487 · 1 year ago

    Great video. How to run a text generation model? I tried running a GPT2 model with the code below. Creating the API: transformers-cli serve --task=text-generation --model=gpt2 Calling the API: curl -X POST localhost:8888/forward -H "accept: application/json" -H "Content-Type: application/json" -d '{"inputs":"What is Deep Learning","parameters":{"max_new_tokens":20}}' But I'm getting an error in the response: {"detail":[{"type":"json_invalid","loc":["body",0],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}

  • @theAhmd · 1 year ago

    Terminal and theme name, please?