How to train a model to generate image embeddings from scratch

  • Published on Feb 3, 2025

Comments • 41

  • @emrahe468 8 months ago +9

    I had been working on a similar problem for a few weeks and had already implemented most of the code you mentioned (after much trial and error). But after watching your video, I realized that I had missed a few crucial details, like the dense layer and the loss function. Your clear instructions and fantastic tutorial really saved me tons of time. I wish you had released this video earlier, but regardless, thank you very much! 🙏

    • @underfitted 8 months ago +1

      Thank you!

  • @LuisAlvarado-hm3br 8 months ago +3

    Great, insightful video with an original approach to explaining embeddings. Most explanations focus on text, so it's refreshing to see image embeddings for a change. It's also fantastic to see such an influential paper used as a reference for the implementation. Thank you!

  • @toddroloff93 8 months ago +2

    Great video. I like the enthusiasm and passion you display in your videos. The way you break things down and explain them is great. Thank you.

  • @chidubem31 8 months ago +2

    Cool explanation. I always wondered how embeddings worked at a lower level.

  • @sachinmohanty4577 8 months ago +2

    Beautiful explanation ❤ loved the tutorial 😊

  • @dcrasto 7 months ago +1

    Thanks!

  • @ThetaPhiPsi 8 months ago

    Contrastive explained nicely! It's a shame nobody uses it.
    I have some improvements to add:
    1. You can use the model itself to compare pairs and use the loss to discriminate between results (though the embedding is fine too for a class of downstream tasks).
    2. You can also compute ROC AUC and optimize your threshold on the given training data (I used a sigmoid to squash the loss between 0 and 1).
    Works nicely!
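
    A hedged sketch of the threshold tuning described in point 2, not the commenter's actual code: it sweeps candidate cutoffs over pair distances and keeps the one that maximizes TPR minus FPR (Youden's J), which is one common way to pick an operating point on a ROC curve. The name `best_threshold` and the toy distances are hypothetical, and the sketch assumes both same-digit and different-digit pairs are present.

```python
def best_threshold(distances, same_pair):
    """Pick the distance cutoff that maximizes TPR - FPR (Youden's J).

    distances: embedding distance for each image pair.
    same_pair: True if the pair shows the same digit, else False.
    Assumes at least one same-digit and one different-digit pair.
    """
    positives = sum(1 for s in same_pair if s)
    negatives = len(same_pair) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(distances)):
        # A pair is predicted "same digit" when its distance is at most the cutoff.
        tp = sum(1 for d, s in zip(distances, same_pair) if s and d <= cutoff)
        fp = sum(1 for d, s in zip(distances, same_pair) if not s and d <= cutoff)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data (hypothetical): three same-digit pairs, three different-digit pairs.
toy_distances = [0.1, 0.2, 0.3, 0.9, 1.1, 1.2]
toy_same = [True, True, True, False, False, False]
cutoff, j_score = best_threshold(toy_distances, toy_same)
```

    Tuning the cutoff on training pairs, as the comment suggests, is convenient, but a held-out set of pairs gives a more honest estimate of how the cutoff will behave on new images.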

  • @kalinduSekara 8 months ago +3

    Clear and great explanation 💯

  • @yaseromar1539 8 months ago

    What a magnificent explanation! Every time I watch one of your videos, I feel enjoyment and excitement, and I can see the same in the way you talk about machine learning 🤩🤩🤩🤩🤩🤩🤩🤩🤩🤩

  • @ojaspatil2094 2 months ago

    Thank you for the intuitive explanation!

  • @Aclodius 8 months ago +3

    You're doing the Lord's work

  • @ian-haggerty 8 months ago +1

  • @KoenYskout 8 months ago +1

    I experimented with setting the embedding size to 2 and visualizing the embeddings on a 2D plot (colored by label). It's easy to see how all (or most) numbers with the same label are clustered together by the embedding, while numbers with a different label are moved apart.

  • @wilfredomartel7781 3 months ago

    Great explanation!

  • @sam.scrolls 4 months ago

    Thank you for the wonderful explanation. I now understand the importance of the loss function here. If I want to create an embedding for an image that contains multiple objects, could you please give some insight into how it can be done?

  • @chuanana 7 months ago

    Thank you for the video! Is the distance between embeddings of images with different labels (3 vs. 7) expected to be greater than 1? I got (1.0468788, 1.087123). Since we normalized the inputs, I had expected the embedding distance to be normalized as well. Is there an expected range for the distance?

  • @LanreOladele 7 months ago

    @Underfitted, thank you for this amazing video. How would you ideally do the same using 3D images?

  • @LanreOladele 7 months ago

    I would sincerely like to see how you'd go about it using 3D images while implementing triplet loss.

  • @raheemnasirudeen6394 7 months ago

    A great explanation

  • @arashsheikh65 4 months ago

    Thank you!

  • @ddemmkkimm 8 months ago +1

    1:51 An image is not 2D data. Its dimensionality is the number of pixels, i.e. width × height.

    • @underfitted 8 months ago +1

      I meant you need 2 dimensions to represent one image: 1 dimension to represent height and 1 to represent width.

  • @user-wm8xr4bz3b 7 months ago

    Thanks for the video! So am I right to say that this process is supervised learning?

    • @underfitted 7 months ago

      This one is supervised, yes

  • @thevoyager7675 8 months ago

    Thanks for the nice explanation!
    Could we use these image embeddings for classification tasks? If so, how?

    • @underfitted 8 months ago

      You could. You can create 10 template embeddings, representing each digit. To classify a new image, compare it to all 10 embeddings and select the closest one.
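
      A minimal sketch of the template idea above, assuming each template is the mean embedding of that digit's training images (the reply does not say how the templates are built). `build_templates`, `classify`, and the toy 2-D vectors are hypothetical stand-ins for embeddings produced by the trained model.

```python
import math


def build_templates(embeddings, labels):
    """Average the embeddings of each label into one template vector."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(embedding, templates):
    """Return the label whose template is closest in Euclidean distance."""
    return min(templates, key=lambda label: math.dist(embedding, templates[label]))


# Toy embeddings (hypothetical): two images of a 3 and two images of a 7.
toy_embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
toy_labels = [3, 3, 7, 7]
templates = build_templates(toy_embeddings, toy_labels)
prediction = classify([1.0, 1.0], templates)
```

      With real MNIST embeddings there would be ten templates, one per digit, and `classify` would compare a new image's embedding against all ten.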

    • @KoenYskout 8 months ago

      I would say: transform the input into its embedding, and classify based on the embedding coordinates. I guess a simple KNN classifier will already do well, because similar numbers are moved closer together, and different numbers further apart, in the embedding.
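
      A minimal sketch of the KNN suggestion above in plain Python, assuming Euclidean distance and majority voting. `knn_predict` and the toy vectors are hypothetical; in practice the vectors would be embeddings produced by the trained model.

```python
import math
from collections import Counter


def knn_predict(query, embeddings, labels, k=3):
    """Predict a label by majority vote among the k nearest embeddings."""
    # Pair every stored embedding's distance to the query with its label.
    neighbors = sorted(
        (math.dist(query, vec), label) for vec, label in zip(embeddings, labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy embeddings (hypothetical): a cluster of 3s and a cluster of 7s.
toy_embeddings = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
]
toy_labels = [3, 3, 3, 7, 7, 7]
prediction = knn_predict([0.2, 0.2], toy_embeddings, toy_labels)
```

      Unlike a single template per digit, KNN keeps every training embedding around, which costs more memory but copes better when one digit forms several separate clusters.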

  • @mehershahzad-n5s 5 months ago

    Impressive clip

  • @АлексГладун-э5с 8 months ago +1

    amazing

  • @gemini_537 8 months ago

    Gemini 1.5 Pro: This video is about creating image embeddings from scratch using a neural network.
    The speaker starts by explaining what embeddings are and why they are important. Embeddings are a way of representing data points as vectors in a high-dimensional space. Similar data points will have similar embeddings, while dissimilar data points will have dissimilar embeddings. This makes embeddings useful for tasks such as finding similar documents or images.
    The speaker then introduces the concept of a Siamese network. A Siamese network is a type of neural network that takes two inputs and outputs a measure of similarity between the inputs. The speaker explains how to use a Siamese network to train a model to generate image embeddings.
    The speaker then shows how to train the model on a dataset of handwritten digits. The model learns to generate embeddings for the digits such that similar digits (e.g., two different images of the digit 3) have similar embeddings, while dissimilar digits (e.g., an image of 3 and an image of 7) have dissimilar embeddings.
    Finally, the speaker shows how to use the trained model to generate embeddings for new images. The speaker concludes by discussing some of the applications of image embeddings.
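
    As a rough illustration of the training signal summarized above, here is a sketch of a contrastive loss for a single pair of embeddings. It assumes the common margin-based form: squared distance for same-digit pairs, and a squared hinge on the margin for different-digit pairs. The exact loss and margin used in the video may differ; `contrastive_loss` and `margin=1.0` are assumptions.

```python
import math


def contrastive_loss(emb_a, emb_b, is_different, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Same-digit pairs are penalized by their squared distance, which pulls
    them together. Different-digit pairs are penalized only while they are
    closer than the margin, which pushes them apart.
    """
    distance = math.dist(emb_a, emb_b)
    if is_different:
        return max(0.0, margin - distance) ** 2
    return distance ** 2


# Two images of the same digit that ended up far apart: large loss.
same_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_different=False)

# Two different digits that are already beyond the margin: zero loss.
diff_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_different=True)

# Two different digits that are too close together: positive loss.
diff_near = contrastive_loss([0.0, 0.0], [0.0, 0.5], is_different=True)
```

    In a Siamese setup both embeddings come from the same network with shared weights, and the loss is averaged over a batch of pairs.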

  • @ian-haggerty 8 months ago +1

    Funny, it wasn't too long ago that MNIST wasn't a "toy" problem. The history of computer vision is rather short. Are we writing the beginning of it?

    • @underfitted 8 months ago

      Probably

  • @privateprivate-g3j 3 months ago +1

    It lacks a lot of context. It is just about trying some functions. What about the mathematical concepts?

  • @sad_man_no_talent 8 months ago +1

    9000+ power

  • @alliedeena1141 months ago

    Is this even from scratch?! Using external libraries doesn't mean it's from scratch.

  • @ajanieniola9172 3 months ago

    Please LangGraph

  • @anime_comp 5 months ago

    Way too basic for people who already know about neural networks. Good enthusiasm, though.