UMAP explained | The best dimensionality reduction?

  • Published Aug 2, 2024
  • UMAP explained! The great dimensionality reduction algorithm in one video with a lot of visualizations and a little code.
    Uniform Manifold Approximation and Projection for all!
    ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
    📺 PCA video: • PCA explained with int...
    📺 Curse of dimensionality video: • The curse of dimension...
    💻 Babyplots: interactive 3D visualization in R, Python, and JavaScript, with a PowerPoint add-in! Check it out at bp.bleb.li/
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
    Patreon: / aicoffeebreak
    Ko-fi: ko-fi.com/aicoffeebreak
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Outline:
    * 00:00 UMAP intro
    * 01:31 Graph construction
    * 04:49 Graph projection
    * 05:48 UMAP vs. t-SNE visualized
    * 07:31 Code
    * 08:12 Babyplots
    📚 Coenen, Pearce | Google Pair blog: pair-code.github.io/understan...
    📄 UMAP paper: McInnes, L., Healy, J., & Melville, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. arxiv.org/abs/1802.03426
    📺 Leland McInnes talk ‪@enthought‬ : • UMAP Uniform Manifold ...
    🎵 Music (intro and outro): Dakar Flow - Carmen María and Edu Espinal
    -------------------------------
    🔗 Links:
    YouTube: / aicoffeebreak
    Twitter: / aicoffeebreak
    Reddit: / aicoffeebreak
    #AICoffeeBreak #MsCoffeeBean #UMAP #MachineLearning #research #AI

Comments • 103

  • @lelandmcinnes9501
    @lelandmcinnes9501 3 years ago +96

    Thanks for this -- it is a very nice short succinct description (with good visuals) that still manages to capture all the important core ideas. I'll be sure to recommend this to people looking for a quick introduction to UMAP.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +13

      Wow, we feel honoured by your comment! Thanks.

  • @gregorysech7981
    @gregorysech7981 3 years ago +17

    Wow, this channel is a gold mine

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +10

      I beg to differ. It is a coffee bean mine. 😉

  • @Shinigami537
    @Shinigami537 2 years ago +2

    I have seen and 'interpreted' so many UMAP plots, but I never understood their utility until today. Thank you.

  • @dengzhonghan5125
    @dengzhonghan5125 3 years ago +4

    That baby plot really looks amazing!!

  • @bosepukur
    @bosepukur 3 years ago +3

    Didn't know about babyplots... thanks for sharing!

  • @fleurvanille7668
    @fleurvanille7668 3 years ago +5

    I wish you were the teacher of all subjects in the world! Many thanks

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      Wow, this is so heartwarming! Thanks for this awesome comment! 🤗

  • @dzanaga
    @dzanaga 3 years ago +4

    Thanks for making this clear and entertaining! I love the coffee bean 😂

  • @dexterdev
    @dexterdev 3 years ago +10

    wow! that is a very well dimensionally reduced version of UMAP algo

  • @ShubhamYadav-xr8tw
    @ShubhamYadav-xr8tw 3 years ago +8

    I didn't know about this before! Thanks for this video Letitia!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Glad it was helpful! UMAP is a must-know for dimensionality reduction nowadays.

  • @CodeEmporium
    @CodeEmporium 2 years ago +3

    This is really good. Absolutely love the simplicity 👍

  • @willsmithorg
    @willsmithorg 2 years ago +3

    Thanks. I'd never heard of UMAP. Now I'll definitely be trying it as a replacement the next time I reach for PCA.

  • @ImbaFerkelchen
    @ImbaFerkelchen 1 year ago +1

    Hey Letitia, really amazing video on UMAP. Love your easy-to-follow explanations :D Keep up the good work

  • @DerPylz
    @DerPylz 3 years ago +8

    I finally understand!

  • @20Stephanus
    @20Stephanus 2 years ago +2

    First video I saw. Loved it. Subscribed.

  • @emanuelgerber
    @emanuelgerber 4 months ago +1

    Thanks for making this video! Very helpful

  • @python-programming
    @python-programming 3 years ago +3

    This is incredibly helpful. Thanks!

  • @capcloud
    @capcloud 1 year ago +1

    Love it, thanks Ms. Coffee and Letitia!

  • @gurudevilangovan
    @gurudevilangovan 2 years ago +2

    2 videos in and I’m already a fan of this channel. Cool stuff! 😎

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +2

      Hey thanks! Great to have you here.

  • @DungPham-ai
    @DungPham-ai 3 years ago +5

    Love you so much.

  • @ylazerson
    @ylazerson 2 years ago +1

    Awesome as always!

  • @vi5hnupradeep
    @vi5hnupradeep 3 years ago +3

    Thank you so much!

  • @ehtax
    @ehtax 3 years ago +4

    Very fun and educational explanation of a difficult method! Keep the vids coming, Ms. Coffee Bean!!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Thank you! 😃 There will be more to come.

  • @user-vg3qj1cv8h
    @user-vg3qj1cv8h 3 years ago +3

    Found a great channel! Thanks for sharing

  • @denlogv
    @denlogv 3 years ago +7

    Great work, Letitia! Needed this kind of introduction to UMAP :) And thanks for the links!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      Glad it was helpful, Denis!
      Are you interested in UMAP for word embedding visualization? Or for something entirely different?

    • @denlogv
      @denlogv 3 years ago +3

      @@AICoffeeBreak Yeah, something similar. Actually I found its use in BERTopic very interesting, where we reduce the dimensionality of document embeddings (which leverage sentence-transformers) to later cluster and visualize different topics :)
      towardsdatascience.com/topic-modeling-with-bert-779f7db187e6
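
      A minimal sketch of the pipeline described in this reply, assuming the bertopic and umap-learn packages are installed; the toy corpus and parameter values are illustrative, not from the thread:

      # BERTopic embeds documents with sentence-transformers, reduces the
      # embeddings with UMAP, then clusters them and extracts topics.
      from bertopic import BERTopic
      from umap import UMAP

      docs = ["cats and dogs are pets", "machine learning is fun",
              "umap reduces embedding dimensionality"] * 50  # toy corpus, repeated so clustering has enough samples

      # A custom UMAP model can be plugged into BERTopic's pipeline.
      umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0)
      topic_model = BERTopic(umap_model=umap_model)
      topics, probs = topic_model.fit_transform(docs)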

  • @marianagonzales3201
    @marianagonzales3201 2 years ago +1

    Thank you very much! That was a great explanation 😊

  • @jcwfh
    @jcwfh 2 years ago +2

    Amazing. Reminds me of Gephi.

  • @damp8277
    @damp8277 3 years ago +2

    Fantastic! Such a good explanation, and thanks for the babyplot tip. Awesome channel!!!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      So glad you like it! ☺️

    • @damp8277
      @damp8277 3 years ago +1

      @@AICoffeeBreak It'll be very helpful. In geochemistry we usually work with 10+ variables, so having a complement to PCA will make the analysis more robust

  • @BitBlastBroadcast
    @BitBlastBroadcast 2 years ago +2

    great explanation!

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +1

      Happy it was helpful! 👍

  • @babakravandi
    @babakravandi 9 months ago +1

    Great video!

  • @sumailsumailov1572
    @sumailsumailov1572 2 years ago +2

    Very cool, thanks for it!

  • @hiramcoriarodriguez1252
    @hiramcoriarodriguez1252 3 years ago +8

    The visuals are amazing

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      You're amazing! *Insert Keanu Reeves meme here* 👀

  • @HoriaCristescu
    @HoriaCristescu 3 years ago +4

    Congratulations on an excellent channel

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      Thank you very much for the appreciation!

  • @rohaangeorgen4055
    @rohaangeorgen4055 3 years ago +3

    Thank you for explaining it wonderfully 😊

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      So nice of you to leave this lovely comment here! 😊

  • @arminrose4946
    @arminrose4946 2 years ago +5

    This is really fantastic stuff! Thanks for teaching it in such an easy-to-grasp way. I must admit I didn't manage to get through the original paper, since I am "just" a biologist. But this video helped a lot.
    I have a question: I wanted to project the phenological similarity of animals at certain stations, to see which stations were most similar in that respect. For each day at each station there is a value of presence or absence of a certain species. Obviously there is also temporal autocorrelation involved here. My first try with UMAP gave a very reasonable result, but I am unsure if it is a valid method for my purposes. What do you think, Letitia or others?

  • @floriankowarsch8682
    @floriankowarsch8682 3 years ago +3

    Very nice explanation!

  • @HighlyShifty
    @HighlyShifty 1 year ago +1

    Great introduction to UMAP, thanks

  • @furek5
    @furek5 2 years ago +2

    Thank you!

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 3 years ago +3

    Great vid

  • @shashankkumaryerukola
    @shashankkumaryerukola 9 months ago +1

    Thank you

  • @talithatrost3813
    @talithatrost3813 3 years ago +5

    Wow! Wow! I like it!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Cold Mirror, is that you? 👀

  • @dionbridger5944
    @dionbridger5944 1 year ago +1

    Very nice explanation. Do you have any other videos with more information about umap? What are the limitations as compared with e.g. deep neural nets?

  • @pl1840
    @pl1840 2 years ago +7

    I would like to point out that the statement around 6:44, that changing the hyperparameters of t-SNE completely changes the embedding, is very likely the result of random initialisation in t-SNE, whereas the UMAP implementation you are using keeps the same initialisation for each set of hyperparameters. It is good practice to initialise t-SNE with PCA; had that been done in the video, the results across hyperparameter changes would be comparable between t-SNE and UMAP.
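
    A quick way to try this point out, as a sketch with scikit-learn; the dataset and hyperparameter values are illustrative assumptions, not from the video:

    # With init="pca", reruns of t-SNE under different hyperparameters start
    # from the same deterministic layout, so the embeddings stay comparable.
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)

    emb_pca = TSNE(n_components=2, init="pca", perplexity=30,
                   random_state=42).fit_transform(X)
    emb_rand = TSNE(n_components=2, init="random", perplexity=30,
                    random_state=42).fit_transform(X)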

  • @AmruteshPuranik
    @AmruteshPuranik 3 years ago +2

    Amazing!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      You're amazing! [Insert Keanu Reeves meme here] 👀
      Thanks for watching and for dropping this wholesome comment!

  • @MachineLearningStreetTalk
    @MachineLearningStreetTalk 3 years ago +3

    Hello 😎

  • @cw9249
    @cw9249 1 year ago +1

    Interesting how the 2D graph of the mammoth becomes kind of like the mammoth lying on its stomach with its limbs spread out

  • @klammer75
    @klammer75 1 year ago

    This almost sounds like an extension of KNN to the unsupervised domain… very cool🥳🧐🤓

  • @kiliankleemann4251
    @kiliankleemann4251 9 months ago

    Very nice :D

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago

      Thank you! Cheers!

  • @arnoldchristianloaizafabia4657
    @arnoldchristianloaizafabia4657 3 years ago +4

    Hello, what is the complexity of UMAP? Thanks for the video.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      I think the answer to your question is here 👉 github.com/lmcinnes/umap/issues/8#issuecomment-343693402

  • @renanmonteirobarbosa8129
    @renanmonteirobarbosa8129 1 month ago

    I am afraid you did not fully understand the mechanism of information geometry behind UMAP and how the KL divergence acts as the "spring-dampener" mechanism. Keenan Crane and Melvin Leok have great educational materials on the topic.

  • @thomascorner3009
    @thomascorner3009 3 years ago +2

    Great introduction! What is your background if I may ask?

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      I'm from physics and computer science. 🙃 Ms. Coffee Bean is from my coffee roaster.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      What is your background if we may ask? And what brings you to UMAP?

    • @thomascorner3009
      @thomascorner3009 3 years ago +1

      @@AICoffeeBreak Hello :) I thought as much. My background is in theoretical physics, but I am making a living analyzing neuroscience (calcium imaging) data. It seems that neuroscience is now very excited about using the latest data reduction techniques, hence my interest in UMAP. :) I really like the "coffee bean" idea: friendly, very approachable, and to the point.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      Theoretical physicist in neuroscience! I'm impressed.

  • @hannesstark5024
    @hannesstark5024 3 years ago +2

    Nice video! And 784 :D

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      Thank you very much! Did Ms. Coffee Bean say something wrong with 784? 😅

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      Ah, now I noticed. She said 764 instead of 784. Seems like Ms. Coffee Bean cannot be trusted with numbers. 🤫

  • @TooManyPBJs
    @TooManyPBJs 3 years ago

    I think the package is installed as umap-learn but imported as umap. Great video. Just weird that I cannot get it to run on Google Colab. When I run the cell with the bp variable, it is just blank. No errors. Weird.
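
    For anyone hitting the same confusion, a minimal sketch of the install/import split; the random data is an illustrative stand-in, not from the video:

    # The package is installed as "umap-learn" (pip install umap-learn)
    # but imported as "umap".
    import numpy as np
    import umap

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50))  # stand-in for real high-dimensional data

    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
    embedding = reducer.fit_transform(X)  # shape (500, 2)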

  • @terryr9052
    @terryr9052 2 years ago +1

    I am curious if anyone knows whether it is possible to use UMAP (or other projection algorithms) in the other direction: from a low-dimensional projection -> a spot in high-dimensional space?
    An example would be picking a spot between clusters in the 0-9 digit example (either 2D or 3D) and seeing what the new resulting "number" looks like (in pixel space).

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +3

      What you are asking for is a generative model. But let's start from the bottom.
      I don't want to say that dimensionality reduction is easy, but let's put it like this: summarizing stuff (dim. reduction) is easier than inventing new stuff (going from low to high dimensions), because the problem you are asking about is a little more loosely defined, since all those new dimensions have to be filled *meaningfully*.
      Happily, there are methods that do these kinds of generations. In a nutshell, one trains them on lots and lots of data to generate the whole data sample (an image of handwritten digits) from summaries. Pointer -> you might want to look into (variational) autoencoders and generative adversarial networks.

    • @terryr9052
      @terryr9052 2 years ago +1

      @@AICoffeeBreak Thank you for the long response! I am moderately familiar with both GANs and VQ-VAEs but did not know if a generated sample could be chosen from the UMAP low-dimensional projected space.
      For example, the VAE takes images, compresses them to an embedded space, and then restores the originals. UMAP could take that embedded space and further reduce it to represent it in a 2D graph.
      So what I want is 2D representation -> embedding -> full reconstructed new sample. I was uncertain if that first step is permitted.

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +2

      @@terryr9052 I would say yes, this is possible, and I think you are on the right track, so I'll push further. :)
      With GANs this is minimally different, so I will focus on VAEs for now:
      *During training*, a VAE does exactly as you say: image (I) -> low-dim. embedding (E) -> image (I), hence the name AUTOencoder. What I think is relevant for you is that E can be 2-dimensional. The dimensionality of E is actually a hyperparameter and you can adjust it flexibly, like the rest of your architecture. Choosing such a low dimensionality of E might only mean that when you go from I -> E -> I, the whole process is lossy. I -> E (the summary, the encoder) is simple. But E -> I, the reconstruction or, in a sense, the re-invention of information (the decoder) in many dimensions, is complicated to achieve from only 2 dimensions. Therefore it is easier when the dimensionality of E is bigger (something like 128-ish in "usual" VAEs).
      In a nutshell, what I just described in the I -> E step is what any other dimensionality reduction algorithm does too (PCA, UMAP, t-SNE), but this time it's implemented by a VAE. The E -> I step is what you want, and with a VAE it comes for free, because what you need is the *testing step*.
      You have trained a VAE that can take any image, encode it (to 2 dims), and decode it. But now, with the trained model, you can just drop the I -> E step, position yourself somewhere in the E space (i.e. give it an E vector), and let the E -> I routine run.
      I do not know how far I should go, because I also have thoughts for the case where you really, really want I -> E to be forcibly the UMAP routine and not a VAE encoder; in that case, you would need to train only a decoder architecture. Or a GAN. Sorry, it gets a little too much to put into a comment. 😅
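
      A minimal sketch of the I -> E -> I idea from this reply, assuming PyTorch; a plain (non-variational) autoencoder is used for brevity, and the layer sizes and latent point are arbitrary choices, not from the thread:

      import torch
      import torch.nn as nn

      class AutoEncoder(nn.Module):
          def __init__(self, dim_in=784, dim_latent=2):
              super().__init__()
              # I -> E: the "summary" step, like any dimensionality reduction
              self.encoder = nn.Sequential(
                  nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_latent))
              # E -> I: the reconstruction step
              self.decoder = nn.Sequential(
                  nn.Linear(dim_latent, 128), nn.ReLU(),
                  nn.Linear(128, dim_in), nn.Sigmoid())

          def forward(self, x):
              return self.decoder(self.encoder(x))

      model = AutoEncoder()
      # ... train with a reconstruction loss (e.g. nn.MSELoss()) on flattened images ...

      # Testing step: skip I -> E, hand-pick a point in the 2-d E space
      # (e.g. between two digit clusters) and run only the decoder.
      z = torch.tensor([[0.5, -1.0]])  # hand-picked latent coordinates
      with torch.no_grad():
          new_image = model.decoder(z).reshape(28, 28)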

    • @terryr9052
      @terryr9052 2 years ago +2

      @@AICoffeeBreak Thanks again! I'm going to read this carefully and give it some thought.

  • @nogribin
    @nogribin 1 year ago +1

    wow.

  • @divergenny
    @divergenny 1 year ago

    Will there be a t-SNE video here?

  • @lisatrost7486
    @lisatrost7486 3 years ago +4

    Hopefully many friends trust this! I'm bringing my friend to buy her house!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      It looks like we have a strong Cold Mirror fanbase here. Ms. Coffee Bean is also a fan of hers, btw.

  • @quebono100
    @quebono100 2 years ago +1

    ValueError: cannot reshape array of size 47040000 into shape (60000,784)

    • @quebono100
      @quebono100 2 years ago +1

      What's the matter with this xD

    • @quebono100
      @quebono100 2 years ago +2

      OK, I solved this: I had 6k images instead of 60k.
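
      For other readers: a reshape only works when the array size equals the product of the target shape. A sketch with stand-in data, not the actual file from the comment:

      import numpy as np

      n_images, side = 60_000, 28
      raw = np.zeros(n_images * side * side, dtype=np.uint8)  # 47,040,000 values
      X = raw.reshape(n_images, side * side)  # works: 60000 * 784 matches

      # With only 6,000 images, the buffer holds 4,704,000 values, and
      # reshaping to (60000, 784) raises the same kind of ValueError.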

  • @pvlr1788
    @pvlr1788 2 years ago

    Is the Babyplots library still supported? It does not work for me in any of the envs I've tried.. :(

    • @DerPylz
      @DerPylz 2 years ago +4

      Hi! I'm the creator of babyplots. Yes, the library is still actively supported. If you're having issues with getting started, please join the babyplots discord server, which you'll find on our support page: bp.bleb.li/support or write an issue on one of the github repositories. I'll be sure to help you there.

  • @Skinishh
    @Skinishh 2 years ago

    How do you judge the performance of UMAP on your data? In PCA you can look at the explained variance, but what about UMAP?
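
    There is no single agreed-upon score, but one common sanity check (an assumption here, not something from the video) is how well local neighbourhoods are preserved, e.g. scikit-learn's trustworthiness:

    # trustworthiness is 1.0 when each point's nearest neighbours in the
    # embedding are also its nearest neighbours in the original space.
    from sklearn.datasets import load_digits
    from sklearn.manifold import trustworthiness
    import umap

    X, _ = load_digits(return_X_y=True)
    embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)
    print(trustworthiness(X, embedding, n_neighbors=5))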

  • @luck3949
    @luck3949 3 years ago

    You can't say that PCA "can be put in company with SVD". SVD is one of the available implementations of PCA. PCA means "a linear transformation that transforms data into a basis with the first component aligned with the direction of maximum variation, the second component aligned with the direction of maximum variation of the data projected onto the hyperplane orthogonal to the first component, etc." SVD is a matrix factorization method. It turns out that when you perform SVD you get PCA. But that doesn't mean SVD is a dimensionality reduction algorithm - SVD is a way to represent a matrix. It can be used for many different purposes (e.g. quadratic programming), not necessarily dimensionality reduction. Same for PCA: it can be performed using SVD, but other numerical methods exist as well.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +6

      You make some good observations, but we do not entirely agree; we think there are important differences between SVD and PCA. In any case, by "put into company" we did not mean to go into the specific details of the relationship between these algorithms. It was meant more like "if you think about PCA, you should think about matrix factorization like SVD or NMF". That is what we understand by "put into company", as we do not say "it is" or "it is absolutely and totally *equivalent* to".
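
      For the curious, the point from the comment above that PCA can be computed via SVD, as a small numpy sketch; the variable names and data are illustrative:

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 10))

      Xc = X - X.mean(axis=0)  # centre the data first
      U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

      components = Vt[:2]            # first two principal directions
      X_reduced = Xc @ components.T  # the PCA projection, via the SVD factors
      explained_var = S**2 / (len(X) - 1)  # variance along each direction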

  • @search_is_mouse
    @search_is_mouse 3 years ago

    Ward (bookmarking this to come back to)

  • @joelwillis2043
    @joelwillis2043 1 year ago

    I saw no proof of "best", so you failed to answer your own question.

  • @MrChristian331
    @MrChristian331 3 years ago

    that coffee bean looks like a "shit"