StatQuest: t-SNE, Clearly Explained

  • Published on 16 Jun 2024
  • t-SNE is a popular method for making an easy to read graph from a complex dataset, but not many people know how it works. Here's the inside scoop.
    Here’s how to create a t-SNE graph in R (this is copied from the help file for Rtsne)…
    library("Rtsne")
    iris_unique <- unique(iris) # Remove duplicates
    iris_matrix <- as.matrix(iris_unique[,1:4])
    set.seed(42) # Set a seed if you want reproducible results
    tsne_out <- Rtsne(iris_matrix) # Run TSNE
    # Show the objects in the 2D tsne representation
    plot(tsne_out$Y,col=iris_unique$Species)
    This StatQuest is based on the original t-SNE manuscript, and it's not super hard to read (especially if you understand the general idea of how it works): lvdmaaten.github.io/publicati...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    1:19 Overview of what t-SNE does
    2:24 Overview of how t-SNE works
    4:12 Step 1: Determine high-dimensional similarities
    9:26 Step 2: Determine low-dimensional similarities
    10:33 Step 3: Move points in low-d
    11:05 Why the t-distribution is used instead of the normal distribution
    Corrections:
    6:17 I should have said that the blue points have twice the density of the purple points.
    7:08 There should be a 0.05 in the denominator, not a 0.5.
    #statquest #tsne

Comments • 739

  • @statquest
    @statquest  4 years ago +64

    Corrections:
    6:17 I should have said that the blue points have twice the density of the purple points.
    7:08 There should be a 0.05 in the denominator, not a 0.5.
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @linweitao6470
      @linweitao6470 4 years ago +1

      Thanks very much for the informative lecture and it is really helpful. UMAP is more and more popular now, could you explain it and compare with tSNE as well? Thanks in advance.

    • @statquest
      @statquest  4 years ago +5

      @@linweitao6470 I should have a UMAP StatQuest ready in a few weeks. I'm working on it right now.

    • @linweitao6470
      @linweitao6470 4 years ago +1

      @@statquest Thanks again!

    • @CompBioQuest
      @CompBioQuest 4 years ago +2

      @@statquest UMAP is great, I dont know if it is more popular. There are more stringent reductions out there like ICA. I wonder the thoughts of Josh about it?

    • @statquest
      @statquest  4 years ago +2

      @@CompBioQuest I guess it largely depends on the field. Right now, genetics and molecular biology are going bonkers over UMAP. However, ICA is very interesting. Thanks to your question, I found this article which is fascinating: gael-varoquaux.info/science/ica_vs_pca.html

  • @abdulgadirhussein2244
    @abdulgadirhussein2244 4 years ago +75

    I am always blown away by how you make statistics & machine learning algorithms so simple to understand and how you graciously share your knowledge. Keep up the great work man, you are awesome!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much! :)

  • @user-hb2fj3hx2w
    @user-hb2fj3hx2w 3 years ago +16

    Whenever I find a statistics technique I have never seen in a scientific article, I always visit your channel. Thanks a lot!!

    • @statquest
      @statquest  3 years ago +1

      Happy to help! :)

  • @ramnarasimhan1499
    @ramnarasimhan1499 6 years ago +1

    Fantastic video. I really appreciate all the slides that you made to get the animation effect. It really helped. Possibly the best explanation of t-SNE around. Keep up the good work.

  • @jjlian1670
    @jjlian1670 4 years ago +12

    Josh is so far my favorite YouTuber that is able to explain complex stats concepts so smoothly.

    • @statquest
      @statquest  4 years ago +1

      Thank you so much! :)

  • @douglasaraujo9763
    @douglasaraujo9763 3 years ago +92

    As entertaining as watching a Walt t-SNE movie!

    • @statquest
      @statquest  3 years ago +13

      You made me laugh out loud! BAM! :)

    • @arenashawn772
      @arenashawn772 4 months ago

      Best stat-word-play of the year! 😂

  • @edridgedsouza1170
    @edridgedsouza1170 3 years ago +52

    "This is Josh Starmer, and you're watching Tisney Channel!"

    • @statquest
      @statquest  3 years ago +8

      Triple BAM! :)

  • @veronikaberezhnaia248
    @veronikaberezhnaia248 2 years ago +11

    I regret I can't put 1000 likes! I read about 20 articles about t-SNE, they are similar to one another, almost identical - and they don't get me closer to the point. But your video - I watched it 4 times (because the topic is hard, at least for me) with making some and drawing - but finally I understand how it works, up to the point that I can explain it to someone else. So many thanks to you!

    • @statquest
      @statquest  2 years ago

      HOORAY!!! TRIPLE BAM! I'm glad the video was helpful. BAM! :)

  • @gustavomorais4489
    @gustavomorais4489 3 years ago +13

    I never leave comments, but I really feel the need to thank you for being able to explain this in such a simple way

    • @statquest
      @statquest  3 years ago +1

      Thank you! :)

  • @gayathrikurada3315
    @gayathrikurada3315 4 years ago +5

    Josh.. Your explanation is always "simple and easy to understand" even for layman.You are simply "The life Saviour" !!!
    Thank you so much :)

    • @statquest
      @statquest  4 years ago +1

      Hooray! I'm glad my video was helpful. :)

  • @kass8036
    @kass8036 6 years ago +235

    I never knew machine learning could be as simple as... BAM

    • @thomasrad6296
      @thomasrad6296 3 years ago +1

      Thats like the most important lesson.

    • @namimiable
      @namimiable 3 years ago +2

      Double bam 💥

    • @kalyanben10
      @kalyanben10 3 years ago +3

      Just a random comment so that someone can say triple bam

    • @kass8036
      @kass8036 3 years ago +4

      Triple bam 💥

    • @birenpatel894
      @birenpatel894 3 years ago +3

      hurayyyy we have made it to the END !!!

  • @thedrunkprogrammer1474
    @thedrunkprogrammer1474 4 years ago +1

    I really can't appreciate you enough for your videos.
    Books and blogs only make sense after I watch your videos!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @srishtikumar5544
    @srishtikumar5544 4 years ago +1

    Excellently explained! I really like your simple, clear, concise explanation - those 3 factors make a world of difference. And, great animations.

    • @statquest
      @statquest  4 years ago

      Awesome, thank you!

  • @OnSightNoMore
    @OnSightNoMore 4 years ago +7

    It's impressive how you managed to explain the essential concepts of this chain of algorithms in such a clear way! I'm sharing this video with my beginner fellows, who normally flee as soon as I say words like nearest-neighbor or stochastic.
    Thank you very much!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

    • @willykitheka7618
      @willykitheka7618 2 years ago +2

      🤣🤣🤣🤣it's that terrifying?!? Barbara Oakley in her book, "a mind for numbers" called them zombies🤣🤣🤣

  • @ImmutableHash
    @ImmutableHash 6 years ago

    Awesome explanation, thank you so much! I read a few papers/books multiple times and barely have a clue, but with your vid I understand the concept just by watching it once!

  • @atakanekiz
    @atakanekiz 5 years ago +269

    Great explanations! Can you please do a video explaining UMAP and potentially how it compares to t-SNE? Thanks!

  • @nikhilgoparapu8183
    @nikhilgoparapu8183 3 years ago +2

    Very clearly explained!
    Loved the way you explained such a complicated concept so intuitively.
    Thank you.

    • @statquest
      @statquest  3 years ago

      Glad it was helpful!

  • @chauphamminh1121
    @chauphamminh1121 5 years ago

    You make a complex idea so simple and understandable! Great video. Thanks a lot

  • @octour
    @octour 5 years ago

    Thanks for such a clear explanation. You know, your channel already in the top list for me and very soon I'll watch all your videos..

  • @lilmoesk899
    @lilmoesk899 6 years ago

    Great as always. I've heard of t-SNE before, but this was my first real introduction to it. Definitely want to go look at some more resources now.

  • @snackbob100
    @snackbob100 4 years ago +4

    Josh, i literally love your videos, they are really helping me get through my ADV CS degree. I am going to buy one of your shirts, and wear it on campus as a thank you!

    • @statquest
      @statquest  4 years ago +1

      That would be awesome!!! Thank you very much! :)

  • @parvezrafi4098
    @parvezrafi4098 6 years ago

    Thanks a lot. I really struggled to understand the concept first time I came across it in a book. Your video helped a lot. Great job!

  • @DoanQuocHoan
    @DoanQuocHoan 3 years ago +3

    I was so confused about t-SNE until I watched this. It's clear and very easy to understand! Thank you! Like your BAM. :D

  • @bright1402
    @bright1402 6 years ago

    This is the best video for t-SNE that I have ever seen. Thanks a lot, man

  • @sagar_bro
    @sagar_bro 4 years ago +4

    I just love the way you start all your videos! Stat-Questtttttt :)

  • @abhaymathur9332
    @abhaymathur9332 5 years ago

    this is such an awesome explanation of tsne that i dont need to watch any other video or read any other website/book. I dont think there can be a better explanation. Superlike.

  • @shanthinagasubramanian2866
    @shanthinagasubramanian2866 2 years ago +1

    Very nice way of teaching ! ML concepts CLEARLY EXPLAINED and BAM adds lot of curiosity in the videos :) Thanks for your videos. And not to forget your songs are really nice :)

  • @jannelis2845
    @jannelis2845 5 years ago +2

    Very well explained ! Your video was recommended to us by our professors at ETH-Zürich.:)

  • @Ravi5ingh
    @Ravi5ingh 4 years ago +1

    It's rare to come across such a brilliant explanation.

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @RezaRob3
    @RezaRob3 4 years ago +3

    I'm writing this comment while having watched only half way into this video, which is pretty unusual for me!
    It is so clearly explained! I once glanced at the t-SNE paper and didn't understand it. If this is what it does then this is how things like this should be explained!
    Really, we need people explaining science like this! It's possible to read scientific papers, but what they fail to do is properly communicate the core idea to the reader so that the reader quickly grasps the big picture and the intent of the mathematical details without getting lost in the details.
    Frequently, even a missing definition can make reading papers much harder for non experts.

    • @statquest
      @statquest  4 years ago +1

      I'm glad you liked this video so much! :)

  • @scifimoviesinparts3837
    @scifimoviesinparts3837 3 years ago +1

    The Best tutorial and explanation for TSNE so far! It's of great help! Thanks a lot!

  • @redaaitouahmed8250
    @redaaitouahmed8250 4 years ago +2

    Super Mega BAM !! So great at what you do as always ... Tons of love sent your way ! Keep up the amazing work :D

    • @statquest
      @statquest  4 years ago +1

      Thanks so much!!

  • @sarangak.mahanta6168
    @sarangak.mahanta6168 2 years ago +1

    The only educational channel which brings a smile to my face.

  • @pierrefoidart5368
    @pierrefoidart5368 3 years ago

    Thanks a lot!! These videos are much more clear than any article!
    A video explaining UMAP (related to t-SNE) would be awesome !

    • @statquest
      @statquest  3 years ago

      I'm working on UMAP. For now, however, know that it is almost 100% the same as t-SNE. The differences are very subtle.

  • @benw4361
    @benw4361 5 years ago +1

    Love the vid. I was wondering how tsne works and you broke it down great and the explanation for the t distribution was short and to the point.

    • @statquest
      @statquest  5 years ago +1

      Thank you! :)

  • @saiakhil4751
    @saiakhil4751 3 years ago +3

    Why I couldn't stop bamming the like button??!! You're the best Josh!!

  • @camilaarcu2254
    @camilaarcu2254 3 years ago +1

    You are incredible, Josh Starmer!! I loved this

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @simonandrews5604
    @simonandrews5604 5 years ago

    Incredibly helpful and well presented. Thank you.

  • @markcoffer9290
    @markcoffer9290 5 years ago +1

    Well done! I would love to see videos on handling data outliers for regressions. Thanks!

  • @soumitachel3844
    @soumitachel3844 3 years ago +1

    Hello Josh, thank you for coming with such incredible videos. Data scientist’s life becomes easy.😬

    • @statquest
      @statquest  3 years ago

      Thank you! :)

    • @soumitachel3844
      @soumitachel3844 3 years ago

      StatQuest with Josh Starmer Hi a request to do a tutorial of UMAP.

  • @NirajKumar-hq2rj
    @NirajKumar-hq2rj 6 years ago

    excellent explanation , this intuition helps to follow maths behind t-SNE

  • @goeCK
    @goeCK 5 years ago +2

    Came here for understanding the t-SNE plots used in single cell transcriptomics - which I finally did, thanks! Overall, you helped me out already plenty of times!
    To display cells during cell fate transition/acquisition, e.g. at different time points during neurodevelopment, pseudo-temporal ordering is often used.
    Since scRNA seq is becoming more and more popular, this might be a good next topic

    • @erazael
      @erazael 5 years ago

      Same here, and I did not expect to understand so fast and clearly!

  • @abarnaabalakrishnan1862
    @abarnaabalakrishnan1862 6 years ago

    VERY CLEAR EXPLANATIONS :) THANK YOU FOR ALL YOUR VIDEOS

  • @imamalva5603
    @imamalva5603 3 years ago +1

    you are the hero, keep explaining complex thing into simple. thankss

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @UxJoy
    @UxJoy 4 years ago +1

    Dude this is super clear. Love the content! BAM

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @nanopore-sequence
    @nanopore-sequence 4 years ago +5

    I am a student in Japan.
    I'm not good at English, but it was very easy to understand and I learned a lot:)

  • @vishnumuralidharan9858
    @vishnumuralidharan9858 1 year ago +2

    Hi Josh, I can't thank you enough for how much I have benefitted from your videos even though I do data science as part of my day job. Thank you so much for sharing your knowledge!
    One request for a video: could you do a video of when to use which methods / models in a typical data science problem? Much appreciated.

    • @statquest
      @statquest  1 year ago +1

      That's a great idea.

  • @p.b.3697
    @p.b.3697 4 years ago +1

    Thank you very much Josh . You made it easier to understand.

    • @statquest
      @statquest  4 years ago

      Hooray! I'm glad the video was helpful! :)

  • @axeleriksson8978
    @axeleriksson8978 6 years ago +46

    Hey, love your videos!
    Just a typo but it should be 0.05 on the values to the right at 07:19. Confused me for a second so might clear things up for others.
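The step this typo touches is the scaling of the similarity scores: each unscaled score is divided by the sum of all the scores so that they add up to 1. A minimal sketch of that arithmetic follows; the numbers 0.24 and 0.05 and the function name `scale_scores` are illustrative assumptions, not values confirmed from the video.

```python
def scale_scores(unscaled):
    """Divide each unscaled similarity score by the sum of all scores."""
    total = sum(unscaled)
    return [score / total for score in unscaled]

# Hypothetical unscaled scores for two neighbors of one point of interest.
scaled = scale_scores([0.24, 0.05])

# Doubling every unscaled score (a cluster twice as dense) gives the same
# scaled scores, because the factor of 2 cancels in the division.
doubled = scale_scores([0.48, 0.10])

print(scaled)
print(doubled)
```

Because the common factor cancels, a tight cluster and a spread-out cluster can end up with the same scaled scores, which is the point of the scaling step.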

  • @iamsiddhantsahu
    @iamsiddhantsahu 6 years ago

    Nice explanation of t-SNE for beginners.

  • @somethingandapie
    @somethingandapie 5 years ago +1

    Subscribed because that intro gave me life!

    • @statquest
      @statquest  5 years ago

      Ha!!! Thanks! :)

  • @leixiao169
    @leixiao169 4 years ago

    your explanation is very very good! thanks!!!

  • @thoniageo
    @thoniageo 3 years ago +1

    i am a huge fan of this channel! greetings from brazil ^^

    • @statquest
      @statquest  3 years ago

      Muito obrigado! :)

  • @veeek8
    @veeek8 1 year ago +1

    Brilliant explanation, this has been bugging me all day, thank you!!

    • @statquest
      @statquest  1 year ago

      Glad it helped!

  • @Tony-Man
    @Tony-Man 2 months ago

    Hi Josh, quality content! This channel continuously helps me to understand the idea behind so that the dry textbook explanations actually make sense. I still have a question. When you calculate the unscaled similarity score, how exactly do you determine the width of your Gaussian? I get that in the example we already know the clusters. If I only want to visualize the data without having pre-defined clusters, what happens then?

    • @statquest
      @statquest  2 months ago

      I talk more about the details of t-SNE and how it works in my videos on UMAP: th-cam.com/video/eN0wFzBA4Sc/w-d-xo.html and th-cam.com/video/jth4kEvJ3P8/w-d-xo.html

  • @HR-yd5ib
    @HR-yd5ib 6 years ago +19

    Excellent video! Perhaps you could add another video where you go through the actual algorithm and how the moves are actually computed.

  • @alvarovs89
    @alvarovs89 1 year ago +1

    Just heard about t-SNE and I did not quite understand how it works so I crossed my fingers hoping that Josh did a video on this and of course he did!! haha
    I have my popcorn ready to enjoy this video :)

  • @alexeilazarev9576
    @alexeilazarev9576 5 years ago +1

    Amazing explanation! Thank you!

  • @precisionimmunologyincubat2315
    @precisionimmunologyincubat2315 4 years ago +2

    Thank you so much! Right now everyone in our department (Systems Genetics at NYU Langone) is using UMAP. There aren't many great videos about it - it would be awesome if you could help us understand what all the hype is about!

    • @statquest
      @statquest  4 years ago +2

      UMAP is on the to-do list. I hope to get to it in the spring.

  • @deepika3389
    @deepika3389 3 years ago +1

    Kudos, I understood so effortlessly....tripple BAM!!!

  • @hulaalol
    @hulaalol 3 years ago +1

    thank you so much for this nice explanation. will help me a lot in my exams

    • @statquest
      @statquest  3 years ago

      Glad to hear that!

  • @mic9657
    @mic9657 1 year ago +1

    Amazing work! perfectly explained!!!

  • @reedayoungblood
    @reedayoungblood 3 years ago +2

    Great video - thank you! One small insertion that I think would improve it: at ~2:07, right after showing what projecting on to the X or Y axis would look like, show one more example of projecting onto an arbitrary line to try to retain as much variance as possible (basically PCA). I think this could be done in 15-20 seconds, and would be helpful in comparing t-SNE to one of its most popular alternatives, which is helpful in deciding *when* to use an algorithm - one of the hardest things for beginners like myself.

    • @statquest
      @statquest  3 years ago

      Thanks for the tip!

  • @carlosalfonso5829
    @carlosalfonso5829 6 years ago +1

    OH God, this is a great explanation, as Radel mention below, it would be nice to have an extended video of the algorithm as the one from PCA!!

    • @statquest
      @statquest  6 years ago

      Thank you! Yes, one day I'll break the actual equations down and do "step-by-step" explanation of t-SNE.

    • @niteshturaga
      @niteshturaga 6 years ago

      Looking forward to this.

  • @kpuano
    @kpuano 6 years ago

    Very clear explanations, thanks a lot!

  • @abcdefghi2650
    @abcdefghi2650 2 years ago +1

    Great videos! Great channel! Big thumbs UP!

  • @daivazian
    @daivazian 5 years ago +1

    Fantastic explanation and comments. Thanks so much!

    • @statquest
      @statquest  5 years ago

      Thank you!! I'm glad you like the video. :)

  • @Elmirgtr
    @Elmirgtr 5 years ago +6

    Your speak like Kevin from The Office. Great explanation, thanks a lot:)

  • @Chinukapoor
    @Chinukapoor 5 years ago

    Great Video. Thanks StatQuest! One question, in your computation of the similarity scores, you have taken a normal distribution etc. But would be correct if the computation is done through Euclidean distance or some other distance formulae?

    • @autodidactengineer5073
      @autodidactengineer5073 4 years ago

      The distance between any two vectors can be calculated by taking the length (magnitude, or Euclidean norm) of the difference between the two vectors. This is the same as the distance formula in R^2. For example, take the two points (1, 3) and (5, 2). The difference along the x-axis is -4 and along the y-axis is 1. The distance is the square root of the sum of squares: sqrt((-4)^2 + 1^2) = sqrt(17). This generalizes to R^n by defining two R^n vectors a and b: distance = ||a - b|| = ||b - a|| = sqrt( (a_1 - b_1)^2 + … + (a_n - b_n)^2 )
      Note that when these distances are plotted on a normal distribution, no value can fall below the mean, since a distance is the absolute value of the difference between two points. As a result, all points land to the right of the mean, and points from different clusters that are not close to each other can land close to each other on the number line if their absolute distances are the same. A signed distance can be computed by taking a - b or b - a without squaring the differences or taking the square root.
      The minor difference of getting a negative or positive signed “distance” doesn’t change how the distance is used as part of t-SNE. This is because plotting the points on a number line is ultimately a means of getting the likelihood of each data point when a fixed point is at the center. The normal distribution is symmetrical to the left and right of its mean, so the y-axis (likelihood) value is the same regardless of whether a point is -1 or 1 units away from the mean.
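The worked example in this reply can be checked with a short sketch. The function names `euclidean_distance` and `gaussian_density` are made up for illustration, and the curve is a standard normal with mean 0 and width 1, chosen only to show the symmetry argument.

```python
import math

def euclidean_distance(a, b):
    """Length of the difference between two points in R^n."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_density(x, mean=0.0, sigma=1.0):
    """Height of a normal curve at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# The two points from the comment: (1, 3) and (5, 2).
d = euclidean_distance((1, 3), (5, 2))
print(d)  # sqrt(17), about 4.123

# The curve is symmetric, so the sign of the offset does not matter.
left = gaussian_density(-1.0)
right = gaussian_density(1.0)
print(left == right)  # True
```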

  • @SomeOfOthers
    @SomeOfOthers 5 years ago

    Awesome! You've done a lot of videos on dimensionality reduction, and I'm wondering if there is any criterion for which dimensionalty reduction algorithm to use?

    • @TheBjjninja
      @TheBjjninja 5 years ago

      My initial guess would be performance differences between each of them on huge amount of data. tSNE is not super efficient. Takes a while to run. my theory would be PCA is faster.

  • @Underscore_1234
    @Underscore_1234 3 years ago

    Super clear. Does the small move correspond to some learning rate multiplying the gradient of some distance between the expected distance matrix and the one we have?

    • @statquest
      @statquest  3 years ago

      I believe so.

  • @cofud90
    @cofud90 5 years ago +1

    A great one! Easy to follow easy to understand ❤️ i love the BAM ahahahahah

    • @statquest
      @statquest  5 years ago

      Thank you!!!! :)

  • @teresitaeyzaguirre4741
    @teresitaeyzaguirre4741 1 year ago

    hey Josh! great video as always. Is it necessary to normalize or scale the data before applying this algorithm?

    • @statquest
      @statquest  1 year ago

      I'm not sure. In theory, no, but in practice, PCA is usually used as a first pass to remove noise, and PCA requires things to be on the same scale.
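To make "on the same scale" concrete, here is a minimal sketch of z-score standardization, one common way to rescale columns before PCA. The function name `standardize_columns` and the toy data are assumptions for illustration, and the sketch assumes no column is constant (a constant column would have a standard deviation of 0).

```python
import math

def standardize_columns(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Toy data: the second column is on a much larger scale than the first.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
standardized = standardize_columns(data)
print(standardized)
```

After rescaling, both columns contribute comparably to distances between rows. Whether this is appropriate before t-SNE depends on the data.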

  • @rafael.feitosa
    @rafael.feitosa 5 years ago

    Great explanation! Congratulations!

  • @drvivekverma
    @drvivekverma 6 years ago

    Hi John! Your videos are amazing! Could you please explain SPADE and SOM algorithms? Thanks!

  • @prateekyadav7679
    @prateekyadav7679 3 years ago

    I never thought I'd not understand a statquest video! :(

    • @statquest
      @statquest  3 years ago

      Bummer. What time point was confusing?

  • @petersu4869
    @petersu4869 3 years ago +2

    "Bam, I made that terminology up" :D :D , great vid, keep up the good work.

    • @statquest
      @statquest  3 years ago

      Thanks! 😁

  • @chaitanyakulkarni243
    @chaitanyakulkarni243 2 years ago +2

    Wish I could *Triple Bam* like this video! Such a simple explanation. Thanks a lot Josh :-)

    • @statquest
      @statquest  2 years ago +1

      Glad you liked it!

  • @MariyaMardamshina
    @MariyaMardamshina 4 years ago

    Thanks a lot for your videos! The explanation is simple and clear. I was wondering whether you could do a similar video for Topological Data Analysis? Thank you :)

  • @janiobachmann5029
    @janiobachmann5029 6 years ago

    Thank you Joshua for this amazing video! I just want to make sure I understand, so basically t-SNE determines the clusters in the high dimensional data then takes it "randomly" to the lower dimensional data and it tries to follow the patterns made in the high dimensional data. The only thing I was a bit confused about at the end is what the algorithm does at each step when it moves a point closer to the other clusters (where you show the two matrices). The matrix on the left was all mixed up and the one on the right was organized due to the scaled scores we computed previously. So how does the matrix on the left (lower dimension (1-D)) learn from the matrix on the right (higher dimension (2-D))? I hope you understood me but anyways Joshua best video on t-SNE in youtube. Thanks for sharing!

    • @janiobachmann5029
      @janiobachmann5029 6 years ago

      Thanks Joshua! Now it makes perfect sense to me thanks for giving the explanation. I really find this algorithm useful for finding clusters and I just find fascinating how it determines groups of clusters. Again, thanks for your quick response to my question!

  • @kalkikukreja5193
    @kalkikukreja5193 6 years ago

    Thanks for the videos Joshua. Could also explain - Manifold learning and Markov models in your videos? Thanks

  • @zhenyishen5102
    @zhenyishen5102 6 years ago

    Great video! This video helped me a lot

  • @rajarajeshwaripremkumar2842
    @rajarajeshwaripremkumar2842 3 years ago

    Awesome explanation. Thank you, sir. Quick question - if two clusters are placed close to each other, does that mean they are more similar to each other than to a cluster placed far away from them?

    • @statquest
      @statquest  3 years ago

      That's the idea.

  • @yuqima1194
    @yuqima1194 4 years ago

    Hi Josh! Thank you for this video! It makes so much more sense now. But I'm still confused why the normal distribution was used in the 2D graph, but a t-distribution for the line? How do you know when to use which?

    • @statquest
      @statquest  4 years ago

      That's just the way it is and it seems to work.

  • @RajeshSharma-bd5zo
    @RajeshSharma-bd5zo 3 years ago +1

    One word reaction after watching this video --> AWESOME!!

    • @statquest
      @statquest  3 years ago

      Thank you so much 😀!

  • @vikramreddy3699
    @vikramreddy3699 3 years ago

    Thank you Josh . I love the way you present concepts with simple examples.
    Could you please explain how you decided the red dot directions to the left, where as the orange on right side @5:30 ?

    • @statquest
      @statquest  3 years ago

      It doesn't matter what side of the curve the points are on, since the distance from the y-axis values on the curve will be the same (normal curves are symmetrical). However, in order for the points to be easily seen, I spread them out on different sides rather than piling them all up on top of each other.

    • @vikramreddy3699
      @vikramreddy3699 3 years ago +1

      @@statquest Thank you again

  • @BusinessScience
    @BusinessScience 5 years ago +2

    Hey, love your videos! We are actually using it to help explain key concepts in our application-focused courses. I'd love to see UMAP (similar to t-SNE), which is a bit more scalable.

    • @statquest
      @statquest  5 years ago +3

      Thank you so much! It's on the to-do list. :)

    • @BusinessScience
      @BusinessScience 5 years ago +1

      @@statquest Awesome! I'm using your content in my courses - Students love it. PCA, K-Means, & t-SNE. Will be using your ML videos as well. Your explanations are the best!

  • @pabloruiz577
    @pabloruiz577 5 years ago +1

    Hi @StatQuest with Josh Starmer, great video!
    The thing I am missing is what is happening in each of these steps to move each point. What are the real 'attract' and 'repel' values, and how are they used to bring the similarity matrices closer at each of these steps?

    • @statquest
      @statquest  5 years ago +1

      This is a good question. The actual math is a little too messy to put in this comment, however, the idea is that the matrices are made similar using Gradient Descent, and that's where the attractions and repulsions come in. Here's a quote from the original paper (the link to the paper comes after the quote):
      Physically, the gradient [ minimized by gradient descent ] may be interpreted as the resultant force created by a set of springs between the [low dimensional point A] y_i and all other [low dimensional points] y_j. All springs exert a force along the direction (y_i − y_j). The spring between y_i and y_j repels or attracts the map points depending on whether the distance between the two in the map is too small or too large to represent the similarities between the two high-dimensional datapoints. The force exerted by the spring between y_i and y_j is proportional to its length, and also proportional to its stiffness, which is the mismatch (p_{j|i} − q_{j|i} + p_{i|j} − q_{i|j}) between the pairwise similarities of the data points and the map points.
      Here's the link to the paper: www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
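The spring picture in the quote can be sketched for a 1-D map. This is only an illustration of the quoted description (a force along y_i − y_j, scaled by a stiffness), not the full t-SNE gradient. The stiffness values below are hypothetical mismatches picked by hand, and the names `spring_gradient` and `step` are made up for this sketch.

```python
def spring_gradient(i, y, stiffness):
    """Sum of spring forces on map point i, each along (y[i] - y[j])."""
    return 2.0 * sum(
        stiffness[i][j] * (y[i] - y[j]) for j in range(len(y)) if j != i
    )

def step(i, y, stiffness, learning_rate=0.1):
    """One gradient descent update for the position of map point i."""
    return y[i] - learning_rate * spring_gradient(i, y, stiffness)

# Positive stiffness: the pair is too far apart in the map, so it attracts.
attracted = step(0, [0.0, 4.0], [[0.0, 0.3], [0.3, 0.0]])

# Negative stiffness: the pair is too close in the map, so it repels.
repelled = step(0, [0.0, 1.0], [[0.0, -0.2], [-0.2, 0.0]])

print(attracted)  # greater than 0.0: point 0 moved toward the point at 4.0
print(repelled)   # less than 0.0: point 0 moved away from the point at 1.0
```

Each update is a small move, the learning rate times the gradient. Repeated over all points and many iterations, this is what pulls similar points together and pushes dissimilar ones apart.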

  • @Bedivine777angelprayer
    @Bedivine777angelprayer 1 year ago +1

    Thanks really great videos understood concepts so well

    • @statquest
      @statquest  1 year ago

      Glad it was helpful!

  • @sathias7667
    @sathias7667 5 years ago +1

    John Starmer,
    This video impressed a lot.
    I would like to know how to ensure whether all similar points are grouped perfectly or any spill out by mistake.
    Kindly suggest

    • @statquest
      @statquest  5 years ago

      You ask a good question and I'm not sure I know the answer. I do know that t-SNE will often run for as many iterations as you let it run, meaning that it does not converge on an optimal clustering. So, in that case, you could say that the clustering is never perfect.

  • @techwellness6142
    @techwellness6142 4 years ago +1

    Excellent Explaination. Tripple BAM !!!

  • @tuongminhquoc
    @tuongminhquoc 1 year ago +1

    Thank you. I am not sure if you remember me from the PCA video. I have a job now. My job do not have high salary, but I could now support you by donating and thank you now. 😊

    • @statquest
      @statquest  1 year ago +1

      WOW! Thank you so much. And congratulations on getting a job!!! HOORAY!!! TRIPLE BAM! :)

    • @tuongminhquoc
      @tuongminhquoc 1 year ago +1

      @@statquest Keep doing great work sir! Also, it would be great if you could make a video about the comparation between clustering methods. 😁

    • @statquest
      @statquest  1 year ago +1

      @@tuongminhquoc Thanks and I'll keep that in mind!

  • @MathPhysicsFunwithGus
    @MathPhysicsFunwithGus 9 months ago +1

    This is a great explanation thank you!

    • @statquest
      @statquest  9 months ago +1

      Glad you enjoyed it!

  • @Thorstlasse
    @Thorstlasse 3 years ago

    Thanks for the video - awesome work. I don't get where you get the width of the normal distribution from? How do you know that it is 1 for the tight cluster and 2 for the wide cluster?

    • @statquest
      @statquest  3 years ago +1

      You can estimate that from the data.
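One way the width can be estimated from the data, following the approach described in the original t-SNE paper, is to search for the width at which a point's neighbor distribution has a user-chosen perplexity. A minimal sketch is below. The function names, the search bounds and the toy squared distances are assumptions for illustration, not any library's API.

```python
import math

def conditional_probabilities(sq_distances, sigma):
    """Scaled Gaussian similarities from one point to its neighbors."""
    nearest = min(sq_distances)  # subtracting this avoids underflow to all zeros
    weights = [math.exp(-(d - nearest) / (2.0 * sigma ** 2)) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """2 to the power of the Shannon entropy (in bits) of the distribution."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def find_sigma(sq_distances, target, lo=1e-3, hi=1e3, iterations=100):
    """Bisection on a log scale; perplexity grows as sigma grows."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probabilities(sq_distances, mid)) > target:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

# Hypothetical squared distances from one point to five others.
tight = [0.1, 0.2, 0.3, 25.0, 36.0]       # point sits in a dense cluster
spread = [1.0, 2.0, 3.0, 250.0, 360.0]    # same layout, ten times sparser

sigma_tight = find_sigma(tight, target=3.0)
sigma_spread = find_sigma(spread, target=3.0)
print(sigma_tight, sigma_spread)
```

A point in a sparse region ends up with a wider curve than a point in a dense region, so no pre-defined clusters are needed.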

  • @ZeusHugo
    @ZeusHugo 3 years ago

    Thank you very much for these (and all the other) videos, they are very helpful to understand the basics. Regarding t-SNE, the only thing I do not fully understand is the use of a t-distribution to model the scores of the points in the low-dimensional space. Why "without it the clusters would all clump up in the middle and be harder to see"? Is it due to the fact that the bell of the normal distribution is higher in the middle?

    • @statquest
      @statquest  3 years ago +1

      It has to do with the t-distribution having fatter tails - more wiggle room for variation.
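The "fatter tails" can be seen by comparing unnormalized similarity curves at a few distances. The sketch assumes a Gaussian curve of the form exp(-d^2) and a t-distribution with one degree of freedom, 1 / (1 + d^2), which is the low-dimensional curve used in the t-SNE paper. The function names are made up for illustration.

```python
import math

def gaussian_kernel(d):
    """Unnormalized Gaussian similarity at distance d."""
    return math.exp(-d * d)

def student_t_kernel(d):
    """Unnormalized t-distribution (1 degree of freedom) similarity at distance d."""
    return 1.0 / (1.0 + d * d)

for d in (0.5, 1.0, 3.0, 5.0):
    print(d, gaussian_kernel(d), student_t_kernel(d))
```

At larger distances the Gaussian value drops to almost nothing while the t value falls off slowly. Moderately dissimilar points can therefore sit far apart in the map and still match their small similarity, which leaves room between clusters.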

  • @txhays
    @txhays 4 years ago

    Thanks, love the channel. Question: since the t-SNE involves randomly plotting data on a number line, will identical plots be created if the function is repeated? i.e. could you get different looking t-SNE plots on repeated analyses?

    • @statquest
      @statquest  4 years ago +1

      Unless you set the seed for the random number generator, you will get a different graph every single time.
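The effect of the seed can be shown with just the random starting positions. The function `random_initial_layout` below is a hypothetical stand-in for the random placement on the number line, not part of any t-SNE library.

```python
import random

def random_initial_layout(n_points, seed=None):
    """Place n_points at small random positions on a number line."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1e-2) for _ in range(n_points)]

same_a = random_initial_layout(5, seed=42)
same_b = random_initial_layout(5, seed=42)
other = random_initial_layout(5, seed=7)

print(same_a == same_b)  # True: same seed, same starting positions
print(same_a == other)   # False: a different seed gives a different start
```

In the R example in the video description, `set.seed(42)` plays this role before `Rtsne()` is called.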

  • @Kmysiak1
    @Kmysiak1 4 years ago +1

    This explanation almost makes t-SNE sound like a clustering technique not a reduction technique..... That said, this was by far the best explanation I've heard to date.

    • @statquest
      @statquest  4 years ago

      That's a good observation. In many ways t-SNE is a hybrid method that reduces dimensions by clustering.

    • @Kmysiak1
      @Kmysiak1 4 years ago +1

      @@statquest Now if you can explain how to interpret a t-SNE plot. This would help immensely as it's virtually impossible to determine the correct perplexity number without understanding how to interpret the plot. This seems like one of those "blackbox" methods which we just trust. Keep up the great work!

  • @henkhbit5748
    @henkhbit5748 3 years ago

    Great, never heard of t-sne. Is it only for visualisation purposes? In PCA points are clustered based on the correlations. Do you have also an analogy with the loadings of the features like in PCA?

    • @statquest
      @statquest  3 years ago +1

      It's only for visualization. It does not have the equivalent of loading scores in PCA.

  • @ShubhamMajmudar
    @ShubhamMajmudar 5 years ago +1

    Made it look real simple.. thanks!

    • @statquest
      @statquest  5 years ago +1

      Hooray! :)

  • @aagupta1993
    @aagupta1993 5 years ago +1

    Hi, your videos are really nice. If possible can you upload videos with some math formulas related to this topic and maybe others as well and some sample problems and solutions to them, that would really help us understand even better. Thank you!

    • @statquest
      @statquest  5 years ago

      No problem. I have plans to make a video about the math behind t-SNE on my To-Do list.