Herman Kamper
Joined 8 Apr 2020
Can we solve inequality in South Africa? Interview with Dieter von Fintel (TGIF 2024)
Views: 342
Videos
Reinforcement learning from human feedback (NLP817 12.3)
539 views · 5 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ PPO theory: th-cam.com/video/3uvnoVjM8nY/w-d-xo.html Proximal policy optimization explained: th-cam.com/video/HrapVFNBN64/w-d-xo.html
The difference between GPT and ChatGPT (NLP817 12.2)
201 views · 5 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/
Large language model training and inference (NLP817 12.1)
275 views · 5 months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's LLM video: th-cam.com/video/zjkBMFhNj_g/w-d-xo.html Byte pair encoding: th-cam.com/video/20xtCxAAkFw/w-d-xo.html Transformers: th-cam.com/play/PLmZlBIcArwhOPR2s-FIR7WoqNaBML233s.html
Extensions of RNNs (NLP817 9.7)
127 views · 5 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's blog: karpathy.github.io/2015/05/21/rnn-effectiveness/
Solutions to exploding and vanishing gradients (in RNNs) (NLP817 9.6)
107 views · 5 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Gradient descent: th-cam.com/video/BlnLoqn3ZBo/w-d-xo.html Colah's blog: colah.github.io/posts/2015-08-Understanding-LSTMs/
Vanishing and exploding gradients in RNNs (NLP817 9.5)
154 views · 5 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: th-cam.com/video/xOx2SS6TXHQ/w-d-xo.html
Backpropagation through time (NLP817 9.4)
348 views · 5 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: th-cam.com/video/xOx2SS6TXHQ/w-d-xo.html Computational graphs for neural networks: th-cam.com/video/fBSm5ElvJEg/w-d-xo.html Forks in neural networks: th-cam.com/video/6mmEw738MQo/w-d-xo.html
RNN definition and computational graph (NLP817 9.3)
209 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
RNN language model loss function (NLP817 9.2)
196 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
From feedforward to recurrent neural networks (NLP817 9.1)
443 views · 6 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
Embedding layers in neural networks
464 views · 7 months ago
Full video list and slides: www.kamperh.com/data414/ Introduction to neural networks playlist: th-cam.com/play/PLmZlBIcArwhMHnIrNu70mlvZOwe6MqWYn.html Word embeddings playlist: th-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html
Git workflow extras (including merge conflicts)
127 views · 7 months ago
Full playlist: th-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
A Git workflow
360 views · 8 months ago
Full playlist: th-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
Evaluating word embeddings (NLP817 7.12)
349 views · 8 months ago
Full playlist: th-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html Lecture notes: www.kamperh.com/nlp817/notes/07_word_embeddings_notes.pdf Course website: www.kamperh.com/nlp817/
Skip-gram with negative sampling (NLP817 7.10)
1K views · 8 months ago
Continuous bag-of-words (CBOW) (NLP817 7.9)
256 views · 8 months ago
Skip-gram as a neural network (NLP817 7.7)
683 views · 8 months ago
Skip-gram model structure (NLP817 7.5)
273 views · 8 months ago
What can large spoken language models tell us about speech? (IndabaX South Africa 2023)
151 views · 10 months ago
Hidden Markov models in practice (NLP817 5.13)
257 views · 1 year ago
Why expectation maximisation works (NLP817 5.11)
172 views · 1 year ago
At 5:55 when calculating negative log likelihood, is it base-10 log or natural log?
I'm not sure, but at 12:12 I think using an FFN to transform would be very cool. However, I don't know whether it's better to put the net after the red vector and then concat, or to concat first and then apply the FFN.
11:55 At this moment, when you finished the sentence, I realized I am immensely enjoying math after years.
One important question: at 2:51 you say ŷ₁ is a vector of probabilities. But isn't that just the word embedding that the model has predicted, which is then projected to vocab size and softmaxed to get the output word? Or am I understanding it wrong?
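For context on the question above, here is the standard RNN language model output step (generic notation, not necessarily the lecture's): the hidden state is projected to vocabulary size and passed through a softmax, so ŷ₁ is already a probability distribution over the vocabulary rather than a predicted word embedding:

\[
\hat{\mathbf{y}}_1 = \operatorname{softmax}(\mathbf{W}\mathbf{h}_1 + \mathbf{b}) \in \mathbb{R}^{|V|},
\qquad \sum_{k=1}^{|V|} \hat{y}_{1,k} = 1 .
\]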
Wow. I am attending a god-awful university for my bachelor's, and subjects are explained in the most superficial way possible. Don't get me wrong, our professors are very kind and welcoming, but the environment is not, especially with other students who are not interested in these subjects. Watching these, I realize how badly I wanted to be in your classes xD
Thank you! The flow and explanation in this series are concise, informative, and on-point!
Super useful, thanks!
SO happy this helps! :)
Got to learn many things about RNNs. Thanks!
Thanks a ton for the kind message! :)
😘😘😘😘😘 Thank YOu SOo soo muchhhh
This is incredible, thank you for providing such high quality resources online for free. My university teacher could not do in 1 semester what you taught me in 1 video.
Thanks so much for the encouragement!!
Why are we writing that k value in the vector at 4:57, as our prediction will already contain some value at that point?
Great explanation. The original images are so misleading.
Nice explanation, sir.
Nice video. The Wikipedia link proved useful for my econometrics 101101 class.
Tried but failed again lol... thanks a lot.
Your explanation just keeps getting better and better into the video, incredible job!
Thank you so much, this is the best explanation I have come across. I went through 10+ videos from popular instructors and institutes, but this was clear and thorough.
great explanation
Sir, which book is this?
This is amazing!!!
17:35 What confuses me about this is: can we do the comparison to figure out if the same word is in the signal, or if both signals came from the same speaker? (IIRC the algorithm used for this is called DTW, which is very similar to the edit distance algorithm.)
Cool video, thanks.
Simple to understand. Thank you for writing the intermediate steps out. It really helps!
Great video, but I still don't understand why you would have to use sine and cosine. You can just adjust the frequency of sin or cos and then get unique encodings and still maintain relative distance relationships between tokens. Why bother with sine and cosine? I know it has to do with the linear transform, but I don't see why you can't perform a linear transform with cos or sin only.
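A sketch of the standard argument the comment above is asking about (using the usual Transformer sinusoidal definitions, which may differ in notation from the lecture): pairing sin and cos at the same frequency ω makes a fixed offset k act as a rotation, i.e. a linear map that does not depend on the position p:

\[
\begin{pmatrix} \sin\big((p+k)\omega\big) \\ \cos\big((p+k)\omega\big) \end{pmatrix}
=
\begin{pmatrix} \cos(k\omega) & \sin(k\omega) \\ -\sin(k\omega) & \cos(k\omega) \end{pmatrix}
\begin{pmatrix} \sin(p\omega) \\ \cos(p\omega) \end{pmatrix} .
\]

With sin only, the expansion sin((p+k)ω) = sin(pω)cos(kω) + cos(pω)sin(kω) still needs the cos(pω) term, so no single-channel linear map independent of p exists.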
Hey, I love your explanations and I use your TH-cam channel for machine learning almost exclusively. Could you please make a playlist to explain SVMs?
So happy this helps! :) I should negotiate with my boss so I can make videos full time...
But how do I use polynomial regression when I have multiple points?
Bro this is like the best video I have seen on this topic
Thanks so much! Really appreciate! :)
Please do one on hierarchical clustering
Your lectures are amazing! Could you do some videos on hierarchical clustering?
Thanks so much for the encouraging message! :) I wish I had time to just make lecture videos... But hierarchical clustering is high on the list!
@@kamperh Thankss!!
K, Q, V is one of those concepts that will go down in the history of computer science as one of the most unfortunate metaphors ever.
Great
Great lecture, Prof. May I ask what d=6 and d=7 are here? Is it the embedding dimension? If so, for d=6 we should be having 3 pairs of sine-cosine waves, right?
Hey Arnab! Sorry if this was a bit confusing. No, d=6 is the 6th dimension of the positional embedding. The embedding dimensionality itself will typically be the dimensionality of the word embeddings. If you jump to 14:00-ish, you will see the complete positional embedding. The earlier plots would be one dimension of this (when I showed d=6, that would be the 6th dimension within this embedding).
Thanks so much for the prompt clarification Prof!
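Following up on the thread above, a minimal NumPy sketch (my own illustration, using the standard Transformer sinusoidal formula, which may not exactly match the lecture's plots) showing that the full positional embedding is a matrix with one row per position, and that a single "d" picks out one column of it:

```python
import numpy as np

def sinusoidal_positional_embeddings(num_positions, embed_dim):
    """Standard sinusoidal positional embeddings: one row per position."""
    positions = np.arange(num_positions)[:, None]   # shape (num_positions, 1)
    dims = np.arange(embed_dim)[None, :]            # shape (1, embed_dim)
    # Each sin/cos pair shares a frequency, hence the dims // 2.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / embed_dim)
    angles = positions * angle_rates                # shape (num_positions, embed_dim)
    pe = np.zeros((num_positions, embed_dim))
    pe[:, 0::2] = np.sin(angles[:, 0::2])           # even dimensions use sin
    pe[:, 1::2] = np.cos(angles[:, 1::2])           # odd dimensions use cos
    return pe

pe = sinusoidal_positional_embeddings(num_positions=50, embed_dim=16)
print(pe.shape)   # (50, 16): the complete positional embedding matrix
print(pe[:, 6])   # a single dimension (index 6 here) traced over all positions
```

The earlier per-dimension plots in the lecture would correspond to one such column; the full embedding at 14:00-ish corresponds to a whole row per position.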
And since B is the max element, this justifies the interpretation of the log-sum-exp as a 'smooth max operator'
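For readers following along, the identity behind the comment above (standard, not specific to this lecture) is

\[
\operatorname{LSE}(x_1,\dots,x_n) = \log\sum_{i=1}^{n} e^{x_i}
= B + \log\sum_{i=1}^{n} e^{x_i - B},
\qquad B = \max_i x_i ,
\]

with bounds

\[
B \;\le\; \operatorname{LSE}(x_1,\dots,x_n) \;\le\; B + \log n ,
\]

so log-sum-exp is the maximum plus a bounded correction, which is why it is often read as a smooth max.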
Really well explained, thanks.
Great video.
I'm confused about the derivative of a vector function at 5:40. I think the gradient of a function f: ℝⁿ → ℝᵐ should be a matrix of size m×n. Not sure about it.
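On the question above, a general remark (not a claim about the lecture's notation): for f: ℝⁿ → ℝᵐ, the Jacobian in numerator layout is

\[
J = \frac{\partial f}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n},
\qquad J_{ij} = \frac{\partial f_i}{\partial x_j} ,
\]

while denominator layout transposes this to an n×m matrix; the mismatch usually comes down to which layout convention a given set of notes uses.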
2:59 Actually, the pseudo-algorithm you are using is 0-indexed.
Wonderful Lecture. Thank you
Cuz world wars!
For anyone having doubts about the relation between NLL and cross-entropy, this is a must watch!!!
This helped a lot. Fantastic intuitive explanation.
Super happy that it helped! :)
Great video series! The algorithm video was the one that finally got me to "get" DTW!
Your content is amazing !!!
Thanks Aditya!
So if I have a list of categorical inputs, where the order indeed implies their closeness, then I should not use one-hot encoding, but just use numerical values to represent the categories, is that right?
Love this series! You explained the concepts really well and dove into the details!
So super grateful for the positive feedback!!!
You look like Benedict Cumberbatch
The nicest thing that anyone has ever said!
Thanks for posting Herman, super insightful!
Thanks a ton for the feedback! :)
You are good
The nicest thing anyone has ever said ;)
Awesome, really great video!!