Herman Kamper
Joined 8 Apr 2020
Can we solve inequality in South Africa? Interview with Dieter von Fintel (TGIF 2024)
Views: 137
Videos
Reinforcement learning from human feedback (NLP817 12.3)
244 views · several months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ PPO theory: th-cam.com/video/3uvnoVjM8nY/w-d-xo.html Proximal policy optimization explained: th-cam.com/video/HrapVFNBN64/w-d-xo.html
The difference between GPT and ChatGPT (NLP817 12.2)
118 views · several months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/
Large language model training and inference (NLP817 12.1)
179 views · several months ago
Lecture notes: www.kamperh.com/nlp817/notes/12_llm_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOVLRdimL3lS9F_33fzh9jU.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's LLM video: th-cam.com/video/zjkBMFhNj_g/w-d-xo.html Byte pair encoding: th-cam.com/video/20xtCxAAkFw/w-d-xo.html Transformers: th-cam.com/play/PLmZlBIcArwhOPR2s-FIR7WoqNaBML233s.html
Extensions of RNNs (NLP817 9.7)
80 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Andrej Karpathy's blog: karpathy.github.io/2015/05/21/rnn-effectiveness/
Solutions to exploding and vanishing gradients (in RNNs) (NLP817 9.6)
52 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Gradient descent: th-cam.com/video/BlnLoqn3ZBo/w-d-xo.html Colah's blog: colah.github.io/posts/2015-08-Understanding-LSTMs/
Vanishing and exploding gradients in RNNs (NLP817 9.5)
84 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: th-cam.com/video/xOx2SS6TXHQ/w-d-xo.html
Backpropagation through time (NLP817 9.4)
179 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/ Vector and matrix derivatives: th-cam.com/video/xOx2SS6TXHQ/w-d-xo.html Computational graphs for neural networks: th-cam.com/video/fBSm5ElvJEg/w-d-xo.html Forks in neural networks: th-cam.com/video/6mmEw738MQo/w-d-xo.html
RNN definition and computational graph (NLP817 9.3)
95 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
RNN language model loss function (NLP817 9.2)
92 views · 2 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
From feedforward to recurrent neural networks (NLP817 9.1)
237 views · 3 months ago
Lecture notes: www.kamperh.com/nlp817/notes/09_rnn_notes.pdf Full playlist: th-cam.com/play/PLmZlBIcArwhOSBWBgRR70xip-NnbOwSji.html Course website: www.kamperh.com/nlp817/
Embedding layers in neural networks
234 views · 3 months ago
Full video list and slides: www.kamperh.com/data414/ Introduction to neural networks playlist: th-cam.com/play/PLmZlBIcArwhMHnIrNu70mlvZOwe6MqWYn.html Word embeddings playlist: th-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html
Git workflow extras (including merge conflicts)
106 views · 4 months ago
Full playlist: th-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
A Git workflow
281 views · 4 months ago
Full playlist: th-cam.com/play/PLmZlBIcArwhPFPPZp7br31Kbjt4k0NJD1.html Notes: www.kamperh.com/notes/git_workflow_notes.pdf
Evaluating word embeddings (NLP817 7.12)
209 views · 5 months ago
Full playlist: th-cam.com/play/PLmZlBIcArwhPN5aRBaB_yTA0Yz5RQe5A_.html Lecture notes: www.kamperh.com/nlp817/notes/07_word_embeddings_notes.pdf Course website: www.kamperh.com/nlp817/
Skip-gram with negative sampling (NLP817 7.10)
521 views · 5 months ago
Continuous bag-of-words (CBOW) (NLP817 7.9)
145 views · 5 months ago
Skip-gram as a neural network (NLP817 7.7)
384 views · 5 months ago
Skip-gram model structure (NLP817 7.5)
161 views · 5 months ago
What can large spoken language models tell us about speech? (IndabaX South Africa 2023)
142 views · 6 months ago
Hidden Markov models in practice (NLP817 5.13)
201 views · 9 months ago
Why expectation maximisation works (NLP817 5.11)
149 views · 9 months ago
2:59 Actually, the pseudo-algorithm you are using is 0-indexed.
Wonderful Lecture. Thank you
Cuz world wars!
For anyone having any doubts about the relation between NLL and cross-entropy, this is a must watch!!
This helped a lot. Fantastic intuitive explanation.
Super happy that it helped! :)
Great video series! The algorithm video was the one that finally got me to "get" DTW!
Your content is amazing !!!
Thanks Aditya!
So if I have a list of categorical inputs where the order does imply closeness, then I should not use one-hot encoding but instead use numerical values to represent the categories, is that right?
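A tiny sketch of the trade-off raised above (my own illustration, not from the video; the "small/medium/large" categories and all variable names are hypothetical):

```python
import numpy as np

# Hypothetical ordered categories: "small" < "medium" < "large".
sizes = ["small", "medium", "large"]

# Ordinal encoding: a single number preserves the ordering, so
# "small" really is numerically closer to "medium" than to "large".
ordinal = {c: i for i, c in enumerate(sizes)}  # {"small": 0, "medium": 1, "large": 2}

# One-hot encoding: every pair of categories is equally far apart,
# which throws away the ordering information.
one_hot = {c: np.eye(len(sizes))[i] for i, c in enumerate(sizes)}

def dist(a, b):
    return float(np.linalg.norm(a - b))

# Both one-hot distances come out to sqrt(2) ~ 1.414: the encoding
# cannot express that "small" is nearer to "medium" than to "large".
print(dist(one_hot["small"], one_hot["medium"]))
print(dist(one_hot["small"], one_hot["large"]))
```

Note that with a learned embedding layer (the topic of the video) the network can recover its own notion of closeness even from one-hot inputs, so ordinal encoding is mainly attractive for small classical models.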
Love this series! You explained the concepts really well and dove into the details!
So super grateful for the positive feedback!!!
You look like Benedict Cumberbatch
The nicest thing that anyone has ever said!
Thanks for posting Herman, super insightful!
Thanks a ton for the feedback! :)
You are good
The nicest thing anyone has ever said ;)
Awesome, really great video!!
I am not a student at your university, but I am glad that you are such a good prof.
Very happy you find this helpful!! 😊
I learned a lot as an Azerbaijani student. Thanks a lot <3
Really great explanations. I also really like your calm way of explaining things. I get the feeling that you distill everything important before recording the video. Keep up the great work!
Thanks a ton for this!! I enjoy making the videos, but it definitely takes a bit of time :)
Thank you
bro just keep teaching, that is great!
These videos are sorely underrated. Your explanations are concise and clear, thank you for making this topic so easy to understand and implement. Cheers from Pittsburgh.
Thanks so much for the massive encouragement!!
Working in NLP myself, I very much enjoy your videos as a refresher on current developments. Continuing from your epilogue, will you cover the DPO process in detail?
Thanks for the encouragement @Aruuuq! Jip I still have one more video in this series to make (hopefully next week). It won't explain every little detail of the RL part, but hopefully the big stuff.
your way of explanation is very good
Thomas 🤣
good sir
thank you very much professor.
One of the best explanations on PCA relationship with SVD!
Why is it preferred to minimise the cross-entropy rather than the NLL? Does that have properties that make it more efficient?
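A quick numerical check (my own sketch, with a made-up predicted distribution): for a one-hot target, the cross-entropy is exactly the NLL of the true class, so minimising one minimises the other.

```python
import numpy as np

# Model's predicted distribution over 4 classes; the true class is 2.
p = np.array([0.1, 0.2, 0.6, 0.1])
target = np.array([0.0, 0.0, 1.0, 0.0])  # one-hot target

cross_entropy = -np.sum(target * np.log(p))  # H(target, p)
nll = -np.log(p[2])                          # negative log-likelihood of true class

print(np.isclose(cross_entropy, nll))  # True: the two objectives coincide
```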
Thank you, really great explanation, I think I can understand it now.
Thanks for lecture.
With regards to the clock analogy (0:48): "If you know where you are on the clock then you will know where you are in the input". Why not just a single clock with very small frequency? A very small frequency will guarantee that even for large sentences there will be no "overlap" at the same position in the clock for different positions in the input.
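One way to see the issue with a single very slow clock (my own sketch, not from the video; the frequencies and dimensions are illustrative): neighbouring positions land on almost identical points of a slow sinusoid, so the model can barely distinguish them, whereas mixing fast and slow frequencies keeps nearby positions well separated.

```python
import numpy as np

max_len = 512

# Single very slow "clock": positions 100 and 101 get nearly identical codes.
slow = np.sin(np.arange(max_len) / 10000.0)
print(abs(slow[100] - slow[101]))  # a tiny gap, easily drowned out by noise

# Sinusoidal encoding with several frequencies (a simplified layout that
# concatenates all sines then all cosines, rather than interleaving them):
d_model = 8
pos = np.arange(max_len)[:, None]
i = np.arange(0, d_model, 2)[None, :]
angles = pos / (10000 ** (i / d_model))
pe = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# The fast dimensions still separate neighbouring positions clearly.
print(np.linalg.norm(pe[100] - pe[101]))
```

So the multiple clocks trade off range (slow dimensions disambiguate distant positions) against resolution (fast dimensions disambiguate neighbours); one slow clock alone gives up the resolution half of that trade.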
The best explanation!
Great explanation!! Thank you so much for uploading!
Great video. That meow from the cat though
Thanks ! great video
This is one of the better explanations of how the heck we go from maximum likelihood to using NLL loss to log of softmax. Thanks!
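The chain the comment refers to can be verified numerically (my own sketch with made-up logits): maximising the softmax likelihood of the true class is the same as minimising the negative log-softmax at that class.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax: log(exp(z_k) / sum_j exp(z_j)).
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

logits = np.array([2.0, 0.5, -1.0])  # raw scores from the network
true_class = 0

# NLL computed via an explicit softmax probability...
p = np.exp(logits) / np.exp(logits).sum()
nll_direct = -np.log(p[true_class])

# ...equals the negative log-softmax evaluated at the true class.
nll_via_log_softmax = -log_softmax(logits)[true_class]

print(np.isclose(nll_direct, nll_via_log_softmax))  # True
```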
Great Explanation
Thank you
Sticking to a simple Git workflow is beneficial, particularly using feature branches. However, adopting a 'Gitflow' working model should be avoided as it can become a cargo cult practice within an organization or team. As you mentioned, the author of this model has reconsidered its effectiveness. Gitflow can be cognitively taxing, promote silos, and delay merge conflicts until the end of sprint work cycles. Instead, using a trunk-based development approach is preferable. While this method requires more frequent pulls and daily merging, it ensures that everyone stays up-to-date with the main branch.
Thanks a ton for this, very useful. I think we ended up doing this type of model anyway. But good to know the actual words to use to describe it!
A very clear explanation, thank you very much!
Does this algorithm work with negative instances? I mean, can I use vectors with both negative and positive values?
Good explanation. Thank you Herman
Hello Herman, first of all a very informative video! I have a question: How are the weight matrices defined? Are the matrices simply randomized in each layer? Do you have any literature on this? Thank you very much!
This is a good question! These matrices will start out being randomly initialised, but then -- crucially -- they will be updated through gradient descent. Stated informally, each parameter in each of the matrices will be wiggled so that the loss goes down. Hope that makes sense!
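The reply above can be made concrete with a toy example (my own sketch: a single linear layer, squared-error loss, and made-up data): random initialisation followed by repeated gradient steps that "wiggle" every weight downhill.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random initialisation: the weight matrix starts as small random values.
W = rng.normal(scale=0.1, size=(3, 2))

# Toy data and a squared-error loss for the linear layer y = x @ W.
x = np.array([[1.0, 2.0, -1.0]])
y_true = np.array([[0.5, -0.5]])

for step in range(200):
    y = x @ W
    grad = 2 * x.T @ (y - y_true)  # dL/dW for L = ||y - y_true||^2
    W -= 0.05 * grad               # wiggle each parameter so the loss drops

print(np.allclose(x @ W, y_true, atol=1e-3))  # True: loss driven near zero
```

In a real network the same idea applies to every weight matrix, with the gradients delivered by backpropagation instead of this hand-derived formula.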
Great vid!
6:23 Your face need not be excused :)
:)
Had to basically learn Git in 10 minutes and cook it down to 5 minutes for a group project at school - glad to find something so visual and well explained (and code included!)
Wasn't sure this video was worth posting, so very happy this helped someone! :)
So in Q = XW, does every single entry on the right side of this calculation need to be learned?
Are Q, K and V all populated with parameters, all of which need to be learned?
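To the questions above: only the projection matrices contain learned parameters; Q, K and V are computed from the input and the projections, not stored or learned directly. A toy sketch (my own, with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 3

# X holds the given input embeddings for this sequence; it is data, not a parameter.
X = rng.normal(size=(seq_len, d_model))

# Only these projection matrices are learned (updated by gradient descent).
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

# Q, K and V are recomputed from X on every forward pass.
Q, K, V = X @ W_q, X @ W_k, X @ W_v

n_learned = W_q.size + W_k.size + W_v.size
print(n_learned)  # 96 learned parameters in this toy setup (3 matrices of 8x4)
```

So the entries of Q do depend on learned values (those of W_q), but the number of learned parameters is fixed by the projection shapes and does not grow with the sequence length.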