Bits of Chris
How I'm Learning AI (as a Staff Data Engineer)
Feeling overwhelmed learning AI?
I’ve been there.
This is the map I wish I had when starting out.
As a staff data engineer working on an AI research team, I’ve spent the last six months diving into deep learning, and it hasn’t been easy. From application-level concepts like prompt engineering to the advanced math at the core of neural networks, understanding how it all fits together can be daunting. That’s why I created this conceptual map to guide you through the levels of AI understanding.
In this video, I’ll break down:
The six levels of AI learning: From applications to advanced math, and how to decide which levels you need to focus on.
Key takeaways from my journey: What I learned at each level, who should specialize in them, and why it matters.
Essential resources: Courses, books, and videos that made a difference in my AI learning path.
Whether you're an aspiring AI researcher, data engineer, or just curious about AI, this map will help you cut through the noise and focus on what truly matters for your goals.
Key Takeaways:
* Don’t try to learn everything; focus on the levels that align with your goals.
* Start with a big-picture understanding before diving deeper into specific areas.
* AI is about augmenting humans, not replacing them.
Timestamps:
00:00 Intro
00:41 Big Picture Overview - How applications, models, and math connect.
01:27 Level 1: Applications - Basics of prompt engineering and AI tools.
03:06 Level 2: Modeling - Choosing and understanding AI models.
04:22 Level 3: Architecture - Neural networks, pre-training, and fine-tuning.
05:44 Level 4: Components - Diving into transformer blocks and self-attention.
07:13 Level 5: Mechanisms - Cutting-edge research and implementation.
08:04 Level 6: Math - The foundations behind AI algorithms.
08:40 Wrap-Up - How to use this map to focus your learning.
AI Learning Map with Resources: github.com/bitsofchris/deep-learning/blob/main/notes/map_of_ai_learning.md
Views: 534

Videos

How I Finally Understood Self-Attention (With PyTorch)
Views: 19K • 21 days ago
Understand the core mechanism that powers modern AI: self-attention. In this video, I break down self-attention in large language models at three levels: conceptual, process-driven, and implementation in PyTorch. Self-attention is the foundation of technologies like ChatGPT and GPT-4, and by the end of this tutorial, you’ll know exactly how it works and why it’s so powerful. Key Takeaways: * Hig...
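For readers who want to see the core idea in code, here is a minimal sketch of the simplified self-attention described above (token count, embedding size, and values are invented for illustration; this is not the exact code from the video):

```python
import torch
import torch.nn.functional as F

# A toy "sentence" of 4 tokens, each embedded in 8 dimensions (random stand-in values).
torch.manual_seed(0)
embeddings = torch.randn(4, 8)  # shape: (tokens, embedding_dim)

# Simplified self-attention (no learned projections): score every token against every
# other token, normalize the scores with softmax, then blend the embeddings by those weights.
scores = embeddings @ embeddings.T                                # (4, 4) raw similarity scores
weights = F.softmax(scores / embeddings.size(-1) ** 0.5, dim=-1)  # scaled; each row sums to 1
updated = weights @ embeddings                                    # each token becomes a context-aware mix

print(weights)        # how strongly each token "attends" to every other token
print(updated.shape)  # torch.Size([4, 8])
```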
How I Finally Understood LLM Attention
Views: 28K • 28 days ago
Words are just points on many number lines that capture part of the meaning. Self-attention in large language models (LLMs) finally made sense when I visualized words as points in 12,000 dimensions; this mental model changed everything for me. Here’s what you’ll learn: How LLMs represent words in high-dimensional space to capture nuanced meanings. How self-attention updates word meanings dynamic...
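As a rough illustration of "words as points on many number lines", here is a tiny sketch with made-up vectors (the words and every number are invented; real models learn thousands of dimensions):

```python
import torch
import torch.nn.functional as F

# Three made-up 6-dimensional "word vectors" (real LLMs use thousands of dimensions,
# like the ~12,000 mentioned above; every value here is invented for illustration).
king  = torch.tensor([0.9, 0.8, 0.1, 0.3, 0.7, 0.2])
queen = torch.tensor([0.9, 0.8, 0.1, 0.9, 0.7, 0.2])
pizza = torch.tensor([0.0, 0.1, 0.9, 0.5, 0.1, 0.8])

# Each dimension is one "number line"; words with related meanings land close together.
print(F.cosine_similarity(king, queen, dim=0))  # close to 1.0 (similar meanings)
print(F.cosine_similarity(king, pizza, dim=0))  # much lower (unrelated meanings)
```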
How I Finally Learned to Think in 4D
Views: 511 • 1 month ago
Struggling to think about higher dimensions? When visualizing breaks down beyond 3D, there’s a simple way to make sense of 5D, 10D, or even 12,000D. This is what clicked for me: - Each dimension is its own feature that contributes to the final point. - Think of each dimension as its own number line. In this video, I explain how to stop forcing spatial reasoning and start thinking in terms of ...
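A small sketch of that mental model, with invented feature names (real model dimensions are learned, not hand-labeled like this):

```python
# A made-up 5-dimensional representation of the word "light".
# Each dimension is its own number line scoring one feature.
features = ["brightness", "weight-related", "noun-ness", "verb-ness", "positivity"]
light = [0.9, 0.8, 0.6, 0.4, 0.7]

for name, value in zip(features, light):
    print(f"{name:>15}: {value}")

# The "point" for a word is just this full list of per-dimension values. Going to 10D or
# 12,000D means more feature scores to read off, not more space you have to picture.
```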
how I built my first neural network in pytorch
Views: 730 • 1 month ago
Want to understand neural networks but feel overwhelmed by the math? In this video, I’ll walk you through the basics of neural networks using Python and PyTorch, with zero complex equations. As a data engineer, I know the challenges of approaching AI concepts without a research or mathematics background. That’s why this guide focuses on practical, high-level understanding. What You’ll Learn: - ...
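As a hedged sketch of what such a first network can look like in PyTorch (the layer sizes and the toy 4-feature, 3-class task are placeholders, not the video's exact code):

```python
import torch
from torch import nn

# A minimal feedforward network.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(4, 16),  # input features -> hidden layer
            nn.ReLU(),         # non-linearity so the network can model curved boundaries
            nn.Linear(16, 3),  # hidden layer -> one score per class
        )

    def forward(self, x):
        return self.layers(x)

model = TinyNet()
print(model(torch.randn(2, 4)).shape)  # torch.Size([2, 3]): 2 samples, 3 class scores each
```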
How Neural Networks Work: Understanding Feedforward, Backpropagation, and the Training Loop
Views: 332 • 1 month ago
How do neural networks work? In this video, we’ll break down the entire process, from how data moves through a network (feedforward) to how it learns and improves through backpropagation and training loops. Whether you’re a beginner in AI or a data engineer looking to understand the fundamentals, this video will give you a clear roadmap. What you’ll learn: - The structure of a neural network: n...
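A minimal sketch of that feedforward, backpropagation, and update loop on made-up data (model shape, data, and learning rate are invented for illustration; this is not the video's exact code):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 2)                        # fake inputs
y = (x.sum(dim=1, keepdim=True) > 0).float()  # fake targets

for epoch in range(5):
    prediction = model(x)          # feedforward pass
    loss = loss_fn(prediction, y)  # how wrong were we?
    optimizer.zero_grad()
    loss.backward()                # backpropagation: compute gradients
    optimizer.step()               # nudge the weights downhill
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```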
Building the Simplest Neural Network: A Perceptron Explained
Views: 268 • 1 month ago
Learn the simplest form of a neural network: the perceptron. We'll explore how a single neuron, using a basic step activation function, can solve a fundamental logic problem. Using just one neuron with two inputs, we demonstrate how to compute an AND gate, showing how neural networks process binary outcomes with weighted inputs and a bias term. You’ll learn: * What a Perceptron is: The most bas...
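The AND-gate perceptron described above fits in a few lines; this sketch uses hand-picked weights and a bias purely for illustration:

```python
# A single perceptron computing AND: two inputs, two weights, a bias, and a step activation.
def step(z):
    return 1 if z >= 0 else 0

def perceptron_and(x1, x2, w1=1.0, w2=1.0, bias=-1.5):
    return step(w1 * x1 + w2 * x2 + bias)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron_and(a, b))
# Only (1, 1) clears the -1.5 bias threshold, so the output matches the AND truth table.
```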
What is a Neural Network? A simple 5 minute explanation.
Views: 260 • 2 months ago
In this video, I’ll explain the basics of neural networks in a simple, beginner-friendly way. We’ll look at how a neural network works, why it can be thought of as just a “magic box,” and how it learns to make predictions through training. If you’re new to AI, machine learning, or just curious about how neural networks process information, this video is for you. I’ll break down key concepts wit...
Impactful Listening & Effective Onboarding | Sophia Sithole, Founder Ofstaff
Views: 24 • 2 months ago
In this episode, I talk with Sophie Sithole about her journey building Ofstaff, an AI-powered onboarding and performance management solution. We explore the challenges of effective employee onboarding, and get into a deeper discussion about customer development, active listening, and handling vulnerability in business. Key Lessons Effective Onboarding * Alignment and clear expectations between ...
I just built my first Neural Network: Here's my framework for learning in public
Views: 70 • 2 months ago
I recently joined a research team building time series Transformer models and have become infatuated with the field of deep learning. As a former trader turned data engineer, I am now trying to understand the AI side of things. And this week I just hit my first significant milestone: building my first neural network from scratch, using no machine learning libraries. Today, I want to share this...
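In the spirit of "from scratch, no machine learning libraries", here is a minimal sketch of a single neuron learning a line with a hand-derived gradient (the data and learning rate are invented for illustration, and this is not the code from the video):

```python
import numpy as np

# One neuron (weight w, bias b) learning y = 2x - 1 by gradient descent on mean squared error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x - 1

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(200):
    pred = w * x + b
    error = pred - y
    # Gradients of mean squared error with respect to w and b, worked out by hand:
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # approaches 2.0 and -1.0
```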
Domain Expertise and AI Tools for Data Analysts | Meghan Maloy, Staff Analytics Engineer
Views: 33 • 2 months ago
Domain Expertise and AI Tools for Data Analysts | Meghan Maloy, Staff Analytics Engineer
Pilot Life, Basics of LLMs, and AI for Beginners | Greg Lettieri, Corporate Aviator
Views: 37 • 3 months ago
Pilot Life, Basics of LLMs, and AI for Beginners | Greg Lettieri, Corporate Aviator
Start your Second Brain: A Quick Guide for Staff Engineers
Views: 74 • 3 months ago
Start your Second Brain: A Quick Guide for Staff Engineers
Deploying AI Models at Google Scale | Eugene Weinstein, Engineering Director @ Google
Views: 40 • 3 months ago
Deploying AI Models at Google Scale | Eugene Weinstein, Engineering Director @ Google
How To Be A Consistent Learner
Views: 43 • 4 months ago
How To Be A Consistent Learner
Finding Opportunities and Maximizing Impact: A Staff Engineer's Framework | Quick Bits
Views: 27 • 4 months ago
Finding Opportunities and Maximizing Impact: A Staff Engineer's Framework | Quick Bits
AI in the Classroom: From Teachers to Facilitators | Shawn Cryan, Educational Systems Coordinator
Views: 24 • 4 months ago
AI in the Classroom: From Teachers to Facilitators | Shawn Cryan, Educational Systems Coordinator
Handling Work Stress with Awareness & Homework for Life | Quick Bits #2
Views: 13 • 4 months ago
Handling Work Stress with Awareness & Homework for Life | Quick Bits #2
Augmented Intelligence for Engineers, Feynman Technique, FX Carry Trade | Quick Bits #1
Views: 22 • 4 months ago
Augmented Intelligence for Engineers, Feynman Technique, FX Carry Trade | Quick Bits #1
AI for Sales, Augmentation, and Learning | Account Executive, Ryan Burwell
Views: 7 • 4 months ago
AI for Sales, Augmentation, and Learning | Account Executive, Ryan Burwell
The Crowdstrike Incident: A developer mistake explained in 3 minutes
Views: 159 • 5 months ago
The Crowdstrike Incident: A developer mistake explained in 3 minutes
34 - AI for Real Life: Augmented Creativity, Robot Hockey, and storytelling
Views: 36 • 5 months ago
34 - AI for Real Life: Augmented Creativity, Robot Hockey, and storytelling
Reading with AI - Hierarchical Summarization and Extraction using LLMs
Views: 91 • 6 months ago
Reading with AI - Hierarchical Summarization and Extraction using LLMs
AI Game Development with my 5 Year Old, Working Game in 1 Prompt
Views: 89 • 6 months ago
AI Game Development with my 5 Year Old, Working Game in 1 Prompt
Invest Your Time for Maximum Impact with this framework
Views: 66 • 6 months ago
Invest Your Time for Maximum Impact with this framework
Stop Niching Down: How to embrace your diverse interests and be yourself
Views: 67 • 6 months ago
Stop Niching Down: How to embrace your diverse interests and be yourself
How to take Smart Notes to Build your Second Brain [Audio Only]
Views: 275 • 1 month ago
How to take Smart Notes to Build your Second Brain [Audio Only]

Comments

  • @HasnatAbdullahz • 4 days ago

    Do a roadmap video for Data Engineering also. And if you already have any please provide the link. TIA

    • @BitsOfChris • 2 days ago

      I do not have one yet, but thank you for the idea :)

  • @coastofkonkan • 4 days ago

    Great. Please use examples like How to make Thai Curry? ....so audience knows Curry then Thai then where to focus & where to provide attention.

  • @nmirza2013 • 5 days ago

    great intuitive explanation

  • @shloktalhar3981 • 5 days ago

    from where are you studying?

    • @BitsOfChris • 5 days ago

      Self study, I work as a data engineer on an AI research team

    • @shloktalhar3981 • 4 days ago

@BitsOfChris that's great.. even I am studying data engineering.. can you tell me what tools you are working with? I mean to say there are lots of tools, but very few are typically used

  • @RK-fr4qf • 7 days ago

    Very good indeed. Keep doing what you are doing. Thanks.

    • @BitsOfChris • 7 days ago

      Appreciate hearing that, thank you!

  • @FinanceLogic • 7 days ago

    I agree with you that you seem dialed in. Nice video.

    • @BitsOfChris • 7 days ago

      Thank you :)

  • @babusivaprakasam9846 • 10 days ago

    Excellent explanation. Appreciate the effort.!

    • @BitsOfChris • 7 days ago

      Thanks, happy to hear :)

  • @sythatsokmontrey8879 • 10 days ago

    Ahh, this is good. It's so good.

    • @BitsOfChris • 7 days ago

      Happy to hear, thanks :)

  • @marcelo123456789lope • 11 days ago

    I would like to see more content like this. Sharing this with my company

    • @BitsOfChris • 7 days ago

      Appreciate that, thank you!

  • @dancar2537 • 11 days ago

Sorry: I am confident I saw this idea of words related to each other far before the paper you mention; it is not a new idea. What I am not sure about is whether that is what the brain does, meaning whether it does only that. I really like your explanation, it is clear. W_query is the possibilities available for "light", W_keys is the possibilities available for the other words, and W_value is probably the other meanings possible for the same words in the sentence in other contexts, or something close to that. Maybe I am wrong, but it does not matter, I am close. I don't see why this is necessary; probably the brain just skips it. Thx. Bigger NN is all you need

    • @BitsOfChris • 11 days ago

      Yes it’s definitely not a new idea, never claimed it to be novel and not sure why it matters. :) I’m just sharing what I learned about self attention and trying to explain it as I understand it for beginners or folks new to the concept. To your point about the brain doing this- yes I think we do this without realizing. Agreed though too, in general it seems more data, more parameters, and/or more compute at inference time matter for performance of models. Thanks for watching!

  • @user-ug8qc6tr6b • 12 days ago

    Thank god! I found your channel, the best explanation I've ever seen

    • @BitsOfChris • 12 days ago

      Really happy you found it too! Thanks :)

  • @ak3728 • 12 days ago

Can you please create a whole video on deep learning? A complete playlist would also be fine, as you already have a lot of videos out on the topic. Btw I really enjoy your videos because you "explain it simply", because you understand it well. As Einstein said.

    • @BitsOfChris • 12 days ago

      I do think it's time for a playlist - thank you for the suggestion, this might be the motivation I need to just do that :) And thank you for the compliment! I totally believe the "explain it simply" philosophy. Einstein & Feynman being heroes of mine who really embody that.

    • @ak3728 • 12 days ago

@BitsOfChris I can relate, because I'm also struggling to start doing YouTube. To be precise, it's been a whopping 7 years almost since the day I first thought of doing YT, but my perfectionism won't let me start. If you could share your experience, any advice might help me (I genuinely need it): how did you overcome the initial resistance to post the very first video, and the perfectionism? TLDR - If you could share any advice that helped you break the barrier?

    • @BitsOfChris • 11 days ago

I hear you, getting started is the hardest part. I think you just have to accept that the first 10 videos aren’t going to be great and you need to just publish them. You can always delist later. Realize it’s a process where you just keep making each video a little bit better than the last and that you will NEVER be perfect. But if you never hit publish you won’t ever get started. Reddit has r/NewTubers which is a helpful community of people trying to get started. I’d say just get those first ten videos done as soon as you can, they can be about anything. It’s just a prerequisite for making “good” videos, to get started. Subscribed to you now, looking forward to your first :)

    • @ak3728 • 11 days ago

@BitsOfChris Thank you so much, it means a lot ❤️ I really needed to hear that - deep down I knew I'd have to post a few bad videos initially! (I'm literally a perfectionist.) This is the reason I haven't been able to get started, especially since I spent the last two years, 2023/2024, doing nothing but trying to do YouTube. But somehow I couldn't! The perfectionism had such a strong hold on me.. but now I will do it - in fact I'm just going to do it.. Thank you! Thanks a lot ❤️ You're like a brother to me now. I'll post at least one video before 2024 ends.

    • @BitsOfChris • 11 days ago

Happy to hear, looking forward to watching your journey unfold. It’s a constant balance for me between getting things done and making it good. The tradeoff between consistency and quality. I ebb and flow between them; for you it seems you were over-indexed on quality, preventing you from taking action. The book The War of Art by Steven Pressfield could be a good read for you, but the best thing would be to just hit publish :)

  • @ScottSummerill • 12 days ago

    Very nice. Think this is the freaking first time I understand it.

    • @BitsOfChris • 12 days ago

      Sometimes it just takes seeing something through a different lens, happy it helped!

  • @ADiddy-cq2wk • 12 days ago

This was super intuitive!! The ability to focus on 5 dimensions and walk through a word like "light" that has multiple meanings which only become clear based on context was very helpful. Do you have a Twitter account or Substack I can follow?

    • @BitsOfChris • 12 days ago

      Appreciate that, glad it helped :) I'm mainly focused on TH-cam right now but I use bitsofchris.com Substack as my "hub" for all things.

  • @قيمنقعبود-ب2ل • 12 days ago

    Amazing my friend

    • @BitsOfChris • 12 days ago

      Thanks!

  • @ryparks • 12 days ago

YouTube algorithm! I like this. Find people that are wanting to understand AI, push these vids. Well done Chris! Stay focused. Keep publishing!

    • @BitsOfChris • 12 days ago

      Thank you for kind words of support :)

  • @قيمنقعبود-ب2ล • 12 days ago

    Excellent

    • @BitsOfChris • 12 days ago

      Thanks

  • @thederpydude2088 • 13 days ago

    Given that there are less easily human interpretable dimensions or properties or whatever of a word, I feel like maybe if we could figure out what those dimensions mean and what they represent then that could be a way of learning about nuances in the meanings of words that maybe haven't even been considered 🤔 Anyone know if there's some kind of thing like this where you could find unique meanings or patterns of like a word that AI tech uncovered? I've heard about how, for example, people working with neural networks or something found that the (AIs? networks? systems? ig I'll just say) models picked up on totally unexpected patterns in their input data, with some examples like I think one model meant to find patterns between images of eyes and diseases also found patterns that'd help predict someone's gender based on the image of the eye. And I think another example involved an LLM developing a neuron for like positive or negative sentiment in parts of text, so it could be configured to create outputs of a certain sentiment and it could also be used to measure the sentiment of text. I've considered looking into stuff like this further because it seems so cool that models can just pick up on patterns that we don't haven't even consciously noticed, and I wonder if anyone might have some ideas for how we can learn from the actual patterns that AIs have essentially discovered in different kinds of input data. Anyone know topics involving stuff like this that I could look into further?

    • @BitsOfChris • 12 days ago

      Dude - this is getting into some really deep but fascinating territory. It's the balance between human interpretable and the emergent properties of these models. Some work to follow up on: Golden Gate Claude: www.anthropic.com/news/golden-gate-claude The field as a whole is called "mechanistic interpretability" where you are effectively reverse engineering neural networks. Please share anything you find - sounds like you are very interested in this angle, would love to learn more too :)

    • @thederpydude2088 • 11 days ago

      ​@@BitsOfChris Oh cool 🤔 After my comment, I ending up coming across a new video from the Welch Labs channel about mechanistic interpretability, which kinda did seem like this idea I was thinking about, and now it sounds like that topic was indeed on the right track. The Golden Gate Claude thinking of the bridge as its own ideal form is hilarious btw xD Looks like there is sort of a whole entire field of interpretability within the overall scope of neural networks, and some links on the article seem to point to some deeper research on the topic. I'm not too intent on exploring it atm, but it's good to know that there is this stuff out there, and I imagine I would revisit it sometime in the future. Oh also I recall there was a 3b1b YT short about word vectors that kinda showed how models approximately represent concepts like [ woman - man ≈ aunt - uncle ] and even [ Hitler + Italy - German ≈ Mussolini ] lol. I didn't watch it but the short linked to a longer video about transformers explain visually, which might touch on some interesting sides of this topic as well if you're interested.

  • @thenextension9160 • 13 days ago

    Subbed. Keep it coming.

  • @alhdlakhfdqw • 14 days ago

    really amazing explanation and visualization! thank you! subbed! :)

    • @BitsOfChris • 12 days ago

      Thank you for the kind words :)

  • @BitsOfChris • 15 days ago

    Thank you all for the feedback on this video! I just want to highlight a few things for transparency and completeness. I deliberately chose to simplify certain aspects of self-attention in this video to focus on conceptual clarity and make it approachable for beginners. For example: - I didn’t dive deeply into the query, key, and value matrices - I don't discuss causal masking (which ensures that when a model predicts the next word in a sentence, it only looks at the words that came before it and ignores anything that comes after). In fact, I do the opposite of this just to illustrate how context changes meaning. Going forward I will be sure to include these disclaimers and instructional shortcuts in the video itself rather than afterwards here as a comment. Thank you all for the feedback, it's been incredibly humbling and motivating to see :)
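For readers curious what those two omitted pieces look like in code, here is a minimal sketch of query/key/value projections plus a causal mask (all sizes and variable names are invented for illustration; this is not code from the video):

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
tokens, dim = 4, 8
x = torch.randn(tokens, dim)  # toy token embeddings

# Learned projections turn each embedding into a query, key, and value vector.
w_q, w_k, w_v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
q, k, v = w_q(x), w_k(x), w_v(x)

scores = q @ k.T / dim ** 0.5  # scaled dot-product scores, shape (4, 4)

# Causal mask: position i may only attend to positions <= i (no peeking at future words).
mask = torch.triu(torch.ones(tokens, tokens, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

weights = F.softmax(scores, dim=-1)  # rows sum to 1; masked positions get exactly 0
out = weights @ v                    # context-aware token representations
print(weights)
```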

  • @timmygilbert4102 • 15 days ago

I'm still bummed out that people aren't able to draw a connection with old parsers and chatbots. I strongly feel that knowing how the Stanford parser or ChatScript works is a great insight into how LLMs work. LLMs would feel a lot less black-boxy, because they improve on those systems and do not exactly replace them.

    • @BitsOfChris • 15 days ago

      Hey thanks for the suggestion - I've never heard of the Stanford parser. Is this (nlp.stanford.edu/software/lex-parser.shtml) what you refer to? I agree - LLMs and neural networks in general are very black boxy. What do you mean by they improve and do not replace?

    • @timmygilbert4102 • 15 days ago

      @BitsOfChris it's a bit long, so we will go through very simplistic illustration. Neural network are graph, that imply constraints in how they process data. So the question is, given immediate observation of LLM property, such as sequentiality, what shape should have a neural DAG to implement one instance of sequence, purely as a dag. You will find out that neuron can't, because its operations are commutative, order don't matter but it's needed to create sequence dependant operations like in language. By constructing manually a sequence in a DAG that respect ANN architecture, it reveal you need layer to encode basic sequences. If we turn ourselves to neuron, instead of considering them as magic statistic pixie faery, we can break it down into atomic operations and reason about them. Neuron do dot product with input vector, but that's not really saying much, let's go more basic, they pick an input and apply mul, that's a mask because it's in the range of 0-1, that first operation filter out data unneeded, then we sum the results, let's call it decibel, then with the bias we threshold through the activation function. Sounds trivial what's the big deal? Well it's about shifting the mental image to make it clearer, masking is equivalent to a bag of word solution aka set detection. Then the sum and activation implement logics on that set, aka a generalization of binary logics to set, with OR being ANY with low bias (any input will trigger) and AND being ALL (all input are needed) and a spectrum between the too being SOME. It's a shortcut to say that, but, chat script and the Stanford parser are bag of word solution, as they need a dictionary of words in classes or template. The difference being in how neural network encode atomic lexeme, ie not on a word basis but at token basis. But the fundamental are the same, the token vector encoding is just a black box representation of statistical proximity of words token, which is a type of bag of words, the difference is that bag of words have ad hoc relationship by virtue of being in a bag, statistic solving the creation part of the dictionary. If this true, we should be able to back port idea from LLM to regular chatbot by generating bag of word from token (it's loosely byte pair encoding), or using the class of ontology, like WordNet,as embedding and have a less nuanced LLM as proof. It also explain why we can quantized LLM down to 3 bit, 1 being set detection, -1 being anti set. Think of it like a naive character detector, pixel in the character need to be turned on for detection, pixel not part of the character turn down, that's two neuron for set and anti set, and a third neuron that takes the set and anti set and declare final solution, the bias of the set and anti set modulating sensibility to uncertainty, as seen in the generalized logics. It's more than just composition, it's circuitry, such as that looking at the semantics of a neuron is misleading, the same way that finding the addition transistor is misleading instead of looking for adder structure. That's a very cartoony presentation to fit in a comment, but I'll let you fill the blank for the complete picture. Here is an exercise, create a program that turn embedding into WordNet encoding, Aka human readable ontology, which will pick up word not encoded in that ontology.

  • @MudroZvon • 15 days ago

    Why do you do this? Are you working at OpenAI/Anthropic developing LLMs?

    • @BitsOfChris • 15 days ago

      I like making content as a way to learn things deeper - publishing something is a forcing function for me to make sure I understand something well enough to explain it. Personally, I find the field fascinating and recently joined an Applied AI Research team working on time series foundation models. My background is data engineering so I'm focused on learning the modeling side now :)

  • @MudroZvon • 15 days ago

    And how do you use AI to work with this? NotebookLM?

    • @BitsOfChris • 15 days ago

      Hey great question - Obsidian has a few community plugins that use various AI models (local or paid ones with your API key). Obsidian Copilot & SmartConnections are the two I use.

  • @greatgatsby6953 • 15 days ago

The number lines (to visualize higher dimensions) help greatly. I never thought of it like that and was struggling with dimensions greater than three!

    • @BitsOfChris • 15 days ago

      Glad it helped! :)

  • @gagandwaz6823 • 15 days ago

    Very good explanation

    • @BitsOfChris • 15 days ago

      Thank you, happy to hear :)

  • @iam-wille-dev • 16 days ago

    I resonated with this. Now i want to be friends

    • @BitsOfChris • 16 days ago

      Let's be friends!

    • @iam-wille-dev • 15 days ago

Sure, I have lots to learn and share.

    • @BitsOfChris • 15 days ago

Nice - just checked out your videos and they look great. I need to figure out lighting/production quality like that for my "talking head" style videos. I've been doing faceless videos because it's easier for me to illustrate the AI concepts I've been sharing recently.

    • @iam-wille-dev • 14 days ago

      @@BitsOfChris and I have been watching yours because I want to document my ML journey and I need illustrations and diagrams. I’ll email you

  • @human_shaped • 16 days ago

    This really describes embeddings more than attention itself.

    • @BitsOfChris • 16 days ago

      Fair point- I think understanding embeddings is key to understanding self-attention. But then the scope of this video was more the conceptual understanding of what attention means rather than the implementation. Thank you for watching :) Is there anything additional you would have liked to see?

    • @human_shaped • 15 days ago

      @@BitsOfChris Well, it would have been useful if it explained attention instead of just saying it exists.

    • @BitsOfChris • 15 days ago

      I totally agree but sometimes it's helpful to not overwhelm with too much information. This video I deliberately chose to keep the scope small. I tried going deeper on attention in this video - I hope this one helps: th-cam.com/video/FepOyFtYQ6I/w-d-xo.html

  • @davide0965 • 16 days ago

    It repeats the same phrases and words of the paper and other videos. Doesn't explain or add anything

    • @BitsOfChris • 16 days ago

      Thanks for your feedback! I'm not going to lie, this comment stings a little bit but I don't disagree. Sorry the video wasn't what you were looking for. I'm a data engineer who recently joined an Applied AI research team building time series foundation models. I use content to help me learn topics deeper. Right now, I'm an advanced beginner in these more nuanced AI topics and try to share what helped me as I learn. I hope you'll give the channel a second chance in the future. Anyway, have a great day!

  • @ehza • 17 days ago

    Thanks

    • @BitsOfChris • 16 days ago

      You're very welcome :)

  • @SkegAudio • 17 days ago

It's paramount to have a great understanding of word2vec (aka word embedding vectors), but even more important is understanding n-grams, to grasp why word embeddings are such a significant advancement in NLP.

    • @BitsOfChris • 16 days ago

      Great point about the progression from n-grams to word embeddings! While my work focuses more on time series data than NLP, these concepts have interesting parallels - like how we handle sequential dependencies in financial or climate data compared to text. Would love to hear your thoughts on how these foundational NLP concepts might apply to non-text sequences?

  • @superfliping • 17 days ago

    That's a great perspective. Thank You.

    • @BitsOfChris • 16 days ago

      Thanks for the kind words, happy it helped :)

    • @superfliping • 16 days ago

      @@BitsOfChris I am working on scaling Ai personal memories, so your slide scaling_factoring in my nowhere Nexus development really optimize the performance and output based on necessary settings for each self.dimensions memories embedded into subdimensions function generateLabels. Duel dialogue associated with each iteration in conversation to convert into recursive fractal dimensions into appropriate layers. later I had my first results with models. Was very good 👍

  • @punk3900 • 17 days ago

    the fun part is when you try to understand queries, keys, and values...

    • @BitsOfChris • 17 days ago

Honestly - that took me about a month of reading, implementing, and watching YouTube. Here's the video I published after this one that goes into more detail: th-cam.com/video/FepOyFtYQ6I/w-d-xo.htmlsi=Gg9iUT-teDDkr7sD

  • @ehza • 17 days ago

    • @BitsOfChris • 17 days ago

      Hi there!

  • @michael_gaio • 17 days ago

Subbed. What tool are you using for your presentation?

    • @BitsOfChris • 17 days ago

      Thanks :) I'm using Descript to record the screen and Excalidraw to make the visuals.

  • @robinjac4322 • 18 days ago

That was very helpful, thanks! I recently had to learn more about AI for my job and even though I'm generally somewhat informed, I lack a lot of depth in understanding AI at this point. This example paints a clear picture of how LLMs think, very cool!

    • @BitsOfChris • 17 days ago

Happy to hear it helped! I'm in a similar boat: as a data engineer now supporting an AI research team, this world of AI is new to me too. I think the channel here is mainly me (as an engineer) sharing what I learn and what is helpful for other engineers needing to learn AI. Thanks for the comment :)

  • @hamadaag5659 • 18 days ago

    Drawing in excalidraw makes this tons more impressive lol

    • @BitsOfChris • 18 days ago

      @@hamadaag5659 thank you ;) I’d love to learn another tool but sticking with a simple one has been a good constraint. Makes it easier for me to get something done.

  • @coastofkonkan • 18 days ago

    Chris, this is super cool. Keep up the good work. Conceptual things have tremendous impact on new learners.

    • @BitsOfChris • 18 days ago

      Thanks- appreciate hearing that. I'm with you- there's so much to learn but it's hard to tie the deep dives to something you can grasp. Glad this was helpful :)

  • @yassinebahou4088 • 19 days ago

    Very good explanation

    • @BitsOfChris • 18 days ago

      Thank you :)

  • @GrowStackAi • 19 days ago

    AI: The silent force behind incredible breakthroughs 🔥

    • @BitsOfChris • 18 days ago

      Exciting times we live in.

  • @andydataguy • 19 days ago

    Bro this is the best explanation I've ever seen. Thank you!

    • @BitsOfChris • 18 days ago

      Thank you for the kind feedback, glad it helped :)

  • @AB-wf8ek • 19 days ago

    Thanks for the explanation. I've been thinking about how to better describe LLMs in general, and one angle I came up with is that it's more like a language calculator. Similar to how you wouldn't say a calculator understands arithmetic, it does arithmetic - LLMs don't understand language, they do language. I don't know if others would agree with that or not.

    • @BitsOfChris • 19 days ago

I think that's an excellent analogy. I've heard that used a few times as well. And really - LLMs are just a tool. My feeling is most people who hear "AI" are really picturing the Hollywood version of AGI that is there to replace them. This, I think, turns a lot of people off from the whole topic in the first place. It's this school of thought that complained about how ChatGPT couldn't count the number of "r"s in the word strawberry - which IMO completely misses the point. LLMs are just tools. Tools that are useful when used correctly.

    • @AB-wf8ek • 19 days ago

      @BitsOfChris Nice, really glad to get your feedback. Thanks again!

  • @sapdalf • 20 days ago

    I think this is a very good, basic explanation. It's quite illustrative. I believe it's an excellent introduction to understanding how self-attention works in transformers and how it's implemented in large language models.

    • @BitsOfChris • 19 days ago

      Thank you for the kind words :)

  • @mshonle • 20 days ago

    Great explainer! As for other videos I’d be interested in… what is the deal with positional encoding, specifically the current state of the art; how does a text embedding actually guide the diffusion process in image generation; and, how is there even a gradient that can be useful in training these attention matrices.

    • @BitsOfChris • 19 days ago

      Thank you for the ideas! Positional encoding is something I need to go deeper on myself.

  • @coopernik • 20 days ago

    Subbed. I really like how you presented the topic. The software you use is great for breaking ideas down. I would have loved if you went through the paper at the same time. Trying to break down the complicated equations into what “they’re essentially saying”

    • @BitsOfChris • 19 days ago

Thanks for the feedback :) I like that idea of breaking a paper down section by section like that. Thanks for the suggestion - I'll give that one a try.

  • @JamesHoover • 20 days ago

    This is an excellent insight in how to explain the high dimensionality of AI models

    • @BitsOfChris • 19 days ago

      Thank you :)

  • @JamesHoover • 20 days ago

    Great insight. For fun, I also recommend reading “Adventures of A. Square” which helps a 2-D being comprehend 3-D beings - great little read to open the mind.

    • @BitsOfChris • 19 days ago

      Wow that sounds interesting. Is this the story you are referring to? en.wikipedia.org/wiki/Flatland

  • @Jeremy-Ai • 20 days ago

What happens when we choose not to speak anymore? Who then is paying attention?

    • @BitsOfChris • 19 days ago

      If a tree falls in the woods.. and no one is there to hear it.. does it even make a sound?

  • @yvesbernas1772 • 20 days ago

    Just another video that describes but does not explain. Why is being able to describe so often confused with understanding?

    • @BitsOfChris • 20 days ago

      Sorry this video didn’t meet your expectations. What part specifically do you think needs more depth?

    • @aiamfree • 20 days ago

      @yvesbernas1772 I got you bro. It's like this: Simplified: Encoder makes the inputs compatible with the model dimensions, then decoder finds the best response by applying best weights of encoded input using attention. Detailed: There's Dataloader, Encoder, Decoder, and Attention. - Dataloader takes data and creates batch inputs and batch outputs (you specify how many batches you want to produce), by prepping before passing to dataloader through methods like sliding windows for example. - Encoder is trained on how to transform (hint hint) these batch inputs to the model dimensionality via embedding sizes, normalization and returning hidden states (relationships between the tokens) and initializes weights and returns encoder outputs and hidden. - Decoder takes data from batch outputs from Data Loaders (to be passed as tensors) as well as the encoder outputs and hidden from Encoder, and this is the part where Attention comes in...Based on the decoder inputs (batch outputs from Data Loaders), the Decoder creates embeddings of the batch outputs, then passes the encoder_outputs to the Attention mechanism to create context vectors (context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs) and returns attention weights attn_weights, here's example code: encoder_outputs = self.layer_norm(encoder_outputs) hidden = self.layer_norm(hidden) hidden = hidden.unsqueeze(1).repeat(1, encoder_outputs.size(1), 1) combined = torch.cat((hidden, encoder_outputs), dim=2) energy = torch.tanh(self.attn(combined)) energy = torch.matmul(energy, self.v) attn_weights = torch.softmax(energy, dim=1) attn_weights = torch.clamp(attn_weights, min=1e-9, max=1 - 1e-9) Then the Decoder finally gives token output by combining the embedding it created with the Attention context it received, and returns output and hidden state for the next token prediction, like this: # Embed the decoder input embedded = self.embedding(decoder_input) # Compute attention context, attn_weights = self.attention(hidden[-1], encoder_outputs) # Concatenate embedded input and context rnn_input = torch.cat((embedded, context), dim=2) # Pass through GRU output, hidden = self.rnn(rnn_input, hidden) # Compute vocabulary logits, by this step either training or prediction token has been produced output = self.fc_out(output.squeeze(1)) Here's some (heavily truncated) code of the full process: #all_pairs is from training data ie: [[x,y]] where x = "The cat is" y = "cat is playing" dataloader = DataLoader(all_pairs, batch_size=batch_size, shuffle=True, collate_fn=collate_fn_stride, pin_memory=True) embedding_size = 768 hidden_size = 768 num_layers = 1 encoder = Encoder(vocab_size, embedding_size, hidden_size, num_layers).to(device) decoder = Decoder(vocab_size, embedding_size, hidden_size, num_layers, Attention(hidden_size)).to(device) train_model( encoder, decoder, dataloader, num_epochs=10, learning_rate=0.0001, ) def train_model(encoder, decoder, dataloader, num_epochs, learning_rate): #There's other steps in between like optimizers and entropy loss that can be setup before dealing with the encoders and decoders as well encoder.train() decoder.train() encoder_outputs, hidden = encoder(batch_inputs, input_lengths) decoder_input = torch.tensor([special_token_ids[SOS_TOKEN]] * batch_outputs.size(0), device=device).unsqueeze(1) # at this next step, this batch of inputs and outputs is trained for the epoch decoder_output, hidden = decoder(decoder_input, hidden, encoder_outputs, trg_mask) How they are used at inference: The only difference from the 
training to actually using the LLM is that instead of batch inputs and outputs from Dataloader (training data) you pass prompt inputs as tensors to the Encoder (and attention better thought of as weights and context is pretrained, so that is not a concern in generation). There's also things like source and target masks for improved attention (ie context weighting).

    • @justinpeter5752 • 20 days ago

      How do the attention scores get set for each word and how do they get updated?

    • @aiamfree • 20 days ago

      @@justinpeter5752 well the attention usually takes an input sequence and output sequence (ie each input offset by a stride or sliding window of let's say 10 words) then transformed to tensors. It doesn't work on word by word basis, but as a weighted score of their relationships of what comes next (hence next token prediction). Any deeper than that then you're beginning to think about how to build transformer tech not what they are doing, which is ok if that's your thing.

    • @tristanreid5770 • 19 days ago

@justinpeter5752 In a neural net, anything that is called a parameter is set through a process called backpropagation. The attention scores are parameters, and are set in this way. The way it works is that, going backwards through the neural net, you send corrections from your training data, and the amount each node's value changes is determined using calculus, based on what its function is and all the nodes attached to it. The main purpose of doing all this with PyTorch is that it automatically sets up all that calculus; for example, the derivative of that softmax function in the video gets chained together with all the other functions in the neural net to know how much to alter those weights when it's exposed to training data.
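A tiny illustration of that point (not from the thread): PyTorch's autograd chains the softmax derivative automatically when you call backward(); the numbers below are invented.

```python
import torch

# Raw attention scores we treat as parameters; requires_grad lets autograd track them.
scores = torch.tensor([1.0, 2.0, 0.5], requires_grad=True)

weights = torch.softmax(scores, dim=0)  # forward pass through softmax
loss = (weights - torch.tensor([0.0, 1.0, 0.0])).pow(2).sum()  # pretend target attention

loss.backward()     # backpropagation: chain rule through softmax, handled by autograd
print(scores.grad)  # gradients an optimizer would use to adjust the underlying parameters
```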

  • @ryparks • 20 days ago

These whiteboard-style videos are really helpful. Keep it up! You've got a subscriber in me and I look forward to seeing the channel grow!

    • @BitsOfChris • 20 days ago

      Thank you! I really appreciate hearing that :). It's helpful to hear any sort of feedback, please keep it coming!