Bits of Chris
Joined Jun 21, 2024
Hi there!
I’m Chris, currently working as a Staff Data Engineer on an AI research team where I optimize datasets for deep learning. Former day trader.
Follow my work and deep learning projects on GitHub.
Other Things:
- I podcast for fun.
- I’m an advocate of second brains and learning in public.
- I believe AI is a useful tool but not a replacement. Learn how to Augment, Stay Human.
- My Joyspan philosophy is my attempt to balance work, time, and "joyskill" to maximize life satisfaction.
- I’m a dad of two young kids, doing my best to be there for them.
How I'm Learning AI (as a Staff Data Engineer)
Feeling overwhelmed learning AI?
I’ve been there.
This is the map I wish I had when starting out.
As a staff data engineer working on an AI research team, I’ve spent the last six months diving into deep learning, and it hasn’t been easy. From application-level concepts like prompt engineering to the advanced math at the core of neural networks, understanding how it all fits together can be daunting. That’s why I created this conceptual map to guide you through the levels of AI understanding.
In this video, I’ll break down:
The six levels of AI learning: From applications to advanced math, and how to decide which levels you need to focus on.
Key takeaways from my journey: What I learned at each level, who should specialize in them, and why it matters.
Essential resources: Courses, books, and videos that made a difference in my AI learning path.
Whether you're an aspiring AI researcher, data engineer, or just curious about AI, this map will help you cut through the noise and focus on what truly matters for your goals.
Key Takeaways:
* Don’t try to learn everything; focus on the levels that align with your goals.
* Start with a big-picture understanding before diving deeper into specific areas.
* AI is about augmenting humans, not replacing them.
Timestamps:
00:00 Intro
00:41 Big Picture Overview - How applications, models, and math connect.
01:27 Level 1: Applications - Basics of prompt engineering and AI tools.
03:06 Level 2: Modeling - Choosing and understanding AI models.
04:22 Level 3: Architecture - Neural networks, pre-training, and fine-tuning.
05:44 Level 4: Components - Diving into transformer blocks and self-attention.
07:13 Level 5: Mechanisms - Cutting-edge research and implementation.
08:04 Level 6: Math - The foundations behind AI algorithms.
08:40 Wrap-Up - How to use this map to focus your learning.
AI Learning Map with Resources: github.com/bitsofchris/deep-learning/blob/main/notes/map_of_ai_learning.md
Views: 534
Videos
How I Finally Understood Self-Attention (With PyTorch)
19K views · 21 days ago
Understand the core mechanism that powers modern AI: self-attention. In this video, I break down self-attention in large language models at three levels: conceptual, process-driven, and implementation in PyTorch. Self-attention is the foundation of technologies like ChatGPT and GPT-4, and by the end of this tutorial, you’ll know exactly how it works and why it’s so powerful. Key Takeaways: * Hig...
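To make the implementation level concrete, here is a minimal sketch of single-head scaled dot-product self-attention. It is not the code from the video: it uses plain Python lists instead of PyTorch so it runs without dependencies, and the toy embeddings and the identity "projection" matrices are assumptions chosen only to keep the numbers easy to follow.

```python
import math

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention, with no masking."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # how strongly each token matches every other token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # each output row is a weighted mix of value rows

# Three toy tokens with 2-dimensional embeddings (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection weights
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` says how much one token draws from every token in the sequence, and each row of `output` is that token's updated representation. In a trained model the three projection matrices are learned rather than fixed.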
How I Finally Understood LLM Attention
28K views · 28 days ago
Words are just points on many number lines, each capturing part of the meaning. Self-attention in large language models (LLMs) finally made sense when I visualized words as points in 12,000 dimensions. This mental model changed everything for me. Here’s what you’ll learn: How LLMs represent words in high-dimensional space to capture nuanced meanings. How self-attention updates word meanings dynamic...
How I Finally Learned to Think in 4D
511 views · 1 month ago
Struggling to think about higher dimensions? When visualizing breaks down beyond 3D, there’s a simple way to make sense of 5D, 10D, or even 12,000D. This is what clicked for me: - Each dimension is its own feature that contributes to the final point. - Think of each dimension as its own number line. In this video, I explain how to stop forcing spatial reasoning and start thinking in terms of ...
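The number-line idea can be sketched in a few lines of Python. The feature names and values below are invented for illustration (real embedding dimensions are not labeled like this); the point is only that a word is a list of positions, one per number line, and that comparing two words works the same way in 5 dimensions as in 12,000.

```python
import math

# Hypothetical features and values, invented for illustration.
FEATURES = ["brightness", "weight", "speed", "warmth", "size"]

light = [0.9, 0.1, 0.8, 0.6, 0.2]
lamp = [0.8, 0.3, 0.0, 0.5, 0.3]
anvil = [0.1, 0.9, 0.0, 0.2, 0.5]

def distance(a, b):
    """Euclidean distance; the same formula works for any number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def per_dimension_gap(a, b):
    """How far apart two points sit on each individual number line."""
    return {name: round(abs(x - y), 2) for name, x, y in zip(FEATURES, a, b)}

print(distance(light, lamp), distance(light, anvil))
print(per_dimension_gap(light, lamp))
```

Nothing here requires picturing a 5D space: each comparison happens on one number line at a time, and the distance just combines the per-line gaps.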
how I built my first neural network in pytorch
730 views · 1 month ago
Want to understand neural networks but feel overwhelmed by the math? In this video, I’ll walk you through the basics of neural networks using Python and PyTorch, with zero complex equations. As a data engineer, I know the challenges of approaching AI concepts without a research or mathematics background. That’s why this guide focuses on practical, high-level understanding. What You’ll Learn: - ...
How Neural Networks Work: Understanding Feedforward, Backpropagation, and the Training Loop
332 views · 1 month ago
How do neural networks work? In this video, we’ll break down the entire process, from how data moves through a network (feedforward) to how it learns and improves through backpropagation and training loops. Whether you’re a beginner in AI or a data engineer looking to understand the fundamentals, this video will give you a clear roadmap. What you’ll learn: - The structure of a neural network: n...
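As a rough sketch of the feedforward, backpropagation, and training-loop cycle, here is a single linear neuron trained with gradient descent in plain Python. The data, learning rate, and epoch count are assumptions picked for the example, and with only one neuron backpropagation reduces to a single application of the chain rule; a real network repeats the same step layer by layer.

```python
# Toy data following y = 2x + 1 (made up for this sketch).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

weight, bias = 0.0, 0.0
learning_rate = 0.05

def forward(x, weight, bias):
    """Feedforward: push an input through the neuron to get a prediction."""
    return weight * x + bias

def mean_squared_error(weight, bias):
    return sum((forward(x, weight, bias) - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_squared_error(weight, bias)

for epoch in range(2000):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = forward(x, weight, bias) - y
        # Chain rule: d(error^2)/d(weight) = 2 * error * x, d(error^2)/d(bias) = 2 * error
        grad_w += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    # Update step: move each parameter against its gradient.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

final_loss = mean_squared_error(weight, bias)
print(weight, bias, final_loss)
```

After training, `weight` and `bias` should land close to 2 and 1, the values used to generate the toy data.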
Building the Simplest Neural Network: A Perceptron Explained
268 views · 1 month ago
Learn the simplest form of a neural network: the perceptron. We'll explore how a single neuron, using a basic step activation function, can solve a fundamental logic problem. Using just one neuron with two inputs, we demonstrate how to compute an AND gate, showing how neural networks process binary outcomes with weighted inputs and a bias term. You’ll learn: * What a Perceptron is: The most bas...
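The AND-gate perceptron described above can be sketched as follows. The weights and bias are hand-picked assumptions that happen to work (the video may use different values); any choice where only the input (1, 1) pushes the weighted sum past the threshold would do.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x1, x2, w1=1.0, w2=1.0, bias=-1.5):
    """One neuron with two inputs; the default weights and bias are hand-picked for AND."""
    return step(w1 * x1 + w2 * x2 + bias)

truth_table = {(a, b): perceptron(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

With a bias of -1.5, a single active input only reaches -0.5, so the neuron stays off unless both inputs are 1.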
What is a Neural Network? A simple 5 minute explanation.
260 views · 2 months ago
In this video, I’ll explain the basics of neural networks in a simple, beginner-friendly way. We’ll look at how a neural network works, why it can be thought of as just a “magic box,” and how it learns to make predictions through training. If you’re new to AI, machine learning, or just curious about how neural networks process information, this video is for you. I’ll break down key concepts wit...
Impactful Listening & Effective Onboarding | Sophia Sithole, Founder Ofstaff
24 views · 2 months ago
In this episode, I talk with Sophie Sithole about her journey building Ofstaff, an AI-powered onboarding and performance management solution. We explore the challenges of effective employee onboarding, and get into a deeper discussion about customer development, active listening, and handling vulnerability in business. Key Lessons Effective Onboarding * Alignment and clear expectations between ...
I just built my first Neural Network: Here's my framework for learning in public
70 views · 2 months ago
I recently joined a research team building time series Transformer models and have become infatuated with the field of deep learning. As a former trader, turned data engineer, I am now trying to understand the AI side of things. And this week I just hit my first significant milestone: building my first neural network from scratch, using no machine learning libraries. Today, I want to share this...
Domain Expertise and AI Tools for Data Analysts | Meghan Maloy, Staff Analytics Engineer
33 views · 2 months ago
Pilot Life, Basics of LLMs, and AI for Beginners | Greg Lettieri, Corporate Aviator
37 views · 3 months ago
Start your Second Brain: A Quick Guide for Staff Engineers
74 views · 3 months ago
Deploying AI Models at Google Scale | Eugene Weinstein, Engineering Director @ Google
40 views · 3 months ago
Finding Opportunities and Maximizing Impact: A Staff Engineer's Framework | Quick Bits
27 views · 4 months ago
AI in the Classroom: From Teachers to Facilitators | Shawn Cryan, Educational Systems Coordinator
24 views · 4 months ago
Handling Work Stress with Awareness & Homework for Life | Quick Bits #2
13 views · 4 months ago
Augmented Intelligence for Engineers, Feynman Technique, FX Carry Trade | Quick Bits #1
22 views · 4 months ago
AI for Sales, Augmentation, and Learning | Account Executive, Ryan Burwell
7 views · 4 months ago
The Crowdstrike Incident: A developer mistake explained in 3 minutes
159 views · 5 months ago
34 - AI for Real Life: Augmented Creativity, Robot Hockey, and storytelling
36 views · 5 months ago
Reading with AI - Hierarchical Summarization and Extraction using LLMs
91 views · 6 months ago
AI Game Development with my 5 Year Old, Working Game in 1 Prompt
89 views · 6 months ago
Invest Your Time for Maximum Impact with this framework
66 views · 6 months ago
Stop Niching Down: How to embrace your diverse interests and be yourself
67 views · 6 months ago
How to take Smart Notes to Build your Second Brain [Audio Only]
275 views · 1 month ago
Do a roadmap video for Data Engineering also. And if you already have any please provide the link. TIA
I do not have one yet, but thank you for the idea :)
Great. Please use examples like How to make Thai Curry? ....so audience knows Curry then Thai then where to focus & where to provide attention.
great intuitive explanation
from where are you studying?
Self study, I work as a data engineer on an AI research team
@BitsOfChris that's great. I am studying data engineering too. Can you tell me which tools you are working with? I mean, there are lots of tools but very few are typically used.
Very good indeed. Keep doing what you are doing. Thanks.
Appreciate hearing that, thank you!
I agree with you that you seem dialed in. Nice video.
Thank you :)
Excellent explanation. Appreciate the effort.!
Thanks, happy to hear :)
Ahh, this is good. It's so good.
Happy to hear, thanks :)
I would like to see more content like this. Sharing this with my company
Appreciate that, thank you!
sorry: i am confident i saw this idea of words related to each other far before the paper you mention. it is not a new idea. what i am not sure is that is what the brain does, meaning it does only that. i really like your explanation it is clear. wquery is the possibilities available for lightly wkeys is the possibilities available for the other words and w value is probably the other meanings possible for the same words in the sentence in other contexts. or something close to that. maybe i am wrong but it does not matter i am close. i don t see why this is necessary probably the brain just skips it. thx. bigger nn is all you need
Yes it’s definitely not a new idea, never claimed it to be novel and not sure why it matters. :) I’m just sharing what I learned about self attention and trying to explain it as I understand it for beginners or folks new to the concept. To your point about the brain doing this- yes I think we do this without realizing. Agreed though too, in general it seems more data, more parameters, and/or more compute at inference time matter for performance of models. Thanks for watching!
Thank god! I found your channel, the best explanation I've ever seen
Really happy you found it too! Thanks :)
Can you please create a whole video on deep learning? A complete playlist would also be fine, as you already have a lot of videos out on the topic. Btw I really enjoy your videos cz it's kinda "you explain it simply" cz you understand it well. As Einstein said.
I do think it's time for a playlist - thank you for the suggestion, this might be the motivation I need to just do that :) And thank you for the compliment! I totally believe the "explain it simply" philosophy. Einstein & Feynman being heroes of mine who really embody that.
@@BitsOfChris I can relate cz I'm also struggling to start doing YouTube. To be precise, it's been a whopping 7 years almost since the day I thought of doing YT. But my perfectionism won't let me start. Any advice for that might help me (I genuinely need it). Could you share your experience: how did you overcome the initial resistance to posting the very first video, and the perfectionism? TLDR - Could you share any advice that helped you break the barrier?
@ I hear you, getting started is the hardest part. I think you just have to accept that the first 10 videos aren’t going to be great and you need to just publish them. You can always delist later. Realize it’s a process where you just keep making each video a little bit better than the last and that you will NEVER be perfect. But if you never hit publish you won’t ever get started. Reddit has r/NewTubers which is a helpful community of people trying to get started. I’d say just get those first ten videos done as soon as you can, they can be about anything. It’s just a prerequisite for making “good” videos, to get started. Subscribed to you now, looking forward to your first :)
@@BitsOfChris Thank you so much it means a lot ❤️ I really needed to hear that - deep down I knew I'll have to post few bad videos initially! ( Intentionally in my context I'm Literally a perfectionist ) this is the reason I haven't been to get started from especially I spent the last two years 2023/2024 doing nothing but trying to do youtube. But I couldn't somehow! The perfection HOLD had me so strong.. but now I will do it - in fact I'm just going to do .. Thank You! Thanks A lot ❤️ You're Like A brother to me now. I'll Post at least one-video before 2024 end.
@ happy to hear, looking forward to watching your journey unfold. It’s a constant balance for me between getting things done and making it good. The tradeoff between consistency and quality. I ebb and flow between them, for you it seems you were over indexed on quality preventing you from taking action. The book The War of Art by Steven Pressfield could be a good read for you, but the best thing would be to just hit publish :)
Very nice. Think this is the freaking first time I understand it.
Sometimes it just takes seeing something through a different lens, happy it helped!
This was super intuitive!! The ability to focus on 5 dimensions and walk through a work like light that has multiple meanings that only becomes clear based on context was very helpful. Do you have a Twitter account or Substack I can follow?
Appreciate that, glad it helped :) I'm mainly focused on YouTube right now but I use bitsofchris.com Substack as my "hub" for all things.
Amazing my friend
Thanks!
YouTube algorithm! I like this. Find people that are wanting to understand AI, push these vids. Well done Chris! Stay focused. Keep publishing!
Thank you for kind words of support :)
Excellent
Thanks
Given that there are less easily human interpretable dimensions or properties or whatever of a word, I feel like maybe if we could figure out what those dimensions mean and what they represent then that could be a way of learning about nuances in the meanings of words that maybe haven't even been considered 🤔 Anyone know if there's some kind of thing like this where you could find unique meanings or patterns of like a word that AI tech uncovered? I've heard about how, for example, people working with neural networks or something found that the (AIs? networks? systems? ig I'll just say) models picked up on totally unexpected patterns in their input data, with some examples like I think one model meant to find patterns between images of eyes and diseases also found patterns that'd help predict someone's gender based on the image of the eye. And I think another example involved an LLM developing a neuron for like positive or negative sentiment in parts of text, so it could be configured to create outputs of a certain sentiment and it could also be used to measure the sentiment of text. I've considered looking into stuff like this further because it seems so cool that models can just pick up on patterns that we haven't even consciously noticed, and I wonder if anyone might have some ideas for how we can learn from the actual patterns that AIs have essentially discovered in different kinds of input data. Anyone know topics involving stuff like this that I could look into further?
Dude - this is getting into some really deep but fascinating territory. It's the balance between human interpretable and the emergent properties of these models. Some work to follow up on: Golden Gate Claude: www.anthropic.com/news/golden-gate-claude The field as a whole is called "mechanistic interpretability" where you are effectively reverse engineering neural networks. Please share anything you find - sounds like you are very interested in this angle, would love to learn more too :)
@@BitsOfChris Oh cool 🤔 After my comment, I ended up coming across a new video from the Welch Labs channel about mechanistic interpretability, which kinda did seem like this idea I was thinking about, and now it sounds like that topic was indeed on the right track. The Golden Gate Claude thinking of the bridge as its own ideal form is hilarious btw xD Looks like there is sort of a whole entire field of interpretability within the overall scope of neural networks, and some links on the article seem to point to some deeper research on the topic. I'm not too intent on exploring it atm, but it's good to know that there is this stuff out there, and I imagine I would revisit it sometime in the future. Oh also I recall there was a 3b1b YT short about word vectors that kinda showed how models approximately represent concepts like [ woman - man ≈ aunt - uncle ] and even [ Hitler + Italy - German ≈ Mussolini ] lol. I didn't watch it but the short linked to a longer video about transformers explained visually, which might touch on some interesting sides of this topic as well if you're interested.
Subbed. Keep it coming.
really amazing explanation and visualization! thank you! subbed! :)
Thank you for the kind words :)
Thank you all for the feedback on this video! I just want to highlight a few things for transparency and completeness. I deliberately chose to simplify certain aspects of self-attention in this video to focus on conceptual clarity and make it approachable for beginners. For example: - I didn’t dive deeply into the query, key, and value matrices - I don't discuss causal masking (which ensures that when a model predicts the next word in a sentence, it only looks at the words that came before it and ignores anything that comes after). In fact, I do the opposite of this just to illustrate how context changes meaning. Going forward I will be sure to include these disclaimers and instructional shortcuts in the video itself rather than afterwards here as a comment. Thank you all for the feedback, it's been incredibly humbling and motivating to see :)
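For readers curious about the causal masking mentioned above, here is a small sketch in plain Python. The attention scores are made-up numbers, and the helper names `causal_mask` and `masked_softmax` are hypothetical, introduced only for this example; it shows the idea of hiding later positions before the softmax, not any particular library's implementation.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Softmax over each row, with hidden positions forced to zero weight."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        # Hidden positions get -inf, so exp() turns them into exactly 0.
        visible = [s if keep else -math.inf for s, keep in zip(score_row, mask_row)]
        peak = max(visible)
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Made-up attention scores for a 3-token sequence.
scores = [[0.5, 1.0, 2.0],
          [1.0, 0.2, 0.3],
          [0.1, 0.4, 0.9]]
weights = masked_softmax(scores, causal_mask(3))
for row in weights:
    print([round(w, 3) for w in row])
```

The first token can only attend to itself, the second to the first two tokens, and so on, which is what keeps a model from peeking at words that come after the one it is predicting.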
I'm still bummed out that people aren't able to draw the connection with old parsers and chatbots. I strongly feel that knowing how the Stanford parser or ChatScript works gives great insight into how LLMs work. LLMs would feel a lot less black boxy, because they improve on and do not exactly replace those approaches.
Hey thanks for the suggestion - I've never heard of the Stanford parser. Is this (nlp.stanford.edu/software/lex-parser.shtml) what you refer to? I agree - LLMs and neural networks in general are very black boxy. What do you mean by they improve and do not replace?
@BitsOfChris it's a bit long, so we will go through a very simplistic illustration. Neural networks are graphs, which implies constraints on how they process data. So the question is, given an immediately observable property of LLMs, such as sequentiality, what shape should a neural DAG have to implement one instance of a sequence, purely as a DAG? You will find out that a single neuron can't, because its operations are commutative: order doesn't matter, but order is needed to create sequence-dependent operations like in language. Constructing a sequence manually in a DAG that respects ANN architecture reveals that you need layers to encode basic sequences. If we turn to the neuron, instead of considering it as a magic statistic pixie fairy, we can break it down into atomic operations and reason about them. A neuron does a dot product with the input vector, but that's not really saying much. Let's go more basic: it picks an input and applies a mul, which is a mask because it's in the range 0-1. That first operation filters out unneeded data. Then we sum the results, let's call it decibel, and then with the bias we threshold through the activation function. Sounds trivial, what's the big deal? Well, it's about shifting the mental image to make it clearer: masking is equivalent to a bag-of-words solution, aka set detection. Then the sum and activation implement logic on that set, aka a generalization of binary logic to sets, with OR being ANY with low bias (any input will trigger), AND being ALL (all inputs are needed), and the spectrum between the two being SOME. It's a shortcut to say that, but ChatScript and the Stanford parser are bag-of-words solutions, as they need a dictionary of words in classes or templates. The difference is in how neural networks encode the atomic lexeme, i.e. not on a word basis but on a token basis.
But the fundamentals are the same: the token vector encoding is just a black-box representation of the statistical proximity of word tokens, which is a type of bag of words. The difference is that bags of words have ad hoc relationships by virtue of being in a bag, with statistics solving the creation of the dictionary. If this is true, we should be able to back-port ideas from LLMs to regular chatbots by generating bags of words from tokens (it's loosely byte pair encoding), or by using the classes of an ontology, like WordNet, as embeddings and having a less nuanced LLM as proof. It also explains why we can quantize LLMs down to 3 bits, 1 being set detection, -1 being anti-set. Think of it like a naive character detector: pixels in the character need to be turned on for detection, pixels not part of the character turned off. That's two neurons for set and anti-set, and a third neuron that takes the set and anti-set and declares the final solution, with the biases of the set and anti-set modulating sensitivity to uncertainty, as seen in the generalized logic. It's more than just composition, it's circuitry, such that looking at the semantics of a single neuron is misleading, the same way that looking for the addition transistor is misleading instead of looking for the adder structure. That's a very cartoony presentation to fit in a comment, but I'll let you fill in the blanks for the complete picture. Here is an exercise: create a program that turns embeddings into a WordNet encoding, aka a human-readable ontology, which will pick up words not encoded in that ontology.
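One way to read the "mask, sum, threshold" description in this comment is the sketch below. It is an interpretation, not the commenter's code: the vocabulary, the watched set, and the function name `set_detector` are all invented, and the comment's bias is expressed here as an explicit threshold (a low threshold corresponds to the easy-to-trigger, OR-like case).

```python
def set_detector(tokens, vocabulary, watched, threshold):
    """Threshold neuron over a bag of words.

    The 0/1 weights act as the mask: only words in `watched` count.
    The neuron fires when at least `threshold` watched words are present.
    """
    present = set(tokens)
    weights = [1 if word in watched else 0 for word in vocabulary]
    inputs = [1 if word in present else 0 for word in vocabulary]
    total = sum(w * x for w, x in zip(weights, inputs))
    return total >= threshold

vocabulary = ["cat", "dog", "fish", "car"]
pets = {"cat", "dog", "fish"}
sentence = "the dog chased the cat".split()

fires_any = set_detector(sentence, vocabulary, pets, threshold=1)   # OR-like: ANY
fires_some = set_detector(sentence, vocabulary, pets, threshold=2)  # in between: SOME
fires_all = set_detector(sentence, vocabulary, pets, threshold=3)   # AND-like: ALL
print(fires_any, fires_some, fires_all)
```

The same weights give ANY, SOME, or ALL behavior depending only on the threshold, which is the spectrum the comment describes.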
Why do you do this? Are you working at OpenAI/Anthropic developing LLMs?
I like making content as a way to learn things deeper - publishing something is a forcing function for me to make sure I understand something well enough to explain it. Personally, I find the field fascinating and recently joined an Applied AI Research team working on time series foundation models. My background is data engineering so I'm focused on learning the modeling side now :)
And how do you use AI to work with this? NotebookLM?
Hey great question - Obsidian has a few community plugins that use various AI models (local or paid ones with your API key). Obsidian Copilot & SmartConnections are the two I use.
The number lines (to visualize higher dimensions) help greatly. I never thought of it like that and was struggling with dimensions greater than three!
Glad it helped! :)
Very good explanation
Thank you, happy to hear :)
I resonated with this. Now i want to be friends
Let's be friends!
@ sure, I have lots to learn and share.
Nice - just checking out your videos - looking nice. I need to figure out lighting/ production quality like that for my "talking head" style videos. Been doing faceless because it's easier for me to illustrate the AI concepts I'm recently sharing.
@@BitsOfChris and I have been watching yours because I want to document my ML journey and I need illustrations and diagrams. I’ll email you
This really describes embeddings more than attention itself.
Fair point- I think understanding embeddings is key to understanding self-attention. But then the scope of this video was more the conceptual understanding of what attention means rather than the implementation. Thank you for watching :) Is there anything additional you would have liked to see?
@@BitsOfChris Well, it would have been useful if it explained attention instead of just saying it exists.
I totally agree but sometimes it's helpful to not overwhelm with too much information. This video I deliberately chose to keep the scope small. I tried going deeper on attention in this video - I hope this one helps: th-cam.com/video/FepOyFtYQ6I/w-d-xo.html
It repeats the same phrases and words of the paper and other videos. Doesn't explain or add anything
Thanks for your feedback! I'm not going to lie, this comment stings a little bit but I don't disagree. Sorry the video wasn't what you were looking for. I'm a data engineer who recently joined an Applied AI research team building time series foundation models. I use content to help me learn topics deeper. Right now, I'm an advanced beginner in these more nuanced AI topics and try to share what helped me as I learn. I hope you'll give the channel a second chance in the future. Anyway, have a great day!
Thanks
You're very welcome :)
it's paramount to have a great understanding of word2vec (aka word embedding vector) but even more important is understanding n-grams to have a grasp as to why word embeddings is such a significant advancement in nlp
Great point about the progression from n-grams to word embeddings! While my work focuses more on time series data than NLP, these concepts have interesting parallels - like how we handle sequential dependencies in financial or climate data compared to text. Would love to hear your thoughts on how these foundational NLP concepts might apply to non-text sequences?
That's a great perspective. Thank You.
Thanks for the kind words, happy it helped :)
@@BitsOfChris I am working on scaling Ai personal memories, so your slide scaling_factoring in my nowhere Nexus development really optimize the performance and output based on necessary settings for each self.dimensions memories embedded into subdimensions function generateLabels. Duel dialogue associated with each iteration in conversation to convert into recursive fractal dimensions into appropriate layers. later I had my first results with models. Was very good 👍
the fun part is when you try to understand queries, keys, and values...
Honestly - that took me about a month of reading, implementing, and watching YouTube. Here's the video I published after this one that goes into more detail: th-cam.com/video/FepOyFtYQ6I/w-d-xo.htmlsi=Gg9iUT-teDDkr7sD
Hi there!
subbed what tool are you using for your presentation ?
Thanks :) I'm using Descript to record the screen and Excalidraw to make the visuals.
That was very helpful, thanks! I recently had to learn more about AI for my job and even though I'm generally somewhat informed, I lack a lot of depth in understanding AI at this point. This example paints a clear picture on how LLM's think, very cool!
Happy to hear it helped! I'm in a similar boat: as a data engineer now supporting an AI research team, this world of AI is new to me too. I think the channel here is mainly me (as an engineer) sharing what I learn and what is helpful for other engineers needing to learn AI. Thanks for the comment :)
Drawing in excalidraw makes this tons more impressive lol
@@hamadaag5659 thank you ;) I’d love to learn another tool but sticking with a simple one has been a good constraint. Makes it easier for me to get something done.
Chris, this is super cool. Keep up the good work. Conceptual things have tremendous impact on new learners.
Thanks- appreciate hearing that. I'm with you- there's so much to learn but it's hard to tie the deep dives to something you can grasp. Glad this was helpful :)
Very good explanation
Thank you :)
AI: The silent force behind incredible breakthroughs 🔥
Exciting times we live in.
Bro this is the best explanation I've ever seen. Thank you!
Thank you for the kind feedback, glad it helped :)
Thanks for the explanation. I've been thinking about how to better describe LLMs in general, and one angle I came up with is that it's more like a language calculator. Similar to how you wouldn't say a calculator understands arithmetic, it does arithmetic - LLMs don't understand language, they do language. I don't know if others would agree with that or not.
I think that's an excellent analogy. I've heard that used a few times as well. And really - LLMs are just a tool. My feeling is most people who hear "AI" are really picturing the Hollywood version of AGI that is there to replace them. This I think turns a lot of people off from the whole topic in the first place. It's this school of thought that complained how ChatGPT couldn't count the number of "r"s in the word strawberry, which IMO completely misses the point. LLMs are just tools. Tools that are useful when used correctly.
@BitsOfChris Nice, really glad to get your feedback. Thanks again!
I think this is a very good, basic explanation. It's quite illustrative. I believe it's an excellent introduction to understanding how self-attention works in transformers and how it's implemented in large language models.
Thank you for the kind words :)
Great explainer! As for other videos I’d be interested in… what is the deal with positional encoding, specifically the current state of the art; how does a text embedding actually guide the diffusion process in image generation; and, how is there even a gradient that can be useful in training these attention matrices.
Thank you for the ideas! Positional encoding is something I need to go deeper on myself.
Subbed. I really like how you presented the topic. The software you use is great for breaking ideas down. I would have loved if you went through the paper at the same time. Trying to break down the complicated equations into what “they’re essentially saying”
Thanks for the feedback :) I like that idea of breaking a paper down section by section like that. Thanks for the suggestion - I'll try that one.
This is an excellent insight in how to explain the high dimensionality of AI models
Thank you :)
Great insight. For fun, I also recommend reading “Adventures of A. Square” which helps a 2-D being comprehend 3-D beings - great little read to open the mind.
Wow that sounds interesting. Is this the story you are referring to? en.wikipedia.org/wiki/Flatland
What happens when we choose not to speak no more? Who then is paying attention?
If a tree falls in the woods.. and no one is there to hear it.. does it even make a sound?
Just another video that describes but does not explain. Why is being able to describe so often confused with understanding?
Sorry this video didn’t meet your expectations. What part specifically do you think needs more depth?
@yvesbernas1772 I got you bro. It's like this:

Simplified: The Encoder makes the inputs compatible with the model dimensions, then the Decoder finds the best response by applying the best weights to the encoded input using attention.

Detailed: There's a Dataloader, Encoder, Decoder, and Attention.
- The Dataloader takes data and creates batch inputs and batch outputs (you specify the batch size). The data is prepped before being passed to the Dataloader, through methods like sliding windows for example.
- The Encoder is trained on how to transform (hint hint) these batch inputs to the model dimensionality via embedding sizes and normalization. It initializes weights and returns the encoder outputs and hidden states (relationships between the tokens).
- The Decoder takes the batch outputs from the Dataloader (passed as tensors) as well as the encoder outputs and hidden states from the Encoder, and this is the part where Attention comes in. Based on the decoder inputs (batch outputs from the Dataloader), the Decoder creates embeddings of the batch outputs, then passes encoder_outputs to the Attention mechanism, which creates context vectors (context = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs)) and returns the attention weights attn_weights.

Here's example code:
    encoder_outputs = self.layer_norm(encoder_outputs)
    hidden = self.layer_norm(hidden)
    # Repeat the decoder hidden state once per encoder time step
    hidden = hidden.unsqueeze(1).repeat(1, encoder_outputs.size(1), 1)
    combined = torch.cat((hidden, encoder_outputs), dim=2)
    energy = torch.tanh(self.attn(combined))
    energy = torch.matmul(energy, self.v)
    # Normalize the scores over the sequence dimension
    attn_weights = torch.softmax(energy, dim=1)
    attn_weights = torch.clamp(attn_weights, min=1e-9, max=1 - 1e-9)

Then the Decoder finally gives the token output by combining the embedding it created with the Attention context it received, and returns the output and hidden state for the next token prediction, like this:
    # Embed the decoder input
    embedded = self.embedding(decoder_input)
    # Compute attention
    context, attn_weights = self.attention(hidden[-1], encoder_outputs)
    # Concatenate embedded input and context
    rnn_input = torch.cat((embedded, context), dim=2)
    # Pass through GRU
    output, hidden = self.rnn(rnn_input, hidden)
    # Compute vocabulary logits; by this step either the training or the prediction token has been produced
    output = self.fc_out(output.squeeze(1))

Here's some (heavily truncated) code of the full process:
    def train_model(encoder, decoder, dataloader, num_epochs, learning_rate):
        # There are other steps in between, like optimizers and entropy loss,
        # that can be set up before dealing with the encoder and decoder
        encoder.train()
        decoder.train()
        encoder_outputs, hidden = encoder(batch_inputs, input_lengths)
        decoder_input = torch.tensor([special_token_ids[SOS_TOKEN]] * batch_outputs.size(0), device=device).unsqueeze(1)
        # At this next step, this batch of inputs and outputs is trained for the epoch
        decoder_output, hidden = decoder(decoder_input, hidden, encoder_outputs, trg_mask)

    # all_pairs is from training data, i.e. [[x, y]] where x = "The cat is" and y = "cat is playing"
    dataloader = DataLoader(all_pairs, batch_size=batch_size, shuffle=True, collate_fn=collate_fn_stride, pin_memory=True)
    embedding_size = 768
    hidden_size = 768
    num_layers = 1
    encoder = Encoder(vocab_size, embedding_size, hidden_size, num_layers).to(device)
    decoder = Decoder(vocab_size, embedding_size, hidden_size, num_layers, Attention(hidden_size)).to(device)
    train_model(encoder, decoder, dataloader, num_epochs=10, learning_rate=0.0001)

How they are used at inference: The only difference between training and actually using the LLM is that instead of batch inputs and outputs from the Dataloader (training data), you pass prompt inputs as tensors to the Encoder. Attention, better thought of as weights and context, is pretrained, so that is not a concern in generation. There are also things like source and target masks for improved attention (i.e. context weighting).
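The attention step in the comment above can be sketched without PyTorch. This is a minimal, dependency-free illustration of additive attention (tanh of a projection of the concatenated decoder state and encoder output, dotted with a vector v), which is the pattern that snippet follows. The function names, the toy weights W and v, and the example vectors are assumptions made up for illustration; in a trained model W and v would be learned, not hand-picked.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(hidden, encoder_outputs, W, v):
    """Score each encoder output against the decoder hidden state.

    hidden:          decoder state, list of H floats
    encoder_outputs: list of T vectors, each with H floats
    W:               projection matrix, A rows of 2*H floats (hypothetical values)
    v:               scoring vector of A floats (hypothetical values)
    """
    energies = []
    for enc in encoder_outputs:
        combined = hidden + enc  # list concatenation, like torch.cat
        projected = [
            math.tanh(sum(w * x for w, x in zip(row, combined))) for row in W
        ]
        energies.append(sum(vi * pi for vi, pi in zip(v, projected)))

    attn_weights = softmax(energies)

    # Context vector: weighted sum of the encoder outputs
    dim = len(encoder_outputs[0])
    context = [
        sum(w * enc[d] for w, enc in zip(attn_weights, encoder_outputs))
        for d in range(dim)
    ]
    return context, attn_weights


# Toy example: one decoder state, three encoder time steps
hidden = [1.0, 0.0]
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
v = [1.0, 1.0]

context, attn_weights = additive_attention(hidden, encoder_outputs, W, v)
print("weights:", attn_weights)
print("context:", context)
```

The weights always sum to 1, so the context vector is a blend of the encoder outputs, leaning toward whichever time steps scored highest.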
How do the attention scores get set for each word and how do they get updated?
@justinpeter5752 Well, attention usually takes an input sequence and an output sequence (i.e. each input offset by a stride or sliding window of, let's say, 10 words), which are then transformed to tensors. It doesn't work on a word-by-word basis, but as a weighted score of their relationships to what comes next (hence next-token prediction). Any deeper than that and you're beginning to think about how to build transformer tech rather than what it is doing, which is OK if that's your thing.
@justinpeter5752 In a neural net, anything called a parameter is set through a process called backpropagation. The attention scores themselves are computed on the fly from the input, but the weights that produce them are parameters and are set this way. Going backwards through the neural net, you send corrections derived from your training data, and the amount each node’s value changes is determined using calculus, based on what its function is and on all the nodes attached to it. The main benefit of doing all this with PyTorch is that it sets up all that calculus automatically. Take the softmax function in the video, for example: its derivative gets chained together with those of all the other functions in the neural net to determine how much to alter the weights when the model is exposed to training data.
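The input/output offset described here can be sketched in a few lines. This is a rough illustration, assuming whitespace tokenization and a hypothetical helper named sliding_window_pairs; real pipelines use a proper tokenizer and convert the pairs to tensors afterwards.

```python
def sliding_window_pairs(tokens, window, stride=1):
    """Build (input, target) pairs where the target is the input shifted by one token."""
    pairs = []
    for start in range(0, len(tokens) - window, stride):
        x = tokens[start:start + window]
        y = tokens[start + 1:start + window + 1]
        pairs.append((x, y))
    return pairs


# Whitespace split stands in for a real tokenizer (an assumption for brevity)
tokens = "The cat is playing".split()
pairs = sliding_window_pairs(tokens, window=3)

for x, y in pairs:
    print(x, "->", y)
```

Each target is the same window moved one token to the right, which is what makes this a next-token prediction setup.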
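To make the "corrections flow backwards through softmax" idea concrete, here is a small dependency-free sketch. It is a simplification under stated assumptions: the three logits are treated as the trainable values directly (in a real model they come from learned weight matrices), the loss is cross-entropy, and the derivative is written out by hand, which is the part PyTorch's autograd would normally do for you. All names are made up for illustration.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Loss is low when the probability of the correct class is high."""
    return -math.log(probs[target])


def gradient(logits, target):
    """Derivative of cross-entropy(softmax(logits)) w.r.t. each logit.

    For this pairing the chain rule collapses to: probability minus one-hot target.
    """
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


logits = [0.0, 0.0, 0.0]  # start with no preference between three classes
target = 2                # the training data says class 2 is correct
learning_rate = 0.5

losses = []
for _ in range(20):
    losses.append(cross_entropy(softmax(logits), target))
    grads = gradient(logits, target)
    # Nudge each value against its gradient: this is the "correction"
    logits = [x - learning_rate * g for x, g in zip(logits, grads)]

print("first loss:", losses[0])
print("last loss:", losses[-1])
print("final probabilities:", softmax(logits))
```

Each update shifts probability toward the correct class, so the loss shrinks over the loop; in a full network the same gradient would keep flowing backwards into the layers that produced the logits.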
These whiteboard-style videos are really helpful. Keep it up! You got a subscriber in me and I look forward to seeing the channel grow!
Thank you! I really appreciate hearing that :). It's helpful to hear any sort of feedback, please keep it coming!