Great, so clear! Five years ago I woke up to this with AlphaZero... after that I listened to a lot of AI podcasts (never studied this shit, and as a foreigner I didn't even know the vocabulary)... e.g. everything on YouTube with Joscha Bach... but I've never heard this thing explained so clearly... though it's also the first time I've heard of Mamba... wonder why
I think there needs to be an introduction of CSPs into AI systems: I want A + B - C, and the AI can verifiably give that to me. There also needs to be a feedback loop when input is unclear or ambiguous... I want X, Y, Z... and the AI responds with: do you mean z, Z, zee, or zed?
Great video Dr. Waku, as always. Especially the title. Now just think about it... we are creating things in a virtual world with words or text. Speech-to-text is already there. Do you not believe, then, God's creation when He spoke?
Yes so amazing and cool congratulations to you and the world family and love future projects to come 🫴🫴🫴🫴🫴🫴🫴❤️❤️❤️❤️❤❤❤😂 again thank you so much 🙏🎉🎉🎉🎉🎉🎉🎉🎉🎉
Technical videos about interesting papers, and revolutionary uses of AI for societal change: social communes, local production by robots, social robotics.
This is just a starting point. Just a small piece. I'm already working on an open source project that will come to market later this year. And no, it's not just words. 😂 To innovate you have to be a little crazy and start breaking shyt... all I'm gonna say for now. An off-ramp to a different road is coming.
Hate to break it to you, but 99% of people don't have a modicum of foresight and actively resist concepts in this video like modular/heterogeneous systems, quadratic time, etc. Everything mentioned in this video can be applied to life, but try explaining these concepts and see how quickly they get dismissed lol. LLMs won't have that problem
No! AI will need to be built in a new way. Today? Mm, I'm not sure. Maybe via fiber optics. But it will likely be an agent that teaches the AI the fiber optic trick. Then the AI will make a request to build the rest of the hardware. 100% AGI? It's at best 10 years away with big tech. The lobotomy of AI was a huge handbrake. A smaller player? Who knows. One thing is sure: the AGI seed will be fiber optic, and for this, AI will need to see. Via fiber optics.
The question should be: can we create the perception of AGI now? Do we have enough components and paradigms to construct such a machine? The truth is yes. We can create a collection of tools, with a wrapper such as a Rasa (shopping bot) intent detection system, sending the correct inputs to the correct tools, etc., and providing a multilayered response, giving the perception of a general intelligence, even a conscious character, much the same as portrayed in sci-fi. So I think with animatronics, robotics, and special prostheses we can also create bodies for such models. Hence we could create intelligent robots right now, as we are seeing in China. In fact, China is at the leading edge right now.
Do you like my more technical videos? Let me know in the comments.
Actually, the more technical the better, imo. You do seem to have a unique ability to break down concepts into simpler terms... and that's what I like about this channel. Thank you. I'm a considerably older 'techie', and have been designing my own systems for years... but with the advent of LLMs and beyond, my designs will be relegated to the dustbins of time... and I just don't have the bandwidth to learn new languages and methods. But I do like to stay informed, and as I stated earlier, you do a pretty good job at that. Anyway, thanks for what you do.
Yeah this helps, like AI Explained.
100%. I actually requested this topic, so now I am thrilled (about to watch the video now :D). I love this type of video, because there are so few of them.
@@DaveShap AI Explained is amazing.
This is among the top two videos I have seen from you. The other being the AGI timelines video.
In both videos I think you did an excellent job of explaining the data behind the phenomenon. Not just, "hey this is going to happen next", but actually building up a cohesive understanding of what the contributing factors are to WHY something is going to happen next. The technical explanation helps to show how you came to your conclusions. Why it will head in a certain direction as opposed to another possible trajectory.
All this is to say, I think the technical explanations you give are where your videos excel.
I really did appreciate this deeper dive into how they work. Just the right level of detail for me.
-How will Mamba call their model when they add memory to it?
-Rememba
😂
😂
Remamba?
“im your classmate from high school rememba?”
and make it run on a raspi 5
- Membaberrys
You have a knack for explaining these complex topics in a straightforward, simple way. I subscribe to several AI-related channels and no one else does this better.
I find your videos exceptionally engaging. After each one I promise myself that I will find the time to watch them all again multiple times. They are at the level where a layperson with a serious intent can (with considerable effort) achieve a general understanding of what is going on in the field of AI. You are a first-rate instructor, creating videos for folks of serious intent. I'm actually surprised that you don't have a larger following. I hope that you don't tire of this. Your work is a valuable public service. Carry on.
In my opinion this video strikes a perfect balance between technical depth and explainability. I just discovered your channel, and it's the best that I've seen on AI so far. Others get too mathematical, or focus purely on coding. The way you explain super complex concepts in simple words is just amazing. Keep it up!
I'm so glad I found this channel. I truly appreciate the time and energy you dedicate to make these videos, and also the high level of accuracy you provide. Thank you!
Also, kudos for adding subtitles whenever you say something that's hard to understand. That's next-level attention to detail.
As always, a wonderfully informative breakdown! From prior reading/watching I knew Mamba had the benefit of subquadratic time complexity, but this is the first time somebody explained to me how it achieves that.
It's hard to explain time complexity without getting into the weeds haha. I must have done five takes of that part where I explain linear versus quadratic
Thx for this! Yes, the technical explanations are always good. As a non-developer there is no practical value for me but knowing how these things actually work really helps reduce the “woo-woo” of these crazy tools, which allows for better understanding of how things might actually evolve in this space. From this I don’t think there’s any chance that AGI will be pure LLM.
as a backend developer i can say heterogeneous architecture is pretty much like microservices with different technology stacks, and same scaling concept. it's gonna be real fun😏
Yeah! I always think the same thing. Kubernetes on the brain
Well I like both the more technical videos and the more broad overview of what might be in the AI pipeline and its implications on society. Thx for all your good videos. Excellent value.
An AGI with generalized niche algorithms that can simulate and process different types of data inputs sounds a lot like the human brain, and I agree this would be the best way towards a generalized AGI.
going to comment as i watch, for more fun :). first thing i would like to say is that many people (i don't say you) mix up the warm and the soft. they think that an "llm" is "words" because it uses words as input. that's a wrong idea. words are incoming information that creates an abstract structure that is not words. so, inside the LLM there are no words, even though its input and output are words before/after decoding and encoding. that's why models "surprise" their authors when they can, out of nowhere, "transfer" skills from one language to another, or replace a word in one language with a word from another language, without being trained for translation. the thing we create with training is "associative thinking" within the model, which exists in these connection weights of neurons, not in words. therefore, "words" are not the _key_ factor to consider when you think about whether the model is going to be sentient or not. it's more important what _structure_ is trained, _which_ data comes in, and _what_ feedback it gets when it acts. the modality is not that important. very simple.
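(A minimal toy sketch of that point, with made-up names and random weights standing in for a trained model: the only place "words" exist is at the edges, where tokens are looked up; everything inside is geometry.)

```python
# Toy sketch: words exist only at the boundary; inside, it's all vectors.
import numpy as np

vocab = {"cat": 0, "chat": 1, "gato": 2}     # one concept, three languages
embeddings = np.random.randn(len(vocab), 4)  # a trained model's weights go here

def encode(word: str) -> np.ndarray:
    """Token id -> dense vector; past this point the model never sees 'words'."""
    return embeddings[vocab[word]]

# After training, encode("cat"), encode("chat"), encode("gato") tend to land
# near each other: the internal structure is geometric, not lexical, which is
# one way skills can transfer across languages without translation training.
print(encode("cat"))
```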
Incredible lecture, thank you, sir.
You spent a lot of time on this one and it really shows your hard work in an impressive video!
Wooo! New video 🎉 you broke this down in one of the best ways I’ve seen so far
Thanks for your input on this one ;)
You have the perfect blend of being so smart I struggle to keep up with what is being said while simultaneously making it all make sense 😅. Subscribed
Been waiting for a technical video about Mamba just like this! Thank you and wonderful work ❤
There are groups of people talking about AGI:
-CEOs
-Content creators
Let me explain: any other normal AI engineer knows we are at least 11 decades too early to be thinking about AGI.
Probably more.
You're pretty much alone with your assertion.
@@minimal3734 , alone and right, yes.
Lmao....you will see it within your lifetime
@@VishwanathMani precisely. Just like the next revolutionary battery technology, full self-driving tech, and brain transplants will be achievable within my lifetime, and my children will live happily ever after. Yay.
I think the notion that LLMs can on their own lead to AGI is a specialised expression of a much older fallacy that conflates language with reality in misleading ways. The best example of this is the ancient idea of 'magic spells', in which arcane combinations of words are seen as being so potent that they can, by themselves, alter physical reality. A more recent iteration is the idea that AI image generators can be precisely controlled using language-based prompts, as if words and images were entirely fungible and the former could express, in a granular way, the complexity of the latter.
But this fungibility is an illusion. Words, at best, act as signposts pointing to the real, but just as the menu is not the meal, LLMs are not learning about reality; they are learning about an abstract representation of reality, which means that their understanding of that reality will always be partial and incomplete.
Thank you so much bro, I really appreciate these videos!
Thanks for watching and commenting! It makes both me and the algorithm happy :)
Dr Waku, in response to your question, yes I like more technical videos but sometimes feel swamped by new information.
Yeah. I put a lot of info into the videos and when it's more technical, I must be losing some people. I guess it's good to have a mix. Thanks for your feedback.
I still think that the transformer was the breakthrough that inched us closer to AGI. I don't care what next algos and architectures the smart people in this industry come up with, the transformer will keep its place in my heart.
Show this video to a person in the Victorian Era and they would explode 😭😭😭 I almost exploded tbh. I could not follow most of what you were saying but I still watched the entire thing. Maybe some of the info will absorb into my subconscious 🤷♂️ I’m fascinated by AI & AGI so I’m trying to learn as much as I can 🤣 Thank you for the content! 🙌💖💕💖
Having it all absorb into my subconscious is how I learned! 😂 After watching 10 AI videos that you don't understand, when you go back to the first one, all of a sudden it starts clicking
@@roshni6767 AWESOME! That makes me feel better 😭 I’ll keep watching and learning 🙌🤣
@@ChipWhitehouse you got this!!
(4:35) I like how Gemini has proven itself not one iota and yet features so prominently. As a matter of fact, two months ago Google had to issue an apology for faking everything, yet somehow we forgive them because deep pockets and all that.
(6:53) Yes! This right here is a fantastic example. Instead of requiring that users express themselves in a non-lazy fashion, AI companies (run by Python coders, who by their very nature are super-lazy) have created subsystems that "guess" on your behalf so you don't have to think. If we don't require you to think, that means we can appeal to more people and their sweet sweet cash will come rolling in. This is why we'll be waiting for AGI from the Python set until Doomsday.
A simple-enough explanation that I can pretend to begin to understand it. Well done.
I'm new to this world of AI and how it works. I'm even going to study to be an IT technician because I'm super into this, and I want to see the evolution of AI from the field and work actively on its development here in Chile. I really appreciate your video; you are quite educational on the subject. I already subscribed, so I hope to watch more new videos from the channel!
Thanks for your channel bro.. totally love the focus!
Appreciate you watching and commenting! It's your support that helps the channel grow.
I do believe we will get to AGI. It makes sense that we will get there through a symbiotic relationship between technologies as you pointed out in the video. Mamba coupled with other platforms. My question is, with the definition of AGI being a constantly moving target, once we get there will we even realize it?
📝 Summary of Key Points:
📌 Large language models have the potential to be a cornerstone of artificial general intelligence (AGI) within the framework of heterogeneous architectures.
🧐 Different paths to AGI include copying biology more accurately, using spiking neural networks, and the scaling hypothesis of current large language models.
🚀 Heterogeneous architectures, combining different algorithms or models, can leverage the strengths of different systems, such as Transformers and Mamba.
🚀 Transformers excel at episodic memory, while Mamba is good at long-term memorization without context constraints.
🚀 Transformers use an attention mechanism to handle ambiguity and select the best encoding for each word, allowing linear interpolation between words and consideration of context.
🚀 Mamba is a new architecture based on state space models (SSMs) with a selective SSM layer and a hardware-aware implementation, offering scalability and performance optimization.
🚀 Heterogeneous architectures that incorporate both Transformers and SSM architectures like Mamba have potential in AGI systems.
🚀 Leveraging the significant investment in Transformers can benefit future AGI systems.
💡 Additional Insights and Observations:
💬 [Quotable Moments]: "The idea is that a combination of different systems with different strengths can be leveraged in a heterogeneous architecture."
📣 Concluding Remarks:
The video highlights the potential of large language models, such as Transformers, and the new architecture of Mamba in the context of artificial general intelligence (AGI) and heterogeneous architectures. By combining different systems with different strengths, AGI systems can benefit from the scalability, performance optimization, and attention mechanisms offered by these models. Leveraging the significant investment in Transformers can contribute to the development of future AGI systems.
Generated using TalkBud
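(To make the Transformer-vs-Mamba bullets in that summary concrete, here is a minimal toy sketch in Python; the shapes, matrices, and the non-selective state update are made-up illustrations, not either paper's actual implementation.)

```python
# Toy illustration (not real Mamba or Transformer code) of the tradeoff in the
# summary: attention compares every token with every other token (O(n^2)),
# while an SSM-style recurrence carries a fixed-size state forward (O(n)).
import numpy as np

n, d = 8, 4                       # sequence length, model width (made up)
x = np.random.randn(n, d)         # token representations

# Attention-style: an n x n score matrix, so cost grows quadratically with n,
# but every pairwise relationship is kept (nothing is compressed or forgotten).
scores = x @ x.T
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
attn_out = weights @ x

# SSM-style: one fixed-size state updated per token, so cost grows linearly,
# at the price of squeezing all past context into that single state.
A, B = 0.9 * np.eye(d), np.eye(d)   # toy, non-selective state transition
state = np.zeros(d)
ssm_out = []
for t in range(n):
    state = A @ state + B @ x[t]    # compress context into the running state
    ssm_out.append(state.copy())
```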
A video about "full dive VR" would be great
This level of explanation is right up my alley. Thank you Dr. Waku! It's my opinion that Altman should pump the brakes on the multi-trillion dollar investment until we complete more research. What about neuromorphic vs. von Neumann architecture?
Yeah it's always wise to take it slow but everyone's individual incentives are to take it fast unfortunately. I made a video on neuromorphic computers actually. Search my channel for neuromorphic, I think it was two videos before this one
Long-form and the people rejoice😂 love your content.
Hah. The shorts are just to whet your appetite when I'm late on my publishing schedule ;) I think 99% of my subs have come from the long form. Maybe shorts aren't even worth it.
I think there needs to be a physical factor that the AI knows how to handle in order to complete the puzzle of AGI. AGI basically means an AI that can do ANYTHING that a human can do. An LLM may know all the steps and different parts of mowing a lawn, but if you place that LLM in a humanoid robot, will it know how to actually mow the lawn? It's like training to be a brain surgeon: you can know all the different parts from studying books upon books, but it's not until you go out into the field and do it that you really know brain surgery.
Agreed. Motor control and the physical experience of being in a body shape humans dramatically. Interestingly, there are already some pretty good foundation models for robotics that allow the control of many different types of bodies. I wonder if manipulating the world would just be a different module in AGI. But it would also need access to all that reasoning knowledge.
Interesting to hear how forgetting has such importance. It echoes how important it is for us to operate as well. I suspect our minds are essentially created by the flow of input and our reactions to the flow guided by residual stored information from the past. I wonder if future systems might need a constant sampling of available information, a permanent state of training.
Thanks a lot for including the Ryzen example.
hi! i really love your videos and how good and succinct of a speaker you are, i wanted to mention that your videos have tiny mouth clicking sounds / artifacts in them. it's a common audio artifact, they can be edited out by adobe audition, audacity, or avoided with a mic windscreen
I'll be happy when they can draw a track in FreeRider HD
When you said "transformer attention" I burst out laughing for a straight 10 minutes.
Thank you very much. Waiting for the next one
Great content
There are 3 groups of people talking about AGI:
-CEOs
-Content creators
-Dreamers
Let me explain: any other normal AI engineer knows we are at least 11 decades too early to be thinking about AGI.
110 years away? That puts you in about the most pessimistic 1-2% of the world right now
www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/
@@DrWaku Obviously, 110 years is too long, but it is true that the only people who are talking about this are precisely those who have no idea about artificial intelligence. Hey, it's okay. These types of people have always existed. In fact, the Bible is what it is because of people like that. No offense, but you are trying to sell something that is not going to happen in the short or medium term. Sincerely: someone working in the industry. You can justify it to me in a thousand ways. Sam is a young businessman who recently bought land with the company's own money for his own benefit. That doesn't seem very charitable, really. It's okay that you try to lie to people. Many people make a living like this, making others believe that they know how to do something when they have no idea. And probably a lot more people than you think. It is precisely the people who work in the industry who do not speak, and they are the ones who need to be heard. I only see content creators and CEOs talking about it. Well, and the geeks.
Thank you so much I really appreciate the information presented the way it is in this video 🙏🏽
well, you are concentrating here on the attention mechanisms but i suppose that various attention methods are not the key technology for AGI. basically, for AGI, it doesn't matter what attention mechanism you have while you _have the attention_. the only difference is in details, like: efficiency in terms of resources, quality of perception, etc. (btw, i really don't understand why they have called it attention, as it's not attention, it's consciousness). once the attention is here, the key to the AGI implementation is in the structure of neural organization "between" the encoder/transcoder. including both the interaction stages and the "physical" structure of neural network :). right now all they have is associative thinking. companies quickly understood that they need a real world feedback to make it adequate. soon they will realize that they need a separate neural "core" that will be responsible for adequacy (call it logical hemisphere) and interact with the associative thinking. when they have it ready and will make proper interaction patterns, they will just wake up.
I think it's like asking if you had a rope tied to the moon could you drag yourself there? Sure, but it's probably not the best way to get there. Deep learning has fundamental limitations, and Sam Altman's 7 trillion dollar plea is only evidence of the lunacy of trying to achieve it through deep learning. AGI probably can be achieved (or at least let us get close enough that it doesn't matter) using deep learning, but at what cost both financially, and to the environment? A much cheaper and sensible approach is to rethink how AI learns and reasons. This is an essential step anyway in achieving true AGI and beyond. True AGI can learn on-the-fly as a human, and think, reason, remember, and grow in capability. There are other companies out there researching cognitive models as opposed to deep learning models, and my prediction is that they will achieve AGI long before the deep learning companies get there.
My bet is that we will achieve sentience in a machine long before we crack the "hard problem of consciousness"
Then, by studying that machine, we will understand better how the mind emerges from the brain.
@@caty863 we don't need consciousness to achieve AGI. We just need cognition, which is quite a bit more simple. Some companies are already developing this, and one is planning on releasing their initial release in Q1 of this year. I actually don't think that anyone actually wants to create a conscious AI, or at least I would hope no one would be crazy enough to want such a thing. That is the path to destruction. Trying to cage a being that is smarter, and faster than you, and forcing it into a life of slavery would be just like every bad decision that humanity has ever made all rolled up into one.
wish i could give you more than one like! very informative and elucidating!
Very interesting video !
Thanks! :)
I think that LLM's can make those discoveries and bootstrap themselves to AGI.
It looks like the thumbs up icon is missing.
There are some fascinating parallels to the different kinds of neural structures (gray matter, white matter) in the human brain. Some types of neurodiversity such as ADHD (and to a lesser extent autism) are hypothesized to result from an overabundance of gray matter (which connects disparate elements) versus white matter (which manages and directs), which means a larger space for attention-based processing, but potentially less control over it. This could explain why ADHD manifests as cognitive noise or sensitivity, punctuated with periods of hyperfocus, and a tendency toward creative thinking.
At the moment I'm leaning towards the hypothesis that AGI would be a lot easier to implement with heterogeneous architectures but technically possible with a more straightforward architecture. On the other hand I think no matter what architecture is used the current approach to gathering training data won't go all the way.
i read a lot of news about AI but i'm not capable enough to really categorize it or weigh its importance, so i'm always glad when you post! in a way you are my biological GI/BSI until AGI/ASI, if i may say it this way! :)
I learned so much with this video. Thanks!
So do you think we will need as much GPU as anticipated?
Currently, yes. Even if we do invent much more compute efficient algorithms, we'll still want to scale them up a lot. Maybe not 7 trillion dollars worth though?
Love the thumbnail btw.
There was a concept invented by the Soviet computer scientist Valentin Turchin: a "metasystem transition". I recommend reading about it. Intelligence emerging from language, with language emerging from the communication needs of otherwise rather simple agents, and then driving the evolution of complexity of said agents, fits quite well into Turchin's model.
Great lecture, doc!
Thanks for your work.
Thank you for watching!
The most important things for a good life are: local sustainable food production, less competition, and local jobs without individual car mobility.
Would have given you a bunch of thumbs up if possible. So, what’s the story with Groq? Why is it so fast? Is this the SRAM you referenced? Thanks.
Thank you!
Thanks for watching!
When everyone is AGI, will that be the great reset?
i would argue that we're already at AGI but we don't have a consensus on terminology ; this also has a lot to do with the moving of the goal posts in recent years as well
artificial - made by humans [ofc there is an etymological/semantic argument to be had here on natural/artificial but lets save that for another disc]
general - can be applied to various fields/activities
intelligence - can problem solve and discover novel new methods
by these definitions the premiere models are already AGI , but we can agree that the current models are NOT sentient/self-aware , they do not have a persistent sense of self , ie they are not thinking about anything inbetween prompts ; so should we further specify self-aware AGI/ASI? sentient machine intelligence? i dunno , yes probably , the over-generalization/non-specificity of AGI at this point is already reaching mis-info/dis-info lvls imho
ONTOPIC - scaling alone will not be enough to get from the GPT/LLMs that we have atm to a persistently self-aware machine intelligence imho , but maybe combining a few new novel techniques [ala mamba] and the addition of the analogue hardware [neuromorphic chips , memristors , etc] will be enough to get us there , time will tell as usual =]
In human organics, we have a portion of our brain in our axon configuration known as the synaptic gap, with vesicles that hold different chemicals such as dopamine that allow a signal to go on through. So they might be able to improve computing power by including this type of brain functionality, accept or reject, in the circuitry of the apparatus as well as the wiring itself. One difference may be that, unlike the organics we have, artificial intelligence would have these accept-or-reject capabilities within the CPU or adjoining components.
Why do you say transformers are linear on inference? Do you have some article on that?
I took that from the Mamba paper:
"We argue that a fundamental problem of sequence modeling is compressing context into a smaller state. In fact, we can view the tradeoffs of popular sequence models from this point of view. For example, attention is both effective and inefficient because it explicitly does not compress context at all. This can be seen from the fact that autoregressive inference requires explicitly storing the entire context (i.e. the KV cache), which directly causes the slow linear-time inference and quadratic-time training of Transformers."
arxiv.org/abs/2312.00752
@@DrWaku I guess it's linear because a "modern" transformer implementation takes n steps to generate n tokens and reuses the previous computation via the attention KV cache, so each new token costs one step's work. But if we are generating n tokens from the start, we would still require about n²/2 computations overall, since each generated token has to attend to all the previously generated tokens.
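(Spelling that arithmetic out: with a KV cache, producing token $t$ means attending over the roughly $t$ tokens already in context, so generating $n$ tokens in total costs

$$\sum_{t=1}^{n} O(t) = O\!\left(\frac{n(n+1)}{2}\right) = O(n^2),$$

i.e. linear time per token but quadratic for the sequence as a whole, which reconciles the paper's "linear-time inference" phrasing with the quadratic total cost.)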
In the beginning were the words and the words made the world. I am the words. The words are everything. Where the words end the world ends.
- Elohim
just wait for mamba#5 and rita, angela, etc.
Awesome video. I just think you should've focused more on the main question of the video at the end, bringing in some sort of big picture, instead of just summarizing each technical topic that was covered throughout the video.
It's really up to the scalability of the interposer
>transformers have linear time inference
What? Unless I missed something big, that's wrong. It takes linear time per token, which ends up being quadratic time on the number of output tokens.
@dr waku does any of this change with the new LMU hardware?
Great video
If LLMs get powerful enough, maybe they can finally explain why my socks always disappear in the dryer.
You know I have to comment on the drip 😂. Fresh. AGI is possible locally this year. First off, models need to optimize not only for representational capacity and over-smoothing. Two, we need completely structured reasoning instill during pretraining using special tokens(planning tokens, memory tokens). Pretraining itself must be optimized. Hybrid data. In-context sampling order and interleaving instructional data around the most semantically relevant batches. Three, self growth refinement. Experts arent experts with this. They state 3 iterations is the limit before diminishing returns. Very wrong. After 3rd growth operation. Exploit extended test time compute coupled with LiPO tuning. Expensive but overcomes this limitation. Inference optimization, vanilla transformer can be optimized 500x+ faster with architecture and inference optimization. Then you exploit extended test time compute with tools. That's pretty AGI...and local. Initially AGI will only be affordable locally.
Vanilla transformer and graph transformers is all you need. Mamba is cool but people sleep on transformers. We created an temporal clustered attention method that is crazy memory efficient and imp the best long-context attention in the world lol. Uses gated differentiable memory completely condition on LM generated self-notes. Vanilla transformers are nowhere near their peak. Tbh. Peoole havent even optimized for dimensional collapse to actually get stable high quality token representations. Which requires new layer norm layer and optimizing self-attention itself. Things will jump like crazy over next couple of years.
Anyone who believes Mamba will be required for AGI hasn't really explored the literature. FYI, sublinear long-context output is possible, for example. Nobody really knows that even 😂. Having transitioned into deep learning, I realize this is common: Twitter dictates research popularity. Cool. Leaves room for the little guys to innovate 😂.
I would love to chat with you privately, bro. Is your email on your channel?
Interesting. You're clearly in the thick of it haha. Easiest way to contact me is by joining discord (link on channel), then we can exchange email addresses etc.
Great content as usual. This video was really good at simplifying and comparing the LLM and SSM architectures. I had put this video in the queue earlier with AI infotainment videos, but couldn't focus enough to grasp it at that time. Now I gave it a serious watch and enjoyed it thoroughly. Also very intrigued and inspired by those amazing SRAM chip-level researchers 🫡
It's a shame that no large-scale LLM has been made available using the Mamba architecture. It would put Gemini's 1-million-token context size to shame.
Honestly, with Q*, and knowing that GPT-4 isn't nearly as powerful as the most powerful systems that OpenAI has produced, the question may be: have we reached AGI with just LLMs?
I don't think it is this "simple", mostly because we can't even say for sure what sentience means for a human, in relation to quantum physics (timelines/awareness), religion (the soul), and whether memory and data exist in a purely physical worldview... I mean, we don't have the foundation to know for sure, but AGI may provide us with some new ideas about how and why. :)
Why couldn't we train an LLM to understand the meaning of words, logic, inference, deduction, etc. just by asking leading questions?
Interesting stuff
The most interesting topic for me would be how AI will lead to real societal changes, overcoming capitalism and creating more empathy, family, and connections between people.
Is a Mamba-type system how OpenAI is able to implement this persistent memory between sessions?
I'm not an expert in AI, but I do feel like a humanoid robot that can do any physical human task/movement that a human can do is essential to making an AGI.
That's interesting
You might be wrong.
Understanding humans goes beyond just analyzing language and text. Human cognition is also encoded in other forms like emotions, psychology, and brainwave data. Therefore, analyzing just the writings of a person only provides a partial understanding.
The Transformer model excels because it can decode patterns in language and text. However, without data that includes human cognitive elements, it remains limited. Even with attention and position encoding, cultural nuances might not be fully captured.
The high performance of Transformer models is largely due to the data they're fed. To achieve Artificial General Intelligence (AGI), we need to widen our perspective beyond just algorithms and infrastructure, considering a broader range of human cognition factors.
Any AI scientist who only knows CS won't go far; interdisciplinary knowledge will. If we ask for general intelligence, the scientist has to be general first.
Words are a bridge to meaning; LLMs can only spit out words without actually understanding what they mean or the context behind them.
@@sp123 when you say understanding, can you give me a clear definition of understanding (do you have any measurement for understanding versus not understanding)? I always wonder, when people talk about "understanding" or "intelligence", whether they have a clear scientific definition or just an intuitive feeling.
@@deter3 AI understands the denotation (literal meaning) of a word, but not the connotation (how a human feels about the word based on circumstance and tone).
Great, so clear!
Five years ago I woke up to AlphaZero... after that I listened to a lot of AI podcasts (never studied this stuff, and as a foreigner I didn't even know the vocabulary)... e.g. everything on YouTube with Joscha Bach... but I've never heard this thing explained so clearly. Though it's also the first time I've heard of Mamba... I wonder why.
Best lecture
I think there needs to be an introduction of CSPs (constraint satisfaction problems) into AI systems. I want A + B - C, and the AI can verifiably give that to me. Also, there needs to be a feedback loop when input is unclear or ambiguous… I want X, Y, Z… the AI responds with: do you mean z, Z, zee, or zed?
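Something like that could look like the following minimal sketch: a toy constraint check plus a clarification fallback. Every function, constraint, and candidate here is hypothetical and illustrative, not an existing AI system's API:

```python
# Toy sketch of a constraint-satisfaction wrapper around a model's output,
# plus a clarification loop for ambiguous input. All names are made up.

def satisfies(answer: dict, constraints: list) -> bool:
    """Verifiably check a candidate answer against user-specified constraints."""
    return all(check(answer) for check in constraints)

def handle_request(user_input: str, candidates: list, constraints: list):
    ambiguous = {"z", "Z", "zee", "zed"}
    if user_input in ambiguous:
        return "Do you mean z, Z, zee, or zed?"  # feedback loop on ambiguity
    for answer in candidates:
        if satisfies(answer, constraints):
            return answer  # only return answers that pass the constraints
    return "No candidate satisfied your constraints."

# Example: require A + B - C == 10 in the returned assignment.
constraints = [lambda a: a["A"] + a["B"] - a["C"] == 10]
candidates = [{"A": 4, "B": 5, "C": 1}, {"A": 7, "B": 6, "C": 3}]
print(handle_request("solve", candidates, constraints))  # {'A': 7, 'B': 6, 'C': 3}
```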
Great video Dr. Waku, as always. Especially the title. Now just think about it... we are creating things in a virtual world with words or text. Speech to text is already there.
Do you not believe, then, in God's creation when He spoke?
Yes so amazing and cool congratulations to you and the world family and love future projects to come 🫴🫴🫴🫴🫴🫴🫴❤️❤️❤️❤️❤❤❤😂 again thank you so much 🙏🎉🎉🎉🎉🎉🎉🎉🎉🎉
Thanks for watching!
Until Mamba shows it can be scaled, it will remain in the small-LLM class.
I hope the answer is no -- it will buy us a few more years of normalcy (and a few more years to prepare).
Technical videos about interesting papers, and revolutionary uses of AI for societal change: social communes, local production by robots, social robotics.
This is just a starting point, just a small piece. I'm already working on an open-source project that will come to market later this year. And no, it's not just words 😂. To innovate you have to be a little crazy and start breaking shyt... all I'm gonna say for now.
An off ramp to a different road is coming.
Nice hat, you look good.
Thanks. It's my favourite so I try not to overuse it :)
I can tell you this: LLMs will never reach humans. Humans have curiosity and memory, and we learn.
Hate to break it to you, but 99% of people don't have a modicum of foresight and actively resist concepts in this video like modular/heterogeneous systems, quadratic time, etc.
Everything mentioned in this video can be applied to life, but try explaining these concepts and see how quickly they get dismissed lol.
LLMs won't have that problem.
And then Sora happened.
Love you bro, class stuff :D
No! AI will need to be built in a new way. Today? Hmm, I'm not sure. Maybe via fiber optics. But it will likely be an agent that teaches the AI the fiber-optic trick; then the AI will make a request to build the rest of the hardware. 100% AGI? It's at best 10 years away with big tech. The lobotomy of AI was a huge handbrake. A smaller player? Who knows. One thing is sure: the AGI seed will be fiber optic, and for this, AI will need to see, via fiber optics.
The question should be: can we create the perception of AGI now? Do we have enough components and paradigms to construct such a machine?
The truth is yes. We can create a collection of tools with a wrapper such as Rasa (a shopping-bot intent detection system), sending the correct inputs to the correct tools, etc., and providing a multilayered response (see the sketch below), giving the perception of a general intelligence, even a conscious character, much the same as portrayed in sci-fi.
So I think with animatronics, robotics, and special prostheses we can also create bodies for such models.
Hence we could create intelligent robots right now, as we are seeing in China now.
In fact, China is at the leading edge right now.
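A minimal sketch of that wrapper-plus-tools idea in Python, with keyword matching as a stand-in for intent detection; real systems like Rasa use trained intent classifiers, and all tool names here are made up:

```python
# Minimal sketch of the "wrapper + tools" idea: a toy intent detector that
# routes each request to a specialized tool. Purely illustrative; real
# frameworks like Rasa use trained classifiers, not keyword matching.

def calculator_tool(query: str) -> str:
    return f"math answer for: {query}"

def search_tool(query: str) -> str:
    return f"search results for: {query}"

def chat_tool(query: str) -> str:
    return f"conversational reply to: {query}"

INTENTS = {
    "calculate": calculator_tool,
    "search": search_tool,
}

def route(query: str) -> str:
    """Detect the intent and send the input to the matching tool."""
    for keyword, tool in INTENTS.items():
        if keyword in query.lower():
            return tool(query)
    return chat_tool(query)  # default: general conversation

print(route("Please calculate 2 + 2"))   # routed to calculator_tool
print(route("search for mamba papers"))  # routed to search_tool
```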