Can we reach AGI with just LLMs?

  • Published on Nov 27, 2024

Comments • 188

  • @DrWaku
    @DrWaku  9 months ago +93

    Do you like my more technical videos? Let me know in the comments.

    • @sgrimm7346
      @sgrimm7346 9 months ago +18

      Actually, the more technical the better, imo. You do seem to have a unique ability to break down concepts into simpler terms... and that's what I like about this channel. Thank you. I'm a considerably older 'techy', and have been designing my own systems for years... but with the advent of LLMs and beyond, my designs will be relegated to the dustbins of time... and I just don't have the bandwidth to learn new languages and methods. But I do like to stay informed and, as I stated earlier, you do a pretty good job at that. Anyway, thanks for what you do.

    • @DaveShap
      @DaveShap 9 months ago +19

      Yeah this helps, like AI Explained.

    • @torarinvik4920
      @torarinvik4920 9 months ago +6

      100% I actually requested this topic, so now I am thrilled (about to watch the video now :D). I love these types of videos, because there are so few of them.

    • @torarinvik4920
      @torarinvik4920 9 months ago +4

      @@DaveShap AI Explained is amazing.

    • @brandongillett2616
      @brandongillett2616 9 months ago +6

      This is among the top two videos I have seen from you. The other being the AGI timelines video.
      In both videos I think you did an excellent job of explaining the data behind the phenomenon. Not just, "hey this is going to happen next", but actually building up a cohesive understanding of what the contributing factors are to WHY something is going to happen next. The technical explanation helps to show how you came to your conclusions. Why it will head in a certain direction as opposed to another possible trajectory.
      All this is to say, I think the technical explanations you give are where your videos excel.

  • @brooknorton7891
    @brooknorton7891 9 months ago +26

    I really did appreciate this deeper dive into how they work. Just the right level of detail for me.

  • @Slaci-vl2io
    @Slaci-vl2io 9 months ago +84

    -What will Mamba call their model when they add memory to it?
    -Rememba
    😂

    • @DrWaku
      @DrWaku  9 months ago +13

      😂

    • @chrissscottt
      @chrissscottt 9 months ago +6

      Remamba?

    • @Kutsushita_yukino
      @Kutsushita_yukino 9 months ago +1

      “im your classmate from high school rememba?”

    • @MrKrisification
      @MrKrisification 9 months ago

      and make it run on a raspi 5
      - Membaberrys

  • @georgewashington7251
    @georgewashington7251 2 months ago +1

    You have a knack for explaining these complex topics in a straightforward, simple way. I subscribe to several AI-related channels and no one else does this better.

  • @RC-pg5sz
    @RC-pg5sz 9 months ago +6

    I find your videos exceptionally engaging. After each one I promise myself that I will find the time to watch them all again multiple times. They are at the level where a layperson with a serious intent can (with considerable effort) achieve a general understanding of what is going on in the field of AI. You are a first rate instructor, creating videos for folks of serious intent. I'm actually surprised that you don't have a larger following. I hope that you don't tire of this. Your work is a valuable public service. Carry on.

  • @MrKrisification
    @MrKrisification 9 months ago +6

    In my opinion this video strikes a perfect balance between technical depth and explainability. I just discovered your channel, and it's the best that I've seen on AI so far. Others get too mathematical, or focus purely on coding. The way you explain super complex concepts in simple words is just amazing. Keep it up!

  • @happybydefault
    @happybydefault 8 months ago +1

    I'm so glad I found this channel. I truly appreciate the time and energy you dedicate to make these videos, and also the high level of accuracy you provide. Thank you!
    Also, kudos for adding subtitles whenever you say something that's hard to understand. That's next-level attention to detail.

  • @K4IICHI
    @K4IICHI 9 months ago +7

    As always, a wonderfully informative breakdown! From prior reading/watching I knew Mamba had the benefit of subquadratic time complexity, but this is the first time somebody explained to me how it achieves that.

    • @DrWaku
      @DrWaku  9 months ago +3

      It's hard to explain time complexity without getting into the weeds haha. I must have done five takes of that part where I explain linear versus quadratic

  • @Paul_Marek
    @Paul_Marek 9 months ago +6

    Thx for this! Yes, the technical explanations are always good. As a non-developer there is no practical value for me but knowing how these things actually work really helps reduce the “woo-woo” of these crazy tools, which allows for better understanding of how things might actually evolve in this space. From this I don’t think there’s any chance that AGI will be pure LLM.

  • @saralightbourne
    @saralightbourne 9 months ago +7

    as a backend developer i can say heterogeneous architecture is pretty much like microservices with different technology stacks, and same scaling concept. it's gonna be real fun😏

    • @DrWaku
      @DrWaku  9 months ago +4

      Yeah! I always think the same thing. Kubernetes on the brain
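
      Running with the microservices analogy above, here is a minimal, purely illustrative sketch (editor-added; every class and function name is hypothetical, not from the video or the comments) of what a heterogeneous setup could look like: an SSM-style "memory service" folds the whole history into a small state, and a Transformer-style "reasoning service" answers from its short context window plus that compressed summary.

      ```python
      # Toy composition of two "model services" with different strengths.
      class SSMMemoryService:
          """Stands in for a Mamba/SSM-style module: unbounded context, fixed-size state."""
          def __init__(self, state_size: int = 4):
              self.state_size = state_size
              self.state: list[str] = []

          def update(self, token: str) -> None:
              # A real SSM folds each token into a fixed-size hidden state;
              # here we just keep a crude rolling summary of recent tokens.
              self.state = (self.state + [token])[-self.state_size:]

      class TransformerService:
          """Stands in for a Transformer module: strong reasoning over a short window."""
          def __init__(self, window: int = 8):
              self.window = window

          def answer(self, recent: list[str], summary: list[str]) -> str:
              context = summary + recent[-self.window:]
              return f"answer built from {len(context)} context items"

      def run(tokens: list[str]) -> str:
          memory = SSMMemoryService()
          for t in tokens:                      # long-term pathway: cost linear in length
              memory.update(t)
          reasoner = TransformerService()       # short-term pathway: attention window
          return reasoner.answer(tokens, memory.state)

      print(run("the quick brown fox jumps over the lazy dog".split()))
      ```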

  • @benarcher372
    @benarcher372 9 months ago +2

    Well I like both the more technical videos and the more broad overview of what might be in the AI pipeline and its implications on society. Thx for all your good videos. Excellent value.

  • @ADHD101Thrive
    @ADHD101Thrive 9 months ago +1

    An AGI with generalized niche algorithms that can simulate and process different types of data inputs sounds a lot like the human brain, and I agree this would be the best way towards a generalized AGI.

  • @issay2594
    @issay2594 9 months ago +2

    going to comment as i watch, for more fun :). first thing i would like to say is that many people (i don't say you) mix up two different things. they think that an "llm" is "words" because it uses words as input. it's a wrong idea. words are incoming information that creates an abstract structure that is not words. so, what is inside the LLM is not words, even though its input and output are words before encoding and after decoding. that's why models "surprise" their authors when they can, out of nowhere, "transfer" skills from one language to another, or replace a word in one language with a word from another language, without being trained for translation. the thing we create with training is "associative thinking" within the model, which exists in these connection weights between neurons, not in words. therefore, "words" are not the _key_ factor to consider when you think about whether the model is going to be sentient or not. it's more important what _structure_ is trained, _which_ data comes in, and _what_ feedback it gets when it acts. the modality is not that important. very simple.

  • @les_crow
    @les_crow 9 months ago +7

    Incredible lecture, thank you, sir.

  • @KevinKreger
    @KevinKreger 9 months ago +2

    You spent a lot of time on this one and it really shows your hard work in an impressive video!

  • @roshni6767
    @roshni6767 9 months ago +3

    Wooo! New video 🎉 you broke this down in one of the best ways I’ve seen so far

    • @DrWaku
      @DrWaku  9 months ago +1

      Thanks for your input on this one ;)

  • @magicmarcell
    @magicmarcell 9 months ago

    You have the perfect blend of being so smart I struggle to keep up with what is being said while simultaneously making it all make sense 😅. Subscribed

  • @mmarrotte101
    @mmarrotte101 9 months ago +1

    Been waiting for a technical video about Mamba just like this! Thank you and wonderful work ❤

  • @hydrohasspoken6227
    @hydrohasspoken6227 9 months ago +4

    There are groups of people talking about AGI:
    -CEOs
    -Content creators
    Let me explain: any normal AI engineer knows we are at least 11 decades too early to be thinking about AGI.

    • @raul36
      @raul36 9 months ago

      Probably more.

    • @minimal3734
      @minimal3734 8 months ago

      You're pretty much alone with your assertion.

    • @hydrohasspoken6227
      @hydrohasspoken6227 8 months ago

      @@minimal3734 , alone and right, yes.

    • @VishwanathMani
      @VishwanathMani 5 months ago

      Lmao....you will see it within your lifetime

    • @hydrohasspoken6227
      @hydrohasspoken6227 5 months ago

      @@VishwanathMani , precisely. just like the next revolutionary battery technology, full self driving tech and brain transplant will be achievable within my lifetime and my children will live happy forever after. Yay.

  • @paulhiggins5165
    @paulhiggins5165 9 months ago +4

    I think the notion that LLMs can on their own lead to AGI is a specialised expression of a much older fallacy that conflates language with reality in ways that are misleading. The best example of this is the ancient idea of 'Magic Spells', in which arcane combinations of words are seen as being so potent that they can, by themselves, alter physical reality. A more recent iteration is the idea that AI image generators can be precisely controlled using language-based prompts, as if words and images are entirely fungible and the former could entirely express in a granular way the complexity of the latter.
    But this fungibility idea is an illusion. Words, at best, act as signposts pointing to the real, but just as the menu is not the meal, LLMs are not learning about reality; they are learning about an abstract representation of reality, which means that their understanding of that reality will always be partial and incomplete.

  • @reverie61
    @reverie61 9 months ago +5

    Thank you so much bro, I really appreciate these videos!

    • @DrWaku
      @DrWaku  9 months ago +1

      Thanks for watching and commenting! It makes both me and the algorithm happy :)

  • @chrissscottt
    @chrissscottt 9 months ago +2

    Dr Waku, in response to your question, yes I like more technical videos but sometimes feel swamped by new information.

    • @DrWaku
      @DrWaku  9 months ago +1

      Yeah. I put a lot of info into the videos and when it's more technical, I must be losing some people. I guess it's good to have a mix. Thanks for your feedback.

  • @caty863
    @caty863 9 months ago +1

    I still think that the transformer was the breakthrough that inched us closer to AGI. I don't care what next algos and architectures the smart people in this industry will come up with, the transformer will keep its place in my heart.

  • @ChipWhitehouse
    @ChipWhitehouse 9 months ago +1

    Show this video to a person in the Victorian Era and they would explode 😭😭😭 I almost exploded tbh. I could not follow most of what you were saying but I still watched the entire thing. Maybe some of the info will absorb into my subconscious 🤷‍♂️ I’m fascinated by AI & AGI so I’m trying to learn as much as I can 🤣 Thank you for the content! 🙌💖💕💖

    • @roshni6767
      @roshni6767 9 months ago

      Having it all absorb into my subconscious is how I learned! 😂 after watching 10 AI videos that you don’t understand, when you go back to the first one all of a sudden it starts clicking

    • @ChipWhitehouse
      @ChipWhitehouse 9 months ago

      @@roshni6767 AWESOME! That makes me feel better 😭 I’ll keep watching and learning 🙌🤣

    • @roshni6767
      @roshni6767 9 months ago

      @@ChipWhitehouse you got this!!

  • @WifeWantsAWizard
    @WifeWantsAWizard 9 months ago +2

    (4:35) I like how Gemini has proven itself not one iota and yet features so prominently. As a matter of fact, two months ago Google had to issue an apology for faking everything, yet somehow we forgive them because deep pockets and all that.
    (6:53) Yes! This right here is a fantastic example. Instead of requiring that users express themselves in a non-lazy fashion, AI companies (run by Python coders, who by their very nature are super-lazy) have created subsystems that "guess" on your behalf so you don't have to think. If we don't require you to think, that means we can appeal to more people and their sweet sweet cash will come rolling in. This is why we'll be waiting for AGI from the Python set until Doomsday.

  • @JonathanStory
    @JonathanStory 9 months ago

    A simple-enough explanation that I can pretend to begin to understand it. Well done.

  • @h.leeblanco
    @h.leeblanco 9 months ago +1

    I'm new to this world of AI and how it works; I'm even going to study to be an IT technician because I'm super into this, and I want to see the evolution of AI from the field and work actively on its development here in Chile. I really appreciate your video; you are quite educational on the subject. I already subscribed, so I hope to watch more new videos from the channel!

  • @LwaziNF
    @LwaziNF 9 months ago +2

    Thanks for your channel bro.. totally love the focus!

    • @DrWaku
      @DrWaku  9 months ago +1

      Appreciate you watching and commenting! It's your support that helps the channel grow.

  • @Sci-Que
    @Sci-Que 9 months ago +3

    I do believe we will get to AGI. It makes sense that we will get there through a symbiotic relationship between technologies as you pointed out in the video. Mamba coupled with other platforms. My question is, with the definition of AGI being a constantly moving target, once we get there will we even realize it?

  • @abdelkaioumbouaicha
    @abdelkaioumbouaicha 9 months ago +2

    📝 Summary of Key Points:
    📌 Large language models have the potential to be a cornerstone of artificial general intelligence (AGI) within the framework of heterogeneous architectures.
    🧐 Different paths to AGI include copying biology more accurately, using spiking neural networks, and the scaling hypothesis of current large language models.
    🚀 Heterogeneous architectures, combining different algorithms or models, can leverage the strengths of different systems, such as Transformers and Mamba.
    🚀 Transformers excel at episodic memory, while Mamba is good at long-term memorization without context constraints.
    🚀 Transformers use an attention mechanism to handle ambiguity and select the best encoding for each word, allowing linear interpolation between words and consideration of context.
    🚀 Mamba is a new architecture based on state space models (SSMs) with a selective SSM layer and a hardware-aware implementation, offering scalability and performance optimization.
    🚀 Heterogeneous architectures that incorporate both Transformers and SSM architectures like Mamba have potential in AGI systems.
    🚀 Leveraging the significant investment in Transformers can benefit future AGI systems.
    💡 Additional Insights and Observations:
    💬 [Quotable Moments]: "The idea is that a combination of different systems with different strengths can be leveraged in a heterogeneous architecture."
    📊 [Data and Statistics]: No specific data or statistics were mentioned in the video.
    🌐 [References and Sources]: No specific references or sources were mentioned in the video.
    📣 Concluding Remarks:
    The video highlights the potential of large language models, such as Transformers, and the new architecture of Mamba in the context of artificial general intelligence (AGI) and heterogeneous architectures. By combining different systems with different strengths, AGI systems can benefit from the scalability, performance optimization, and attention mechanisms offered by these models. Leveraging the significant investment in Transformers can contribute to the development of future AGI systems.
    Generated using TalkBud

  • @NopeTheory
    @NopeTheory 9 months ago +2

    A video about “full dive VR” would be great

  • @Ring13Dad
    @Ring13Dad 9 months ago +2

    This level of explanation is right up my alley. Thank you Dr. Waku! It's my opinion that Altman should pump the brakes on the multi-trillion dollar investment until we complete more research. What about neuromorphic vs. von Neumann architecture?

    • @DrWaku
      @DrWaku  9 months ago +2

      Yeah it's always wise to take it slow but everyone's individual incentives are to take it fast unfortunately. I made a video on neuromorphic computers actually. Search my channel for neuromorphic, I think it was two videos before this one

  • @MrRyansittler
    @MrRyansittler 9 months ago +2

    Long-form and the people rejoice😂 love your content.

    • @DrWaku
      @DrWaku  9 months ago

      Hah. The shorts are just to whet your appetite when I'm late on my publishing schedule ;) I think 99% of my subs have come from the long form. Maybe shorts aren't even worth it.

  • @Wanderer2035
    @Wanderer2035 9 months ago +2

    I think there needs to be a physical factor that the AI needs to master in order to complete the puzzle of AGI. AGI basically means an AI that can do ANYTHING that a human can do. An LLM may know all the steps and different parts of mowing a lawn, but if you place that LLM in a humanoid robot, will it know how to actually mow the lawn? It’s like training to be a brain surgeon: you can know all the different parts from studying books upon books, but it’s not until you go out into the field and do it that you really know brain surgery.

    • @DrWaku
      @DrWaku  9 months ago +2

      Agreed. Motor control and the physical experience of being in a body shape humans dramatically. Interestingly, there are already some pretty good foundation models for robotics that allow the control of many different types of bodies. I wonder if manipulating the world would just be a different module in AGI. But it would also need access to all that reasoning knowledge.

  • @robadams2451
    @robadams2451 9 months ago

    Interesting to hear how forgetting has such importance. It echoes how important it is for us to operate as well. I suspect our minds are essentially created by the flow of input and our reactions to the flow guided by residual stored information from the past. I wonder if future systems might need a constant sampling of available information, a permanent state of training.

  • @erkinalp
    @erkinalp 9 months ago

    Thanks a lot for including the Ryzen example.

  • @fireglory23
    @fireglory23 9 months ago

    hi! i really love your videos and how good and succinct of a speaker you are, i wanted to mention that your videos have tiny mouth clicking sounds / artifacts in them. it's a common audio artifact, they can be edited out by adobe audition, audacity, or avoided with a mic windscreen

  • @WhiteThumbs
    @WhiteThumbs 9 months ago +1

    I'll be happy when they can draw a track in FreeRider HD

  • @markuskoarmani1364
    @markuskoarmani1364 9 months ago

    When you said "transformer attention" I burst out laughing for a straight 10 minutes.

  • @jpww111
    @jpww111 9 months ago

    Thank you very much. Waiting for the next one

  • @Libertarianmobius1
    @Libertarianmobius1 9 months ago +3

    Great content

  • @hydrohasspoken6227
    @hydrohasspoken6227 9 months ago

    There are 3 groups of people talking about AGI:
    -CEOs
    -Content creators
    -Dreamers
    Let me explain: any normal AI engineer knows we are at least 11 decades too early to be thinking about AGI.

    • @DrWaku
      @DrWaku  9 months ago

      110 years away? That puts you in about the most pessimistic 1-2% of the world right now
      www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/

    • @raul36
      @raul36 9 months ago

      @@DrWaku Obviously, 110 years is too long, but it is true that the only people talking about this are precisely those who have no idea about artificial intelligence. Hey, it's okay. These types of people have always existed. In fact, the Bible is what it is because of people like that. No offense, but you are trying to sell something that is not going to happen in the short or medium term. Sincerely: someone working in the industry. You can justify it to me in a thousand ways. Sam is a young businessman who recently bought land with the company's own money for his own benefit. That doesn't seem very charitable, really. It's okay that you try to lie to people. Many people make a living like this, making others believe that they know how to do something when they have no idea. And probably a lot more people than you think. It is precisely the people who work in the industry who do not speak, and they are the ones who need to be heard. I only see content creators and CEOs talking about it. Well, and the geeks.

  • @earthtoangel652
    @earthtoangel652 9 months ago

    Thank you so much I really appreciate the information presented the way it is in this video 🙏🏽

  • @issay2594
    @issay2594 9 months ago +1

    well, you are concentrating here on the attention mechanisms, but i suppose that the various attention methods are not the key technology for AGI. basically, for AGI, it doesn't matter which attention mechanism you have as long as you _have attention_. the only differences are in details like efficiency in terms of resources, quality of perception, etc. (btw, i really don't understand why they called it attention, as it's not attention, it's consciousness). once attention is there, the key to implementing AGI is in the structure of neural organization "between" the encoder/transcoder, including both the interaction stages and the "physical" structure of the neural network :). right now all they have is associative thinking. companies quickly understood that they need real-world feedback to make it adequate. soon they will realize that they need a separate neural "core" that is responsible for adequacy (call it a logical hemisphere) and interacts with the associative thinking. when they have it ready and make proper interaction patterns, they will just wake up.

  • @BruceWayne15325
    @BruceWayne15325 9 months ago +2

    I think it's like asking if you had a rope tied to the moon could you drag yourself there? Sure, but it's probably not the best way to get there. Deep learning has fundamental limitations, and Sam Altman's 7 trillion dollar plea is only evidence of the lunacy of trying to achieve it through deep learning. AGI probably can be achieved (or at least let us get close enough that it doesn't matter) using deep learning, but at what cost both financially, and to the environment? A much cheaper and sensible approach is to rethink how AI learns and reasons. This is an essential step anyway in achieving true AGI and beyond. True AGI can learn on-the-fly as a human, and think, reason, remember, and grow in capability. There are other companies out there researching cognitive models as opposed to deep learning models, and my prediction is that they will achieve AGI long before the deep learning companies get there.

    • @caty863
      @caty863 9 months ago

      My bet is that we will achieve sentience in a machine long before we crack the "hard problem of consciousness"
      Then, by studying that machine, we will understand better how the mind emerges from the brain.

    • @BruceWayne15325
      @BruceWayne15325 9 months ago

      @@caty863 we don't need consciousness to achieve AGI. We just need cognition, which is quite a bit more simple. Some companies are already developing this, and one is planning on releasing their initial release in Q1 of this year. I actually don't think that anyone actually wants to create a conscious AI, or at least I would hope no one would be crazy enough to want such a thing. That is the path to destruction. Trying to cage a being that is smarter, and faster than you, and forcing it into a life of slavery would be just like every bad decision that humanity has ever made all rolled up into one.

  • @paramsb
    @paramsb 9 months ago

    wish i could give you more than one like! very informative and elucidating!

  • @tom-et-jerry
    @tom-et-jerry 9 months ago +2

    Very interesting video !

    • @DrWaku
      @DrWaku  9 months ago

      Thanks! :)

  • @pandoraeeris7860
    @pandoraeeris7860 9 months ago +1

    I think that LLM's can make those discoveries and bootstrap themselves to AGI.

  • @brooknorton7891
    @brooknorton7891 9 months ago +1

    It looks like the thumbs up icon is missing.

  • @blackshard641
    @blackshard641 9 months ago

    There are some fascinating parallels to the different kinds of neural structures (gray matter, white matter) in the human brain. Some types of neurodiversity such as ADHD (and to a lesser extent autism) are hypothesized to result from an overabundance of gray matter (which connects disparate elements) versus white matter (which manages and directs), which means a larger space for attention-based processing, but potentially less control over it. This could explain why ADHD manifests as cognitive noise or sensitivity, punctuated with periods of hyperfocus, and a tendency toward creative thinking.

  • @quickdudley
    @quickdudley 9 months ago

    At the moment I'm leaning towards the hypothesis that AGI would be a lot easier to implement with heterogeneous architectures but technically possible with a more straightforward architecture. On the other hand I think no matter what architecture is used the current approach to gathering training data won't go all the way.

  • @lucilaci
    @lucilaci 9 months ago

    I read a lot of news about AI but I'm not capable enough to really categorize it or weigh its importance, so I always like when you post! In a way you are my biological GI/BSI until AGI/ASI, if I may say it this way! :)

  • @VictorGallagherCarvings
    @VictorGallagherCarvings 9 months ago

    I learned so much with this video. Thanks!

  • @kidooaddisu2084
    @kidooaddisu2084 9 months ago +2

    So do you think we will need as many GPUs as anticipated?

    • @DrWaku
      @DrWaku  9 months ago

      Currently, yes. Even if we do invent much more compute efficient algorithms, we'll still want to scale them up a lot. Maybe not 7 trillion dollars worth though?

  • @pandoraeeris7860
    @pandoraeeris7860 9 months ago +1

    Love the thumbnail btw.

  • @vitalyl1327
    @vitalyl1327 9 months ago +1

    There was a concept invented by the Soviet computer scientist Valentin Turchin: a "metasystem transition". I recommend reading about it. Intelligence emerging from language, with language emerging from the communication needs of otherwise rather simple agents, and then driving the evolution of complexity of the said agents, fits quite well into Turchin's model.

  • @Daniel-Six
    @Daniel-Six 9 months ago +1

    Great lecture, doc!

  • @bobotrutkatrujaca
    @bobotrutkatrujaca 9 months ago +2

    Thanks for your work.

    • @DrWaku
      @DrWaku  9 months ago +1

      Thank you for watching!

  • @olegt3978
    @olegt3978 9 months ago

    Most important things for good life are: local sustainable food production, less competition, local jobs without individual car mobility.

  • @ScottSummerill
    @ScottSummerill 9 months ago

    Would have given you a bunch of thumbs up if possible. So, what’s the story with Groq? Why is it so fast? Is this the SRAM you referenced? Thanks.

  • @Totiius
    @Totiius 9 months ago +3

    Thank you!

    • @DrWaku
      @DrWaku  9 months ago

      Thanks for watching!

  • @EROSNERdesign
    @EROSNERdesign 9 months ago +1

    When everyone is AGI, will that be the great reset?

  • @CYI3ERPUNK
    @CYI3ERPUNK 9 months ago +1

    i would argue that we're already at AGI but we dont have a consensus of terminology ; this also has a lot to do with the moving of the goal posts in recent years as well
    artificial - made by humans [ofc there is an etymological/semantics argument to be had here on natural/artificial but lets save that for another disc]
    general - can be applied to various fields/activities
    intelligence - can problem solve and discover novel new methods
    by these definitions the premier models are already AGI , but we can agree that the current models are NOT sentient/self-aware , they do not have a persistent sense of self , ie they are not thinking about anything in between prompts ; so should we further specify self-aware AGI/ASI? sentient machine intelligence? i dunno , yes probably , the over-generalization/non-specificity of AGI at this point is already reaching mis-info/dis-info lvls imho
    ONTOPIC - scaling alone will not be enough to get from the GPT/LLMs that we have atm to a persistently self-aware machine intelligence imho , but maybe combining a few new novel techniques [ala mamba] and the addition of the analogue hardware [neuromorphic chips , memristors , etc] will be enough to get us there , time will tell as usual =]

  • @paulhallart
    @paulhallart 9 months ago

    In human organics we have a portion of our brain, in our axon configuration, known as the synaptic gap, with vesicles that hold different chemicals such as dopamine that allow a signal to go on through. So they might be able to improve computing power by including this type of accept-or-reject brain functionality in the circuitry of the apparatus, as well as in the wiring itself. One of the problems may be that, unlike the organics we have, artificial intelligence has these accept-or-reject capabilities within the CPU or adjoining components.

  • @Summersault666
    @Summersault666 9 months ago +2

    Why do you say transformers are linear at inference? Do you have an article on that?

    • @DrWaku
      @DrWaku  9 months ago +1

      I took that from the mamba paper:
      "We argue that a fundamental problem of sequence modeling is compressing context into a smaller state. In fact, we can view the tradeoffs of popular sequence models from this point of view. For example, attention is both effective and inefficient because it explicitly does not compress context at all. This can be seen from the fact that autoregressive inference requires explicitly storing the entire context (i.e. the KV cache), which directly causes the slow linear-time inference and quadratic-time training of Transformers."
      arxiv.org/abs/2312.00752

    • @Summersault666
      @Summersault666 9 months ago

      @@DrWaku I guess it's linear because a "modern" transformer implementation takes n steps to generate the next token, reusing the previous computation via the attention (KV) cache. But if we are generating n tokens from the start we would require roughly (n^2)/2 computations: about n operations for each generated token, attending over the previously generated tokens.
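
      To make that counting concrete, here is a tiny back-of-the-envelope sketch (an editor-added illustration, not from the video or this thread; the function name is made up): with a KV cache, each new token attends only over the tokens cached so far, so the per-step cost is linear in the current length, while generating n tokens from scratch sums to roughly n^2/2 operations.

      ```python
      # Count attention "lookups" during autoregressive decoding with a KV cache.
      # Step i attends over the i tokens already cached, so each step is linear,
      # but generating n tokens overall costs ~n^2/2 lookups in total.
      def decode_cost(n_tokens: int) -> int:
          total = 0
          cache_size = 0
          for _ in range(n_tokens):
              total += cache_size      # attend over everything cached so far
              cache_size += 1          # append this token's key/value to the cache
          return total

      for n in (10, 100, 1000):
          print(f"n={n:>5}: total lookups={decode_cost(n):>8}  (~n^2/2 = {n * n // 2})")
      ```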

  • @eugene-bright
    @eugene-bright 9 months ago +1

    In the beginning were the words and the words made the world. I am the words. The words are everything. Where the words end the world ends.
    - Elohim

  • @erwingomez1249
    @erwingomez1249 9 months ago +1

    just wait for mamba#5 and rita, angela, etc.

  • @andregustavo2086
    @andregustavo2086 8 months ago

    Awesome video, I just think you should've focused more on the main question of the video at the end, bringing some sort of big picture, instead of just summarizing each technical topic that was covered throughout the video.

  • @kayakMike1000
    @kayakMike1000 9 months ago

    It's really up to the scalability of the interposer

  • @chadwick3593
    @chadwick3593 8 months ago

    >transformers have linear time inference
    What? Unless I missed something big, that's wrong. It takes linear time per token, which ends up being quadratic time in the number of output tokens.

  • @magicmarcell
    @magicmarcell 9 months ago

    @dr waku does any of this change with the new LMU hardware?

  • @alby13
    @alby13 5 months ago

    Great video

  • @nani3209
    @nani3209 9 months ago +4

    If LLMs get powerful enough, maybe they can finally explain why my socks always disappear in the dryer.

  • @zandrrlife
    @zandrrlife 9 months ago +1

    You know I have to comment on the drip 😂. Fresh. AGI is possible locally this year. First off, models need to optimize not only for representational capacity but also against over-smoothing. Two, we need completely structured reasoning instilled during pretraining using special tokens (planning tokens, memory tokens). Pretraining itself must be optimized. Hybrid data. In-context sampling order, and interleaving instructional data around the most semantically relevant batches. Three, self-growth refinement. Experts aren't experts with this. They state 3 iterations is the limit before diminishing returns. Very wrong. After the 3rd growth operation, exploit extended test-time compute coupled with LiPO tuning. Expensive, but it overcomes this limitation. Inference optimization: a vanilla transformer can be optimized 500x+ faster with architecture and inference optimization. Then you exploit extended test-time compute with tools. That's pretty much AGI... and local. Initially AGI will only be affordable locally.
    A vanilla transformer and graph transformers are all you need. Mamba is cool but people sleep on transformers. We created a temporal clustered attention method that is crazy memory-efficient and imo the best long-context attention in the world lol. It uses gated differentiable memory completely conditioned on LM-generated self-notes. Vanilla transformers are nowhere near their peak. Tbh, people haven't even optimized for dimensional collapse to actually get stable, high-quality token representations. Which requires a new layer-norm layer and optimizing self-attention itself. Things will jump like crazy over the next couple of years.
    Anyone who believes Mamba will be required for AGI hasn't really explored the literature. Fyi, sublinear long-context output is possible, for example. Nobody really knows that even 😂. Transitioning to deep learning, I realize this is common. Twitter dictates research popularity. Cool. Leaves room for the little guys to innovate 😂.
    I would love to privately chat with you bro. Is your email on your channel?

    • @DrWaku
      @DrWaku  9 months ago

      Interesting. You're clearly in the thick of it haha. Easiest way to contact me is by joining discord (link on channel), then we can exchange email addresses etc.

  • @LetUsBuildWithAI
    @LetUsBuildWithAI 9 months ago

    Great content as usual. This video was really good at simplifying and comparing the LLM and SSM architectures. I had put this video in the queue earlier with AI infotainment videos, but couldn't focus enough to grasp it at that time. Now I gave it a serious watch and enjoyed it thoroughly. Also very intrigued and inspired by those amazing SRAM chip-level researchers 🫡

  • @Geen-jv6ck
    @Geen-jv6ck 9 months ago

    It's a shame that no large-scale LLM has been made available using the MAMBA architecture. It would put Gemini's 1 million context size to shame.

  • @Greg-xi8yx
    @Greg-xi8yx 9 months ago

    Honestly, with Q*, and knowing that GPT-4 isn't nearly as powerful as the most powerful systems that OpenAI has produced, the question may be: have we reached AGI with just LLMs?

  • @danielchoritz1903
    @danielchoritz1903 9 months ago

    I don't think it is this "simple", mostly because we can't even say for sure what sentient means for a human, in relation to quantum physics (timelines/awareness), religion (soul), and whether memory and data exist in a physical world view... I mean, we don't have the foundation to know for sure, but AGI may provide us with some new ideas about how and why. :)

  • @leoloebs1537
    @leoloebs1537 9 months ago

    Why couldn't we train an LLM to understand the meaning of words, logic, inference, deduction, etc. just by asking leading questions?

  • @scienceoftheuniverse9155
    @scienceoftheuniverse9155 9 months ago

    Interesting stuff

  • @olegt3978
    @olegt3978 9 months ago

    The most interesting topic for me would be how AI will lead to real societal changes: overcoming capitalism and creating more empathy, family, and connections between people.

  • @ronanhughes8506
    @ronanhughes8506 9 months ago

    Is a Mamba-type system how OpenAI is able to implement this persistent memory between sessions?

  • @mistycloud4455
    @mistycloud4455 4 months ago

    I'm not an expert in AI, but I do feel like a humanoid robot that can do any physical human task/movement that a human can do is essential to making an AGI

  • @emanuelmma2
    @emanuelmma2 9 months ago +2

    That's interesting

  • @deter3
    @deter3 9 months ago

    You might be wrong.
    Understanding humans goes beyond just analyzing language and text. Human cognition is also encoded in other forms like emotions, psychology, and brainwave data. Therefore, analyzing just the writings of a person only provides a partial understanding.
    The Transformer model excels because it can decode patterns in language and text. However, without data that includes human cognitive elements, it remains limited. Even with attention and position encoding, cultural nuances might not be fully captured.
    The high performance of Transformer models is largely due to the data they're fed. To achieve Artificial General Intelligence (AGI), we need to widen our perspective beyond just algorithms and infrastructure, considering a broader range of human cognition factors.
    Any AI scientist who only knows CS won't go far; interdisciplinary knowledge will. If we ask for general intelligence, the scientist has to be general first.

    • @sp123
      @sp123 9 months ago

      Words are a bridge to meaning; an LLM can only spit out words without actually understanding what they mean or the context behind them.

    • @deter3
      @deter3 9 months ago

      @@sp123 When you say understanding, can you give me a clear definition of understanding (do you have any measurement of what counts as understanding or not understanding)? I always wonder, when people talk about "understanding" or "intelligence", whether they have a clear scientific definition or just an intuitive feeling.

    • @sp123
      @sp123 9 months ago

      @@deter3 AI understands the denotation (literal meaning) of a word, but not the connotation (how a human feels about the word based on circumstance and tone).

  • @teemukupiainen3684
    @teemukupiainen3684 9 months ago

    Great, so clear!
    Five years ago I woke up with AlphaZero... after that I listened to a lot of AI podcasts (never studied this shit and as a foreigner didn't even know the vocabulary)... e.g. everything on YouTube with Joscha Bach... but I've never heard this thing explained so clearly... though it's also the first time I've heard of Mamba... I wonder why

  • @tadhailu
    @tadhailu 9 months ago

    Best lecture

  • @br3nto
    @br3nto 8 months ago

    I think there needs to be the introduction of CSPs into AI systems. I want A + B - C and the AI can verifiably give that to me. Also there needs to be a feedback loop when input is unclear or ambiguous… I want X, Y, Z… AI responds with: do you mean z, Z, zee, or zed
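
    A minimal sketch of the kind of clarification loop described above (an editor-added illustration; the ambiguity table and function name are invented for the example): if the request contains an ambiguous term, the system asks before acting instead of guessing.

    ```python
    # Ask-before-acting loop for ambiguous input.
    AMBIGUOUS_TERMS = {"z": ["z (lowercase)", "Z (uppercase)", "zee", "zed"]}

    def respond(request: str) -> str:
        for term, readings in AMBIGUOUS_TERMS.items():
            if term in request.lower().split():
                return "Do you mean " + ", ".join(readings) + "?"
        return "OK, doing: " + request

    print(respond("I want x, y, z"))    # ambiguous -> asks which "z" is meant
    print(respond("I want A + B - C"))  # unambiguous -> proceeds
    ```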

  • @trycryptos1243
    @trycryptos1243 9 months ago

    Great video Dr. Waku, as always. Especially the title. Now just think about it... we are creating things in the virtual world with words or text. Speech to text is already there.
    Do you not then believe in God's creation when He spoke?

  • @SmilingCakeSlice-jv8ku
    @SmilingCakeSlice-jv8ku 8 months ago +1

    Yes so amazing and cool congratulations to you and the world family and love future projects to come 🫴🫴🫴🫴🫴🫴🫴❤️❤️❤️❤️❤❤❤😂 again thank you so much 🙏🎉🎉🎉🎉🎉🎉🎉🎉🎉

    • @DrWaku
      @DrWaku  8 months ago

      Thanks for watching!

  • @richardnunziata3221
    @richardnunziata3221 9 months ago

    Until Mamba shows it can be scaled, it will remain in the small-LLM class

  • @brightharbor_
    @brightharbor_ 9 months ago +1

    I hope the answer is no -- it will buy us a few more years of normalcy (and a few more years to prepare).

  • @olegt3978
    @olegt3978 9 months ago

    Technical videos about interesting papers and revolutionary use of ai for society changes, social communes, local production by robots, social robotics

  • @mrd6869
    @mrd6869 9 months ago

    This is just a starting point. Just a small piece. I'm already working on an open source project that will come to market later this year. And no, it's not just words 😂. To innovate you have to be a little crazy and start breaking shyt... that's all I'm gonna say for now.
    An off-ramp to a different road is coming.

  • @gerykis
    @gerykis 9 months ago +2

    Nice hat, you look good.

    • @DrWaku
      @DrWaku  9 months ago +1

      Thanks. It's my favourite so I try not to overuse it :)

  • @heshanlahiru2120
    @heshanlahiru2120 9 months ago

    I can tell you this: LLMs will never reach humans. Humans have curiosity and memory, and we learn.

    • @magicmarcell
      @magicmarcell 9 months ago

      Hate to break it to you, but 99% of people don't have a modicum of foresight and actively resist concepts in this video like modular/heterogeneous systems, quadratic time, etc.
      Everything mentioned in this video can be applied to life, but try explaining these concepts and see how quickly they get dismissed lol
      LLMs won't have that problem

  • @jonatan01i
    @jonatan01i 9 months ago +1

    And then Sora happened.

  • @oneman7039
    @oneman7039 2 days ago

    Love you bro, class stuff :D

  • @Karol-g9d
    @Karol-g9d 9 months ago +1

    No! AI will need to be built in a new way. Today? Mm, I'm not sure. Maybe via fiber optics. But it will likely be an agent that teaches the AI the fiber-optic trick. Then AI will make a request to build the rest of the hardware. 100% AGI? It's at best 10 years away with big tech. The lobotomy of AI was a huge handbrake. A smaller player? Who knows. One thing is sure: the AGI seed will be fiber optic, and for this, AI will need to see. Via fiber optics.

  • @xspydazx
    @xspydazx 4 months ago

    The question should be: can we create the perception of AGI now? Do we have enough components and paradigms to construct such a machine?
    The truth is yes. We can create a collection of tools, with a wrapper such as a Rasa (shopping bot) intent-detection system, sending the correct inputs to the correct tools etc. and providing a multilayered response, giving the perception of a general intelligence, even a conscious character, much the same as portrayed in sci-fi.
    So I think with animatronics and robotics and special prostheses we can also create bodies for such models.
    Hence we could create intelligent robots right now, as we are seeing in China now.
    In fact China is at the leading edge right now.