Subscribe to the 20VC YouTube channel for more great interviews: www.youtube.com/@20VC
Really appreciate the levelheaded and informed perspective on the state of AI.
Informed? He’s literally wrong about scaling
The "Bank Tellers" story sounds reassuring - until you learn that lately banks have been shutting down a lot of those new branches they started.
a very disappointing analogy
ATMs were invented 55 years ago.
Arvind talks about self-driving cars as inspiration, but there are no fully self-driving cars, meaning nobody has actually solved the driving problem. Sure, they can make driving easier, but you still need a human in the loop, because when you least expect it the car will ram into a wall or something else.
The arguments made here seem very weak.
To get a better model you need three things: Scale, Compute, and Data.
For data:
We haven't used anywhere close to all the available data, and just by existing we create more data every year.
The biggest source of data is not humans but reality. You can farm infinite data from reality and our models no longer need to train off of human data.
We are currently focused on static data, but humans train on dynamic data, and models can too.
Data is only a bottleneck for people who don't understand that we have practically infinite data. And with humanoid robots, that data is a short walk away.
He has no idea what he is talking about when he talks about synthetic data. The ONLY two reasons synthetic data could hurt a model are the shrinking-distribution problem and the ungrounded-data problem.
You can make infinite synthetic data if it's grounded and has a defined distribution. Humans almost exclusively train on our own grounded synthetic data, so it's simply not a problem. Grounding a model will always make it better; it needs active ways to interact with reality and learn, rather than just static data. It can even train on its own output if you ground that output and feed it back in. If the model is wrong or doesn't know something, tell it; it's literally that simple. Data is simply not a bottleneck we will ever hit. Data is the hardest part of making a model, and we will always have problems curating it, but we will never run out of it.
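To make the grounded self-training idea concrete, here's a minimal sketch: a toy stand-in for the model proposes answers, a grounding check verifies them against reality, and only verified outputs go back into the training set. Everything here (the toy model, the arithmetic tasks) is illustrative, not anyone's actual pipeline.

```python
# Minimal sketch of grounded self-training (rejection sampling on model outputs).
import random

def toy_model_generate(problem: str) -> str:
    """Hypothetical stand-in for an LLM proposing an answer."""
    return str(random.randint(0, 20))

def grounded(problem: str, answer: str) -> bool:
    """Grounding check: compare the proposed answer against reality (here, evaluation)."""
    return int(answer) == eval(problem)

problems = ["3 + 4", "6 * 2", "10 - 7"]
training_data = []

for problem in problems:
    for _ in range(200):                      # sample many candidates per problem
        answer = toy_model_generate(problem)
        if grounded(problem, answer):         # keep only the first verified output
            training_data.append((problem, answer))
            break

print(training_data)  # verified pairs that could be fed back into training
```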
His example of riding a bike hit the nail on the head. It's not data that's the problem; it's curating it and getting the right kind of data. A lot of this data needs to be experienced, as subjective experience by the model we are training. It needs data about action and reaction in reality. Models need grounding. Grounding is why humans learn better than LLMs. And once you have well-grounded data, you can train another model on it.
For scale:
We haven't really scaled up yet. A human brain would require around 100T parameters. The only thing holding back model scale is the storage and RAM we have access to, and we are nowhere near maxing out petabytes, let alone exabytes, of storage with model weights. We have a long way to go before we max out the scale of our models.
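As a rough back-of-the-envelope check on that storage claim (assuming fp16 weights at two bytes per parameter, and ignoring optimizer state and activations, which only add a constant factor):

```latex
100 \times 10^{12}\ \text{params} \times 2\ \tfrac{\text{bytes}}{\text{param}}
  = 2 \times 10^{14}\ \text{bytes} = 200\ \text{TB} \approx 0.2\ \text{PB}
```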
He is wrong to say that more efficient models are the future, because if you train a small model with the best data and compute, a larger model trained the same way will always be better. If you think a model is good, realize that if you trained a bigger one it would be better.
We know this because it's been mathematically proven. It's done, the debate has already been settled, it's true. LLMs are universal function approximators, but the complexity and the number of functions we are trying to approximate are beyond our current models' ability to approximate completely. So training a bigger model will mathematically make it better at approximating those functions from the data points it's trained on.
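For reference, the classical result this presumably alludes to is the universal approximation theorem (Cybenko 1989; Hornik et al. 1989): a feedforward network with one hidden layer of enough units can approximate any continuous function on a compact set to arbitrary accuracy. One common statement, with K a compact subset of R^n and sigma a sigmoidal (more generally, non-polynomial) activation:

```latex
\forall f \in C(K),\ \forall \varepsilon > 0:\ \exists N \in \mathbb{N},\ \{v_i, w_i, b_i\}_{i=1}^{N}
\ \text{such that}\quad \sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} v_i\, \sigma\!\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
```

Note the theorem guarantees such an approximator exists at some width; on its own it says nothing about how training finds it.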
For compute:
This is our only real bottleneck right now, not for training but for inference. We have to somehow run inference on huge models at an economically reasonable cost and still serve replies quickly to millions of people.
Don't even get me started on dedicated inference hardware. We could easily see a 1000x improvement in inference speed and efficiency. The reason we haven't really done this yet is that LLMs are still improving very quickly, but once they settle down, or we get models that are good enough, we can build dedicated chips for specific models that dwarf our current efficiency.
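To see why inference is the pressure point, here's a rough, illustrative sketch: at small batch sizes, autoregressive decoding is roughly memory-bandwidth bound, so an upper bound on tokens per second is memory bandwidth divided by the bytes of weights read per token. All numbers below are assumptions for illustration, not measurements of any real system.

```python
# Rough upper bound on single-stream decode speed for a dense model.
params = 1e12              # hypothetical 1T-parameter dense model
bytes_per_param = 2        # fp16 weights
hbm_bandwidth = 3.35e12    # ~3.35 TB/s, roughly one H100-class accelerator (assumed)

weight_bytes = params * bytes_per_param
tokens_per_sec = hbm_bandwidth / weight_bytes   # ignores KV cache, batching, MoE sparsity

print(f"~{tokens_per_sec:.1f} tokens/sec per full pass over the weights")
# Batching, quantization, sparsity, and dedicated hardware all attack this bound,
# which is where the big claimed efficiency gains would have to come from.
```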
We have not yet trained the largest models we can. With our current data centers and data we could train models of 50T+ parameters; there's just not much point, because we can't run inference at scale for a model that big yet. We don't really need larger data centers to train larger models, we just need more time; three years of training time could get us a 20T+ parameter model. As models get more advanced, we need to let go of the idea that training will happen quickly. We should start thinking of training on the timescale it takes to raise a human kid, because as we scale up we will be training brains that approach that size.
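For a sense of how you'd sanity-check a training-time claim like that, here's a back-of-the-envelope sketch using the standard C ≈ 6·N·D FLOPs heuristic from the scaling-law literature (Kaplan et al. 2020). Every number here is an assumption chosen for illustration, not a claim about any real cluster or training run; the answer swings wildly with the token count, accelerator count, and utilization you assume.

```python
# Rough training-time estimate via the C ≈ 6 * N * D heuristic. Illustrative only.
N = 20e12                              # 20T parameters
D = 20e12                              # tokens seen during training (assumed)
flops_needed = 6 * N * D               # ~2.4e27 FLOPs

cluster_flops = 100_000 * 1e15 * 0.4   # 100k accelerators x 1 PFLOP/s x 40% utilization (assumed)

seconds = flops_needed / cluster_flops
print(f"~{seconds / (3600 * 24 * 365):.1f} years at these assumptions")   # ~1.9 years
```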
If you think of the human brain, that's how efficient training and inference can be for a 100T+ parameter model with the correct hardware. Everyone is trying to build that hardware, and even the hardware in the pipeline for the next 2 years will double training and inference efficiency.
Think about spending 5 years building a human-scale 100T+ parameter model and then making dedicated hardware to run inference on it that's as small and power-efficient as a mobile phone. That's where we are heading.
Side note about running out of benchmarks: it's a solved problem, just a really annoying one. If there are tasks you want your model to be able to do, you can just make an automated benchmark with however many questions you want for your use case. Think of it like hiring a human rather than trying to get an overview of every skill that human has. If you think trying to quantify every skill your dog or cat has is dumb, why are you trying to do it for an LLM? Build custom tests for your use case; it's that easy (hard). Also, it's literally impossible to contaminate every single benchmark; that would just mean your model is actually good at everything. Just make new benchmarks and measure how strongly they correlate with each other so you don't need to constantly run them all.
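Here's a minimal sketch of both ideas (a custom automated benchmark plus a correlation check between two benchmarks). The `ask_model` call is a placeholder for whatever model or API you actually use, and the tasks and scores are made up for illustration.

```python
from statistics import correlation  # Python 3.10+

def ask_model(prompt: str) -> str:
    """Hypothetical model call; swap in your real client here."""
    return "4"

def run_benchmark(tasks: list[tuple[str, str]]) -> float:
    """Score = fraction of tasks where the model's answer matches the expected one."""
    hits = sum(ask_model(question).strip() == expected for question, expected in tasks)
    return hits / len(tasks)

my_tasks = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
print("custom benchmark score:", run_benchmark(my_tasks))

# Track several benchmarks across many models; if two are highly correlated,
# running one of them tells you most of what the other would.
bench_a = [0.61, 0.70, 0.74, 0.81, 0.90]   # illustrative scores for 5 models
bench_b = [0.58, 0.69, 0.71, 0.84, 0.88]
print("benchmark correlation:", correlation(bench_a, bench_b))
```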
We aren't even close to being done improving, sit down and enjoy the ride. It's going to be a bumpy one.
Scale, compute, and data is insufficient. We need fundamental algorithmic breakthroughs.
@@viaMac I mean in my opinion they would definitely help, but they aren't necessary.
But regardless of what I want, we're going to get algorithmic breakthroughs anyway, because we have thousands of very intelligent people devoting their time to solving these problems. I do look forward to seeing what happens, regardless.
I feel like we all have a pretty common goal in what we want; we're all just kind of unsure how to achieve it.
Interesting view. While we wait, investing in layered agent logic and adoption is a no-brainer.
Think of it in terms of asymptotic analysis instead of whether it is possible or not. Even an O(n!) algorithm can solve a hard problem in principle, but it may not be practical given the limits of compute, the laws of physics, and the resources available. That is what we need to consider when discussing the future of AI, and an algorithmic breakthrough is definitely needed if we want to advance beyond what we have today. LSTMs and GRUs are not the only models that can retain context from training data; the transformer leapfrogged those earlier models in terms of what a deep learning model can do, achieving things they may never achieve no matter how much compute or data we give them. Similarly for LLMs, a breakthrough may be needed to leap forward to a new kind of model that achieves far more remarkable results with the same compute and data than the current state of LLMs.
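A toy illustration of the asymptotics point: an O(n!) algorithm is "possible" but becomes physically impractical almost immediately, which is why better algorithms matter more than more compute past a certain point. The operations-per-second figure is an assumed round number, not a benchmark of any machine.

```python
from math import factorial

OPS_PER_SEC = 1e12            # assume 10^12 basic operations per second
SECONDS_PER_YEAR = 3.15e7

for n in (10, 15, 20, 25):
    ops = factorial(n)
    years = ops / OPS_PER_SEC / SECONDS_PER_YEAR
    print(f"n={n}: {ops:.2e} ops ≈ {years:.2e} years")
# By n=25 this is already ~half a million years at the assumed rate; no plausible
# amount of extra compute keeps up with factorial growth.
```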
Great Interview!
Incentives aside: that's cute, but I'm gonna take the Ilyas of the world's word over this professor's. With all due respect, he's invested in the future of AI being hype and dangerous, while the deep learning and scaling folks have not only won, but scaling has consistently added new capabilities and research to the field.
How have the scaling guys won? There's been an obvious and evident flattening of capabilities after gpt4. You guys are so invested in Agi scifi bullshit u cannot accept a reality check
Agreed. Even professors aren't necessarily "experts" because at this point even people like Dario Amodei don't know if the scaling laws will continue, it's empirically sound though and so far so good. Data is not a bottleneck therefore compute again becomes a bottleneck (also electricity is becoming a bottleneck to run the compute).
Yann Lecun is one of the creators of the transformer, the breakthrough that helped bring us self-driving cars and LLMS, and he agrees more with this professor than the more ai optimistic view.
@@BlueBalledMedia you might be thinking of deep learning and CNNs, because LeCun was not even referenced in the Google transformer paper, let alone a contributor to it. While LeCun digs in on his "no smarter than a house cat" stance, the field continues to add more human capabilities with just the current architecture. The company he works for is investing in scaling and build-out like they believe it's worth it, FWIW, so he can feel however he likes.
@@SouthAfricanAmerica
Well, he is no slouch in the field; he's pushed the industry forward, so it's always important to take such critiques or pushback seriously.
Yann has been in the field for 30+ years.
Thank You Harry 🙏
Is the technology ready to create a platform for collective intelligence? The platform could merge in real time the parts of conversations that people select for input to a shared graph representation. Scale it up to merging selected parts of simultaneous conversations with millions of people around the world.
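One minimal, purely illustrative way to picture that shared graph: each selected snippet becomes a node, and snippets selected together in the same conversation get linked, so ideas that recur across many conversations become hubs. Real-time merging across millions of users would of course need streaming infrastructure, deduplication, and moderation; this only sketches the data structure.

```python
from collections import defaultdict

graph_edges = defaultdict(set)     # snippet -> snippets it has co-occurred with
node_sources = defaultdict(set)    # snippet -> conversations that contributed it

def merge_selection(conversation_id: str, snippets: list[str]) -> None:
    """Merge the snippets selected from one conversation into the shared graph."""
    for i, a in enumerate(snippets):
        node_sources[a].add(conversation_id)
        for b in snippets[i + 1:]:
            graph_edges[a].add(b)
            graph_edges[b].add(a)

merge_selection("conv-1", ["scaling is slowing", "benchmarks are saturating"])
merge_selection("conv-2", ["benchmarks are saturating", "agents need new evals"])
print(dict(graph_edges))     # snippets shared across conversations act as bridges
print(dict(node_sources))    # provenance: which conversations contributed each node
```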
Thank you for all the interviews with great people Harry👉👍👈
Was excited by the title, but lost me at bitcoin comments… persevered but found him concerningly out of touch
Traditional scaling approaches in AI are yielding diminishing returns. Simply increasing model size and training data may no longer produce significant improvements; in fact, overly large models can suffer a loss of generalization and fall back on memorization.
Current benchmarks may be inadequate for measuring the true capabilities of advanced AI systems. As models approach or potentially surpass human-level performance in various areas, traditional evaluation methods are becoming obsolete.
Programming capabilities, particularly the ability to debug large, multi-file codebases, could be a more relevant measure of AI progress. This shift towards evaluating AI based on practical, real-world tasks rather than abstract benchmarks may provide more meaningful insights.
Developing AI agents, especially those capable of research and work-related tasks, is likely to be a key focus. This approach allows for more dynamic, interactive, and open-ended AI development, potentially leading to more rapid advancements.
Human-AI collaboration in developing these agents is crucial, allowing for real-time learning, immediate feedback, and exposure to diverse problem-solving approaches.
Daily fine-tuning and continuous improvement are central to advancing AI capabilities, suggesting a move towards more adaptive and personalized AI systems.
Knowledge distillation is another area where we can expect improvements.
Using AI agents to generate synthetic training data from chat history via in-context learning could create a virtuous cycle of improvement, accelerating AI development. Of course we'll need a curated set of experts whose chat history we'll use.
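A hedged sketch of that distillation loop: use curated expert chat history as seeds, have a teacher model rewrite each exchange into a clean (prompt, response) pair via an in-context instruction, and collect the results as a fine-tuning dataset. The `teacher_generate` call, file name, and example chats are placeholders, not any real API or data.

```python
import json

def teacher_generate(prompt: str) -> str:
    """Hypothetical call to a strong teacher model; swap in a real client."""
    return "Distilled answer derived from the expert exchange."

INSTRUCTION = "Rewrite the expert exchange below as a concise, self-contained answer.\n"

expert_chats = [
    {"user": "How do I profile a slow SQL query?", "expert": "Start with EXPLAIN ANALYZE ..."},
]

synthetic_dataset = []
for chat in expert_chats:
    prompt = f"{INSTRUCTION}\nUser: {chat['user']}\nExpert: {chat['expert']}\n"
    synthetic_dataset.append({"prompt": chat["user"], "response": teacher_generate(prompt)})

with open("synthetic_finetune.jsonl", "w") as f:   # standard JSONL fine-tuning format
    for row in synthetic_dataset:
        f.write(json.dumps(row) + "\n")
```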
The future of AI development may involve a shift away from simple scaling and benchmark chasing towards more nuanced, practical, and interactive approaches to advancing AI capabilities.
30:25 thanks harry for pushing back
"sigmoids look exponential," "could just do the same thing today", these are just mollifying bs that gets retweets
He is right. When it comes to a presidential election, the stakes are so high that money and resources aren't much of a constraint, and generating convincing deepfakes was always possible. It's just that where you previously needed to Photoshop them pixel by pixel, now you can get it done by running a couple of queries. It's faster now, but deepfakes were always possible, and if they were critical to political success you would have seen widespread use from the moment Photoshop was available. If a couple of deepfakes could change a presidential election, you can bet the campaigns would have managed to generate convincing fakes of their opponents with prosthetics, masks, makeup, and Photoshop.
Lowering the barrier to creating convincing misinformation matters, and this guy's a dolt.
Disappointed to hear Arvinds comments on Bitcoin. 😢
watching this after strawberry was shown to the fed. Horrible take.