Funny that the money to pay for all these AI services will be produced by other machines/AIs... in the end the human is practically out of the equation... NWO... cite: Max Tegmark, Life 3.0
@@444haluk I'd love to see how species will fare as the sun grows into a red giant. Before you say humans won't last long with the way we are polluting the earth. Humans will survive, the numbers will vary and many may perish. But the species will survive, we are the only ones with the best chance to turn into a spacefaring civilization.
2:58 I would love an in-depth GPT-3 video, explaining how it works, the algorithms behind it, the results it has achieved, and its implications for the future.
If you're brave you can always try to read the papers, which are much more informative, but I call them a total waste of time because it's a lot. Look for "autoregressive model"; that's an easy start. But before that, read about entropy coding (Shannon). Then you'll have a little grasp of what's going on, just a little.
You can always go on youtube and search for it on your own. I recommend videos that are about 40 minutes long on the subject, else they are cutting too much of the details.
This is what I love about 2020 and the Internet. Two decades ago a channel concentrated on the eclectic scientific subjects that Lex covers would have had little activity. But I was thrilled to see that this video, released only hours ago, has a ton of comments and likes on it already, just like a typical YouTube "video star" channel! :D On the darker side, the millions of dollars required to train a network like GPT-3 do somewhat torpedo the "democratization of AI" initiative. And yes, in X years the power required to train a GPT-3 system might fit in a smartphone. But when that happens there will surely be new hardware that is as far ahead of that coming "genius" smartphone as the computing cluster that GPT-3 was trained on is ahead of the computing resources the average person can afford today. Perhaps it will be some astonishing combination of quantum computing and vast distributed parallel processing (or, as Marvin put it more humorously in The Hitchhiker's Guide to the Galaxy, a computing platform with a "brain the size of a planet"). Maybe that's just the way the Universe is and always will be??
I'd love to see what Google's/BERT's response is to GPT-3. After all, Google has the largest amount of compute resources in the world. Plus I'm sure their newest cloud TPUs can train GPT-3 much more quickly and efficiently than the many "general purpose" GPUs this exercise by OpenAI required.
Yep, but maybe we should improve ourselves too. Great technology becomes nothing when it is operated by idiots without willpower. We already lag behind the instruments that could improve our quality of life, and that is already insane.
By now a faster and more memory-efficient training method has come out: there is no need to train every synapse at each iteration, only a subset of them.
To be honest, I was expecting a figure in the ballpark of hundreds of trillions of USD, more than the entire world's GDP and stuff. USD 2.6 billion doesn't sound impossible even in 2020. Maybe I'm poisoned by reading about billions too much, and startups like WeWork being worth dozens of billions, but some company or individual investing USD 2.6 billion in 2020 to have a "maybe-too-close-to-human-like" language model, or at least something that is hundreds or thousands of times better than GPT-3, sounds feasible to me.
Now this is what the whole world needed. We were getting a bit of an idea from different articles stating what GPT-3 is, but we never really got any update or clue.👋 This is the real thing that you have talked about, Lex.👌👌👌👌👌👌 Good one....😺😺😺😺😺😺😺😺😺😺😺😺😺😺😺
Lex, it's not about how complex the box is, it's about how the box can interface with the world in a closed loop of changing the world and being changed by the changes it makes upon the world.
@@victoraguirre92 Neuralink is outdated. :) Search for "artificial dopamine neuron": we just invented it. No upload; direct brain expansion is possible. Basically we invented the possibility of a better brain for ourselves.
@Ronit ganguly on the way, we're more likely to encounter a partial AGI(which may or may not be misleading) though. Are there discussions about it(rather than full AGIs)?
@Ronit ganguly that's such an optimistic spirit :) I brought up AGI on r/machinelearning and they confused it with those sensationalist fearmongering news stories that make AGI seem to be a real-life Terminator/Matrix/Screamers, claiming AGI is just a fantasy and such. They think MLPs being math functions means they can't be intelligent or the like, even though that's a bit like saying "a bunch of proteins and electric signals can't have self-awareness". Btw AFAIK most discussions regarding AGI hazard (like on Robert Miles' channel) seem to revolve around a hypothetical 'perfect'/'full' AGI, but what about the sub-AGIs we're likely to encounter first? Would they make different mistakes due to not being as intelligent? Btw which papers have you worked on?
Since I saw openAI play dota 5v5 it was clear to me that ML is capable of amazing things. Sure there are a lot of edge cases and weird stuff, but to see human behavior (like self sacrifice for the team good, intimidation tactics, baiting, object permanence, etc) emerge from a machine was just mind blowing. It would be really nice if you did a video about it or invite someone from the team on the podcast to talk about it.
I'm pretty sure linear scaling does not apply here. I think N log(N) is probably more applicable, perhaps even N^2. AI training is an optimization problem, and optimization with linear scaling is like the holy grail. I would like to see some expert input here!
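To get a feel for how much the scaling assumption matters, here is a rough sketch comparing linear, N log N and N^2 growth when going from 175 billion to 100 trillion parameters. The base cost of about $4.6M is an assumption, picked only so that the linear case lands near the $2.6 billion quoted elsewhere in this thread, and the function name is made up for illustration.

```python
import math

GPT3_PARAMS = 175e9        # parameter count quoted in the thread
BRAIN_PARAMS = 100e12      # synapse count quoted in the thread
BASE_COST_USD = 4.6e6      # ASSUMPTION: chosen so linear scaling gives ~$2.6B


def scaled_cost(base_cost, n_old, n_new, law):
    """Scale base_cost from n_old to n_new parameters under a given law."""
    if law == "linear":
        factor = n_new / n_old
    elif law == "nlogn":
        factor = (n_new * math.log(n_new)) / (n_old * math.log(n_old))
    elif law == "quadratic":
        factor = (n_new / n_old) ** 2
    else:
        raise ValueError(f"unknown scaling law: {law}")
    return base_cost * factor


costs = {law: scaled_cost(BASE_COST_USD, GPT3_PARAMS, BRAIN_PARAMS, law)
         for law in ("linear", "nlogn", "quadratic")}

for law, cost in costs.items():
    print(f"{law:>9}: ${cost:,.0f}")
```

The N log N case only adds a modest factor on top of linear, while the quadratic case multiplies the cost by the size ratio a second time, so which law applies changes the answer by orders of magnitude.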
86 billion neurons in the human brain, with about 10,000 connections (synapses) per neuron. That works out to 860 trillion (almost 1 quadrillion) potential connections, which is what is called the "connectome" and somehow makes you, you.
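The arithmetic behind that estimate, as a tiny sketch. The two inputs are the figures from the comment above, not measured values, and the function name is just for illustration.

```python
NEURONS = 86 * 10**9             # neuron count quoted in the comment
CONNECTIONS_PER_NEURON = 10_000  # connections per neuron quoted in the comment


def total_connections(neurons, per_neuron):
    """Upper-bound estimate: every neuron contributes per_neuron connections."""
    return neurons * per_neuron


total = total_connections(NEURONS, CONNECTIONS_PER_NEURON)
print(f"{total / 10**12:.0f} trillion connections")
```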
Interesting reasoning. I wondered if we can continue doubling training efficiency at this rate. But yes... things might get interesting if there is a model containing 100 trillion parameters.
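For reference, the two cost figures quoted in this thread ($2.6 billion in 2020 and $5M in 2032) imply a particular halving time for training cost. A small sketch, assuming a steady exponential decline between those two points (that steadiness is the assumption being questioned here):

```python
import math

COST_2020_USD = 2.6e9   # figure quoted in the thread for 2020
COST_2032_USD = 5e6     # figure quoted in the thread for 2032
YEARS = 2032 - 2020


def implied_halving_months(cost_start, cost_end, years):
    """Months per halving of cost, assuming a steady exponential decline."""
    halvings = math.log2(cost_start / cost_end)
    return years * 12 / halvings


months = implied_halving_months(COST_2020_USD, COST_2032_USD, YEARS)
print(f"cost halves roughly every {months:.1f} months")
```

So the projection needs about nine halvings in twelve years, one every 16 months or so, sustained the whole way.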
@@VincentKun no he couldnt. Even with a 100 trillion parameter network you may not reach general intelligence. A scaled up language model isnt the same thing as an AGI.
@@inutiledegliinutili2308 it's kinda yes and no. For example, in Asimov's books we have Strong AI, which is basically an AGI with self-consciousness. But in reality we don't even know what consciousness is... or when it appears. So I can't answer for sure; I could also be wrong.
GPT-3 Sentence completion: Humanity is...[about to become obsolete] [unprepared for what's coming] [blissfully ignorant of the future they are about to experience]
Hi I wanted to know if GPT 4 is created what does that mean for, 1) Data Scientists, will their jobs go extinct? 2) Research in NLP, will it be complete with nothing to research?
2033 - GPT-4 Helps humanity create the Warp Drive. Asimov said in his (fiction) books that AI would help humanity accomplish just that - really really hope it comes true.
Great video, what about the fact that we don't use all 100 trillion for language but many other types of cerebral processing. Do we know how many synapses are used in the brain for language processing? Or is that not relevant as you can't process in isolation....
I like the short video, just started learning to code 3 months ago. Hopefully, I'll get the chance to work on some machine learning before the programs write themselves haha
Can you please do a long-form interview with GPT-3? Its responses depend heavily on the questions asked and I think you could extract some fascinating answers from it.
It is also worth noting that not all of the brain's synapses are relevant to language processing, so we may be even closer to human performance than we know
Human brain has many synapses yes, but most are static or moving incredibly slow. Perhaps a Trillion node neural network running at say 100Hz around the outside and quicker around sound and vision processing centers could fit on a single chip and run cool.
If Nick Bostrom is right with assuming that language capability equals general intelligence - then, by your estimation Lex, we will have AGI by 2032. "If somebody were to succeed in creating an AI that could understand natural language as well as a human adult, they would in all likelihood also either already have succeeded in creating an AI that could do everything else that human intelligence can do, or they would be but a very short step from such a general capability" (Bostrom 2014)
There was a paper, "Adaptive Flight Control With Living Neuronal Networks on Microelectrode Arrays" by Thomas B. DeMarse and Karl P. Dockendorf, where they connected a chip to rat neuron cells and used it to train a "literal" neural network to create an adaptive flight control in a flight simulator. The weights of the network are adjusted by killing or stimulating the growth of cells at specific locations via low/high frequency electric pulses. There was also another channel, "The Thought Emporium", where the content creator cultured a petri dish full of commercially bought human neuron cells (grown from induced stem cells, so no human was harmed), hooked them up with electrodes and attempted to do some basic neural network tasks like number recognition with them. So technically a neural network might be replicated within a human brain with a bit more technology, although that machine version of a neural network might not be the same as the natural version.
Does training complexity for neural networks really depend linearly on size? A bigger network requires more training data and more epochs. So, IMHO it will be at least a quadratic dependence (or even more).
That is an interesting idea that you might unlock more capabilities at 100 trillion synapses. However even at 175 billion parameters, GPT-3 was hitting a performance ceiling that had more to do with the algorithm not actually having conscious thought. In other words, simply having access to more training data and more parameters will only get you so far before you need to have conscious thought which then steps into the metaphysical...
I think 100 trillion (or even an order of magnitude or so lower) is a reasonable upper bound for language in and of itself. This is just NLP. The brain does much more and there's reason to believe it can work with abstract primitives, archetypes and a limited geometry subset, which this GPT doesn't really cover. While I believe we're closer to an AI we can write to than we think, we'll need similar effort in a few other key domains required for making sense of the world more generally.
You better get some impressive scalability for that price, because you can get quite a few talented people to be "as smart as humans" for $2.6 billion.
GPT-4 would be about 400TB on disk, and GPT-3 was trained on roughly 2TB of data (499B tokens). If GPT-4 is trained on the same data it would be overfitting it in less than 1 epoch; it would just memorize it perfectly. I agree with what the community is saying, that the number of parameters and synapses isn't everything. The brain has different types of synapses, processes information asynchronously, and has other types of supporting neural cells; there is just no comparison between a biological neuron and an ANN. What is interesting is to see how the GPT architecture/methodology can be extended beyond a language model, incorporating elements from cognitive systems such as planning, reasoning, etc. for more generalized intelligence. Even if just in the NLP domain, we should start to see something getting closer to AGI.
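The 400TB figure appears to follow from a 100-trillion-parameter model stored at 4 bytes per parameter. That storage format is an assumption here, as are the helper names; the token and parameter counts are the ones quoted in this thread. A rough sketch:

```python
BYTES_PER_PARAM = 4  # ASSUMPTION: weights stored as 32-bit floats


def model_size_tb(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Raw weight storage in decimal terabytes (1 TB = 10**12 bytes)."""
    return n_params * bytes_per_param / 10**12


def tokens_per_param(n_tokens, n_params):
    """How many training tokens there are for each parameter."""
    return n_tokens / n_params


big = 100 * 10**12    # hypothetical 100-trillion-parameter model
gpt3 = 175 * 10**9    # GPT-3 parameter count quoted in the thread
tokens = 499 * 10**9  # training tokens quoted in the comment

print(f"100T model: {model_size_tb(big):.0f} TB")
print(f"GPT-3:      {model_size_tb(gpt3):.1f} TB")
print(f"tokens per parameter, GPT-3:      {tokens_per_param(tokens, gpt3):.2f}")
print(f"tokens per parameter, 100T model: {tokens_per_param(tokens, big):.4f}")
```

With the same data, the larger model would see far less than one token per parameter, which is the intuition behind the memorization worry.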
GPT-3 is able to play chess via in-context learning, but not to a mastery level. do you think GPT-x, without task-specific training, could play chess at a mastery level?
It took OpenAI about a year and a half to go from GPT2 to 3. Figure they will continue this path with 5-10X increase as the cost to train the network drops.
Shouldn't we also take into account the fact that computing gets cheaper and cheaper? Or is this advancement negligible compared to the one of the training efficiency of NNs? I remember Jim Keller said in your podcast that Moore's law is not dead yet. Love from 🇮🇹
It's 3.14E+23 FLOP, with no S. FLOP stands for floating-point operation, and FLOPS is floating-point operations per second. So an amount of computation is measured in FLOP, and the computing power of a machine is measured in FLOPS.
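The distinction shows up as soon as you divide one by the other: FLOP divided by FLOPS gives seconds. A minimal sketch, where the 1 petaFLOPS machine is a hypothetical round number and not the hardware actually used:

```python
TOTAL_FLOP = 3.14e23     # amount of computation quoted in the comment
SUSTAINED_FLOPS = 1e15   # ASSUMPTION: hypothetical machine sustaining 1 petaFLOPS

SECONDS_PER_DAY = 86_400


def training_days(total_flop, flops):
    """FLOP / FLOPS = seconds; convert the result to days."""
    seconds = total_flop / flops
    return seconds / SECONDS_PER_DAY


days = training_days(TOTAL_FLOP, SUSTAINED_FLOPS)
print(f"about {days:,.0f} days on the assumed machine")
```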
By 2040, I do think we will not only have a better training efficiency but also a better learning efficiency (this might come from model architecture and learning algorithm improvements).
Awesome video, I hope you do keep making more like this. One question though, you account for the improving efficiency of neural networks leading to less expensive training, but is it not also true that compute will continue to get cheaper as well?
@stack Whale if it is, that's kinda what I'm asking about. I don't actually know if compute is factored into the increased efficiency he is talking about.
@stack Whale I honestly wasn't trying to "click bait" anyone. If my question doesn't apply, I am happy to discuss that. I'm pretty sure we are actually both just on different pages and discussing this through the low bandwidth of the comments section clearly isn't going to rectify that issue. I frankly really don't appreciate this comment, especially as someone who is just trying to gain knowledge.
To some degree it might be fairer to compare only a limited subsection of the brain to GPT-3. As GPT-3 doesn't have to perform sensory processing, homeostasis regulation, motor control. It doesn't make judgments regarding threats or rewards. At least not yet.
Lex, you're a smart guy. If the object is to re-create a human brain via a computer and software, then I believe this can be done much cheaper. I don't think it has anything to do with neural nets or trillions of synapses. The biggest problem I see with all of this is not the technical aspects but rather the discovery that you cannot have a true intelligence without consciousness. To me, consciousness is a pre-requisite for real intelligence. Then with an artificial consciousness, there come lots of other issues like morality, laws, rights, representation, etc.
Depending on the speed at which they attempt to develop it, the cost of creating GPT-Human level could reach billions. However, there are significant challenges they must overcome. For instance, addressing short-term memory loss and the even more critical issue of catastrophic forgetting is essential. These obstacles prevent them from seamlessly combining too many models to create a supermodel. Doing so would necessitate reconstructing the models from scratch. When the announcement of AGI achievement eventually arrives, skepticism is warranted. It’s unlikely to be a pure AGI but rather an evolution of existing models-perhaps an advanced GPT-5 or a cleverly woven amalgamation of specialized agents.
Why do we need bigger and more complex models? We don't even have useful applications for most of the existing NN models. We should start with simpler things, like this: th-cam.com/video/v1Nb_Da48og/w-d-xo.html And then integrate such AI successively into our everyday life, so that society can adapt and learn to work with it. Like with computers. But Americans tend to build the ONE BIG Skynet model, for some reason? I'm a fan of small and specialized models. That makes them more efficient.
What about quantum computing in training models? How might that affect future ML algorithms and their associated costs? Any idea if there is work going on with quantum computing and training models?
Yes, it passes the Turing test for language models, but it doesn't know what an apple is. That aspect is what I'm watching out for. Exciting times. GPT-3 would be amazing at grading grammar papers, though.
In 1980, a Digital VAX 780 sold for USD 500,000. Today a Raspberry Pi 4, with at least 1000 times more computing power than the VAX 780, costs only USD 50. So, I would say the most important thing is not the cost, but the idea. If the idea is feasible and has infinite potential, then people will do their best to bring down the cost.
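Putting those two data points together gives a rough price-performance trend. A sketch using only the numbers in the comment; treating "today" as 2020 is an assumption, and performance is in arbitrary units with the VAX as 1:

```python
import math

VAX_PRICE_USD, VAX_PERF = 500_000, 1   # performance in arbitrary units
PI_PRICE_USD, PI_PERF = 50, 1_000      # "at least 1000 times" per the comment
YEARS = 2020 - 1980                    # ASSUMPTION: "today" means 2020


def price_performance_gain(old_price, old_perf, new_price, new_perf):
    """How many times cheaper one unit of compute became."""
    return (old_price * new_perf) / (new_price * old_perf)


gain = price_performance_gain(VAX_PRICE_USD, VAX_PERF, PI_PRICE_USD, PI_PERF)
halving_years = YEARS / math.log2(gain)

print(f"compute got {gain:,.0f}x cheaper")
print(f"cost per unit of compute halved about every {halving_years:.2f} years")
```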
I had to stop watching this around 1:30 when I failed to reconcile how a computer parameter is the same as a synapse. You aren't talking about 0 or 1 binary values, you're talking about parameters which can form equations with each other being variables from broader number sets, right? If 100 T "microsynapses" exist, those can be modeled as 0 or 1's and they can only influence the directly adjacent neurons, right? So that is a super exponential Decrease in the usefulness of the human synapses when compared to traditional computer parameters. So exponential that I don't think this math is worth exploring without reconciling that with a "parameter" up front!
I don't quite get this comparison. I guess it depends on the architecture of the neural net, but parameters I would say are equivalent to neurons. And synapses are equivalent to connections in an artificial neural net. So if each neuron in GPT-3 is connected to 1000 other neurons then we would have 175 trillion synapses in GPT-3, which is already more than the human brain. Already the parameters in GPT-3 (175 billion) would be more than the human brain's neurons (86 billion).
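One way to check which analogy fits is to count both quantities for a small fully connected network. In that setting the trainable parameters are mostly the connection weights, so the parameter count tracks connections rather than neurons. GPT-3 is a transformer, not a plain MLP, so this is only an illustration of the counting, with a made-up helper name and layer sizes:

```python
def count_mlp(layer_sizes):
    """Return (neurons, parameters) for a fully connected network.

    layer_sizes lists the width of each layer, input first.
    Input units are not counted as neurons.
    """
    neurons = sum(layer_sizes[1:])
    # One weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron.
    biases = neurons
    return neurons, weights + biases


neurons, params = count_mlp([4, 8, 2])
print(neurons, params)  # 10 58
```

Here 48 of the 58 parameters are connection weights, and the share grows as layers get wider.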
Most of the synapses in human brain have nothing to do with language and reason, huge part of brain purely deals with innervating our bodies, reading and processing inputs from sensory organs and give commands to muscles and other effector organs. So to simulate human language and reasoning behind it, it might require much less than 100 trillion parameters.
A lot of those 100 trillion parameters of the human brain are dedicated to other tasks like sight, hearing, feeling, smelling and tasting. So not all of them are dedicated to the tasks that GPT-X models are meant to perform. Thus, we could expect a far lower number of parameters for dedicated functions equivalent to the human brain's.
The human brain as a whole may have 100T synapses, but for most humans, their knowledge and skill set pertaining to any specific domain may be worth much less than that, say on the order of 1T synapses. So, in the near future, domain-specific GPT-xes could outperform most humans in most mental tasks that involve the use of natural or formal languages. That could make AI the predominant workforce in a knowledge economy.
GPT-3 has 175 billion parameters/synapses. Human brain has 100 trillion synapses. How much will it cost to train a language model the size of the human brain?
Not all of human brain synapses are used for language processing, though. It's gonna be super-human.
@@haulin I was thinking the same. Not all parts of the human brain are used to get there.
Today (2020) it costs a human life to train a human brain 🧠👀 👁 👅 👄 🩸💪 🦵
thanks Lex:)
It depends on whether the lottery ticket hypothesis holds at brain scale. If it does, the cognitive power of a much larger brain could be reached within a much smaller one.
I suspect new search mechanisms would have to be invented to discover these optimally sized architectures.
The level of brain plasticity observed in subjects who have lost part of their brains leans toward that hypothesis.
That's interesting because, if the trend continues, it will also cost $5M to train a human brain at college in 2032
College trains the human brain to be a good obedient worker slave for the big corps.. 9 to 5 9 to 5 9 to 5 9 to 5
And it extinguishes all sense of humor.
I am certain this comment was generated using GPT-3
I don't know about that trend as you would be multiplying with 0 if you did this in any civilized country.
This comment was made by Scandinavia gang
R M L college is what helped teach me Java, Python, JS, etc... but yeah, totally a scam 🙄
These short, highly focused videos are a nice mental appetizer, and it's easy to set aside 5 mins to watch them between consecutive unsuccessful model training runs.
lol, watching a model train is soon going to be a trend (❁´◡`❁)
on point man
lmao
actually my model is training and i am watching this video. lol
@@AbhishekDubey-mp3ys I have some model trains. They're HO scale. I don't play with them much any more, though.
It would be awesome to see your breakdowns on GPT-3. Explain to us dummies how it works!
th-cam.com/video/SY5PvZrJhLE/w-d-xo.html
Yannic's video does a good job explaining the paper but might be a bit long
How about writing a GPT-3 app that explains to you how it works?
it's basically the auto-predict-next-word feature on your phone after a few cups of coffee
Thank you for this post. Powerful topic. Excellent description of the potential for this platform
and hurdles involved.
Wow !
Surprised to find your comment here
GPT-800: I need your clothes, your boots and your motorcycle 😎😂
Lol
I think that model's name is T-800 :)
Don’t forget sunglasses
@@xyzabc6741 whoosh
More likely it will be GPT-5
You forget that the 100 trillion synapses don't only do language; they do vision, reasoning, biological function, fine motor control, and much more. The language part (if we can isolate it from the other parts) probably uses a fraction of those synapses.
It might be hard to quantify how many neurons are associated with language, since language, vision, hearing and touch are very much interconnected in our brains. You can't learn a language if you can't see, hear and touch.
@@postvideo97 you actually can learn a language with any of these senses (e.g. touching is enough)
The brain is so interconnected it’s hard to put a figure on how many synapses are used for a single task although estimations are good.
@@thomasreed2427 That seems like a bit of an exaggeration to me. To replicate some of the behavior of a single brain neuron (e.g. XOR), you would need 4 of our current neurons.
Let's take 10 times that and round upwards, 40≈100, to cover it more accurately. The structure of the brain could also give an additional 10, or even 100, times requirement with our type of neurons. Remember that just giving it an additional 10 times, is 10 times its current size, i.e. it could do the same job 10 different ways.
So personally I think you might need at most 100*100 = 10 000 times larger than 100 trillion. But idk ¯\_(ツ)_/¯
Wow u guys are all yt og's
I think it is important that computer scientists use the term neuron and synapse very carefully. I am a molecular biologist and to equate neurons in neural networks to biological neurons, or even a synapse, is like calling an abacus a quantum computer. I don't say this to diminish machine learning at all, I use it as a biologist, and I've been showing my whole family AI Dungeon 2 utilizing GPT-3; it really is tremendous. But there is such a large difference between computer neurons (I'll just call them nodes) and biological neurons.
Each neuron itself could be represented as a neural network with probably 100s of trillions of nodes or maybe magnitudes more, and each of those nodes would itself consist of probably thousands or millions of nodes in their own neural network. This is to say that the computation involved in determining whether there is an action potential or not is truly massive. I wish I could put this into more precise words but the complexity of even a single neuron is far, far greater than the complexity of all human systems of all times compiled into even a single object.
I will try to exemplify this using a single example in my field of expertise, microRNA. The synapse consists of multiple protein complexes that work to transmit a chemical signal from outside the cell to inside the cell. In this case, the outside signal is created by another neuron. Every one of those proteins has dozens (and probably a lot more than that) of regulatory steps along the path of its production, localization, and function. These regulatory steps happen over time and themselves consist of other molecules produced/consumed by the neuron, each of which has its own regulation.
Now let's say we have neuron 1 and it is trying to form a synapse with neuron 2. At the position where neurons 1 and 2 physically interact, communication has already happened and all the necessary players (small molecules, RNA, and protein) have been recruited to this location. The moment of truth arrives: neuron 1 has an action potential. Neuron 2 starts to assemble a brand new synapse at that location, but this does not end in the production of a new synapse. Perhaps hours or days previously, neuron 2 decoded a complex network of extracellular signals that culminated in the localization of a specific microRNA at the location of this potential synapse. At the same time that neuron 2 receives the signal from neuron 1, that microRNA is matured and made active over a period of minutes. Instead of this new synapse being formed on neuron 2, this specific microRNA causes the production of protein necessary for its completion to stop, and the whole process is aborted.
At every step of every process in normally functioning neurons, these seemingly miraculous processes are occurring. They are occurring in our billions of neurons over our entire lives, existing in a body of trillions of cells that are all equally as complex, communicating with each other always and for our entire lives.
I say this not to demean or lessen the work of you, Lex, or any other computer scientist. But I say this to humble us, for us to be a little more careful when we say so casually, "it's just computation."
Wow - thanks for the comment this is very interesting!
Reading this, what's truly miraculous to me is that the organization of these 100 billion neurons with 100-1000 trillion synapses into something that can reason and see and hear and feel and smell and remember can be thought to be explained by natural selection occurring over a mere 4-5 billion years. Anyone who's tried to simulate evolution in computers with a tiny fraction of variables of the real world should have some idea of how tiny that time really is to produce our brains by evolution, especially while recognizing that complexity explodes with increasing variables. It's miraculous then that the idea that we are designed and created isn't the normative claim.
@@namaan123 I guess you should read more about evolution than what's given in the comments lol. Designer, my ass
Ahammed Jafar Saadique I know I don’t know enough about evolution to defend that position with scientific evidence; I defend it by other means. That said is your confidence to suggest that you can defend the contrary position with scientific evidence?
@@namaan123 The study on the evolution of human brain is pretty wide and diverse. I don't know what I'm supposed to prove here. Btw I can link you to some cool reads you can do in your free time to broaden your knowledge on human brain evolution. And I'm sorry if I came as arrogant in the last comment.
humanorigins.si.edu/human-characteristics/brains
www.yourgenome.org/stories/evolution-of-the-human-brain
the cost of training will be nothing compared to all the money they make on selling this as a service :o
Funny that the money to pay for all these AI services will be produced by other machines/AIs... in the end the human is practically out of the equation... NWO... cite: Max Tegmark, Life 3.0
Let's hope it won't be about advertisements.
@@444haluk only humans have money though
@@444haluk I'd love to see how species will fare as the sun grows into a red giant. Before you say humans won't last long with the way we are polluting the earth. Humans will survive, the numbers will vary and many may perish. But the species will survive, we are the only ones with the best chance to turn into a spacefaring civilization.
baby bean That is so wrong it’s not even funny
2:58 I would love an in-depth GPT-3 video, explaining how it works, the algorithms behind it, the results it has achieved, and its implications for the future.
If you're brave you can always try to read the papers, which are much more informative, but I call them a total waste of time because it's a lot. Look for "autoregressive model"; that's an easy start, but before that read about entropy coding (Shannon). Then you'll have a little grasp of what's going on, just a little.
You can always go on youtube and search for it on your own. I recommend videos that are about 40 minutes long on the subject, else they are cutting too much of the details.
it is spyware
This is what I love about 2020 and the Internet. Two decades ago a channel concentrated on the eclectic scientific subjects that Lex covers would have had little activity. But I was thrilled to see that this video, released only hours ago, has a ton of comments and likes on it already, just like a typical YouTube "video star" channel! :D
On the darker side: the millions of dollars required to train a network like GPT-3 do somewhat torpedo the "democratization" of AI initiative. And yes, in X years the power required to train a GPT-3 system might fit in a smartphone. But when that happens there will surely be new hardware that is as far ahead of that coming "genius" smartphone as the computing cluster GPT-3 was trained on is ahead of the typical computing resources the average person can afford today. Perhaps it will be some astonishing combination of quantum computing and vast distributed parallel processing (or, said more humorously by Marvin in The Hitchhiker's Guide to the Galaxy, a computing platform with the "brain the size of a planet"). Maybe that's just the way the Universe is and always will be??
I'd love to see what Google's/BERT's response is to GPT-3. After all, Google has the largest amount of compute resources in the world. Plus I'm sure their newest cloud TPUs can train GPT-3 much more quickly and efficiently than the many "general purpose" GPUs this exercise by OpenAI required.
Yep, but maybe we should improve ourselves too. Great technology becomes nothing when it is operated by idiots without willpower. We already lag behind the instruments that could improve our quality of life, and this is already insane.
That's the price of our last invention... After that... we might at best be associate producers on everything.
The Erudite
AI sees another field to takeover: “another one” ☝️
Love these quick videos! Keep it up!
2:30 Looks like Ray Kurzweil’s prediction for the singularity is tracking pretty accurately.
Don't be so confident
Remember, that was what scientists said about the TOE back in the 1980s, and now in 2020 we are not even close
These short vids are great, keep them coming man, nice job!
I thought 175 billion parameters were a lot... They are actually! Wonderful!
These short videos are so good. Thanks for sharing them with us.
A faster and more memory-efficient training method has since come out: there is no need to train every synapse at each iteration, only a subset of them
Very sexy
More efficient learning was covered by the video
These are great Lex!! Keep em coming !
Loving these short videos.
To be honest, I was expecting a figure in the ballpark of hundreds of trillions of USD, more than the entire world's GDP and stuff.
USD 2.6 billion doesn't sound impossible even in 2020. Maybe I'm poisoned by reading about billions too much, and startups like WeWork being worth tens of billions, but some company/individual investing USD 2.6 billion in 2020 to have a "maybe-too-close-to-human-like" language model, or at least something that is hundreds or thousands of times better than GPT-3, sounds feasible to me.
Hello,
Thanks for your time and efforts! I love the idea of the short videos! I'm very grateful for all of your hard work!
Now this is what the whole world needed. We were getting a bit of an idea from different articles about what GPT-3 is, but never really got any update or clue.👋
This is the real thing that you have talked about Lex.👌👌👌👌👌👌
Good one....😺😺😺😺😺😺😺😺😺😺😺😺😺😺😺
absolutely love this!!!! need more videos and a jre appearance from you to explain Gpt 3 deeply!
Good job Lex, really like the format. Thank you for sharing the knowledge.
Lex, it's not about how complex the box is, it's about how the box can interface with the world in a closed loop of changing the world and being changed by the changes it makes upon the world.
It's not an exaggeration when people say that the most valuable possession you have is your brain...
Not for long... :)
You are your brain
@@LordAlacorn Care to explain? I heard something about Neuralink but I just can't comprehend the idea of uploading your consciousness.
@@victoraguirre92 Neuralink is outdated. :)
Search for "artificial dopamine neuron" - we just invented it, no upload, direct brain expansion is possible.
Basically we invented the possibility of a better brain for ourselves.
Alacorn Thanks
Not all human synapses are dedicated to image processing
Imagine a computer model that does
@Ronit ganguly on the way, we're more likely to encounter a partial AGI(which may or may not be misleading) though. Are there discussions about it(rather than full AGIs)?
@Ronit ganguly that's such an optimistic spirit :) I brought up AGI on r/machinelearning and they confused it with the sensationalist, fearmongering news that makes AGI seem like a real-life Terminator/Matrix/Screamers, claiming AGI is just a fantasy and such. They think MLPs being math functions means they can't be intelligent or the like, even though that's a bit like saying "a bunch of proteins and electric signals can't have self-awareness".
Btw AFAIK most discussions regarding AGI hazard(like on Robert Miles' channel) seem to revolve around a hypothetical 'perfect'/'full' AGI, but what about the sub-AGIs we're likely to encounter first? Would they make different mistakes due to not being as intelligent?
Btw which papers have you worked on?
Fantastic video, would love to see your thoughts on the potential of this technology and how you think it will impact the world
Since I saw openAI play dota 5v5 it was clear to me that ML is capable of amazing things. Sure there are a lot of edge cases and weird stuff, but to see human behavior (like self sacrifice for the team good, intimidation tactics, baiting, object permanence, etc) emerge from a machine was just mind blowing. It would be really nice if you did a video about it or invite someone from the team on the podcast to talk about it.
thank you for all the great content bro
Would honestly love to see some lectures or video essays on these subjects from you
You're awesome, Lex. Keep up the incredible work.
I'm pretty sure linear scaling does not apply here. I think N log(N) is probably more applicable, perhaps even N^2. AI training is an optimization problem, and optimization with linear scaling is like the holy grail. I would like to see some expert input here!
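To make the stakes of that question concrete, here is a minimal Python sketch comparing how much more expensive a 100-trillion-parameter model would be than GPT-3 under the three scaling laws mentioned (linear, N log N, N^2). The function name `cost_multiplier` is hypothetical, and the scaling laws are the commenter's conjectures rather than measured results:

```python
import math

GPT3_PARAMS = 175e9          # GPT-3 parameter count
BRAIN_SCALE_PARAMS = 100e12  # the 100 trillion "synapse" target from the video

def cost_multiplier(n_small: float, n_large: float, law: str) -> float:
    """Relative training cost of n_large vs n_small under an assumed scaling law."""
    if law == "linear":
        return n_large / n_small
    if law == "nlogn":
        return (n_large * math.log(n_large)) / (n_small * math.log(n_small))
    if law == "quadratic":
        return (n_large / n_small) ** 2
    raise ValueError(f"unknown scaling law: {law}")

for law in ("linear", "nlogn", "quadratic"):
    factor = cost_multiplier(GPT3_PARAMS, BRAIN_SCALE_PARAMS, law)
    print(f"{law:>9}: about {factor:,.0f}x the cost of GPT-3")
```

The gap between the linear and quadratic cases is the same factor of roughly 571 again, so which law actually holds changes the cost estimate by orders of magnitude.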
Keep making "this kind of things" please! :) Bite-sized ideas and information!
86 billion neurons in the human brain, with about 10,000 connections (synapses) per neuron, gives roughly 860 trillion (almost 1 quadrillion) potential connections. This is called the "connectome", and it somehow makes you, you.
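That estimate is a single multiplication. A minimal Python sketch, assuming the commenter's round figures (86 billion neurons, about 10,000 connections each) rather than measured values; `estimate_connections` is a hypothetical helper name:

```python
# Rough connectome estimate using the round figures quoted in the comment.
NEURONS = 86e9                # neurons in the human brain (commenter's figure)
CONNECTIONS_PER_NEURON = 1e4  # connections per neuron (commenter's figure)

def estimate_connections(neurons: float, per_neuron: float) -> float:
    """Return the estimated total number of connections."""
    return neurons * per_neuron

total = estimate_connections(NEURONS, CONNECTIONS_PER_NEURON)
print(f"{total:.2e} connections (about {total / 1e12:.0f} trillion)")
```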
Thank you lex for explaining this. I am extremely grateful for your videos and explanations
Interesting reasoning. I wondered if we can continue doubling training efficiency at this rate. But yes... things might get interesting if there is a model containing 100 trillion parameters.
Absolutely brilliant Lex .
Liked this format, small, easy to digest 🙌
Big fan! I love your videos!
with a tiny fraction of Jeff Bezos' fortune we would be able to train GPT-4 today.
Bezos could become god right now
@@VincentKun no he couldn't. Even with a 100 trillion parameter network you may not reach general intelligence. A scaled-up language model isn't the same thing as an AGI.
@@xsuploader I was joking about it, of course you can't develop self-consciousness with this GPT-3
Vincenzo Gargano Do you need self consciousness to have an agi?
@@inutiledegliinutili2308 it's kinda yes and no.
For example in Asimov's books we have Strong AI, that is basically an AGI, with self-consciousness.
But in reality we don't even know what consciousness is... or when it appears.
So I can't answer for sure, I could also be wrong
GPT-3 Sentence completion: Humanity is...[about to become obsolete] [unprepared for what's coming] [blissfully ignorant of the future they are about to experience]
all of the above
“How do you snorgle a borgle?”
GPT3: “With a snorgle.”
More of these videos! Especially from a philosophical standpoint
Good Morning Dave..
I love these little vids!
Would love for a video on Lex's work station/setup, what laptop he uses, what OS, his daily tech bag etc etc.
I really like this idea, shorts are great.
Hi I wanted to know if GPT 4 is created what does that mean for,
1) Data Scientists, will their jobs go extinct?
2) Research in NLP, will it be complete with nothing to research?
Never understood why big data looked appealing to people. You better start learning machine learning if you are into big data.
@@avinashsooriyarachchi994 yeah we can try making better models for different things but we will never achieve AGI
@@SahilP2648 you are religious, correct?
@@douglasjamesmartin no I am an atheist and a developer lol
Douglas Martin Lol, what type of question was that?
And what would be the cost of actually gathering the data on which to train the gpt3?
Amazing short video. 👍
2033 - GPT-4 Helps humanity create the Warp Drive.
Asimov said in his (fiction) books that AI would help humanity accomplish just that - really really hope it comes true.
True that, but screw that Asimov ruleset
@@zubinzuro Do you mean the four rules of robotics?
(I'm counting with the zeroth law here)
Great video, what about the fact that we don't use all 100 trillion for language but many other types of cerebral processing. Do we know how many synapses are used in the brain for language processing? Or is that not relevant as you can't process in isolation....
It's crazy that two years later we now have GPT-4, reportedly with around 1 trillion parameters.
I like the short video, just started learning to code 3 months ago. Hopefully, I'll get the chance to work on some machine learning before the programs write themselves haha
Yay Moore's Law. 🥳
More Moore. 🎉🎉
Can you please do a long-form interview with GPT-3? Its responses depend heavily on the questions asked and I think you could extract some fascinating answers from it.
It is also worth noting that not all of the brain's synapses are relevant to language processing, so we may be even closer to human performance than we know
Human brain has many synapses yes, but most are static or moving incredibly slow.
Perhaps a Trillion node neural network running at say 100Hz around the outside and quicker around sound and vision processing centers could fit on a single chip and run cool.
ok what would we do with it
Thanks Lex!
If Nick Bostrom is right with assuming that language capability equals general intelligence - then, by your estimation Lex, we will have AGI by 2032.
"If somebody were to succeed in creating an AI that could understand natural language as well as a human adult, they would in all likelihood also either already have succeeded in creating an AI that could do everything else that human intelligence can do, or they would be but a very short step from such a general capability" (Bostrom 2014)
There was a paper, "Adaptive Flight Control With Living Neuronal Networks on Microelectrode Arrays" by Thomas B. DeMarse and Karl P. Dockendorf, where they connected a chip to rat neuron cells and used it to train a "literal" neural network to create an adaptive flight control in a flight simulator. The weights of the network are adjusted by killing or stimulating growth of cells at specific locations via low/high-frequency electric pulses. There was also another channel, "The Thought Emporium", where the content creator cultured a petri dish full of commercially bought human neuron cells (grown from induced stem cells, so no human was harmed), hooked them up with electrodes, and attempted to do some basic neural network tasks like number recognition with them. So technically a neural network might be replicated within a human brain with a bit more technology, although that machine version of a neural network might not be the same as the natural version.
Does training complexity for neural networks really have a linear dependency on size? A bigger network requires more training data and more epochs. So, IMHO, it will be at least a quadratic dependence (or even more).
That is an interesting idea that you might unlock more capabilities at 100 trillion synapses. However even at 175 billion parameters, GPT-3 was hitting a performance ceiling that had more to do with the algorithm not actually having conscious thought. In other words, simply having access to more training data and more parameters will only get you so far before you need to have conscious thought which then steps into the metaphysical...
I think 100 trillion (or even an order of magnitude or so lower) is a reasonable upper bound for language in and of itself. This is just NLP. The brain does much more and there's reason to believe it can work with abstract primitives, archetypes and a limited geometry subset, which this GPT doesn't really cover. While I believe we're closer to an AI we can write to than we think, we'll need similar effort in a few other key domains required for making sense of the world more generally.
You better get some impressive scalability for that price, because you can get quite a few talented people to be "as smart as humans" for $2.6 billion.
GPT-4 would be about 400TB on disk, and GPT-3 was trained on roughly 2TB of data (499B tokens). If GPT-4 is trained on the same data it would be overfitting it in less than 1 epoch - it would just memorize it perfectly.
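The 400 TB figure can be reproduced with a short Python sketch. It assumes the hypothetical 100-trillion-parameter "GPT-4" from the video and 4 bytes per parameter (32-bit floats); the byte width is my assumption, chosen because it matches the commenter's number. The helper names are hypothetical:

```python
PARAMS = 100e12          # hypothetical 100 trillion parameter model
BYTES_PER_PARAM = 4      # assumption: each parameter stored as a 32-bit float
TRAINING_TOKENS = 499e9  # GPT-3 training set size quoted in the comment

def size_on_disk_tb(params: float, bytes_per_param: int) -> float:
    """Model size in terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12

def params_per_token(params: float, tokens: float) -> float:
    """How many parameters the model has for each training token."""
    return params / tokens

print(f"size on disk: {size_on_disk_tb(PARAMS, BYTES_PER_PARAM):,.0f} TB")
print(f"parameters per training token: {params_per_token(PARAMS, TRAINING_TOKENS):,.0f}")
```

With roughly 200 parameters for every training token, the model would have far more capacity than data, which is the intuition behind the memorization concern.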
I agree with what the community is saying, that the number of parameters and synapses isn't everything. The brain has different types of synapses, processes information asynchronously, and has other types of supporting neural cells; there is just no comparison between a biological neuron and an ANN.
What is interesting is to see how the GPT architecture/methodology can be extended beyond a language model. Incorporating elements from cognitive systems such as planning, reasoning, etc for more generalized intelligence. Albeit even if just in the NLP domain, we should start to see something getting closer to AGI.
GPT-3 is able to play chess via in-context learning, but not to a mastery level. do you think GPT-x, without task-specific training, could play chess at a mastery level?
I was under the illusion that the computational cost would rise exponentially, not linearly, if you add more nodes, is that incorrect?
*I think it is correct*
It took OpenAI about a year and a half to go from GPT2 to 3. Figure they will continue this path with 5-10X increase as the cost to train the network drops.
this is great thanks Lex
Shouldn't we also take into account the fact that computing gets cheaper and cheaper? Or is this advancement negligible compared to the one of the training efficiency of NNs? I remember Jim Keller said in your podcast that Moore's law is not dead yet.
Love from 🇮🇹
this is taking that into account
the 50% deflation every 16 months is a combination of hardware and software improvements.
@@xsuploader oh ok. thanks
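The "50% deflation every 16 months" mentioned in this thread can be turned into a rough projection. A minimal Python sketch, assuming the USD 2.6 billion 2020 estimate quoted elsewhere in the comments as the starting point and assuming the halving rate stays constant (which nobody can guarantee); `projected_cost` is a hypothetical helper:

```python
COST_2020_USD = 2.6e9  # 2020 training-cost estimate quoted in the comments
HALVING_MONTHS = 16.0  # cost halves every 16 months (figure from the thread)

def projected_cost(cost_now: float, months_elapsed: float,
                   halving_months: float = HALVING_MONTHS) -> float:
    """Cost after months_elapsed, assuming a constant halving period."""
    return cost_now * 0.5 ** (months_elapsed / halving_months)

for year in (2020, 2024, 2028, 2032):
    months = (year - 2020) * 12
    print(f"{year}: about ${projected_cost(COST_2020_USD, months):,.0f}")
```

Twelve years is nine halving periods, so the projected cost drops by a factor of 512.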
are the numbers adjusted for inflation?
It's 3.14E+23 FLOP, with no S. FLOP stands for floating-point operation, and FLOPS is floating-point operations per second. So an amount of calculation is measured in FLOP, and the computing power of a computer is measured in FLOPS.
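The distinction matters because dividing one by the other gives a time. A minimal Python sketch, assuming a purely hypothetical machine that sustains 1 petaFLOPS (1e15 FLOPS); that speed is an illustrative assumption, not the hardware GPT-3 was actually trained on:

```python
TOTAL_FLOP = 3.14e23       # total amount of computation quoted for GPT-3 (FLOP)
HYPOTHETICAL_FLOPS = 1e15  # assumption: a machine sustaining 1 petaFLOPS
SECONDS_PER_DAY = 86_400

def training_days(total_flop: float, machine_flops: float) -> float:
    """Days needed = work (FLOP) / rate (FLOPS), converted from seconds."""
    return total_flop / machine_flops / SECONDS_PER_DAY

days = training_days(TOTAL_FLOP, HYPOTHETICAL_FLOPS)
print(f"about {days:,.0f} days ({days / 365:.1f} years) on the assumed machine")
```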
They say in 100 years we achieve but years is just a number
Can GPT-3 write an award winning novel? I bet it can't!
Is there a coherent argument explaining how transistors compares to synapses for total computing power?
By 2040, I do think we will not only have a better training efficiency but also a better learning efficiency (this might come from model architecture and learning algorithm improvements).
Mind blown. We're in an overhang where so much is feasible it's just a matter of trying it out.
Awesome video, I hope you do keep making more like this. One question though, you account for the improving efficiency of neural networks leading to less expensive training, but is it not also true that compute will continue to get cheaper as well?
@stack Whale if it is, that's kinda what I'm asking about. I don't actually know if compute is factored into the increased efficiency he is talking about.
I see it as two factors of the same problem. I am happy to discuss how I may be wrong as I am always eager to learn.
@stack Whale I honestly wasn't trying to "click bait" anyone. If my question doesn't apply, I am happy to discuss that. I'm pretty sure we are actually both just on different pages and discussing this through the low bandwidth of the comments section clearly isn't going to rectify that issue. I frankly really don't appreciate this comment, especially as someone who is just trying to gain knowledge.
Our Digital Online Helper is closer than you think! And I am talking about a level of sophistication akin to that in the movie "Her".
To some degree it might be fairer to compare only a limited subsection of the brain to GPT-3. As GPT-3 doesn't have to perform sensory processing, homeostasis regulation, motor control. It doesn't make judgments regarding threats or rewards.
At least not yet.
Please make podcast on LINUS TORVALDS .
Lex, you're a smart guy. If the object is to re-create a human brain via a computer and software, then I believe this can be done much cheaper. I don't think it has anything to do with neural nets or trillions of synapses. The biggest problem I see with all of this is not the technical aspects but rather the discovery that you cannot have a true intelligence without consciousness. To me, consciousness is a prerequisite for real intelligence. Then with an artificial consciousness, there come lots of other issues like morality, laws, rights, representation, etc.
Depending on the speed at which they attempt to develop it, the cost of creating GPT-Human level could reach billions. However, there are significant challenges they must overcome. For instance, addressing short-term memory loss and the even more critical issue of catastrophic forgetting is essential. These obstacles prevent them from seamlessly combining too many models to create a supermodel. Doing so would necessitate reconstructing the models from scratch. When the announcement of AGI achievement eventually arrives, skepticism is warranted. It’s unlikely to be a pure AGI but rather an evolution of existing models-perhaps an advanced GPT-5 or a cleverly woven amalgamation of specialized agents.
Why do we need bigger and more complex models? We don't even have useful applications for most of the existing NN models.
We should start with simpler things, like that: th-cam.com/video/v1Nb_Da48og/w-d-xo.html
And then integrate such AI successively in our everyday life, so that society can adapt and learn to work with it. Like with computers.
But Americans tend to build the ONE BIG Skynet model, for some reason? I'm a fan of small and specialized models. That makes them more efficient.
When is GPT-6 coming out, Lex? Can't wait to play with it
what about quantum computing in training models? how may that affect future ML algos and its associated costs? any idea if there are works going on with quantum computing and training models?
Yes, it passes the Turing test for language models, but it doesn't know what an apple is. That aspect is what I'm watching out for. Exciting times. GPT-3 would be amazing at grading grammar papers, though.
Can you go into deep detail about how GPT-3 works? Thanks.
Please consider using scientific notation to represent very large numbers.
In 1980, a Digital Vax 780 sold at price USD 500,000.
Today a Raspberry Pi 4 with at least 1000 times more computing power than Vax 780 cost only USD 50.
So, I would say the most important thing is not the cost, but the idea.
If the idea is feasible and with infinite potential , then people will do their best to bring down the cost.
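The VAX versus Raspberry Pi comparison above implies a price-performance halving time. A minimal Python sketch using the commenter's figures, and assuming "today" means 2020 (40 years after 1980), which is my assumption since the comment is undated; the helper names are hypothetical:

```python
import math

VAX_PRICE_USD = 500_000  # VAX 780 price in 1980 (commenter's figure)
PI_PRICE_USD = 50        # Raspberry Pi 4 price (commenter's figure)
COMPUTE_RATIO = 1_000    # Pi is "at least 1000 times" more powerful
YEARS_ELAPSED = 40       # assumption: comparing 1980 with 2020

def price_performance_gain(old_price: float, new_price: float,
                           compute_ratio: float) -> float:
    """How many times more compute per dollar the newer machine delivers."""
    return (old_price / new_price) * compute_ratio

def halving_time_months(gain: float, years: float) -> float:
    """Months for cost per unit of compute to halve, assuming steady progress."""
    return years * 12 / math.log2(gain)

gain = price_performance_gain(VAX_PRICE_USD, PI_PRICE_USD, COMPUTE_RATIO)
months = halving_time_months(gain, YEARS_ELAPSED)
print(f"compute per dollar improved about {gain:,.0f}x")
print(f"implied halving time: about {months:.1f} months")
```

Because the commenter says "at least 1000 times", the gain is a lower bound and the implied halving time an upper bound.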
I had to stop watching this around 1:30 when I failed to reconcile how a computer parameter is the same as a synapse. You aren't talking about 0 or 1 binary values, you're talking about parameters which can form equations with each other being variables from broader number sets, right? If 100 T "microsynapses" exist, those can be modeled as 0 or 1's and they can only influence the directly adjacent neurons, right? So that is a super exponential Decrease in the usefulness of the human synapses when compared to traditional computer parameters. So exponential that I don't think this math is worth exploring without reconciling that with a "parameter" up front!
He did say a much larger network is likely required to approximate the brain.
I don't quite get this comparison. I guess it depends on the architecture of the neural net, but parameters I would say are equivalent to neurons. And synapses are equivalent to connections in an artificial neural net. So if each neuron in GPT-3 is connected to 1000 other neurons then we would have 175 trillion synapses in GPT-3, which is already more than the human brain. Already the parameters in GPT-3 (175 billion) would be more than the human brain's neurons (86 billion).
Most of the synapses in human brain have nothing to do with language and reason, huge part of brain purely deals with innervating our bodies, reading and processing inputs from sensory organs and give commands to muscles and other effector organs. So to simulate human language and reasoning behind it, it might require much less than 100 trillion parameters.
A lot of those 100 trillion parameters of the human brain are dedicated to other tasks like sight, hearing, feeling, smelling, and tasting. So not all of them are dedicated to the tasks that GPT-X models are meant to perform. Thus, we could expect a far lower number of parameters for dedicated functions equivalent to those of the human brain.
The singularity is rapidly approaching!
3.14E+23 flops? Did GPT-3 write the paper and mix up pi and Avogadro's number? 🤔
The human brain as a whole may have 100T synapses, but for most humans, their knowledge and skill set pertaining to any specific domain may be worth much smaller than that - say, of the order of 1T synapses. So, in the near future, domain-specific GPT-xes could outperform most humans in most mental tasks that involve the use of natural or formal languages. That could make AI the predominant workforce in a knowledge economy.