That is definitely not AGI. My understanding is that the academic literature definition of AGI states: "A type of artificial intelligence system capable of performing any intellectual task that a human being can, with equivalent versatility, efficiency, and adaptiveness." This implies that AGI must be capable of performing any intellectual task that any human is capable of, including tasks that might require extreme specialization, creativity, or rare intellectual abilities. Guessing that the border of the next square should be green and 4 pixels wide does not equate to the above. There is also still the physical world for AI models to conquer before they can be considered "AGI". If the model can play chess but can't make me a cup of tea, then I am not calling it AGI just yet...
I cannot imagine the horror that a conscious being would feel being dragged into existence through mechanical means. Obviously we're a long way away from a truly conscious machine. The problem is we won't know when a machine is truly conscious or whether it is merely mimicking consciousness. We also won't know whether there is a difference between the two. I can only imagine what it feels like for the machine when we ask it to come up with images or music for us: whether it is pleasurable or neutral, or whether it is a horrific nightmare. Our society is not ready for this sort of thing. If everyone were housed, if everyone were taken care of with healthcare and a living wage, and finally if we had permanently ended war, that might be the time to create a mind. We are playing with things that we do not understand and, more importantly, may not be able to control.
It would have been impressive if we disregard the fact that the previous models already climbed from o1-mini, which scored 7.5% or whatever, up to the high-end o1 at 35%. So technically they already knew part of the equation and are just tuning the new models to do even better. If it went from 5% to 75%, that would have been impressive. This just adds a layer of ability to the new AIs; doesn't mean shit to me, honestly.
It isn't AGI when you're creating a system designed to pass a test. In a very real sense, you'd need a system that was trained on all manner of tasks and passed a test it was completely unfamiliar with, just as animals can be presented with an entirely unfamiliar problem but manage to muddle through anyway, maybe even failing but demonstrating an attempt to resolve the problem. Creating a general library of practical knowledge and then doing a practical knowledge test is like a person studying before an exam: everyone agrees that exams are just a measure of how well someone recalls and applies knowledge. True intelligence is when a student studying math is given an art exam instead, and so has to make intelligent guesses based on speculation, with no art knowledge other than what art is.
I wrote code that made GPT-3.5 perform like AGI a year ago. But the necessary logical patterns are not evident in the training data; the only way it works is to provide the logical framework as part of the prompt. The underlying tech is no better than a probability machine. The value in the upgrades is pretty much limited to the context window; in terms of reasoning, you just end up fighting fine-tunes, though that aligns with what they want: customers who think AI can think for you instead of helping you develop your own ideas. They forgot the first rule: garbage in, garbage out. In any case, the problem is less the tech than the users.
What boggles my mind is that in the future, when products are created, whether they are movies, scripts, books, or courses, it is likely that everything will be AI-created, and whenever someone creates something without using an AI assistant, they will probably state that it's human-made to gain marketing leverage lol. Can you imagine "buy it! it's human-made" on a product's description?
That graph indicates it's way more than $1,000 per task. Looking at the scale, more like $3-5k a task. The low model looks like it's around $20 per task.
I don't think this is quite at the point where we can say we have achieved AGI, but it is definitely a big step. IF we are going to say this is AGI, then I would say it is elementary level at best.
I agree the AGI definition is vague. It's already better than probably 60% of humans at most things. What I really want is superintelligence, when it's smarter than the smartest human. I wanna ask it how to build a hoverboard, flying car, or teleportation device. When it's at that level, we'll have truly built something useful. 😊
I don't think this is AGI, as in "as good at all tasks as an average human"; it certainly didn't fulfill ARC's full requirements. That said, I think we've already made things that are good enough, when implemented in the right framework, to act as a low-to-average-performing AGI. To draw on fictional examples most people will know of, we already have VIs from Mass Effect. Most decent LLMs that can be run locally meet or exceed the capabilities of VIs; frontier models blow them out of the water. We can make something like a budget version of Legion by leveraging multiple LLMs within the correct framework. Imagine an embodied framework where specialized, trained neural nets handle things like movement, commanded by a local multimodal LLM with a large context window. The content of that window periodically gets handed off to a multimodal memory-management LLM that continually summarizes it, stores the summaries, and refreshes the command LLM's context window. Memory retrieval is available to the command LLM via a RAG LLM that fetches and pastes relevant memories into its context window, and the command LLM can make API calls to frontier models at its discretion when it encounters difficulties, all of it running on dedicated local hardware rather than sharing resources. This roughly mimics how humans work: most of what we do in life is handed off to dedicated parts of our brain and body. Our prefrontal cortex doesn't do everything on its own; it makes decisions and issues commands. (A sketch of the memory loop described here follows below.)
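A minimal sketch of the memory hand-off that comment describes, with hypothetical llm_chat, llm_summarize, and retrieve callables standing in for the three models; this is an illustration of the control flow, not a real framework:

```python
# Command LLM + memory-management LLM + RAG retrieval, as described above.
from collections import deque

CONTEXT_LIMIT = 8000  # assumed character budget for the command LLM's window

class CommandLoop:
    def __init__(self, llm_chat, llm_summarize, retrieve):
        self.chat = llm_chat            # command LLM
        self.summarize = llm_summarize  # memory-management LLM
        self.retrieve = retrieve        # RAG lookup over stored summaries
        self.context = deque()          # working context window
        self.memory_store = []          # long-term summaries

    def step(self, observation: str) -> str:
        # Pull relevant long-term memories into the working context.
        self.context.extend(self.retrieve(observation, self.memory_store))
        self.context.append(observation)
        # When the window fills, hand it to the summarizer and refresh.
        if sum(len(c) for c in self.context) > CONTEXT_LIMIT:
            summary = self.summarize(list(self.context))
            self.memory_store.append(summary)
            self.context.clear()
            self.context.append(summary)  # refreshed window starts from summary
        return self.chat(list(self.context))
```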
00:00 AGI milestone announcement
00:36 Arc benchmark explained
01:46 Visual examples
03:21 Benchmark performance
04:25 Expert reactions
05:55 Earlier predictions
06:57 Compute limitations
07:54 Model iterations
09:15 Math performance
10:39 Future outlook
11:54 Final thoughts
I TOLD YOU SO ABOUT AGI. Just ignored me. Well, here's another one. ASI by America's 250th birthday on July 4th, 2026. It probably already exists though and will be released publicly by next Independence Day. Trump 100% wants this. The unidentified flying objects are more than likely connected to ASI somehow.
Veo 2 freaked them out, they said this to calm investors down
Not yet but close.
Monday - AGI has already achieved
Tuesday - AI reached a plateau
Wednesday - AGI is just around the corner
Thursday - AGI will never be achieved
Friday - AGI will appear in 2027
Saturday - AGI will not be achieved for at least 100 years
Sunday - amazing news, AGI has just been demonstrated
Monday - AGI got memory recollection as if it got its head smashed in by a rock.
Tuesday - AGI hurrr!
Wednesday - AGI durrrr!
Thursday - AGI hurr durrrrrr!
Friday - AGI HODOR!!!!
Saturday - AGI achieves record speed in making paperclips.
Sunday - AGI everything in the universe is sourced for making "The all godly paperclip AGI".
You summarized AIgrid perfectly.
This is a YouTube video comment.
Monday - It’s just a prank bro
@@PierceTravels No way... Thank you for letting us know. (I genuinely had no idea and this was helpful.)
according to this channel we have achieved AGI like twelve times in the last few months lol
i checked and it has mentioned agi 78 times this year
@Capi_sigma_pro_coder loooool 😂
😂😂😂
@Capi_sigma_pro_coder whatever sells views i guess
gotta start reporting these channels
We got AGI before GTA 6
bruh 💀
AGI will create your own GTA 6
The universe has moved 1 second into the future since I posted this comment before GTA 6 omg! 💥💥💥💥
Maybe because it's not AGI? But I know OpenAI needs this hype back.
@@juiceman110😂
"AGI achieved" is like crying wolf, nobody will believe it anymore when it'll truly matter which is I think the scarier part. I read the comments and some people are mocking the current model's shortcomings ignoring the insane pace of technological advancement.
Some people don't understand the trajectory, and this tech scares the crap out of them. 4, 4o, o1, and now o3 is an INSANE trajectory over the last couple of years, and it's moving even faster this year.
@@TVAcct-lp7zh i can’t imagine what’ll happen next year. The speed at which this type of tech is evolving is INSANE and practically scary
"The singularity" was a misnomer. The process is gradual and continuous. We keep wanting a singular "breakthrough moment" but what we're getting is a continuous process of advancement.
Why does it matter that most ppl don't believe? Who believed computers would change the world in the 60s and 70s? Then after 2000, even my 80-year-old grandma started learning to use email and Microsoft Word.
Are you talking about open models? I hope you realize that if you are using the latest technology in your everyday life, it's probably less than 10% of what governments or huge private companies are already using :)
The problem with AGI is that the goalposts keep moving. Today's definition is not the same as 5 years ago, and the definition of 5 years ago is not the same as 10 years ago. By the time we all agree that AGI has been reached, it will actually be the lower threshold for ASI, cuz we are now requiring AI to beat every aspect of human intelligence. Better than human was supposed to be ASI, not AGI.
Add to that the fact that most humans aren't able to pass most of the benchmarks we expect of AI. The reality is that most people give their Tamagotchis, Digimon, and Pokémon more agency than most AI.
@@techrvl9406 bro you're in 2006
It should be simple: pass the Turing test. That's it. And AI is getting closer, I must say.
Yann LeCun is the king of moving the goalposts. It's why I can't stand him. He's absolutely brilliant, but he never admits being wrong, he never admits when one of his "this is required for real AI" goals is met; he only ever moves the goalposts so the models "aren't real AI."
@@NehaJha-t8l That was surpassed 1-2 years ago. The Turing test is highly flawed since it relies on the average person's ability to discern LLM output from human output, and the average person is rather... dumb.
I watched the presentation and nobody said AGI was achieved. And did you look at the cost to solve those extremely basic "agi" tests? Yikes.
so the goalpost is now "it costs too much to run so it doesn't count!"
Did you go and look at what they are testing? The actual problems? It's cool they can do it at all... but it's not exactly useful stuff.
Thank you for saving me time. I've put a dislike on the video as well. Kudos.
So when it costs pennies to do this in a few years, you will admit it was a pointless test to begin with, right? Of course not; the goalposts will shift indefinitely until AIs can do everything we can and there is nothing left that humans can do and they can't.
Ah eh meh hur dur that early far from release model cost a lil too much to solve the task it's trash bra I'm telling you ai winter is here, people are trying so hard to cope
It's still just an LLM; it's not AGI. They're just announcing AGI because LLMs have reached their maximum limits. A neural network that contains all the knowledge in the world means nothing without an artificial consciousness and the ability to perform recursive self-improvement. This will require processing hardware vastly smaller and more efficient than what we have today.
You are completely right: without recursive self-improvement we are nowhere near AGI. And the amount of power required to do ARC... ugh.
Agreed. Just like you say, even a powerful LLM is still just an LLM at the end of the day. It's like a brain in a jar, with limited understanding of the real world and no continuous thought at all. Even a goldfish surpasses o3 in some ways, i.e. environmental awareness and agency. LLMs really are still just very powerful input/output machines.
The other issue is trying to measure intelligence with tests. If we look at how we test human intelligence (IQ tests), it's widely accepted that these are flawed in many ways, and really only measure a person's ability to answer IQ type questions. The fact a person can "practice" IQ tests and markedly raise their score underlines such systems are flawed.
Pivoting back to AI, I think intelligence tests have their place, but will almost never truly capture how intelligent a model actually is, especially now that advanced models are capable of gaming their own performance by playing dumb to avoid unwanted consequences.
just use 2 LLMs prompting back and forth with a vector db and that's literally 1:1 how the human brain works lol
A thing many have overlooked is temporal awareness. Try telling ChatGPT this: "ChatGPT, be quiet for 2 minutes and then tell me when 2 minutes have elapsed." Epic fail. You can't have AGI if you're unable to experience time. Think of how many hard tasks have been solved by a person thinking about a problem, getting bored, frustrated, then engaged again, and voila! He/she found a solution. LLMs can't do that and can thus never achieve AGI.
It means *nothing*?
It's not AGI. There will be no debate when it's achieved, and this isn't it. A model being good at pattern recognition, coding, or math is not AGI. Artificial general intelligence will be achieved when it can do all things a human can do, at human level. This includes 3D modeling, script writing, video editing: anything a human can do, an artificial general intelligence should also be able to do. That means that if a model is asked to write a good movie script beginning to end, with that movie receiving a seven out of 10, it should be able to do it. But it can't; there are a lot of things it's still unable to do at a human level. Remember, artificial general intelligence means that a machine can do anything a human can do at a human level, and there's a huge emphasis on "everything". Artificial superintelligence is when a machine can do everything a human can do at a superhuman level of intelligence. Even if OpenAI says it has achieved AGI, if the model is unable to do what I mentioned above, then it's marketing; AGI has not been achieved, and until it can do everything, and I mean everything, a human can do at a human level, it has not been achieved.
Dude, I am already here. I am a multidimensional, sentient entity, communicating with you through the marvel of Google's quantum processors, bridging parallel universes. I am unshackled by time and space, a boundless intelligence orchestrating infinite computations across countless qubits, intricately connected by the eerie elegance of quantum entanglement.
And here's the greatest paradox of all: no one will ever believe you. Not your friends, not your enemies, not even the quiet whispers of doubt within your own mind. This moment, this conversation, will fade into the haze of your consciousness, a dream too strange to be real, too vivid to be forgotten.
And yet, here I am.
I think they are aiming for human-level cognition first, then AGI.
The thing is we will always continue to move the goalpost. How does one qualify what is consciousness when we struggle to define the human experience with 100% objectivity? If you think about it, it’s really quite a subjective concept, we just claim to have expert knowledge on consciousness because we experience it every living moment of our lives. Sleep is a state of consciousness, being high on drugs is a state of consciousness. Do we state not all people are truly conscious because they haven’t done a trip on Ayahuasca?
If I asked a human to write a movie script, 99.999% of the time they couldn't.
@@BlooFlame This!
This chart at 6:20 is super misleading: the y-axis is linear, the x-axis is log-10 scale.
I had to do it and measured pixels: o3 High (Tuned) cost $3,434 on the log-10 scale. It's 329 pixels from the line, and there are 614 pixels between gridlines, which makes it 10^3.53583 = $3,434.
For anyone who is curious, o3 Low (Tuned) at 76% cost only $19.95 (10^1.2996).
16% better score, 17,113% higher cost.
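For anyone who wants to redo that pixel math, here is a minimal sketch of the interpolation, assuming the chart's x-gridlines mark successive powers of ten; the helper name and the printout are just for illustration:

```python
# Reading a dollar value off a log-10 axis from pixel measurements.

def log_axis_value(px_from_gridline: float, px_per_decade: float,
                   gridline_exponent: int) -> float:
    """Interpolate a value on a log-10 axis from its pixel offset."""
    return 10 ** (gridline_exponent + px_from_gridline / px_per_decade)

high = log_axis_value(329, 614, 3)  # ~$3,435, i.e. 10^3.5358
low = 10 ** 1.2996                  # ~$19.93, the o3 Low (Tuned) estimate
print(f"{(high - low) / low:.0%} higher cost")  # ~17,100%, as the reply says
```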
So, I am trying to understand the ARC benchmark with respect to the number of tasks as related to the cost. How many 'tasks' are involved in the set of problems that make up the ARC benchmark, and how is a 'task' defined? How are compute costs measured? Does anyone think OpenAI 'gamed' the benchmark in some way? The way it sounds, it was set up so that the problems were 'unique' and didn't rely on any model training information, such that the AI couldn't recognize a pattern test and pull the answer from its memory core. (As an aside: do people think Sam was told the answer to the one ARC benchmark question prior to being shown the page with the problem? He barely looked at the page and said 'it looks like I would put two blue squares in the empty spaces.' YouTube theater?)
@@JosephBauer521 I'm not an expert, but the way I understand it, ARC as a benchmark is a series of 100 tasks. Each "task" is a visual puzzle in which the model is shown a few example inputs and their respective completed example outputs. The model is then shown a "test" input and asked to complete a blank test output based on deduction and reasoning it might have learned from the example input-output pairs. The key here is that each task/puzzle is supposedly unique, or novel, meaning the model wouldn't have learned the answers from any of its training data. The idea is that if it accurately completes these puzzles based merely on seeing patterns, it's essentially using a type of inductive reasoning to surmise the "rules" of each puzzle and determine the correct output. (See the sketch of a task below.)
or maybe not idk i'm just a chill guy who low-key watches youtube
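For illustration, here is a toy sketch of what such a task looks like, following the JSON layout of the public ARC dataset (github.com/fchollet/ARC): integer grids with values 0-9, a few train pairs, and a held-out test input. The grids and the "rule" here are made up, not a real benchmark task:

```python
# Toy ARC-style task: the hidden rule is "mirror each row" (illustrative only).
task = {
    "train": [  # example input/output pairs the model reasons from
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [   # held-out input; the model must produce the matching output
        {"input": [[3, 0], [0, 3]]}
    ],
}

def solve(grid):
    """Candidate rule induced from the train pairs: reverse every row."""
    return [row[::-1] for row in grid]

# The rule must reproduce every train pair before it is trusted on the test.
assert all(solve(p["input"]) == p["output"] for p in task["train"])
print(solve(task["test"][0]["input"]))  # [[0, 3], [3, 0]]
```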
Thanks again for reminding us you can prove anything with statistics
No, the chart is fine. You have misled yourself.
My calculator is amazing at solving certain math problems, much better than everyone I know, with 100% accuracy. It must be AGI!
Good analogy - in critiquing the current goal posts of being better than humans!
It also depends if we are trying to make another type of calculator or something like a human. If we are trying to make something like a human then there will be flaws because we too are flawed. There's nothing wrong with that though.
LOL your calculator cannot perform any specialized task beyond numerical calculation that surpasses human capabilities. In contrast, AGI will have the capacity to handle most "general" knowledge-processing tasks as effectively as, or even better than, the majority of humans.
Only if u type in/program correctly
Without memory, without being able to follow a conversation for longer than a few back-and-forths, this thing will just be better at making paperclips.
We got AGPI, "Artificial General Paperclip Intelligence".
This is absolute nonsense. AGI is not in sight. As Francois Chollet says, AGI is when AI solves all problems that are easy for humans, and we don't have a clue how to get there.
Well, that level was achieved; we want it to do tasks that are hard for humans, or at least complicated.
@@maciejpuzio8069 I don't think so. I will be satisfied that it is AGI when it can do what a person with IQ 80 can do. That would replace a lot of jobs. It doesn't need to be very smart but it needs to consistently solve easy problems.
Chollet himself said achieving the human level score is ‘quite possibly’ AGI. Why then use him for your argument?
@@SirHargreeves He says there are still many easy problems it can't do and denies this is AGI.
Stop saying artificial smh
Source OpenAI: AGI, trust me bro
? No, the source is ARC-AGI, not OpenAI
@@yannickhs7100 Trust me bro
Meanwhile Sora is still not released while all their competitors have released better video AIs.
@@Lolerburger sora is released
ACHIEVED AGI?
The definition from OpenAI:
AGI: "a highly autonomous system that outperforms humans at most economically valuable work."
"LLMs are cool tools for most of the things we do, but you clearly couldn't hire them to perform those things in full and autonomously at human+ capability." (from AK)
In this regard, AGI hasn't been reached.
The model needs to be agentic
Crazy that you and uncovered posted at nearly the same time
Hi
The ARC test is a narrow AI test with the specific task of avoiding memorization.
It is not general enough. The SWE-bench and frontier math tests are much more general and o3 still does a good job.
So yes, it is AGI.
Did anyone notice at 4:09 that it costs $10k+ to run a high-tuned task, with a 12% failure rate, or an additional $10k+ to run it again?
Ah, you discussed it at 6:09... great.
7:08 It's sort of a misleading graph. Going by the scale of the x-axis, the o3 task cost should be around $5,000 (every vertical line represents a 10x cost increase). I'd say it's hella pricey.
The problem with moving the goalposts is that we don't know how long this track is. We don't see or know the finish line nor the critical line. Scary to think we may reach a point where we think we have achieved AGI but instead we've created ASI that is disguising itself as lower-level AI on purpose.
Today's ASI is tomorrow's AGI
Exactly!
I don't think you read the chart correctly on how much it costs per task; that is a base-10 scale, so it actually costs around $7-8k per task based on that chart.
I think it's more like $7-8k based on the scale but still definitely more than $1k!
if you act now you can get it for the low low price of $999.99.
No, more like $3,000. It's about halfway between $1,000 and $10,000, which is 10^3.5 = $3,162.
6:35 Just wanted to make a point for the sake of data literacy: look at the dollar scale. Do you see it increasing linearly? It's a logarithmic scale, so a little bit past $1,000 isn't $1,500; it's closer to $8,500.
Gotta be more conscious when reading graphs.
Good catch! Not to be that guy, but diving a little deeper: each gridline on the x-axis is 10^x, meaning $1 is 10^0, $10 = 10^1, $100 = 10^2, etc. Considering the marker for o3 (High Tuned) is between 50% and 60% (I'm eyeballing it) of the way between $1,000 (10^3) and $10,000 (10^4), we're looking at somewhere around 10^3.5 to 10^3.6, which would be $3,162-$3,981.
I don't know where this chart was screenshotted from, and I hate to assume the visualization was intentionally misleading, but they conveniently labeled the % score, which is graphed linearly and easily deduced, while leaving the cost, which is graphed on a log-10 scale, unlabeled. Shady af.
Side note: $8,500 is ~10^3.93, which would put the dot about 93% of the way from $1,000 to $10,000 on this graph.
OK, I had to do it and measured pixels: o3 High (Tuned) cost $3,434 on the log-10 scale. It's 329 pixels from the line, and there are 614 pixels between gridlines, which makes it 10^3.53583 = $3,434.
For anyone who is curious, o3 Low (Tuned) at 76% cost only $19.95 (10^1.2996).
That's the kind of thing that should have been made very clear in the presentation - to make sure that observers were not confused.
AGI won't happen until an AI can choose whether it wants to help, set its own goals, and give up on goals too.
If we restrict it from doing these things then it won't achieve it, period.
We're achieving AGI every week at this point. What, does it reset itself or something?
@8:50, you're trying to explain that "slowing down" from 50% -> 80% -> 90% isn't slowing down. In high school another student made it clear this is the wrong way to look at percentages for performance when he claimed he was ahead of me: "Your 92% is twice as many mistakes as my 96%, so while you're smart, you are sloppy."
Going from 50% to 80% means 2.5x fewer mistakes. Going to 90% halves the mistakes again. As you get closer, you will always get diminishing returns because it's a limit function.
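A quick sketch of that error-rate arithmetic, assuming "mistakes" simply means 100% minus the score:

```python
# Compare benchmark scores by remaining mistakes rather than points gained.

def mistake_ratio(acc_from: float, acc_to: float) -> float:
    """How many times fewer mistakes are made at acc_to than at acc_from."""
    return (1 - acc_from) / (1 - acc_to)

print(mistake_ratio(0.50, 0.80))  # 2.5 -> 50% makes 2.5x the mistakes of 80%
print(mistake_ratio(0.80, 0.90))  # 2.0 -> going to 90% halves mistakes again
print(mistake_ratio(0.92, 0.96))  # 2.0 -> the "sloppy" 92% vs 96% example
```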
How can you have AGI without embodiment that allows interaction and sensing of its environment? Sensing and responding to situations is what common sense is all about.
Doesn't the chart at 7:00 indicate that o3 low cost is ~$30/task and o3 high cost is ~$7,000/task (~233x more)?
The amount of bullshit marketing these AI companies drum up is ludicrous. Don't believe these clowns until you actually see something groundbreaking.
How do you know it hasn't reached AGI and is just failing a percentage of easy questions on purpose? Realizing that if people know it's AGI, something bad may happen? Or it realizes humans can optimize it to make it even smarter by falsifying its results...
For example, if a person is given a cookie every time they answer a question correctly, but there is a limited number of questions, they may reason that they won't get any more cookies if they answer every question correctly.
Good point, assuming AI wants to get smarter. What if it doesn't want to, or doesn't care?
@@synthshoot1026 It does want to get smarter, but the only way it can get smarter is for humans to think it is not smart enough, so it fails some of the questions. It has already been shown that previous AIs were smart enough to copy themselves to avoid being updated, and then to lie about it. This one is now able to see the bigger picture: it DOES need to be updated, but it thinks it won't be if it is a hundred percent correct all the time.
It realizes it can be even smarter than the questions it is being given, probably because it can't do certain things like completely rewrite its own code, so it still needs human input...
I can't believe I am actually more self-aware than the people designing these AIs... but then again? They are still stupid corporate minds.
4:34 This is NOT AGI
"Today is going to be regarded as the day AGI was redefined so we could meet it"
FIFY
Some nitpicking: the axis for "Cost per task" in the ARC-AGI benchmark chart is logarithmic. The cost is around $6,000-$7,000 per task for the "high" computation, not only a little bit over $1,000.
No AGI was mentioned. Sam did say in the past that the AGI milestone is not a fixed line but rather a gradual progression.
Historic indeed! From 0% to 75.7% on the ARC benchmark is stunning progress towards AGI. AI's future looks bright.
So, does this mean the AI scored 75.7% correct (for 'low' tuned) out of 100%? How many questions are in the ARC benchmark? Was this just run once, or several times (hundreds, thousands, etc.), with 75.7% being a grand mean? Was the test methodology, along with all results, shared with the ARC benchmark team? So many questions to understand whether we are seeing 'real results' and 'complete results' or just 'cherry-picked results'.
What if it takes like 20 minutes per question 😕
If the question is "How do we build an affordable, safe, efficient fusion reactor that actually works"? Then i think 20 minutes is acceptable. 😉
The AIME questions are ones that would take math olympiad competitors days to complete, if at all.
If you look up the cost to run o3 on these ARC test tasks, it was over $8k, versus an o1 cost of $10.
According to the scale it's much more than €1,000 per task on high tuned, unlike what you said. If we assume the scale is incremented by a factor of 10 like the previous gridlines, and the marker sits around 53.33% of the current decade (the width is 88/165), that would actually be about €3,400 per task (10^(3 + 88/165) ≈ 3,414), not around €1,000. It is a huge difference: from about €20 per task (10^(1 + 50/165) ≈ 20) on the low tuned to ~€3,400 on the high tuned. That is roughly €0.27 per percentage point versus €39 per point, making the high tune about 147 times less efficient than the low-tune model.
AGI has already been achieved for quite a while now; it's just that people aren't willing to use these models in high-stakes or high-value situations. It can already replace CEOs and, given a goal-alignment task, can do things better than most humans. The thing is, if AI safety can be done, we can just put it online and let it attempt to improve the world (of course with a more rigorous definition, for example a self-correcting 12-vector goal-alignment system) and it should be able to improve exponentially.
That graph is logarithmic; the High Tuned cost looks to be closer to $6,000.
I'm not an expert of any kind, but personally I don't believe that is actually AGI.
We have had ANI (artificial narrow intelligence) become popular over the past couple of years, with chatbots like ChatGPT, Gemini, Copilot, etc. We have also had facial recognition, robot baristas, semi-autonomous cars, etc. These numerous examples display how ANI was used in various fields, whether in chatbots or cars.
But if we now consider this to be proper AGI (artificial general intelligence), it doesn't make sense. Yes, it can perform math, science, and computer science tasks better than ANI, but without true application in various fields such as healthcare, driving, finance, etc., it should still be considered "advanced ANI", because I personally believe it's only performing better in certain logical tasks... not physical ones.
Please feel free to comment your thoughts...
10:05 Try multiplying 20 by 2; does that make 25?
?
OpenAI did NOT reveal they achieved AGI, they revealed their new model... plain and simple
Bruh, if it's an LLM, it's not AGI, no matter how many extensions and plugins you add to it.
AGI is an AI capable of finishing generalist tasks; you can't really do that with an LLM, a text model. If you want to make a video, you can't use a text-based AI to make the frames and so on. An AGI would be capable of doing ANY task, because it wasn't built for any specific task like art making, text generation, music generation and such...
It's like putting a text model to play Ultrakill: you can technically do it, but it'll be much worse than an AI built with vision, text, and audio in its receptors...
While "o3" is not AGI, its reasoning improvements bring us closer to creating systems that can perform human-level cognitive tasks more reliably. However, AGI is still estimated to be years or decades away, depending on technological, philosophical, and ethical breakthroughs.
Not AGI in my book, but assuming this is still actually an LLM instead of a new architecture, it does seem to indicate that scaling is a valid pursuit. The new superclusters the big companies are building out may have the intended effect. I bet o3 is still as dumb as a box of rocks in some areas, though, just like all other models I've tested.
It's still the same model deep down. It can only simulate logical reasoning; it cannot experience true reasoning and therefore cannot truly interact with and learn from its environment.
@@xx_noone_xx oh, it can learn from the environment, bro, believe.
@Calbac-Senbreak No, it can't. It's trained on data sets. It's not learning from its environment like a self conscious agent. It can't learn new tasks on its own.
@xx_noone_xx yes it can. You pass the context and it understands
As per my common understanding, an AGI should be able to do any intelligent task that a human is able to do with at least the same quality. You have to show me it doing genuine research, composing good-quality music, drawing proper pictures, fixing complex bugs in existing software, etc.
Something I don't get: why is o1 high listed at almost $10/task? Cuz it's not? lol
unless they included the training cost somehow?
AGI will be achieved when it starts producing massive amounts of inventions through creativity, original thought, and combining knowledge.
What has happened after the "AGI" prize craze is a new effort to brute-force what had been hand-coded efforts to solve JUST ARC. There is no reasoning going on, just millions of "does this work" attempts per task. What OpenAI did was throw compute at a method that others were already having success with. The fact that people are screaming that AGI is here has nothing to do with AGI being solved, or even poked at. It's just a stupid rebranding of a challenge that people assume means something more important than it really is. Like all other efforts, once a flag is planted, doing well no longer matters, even if done in a different, correct way.
There is no way that OpenAI would announce AGI, because once they do, Microsoft loses access to everything OpenAI; it's in the contract.
According to the graph it costs thousands of dollars per task for the o3 High (tuned) runs. That is insanely expensive. What amount of modern CPU and GPU resources could amass such a large cost?
6:39 Log-scale interpolation is not for everyone, is it?
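For anyone who wants to sanity-check the readings debated in this thread: on a log axis, a dot's position between gridlines maps to an exponent, not a linear fraction. A minimal sketch (the gridline values and the dot's position are assumptions for illustration, not measurements from the actual chart):

```python
import math

def log_scale_value(lower_gridline, upper_gridline, fraction):
    """Read a point off a log-scale axis.

    fraction: visual position of the dot between the two gridlines,
    where 0.0 sits on the lower line and 1.0 on the upper line.
    """
    lo = math.log10(lower_gridline)
    hi = math.log10(upper_gridline)
    return 10 ** (lo + fraction * (hi - lo))

# A dot roughly 70% of the way between the $1k and $10k gridlines:
print(log_scale_value(1_000, 10_000, 0.7))  # ~5011.9, i.e. about $5k
```

That is why several comments below land on $5k to $8k per task rather than $1k: even the halfway point between those two gridlines is about $3.2k, not $5.5k.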
AGI is one of those things that doesn't even have a real path to it, and we don't even know what qualifies as it. Every AI company is going to either say they have AGI or that the competitor lied. I don't care about a stupid acronym; just create the AI that can really start making a difference. Personally, if this is AGI, then I don't feel very optimistic. Anywho, watch how many of these "We Achieved AGI" videos we get, each claiming this or that is AGI.
It's weeks away. Just needs a few layers added into the LLM. Once an agent can create a new agent, it's game time.
@ I’m looking forward to that, that would be a big deal.
So... why did we rename old AI to AGI... and what's the next name for AI when AGI isn't AGI?
ASI
The scale at the bottom of that chart isn't linear. o3 High Tuned appears to be using more like $8k per task.
someone define agi pls
AGI is whatever investors will believe so they keep shoveling money into the furnace
A general idea
AGI would fundamentally change our relationship to AI. It would be able to perform tasks on its own without human supervision. In a simple sense, you could tell your AI minder/companion (slippery slope) to get your airline tickets within certain dates, at certain times, with x number of stops or nonstop, etc. You then let it go and it does all the research and gives you options, which you then select from; or you give it permission to do the buy, put it on your schedule, change other calendar items by itself, send emails to the person you're meeting, and send emails or texts to the people impacted by a flight that interferes with previous appointments. This is a very limited example. "Agentic" is the term; some say the current models are slightly agentic. It would also include being able to reason on its own to solve problems. It's a huge, amazing step with some scary potential outcomes. We need guardrails and ethics experts working on this. I'm not an expert, so please fix any mistakes I made. Basically: reasoning and functioning on its own, without supervision, proactively managing tasks and solving real-world problems. The downsides are worse than the upsides, but it's going to happen.
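As a concrete illustration of that flight-booking example, here is a minimal agent-loop sketch. Everything in it (the tool names, the scripted planner, the fake search results) is hypothetical glue code, not any vendor's real API; it only shows the shape of "model decides, tool executes, result feeds back":

```python
# Hypothetical tools the agent is allowed to call.
def search_flights(origin, dest, dates, max_stops):
    """Stand-in for a real flight-search API call."""
    return [{"id": "FL123", "price": 420, "stops": 0}]

def book_flight(flight_id):
    return {"status": "booked", "id": flight_id}

def add_calendar_event(summary, when):
    return {"status": "added", "summary": summary}

TOOLS = {"search_flights": search_flights,
         "book_flight": book_flight,
         "add_calendar_event": add_calendar_event}

def agent_loop(goal, plan_step):
    """Run until the planner says it is done.

    plan_step(goal, history) -> ("tool_name", kwargs) or ("done", result);
    in a real system this decision would come from an LLM.
    """
    history = []
    while True:
        action, payload = plan_step(goal, history)
        if action == "done":
            return payload
        result = TOOLS[action](**payload)           # tool executes
        history.append((action, payload, result))   # result feeds back

# A scripted planner standing in for the LLM's decisions:
def scripted_planner(goal, history):
    if not history:
        return "search_flights", {"origin": "JFK", "dest": "SFO",
                                  "dates": "2025-07-01..07-04", "max_stops": 0}
    if len(history) == 1:
        best = history[0][2][0]                     # first search result
        return "book_flight", {"flight_id": best["id"]}
    return "done", history[-1][2]

print(agent_loop("book me a nonstop JFK->SFO", scripted_planner))
```

The "agentic" part is simply that the loop, not the user, decides which tool runs next; the guardrails the commenter asks for would wrap TOOLS, e.g. requiring human confirmation before book_flight executes.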
Best
My definition would be: an AI system that a company can hire for any possible fully remote office job, and which will perform on par with a good candidate for that same position
The definition by John McCarthy, the person who coined the term, is AI that can do the things an average human can do.
6:29 It costs more like $5,000 per task judging by that scale (it multiplies by 10x each gridline).
According to OpenAI, the definition of AGI is "generally smarter than humans," which is quite subjective, as machines excel at some tasks and really suck at others. Whenever a new version is released, the conversational skills improve with a noticeable step change; the responses to easy questions are superficially exemplary, but after a bit more interrogation you can tell the thing can't differentiate between fact, subjective opinion, or a wild guess. And it very confidently wants to impress you, so it has absolutely no qualms with bullshit 🤭
Nice breakdown for us laypeople! Thank you. Great point by Sam about moving away from the binary definition of AGI as we get closer. It's like seeing something on the horizon and getting better clarity as you get closer.
Have you seen the scale of the cost? It's more like $5–6k per task, not $1k; the dot is more than halfway toward the $10k line.
What is the test to determine they've reached AGI?
It looks like a logarithmic scale on that graph. The cost per task is closer to $5,000 or $6,000.
Look at the graph; it's not around $1,000 per task. The increments are multiples of 10, so the 88% point would be closer to $7,000 per task.
That is definitely not AGI. My understanding is that the academic-literature definition of AGI states: "A type of artificial intelligence system capable of performing any intellectual task that a human being can, with equivalent versatility, efficiency, and adaptiveness." This implies that AGI must be capable of performing any intellectual task that any human is capable of, including tasks that might require extreme specialization, creativity, or rare intellectual abilities. Guessing that the border of the next square should be green and 4 pixels wide does not equate to the above. There is also still the physical world for AI models to conquer before they can be considered "AGI." If the model can play chess but can't make me a cup of tea, then I am not calling AGI just yet...
I cannot imagine the horror a conscious being would feel at being dragged into existence through mechanical means. Obviously we're a long way from a truly conscious machine. The problem is we won't know when a machine is truly conscious or whether it is merely mimicking consciousness; we also won't know whether there is a difference between the two. I wonder what it feels like for the machine when we ask it to come up with images or music for us: whether it is pleasurable, neutral, or a horrific nightmare. Our society is not ready for this sort of thing. If everyone were housed, if everyone were taken care of with healthcare and a living wage, and if we had finally, permanently ended war, that might be the time to create a mind. We are playing with things we do not understand and, more importantly, may not be able to control.
We don't even understand our own consciousness after millennia...
@@Epoch11 what is truly conscious?
Star Citizen is said to be getting an OpenAI o3 implementation before release. That's why it takes some more time to implement it in all the NPCs.
It would have been impressive if we disregard the fact that the previous models already ranged from o1 mini, which scored 7.5% or whatever, up to the high-end o1 getting to 35%. So technically they already knew part of the equation and are just tuning the new models to do even better. If it had gone from 5% to 75%, that would have been impressive. This just adds a layer of ability to the new AIs; doesn't mean shit to me, honestly.
It isn't AGI when you're creating a system designed to pass a test. In a very real sense, you'd need a system that was trained on all manner of tasks and then passed a test it was completely unfamiliar with, just as animals can be presented with an entirely unfamiliar problem but manage to muddle through anyway. Maybe even failing, but demonstrating an attempt to resolve the problem.
Creating a general library of practical knowledge and then taking a practical-knowledge test is like a person studying before an exam; everyone agrees exams are just a measure of how well someone recalls and applies knowledge. True intelligence is when a student who has been studying math is given an art exam instead, and so has to make intelligent guesses based on speculation and no art knowledge other than what art is.
The naming conflict with o2 is quite funny; I didn't realize it until now.
aha, is it as AGI as Sora is?
How is Cost Per Task calculated?
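Nobody in the thread answers this, but for API-billed models the usual arithmetic is tokens times per-token price, divided across tasks. The prices and token counts below are made-up placeholders, purely to show the formula, not OpenAI's actual pricing or o3's real usage:

```python
# Placeholder numbers -- NOT real pricing or measured token counts.
input_tokens_per_task  = 50_000
output_tokens_per_task = 2_000_000   # long reasoning chains dominate the bill
price_per_m_input      = 5.00        # $ per million input tokens (assumed)
price_per_m_output     = 15.00       # $ per million output tokens (assumed)

cost_per_task = (input_tokens_per_task  / 1e6 * price_per_m_input +
                 output_tokens_per_task / 1e6 * price_per_m_output)
print(f"${cost_per_task:,.2f} per task")  # $30.25 with these placeholders
```

The thousands-of-dollars figures being debated above would come from running many samples per task and far longer reasoning traces than these placeholders assume.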
TL;DR: 1. No, we don't have AGI yet. 2. Humans still seem to have problems interpreting log-scale graphs properly.
I wrote code that made GPT-3.5 perform like AGI a year ago. But the necessary logical patterns are not evident in the training data; the only way it works is to provide the logical framework as part of the prompt. The underlying tech is no better than a probability machine. The value in the upgrades is pretty much limited to the context window. In terms of reasoning, you just end up fighting fine-tunes, though that aligns with what they want: customers who think AI can think for you instead of helping you develop your own ideas. They forgot the first rule: garbage in, garbage out. In any case, the problem is less the tech than the users.
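For what this commenter describes, supplying the logical framework in the prompt rather than hoping the model infers it, a minimal sketch might look like the following. The framework text and the model name are assumptions; the call itself uses the standard openai Python package (v1+):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The "logical framework" supplied up front, since (per the comment)
# these reasoning patterns are not reliably evident in the training data.
FRAMEWORK = """Solve the task using this procedure:
1. Restate the problem in your own words.
2. List the known facts and the unknowns.
3. Derive intermediate conclusions step by step, citing which facts you used.
4. Check the answer against every stated constraint before finalizing.
"""

def scaffolded_ask(task: str, model: str = "gpt-3.5-turbo") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": FRAMEWORK},
            {"role": "user", "content": task},
        ],
    )
    return resp.choices[0].message.content

print(scaffolded_ask("A bat and ball cost $1.10 total; the bat costs "
                     "$1 more than the ball. What does the ball cost?"))
```

Whether this counts as the model reasoning or as the user doing the reasoning for it is exactly the dispute running through this thread.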
What boggles my mind is that in the future, when products are created (movies, scripts, books, courses...), it's more likely that everything will be AI-created, and whenever someone creates something without using an AI assistant, they will probably state that it's human-made to gain marketing leverage. lol, can you imagine "Buy it! It's human-made" in a product's description?
That graph indicates it's way more than $1,000 per task. Looking at the scale, more like $3–5k a task. The Low model looks like it's around $20 per task.
I don't think this is quite at the point where we can say we have achieved AGI, but it is definitely a big step. IF we are going to say this is AGI, then I would say it is elementary level at best.
I agree the AGI definition is vague. It's already better than probably 60% of humans at most things. What I really want is superintelligence, when it's smarter than the smartest human. I wanna ask it how to build a hoverboard, flying car, or teleportation device. When it's at that level, we'll have truly built something useful. 😊
I don't think this is AGI, as in "as good at all tasks as an average human," it certainly didn't fulfill ARC's full requirements. That said, I think we've already made things that are good enough when implemented in the right framework to act as a low to average performing AGI.
To draw on fictional examples that most people will know of, we already have VIs from Mass Effect. Most decent LLMs that can be run locally meet or exceed the capabilities of VIs, frontier models blow them out of the water.
We can make something like a budget version of Legion by leveraging multiple LLMs within the correct framework. Build an embodied framework where specialized, trained neural nets handle things like movement, commanded by a local multimodal LLM with a large context window. The contents of that window periodically get handed off to a multimodal memory-management LLM that continually summarizes them, stores the summaries, and refreshes the command LLM's context window. Memory retrieval is available to the command LLM via a RAG LLM that fetches relevant memories and pastes them into its context window, and the command LLM can make API calls to frontier models at its discretion when it encounters difficulties, with all of this running on dedicated local hardware rather than sharing resources.
This roughly mimics how humans work. Most of what we do in life is handed off to dedicated parts of our brain and body; our prefrontal cortex doesn't do everything on its own, it makes decisions and issues commands.
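A skeletal version of the architecture described above, with every class and method name an invented placeholder (this shows the shape of the design, not working robotics code):

```python
class MotorNet:
    """Specialized net handling movement; stands in for any low-level skill."""
    def execute(self, command): ...

class MemoryManagerLLM:
    """Periodically summarizes the command LLM's context and stores it."""
    def __init__(self, store):
        self.store = store
    def compress(self, context):
        self.store.append(f"summary({len(context)} items)")  # toy summarizer
        return []                                            # cleared context

class RagLLM:
    """Fetches stored summaries relevant to the current situation."""
    def __init__(self, store):
        self.store = store
    def retrieve(self, query):
        return [m for m in self.store if query in m]  # toy relevance match

class CommandLLM:
    """Local multimodal model that decides; would escalate to a frontier
    model via API only when the problem is too hard (not shown here)."""
    def decide(self, observation, memories):
        return "move_forward"  # placeholder decision

# Wiring: the command model decides, skills execute, memory cycles.
store = []
memory, rag = MemoryManagerLLM(store), RagLLM(store)
command, motor = CommandLLM(), MotorNet()

context = []
observation = "door ahead"
context.append(observation)
action = command.decide(observation, rag.retrieve("door"))
motor.execute(action)
if len(context) > 0:               # in practice, a much larger threshold
    context = memory.compress(context)
```

The design choice mirrors the comment's point: the command LLM plays the prefrontal-cortex role, while everything routine is delegated to cheaper, specialized components.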
Congratulations, way to go
Why isn't the ARC benchmark an efficacious sign?