Check out Sabine Hossenfelder's video here -- it is a great overview of AlphaGeometry, I just respectfully disagreed with that one part th-cam.com/video/NrNjvIrCqII/w-d-xo.html
This brings back some memories of chasing the angles endlessly during a competition and wondering when to stop. Just to realize later that the solution is simple using a lesser known theorem. I hated geometry problems.
this is what makes me hate and love geometry problems at the same time. kind of a toxic relationship. I suck at them, but they're always so easy, it's just that they always require this one theorem that makes the problem trivial but you don't know it until you do... I feel that algebraic problems are more fun because there are often many ways to solve them not necessarily by brute force but by brilliant bit of ideas that lead to a solution. Where, in geometry, I don't feel like I can just invent a theorem out of thin air like that, but oh well, i'm not so good at geometry so anyways |:
@@remotepinecone It knows a foundational set of results, which it then used to deduce valid theorems (known in the paper as "synthetic theorems"). Collectively these are the results it uses to make inferences in problems.
I mean.. it's a general search algorithm: - the "logic" part has the edges of the graph and - the "creative" part picks less random directions than brute-forcing the whole graph. So while it does generalize to basically all other fields, as you said its utility and insights are questionable for harder/novel problems. The creative part would be really interesting if it offers a good true positives rate, to maybe offer in the future novel ideas to problems we are stuck on
More training can help it replicate "beautiful" proofs, even if it's hard for us to explain what makes something beautiful- as long as we can label which proofs are beautiful, we can theoretically train an AI on those labels. (technical details: train a reward model on a set of human "beautiful proof" labels, and use that reward model to finetune a LLM using RLHF)
I think you misunderstand just how vastly more competent human brains are. Human brains cooperate, adapt, and compute over months to years to solve a hard problem. An AI that runs in even a few minutes is just not even logically close to the computational resources and inherent efficency of what humans do. We cooperate, mentally adapt, and do it over months to years. That is so ungodly powerful. The idea that AI will help solve novel problems is a drastic misunderstanding of how insanely small its inherent computational ability and efficency is compared to a group of humans.
@@Dogo.R AI scaling laws and biological anchors disagree. Even the most generous estimates place AI as using as much compute as human brains within decades at most. Already, AI LLMs have >1/1000 the number of parameters as we have synapses, and that gap can be closed quickly with exponential growth, larger monetary investments, etc.
I think that your focus on the time scales is a proof of how deeply you are mistaken by putting value in how long it takes to solve the problem. There is abundance of historical evidence that the logic is inverse. Or, take a look how could apple and ms pass such a great well established company as IBM. That's a sunken cost fallacy. Human civilizations wouldn't collapse if your logic was true. If you see something that is able to compete and is way younger and uses less resources it means it's more effective and closing the gap. Also if you think computers are unable to cooperate in problem solving... Well, yeah... Not to mention there groups of people behind those computers. They are still tools. Just as with industrial revolution humans elevated physical strength using tools. This has potential to elevate intelligence.
I think the British pronunciation of "upsilon" and "epsilon" highlights the etymology of the names. Long ago, the Greeks forgot the names for ε, υ, ο, ω, and some other letters. So they just called them by their sound, the way we do vowels in English, like we just call E "ee." But as some of these vowels started to sound like each other, they added words to the names to distinguish them, like "little o" for omicron and "big o" for omega, similar to the way some Spanish-speakers call b and v "be larga" and "ve corta" or similar. In the case of ε and υ, "psilon" meant "bare," on its own, so an epsilon is a bare e. This distinguished it from αι, which had the same sound. Similarly, upsilon is a bare u, as opposed to ου.
not quite-it's bare υ as opposed to οι, which by then had come to sound the same. thus, at that time, ὗς "pig" and οἶς "sheep" were homophones pronounced "üs", because of which both being replaced by χοῖρος and πρόβατον, respectively, in the bible
> I think the British pronunciation of "upsilon" and "epsilon" highlights the etymology of the names. It doesn't do this any more than the non-RP / American pronunciation does. Indeed, the Greek pronunciation of "psilon" has the stress on the second sylalble: "psi-LON", not "PSI-lon", and regardless, it's definitely not "SIGH-lon"/"ep-SIGH-lon". The RP pronunciation is merely a result of the application of British/English phonics rules to the English spelling of "psilon"/"epsilon". See also "pi", "phi", "chi", which in Greek are pronounced "pee", "fee", "kee" (well, /xi:/, but close enough), respectively.
@@JivanPal It's not eps + ilon but e + psilon. Neither the British nor the Americans pronounce these letters the same as the modern Greeks or the ancient Greeks, but the way the British pronounce them emphasizes the two morphemes, whereas the American pronunciation does not.
@26:45 I'm so relieved you stressed the point of beauty! And also fun! My parents were mathematicians who did research together so I grew up hearing all the language at the dinner table. I had no patience for algebra, and still don't, but in my 50s my mathematician-husband helped me realize I can visualize math in my head (geometry, topology.. an if it isn't a visual math problem I'll turn it into one!). It helped that I already knew how to "talk math." Now we have several published papers. It's sooooo fun. I literally only do it for the joy, and also the hope that some day physics will find a use for our math - we already found a deep connection with light polarization. Anyway, now if I'm not busy with anything else I'm doing math research. So yeah, I'm a little weary of the AI. We're hoping it can eventually take the place of what Bruce does in Maple, cuz that is usually a pain.. but not take the fun away. 🤔
I really appreciate you highlighting the point "the AI wasnt designed to make beautiful proofs". Too often in the public discussion around AI, people leave out the fact that the AI has been designed in the first place. It has a reward function - for example, LLMs are designed to mimic human conversations. But this mimicry doesn't require a deeper understanding in order to succeed; it can just cleverly explore the space of possible conversations, in order to see what sounds most like a human. Similarly, here, it is trying to cleverly explore the space of known problem solutions, in order to create a new result. Which is why it looks that way - it always seemed to me to have a lot in common with the "press the middle suggested word on your phone keyboard to write a sentence" game. Or Clippy. At least for this flavour of artificial intelligence (which I guess is a name that is sticking now), it seems difficult to see how it can develop novel solutions. It might be useful in order to fill in the gaps of existing fields - e.g. find all possible consequences of a set of theorems. But it seems like making the leap to inventing a new field will be beyond it. Not to mention that the usefulmess/validity of any theorems it does come up with are suspect - as mathematicians, we often _need_ to appeal to authority, because there is too much mathematics for us all to hold in our heads. The way we square this is by knowing that with enough effort, we could reproduce/have explained to us the results we need. But because AI is trying to maximise its reward function, it might not actually be generating valid proofs - each proof would need verified. And if its proofs are thousands of steps long, and not guaranteed to be correct, then how can we trust its results? This is a huge problem that needs to be resolved before it can be useful - this is where the application differs from that of e.g Deep Blue, or AlphaGo. We might be surprised at the way an alien plays chess, but we can at least verify whether or not it has beaten us.
As @gabitheancient7664 says, the proofs can be automatically verified if the "left brain" part only uses deduction steps that are known to be valid. It's fairly straightforward to guarantee that if it can find a proof, it's a valid proof.
@@NNOTM yeah having difficulty verifying proofs or making up one that's valid is a very human thing we aren't very "logical" by nature, but computers tho
Dude, you literally have no idea what you're talking about. Appeal to authority? What a joke. Proofs in mathematics are verified through formal logic. LLMs might not be efficient at finding proofs yet, but the process of verifying mathematical proofs is very well defined. All it takes is a large amount compute. As for your point about mimicry, it is literally proven that ANY sufficiently large neural network is able to approximate any function to any degree of accuracy. Since any deterministic process (ie. same inputs = same outputs) can be represented as a function, so can your brain and the brains of every single human on earth. The only way it can be false is if you deny the assumption that the laws of physics work the same no matter where you are in the universe.
@@Jackson_ZhengAll the points that you mentioned are pretty much in agreement with what the OP said. Not sure why you act so triggered by it. Do note that modern theorem provers are not fully automated, and still require a lot of human intervention (e.g. coq, Lean, Isabelle HOL, etc..), which in large part is what an LLM is supposed to substitute (and it is far away from it, just ask LeCun). And just because something can be approximated arbitrarily close, does not necessarily make the approximation useful, feasible, practical and/or good. That's something a lot of people miss when they say that LLMs just need to be scaled up to achieve AGI. No, we are missing something very fundamental besides mimicry that would enable an AI model to accurately represent a reasoning machine.
I do think it's important to note that for a lot of mathematical problems just knowing what the answer is might give a human mathematician a better starting point to figure out more elegant and more meaningful proofs even if the AI generated one was a over complicated mess.
Exactly. Brute force calculations are used to prove stuff in math all the time. Computers revolutionized many fields of math just because calculations could be run to far faster by computer than by humans, even if the methods were exactly the same. Computers also dont make mistakes unlike humans. That doesnt mean computers are always useful for every situation in math, but even the relatively "dumb" computer brute force methods are insanely useful. This type of AI makes brute forcing stuff like geometric problems way more doable as well, and that can be very useful for certain projects. Is it going to be useful for everything geometry related? no. But itll be useful for some stuff, and thats what matters.
I agree. Even if the brute force proof cannot be refined into a more beautiful proof, knowing a result is true will allow a mathematician to focus their efforts in that direction. If we knew the Riemann hypothesis were true, we wouldn’t have to waste effort looking for counter examples or trying to disprove it.
Also, for important results (like the Pythagorean theorem) mathematicians will often come up with many different proofs. This is not because we need reassurance that the result is true, but because proving the result in new ways may provide new insights. A brute force approach may not be insightful, but it does tell us something.
@@mrtthepianoman another thing to consider as well is, if a brute force solution ISNT possible, then it means there a more novel approach is required. That can also help narrow down where to look or what kind of ideas to be thinking of.
tbh the a fun part of a math olympiad is the fact that you were able to solve it, a human person with a physical brain, other body parts, and limited time, and that needed you to develop some abilities (I mean my main interest in math olympiads is developing problem solving skills, almost a pedagogical objective) that seems almost as impressive as saying we could make a machine that beats people in long jumping, like yeah amazing technology but that's not the cool part of seeing someone jumping really long
This has far reaching implications right, someone who was previously unable to solve such problems can now use AG and boom he can work at the same level as a person experienced with math olympiads. After working on your skills for years you get beaten by an AI, that is bad to me.
@ram527 We already have (non-AI, non-heuristic) algorithms for solving olympiad and olympiad-adjecent problems like inequalities (many inequalities can be bruteforced by multiplying everything out and applying Muirhead) and integrals (Risch algorithm). Geometry also had these sort of algorithms: analytic geometry is popular brute force approach that is commonly used by olympic contestants. Alpha Geometry is only unique in a sense that it is a form of machine learning algorithms, but it is neither the only nor the first thing that can algorithmically solve math olympiad problems. It'd be surpriced if it affected math olympiads in any way.
@@mironhunia300 there's also combinatorics and soome nt problems where you could theoretically read the question and just put it in your computer and test all cases, the fun part is having the creativity to make the cases very small and solve it by hand
I'm not surprised that this method has results, as I've always thought that one of the best applications of LLMs is providing leads to subject experts that can follow up on them. That subject expert could either be a human such as a lawyer or a bespoke program as is the case here. LLMs are essentially playing the role of general-purpose heuristics to prioritize what nodes are explored by the subject expert in what order. AlphaGeometry is just the first notable instance where a computer program played the role of the subject expert.
Well put. It's important people understand that this "subject expert", what they call a deduction engine, is not an AI model; or at least not a machine learning model. It's a more traditional sort of computer algorithm.
I don't think it will give you a solution at all. This type of algorithm should be able to throw out any hallucinations generated by the language model portion of it. It shouldn't ever give a wrong answer, but it won't always find an answer. Worst case is it ends up in some endless loop.
@@iankrasnow5383everything makes mistakes , top computers , and even the best chess algorithms which have been more developed than math algorithms sometimes make mistakes , no matter how narrow the probability is its never zero
I think our ability to solve contest mathematics with computers has always been bottlenecked by our ability to formalize the mathematics involved, rather than our ability to develop new solving algorithms. Areas that have been formalized (e.g. algebra, calculus, and probability) have been child's play for WolframAlpha and the like for decades. There have always been a countable number of statements that can be proven from any given setup, and only a finite number of them that can be proven in a fixed amount of steps. Computers' abilities to explore several more orders of magnitude of that space is easily going to overpower humans' abilities to draw on experience and pattern recognition to explore the right paths. This is especially true given our ability to codify more and more of our intuition into these programs.
I love this. I, for one, don't feel like AI has to be elegant, perfect, and utterly mind-blowing at first. I think a lot of self-proclaimed "nerds" (for the lack of a better word) reward being amazingly good as a requisite for sharing anything on the internet, as opposed to consistent, often imperfect, growth (which I think is the more realistic progression toward anything good)... just like how people will nitpick on every youtube video for not being perfect, we'll nitpick AI and find a million reasons that "it's really not that impressive if you think about ..." I'm more excited that we even arrived at this point. It's a step in the right direction toward having an amazing and creative logical supplement to human intelligence. What a time to be alive! Love these videos, I love that you took the time to do a deep dive and I'm super interested in more breakdowns. Subscribed!
Rename this video to "AI can solve geometry, buuuuut...", and do thumbnail like "left: human - 12 steps vs right: AI 49 steps, and draw simple proof on left and complex proof on right" and you will get a ton of views. I thought your video will be just paper explanation, and not your deep look into the theme. Please search a new title and thumbnail, that express that core of your video. It's really good, but current thumbnail is not that catchy.
@@herobrine1847 it's not only about playing algorithm, it's about reaching certain people. I read every article and wathched every video about AlphaGeometry, because I obsessed with new ML advancements. At first I thought that I will learn nothing new from this video, but I started to watch it by mistake and pure chance. So, there are may be a lot of people who will not watch this video, because they already watched Yannic or Sabrine and thought like me, but they want information from this video. But my variant of a thumbnail can boost their interest.
@@optozoraxwhile true, this thumbnail appeals to a different audience of people, your suggestions just shift the audience, not necessarily expanding it.
There's an epistemological problem here as to the "beauty" of a proof. Turing's supposition, in respect of what we now call LLMs, was that perfect mimicry is indistinguishable from some other immeasurable sign of agency, to the extent we know when LLMs fall down, in the strictest sense they are not passing the Turing test. I suspect that (rather quickly) AI will acquire enough learning to meet our indefinable sense of 'beauty' in a proof - if educated mathematicians are able to 'know it when [they] see it', there must be some set of insights, available to any organic or inorganic reasoning machine, that would include 'beauty' in respect of proofs (or for that matter general reasoning or 'creativity')
It's not clear to me that Turing believed this. What he certainly believed is that many people naively believe they know the contrary when they certainly don't (that there is for certain some elegant test). If you have such thing in hand, you should be able to distinguish mimicry from the real thing. If we can't distinguish the "real" thing from mimicry, then we are all wet in thinking we know what the "real" thing looks like. When the computer passes the Turing test, we don't actually know it only uses mimicry. It might actually be doing the real thing the same way we do the real thing. Unless you think we mimic ourselves to prove our own intelligence.
When I was taking AI in college back in the 1980s (yes, I'm Old (TM)), one of the neat programs they talked about was Automated Mathematicians (see wikipedia), which starting from real rock bottom definitions about sets and used heuristics to generate interesting lines of inquiry, eventually recapitulating Goldbach's Conjecture (although reading through these things now, it may be that the author overstated the case)
I've always thought that for a computer to be good at math, it would need to be able to gauge how close it is to finding a proof. A chess computer is strong because because it knows wether or not the position is good/winning even if it hasn't found a winning sequence. In other words, the math computer would need to know how easy/hard something is to prove with respect with what it knows, even if it doesn't know the proof. Now that would be impressive.
Now, I also that there is some hope for AI to discover meaningful things and not just prove give propositions. Given a dataset of results with their proofs we could try to identify results that would significantly simplify the existing proofs. However I don't really see how an AI would be able to define useful new objects, which is th richest type of math innovation. This amounts to the AI enriching the language it is working with.
the thing i notice is that it doesn't seem like the AI part of this is very important. i'm pretty sure it would be easy to code up a conventional program to play the part of left/right brain. it feels more like another thing trying to capture the AI craze than a real advancement.
Yeah. I imagine the AI helps somewhat mainly in directing the results toward the actual answer. Effectively, all you need is some measure of distance from the current position to the proof and it's just like, a basic pathfinding algorithm. I imagine the actual practical use of this system though would be to just integrate the part which is actually consistent into programs mathematicians use, so they can choose what new things to add while the computer automatically chases angles and does the "dirty work". Which, means that you're effectively throwing out the AI part in favor of a real intelligence and just using the already existing proof checking techniques with a bit of automated search applied.
@@electra_ I just browsed the paper, and you pretty much hit the nail on the head. The actual deduction engine is not based on neural networks. It's based on methods that were designed decades ago and are human-understandable. This kind of program just outcompetes humans in the way computers always have: by doing a lot of logical operations very fast. The major advance made was figuring out a way to train the language model, since there isn't enough training data. They avoided this problem by algorithmically generating millions of problems and their proofs, and then training the language model on that. AlphaGeometry isn't better at the actual problem solving part, it just has a better language engine for converting a human-readable problem into a usable form, and then judging each attempt based on heuristics. The language processing part isn't doing anything that would be difficult for a human. The deductive engine is, but it's just a traditional algorithm, not an "intelligent" algorithm A trained human with the benefit of such a deduction algorithm on their computer would vastly and easily outcompete AlphaGeometry. A general LLM like GPT without any deduction engine would not be able to solve these problems at the current time.
The AI part just seems like an interpreter that gives instructions to what is just a maths program. That potentially has uses and is sorta cool in the general sense that LLMs enable computers to more directly tackle problems you'd need a human to input before but also like all LLMs it isn't doing anything a human couldn't do and it does it at a fairly mediocre skill level. It's really just the general issue that current AI seems like a solution in search of a problem that isn't just producing spam.
I think what this does sort of show is that in order for an AI to be actually useful in a specific context, especially one where you need to be objective, you sort of need to heavily restrict the AI by placing it in the middle of a larger system, and have components that can validate its input.
The only difference between this and brute forcing Sudoku with all the codified inferences is when you have to add lines to the diagram, and there's an inexhaustible supply of novel lines to mess around with, and you could end up chasing angles forever, and never arrive at your desired conclusion. If it was training on past Olympiad problems, then it has a good statistical model of what kinds of lines you might need to add for the kinds of problems that test tends to pose. And then you're pretty much back to brute force again, because the space of lines you might need to play with is relatively closed relative to your statistical model.
My question for the creators of this AI is the same as for those that generate text or images: why do you create AIs that do the jobs that people want to do?
Progress. Why we discovered the fire? Why we created/discovered math? Why we create physics? Why we start organizing in society? Why we look up at the sky and started questioning the reality? Why we created the wifi, internet, medicine and all other things? AI has already been use to decodify and represent in 3d models for the first time a molecule, and this help and save so many people, AI are being used in nuclear's fusion reactors to produce more efficient and clean energy . The progress is beautiful if used correctly.
5:22 The simplified version is sufficient because if the three circumferences were distinct, their pairwise radical axes should intersect, but they are the three sides of the triangle, which have no point in common
When I did math comps we called this approach "geometry bashing". When you have no idea what else to do, draw some lines and work out the angles, see if you get anything useful. The most generous interpretation is that AI has mastered the worst technique high-schoolers use. And that's the only tool in their toolbox. Obviously it's very cool that somebody taught an AI how to geometry bash, and it suits computers' strengths fairly well, but this is leagues away from anything generalizable.
It makes perfect sense if you imagine an evolutionary algorithm behind human problem-solving and learning. Something generates lots of mutated procedures, tests them, mates them, reproduces them, and then through Darwinian (or even Lamarckian) evolution, comes up with ways of proving and/or learning stuff. As the Wiki for the No Free Lunch Theorem says, "The 'No Free Lunch' Theorem argues that, without having substantive information about the modeling problem, there is no single model that will always do better than any other model. Because of this, a strong case can be made to try a wide variety of techniques, then determine which model to focus on." Through optimization (again, using something like genetic algorithms) the system learns better and better ways how to prove things and how to learn.
To be fair, "International Olympiad with a calculator" is a different category of competition from just "International Olympiad". And a symbolic deduction engine is basically a very powerful calculator. Still, even in this other category, it's an impressive achievement.
The brute force attack of any problem presented to AI does have the simplicity of use very base, core postulates. Where a human would use multiple time-saving theorems that each contain their own elaborate proofs, the lack of using or identifying these upper level theorems to solve a problem doesn't make them less real. Does brute force like this take longer than using proved theorems to short-cut the system? yes. Does such a longer process matter in a time scale that computers operate on? Not always, but sometimes it does. To make this AI Math algorithm faster, and more intuitive, programmers WILL be able to program future versions to use more and more proven theorems... HOWEVER... Is AI able to invent, or even know to invent, new theorems for future stream-lining? Not at this time... but that's the goal isn't it.
I also love the optimistic view of the video - that this could be a worthy foundation. This is the "messing around in the dark" part in coming up with a solution. If you could build on the solution - simplify it - connect it to higher level concepts in other proofs - you might start raising its beautification score in the answer. Then you'll get papers saying "our new solution averaged 0.26% more beautiful solutions than other leading AI mathematicians over a course of 20 standard proofs. ;) Kudos for this video shouting out that we SHOULD build on this solution and we SHOULD have beauty as a metric. As an aside, "I can't describe why it's beautiful, but I know it when I see it", is absolutely the hallmark of a neural net, so you must be using one in your head. :)
Great video, I really loved the insights into why we do proofs, I definitely agree with what you were saying about the beauty of proofs. For me Is if I could have thought of it myself? As in could I guess the next step of the proof or have an idea of the result you're trying to get to without already reading through the proof. AI just randomly works out enough information to try to guess of what the inner workings of the result which it does with enough trial and error, maybe this is the way a human could get to the answer but it isn't satisfying. Seeing the underlying structure of a beautiful proof allows us to abstract those ideas used in the specific proof to other results in the same beautiful ways. To me that's what mathematics is all about seeing those connections between not necessarily intuitive topics but being able to construct a framework where those connections do become intuitive, that's why results like Fermat's last theorem are so satisfying...
This is all true, and yet I feel that an AI proof, even a long, ugly, unintuitive, meandering proof - as long as it is probably correct - would serve as a very solid foundation for a human mathematition to know that a result is achievable and thus refine-able into a beautiful, concise, intuitive proof that does everything you could want. I feel this way about AI in every field - it replaces brick-makers so that everyone who used to make bricks can focus on making walls, Wall-makers can make rooms, room builders can make buildings and those used to making buildings can start to design neighbourhoods. It frees every thought worker to stand on a higher foundation and increases the scope of what they can do. So what if AIs start replacing the need for mathematitions to find proofs to simple problems; all this will do is free up those same mathematitions to find proofs for entire systems of related proofs instead of doing them one by one - which I think further improves math in gulps and bounds instead of sips and steps. This is my view on why I don't mind AI making static images; I expect the artists who used to make static images to instead shift their focus to coordinating/creating collages, videos and interactive museums of static images; AI that makes music - great. I expect musicians to be freed from making sounds to conducting orchestras, running virtual symphonies and theatres with many pieces of music. As an analogy: If farmers no longer have to plow because they have tractors, I expect the farming world to improve and the role of farmers to become coordinating different tractors working together and that this is actually an improvement in the role of farmers and their scope rather than their replacement.
21:58 well, deep blue actually lost to Gary Kasparov the first time they played, but its search algorithm was improved and it beat Kasparov on their rematch a little while later
Always gotta be carefull about almost all "tests" as they go through human weaknesses that reality doesnt. I'll explain. For example humans can remember insane amounts of things... but remembering a few things quickly and recalling them a bit after is hard. And its WAY harder if you ask them to use their brain for other stuff in between. And even harder if those things dont just need to be remembered but need to be used in the proccessing. Geez they are going to have an extremely hard time. Their familiarty with the concepts has to be insanely good to efficently encode and hold that information in such an on demand way and while asking them to leave brain power to do computation on it. Computers can save data, use it for logic, then rewrite it like its litterally nothing. This results in almost all "tests" being vastly favored towards computers... because most tests are lots of small problems that you need to load into your brain, do work with, then compute and answer, and repeat. Which is just not like normal real life problems that humans are good at. Real live problems you spend hours and hours and hours and months and months and months working on. And along with other humans and other stimuli. And with tons of hours of sleep to restructure the brain for being better at solving the problem. Almost all "Tests" are very much not like reality AND and very much favored towarda computers and their memory model. Only tests that are very long term act like reality. Reality is humans solving problems over long periods of time.
Also for the math topic. Think about mathamatica. Is mathamatica "a big step towards super intelegence"?... I mean its an ungodly powerful tool for math. Anyways just a seed for thought.
Honestly I find angle chasing done well extremely beautiful and creative. In fact, more creative than the shortest solution You gave. Just my honest opinion. Beauty lies in the eye of beholder. Does it enlighten? Of course it does. It introduces all those angles, giving a nontrivial parametrization of the layout of the problem in question.
I think you’ve hit the nail on the head with your observation about the aesthetics of mathematical proofs. I think current AI tools are a long way from ‘understanding’ beauty, and I suspect that in mathematics, as in other domains, they will show a lack of a creative spark, producing work that is derivative and soulless, with proofs that fail to capture our imagination.
I bet they can cut out the LLM and replace it with a random chooser. LLMs are expensive to run and I feel like the random chooser would have done just as well.
Yes. Deep Blue couldn't be easily turned around to play Go. But AlphaGo was quickly turned around into AlphaZero, which didn't even need the examples of expert play. Just given the rules of various games (Go, Chess, Shogi) and a day to study, it could play them at expert levels. That's more typical of what to expect for the development of this system.
To be fair, deep blue had to basically be “tuned” to play against Kasparov while Kasparov was not allowed to see how Deep Blue plays to develop his strategy.
@@realGBx64 No need for the "to be fair"... that's the point. Because of tech limitations (hard and soft), Deep Blue was very specialized to the task (special chess chips and programmed knowledge). The special hardware in modern systems is for general deep learning calculations, and the programming for AlphaGo itself was not an expert system (the programmers weren't great Go players, or relying on such knowledge to tune things). You don't have to work at the low levels to do powerful stuff anymore.
It was an ego war. IBM had a huge ego about being the first computer to crush the world champion, and Kasparov had a huge ego in being the last human to withstand this incursion. When they negotiated the terms of the match, all this was taken into account. Kasparov had a calculation that computers wouldn't be able to solve the problem of long tactical horizons in that time frame, and he felt he could steer the program into positions where this blindness would prove fatal. He got the computer into precisely such a position, and then it the chose the move he was convinced it couldn't find. This crushed his confidence about winning, and he made some blunders over the board subsequent to this. Later he sued IBM to justify how the computer had been able to make this move, and they came back with the rather unsatisfying explanation that the computer had crashed, and that a safety system had randomly picked that picked that move so that the computer didn't default on the whole game. This makes no sense to me as a safety system, since computer chess algorithms have long had a structure where you solve to a certain depth, order the available moves accordingly, then solve again to a higher depth, until you judge that your improvement won't justify a further time investment. This tends to be the best way to do it because the transposition table makes the repeated analysis rather inexpensive, and the successively improved order of evaluation makes the alpha-beta pruning more effective. When you do that, you always checkpoint the best known response from the last level of analysis, so there should always be a good checkpoint available, and you should never have to resort to a random move selection (unless you crash on the first ply at the first depth, in microseconds, suggesting you should start again and hope for better). I didn't actually read the details of this imbroglio, but I collected a few points from much peripheral discussion. I've also viewed an entire DVD documentary where Kasparov is allowed to litigate whether IBM cheated. It seems unlikely IBM needed to cheat, the program was incredibly strong, but the final explanation from IBM was not a good look. In any case, Kasparov was doomed in a year or two at the rate of progression, and it was just a matter of time in any case.
AlphaGeometry doesn't work much like AlphaGo/AlphaZero. Those are "simple games" in that they have simple rules and a limited set of moves. AlphaGeometry is a transformer based language model (like GPT) connected to a hand-coded "deduction algorithm" which doesn't use machine learning. The language model was trained on millions of computer generated geometry problems/proofs. This type of program is only as good as its human-coded engine allows it to be. AlphaGo is much simpler and more elegant, which is why it's so powerful. It's a machine learning algorithm that learns through play rather than through extensive tree-search methods like older chess engines. AlphaGo has limitations too. It tends to be strong against the world's best players, but it can be overwhelmed by rudimentary strategies that would never fool a good player, but haven't come up during its training. For instance, a year or two ago, someone discovered that a simple technique of surrounding AlphaGo's tiles, letting them win every time. (i forget the details since I don't play Go). Exploits like this for fooling adversarial neural networks will always be a problem.
@@vib80 But that's exactly what this is though, the actual AI portion of this is not the one doing the math and actually figuring out the proof. The AI here is just an interpreter that also tries to make suggestions for what to do next and then lets a totally normal math program just sort through everything until it hits a dead end. So they can't actually generalize this to other problems until someone figures out how to make an AI that itself can do math, as in not an AI with a CAS module bolted onto it. Until then this is basically how a lot of AIs in strategy games have worked where there are two seperate systems that pass ideas back and forth to each other in order to limit how long the AI spends looking for the next move.
It's hard to comprehend the true difficulty of phrases like "aligning AI with human values" until you are faced with real situations like this. Any human value I could brainstorm would have been superficial compared to so human as the *soul* of mathematics or the cultural necessity of beauty.
for the first problem i might be wrong here but cant you split the large triangle to 3 triangles, OAC, OAB, OBC, OAB is 90-x, name angle OAC a and angle OCB b, since its a circle OC=OB=OA so OBC=b, OAC=a, triangles must sum to 180 CAB+ABC+BCA = 180 (a+90-x)+(b+90-x)+(a+b) = 180 2(a+b)+180-2x=180 2(a+b)=2x a+b=x since BCA = a+b, BCA=x
AI makes me worry about being a math major sometimes. If AI performs math better than most mathematician, the people who are passionate about math and want to pursue a career in math are less likely to get hired in the future. It's really conflicting feeling like I'm committing myself to a profession that could start to die out in my lifetime.
If it’s comforting to you, which it is to me, consider seriously that “AI” is not a threat to maths nor to working mathematicians. I think it’s a technological dead end. Some people will cling to it with clawed hands but it’s too expensive to build and maintain, and it’ll be too expensive to manage and administrate. The economy will take care of AI on its own.
For me this really highlights the role of serendipity, i.e. being LUCKY, in discovery. There could be problems that are deductively or analytically solvable with our current information, but in reality will never be discovered because the universe never produces the right conditions necessary for someone to generate that idea in their brain (genetic, upbringing, education, that cup of coffee in the morning etc.)... Or in this case, that electric surge in the transistor and lava lamp configuration in San Fransisco that would've pushed the language model to generate a useful step.
I would argue that there is a very pragmatic and material value to beauty - as a subtype of things which brings us pleasure. Pleasure is necessary to manage our moods. People in bad moods work less efficiently, and moreover are less cooperative which further impairs the performance of the whole system they participate in. Conversely people with strong positive charge usually produce much more results of better quality.
AI is still missing this "big picture" part. When this is somehow added, then we do have AGI. Not sure, if this might just happen from training the next monster model.
Thanks for covering this paper elaborately. Seeing it solve the geometry problem like a middle schooler make it clear AGI is still very far from reality.
Thanks for making the point about the beauty of it. I often feel quite frustrated because my natural talent is in mathematics or things related to it, because this isn’t something that is looked upon as artistic. It might be looked as something that will benefit you, but never truly admired. And there is so much beauty in it as in a great song, or a beautiful picture or any other art. But we are never thought to seek it or appreciate it.
Great video Thank you for going into this depth. I agree with your conclusion. At 25:33, I would argue that your PhD programme / qualification fulfilled both useful criteria, even if the thesis itself did not: (1) It was of immediate practical benefit because it qualified you to take the next step in your career (2) It equipped you with the personal methods (resilience, perseverance, wider imagination) required to work on hard problems where the results are not already known - I could never do this
Some comments on beauty: As an AI researcher with a math degree, it's 100% possible for us to develop AI that generates "beautiful" proofs. As long as you can label whether a proof is beautiful or not, we can train an AI to optimize based on those labels. (Even if you don't understand what makes it beautiful, as long as you can provide a training signal, you can still optimize for it- that's one of the very powerful facts about AI)
Just what I was thinking. We need a big "beautiful proofs" dataset. I think the way to do this will come via automatic formalization of hunan generated results. It feels repetitively feasible to consume maths papers and spit out formalization in something like lean, at least within the 10 year timeframe
I'm curious to know more about your optimism in this regard, because my immediate impression is that this problem will be a LOT harder than simply feeding in a training set of "beautiful" proofs. I'm actually very optimistic about AI overall (after being a skeptic for many years), and I'm amazed at the progress of LLMs in the past couple years. But there are things I think AIs and this type of learning model are quite poor at. Specifically, they're good at generalizations from large numbers of examples (which is I think where you're coming from), but they're also often quite poor about being able to pick out details in some cases. This comes out of the fact that training requires specific goals (again as you note) and to an AI, it may sometimes have no idea why some seemingly minor details are important. We saw this in image generation for a long time, though recent models are starting to get better. I'm talking about things like how the AI images often would create models with 4 or 6 fingers. To a human, this is an obvious problem and flaw, and if you don't understand how fingers work, you don't understand how we touch things, hold hands, etc., hence making a lot of awkward weirdness in AI imagines. To an AI, it's often a seemingly minor detail in an image, some smaller cluster of pixels that it has no reason to realize the importance of or optimize for. Now, if we see that specific flaw, we can optimize of course and provide better training data, with better goals for the algorithms. But even in language learning, we see this fundamental lack of "understanding" from LLMs about basic concepts. For a long time, it was difficult even with the language fluidity of ChatGPT for example to understand that a haiku contains a pattern of syllables and to emulate that. It was even harder to get it to understand how to emulate the meter, specific number of syllables per line, and rhyme scheme of something like a sonnet. Even though GPT models could easily tell you what a sonnet was in great detail, even interpret what a lot of those definitions about meter and poetic feet mean, it had no "understanding" of how to apply that to generate a poem in that form, even after having been fed probably tens of thousands of sonnets in its training base. It just didn't know it should optimize in generating a sonnet around concepts like accent, meter, and rhyme, because even though it can quote definitions, it doesn't "understand" them in the way a human does. I'm not at all saying AI won't eventually develop such types of understanding, but to me while I'm hugely impressed by what it can do now, it still often lacks the ability to generalize a "concept" as a human would, and explaining such a concept to it often doesn't quite work either (though surprisingly it sometimes does, which gives me hope). While even a pretty dumb kid can understand how to create a haiku at least in terms of syllable restrictions very quickly, as they know what a "syllable" actually is and understand that concept. Still, one could still train the model to have some understanding of poetic meter, and it seems GPT models have improved over the past year in that regard. But AI wasn't able to extract this knowledge, I don't think -- but you can correct me, simply from having thousands of examples of sonnets in its training set. So now, going back to geometric proofs, the problem with "beauty" or "elegance" in proofs is that each new proof is its own optimization problem, often specific to that particular proof (or at least a small class of proofs). What makes something elegant in math is that it demonstrates true "understanding" of a concept, which AI models really seem to struggle with. And worse, each proof often has different concepts that need to be highlighted or optimized for. In the video example, for example, an intuitive deep understanding as noted can come from a theorem about secant lines. I'm sure this AI model can be given a long list of geometrical theorems, but how will it realize that in this particular case, such a route is not only more efficient, but "beautiful" because it highlights an intuitive understanding? To me, as someone who taught basic proof-based geometry to human students for quite a few years, I realize how humans gradually build up a set of tools and theorems which create shortcuts. But which shortcut is helpful for a particular proof in order to make it "elegant" is often a very non-intuitive process. It comes from doing lots of proofs, knowing lots of theorems, knowing which ones may be applicable to particular situations, and then also knowing how humans learn and understand, so stating the proof in that "beautiful" way makes a human go, "Oh... yeah, that makes so much sense now!" That feels to me like every single complex geometry proof is kind of its own little "figure out how a sonnet works" type problem for an AI. Not a general class of things that can easily be grouped under one category of "beautiful." And just like it's harder to highlight to an AI how to decide non-obvious features are important for generating a "sonnet," I think each geometrical problem beyond really basic proofs will have their own internal non-obvious features that need to be highlighted and optimized for before generating a "beautiful" proof. So, to me, this feels like something that is several orders of magnitude degrees harder than you make it sound. Simple labels of "beautiful or not" aren't going to be enough here, at least with how LLMs seem to work now. Maybe with lots more computing power, eventually a more general AI with actual reasoning and conceptual "understanding" can be developed that approaches the level of, say, a 5-year-old human child. Once we're there, I imagine grasping the concept of a "beautiful proof" won't be that much farther away, as understanding can grow exponentially (theoretically) with an AI. But for right now, I'm skeptical about your approach in this case. Maybe in a few years things will advance further, but "elegance" in math feels like a much harder problem to tackle.
@@pedroteran5885 I agree. AI algorithms currently are fantastic at optimizing toward a specific quantifiable goal. "Beauty" in the case of proofs like these isn't easily quantifiable, and if it eventually will be, it will have to involve some very complex multistage set of concepts (having to do with how human intuition works, as that's often what makes proofs "beautiful"). For now, it feels like except for the most trivial classes of proofs, what makes something "beautiful" in a specific proof is often quite particular to that situation, not an easily generalizable class of similar things. It's not even just a search for a "shorter" proof (even if that can be meaningfully quantified), as often rather terse proofs can be very non-intuitive and not insightful at all. What you're really optimzing for with "beauty" is something like, "Find me a proof that feels really intuitive to a human." And AIs are good so far at a lot of things, but they definitely don't understand the nuances that connect concepts in what humans often feel to be "intuitive" ways.
@@BobJones-rs1sd There is a whole field of AI research called "AI Alignment" that is designed around making AIs do what we want them to, even if we are unable to describe what that means. Basically, through RLHF, you can sorta train AI to follow vague concepts like "be moral" and "do a backflip". It's not perfect and it's an actual worry whether we're aligning AIs correctly, but for "beautiful math proofs" I think it'll be enough. TLDR as long as you can label proofs as "beautiful" or not, an AI can learn what you mean by it, even if you don't understand it yourself and can't explain what makes it beautiful. This is highly nontrivial and is part of the magic of AI.
We need AI that is able to train itself and store symbolic propositional memories for it to be better, because pre-trained LLMs don't get to learn; they are literally just static functions running on a context window of token parameters. One of the major differences I noticed between humans and LLMs is that humans have a continuous thought stream that differs from their speech output, whereas LLMs don't think without saying every thought they have, and since they're pressured to do so, they don't get to refine their speech before spitting it out improvisationally and without forethought. That makes them unable to learn dynamically, and unable to be creative. Also, the AIs here are not particularly multimodal, so they can't look at the images like we can to get a feel for them, and they also weren't challenged to optimise their proofs once they came up with them, which I find to be a bit unfair in this case. It's interesting how symbolic AI is already a thing that exists though. I myself came up with an idea for an AI that formally generates propositions based on syllogisms and deductions to create and strongly substantiate knowledge before I heard about this today
The biggest problem LLMs have by far is that they see "mimicking human speech" as the end goal. Before I could ever consider it to have any level of intelligence it would need to be using speech as a means to an end rather than being the end goal, ie. instead of learning to speak by copying humans it needs to learn why it needs to be able to speak first. If an AI is built to accomplish some task completely unrelated to speech and then learns how to interpret plain text as a way to improve how it performs at that task, even if it did it at a the level of a child, it would be 1000x more impressive to me than anything that LLMs are doing.
@@gvi341984 Generally, I mean. It should be able to do it for mathematics by learning the frameworks, and it should even be able to discover entirely new branches of mathematics all by itself if it runs for long enough. I probably haven't put nearly as much thought into it as some people, but still it seems way better for making something intelligent than your run-of-the-mill LLM. I wrote a bit about the idea on my site
@@FireyDeath4 It already can the issue is that it can do anything. I mean you can go through the API and try to cycle itself into learning its own math.
I broadly agree with your take, but I wonder how this applies to how modern math research is going, one specific case I have in mind being Mochizuki's attempted proof of the abc conjecture. With proofs of major results nowadays being easily hundreds of pages long, and take the leading experts in the field years to digest, do you think it'd be worth having the AI so it can check our work? Perhaps if we could run Mochizuki's manuscript through the checker then Scholze and others might've decided to spend their time elsewhere.
Actually I think beautiful proofs are very useful as you already told. It gives insight into the understanding the problem better and it builds our intuition.
A very rudimentary theorem prover is an accessible project for a computer science undergraduate without any LLM or neural net tech. Any axiom or theorem can be translated to disjunctive normal form, and any set of such statements can be trivially brute-force churned until a proof-by-contradiction is reached. The hard part is making it _good_ and pruning the junk out of the proofs.
I find it unclear where the ai even is in alphageometry. Obviously there is an llm making suggestions but how is that different from brute forcing suggestions? If the answer is a higher rate of useful suggestions, then is there any reason to believe there is no simpler, conventional (as in not ai) method that does even better than that?
Great take - I agree that the aesthetic aspect is very important, I wonder if the algorithm will eventually start to incorporate this (trying to get fewer steps, quantifying "more powerful" statements, etc). One thing I'll add is that even if we have an ugly proof of say RH, I think that humans will be able to use that to identify areas where our understanding was lacking and build new theories from there.
AI learns so fast, you can see the evolution in front of our eyes, it may be far of today but who knows what next version will bring, its still a baby.
I totally agree with your understanding of what this AI achieved. But I think you'd also be very surprised by the maths that prove Machine learning is possible and the equations that they use in the process. Edit: for your last point, I'm not really scared of AI entering the creative space. It'll replace the bargain bin tier stuff. The shovel ware books and bad content already exists all over the place. It's actually the most common stuff (which is also why it's the easiest for AI to produce). But there will always be those of us that care to put the work and make things exactly as we imagined them. That's beauty, art, philosophy, or what ever else you might call it. AI is a tool, not a replacement. And those who aren't seeing that and are going to try and use it as a replacement are going to fail.
AH is perpendicular to BC which is parallel to EF (midsegment), therefore AH is perpendicular to the line through the centers of the two circles, and it passses through H, a point the we know is on the radical axis of the two circles. We conclude that AH is the radical axis of the two circles and so AB2*AB1=AO*AH=AC1*AC2 implying that B1B2C2C1 is cyclic and we are done.
There is a simple addition to the program that probably engenders proofs that you would consider beautiful. That is reconsider all know theorems that are related to this problem and try to apply them. That is a technique that I applied to problems at university, resulting in tens. (A ten in an European university 1968 is worth more than an A, more over these are pen and paper exams, not multiple choice ones. A minor error in an otherwise correct proof somewhere results in at most nine.
The video is really incredible. I learnt so much, wether about geometry or AlphaGeometry. Thank you for explaining it, I was very intrigued about how it works but never took the time to get my hands into it.
Really interesting take. I think you're right. Otoh, getting more mathematical results in an automated way will not decrease the amount of interest in mathematical aptitude, similarly to how AlphaGo actually increased interest in Go. Let's get more people into math!
I think AI in a lot of cases really just lowers the burden of entry and that's ultimately a good thing. Like LLMs are basically just doing the same thing to the humanities that calculators and computers did to science and maths a long time ago, it's taking away a lot of the drudgery to enable people to focus on learning what actually matters. Like calculators didn't result in us forgetting how to do basic arithmatic but it meant that students could move onto more interesting stuff like algebra much quicker and usually in doing so you end up getting better at arithmatic. LLMs do the same thing for writing, they take care of all the basic stuff like grammar and decent sentence structure and in doing so allows a student to move on from struggling with those aspects and instead lets them think more critically about what they're writing. If in this case the only thing the AI accomplishes is making geometry more accessible by being able to understand natural human language then that's a job well done in my book.
I think ia could easily create beautiful and fresh hypothesis and conjectures if it had a third agent that actually construct measures and compares, lo fi with error bars, changing the variables, like a physicist playing with a toy model. Then the the other ones would try to prove them
I used to be good at maths, maybe not Olympiad level, but decent. I might have an idea on how these kinds of problems are being solved. I agree with your observation that AI solutions were horrible and I assumed so, I expected this, although this will improve in future better more specialised forms of a math AI. And yes, it might reach a level that it solves these kind of problems equally well as humans do, maybe even a lot faster. To the point people will see it as better than us, that it has surpass us in this. Which might seem true, but is it really? Don't forget, there is no person ever who solves these problems by reading and digesting all human mathematics knowledge or even all geometry knowledge. None. Then, while we use deduction, we can also think outside the box, use unlikely tricks, connect unlikely dots. And we don't mess around with a billion unlikely dots to connect, we kinda know when we do, even if it is completely unlikely and unrelated. People who compete in the math Olympiads are especially good at it. AI can't do that, it sucks at it, and it will never be able to do it. So, isn't human thinking kind of magic in a way? At least, it's definitely not the same at all with the way AI programs work. If AI programs were designed after a model of how we supposedly think, those are quite the differences, don't you find? There is a ton of things yet we do not understand about ourselves. And I seriously doubt those simplistic models on how we think and reason. So yeah, even if I kinda expected horrendous solutions from AI, in the end, I will disagree with you, this is not at all similar to how we think. By the way, since you mentioned chess, I'm not sure if you play, and sure chess AI engines have seemingly surpassed humans in strength of play, in truth, they are lacking, there are several areas they are lacking, where you need to be a chess player to notice this, or how they even damaged chess, as people despite the fact they know they are lacking, they are supper accurate in areas they are not, in areas they are good at, so yes, people use them so much for analysis, while they don't put as much effort themselves, and it actually harmed chess. Also, the way chess AI engines "think" and analyse, has absolutely nothing to do with how humans do it. In the same manner, the same goes for a math specialised AI, and I gave you a few differences. So what do you think? Do you really stick to your view that we think alike or are there huge differences between humans and AI programs? Not in capabilities, but in the way they operate. Although I could argue for our capabilities as well, AI programs just seem better - they are not. At least, they are vastly different to us.
I would say that Sabine is correct to the point where you can say that it's a promising direction that doesn't require all that much innovation going forward. AKA, it's a good framework for future AI. Hell, maybe maths will generalize with enough parameters similar to how language tasks seem to with large enough models? Anyways, I find AI like AlphaGeometry to be beautiful in themselves like math.
I just subscribed because I like your general take on beauty in maths. Recently I found something beautiful in number theory and wonder if you have any insights. Start with Euler's quadratic E(n)=n^2-n+p, p=41. The first 40 n's generate only primes which is well-known. Thereafter there's a curious mixture of primes and composites, and the patterns formed by the composites caught my eye. Define a block k s.t. kp
I might know a thing or 2 about LLMs (Large Language Models ), sometimes they'll just hallucinate, an LLM isn't actually processing using reasoning at all but rather just keeps breaking down the problem ( which is fine-tuned to do so ) and reprocessing its whole chain of thoughts hoping it might come to a conclusion, which explains the repetition of steps you've noticed, but neverthless I can't disagree that they are a significant milestone. Thanks for sharing btw
As a chess player, I can understand your obsession with "beauty", but you have to get over it. In chess, sometimes your "beautiful" sacrifice or opening line is not correct, because a chess engine like Stockfish finds an "ugly" refutation. But sometimes, the "ugly" computer move reveals a brilliant and beautiful idea. Computer chess changed not only the game profoundly, but also its aesthetic. And it won't be any different in geometry, or generally in mathematics, even if the first steps look crude and ugly. It will open a whole new world, whether you like it or not.
But did it change chess for the better? I mean, play is now higher in quality, but is it as exciting, or beautiful, or fun? A lot of GMs seem pretty unsatisfied with the current state of chess at the top level, what with it becoming so drawish and dependent on specific preparation.
Proofs and chess are two very different things; you're comparing apples to oranges. There are objective, mathematically "good" moves in chess, so we have some measure of what moves are "beautiful" - that is what moves win the game. They don't think it's ugly in that it is a bad move, but that it goes against what players are used to - in fact these are very insightful moves, and you appreciate the beauty in the move once you understand the reasoning. However, there is no such thing as an objectively "good" proof. There are only correct and incorrect proofs - otherwise you are given pretty much entirely free reign as what tools you want to use. He is not refusing the idea that AI will not get better (and very much states that it could improve in the future), but that in the current state, it's not giving elegant, insightful proofs (and it is subjective as to what makes a proof insightful or elegant or whatever). Rather, AlphaGeometry is just angle bashing, which is something that pretty much anyone and anything can do. A more apt comparison would be asking a GM what makes a move good versus asking a brute force algorithm. One could explain to you what moves to look out for, what lines are threats, etc. The other would just throw millions of possible futures in your face and say it just works. What explanation do you think is more beautiful? Finally, why do you think 3b1b has such a wide audience? Everything and more I can learn from textbooks, but it's almost as if there's a beauty as to how he makes his videos...
I should have said -- the steps I display on screen are the output of AlphaGeometry so it is able to output a stream of (usually) followable deductions
10 หลายเดือนก่อน
@@AnotherRoof Sorry, I probably should have been more exact: is correctness guaranteed by any means? In contrast to chatgpt on maths problems, where it just makes things up.
@ I think so by the nature of the deduction engine basically operating through classical "premise 1, premise 2, conclusion." I know I came off as quite critical in the video but I think this is the cool thing about it, in that it has the wild ChatGPT-esque "ideas" but the inferences are correct
You have to check the output, but the actual deductions are made by the angle chaser which does not make mistakes and proofs aren't that long, so in practice it's not that hard to check and it will probably never make an error
10 หลายเดือนก่อน
Ok, so my conclusion is that it is designed to be correct, but it doesn't produce any self-validating proof (such as verified steps in a formal theorem proving system) for it. @@yonatanbeer3475
The fascinating aspect is that we often can't predict what will become a key driver for future innovation. It might be an issue that currently seems ridiculous, but once it's resolved, along with a few others, it could lay the groundwork for groundbreaking developments.
Can you make a video tutorial on how to use the AlphaGeometry program? That is, how to install it on the computer, how to run it, how to enter the geometry problem data, the drawing and the problem requirements? I simply couldn't find any video explaining how AlphaGeometry can be used!
It seems to me that this concept of an ai model might be powerfull in (homotopy) type theory, I am no expert in that field but intuitively, the syntetic nature of that branche of mathematics give sufficient restriction to the steps the AI model is able to consider. If anyone reading this comment has more experience with type theory, I would love to know your thoughts.
The fact that a lot of HTT is, or can be, written in regular, plain-text syntax could be helpful. I feel like path constructors would be very non-intuitive to the AI though, because almost no other functional programming languages it would be trained on would have quotients as just another construct of the language; in Coq I believe you have very explicit coercions, whereas in Agda, I believe you have to verify your path constructors-internal but not syntactic equalities native to a type-hold after a computation. You'd need a good corpus of information to train into it the context to understand this, I think, because if you tell it this every time you want to solve a problem, if its work gets long enough, it will forget some of what you originally told it.
22:13 This is a good point, though I think there is no reason to only report on 1st place, considering 10 years back it was more a consensus among regular people that computers could never even attempt the kind of tasks that it is doing at the moment, while everyone knew computers could play chess before it beat Kasparov, because that had already been reported on earlier, it's just that nobody remembers or cares about "the first computer to ever be able to play chess", because who cares The first AGI, will be historic, something people likely remember for years to come, just like Deep blue, up until then we will see incremental improvements followed by randomly gaining unexpected abilities thought never before possible, until eventually an AGI is as good or better as the average human at everything AGI doesn't need to be better than every expert, it just needs to be average to break the worlds economic system, Like, if it takes governments seeing a headline in the paper, new AI virus maker can generate computer viruses (or genetic material for real viruses) that can break the worlds most secure everything, then it is already too late for them to do anything about it In the case of Deepmind, they could take as long as they liked, and the reporters never had to report anything, they literally could have ignored the story completely because it's "just some silly game" Chess getting botted just affects chess players, but imagine 90% of all traffic on all social media being bots with human-level intelligence or even slightly below average It may not be important to mathematicians because they have already all been made obsolete by an AGI that is better and cheaper and more scalable than all mathematical effort on earth, that's also more creative and more reliable, but, I think it's definitely worth the discussion if nothing else, because something definitely is happening, even if the media machine is overhyping it for the sake of ad revenue, something they didn't need to rely on in the past due to subscription service news outlets
18:45 this was the way I solved self-posed problems when I was young - playing around and brute forcing things because I didn’t have the tools or experience yet. I still do this to an extent today as an undergraduate student (CS and Math double major), where I often have to solve the problems I mess with using some concrete examples to catch most patterns
I feel like there are a lot of interesting problems in math that were first solved by flopping around. And later they are made compact and beautiful. So this might become a very useful tool in a few papers down the line.
FYI: The engine that played chess, AlphaZero, and played go, AlphaGo, only needed the rules of the games as a difference between the two. This was a completely different approach to the previous alpha-beta pruning approach to chess only, and appropriate to all small games. Until this approach was developed for Go, and got beyond top human level in one day, it was thought that Go was a large problem, not a small one, due to the astronomical number of multi-move paths, dozens of orders of magnitude more than chess. Now, as a combinatorial matter, I have no idea of the complexity level of the various mathematical fields, just that I do not even know the names of the majority of current branches of mathematics. Just in the matter of physics, there are dozens of competing versions of string theory, and no way at present to determine if any of them has additional properties that might be reflected in observed reality, requiring modification of the standard model. Similarly, it's quite clear that humans have not yet exhausted number theory, groups theory, topology, etc. Most humans would not distinguish between Howard Thurston, the magician, and William Thurston, mathematician. So far, baby steps, yet exceeding the realized skills of all but the top 0.1% of humans in mathematics.
I could possibly see the sort of ugly proofs it generates useful as a first step. If we wish to find a beautiful proof for a result, it saves time knowing what the answer is. For example, if we know a result is true, we won't waste time looking for counterexamples.
@@hedgehog3180 yes, but where I see these things being actually useful is in helping us to solve problems to which we don't already know the answer. These sorts of problems are used to test them.
Yeah, I think these sorts of AI papers are meant to impress average people who don't really know the exact sort of problems it's working on. I remember back when I was in school doing the olympiad round 1 and round 2 questions, the geometry ones were not very appealing to me because it was just a tedious process of angle chasing and constructing new points and lines until you get the result you were looking for. It's very much on the easy-for-computers side of problems, where there are just a few different options to consider at each step, but you might need to try them a lot of times to get to the result. That's unlike e.g. the number theory side of things, where there are seemingly endless things you could try, and it takes a bit more of an informed or refined approach to get to the desired result. You have to have more of an idea of what methods will get you closer to proving the result, and that's the sort of thing I think AI should be training to understand. Maybe that's just me, but I'd be much more impressed if it could solve any of the other olympiad problem types. And even then, that's school level maths, so it's still got a way to go.
They’re just doing the ordinary engineering practice of tackling tractable problems. They could figure out a way to do this with the tools they had, so they did it. In a few months someone will have have another breakthrough which people will also dismiss as trivial in retrospect and so on..
It's literally no different. If we take all the mathematical proofs we know as a set, then the set naturally increases over time. So with or without AI, if we assume that this set will increase between now and the next 1000 years as an example (including proofs in number theory), and if we assume that all the proofs are derived from existing axioms and built up over time from our current understanding, then this space of possibilities is finite. It doesn't matter if it's an AI or us humans doing it. If we assume that new information is derived from existing information via some transformation, then a neural network of a large enough size will be able to find it given long enough time.
@@Jackson_Zheng Someone hasn't heard of Gödel's incompleteness theorem. Also this is literally just a “monkeys on typewriters” argument so it doesn't really have any weight. Sure a Neural Network probably is capable of discovering something if we give it an arbitrary ammount of time and computing power but that's true of random chance as well. The thing that actually matters is whether it can do so using a reasonable ammount of time and a practical ammount of computing power. So far neural networks have only been able to approximate the average skill of humans in some very limited tasks like summerizing information, and come up with novel approaches in some very limited spaces like board games. I'm not saying that isn't impressive or useful but it is very far away from actually discovering useful information.
@@empathogen75 Right and Dogecoin is also supposed to go to the moon in just a month! For all the hype of AIs they have yet to actually find a practical application and haven't actually revolutionized anything other than spam so far.
I'd be interested to see if there are proofs AI can convince us are true, but through a proof we can't yet verify. I think your video got at a very important aspect of proofs: the power of explanation. More impressive than proving something unproven is explaining something unexplained.
My limited understanding of AI suggests that we can set a step limit and employ a backtracking mechanism. It's possible that Alpha Geometry can explore multiple solution paths concurrently and remain within the desired complexity bounds.
i think to refine the solution it would have to find many different solutions and then find the one that takes the least steps, since that has the highest probability of being the most efficient.
I think AI will makes us more intelligent. But we will do more and more based on memory and will forget how to solve problems. That's what happened with calculators. People forgot how numbers work and how to calculate rapidly.
Check out Sabine Hossenfelder's video here -- it is a great overview of AlphaGeometry, I just respectfully disagreed with that one part
th-cam.com/video/NrNjvIrCqII/w-d-xo.html
Yeah, no thanks. Don't need to watch a transphobe's videos.
Sabine is just an alt right troll that probably got her degree from a cereal box
I won't be supporting Sabine but I enjoyed your video.
Screw Terfs@@diribigal
Video 6h ago but comment one day ago how?
This brings back some memories of chasing the angles endlessly during a competition and wondering when to stop. Just to realize later that the solution is simple using a lesser known theorem. I hated geometry problems.
Yeah, I saw a one-line proof of this using the nine-point circle and Apollonius's Theorem! Never would have thought of that!
this is what makes me hate and love geometry problems at the same time. kind of a toxic relationship. I suck at them, but they're always so easy, it's just that they always require this one theorem that makes the problem trivial but you don't know it until you do... I feel that algebraic problems are more fun because there are often many ways to solve them not necessarily by brute force but by brilliant bit of ideas that lead to a solution. Where, in geometry, I don't feel like I can just invent a theorem out of thin air like that, but oh well, i'm not so good at geometry so anyways |:
What I want to know is does it know the theorems and identities involved in the proof or did it make them all up as it went along?
@@remotepinecone It knows a foundational set of results, which it then used to deduce valid theorems (known in the paper as "synthetic theorems"). Collectively these are the results it uses to make inferences in problems.
@@Osirion16 I relate to this so much omg.
I mean.. it's a general search algorithm:
- the "logic" part has the edges of the graph and
- the "creative" part picks less random directions than brute-forcing the whole graph.
So while it does generalize to basically all other fields, as you said its utility and insights are questionable for harder/novel problems.
The creative part would be really interesting if it offers a good true positives rate, to maybe offer in the future novel ideas to problems we are stuck on
I think a step minimization step would be great.
More training can help it replicate "beautiful" proofs, even if it's hard for us to explain what makes something beautiful- as long as we can label which proofs are beautiful, we can theoretically train an AI on those labels. (technical details: train a reward model on a set of human "beautiful proof" labels, and use that reward model to finetune a LLM using RLHF)
I think you misunderstand just how vastly more competent human brains are.
Human brains cooperate, adapt, and compute over months to years to solve a hard problem.
An AI that runs in even a few minutes is just not even logically close to the computational resources and inherent efficency of what humans do.
We cooperate, mentally adapt, and do it over months to years.
That is so ungodly powerful.
The idea that AI will help solve novel problems is a drastic misunderstanding of how insanely small its inherent computational ability and efficency is compared to a group of humans.
@@Dogo.R AI scaling laws and biological anchors disagree. Even the most generous estimates place AI as using as much compute as human brains within decades at most. Already, AI LLMs have >1/1000 the number of parameters as we have synapses, and that gap can be closed quickly with exponential growth, larger monetary investments, etc.
I think that your focus on the time scales is a proof of how deeply you are mistaken by putting value in how long it takes to solve the problem. There is abundance of historical evidence that the logic is inverse.
Or, take a look how could apple and ms pass such a great well established company as IBM.
That's a sunken cost fallacy.
Human civilizations wouldn't collapse if your logic was true. If you see something that is able to compete and is way younger and uses less resources it means it's more effective and closing the gap.
Also if you think computers are unable to cooperate in problem solving... Well, yeah...
Not to mention there groups of people behind those computers. They are still tools. Just as with industrial revolution humans elevated physical strength using tools. This has potential to elevate intelligence.
27:00 The painting of Yellow Alex is the most beautiful thing I have ever seen. My eyes are welling up with the sudden surge of emotion and awe.
Don't tell him that.
@@AnotherRoof
I think the British pronunciation of "upsilon" and "epsilon" highlights the etymology of the names. Long ago, the Greeks forgot the names for ε, υ, ο, ω, and some other letters. So they just called them by their sound, the way we do vowels in English, like we just call E "ee." But as some of these vowels started to sound like each other, they added words to the names to distinguish them, like "little o" for omicron and "big o" for omega, similar to the way some Spanish-speakers call b and v "be larga" and "ve corta" or similar. In the case of ε and υ, "psilon" meant "bare," on its own, so an epsilon is a bare e. This distinguished it from αι, which had the same sound. Similarly, upsilon is a bare u, as opposed to ου.
I...can't believe I've never put together that "o-micron" means small o and "o-mega" means large o. 🤯
not quite-it's bare υ as opposed to οι, which by then had come to sound the same. thus, at that time, ὗς "pig" and οἶς "sheep" were homophones pronounced "üs", because of which both being replaced by χοῖρος and πρόβατον, respectively, in the bible
> I think the British pronunciation of "upsilon" and "epsilon" highlights the etymology of the names.
It doesn't do this any more than the non-RP / American pronunciation does. Indeed, the Greek pronunciation of "psilon" has the stress on the second sylalble: "psi-LON", not "PSI-lon", and regardless, it's definitely not "SIGH-lon"/"ep-SIGH-lon". The RP pronunciation is merely a result of the application of British/English phonics rules to the English spelling of "psilon"/"epsilon". See also "pi", "phi", "chi", which in Greek are pronounced "pee", "fee", "kee" (well, /xi:/, but close enough), respectively.
@@JivanPal It's not eps + ilon but e + psilon. Neither the British nor the Americans pronounce these letters the same as the modern Greeks or the ancient Greeks, but the way the British pronounce them emphasizes the two morphemes, whereas the American pronunciation does not.
@@EebstertheGreat I disagree that either English pronunciation highlights that it's e+psilon.
Pausing at 12:45 and just LOOKING at that diagram... I can't even begin to imagine the pain of animating this
it's not an animation, they probably used something like geogebra
and no, it probably wasnt pain to move around points or highlight regions
@26:45 I'm so relieved you stressed the point of beauty! And also fun! My parents were mathematicians who did research together so I grew up hearing all the language at the dinner table. I had no patience for algebra, and still don't, but in my 50s my mathematician-husband helped me realize I can visualize math in my head (geometry, topology.. an if it isn't a visual math problem I'll turn it into one!). It helped that I already knew how to "talk math." Now we have several published papers. It's sooooo fun. I literally only do it for the joy, and also the hope that some day physics will find a use for our math - we already found a deep connection with light polarization. Anyway, now if I'm not busy with anything else I'm doing math research. So yeah, I'm a little weary of the AI. We're hoping it can eventually take the place of what Bruce does in Maple, cuz that is usually a pain.. but not take the fun away. 🤔
The animation visually showing the ai's step by step solution must have been a nightmare to animate. But it made fallowing along so much easer. Props!
I really appreciate you highlighting the point "the AI wasnt designed to make beautiful proofs". Too often in the public discussion around AI, people leave out the fact that the AI has been designed in the first place. It has a reward function - for example, LLMs are designed to mimic human conversations. But this mimicry doesn't require a deeper understanding in order to succeed; it can just cleverly explore the space of possible conversations, in order to see what sounds most like a human. Similarly, here, it is trying to cleverly explore the space of known problem solutions, in order to create a new result. Which is why it looks that way - it always seemed to me to have a lot in common with the "press the middle suggested word on your phone keyboard to write a sentence" game. Or Clippy.
At least for this flavour of artificial intelligence (which I guess is a name that is sticking now), it seems difficult to see how it can develop novel solutions. It might be useful in order to fill in the gaps of existing fields - e.g. find all possible consequences of a set of theorems. But it seems like making the leap to inventing a new field will be beyond it.
Not to mention that the usefulmess/validity of any theorems it does come up with are suspect - as mathematicians, we often _need_ to appeal to authority, because there is too much mathematics for us all to hold in our heads. The way we square this is by knowing that with enough effort, we could reproduce/have explained to us the results we need. But because AI is trying to maximise its reward function, it might not actually be generating valid proofs - each proof would need verified. And if its proofs are thousands of steps long, and not guaranteed to be correct, then how can we trust its results? This is a huge problem that needs to be resolved before it can be useful - this is where the application differs from that of e.g Deep Blue, or AlphaGo. We might be surprised at the way an alien plays chess, but we can at least verify whether or not it has beaten us.
I think (and that's just a guess) that the AI is designed to be perfectly logical, like, straight up use the theorems it knows' statements
As @gabitheancient7664 says, the proofs can be automatically verified if the "left brain" part only uses deduction steps that are known to be valid. It's fairly straightforward to guarantee that if it can find a proof, it's a valid proof.
@@NNOTM yeah having difficulty verifying proofs or making up one that's valid is a very human thing we aren't very "logical" by nature, but computers tho
Dude, you literally have no idea what you're talking about. Appeal to authority? What a joke.
Proofs in mathematics are verified through formal logic. LLMs might not be efficient at finding proofs yet, but the process of verifying mathematical proofs is very well defined. All it takes is a large amount compute.
As for your point about mimicry, it is literally proven that ANY sufficiently large neural network is able to approximate any function to any degree of accuracy.
Since any deterministic process (ie. same inputs = same outputs) can be represented as a function, so can your brain and the brains of every single human on earth.
The only way it can be false is if you deny the assumption that the laws of physics work the same no matter where you are in the universe.
@@Jackson_ZhengAll the points that you mentioned are pretty much in agreement with what the OP said. Not sure why you act so triggered by it. Do note that modern theorem provers are not fully automated, and still require a lot of human intervention (e.g. coq, Lean, Isabelle HOL, etc..), which in large part is what an LLM is supposed to substitute (and it is far away from it, just ask LeCun). And just because something can be approximated arbitrarily close, does not necessarily make the approximation useful, feasible, practical and/or good. That's something a lot of people miss when they say that LLMs just need to be scaled up to achieve AGI. No, we are missing something very fundamental besides mimicry that would enable an AI model to accurately represent a reasoning machine.
"Or it's neither like my phd thesis"
I am feeling you
Same!
We all do. :)
I do think it's important to note that for a lot of mathematical problems just knowing what the answer is might give a human mathematician a better starting point to figure out more elegant and more meaningful proofs even if the AI generated one was a over complicated mess.
Yes, proofs are like code. You can refactor the core ideas into something neater
Exactly. Brute force calculations are used to prove stuff in math all the time. Computers revolutionized many fields of math just because calculations could be run to far faster by computer than by humans, even if the methods were exactly the same. Computers also dont make mistakes unlike humans.
That doesnt mean computers are always useful for every situation in math, but even the relatively "dumb" computer brute force methods are insanely useful. This type of AI makes brute forcing stuff like geometric problems way more doable as well, and that can be very useful for certain projects. Is it going to be useful for everything geometry related? no. But itll be useful for some stuff, and thats what matters.
I agree. Even if the brute force proof cannot be refined into a more beautiful proof, knowing a result is true will allow a mathematician to focus their efforts in that direction. If we knew the Riemann hypothesis were true, we wouldn’t have to waste effort looking for counter examples or trying to disprove it.
Also, for important results (like the Pythagorean theorem) mathematicians will often come up with many different proofs. This is not because we need reassurance that the result is true, but because proving the result in new ways may provide new insights. A brute force approach may not be insightful, but it does tell us something.
@@mrtthepianoman another thing to consider as well is, if a brute force solution ISNT possible, then it means there a more novel approach is required. That can also help narrow down where to look or what kind of ideas to be thinking of.
tbh the a fun part of a math olympiad is the fact that you were able to solve it, a human person with a physical brain, other body parts, and limited time, and that needed you to develop some abilities (I mean my main interest in math olympiads is developing problem solving skills, almost a pedagogical objective)
that seems almost as impressive as saying we could make a machine that beats people in long jumping, like yeah amazing technology but that's not the cool part of seeing someone jumping really long
This has far reaching implications right, someone who was previously unable to solve such problems can now use AG and boom he can work at the same level as a person experienced with math olympiads. After working on your skills for years you get beaten by an AI, that is bad to me.
Basically you dont have an incentive to learn geometry anymore sadly if ag is perfected. you will never be as good as the AI.
@ram527 We already have (non-AI, non-heuristic) algorithms for solving olympiad and olympiad-adjecent problems like inequalities (many inequalities can be bruteforced by multiplying everything out and applying Muirhead) and integrals (Risch algorithm). Geometry also had these sort of algorithms: analytic geometry is popular brute force approach that is commonly used by olympic contestants. Alpha Geometry is only unique in a sense that it is a form of machine learning algorithms, but it is neither the only nor the first thing that can algorithmically solve math olympiad problems. It'd be surpriced if it affected math olympiads in any way.
@@ram527 the fun isn't seeing the solution tho, it's solving it
@@mironhunia300 there's also combinatorics and soome nt problems where you could theoretically read the question and just put it in your computer and test all cases, the fun part is having the creativity to make the cases very small and solve it by hand
I'm not surprised that this method has results, as I've always thought that one of the best applications of LLMs is providing leads to subject experts that can follow up on them. That subject expert could either be a human such as a lawyer or a bespoke program as is the case here. LLMs are essentially playing the role of general-purpose heuristics to prioritize what nodes are explored by the subject expert in what order. AlphaGeometry is just the first notable instance where a computer program played the role of the subject expert.
Well put. It's important people understand that this "subject expert", what they call a deduction engine, is not an AI model; or at least not a machine learning model. It's a more traditional sort of computer algorithm.
The first thing I'll do will be looking at the description and crying in pain upon discovering the empty investigator section
I apologise for the lack of developments recently!
same here
@@yours-truely-sir YOU. YOU ARE LITERALLY THE ONLY OTHER PERSON I'VE SEEN IN THE COMMENTS TRYING TO CRACK THIS. we should be friends XD
@@AnotherRoofat least the videos are great, keep it up!
how far in the puzzle have you gotten so far, and have you solved the countdown one?@@saiphrivas1437
I want to throw problems proven impossible at it and see what hallucinations it creates.
haha, you read my mind!
I don't think it will give you a solution at all. This type of algorithm should be able to throw out any hallucinations generated by the language model portion of it. It shouldn't ever give a wrong answer, but it won't always find an answer. Worst case is it ends up in some endless loop.
The results wont be any different to most human answers
@@iankrasnow5383everything makes mistakes , top computers , and even the best chess algorithms which have been more developed than math algorithms sometimes make mistakes , no matter how narrow the probability is its never zero
Hype! As a contest maths enthusiast, I'm curious what AI can do in this field.
I think our ability to solve contest mathematics with computers has always been bottlenecked by our ability to formalize the mathematics involved, rather than our ability to develop new solving algorithms.
Areas that have been formalized (e.g. algebra, calculus, and probability) have been child's play for WolframAlpha and the like for decades.
There have always been a countable number of statements that can be proven from any given setup, and only a finite number of them that can be proven in a fixed amount of steps. Computers' abilities to explore several more orders of magnitude of that space is easily going to overpower humans' abilities to draw on experience and pattern recognition to explore the right paths. This is especially true given our ability to codify more and more of our intuition into these programs.
I love this. I, for one, don't feel like AI has to be elegant, perfect, and utterly mind-blowing at first.
I think a lot of self-proclaimed "nerds" (for the lack of a better word) reward being amazingly good as a requisite for sharing anything on the internet, as opposed to consistent, often imperfect, growth (which I think is the more realistic progression toward anything good)... just like how people will nitpick on every youtube video for not being perfect, we'll nitpick AI and find a million reasons that "it's really not that impressive if you think about ..."
I'm more excited that we even arrived at this point. It's a step in the right direction toward having an amazing and creative logical supplement to human intelligence. What a time to be alive! Love these videos, I love that you took the time to do a deep dive and I'm super interested in more breakdowns. Subscribed!
Rename this video to "AI can solve geometry, buuuuut...", and do thumbnail like "left: human - 12 steps vs right: AI 49 steps, and draw simple proof on left and complex proof on right" and you will get a ton of views. I thought your video will be just paper explanation, and not your deep look into the theme. Please search a new title and thumbnail, that express that core of your video. It's really good, but current thumbnail is not that catchy.
Spot on
This video is his highest viewed in a few months, and its larger than his subscriber count. It is objectively catchy.
Maybe he’s satisfied with his work and doesn’t want to play the algorithm game
@@herobrine1847 it's not only about playing algorithm, it's about reaching certain people. I read every article and wathched every video about AlphaGeometry, because I obsessed with new ML advancements. At first I thought that I will learn nothing new from this video, but I started to watch it by mistake and pure chance.
So, there are may be a lot of people who will not watch this video, because they already watched Yannic or Sabrine and thought like me, but they want information from this video. But my variant of a thumbnail can boost their interest.
@@optozoraxwhile true, this thumbnail appeals to a different audience of people, your suggestions just shift the audience, not necessarily expanding it.
There's an epistemological problem here as to the "beauty" of a proof. Turing's supposition, in respect of what we now call LLMs, was that perfect mimicry is indistinguishable from some other immeasurable sign of agency, to the extent we know when LLMs fall down, in the strictest sense they are not passing the Turing test. I suspect that (rather quickly) AI will acquire enough learning to meet our indefinable sense of 'beauty' in a proof - if educated mathematicians are able to 'know it when [they] see it', there must be some set of insights, available to any organic or inorganic reasoning machine, that would include 'beauty' in respect of proofs (or for that matter general reasoning or 'creativity')
It's not clear to me that Turing believed this. What he certainly believed is that many people naively believe they know the contrary when they certainly don't (that there is for certain some elegant test). If you have such thing in hand, you should be able to distinguish mimicry from the real thing. If we can't distinguish the "real" thing from mimicry, then we are all wet in thinking we know what the "real" thing looks like. When the computer passes the Turing test, we don't actually know it only uses mimicry. It might actually be doing the real thing the same way we do the real thing. Unless you think we mimic ourselves to prove our own intelligence.
I realized AI was doing original research when a smaller LLM told me butterflies had no legs. I'm sure it looked at a lot of butterflies.
When I was taking AI in college back in the 1980s (yes, I'm Old (TM)), one of the neat programs they talked about was Automated Mathematicians (see wikipedia), which starting from real rock bottom definitions about sets and used heuristics to generate interesting lines of inquiry, eventually recapitulating Goldbach's Conjecture (although reading through these things now, it may be that the author overstated the case)
I've always thought that for a computer to be good at math, it would need to be able to gauge how close it is to finding a proof. A chess computer is strong because because it knows wether or not the position is good/winning even if it hasn't found a winning sequence. In other words, the math computer would need to know how easy/hard something is to prove with respect with what it knows, even if it doesn't know the proof. Now that would be impressive.
Now, I also that there is some hope for AI to discover meaningful things and not just prove give propositions. Given a dataset of results with their proofs we could try to identify results that would significantly simplify the existing proofs. However I don't really see how an AI would be able to define useful new objects, which is th richest type of math innovation. This amounts to the AI enriching the language it is working with.
That was a really cool video! (don't mind me I'm just trying to confuse your future viewers)
wait what
Ah, you got me there for a second
Eh?
Exactly
heh
the thing i notice is that it doesn't seem like the AI part of this is very important. i'm pretty sure it would be easy to code up a conventional program to play the part of left/right brain. it feels more like another thing trying to capture the AI craze than a real advancement.
Is it not half of the algorithm? I'm not sure AI is difficult to code (in general). You do need resources to train it though,
Yeah. I imagine the AI helps somewhat mainly in directing the results toward the actual answer. Effectively, all you need is some measure of distance from the current position to the proof and it's just like, a basic pathfinding algorithm.
I imagine the actual practical use of this system though would be to just integrate the part which is actually consistent into programs mathematicians use, so they can choose what new things to add while the computer automatically chases angles and does the "dirty work". Which, means that you're effectively throwing out the AI part in favor of a real intelligence and just using the already existing proof checking techniques with a bit of automated search applied.
@@electra_ I just browsed the paper, and you pretty much hit the nail on the head. The actual deduction engine is not based on neural networks. It's based on methods that were designed decades ago and are human-understandable. This kind of program just outcompetes humans in the way computers always have: by doing a lot of logical operations very fast. The major advance made was figuring out a way to train the language model, since there isn't enough training data. They avoided this problem by algorithmically generating millions of problems and their proofs, and then training the language model on that.
AlphaGeometry isn't better at the actual problem solving part, it just has a better language engine for converting a human-readable problem into a usable form, and then judging each attempt based on heuristics. The language processing part isn't doing anything that would be difficult for a human. The deductive engine is, but it's just a traditional algorithm, not an "intelligent" algorithm
A trained human with the benefit of such a deduction algorithm on their computer would vastly and easily outcompete AlphaGeometry. A general LLM like GPT without any deduction engine would not be able to solve these problems at the current time.
The AI part just seems like an interpreter that gives instructions to what is just a maths program. That potentially has uses and is sorta cool in the general sense that LLMs enable computers to more directly tackle problems you'd need a human to input before but also like all LLMs it isn't doing anything a human couldn't do and it does it at a fairly mediocre skill level. It's really just the general issue that current AI seems like a solution in search of a problem that isn't just producing spam.
I think what this does sort of show is that in order for an AI to be actually useful in a specific context, especially one where you need to be objective, you sort of need to heavily restrict the AI by placing it in the middle of a larger system, and have components that can validate its input.
The only difference between this and brute forcing Sudoku with all the codified inferences is when you have to add lines to the diagram, and there's an inexhaustible supply of novel lines to mess around with, and you could end up chasing angles forever, and never arrive at your desired conclusion.
If it was training on past Olympiad problems, then it has a good statistical model of what kinds of lines you might need to add for the kinds of problems that test tends to pose. And then you're pretty much back to brute force again, because the space of lines you might need to play with is relatively closed relative to your statistical model.
What happened to the first roof?
We don't talk about that.
The proof incident
My question for the creators of this AI is the same as for those that generate text or images: why do you create AIs that do the jobs that people want to do?
Making impressive stuff is fun
The absolute stupidity of your comment.
Progress. Why we discovered the fire? Why we created/discovered math? Why we create physics? Why we start organizing in society? Why we look up at the sky and started questioning the reality? Why we created the wifi, internet, medicine and all other things? AI has already been use to decodify and represent in 3d models for the first time a molecule, and this help and save so many people, AI are being used in nuclear's fusion reactors to produce more efficient and clean energy . The progress is beautiful if used correctly.
@@CeuAzulll Progress to causing mass unemployment, IS NOT PROGRESS.
Can't wait to watch this with everyone later!
5:22 The simplified version is sufficient because if the three circumferences were distinct, their pairwise radical axes should intersect, but they are the three sides of the triangle, which have no point in common
When I did math comps we called this approach "geometry bashing". When you have no idea what else to do, draw some lines and work out the angles, see if you get anything useful. The most generous interpretation is that AI has mastered the worst technique high-schoolers use. And that's the only tool in their toolbox. Obviously it's very cool that somebody taught an AI how to geometry bash, and it suits computers' strengths fairly well, but this is leagues away from anything generalizable.
It makes perfect sense if you imagine an evolutionary algorithm behind human problem-solving and learning. Something generates lots of mutated procedures, tests them, mates them, reproduces them, and then through Darwinian (or even Lamarckian) evolution, comes up with ways of proving and/or learning stuff. As the Wiki for the No Free Lunch Theorem says, "The 'No Free Lunch' Theorem argues that, without having substantive information about the modeling problem, there is no single model that will always do better than any other model. Because of this, a strong case can be made to try a wide variety of techniques, then determine which model to focus on."
Through optimization (again, using something like genetic algorithms) the system learns better and better ways how to prove things and how to learn.
To be fair, "International Olympiad with a calculator" is a different category of competition from just "International Olympiad". And a symbolic deduction engine is basically a very powerful calculator.
Still, even in this other category, it's an impressive achievement.
Mathematics as logical poetry, love this, being a math major and poet myself. Also the video itself is very well done. Keep it up!
It can also be solved fairly trivially with radical centres, without knowing anything about cyclic quadrilaterals
Really excellent video! I now understand why others haven’t taken a deep dive into how Alpha Geometry works 😂
I had this exact thought. "Why isn't anyone covering this in detail??" Then when I made lots of incredibly messy Geogebra diagrams: "Ah. This is why."
so have you seen how new ai from deepmind got silver medal in imo? its amazing
The brute force attack of any problem presented to AI does have the simplicity of use very base, core postulates. Where a human would use multiple time-saving theorems that each contain their own elaborate proofs, the lack of using or identifying these upper level theorems to solve a problem doesn't make them less real.
Does brute force like this take longer than using proved theorems to short-cut the system? yes.
Does such a longer process matter in a time scale that computers operate on? Not always, but sometimes it does.
To make this AI Math algorithm faster, and more intuitive, programmers WILL be able to program future versions to use more and more proven theorems... HOWEVER...
Is AI able to invent, or even know to invent, new theorems for future stream-lining? Not at this time... but that's the goal isn't it.
The amount of work put into this video is crazy
I also love the optimistic view of the video - that this could be a worthy foundation. This is the "messing around in the dark" part in coming up with a solution. If you could build on the solution - simplify it - connect it to higher level concepts in other proofs - you might start raising its beautification score in the answer. Then you'll get papers saying "our new solution averaged 0.26% more beautiful solutions than other leading AI mathematicians over a course of 20 standard proofs. ;)
Kudos for this video shouting out that we SHOULD build on this solution and we SHOULD have beauty as a metric.
As an aside, "I can't describe why it's beautiful, but I know it when I see it", is absolutely the hallmark of a neural net, so you must be using one in your head. :)
Great video, I really loved the insights into why we do proofs, I definitely agree with what you were saying about the beauty of proofs. For me Is if I could have thought of it myself? As in could I guess the next step of the proof or have an idea of the result you're trying to get to without already reading through the proof. AI just randomly works out enough information to try to guess of what the inner workings of the result which it does with enough trial and error, maybe this is the way a human could get to the answer but it isn't satisfying.
Seeing the underlying structure of a beautiful proof allows us to abstract those ideas used in the specific proof to other results in the same beautiful ways. To me that's what mathematics is all about seeing those connections between not necessarily intuitive topics but being able to construct a framework where those connections do become intuitive, that's why results like Fermat's last theorem are so satisfying...
This is all true, and yet I feel that an AI proof, even a long, ugly, unintuitive, meandering proof - as long as it is probably correct - would serve as a very solid foundation for a human mathematition to know that a result is achievable and thus refine-able into a beautiful, concise, intuitive proof that does everything you could want.
I feel this way about AI in every field - it replaces brick-makers so that everyone who used to make bricks can focus on making walls, Wall-makers can make rooms, room builders can make buildings and those used to making buildings can start to design neighbourhoods. It frees every thought worker to stand on a higher foundation and increases the scope of what they can do.
So what if AIs start replacing the need for mathematitions to find proofs to simple problems; all this will do is free up those same mathematitions to find proofs for entire systems of related proofs instead of doing them one by one - which I think further improves math in gulps and bounds instead of sips and steps.
This is my view on why I don't mind AI making static images; I expect the artists who used to make static images to instead shift their focus to coordinating/creating collages, videos and interactive museums of static images; AI that makes music - great. I expect musicians to be freed from making sounds to conducting orchestras, running virtual symphonies and theatres with many pieces of music.
As an analogy: If farmers no longer have to plow because they have tractors, I expect the farming world to improve and the role of farmers to become coordinating different tractors working together and that this is actually an improvement in the role of farmers and their scope rather than their replacement.
21:58 well, deep blue actually lost to Gary Kasparov the first time they played, but its search algorithm was improved and it beat Kasparov on their rematch a little while later
Always gotta be carefull about almost all "tests" as they go through human weaknesses that reality doesnt.
I'll explain.
For example humans can remember insane amounts of things... but remembering a few things quickly and recalling them a bit after is hard.
And its WAY harder if you ask them to use their brain for other stuff in between.
And even harder if those things dont just need to be remembered but need to be used in the proccessing. Geez they are going to have an extremely hard time. Their familiarty with the concepts has to be insanely good to efficently encode and hold that information in such an on demand way and while asking them to leave brain power to do computation on it.
Computers can save data, use it for logic, then rewrite it like its litterally nothing.
This results in almost all "tests" being vastly favored towards computers... because most tests are lots of small problems that you need to load into your brain, do work with, then compute and answer, and repeat.
Which is just not like normal real life problems that humans are good at.
Real live problems you spend hours and hours and hours and months and months and months working on.
And along with other humans and other stimuli.
And with tons of hours of sleep to restructure the brain for being better at solving the problem.
Almost all "Tests" are very much not like reality AND and very much favored towarda computers and their memory model.
Only tests that are very long term act like reality. Reality is humans solving problems over long periods of time.
Also for the math topic. Think about mathamatica. Is mathamatica "a big step towards super intelegence"?... I mean its an ungodly powerful tool for math.
Anyways just a seed for thought.
Hence why LLMs aren't writing papers or doing research.
Honestly I find angle chasing done well extremely beautiful and creative. In fact, more creative than the shortest solution You gave. Just my honest opinion. Beauty lies in the eye of beholder.
Does it enlighten? Of course it does. It introduces all those angles, giving a nontrivial parametrization of the layout of the problem in question.
I think you’ve hit the nail on the head with your observation about the aesthetics of mathematical proofs. I think current AI tools are a long way from ‘understanding’ beauty, and I suspect that in mathematics, as in other domains, they will show a lack of a creative spark, producing work that is derivative and soulless, with proofs that fail to capture our imagination.
I bet they can cut out the LLM and replace it with a random chooser. LLMs are expensive to run and I feel like the random chooser would have done just as well.
This video is a wonderful review of the state of the art in the development of assisted proofs, taking alphageometry as a case study!
Yes. Deep Blue couldn't be easily turned around to play Go. But AlphaGo was quickly turned around into AlphaZero, which didn't even need the examples of expert play. Just given the rules of various games (Go, Chess, Shogi) and a day to study, it could play them at expert levels. That's more typical of what to expect for the development of this system.
To be fair, deep blue had to basically be “tuned” to play against Kasparov while Kasparov was not allowed to see how Deep Blue plays to develop his strategy.
@@realGBx64 No need for the "to be fair"... that's the point. Because of tech limitations (hard and soft), Deep Blue was very specialized to the task (special chess chips and programmed knowledge). The special hardware in modern systems is for general deep learning calculations, and the programming for AlphaGo itself was not an expert system (the programmers weren't great Go players, or relying on such knowledge to tune things). You don't have to work at the low levels to do powerful stuff anymore.
It was an ego war. IBM had a huge ego about being the first computer to crush the world champion, and Kasparov had a huge ego in being the last human to withstand this incursion. When they negotiated the terms of the match, all this was taken into account.
Kasparov had a calculation that computers wouldn't be able to solve the problem of long tactical horizons in that time frame, and he felt he could steer the program into positions where this blindness would prove fatal. He got the computer into precisely such a position, and then it the chose the move he was convinced it couldn't find. This crushed his confidence about winning, and he made some blunders over the board subsequent to this.
Later he sued IBM to justify how the computer had been able to make this move, and they came back with the rather unsatisfying explanation that the computer had crashed, and that a safety system had randomly picked that picked that move so that the computer didn't default on the whole game. This makes no sense to me as a safety system, since computer chess algorithms have long had a structure where you solve to a certain depth, order the available moves accordingly, then solve again to a higher depth, until you judge that your improvement won't justify a further time investment. This tends to be the best way to do it because the transposition table makes the repeated analysis rather inexpensive, and the successively improved order of evaluation makes the alpha-beta pruning more effective.
When you do that, you always checkpoint the best known response from the last level of analysis, so there should always be a good checkpoint available, and you should never have to resort to a random move selection (unless you crash on the first ply at the first depth, in microseconds, suggesting you should start again and hope for better).
I didn't actually read the details of this imbroglio, but I collected a few points from much peripheral discussion. I've also viewed an entire DVD documentary where Kasparov is allowed to litigate whether IBM cheated. It seems unlikely IBM needed to cheat, the program was incredibly strong, but the final explanation from IBM was not a good look.
In any case, Kasparov was doomed in a year or two at the rate of progression, and it was just a matter of time in any case.
AlphaGeometry doesn't work much like AlphaGo/AlphaZero. Those are "simple games" in that they have simple rules and a limited set of moves. AlphaGeometry is a transformer based language model (like GPT) connected to a hand-coded "deduction algorithm" which doesn't use machine learning. The language model was trained on millions of computer generated geometry problems/proofs. This type of program is only as good as its human-coded engine allows it to be.
AlphaGo is much simpler and more elegant, which is why it's so powerful. It's a machine learning algorithm that learns through play rather than through extensive tree-search methods like older chess engines. AlphaGo has limitations too. It tends to be strong against the world's best players, but it can be overwhelmed by rudimentary strategies that would never fool a good player, but haven't come up during its training. For instance, a year or two ago, someone discovered that a simple technique of surrounding AlphaGo's tiles, letting them win every time. (i forget the details since I don't play Go). Exploits like this for fooling adversarial neural networks will always be a problem.
@@vib80 But that's exactly what this is though, the actual AI portion of this is not the one doing the math and actually figuring out the proof. The AI here is just an interpreter that also tries to make suggestions for what to do next and then lets a totally normal math program just sort through everything until it hits a dead end. So they can't actually generalize this to other problems until someone figures out how to make an AI that itself can do math, as in not an AI with a CAS module bolted onto it. Until then this is basically how a lot of AIs in strategy games have worked where there are two seperate systems that pass ideas back and forth to each other in order to limit how long the AI spends looking for the next move.
It's hard to comprehend the true difficulty of phrases like "aligning AI with human values" until you are faced with real situations like this. Any human value I could brainstorm would have been superficial compared to so human as the *soul* of mathematics or the cultural necessity of beauty.
for the first problem
i might be wrong here but cant you split the large triangle to 3 triangles, OAC, OAB, OBC, OAB is 90-x, name angle OAC a and angle OCB b, since its a circle OC=OB=OA so OBC=b, OAC=a,
triangles must sum to 180
CAB+ABC+BCA = 180
(a+90-x)+(b+90-x)+(a+b) = 180
2(a+b)+180-2x=180
2(a+b)=2x
a+b=x
since BCA = a+b,
BCA=x
AI makes me worry about being a math major sometimes. If AI performs math better than most mathematician, the people who are passionate about math and want to pursue a career in math are less likely to get hired in the future. It's really conflicting feeling like I'm committing myself to a profession that could start to die out in my lifetime.
If it’s comforting to you, which it is to me, consider seriously that “AI” is not a threat to maths nor to working mathematicians. I think it’s a technological dead end. Some people will cling to it with clawed hands but it’s too expensive to build and maintain, and it’ll be too expensive to manage and administrate. The economy will take care of AI on its own.
For me this really highlights the role of serendipity, i.e. being LUCKY, in discovery. There could be problems that are deductively or analytically solvable with our current information, but in reality will never be discovered because the universe never produces the right conditions necessary for someone to generate that idea in their brain (genetic, upbringing, education, that cup of coffee in the morning etc.)... Or in this case, that electric surge in the transistor and lava lamp configuration in San Fransisco that would've pushed the language model to generate a useful step.
I would argue that there is a very pragmatic and material value to beauty - as a subtype of things which brings us pleasure. Pleasure is necessary to manage our moods. People in bad moods work less efficiently, and moreover are less cooperative which further impairs the performance of the whole system they participate in. Conversely people with strong positive charge usually produce much more results of better quality.
AI is still missing this "big picture" part. When this is somehow added, then we do have AGI. Not sure, if this might just happen from training the next monster model.
Thanks for covering this paper elaborately. Seeing it solve the geometry problem like a middle schooler make it clear AGI is still very far from reality.
Thanks for making the point about the beauty of it.
I often feel quite frustrated because my natural talent is in mathematics or things related to it, because this isn’t something that is looked upon as artistic. It might be looked as something that will benefit you, but never truly admired.
And there is so much beauty in it as in a great song, or a beautiful picture or any other art. But we are never thought to seek it or appreciate it.
Great video
Thank you for going into this depth. I agree with your conclusion.
At 25:33, I would argue that your PhD programme / qualification fulfilled both useful criteria, even if the thesis itself did not:
(1) It was of immediate practical benefit because it qualified you to take the next step in your career
(2) It equipped you with the personal methods (resilience, perseverance, wider imagination) required to work on hard problems where the results are not already known - I could never do this
Some comments on beauty: As an AI researcher with a math degree, it's 100% possible for us to develop AI that generates "beautiful" proofs. As long as you can label whether a proof is beautiful or not, we can train an AI to optimize based on those labels. (Even if you don't understand what makes it beautiful, as long as you can provide a training signal, you can still optimize for it- that's one of the very powerful facts about AI)
You can optimize for proofs similar to those, but they wouldn't necessarily be similar in the aspects that make them beautiful.
Just what I was thinking. We need a big "beautiful proofs" dataset. I think the way to do this will come via automatic formalization of hunan generated results. It feels repetitively feasible to consume maths papers and spit out formalization in something like lean, at least within the 10 year timeframe
I'm curious to know more about your optimism in this regard, because my immediate impression is that this problem will be a LOT harder than simply feeding in a training set of "beautiful" proofs. I'm actually very optimistic about AI overall (after being a skeptic for many years), and I'm amazed at the progress of LLMs in the past couple years.
But there are things I think AIs and this type of learning model are quite poor at. Specifically, they're good at generalizations from large numbers of examples (which is I think where you're coming from), but they're also often quite poor about being able to pick out details in some cases. This comes out of the fact that training requires specific goals (again as you note) and to an AI, it may sometimes have no idea why some seemingly minor details are important.
We saw this in image generation for a long time, though recent models are starting to get better. I'm talking about things like how the AI images often would create models with 4 or 6 fingers. To a human, this is an obvious problem and flaw, and if you don't understand how fingers work, you don't understand how we touch things, hold hands, etc., hence making a lot of awkward weirdness in AI imagines. To an AI, it's often a seemingly minor detail in an image, some smaller cluster of pixels that it has no reason to realize the importance of or optimize for.
Now, if we see that specific flaw, we can optimize of course and provide better training data, with better goals for the algorithms. But even in language learning, we see this fundamental lack of "understanding" from LLMs about basic concepts.
For a long time, it was difficult even with the language fluidity of ChatGPT for example to understand that a haiku contains a pattern of syllables and to emulate that. It was even harder to get it to understand how to emulate the meter, specific number of syllables per line, and rhyme scheme of something like a sonnet. Even though GPT models could easily tell you what a sonnet was in great detail, even interpret what a lot of those definitions about meter and poetic feet mean, it had no "understanding" of how to apply that to generate a poem in that form, even after having been fed probably tens of thousands of sonnets in its training base.
It just didn't know it should optimize in generating a sonnet around concepts like accent, meter, and rhyme, because even though it can quote definitions, it doesn't "understand" them in the way a human does.
I'm not at all saying AI won't eventually develop such types of understanding, but to me while I'm hugely impressed by what it can do now, it still often lacks the ability to generalize a "concept" as a human would, and explaining such a concept to it often doesn't quite work either (though surprisingly it sometimes does, which gives me hope). While even a pretty dumb kid can understand how to create a haiku at least in terms of syllable restrictions very quickly, as they know what a "syllable" actually is and understand that concept.
Still, one could still train the model to have some understanding of poetic meter, and it seems GPT models have improved over the past year in that regard. But AI wasn't able to extract this knowledge, I don't think -- but you can correct me, simply from having thousands of examples of sonnets in its training set.
So now, going back to geometric proofs, the problem with "beauty" or "elegance" in proofs is that each new proof is its own optimization problem, often specific to that particular proof (or at least a small class of proofs). What makes something elegant in math is that it demonstrates true "understanding" of a concept, which AI models really seem to struggle with. And worse, each proof often has different concepts that need to be highlighted or optimized for.
In the video example, for example, an intuitive deep understanding as noted can come from a theorem about secant lines. I'm sure this AI model can be given a long list of geometrical theorems, but how will it realize that in this particular case, such a route is not only more efficient, but "beautiful" because it highlights an intuitive understanding?
To me, as someone who taught basic proof-based geometry to human students for quite a few years, I realize how humans gradually build up a set of tools and theorems which create shortcuts. But which shortcut is helpful for a particular proof in order to make it "elegant" is often a very non-intuitive process. It comes from doing lots of proofs, knowing lots of theorems, knowing which ones may be applicable to particular situations, and then also knowing how humans learn and understand, so stating the proof in that "beautiful" way makes a human go, "Oh... yeah, that makes so much sense now!"
That feels to me like every single complex geometry proof is kind of its own little "figure out how a sonnet works" type problem for an AI. Not a general class of things that can easily be grouped under one category of "beautiful." And just like it's harder to highlight to an AI how to decide non-obvious features are important for generating a "sonnet," I think each geometrical problem beyond really basic proofs will have their own internal non-obvious features that need to be highlighted and optimized for before generating a "beautiful" proof.
So, to me, this feels like something that is several orders of magnitude degrees harder than you make it sound. Simple labels of "beautiful or not" aren't going to be enough here, at least with how LLMs seem to work now. Maybe with lots more computing power, eventually a more general AI with actual reasoning and conceptual "understanding" can be developed that approaches the level of, say, a 5-year-old human child. Once we're there, I imagine grasping the concept of a "beautiful proof" won't be that much farther away, as understanding can grow exponentially (theoretically) with an AI.
But for right now, I'm skeptical about your approach in this case. Maybe in a few years things will advance further, but "elegance" in math feels like a much harder problem to tackle.
@@pedroteran5885 I agree. AI algorithms currently are fantastic at optimizing toward a specific quantifiable goal. "Beauty" in the case of proofs like these isn't easily quantifiable, and if it eventually will be, it will have to involve some very complex multistage set of concepts (having to do with how human intuition works, as that's often what makes proofs "beautiful"). For now, it feels like except for the most trivial classes of proofs, what makes something "beautiful" in a specific proof is often quite particular to that situation, not an easily generalizable class of similar things. It's not even just a search for a "shorter" proof (even if that can be meaningfully quantified), as often rather terse proofs can be very non-intuitive and not insightful at all. What you're really optimzing for with "beauty" is something like, "Find me a proof that feels really intuitive to a human." And AIs are good so far at a lot of things, but they definitely don't understand the nuances that connect concepts in what humans often feel to be "intuitive" ways.
@@BobJones-rs1sd There is a whole field of AI research called "AI Alignment" that is designed around making AIs do what we want them to, even if we are unable to describe what that means. Basically, through RLHF, you can sorta train AI to follow vague concepts like "be moral" and "do a backflip". It's not perfect and it's an actual worry whether we're aligning AIs correctly, but for "beautiful math proofs" I think it'll be enough.
TLDR as long as you can label proofs as "beautiful" or not, an AI can learn what you mean by it, even if you don't understand it yourself and can't explain what makes it beautiful. This is highly nontrivial and is part of the magic of AI.
We need AI that is able to train itself and store symbolic propositional memories for it to be better, because pre-trained LLMs don't get to learn; they are literally just static functions running on a context window of token parameters. One of the major differences I noticed between humans and LLMs is that humans have a continuous thought stream that differs from their speech output, whereas LLMs don't think without saying every thought they have, and since they're pressured to do so, they don't get to refine their speech before spitting it out improvisationally and without forethought. That makes them unable to learn dynamically, and unable to be creative. Also, the AIs here are not particularly multimodal, so they can't look at the images like we can to get a feel for them, and they also weren't challenged to optimise their proofs once they came up with them, which I find to be a bit unfair in this case. It's interesting how symbolic AI is already a thing that exists though. I myself came up with an idea for an AI that formally generates propositions based on syllogisms and deductions to create and strongly substantiate knowledge before I heard about this today
The biggest problem LLMs have by far is that they see "mimicking human speech" as the end goal. Before I could ever consider it to have any level of intelligence it would need to be using speech as a means to an end rather than being the end goal, ie. instead of learning to speak by copying humans it needs to learn why it needs to be able to speak first. If an AI is built to accomplish some task completely unrelated to speech and then learns how to interpret plain text as a way to improve how it performs at that task, even if it did it at a the level of a child, it would be 1000x more impressive to me than anything that LLMs are doing.
For mathematics? They have thousands of papers to read from and the combination of the Wolfram AI with Chatgpt is changing how math is learned
@@gvi341984 Generally, I mean. It should be able to do it for mathematics by learning the frameworks, and it should even be able to discover entirely new branches of mathematics all by itself if it runs for long enough. I probably haven't put nearly as much thought into it as some people, but still it seems way better for making something intelligent than your run-of-the-mill LLM. I wrote a bit about the idea on my site
@@FireyDeath4 It already can the issue is that it can do anything. I mean you can go through the API and try to cycle itself into learning its own math.
I broadly agree with your take, but I wonder how this applies to how modern math research is going, one specific case I have in mind being Mochizuki's attempted proof of the abc conjecture. With proofs of major results nowadays being easily hundreds of pages long, and take the leading experts in the field years to digest, do you think it'd be worth having the AI so it can check our work? Perhaps if we could run Mochizuki's manuscript through the checker then Scholze and others might've decided to spend their time elsewhere.
Actually I think beautiful proofs are very useful as you already told. It gives insight into the understanding the problem better and it builds our intuition.
I mean it's the only way we can really learn.
A very rudimentary theorem prover is an accessible project for a computer science undergraduate without any LLM or neural net tech. Any axiom or theorem can be translated to disjunctive normal form, and any set of such statements can be trivially brute-force churned until a proof-by-contradiction is reached. The hard part is making it _good_ and pruning the junk out of the proofs.
Reminds me of a younger me, doing a dozen fractions because I refused to learn long division.
I like the perspective of framining proof of a theorem as a search problem. All thats left to do is intelligently search the space.
I find it unclear where the ai even is in alphageometry. Obviously there is an llm making suggestions but how is that different from brute forcing suggestions? If the answer is a higher rate of useful suggestions, then is there any reason to believe there is no simpler, conventional (as in not ai) method that does even better than that?
Yes, it's unlikely that there isn't. But nobody has come up with one yet.
Great take - I agree that the aesthetic aspect is very important, I wonder if the algorithm will eventually start to incorporate this (trying to get fewer steps, quantifying "more powerful" statements, etc). One thing I'll add is that even if we have an ugly proof of say RH, I think that humans will be able to use that to identify areas where our understanding was lacking and build new theories from there.
AI learns so fast, you can see the evolution in front of our eyes, it may be far of today but who knows what next version will bring, its still a baby.
I totally agree with your understanding of what this AI achieved. But I think you'd also be very surprised by the maths that prove Machine learning is possible and the equations that they use in the process.
Edit: for your last point, I'm not really scared of AI entering the creative space. It'll replace the bargain bin tier stuff. The shovel ware books and bad content already exists all over the place. It's actually the most common stuff (which is also why it's the easiest for AI to produce). But there will always be those of us that care to put the work and make things exactly as we imagined them. That's beauty, art, philosophy, or what ever else you might call it. AI is a tool, not a replacement. And those who aren't seeing that and are going to try and use it as a replacement are going to fail.
I remember using AI for a topology task to prove something with mappings. God, never using it again...
AH is perpendicular to BC which is parallel to EF (midsegment), therefore AH is perpendicular to the line through the centers of the two circles, and it passses through H, a point the we know is on the radical axis of the two circles. We conclude that AH is the radical axis of the two circles and so AB2*AB1=AO*AH=AC1*AC2 implying that B1B2C2C1 is cyclic and we are done.
There is a simple addition to the program that probably engenders proofs that you would consider beautiful. That is reconsider all know theorems that are related to this problem and try to apply them. That is a technique that I applied to problems at university, resulting in tens. (A ten in an European university 1968 is worth more than an A, more over these are pen and paper exams, not multiple choice ones. A minor error in an otherwise correct proof somewhere results in at most nine.
The video is really incredible. I learnt so much, wether about geometry or AlphaGeometry.
Thank you for explaining it, I was very intrigued about how it works but never took the time to get my hands into it.
Font on diagrams too small for mobile viewing
The way you drilled through tbis video, you seem very passionate about mathematics.
Really interesting take. I think you're right. Otoh, getting more mathematical results in an automated way will not decrease the amount of interest in mathematical aptitude, similarly to how AlphaGo actually increased interest in Go. Let's get more people into math!
I think AI in a lot of cases really just lowers the burden of entry and that's ultimately a good thing. Like LLMs are basically just doing the same thing to the humanities that calculators and computers did to science and maths a long time ago, it's taking away a lot of the drudgery to enable people to focus on learning what actually matters. Like calculators didn't result in us forgetting how to do basic arithmatic but it meant that students could move onto more interesting stuff like algebra much quicker and usually in doing so you end up getting better at arithmatic. LLMs do the same thing for writing, they take care of all the basic stuff like grammar and decent sentence structure and in doing so allows a student to move on from struggling with those aspects and instead lets them think more critically about what they're writing. If in this case the only thing the AI accomplishes is making geometry more accessible by being able to understand natural human language then that's a job well done in my book.
I'd like to know if you can get Geometry AIs to "prove" false conjectures by using basic steps so many times a mistake is made.
I think ia could easily create beautiful and fresh hypothesis and conjectures if it had a third agent that actually construct measures and compares, lo fi with error bars, changing the variables, like a physicist playing with a toy model. Then the the other ones would try to prove them
I used to be good at maths, maybe not Olympiad level, but decent. I might have an idea on how these kinds of problems are being solved. I agree with your observation that AI solutions were horrible and I assumed so, I expected this, although this will improve in future better more specialised forms of a math AI. And yes, it might reach a level that it solves these kind of problems equally well as humans do, maybe even a lot faster. To the point people will see it as better than us, that it has surpass us in this. Which might seem true, but is it really? Don't forget, there is no person ever who solves these problems by reading and digesting all human mathematics knowledge or even all geometry knowledge. None. Then, while we use deduction, we can also think outside the box, use unlikely tricks, connect unlikely dots. And we don't mess around with a billion unlikely dots to connect, we kinda know when we do, even if it is completely unlikely and unrelated. People who compete in the math Olympiads are especially good at it. AI can't do that, it sucks at it, and it will never be able to do it. So, isn't human thinking kind of magic in a way? At least, it's definitely not the same at all with the way AI programs work. If AI programs were designed after a model of how we supposedly think, those are quite the differences, don't you find? There is a ton of things yet we do not understand about ourselves. And I seriously doubt those simplistic models on how we think and reason. So yeah, even if I kinda expected horrendous solutions from AI, in the end, I will disagree with you, this is not at all similar to how we think.
By the way, since you mentioned chess, I'm not sure if you play, and sure chess AI engines have seemingly surpassed humans in strength of play, in truth, they are lacking, there are several areas they are lacking, where you need to be a chess player to notice this, or how they even damaged chess, as people despite the fact they know they are lacking, they are supper accurate in areas they are not, in areas they are good at, so yes, people use them so much for analysis, while they don't put as much effort themselves, and it actually harmed chess. Also, the way chess AI engines "think" and analyse, has absolutely nothing to do with how humans do it. In the same manner, the same goes for a math specialised AI, and I gave you a few differences. So what do you think? Do you really stick to your view that we think alike or are there huge differences between humans and AI programs? Not in capabilities, but in the way they operate. Although I could argue for our capabilities as well, AI programs just seem better - they are not. At least, they are vastly different to us.
What can AI do with a theorem that is unprovable (Godel)?
I would say that Sabine is correct to the point where you can say that it's a promising direction that doesn't require all that much innovation going forward. AKA, it's a good framework for future AI. Hell, maybe maths will generalize with enough parameters similar to how language tasks seem to with large enough models?
Anyways, I find AI like AlphaGeometry to be beautiful in themselves like math.
I just subscribed because I like your general take on beauty in maths. Recently I found something beautiful in number theory and wonder if you have any insights.
Start with Euler's quadratic E(n)=n^2-n+p, p=41. The first 40 n's generate only primes which is well-known. Thereafter there's a curious mixture of primes and composites, and the patterns formed by the composites caught my eye. Define a block k s.t. kp
I might know a thing or 2 about LLMs (Large Language Models ), sometimes they'll just hallucinate, an LLM isn't actually processing using reasoning at all but rather just keeps breaking down the problem ( which is fine-tuned to do so ) and reprocessing its whole chain of thoughts hoping it might come to a conclusion, which explains the repetition of steps you've noticed, but neverthless I can't disagree that they are a significant milestone.
Thanks for sharing btw
As a chess player, I can understand your obsession with "beauty", but you have to get over it. In chess, sometimes your "beautiful" sacrifice or opening line is not correct, because a chess engine like Stockfish finds an "ugly" refutation. But sometimes, the "ugly" computer move reveals a brilliant and beautiful idea. Computer chess changed not only the game profoundly, but also its aesthetic. And it won't be any different in geometry, or generally in mathematics, even if the first steps look crude and ugly. It will open a whole new world, whether you like it or not.
i think beauty in math is more than superficial, as it serves to more effectively enlighten the proof reader
But did it change chess for the better? I mean, play is now higher in quality, but is it as exciting, or beautiful, or fun? A lot of GMs seem pretty unsatisfied with the current state of chess at the top level, what with it becoming so drawish and dependent on specific preparation.
Proofs and chess are two very different things; you're comparing apples to oranges. There are objective, mathematically "good" moves in chess, so we have some measure of what moves are "beautiful" - that is what moves win the game. They don't think it's ugly in that it is a bad move, but that it goes against what players are used to - in fact these are very insightful moves, and you appreciate the beauty in the move once you understand the reasoning. However, there is no such thing as an objectively "good" proof. There are only correct and incorrect proofs - otherwise you are given pretty much entirely free reign as what tools you want to use. He is not refusing the idea that AI will not get better (and very much states that it could improve in the future), but that in the current state, it's not giving elegant, insightful proofs (and it is subjective as to what makes a proof insightful or elegant or whatever). Rather, AlphaGeometry is just angle bashing, which is something that pretty much anyone and anything can do.
A more apt comparison would be asking a GM what makes a move good versus asking a brute force algorithm. One could explain to you what moves to look out for, what lines are threats, etc. The other would just throw millions of possible futures in your face and say it just works. What explanation do you think is more beautiful?
Finally, why do you think 3b1b has such a wide audience? Everything and more I can learn from textbooks, but it's almost as if there's a beauty as to how he makes his videos...
@@EebstertheGreat Maybe not, but is that relevant? Chess is a game, math is about understanding reality
I think it's like cleaning. They tend to be emotional responses (too much, ignore or stress, or compulsory). Not bad or good, just there.
Does it generate a formal proof in some theorem proving framework or do we need to check the output by hand?
I should have said -- the steps I display on screen are the output of AlphaGeometry so it is able to output a stream of (usually) followable deductions
@@AnotherRoof Sorry, I probably should have been more exact: is correctness guaranteed by any means? In contrast to chatgpt on maths problems, where it just makes things up.
@ I think so by the nature of the deduction engine basically operating through classical "premise 1, premise 2, conclusion." I know I came off as quite critical in the video but I think this is the cool thing about it, in that it has the wild ChatGPT-esque "ideas" but the inferences are correct
You have to check the output, but the actual deductions are made by the angle chaser which does not make mistakes and proofs aren't that long, so in practice it's not that hard to check and it will probably never make an error
Ok, so my conclusion is that it is designed to be correct, but it doesn't produce any self-validating proof (such as verified steps in a formal theorem proving system) for it. @@yonatanbeer3475
Someone should ask it to square the circle and see what it spits out.
That's not a proof
@@MichaelDarrow-tr1mn
I know, I just want to see what gobblygook it tries to pawn off on us.
The fascinating aspect is that we often can't predict what will become a key driver for future innovation. It might be an issue that currently seems ridiculous, but once it's resolved, along with a few others, it could lay the groundwork for groundbreaking developments.
Can you make a video tutorial on how to use the AlphaGeometry program? That is, how to install it on the computer, how to run it, how to enter the geometry problem data, the drawing and the problem requirements? I simply couldn't find any video explaining how AlphaGeometry can be used!
It seems to me that this concept of an ai model might be powerfull in (homotopy) type theory, I am no expert in that field but intuitively, the syntetic nature of that branche of mathematics give sufficient restriction to the steps the AI model is able to consider. If anyone reading this comment has more experience with type theory, I would love to know your thoughts.
The fact that a lot of HTT is, or can be, written in regular, plain-text syntax could be helpful. I feel like path constructors would be very non-intuitive to the AI though, because almost no other functional programming languages it would be trained on would have quotients as just another construct of the language; in Coq I believe you have very explicit coercions, whereas in Agda, I believe you have to verify your path constructors-internal but not syntactic equalities native to a type-hold after a computation.
You'd need a good corpus of information to train into it the context to understand this, I think, because if you tell it this every time you want to solve a problem, if its work gets long enough, it will forget some of what you originally told it.
22:13
This is a good point, though I think there is no reason to only report on 1st place, considering 10 years back it was more a consensus among regular people that computers could never even attempt the kind of tasks that it is doing at the moment, while everyone knew computers could play chess before it beat Kasparov, because that had already been reported on earlier, it's just that nobody remembers or cares about "the first computer to ever be able to play chess", because who cares
The first AGI, will be historic, something people likely remember for years to come, just like Deep blue, up until then we will see incremental improvements followed by randomly gaining unexpected abilities thought never before possible, until eventually an AGI is as good or better as the average human at everything
AGI doesn't need to be better than every expert, it just needs to be average to break the worlds economic system,
Like, if it takes governments seeing a headline in the paper, new AI virus maker can generate computer viruses (or genetic material for real viruses) that can break the worlds most secure everything, then it is already too late for them to do anything about it
In the case of Deepmind, they could take as long as they liked, and the reporters never had to report anything, they literally could have ignored the story completely because it's "just some silly game"
Chess getting botted just affects chess players, but imagine 90% of all traffic on all social media being bots with human-level intelligence or even slightly below average
It may not be important to mathematicians because they have already all been made obsolete by an AGI that is better and cheaper and more scalable than all mathematical effort on earth, that's also more creative and more reliable, but, I think it's definitely worth the discussion if nothing else, because something definitely is happening, even if the media machine is overhyping it for the sake of ad revenue, something they didn't need to rely on in the past due to subscription service news outlets
So we basically just upgraded the calculator?
0:10 would have been nice of you to mention that later an exploit was found that allowed even amateurs to beat alpha go.
18:45 this was the way I solved self-posed problems when I was young - playing around and brute forcing things because I didn’t have the tools or experience yet. I still do this to an extent today as an undergraduate student (CS and Math double major), where I often have to solve the problems I mess with using some concrete examples to catch most patterns
I feel like there are a lot of interesting problems in math that were first solved by flopping around. And later they are made compact and beautiful. So this might become a very useful tool in a few papers down the line.
FYI: The engine that played chess, AlphaZero, and played go, AlphaGo, only needed the rules of the games as a difference between the two. This was a completely different approach to the previous alpha-beta pruning approach to chess only, and appropriate to all small games. Until this approach was developed for Go, and got beyond top human level in one day, it was thought that Go was a large problem, not a small one, due to the astronomical number of multi-move paths, dozens of orders of magnitude more than chess.
Now, as a combinatorial matter, I have no idea of the complexity level of the various mathematical fields, just that I do not even know the names of the majority of current branches of mathematics. Just in the matter of physics, there are dozens of competing versions of string theory, and no way at present to determine if any of them has additional properties that might be reflected in observed reality, requiring modification of the standard model. Similarly, it's quite clear that humans have not yet exhausted number theory, groups theory, topology, etc. Most humans would not distinguish between Howard Thurston, the magician, and William Thurston, mathematician. So far, baby steps, yet exceeding the realized skills of all but the top 0.1% of humans in mathematics.
I could possibly see the sort of ugly proofs it generates useful as a first step. If we wish to find a beautiful proof for a result, it saves time knowing what the answer is. For example, if we know a result is true, we won't waste time looking for counterexamples.
I mean these are Olympiac problems so they will always have a solution.
@@hedgehog3180 yes, but where I see these things being actually useful is in helping us to solve problems to which we don't already know the answer. These sorts of problems are used to test them.
Yeah, I think these sorts of AI papers are meant to impress average people who don't really know the exact sort of problems it's working on.
I remember back when I was in school doing the olympiad round 1 and round 2 questions, the geometry ones were not very appealing to me because it was just a tedious process of angle chasing and constructing new points and lines until you get the result you were looking for. It's very much on the easy-for-computers side of problems, where there are just a few different options to consider at each step, but you might need to try them a lot of times to get to the result.
That's unlike e.g. the number theory side of things, where there are seemingly endless things you could try, and it takes a bit more of an informed or refined approach to get to the desired result. You have to have more of an idea of what methods will get you closer to proving the result, and that's the sort of thing I think AI should be training to understand.
Maybe that's just me, but I'd be much more impressed if it could solve any of the other olympiad problem types. And even then, that's school level maths, so it's still got a way to go.
They’re just doing the ordinary engineering practice of tackling tractable problems. They could figure out a way to do this with the tools they had, so they did it. In a few months someone will have have another breakthrough which people will also dismiss as trivial in retrospect and so on..
It's literally no different. If we take all the mathematical proofs we know as a set, then the set naturally increases over time. So with or without AI, if we assume that this set will increase between now and the next 1000 years as an example (including proofs in number theory), and if we assume that all the proofs are derived from existing axioms and built up over time from our current understanding, then this space of possibilities is finite. It doesn't matter if it's an AI or us humans doing it. If we assume that new information is derived from existing information via some transformation, then a neural network of a large enough size will be able to find it given long enough time.
@@Jackson_Zheng Someone hasn't heard of Gödel's incompleteness theorem. Also this is literally just a “monkeys on typewriters” argument so it doesn't really have any weight. Sure a Neural Network probably is capable of discovering something if we give it an arbitrary ammount of time and computing power but that's true of random chance as well. The thing that actually matters is whether it can do so using a reasonable ammount of time and a practical ammount of computing power. So far neural networks have only been able to approximate the average skill of humans in some very limited tasks like summerizing information, and come up with novel approaches in some very limited spaces like board games. I'm not saying that isn't impressive or useful but it is very far away from actually discovering useful information.
@@empathogen75 Right and Dogecoin is also supposed to go to the moon in just a month! For all the hype of AIs they have yet to actually find a practical application and haven't actually revolutionized anything other than spam so far.
My goodness im so happy i never ever have to do eucalyptic gneometry ever again. Its so fascinating but so hard.
I'd be interested to see if there are proofs AI can convince us are true, but through a proof we can't yet verify. I think your video got at a very important aspect of proofs: the power of explanation. More impressive than proving something unproven is explaining something unexplained.
My limited understanding of AI suggests that we can set a step limit and employ a backtracking mechanism. It's possible that Alpha Geometry can explore multiple solution paths concurrently and remain within the desired complexity bounds.
i think to refine the solution it would have to find many different solutions and then find the one that takes the least steps, since that has the highest probability of being the most efficient.
I think AI will makes us more intelligent. But we will do more and more based on memory and will forget how to solve problems.
That's what happened with calculators. People forgot how numbers work and how to calculate rapidly.