Man this is your 1st video (probably not, according to professionalism in result) yet it is very well done, good luck to you and your channel, you deserve it, keep rocking!
This is an amazing video omg. Im pretty sure many math enthusiasts are gonna enjoy this pop up in their YT feed. Would you ever consider doing a video on the Glicko rating system? Both Glicko and Glicko-2.
Nice video!
IIRC modern games often use the Glicko rating system instead of Elo. Might be material for a future video
Incredible video! Would be fascinating to see how the simulation changes with skill-based matchmaking instead of a randomly selected opponent
Great video, I’m really looking forward to see your upcoming videos, keep it up 👍
THIS IS YOUR FIRST VIDEO??? I NEED MORE OF THESE!!
13:09 the formula for transitivity has an extra equal sign, I think. Anyway, great explanation!
I loved this video! The fact you concluded by dispelling common misconceptions was very welcome.
Keep up the great work! I love the effort that you put into the video. Very impressive for your first upload!
What an amazing video! The clear visual style aids the explanations perfectly. I liked it a lot, and understood everything.
I am only left with one question: why didn't the original Elo rating, based on the normal probability of performance, ever catch on? Is the assumption not correct? Does it converge more slowly? I guess I'm gonna research those questions on my own.
Yeah! I was left hungry for a part 2 covering the Gaussian-based model further!
Gonna go big for sure...keep doing it
The effort put into this video amazes me, can't wait for a new video 🙏
Wow, great video.
I would love to see a video from you on the Swiss system.
Really good video! would love to see something similar on the Glicko system :D
Really excellent stuff! Subscribed and excited for anything you may make next!
i like the music in this
Hey, great video. Really enjoyed watching it. I was wondering about some other use cases of Elo ratings. For example, in chess puzzles, where I presume each puzzle gets a rating of its own and gains or loses rating according to whether the player solved it or not. Or rating systems in the context of competitive programming, like on Codeforces. It would be fascinating to see a video going deeper into lesser-known use cases of these systems. Thanks again for the video, great job.
This video is how I wish I could have learnt the content from my Maths undergrad lectures!
This is the best video on this topic, thank you
Really good stuff, well presented
super high quality video!
Great video, it really showed the fundamentals of probability.
If I'm gonna be honest, I didn't know what odds were, since the term is so saturated in gambling lol.
Did you use manim for this btw?
Yup, manim and Flash. I also never really knew what odds were before doing this project! That's why I felt the need to include a discussion about them.
@@j3m-math Did you voice this yourself?
@@shrekeyes2410 Yup
thanks for the great video! it seems like your target audience is math nerds that like chess. im here for it.
Man this is the best video I have seen on the Elo system. I love it. Do you have an explanation on how physicists may have come up with this model from their physics knowledge? It looks like the Fermi function in Fermi-Dirac statistics?
I didn't dive too deep into the history, but I think basically Elo used the Thurstone model originally, then they swapped in a logistic for a normal distribution later. To be clear (I've seen people get confused about this, including me): this does NOT mean that the Zermelo/Bradley-Terry model is equivalent to both players generating numbers from a logistic distribution, and then the bigger number wins. It's just that the Thurstone and BT models are both of the form p = f(R1 - R2), where f is an increasing function from R into [0, 1]. For the Thurstone, f is the CDF of a Gaussian, and for the BT it's a logistic function. I think they basically just swapped in a different f that gave better results.
As for what motivated this specific choice of f... I don't know. In this video I motivate it the way Zermelo does in his paper, using this notion of "strengths", but I'm sure there are different angles that could be motivated by statistical physics as you suggest. I suppose you could start by looking at sections 8.3 and 8.4 of Elo's book (Elo 1978). In those sections he also cites (Elo 1966) and (Berkson 1929, 1944). I haven't looked into those.
References:
- Elo 1978, The Rating of Chess Players Past and Present
- Elo 1966, Use of the Standard Sigmoid and Logistic Curves in Pairwise Comparisons (sounds very relevant!)
- Berkson 1929, Application of the Logistic Function to Experimental Data
- Berkson 1944, Application of the Logistic Function to Bioassay
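To make the "same form, different f" point concrete, here's a minimal sketch of the two choices (the function names and scale parameters are my own, just for illustration):

```python
import math

def thurstone_f(d, sigma=1.0):
    # Thurstone: win probability is the Gaussian CDF of the rating gap d
    return 0.5 * (1.0 + math.erf(d / (sigma * math.sqrt(2.0))))

def bradley_terry_f(d, s=1.0):
    # Bradley-Terry / Zermelo: a logistic function of the rating gap d
    return 1.0 / (1.0 + math.exp(-d / s))

# Both are increasing maps from R into [0, 1] with f(0) = 0.5;
# they differ mainly in how fast the tails flatten out.
for d in [0.0, 0.5, 1.0, 2.0]:
    print(f"d={d:3.1f}  Thurstone={thurstone_f(d):.3f}  BT={bradley_terry_f(d):.3f}")
```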
@@j3m-math It's true that the Zermelo model is not equivalent to both players generating numbers from a logistic distribution with the bigger number winning, but there's something almost as good. The issue is that the difference of two iid logistic random variables is not logistically distributed, but if we could find a (family of) distribution(s) (say D) such that when X ~ D(a), Y ~ D(b) independent, then X - Y ~ Logistic(a-b,1), we'd be all set. The Zermelo model would then be equivalent to player 1 generating X, player 2 generating Y, and the largest number wins. It turns out there is a distribution that works, called the Gumbel distribution.
You can go the other way too. For example, Elo's original model has a nice transitivity property; knowing p_ij and p_jk uniquely determines p_ik.
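The Gumbel fact is easy to check numerically. A quick sketch (ratings and sample size are made-up values): sample both players' performances from Gumbel distributions and compare the empirical win rate to the logistic prediction.

```python
import math
import random

random.seed(0)

def gumbel(mu):
    # Sample from a Gumbel distribution with location mu and scale 1,
    # via inverse transform sampling
    return mu - math.log(-math.log(random.random()))

a, b = 1.2, 0.5        # the two players' "ratings" (illustrative values)
n = 200_000

# Empirical P(player 1 wins) when each player draws a Gumbel performance
wins = sum(gumbel(a) > gumbel(b) for _ in range(n)) / n

# Zermelo / Bradley-Terry prediction: a logistic function of the rating gap
predicted = 1.0 / (1.0 + math.exp(-(a - b)))

print(f"empirical {wins:.3f} vs logistic {predicted:.3f}")
```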
If one were to simulate strengths from the historical Elo model and then run the modern Elo algorithm on the results, what would happen? In other words, it would be interesting to investigate how robust the Elo system is to different models of the true-strength data-generating process.
phenomenal video!
Really good video
Very nice video!
Incredible for a first vid.
Great video! Keep it up!
Congrats, you got a subscriber
The noises they make are crazy
i am really really hoping this gets picked up by The Algorithm
You will be big soon
fantastic stuff
Really great video !
Now I'll always wonder to myself whether the input variables of my sigmoid neural net satisfy the Zermelo model. If yes, I'll see each neuron as a battlefield between my variables 😂.
19:49 Hurts my math eyes to see a 4 under a square root 😅
Hey, did you build this using the Manim library? Would be really interested to know...
Great content ❤
Does the option for a tie change the model?
I played a chess game on lichess and my rating and my opponent's rating didn't change by the same value. What is different?
very cool
does it work like this on faceit in cs2?
I'm sure you'll become famous, interesting video!
Is this value C changing over time, and is this behavior what people refer to as Elo inflation?
(15:35 for context)
I haven't really thought about Elo inflation, so everything I'm about to say is based on thinking about it while on a 20 minute walk. But... maybe? First of all, to be clear, the value of C definitely does not change "over time" as in over the course of one of the simulations in this video. C is constant during a single simulation since we can calculate it as the difference between the average estimated rating and the average true rating. But of course, what these simulations don't take into account is that true ratings shift as players get better...
If everyone got better over time, then yes, I suppose that would cause "inflation", basically measured by the value of C? After all, the Elo system is "self-normalizing" in that it constrains the mean rating to be 1500 (or whatever initial rating you're using). So a 1500 now that everyone's better would not mean the same thing as a 1500 from 10 years ago. Changing player skill is actually one of the things that would be really cool for someone to look at in a follow-up video with more sophisticated simulations.
Another thing I don't take into account here when I say average Elo is constant is that in real life, players _leave_. So, imagine after a while all the amateurs get bored and only the grandmasters are still playing. Their Elos won't all magically shift downward, so now, the average Elo would effectively go up from 1500 to, say, 2500. Alternatively, if you win a bunch of games and then bounce, you're basically pulling an Elo heist - hoarding a bunch of points and then disappearing with them. I'm not even sure how to factor that into this whole story.
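The "self-normalizing" point above is easy to demonstrate in code: in the standard update, the winner's gain exactly equals the loser's loss, so the mean rating of a fixed player pool never moves. A minimal sketch (pool size, K, and the coin-flip outcomes are arbitrary choices for the demo):

```python
import random

random.seed(1)

K = 32
ratings = [1500.0] * 10    # a fixed pool, everyone at the initial rating

def expected(r1, r2):
    # Elo expected score of player 1 against player 2
    return 1.0 / (1.0 + 10 ** ((r2 - r1) / 400))

for _ in range(10_000):
    i, j = random.sample(range(len(ratings)), 2)
    e = expected(ratings[i], ratings[j])
    s = 1.0 if random.random() < 0.5 else 0.0   # coin-flip outcomes, just for the demo
    delta = K * (s - e)
    ratings[i] += delta     # winner's gain...
    ratings[j] -= delta     # ...is exactly the loser's loss

print(sum(ratings) / len(ratings))   # still 1500: the updates are zero-sum
```

Once players enter, leave, or use different K-factors, this conservation breaks, which is where the inflation story gets more complicated.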
@@j3m-math Thanks for the detailed explanation! That explains a lot. Love the research you put into this
Great stufff
Thank You
This has the underlying assumption that there is one true strength, which might not always be the case.
For example, if there are 3 players (p1, p2, p3), where p1 wins against p2, p2 wins against p3, and p3 wins against p1, then the Elo system will not be a good model for it.
I believe that most of the time, Elo is used in online games to set up fair matches. This example shows it might not be good enough and lacks the nuance to deal with the reality of fair matchmaking.
Absolutely! The property of odds transitivity shows very clearly that this model can't deal with "rock paper scissors" kinds of situations, where there are different "kinds" of good players that non-transitively beat each other. Of course, really that assumption is inherent in the very idea of a rating - the moment you're assigning ratings to people, you're saying player skill is transitive. Looking at how Elo performs with a game like that would be another fun idea to look at in some follow-up simulations.
@@j3m-math I am not so sure that the assumption is inherent to the idea of a rating.
In the rock-paper-scissors example, you could say they all have the same rating. But we could also change that: for example, rock wins against paper 10% of the time. Now rock is certainly the "better player"; however, if I understand correctly, the ratings in the current Elo system will not diverge, even though there seems to be a way you can rate them.
@@darklion13 Well, if Rock, Paper and Scissors all have the same rating, then the probability of any of them beating the other would have to be 50% - assuming that the win probability is some function of the ratings. What I mean is that any system in which P(P1 beats P2) = f(R1, R2) and the Ri are real numbers is doomed to be transitive - at least if you also throw in some kind of assumption of "monotonicity", like f(x, R2) is monotonically increasing in x or something (I haven't worked out the details).
I feel like the simplest way to model a game with "non-transitive skill" would be to have a multi-dimensional rating. A really simple example off the top of my head would be to model the game internally as being rock-paper-scissors. Each player's "true rating" is modelled as a probability distribution over the set {Rock, Paper, Scissors}, and when two players play, they choose a strategy from that distribution (this has the added advantage over the Zermelo/Bradley-Terry model of automatically accounting for draws, which is cool). This is really a two-dimensional rating since the sum of the three probabilities has to add to one, so we lose a degree of freedom. Then maybe you could come up with some algorithm to estimate that "2d rating vector" from the outcomes of games. It's an interesting rabbit hole.
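Here's a quick sketch of the "true rating as a distribution over {Rock, Paper, Scissors}" idea (the weights and game count are made up). It already exhibits non-transitive skill: a paper-heavy player beats a rock-heavy one, even though a scissors-heavy player would in turn beat the paper-heavy one:

```python
import random

random.seed(2)

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play(p1, p2):
    # p1, p2 are "true ratings": probability weights over the three moves.
    # Returns player 1's score: 1 for a win, 0.5 for a draw, 0 for a loss.
    m1 = random.choices(MOVES, weights=p1)[0]
    m2 = random.choices(MOVES, weights=p2)[0]
    if m1 == m2:
        return 0.5
    return 1.0 if BEATS[m1] == m2 else 0.0

rock_heavy  = [0.8, 0.1, 0.1]   # mostly plays rock
paper_heavy = [0.1, 0.8, 0.1]   # mostly plays paper

n = 50_000
score = sum(play(rock_heavy, paper_heavy) for _ in range(n)) / n
print(f"rock-heavy's average score vs paper-heavy: {score:.3f}")  # well below 0.5
```

Estimating the two hidden weight vectors from game outcomes would be the follow-up project.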
It depends on what you want to model! If you dump a bunch of players into a round robin tournament, and all you care about is everyone's final win-loss record, Elo works fine. All of the cycles where p1 beats p2 who beats p3 who beats p1 average out to zero. If you care about the outcomes of specific matchups, Elo simply doesn't carry enough information, although if you ask me it's a good place to start.
If there are 3 players in a cycle like that, call them rock, paper, and scissors, and they played a lot of games, all of them would have an even record and the same rating. If rock wins against paper 10% of the time instead of 0%, then rock would have a slightly positive record and a higher Elo rating, while paper would have a lower rating.
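That prediction can be checked with a small simulation (K, game count, and seed are arbitrary; I time-average the ratings to smooth out the constant-K fluctuations): three players in a cycle, with rock beating paper 10% of the time.

```python
import random

random.seed(3)

K = 8
players = ["rock", "paper", "scissors"]
ratings = {p: 1500.0 for p in players}

# P(row beats column): a pure cycle, except rock now beats paper 10% of the time
WIN_P = {
    ("rock", "scissors"): 1.0,
    ("scissors", "paper"): 1.0,
    ("paper", "rock"): 0.9,
}

def p_win(a, b):
    return WIN_P[(a, b)] if (a, b) in WIN_P else 1.0 - WIN_P[(b, a)]

def expected(ra, rb):
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

n_games = 60_000
totals = {p: 0.0 for p in players}
count = 0
for g in range(n_games):
    a, b = random.sample(players, 2)
    s = 1.0 if random.random() < p_win(a, b) else 0.0
    delta = K * (s - expected(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta
    if g >= n_games // 2:            # time-average after a burn-in period
        for p in players:
            totals[p] += ratings[p]
        count += 1

avg = {p: totals[p] / count for p in players}
print({p: round(v) for p, v in avg.items()})
# rock ends highest, paper lowest, scissors near the 1500 mean
```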
are you Michael Fassbender? you sound just like him
Reminds me of La linea
Can anyone send this to Riot Games
Who are you? Why haven't you been shown in my feed till now?
This channel only has one vid, lol
Helllllll yeahhhhhhhh
The proof of convergence is not that hard... the cost function is obviously convex, and so a gradient descent algo should work.
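For what it's worth, that's roughly how the maximum-likelihood fit can be computed in practice: plain gradient descent on the (convex) Bradley-Terry negative log-likelihood. A sketch on a small made-up dataset (the win counts, learning rate, and iteration count are all illustrative):

```python
import math

# wins[i][j] = number of times player i beat player j (made-up data)
wins = [[0, 7, 9],
        [3, 0, 6],
        [1, 4, 0]]

n = 3
r = [0.0] * n   # ratings in "natural" units (not the 400-point Elo scale)

def p(i, j):
    # Bradley-Terry win probability from the rating gap
    return 1.0 / (1.0 + math.exp(-(r[i] - r[j])))

lr = 0.01
for _ in range(20_000):
    # Gradient of the negative log-likelihood w.r.t. each rating:
    # for pair (i, j), the contribution is (games played) * p(i, j) - (i's wins)
    grad = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                games = wins[i][j] + wins[j][i]
                grad[i] += games * p(i, j) - wins[i][j]
    for i in range(n):
        r[i] -= lr * grad[i]

for i, j in [(0, 1), (0, 2), (1, 2)]:
    frac = wins[i][j] / (wins[i][j] + wins[j][i])
    print(f"p({i} beats {j}): model {p(i, j):.2f}, empirical {frac:.2f}")
```

The incremental Elo update is often described as a stochastic, one-game-at-a-time approximation of this kind of batch descent.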
Subscriber 225