So the scope of this video actually changed MASSIVELY from when I first conceptualised doing an Elo video about a year or two back; I decided to generalise the topic a bit and enter it into 3b1b's SoME initiative instead. The original idea was to go way, way in depth into Pokémon Unite/League of Legends/etc.'s rating systems and try to figure out how they mathematically convert to Elo (basically a deep dive into the Elo quote about how rating systems are algebraically similar/identical in form). I unfortunately lost the data I made for that proposed video, but if there's interest I may revisit this topic eventuallyyy.
Anyways, thanks so much for watching and I hope you all enjoy it! This one definitely took a while.
Instant like for the title alone, Glick-bait is brilliant. Love your videos.
I saw this video mentioned by 3b1b and immediately had to watch. My trouble with finding a way to approximate the true likelihood of winning is that things like Elo are only approximations, since the true solution involves having a prior distribution and updating it, which I don't think has an elegant solution.
You can approximate it pretty simply with importance sampling. But you also have to account for player ratings changing over time; otherwise the ratings could settle to an out-of-date value and become very hard to budge.
The prior should arguably be something like the average of new players in the game, which is easy to obtain if you have an online game.
Ratings changing over time can also be approximated via online game user statistics.
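For what it's worth, here's a rough sketch of the kind of thing this thread is describing: a particle-style (sequential importance sampling) approximation of a Bayesian skill posterior, with the prior taken from a hypothetical new-player rating distribution and a drift step so the posterior doesn't go stale. All the numbers and names here are made up for illustration, not from the video, and resampling is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior: a new player's skill, centred on an assumed
# population average with an assumed spread (both numbers invented).
particles = rng.normal(1500.0, 350.0, size=10_000)
weights = np.ones_like(particles) / particles.size

def p_win(r_a, r_b):
    """Standard Elo expected score: P(A beats B)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def observe(opponent_rating, won):
    """Importance-style reweighting: multiply each particle's weight
    by the likelihood of the observed result given that skill."""
    global weights
    like = p_win(particles, opponent_rating)
    weights *= like if won else (1.0 - like)
    weights /= weights.sum()

def drift(sigma=10.0):
    """Let skill wander between games so old evidence decays and the
    posterior stays easy to budge (the staleness point above)."""
    global particles
    particles = particles + rng.normal(0.0, sigma, size=particles.size)

# Example: a win vs a 1600 player, then a loss vs a 1450 player.
observe(1600, won=True)
drift()
observe(1450, won=False)
print("posterior mean rating:", np.average(particles, weights=weights))
```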
The little mention 3b1b made of this video sent me on a heck of a rabbit hole learning about rating systems yesterday. Circling back to this vid, I think it's a little basic, but I'm still pretty grateful! Though the tangent into GXE without properly explaining what a rating estimate is (or comparing it with other Glicko rating estimates like the more common "conservative rating estimate") confused me until I looked into it myself.
As a total amateur, and disregarding ease of implementation, I think the real draw Elo has over Glicko is ultimately the same reason Glicko is "better": Glicko gets more confident about its estimate the more games are played, which leads to some frustration with rating changes being slower to reflect, say, a drastic change in an older player's skill level, or a new opponent catching up to their level. There are a lot of things that can lead to a situation like that. Besides being frustrating to player psychology and players' "personal narratives" (why ratings are best hidden by default), I do think that's a practical concern for Glicko's accuracy.
But also, I think this ties into a more general frustration I have with all current rating systems: they fail to account for context. This presents some clear challenges in games that are more complicated and/or in greater flux than chess:
If the game's rules are patched, often some kind of soft ladder reset is employed to account for that. But if I play a different character from usual, how much of my skill should be presumed to transfer? What if a game is team-based and I switch teams? And what about less objectively measurable things, like wanting to experiment with a new strategy, or having done a lot of offscreen training? Heck, Glicko may prevent an off day from affecting my overall rating, but what if I *want* my matchmaking to reflect today's skill level?
I don't yet know what a solution to that would look like, let alone one that doesn't allow for on-demand smurfing. At least, besides the only reliable fallback: private matches and social agreements.
If you have a lot of players and a lot of data you could estimate how various changes in behavior (like different character) affect skill levels (predict win rates). It's all a big pile of statistics. You probably could even use machine learning.
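Purely as an illustration of that idea: with enough match logs you could fit, say, a logistic regression where a per-character effect is learned on top of the rating difference. Everything below is invented (the feature names, the simulated data, the assumed -0.8 off-character penalty); it's just a sketch of the kind of estimate being suggested.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Fake match log: rating difference plus a flag for "player A is on
# an unfamiliar character", with an assumed hidden penalty for it.
n = 5000
rating_diff = rng.normal(0, 200, n)      # r_A - r_B
off_char = rng.integers(0, 2, n)         # 1 = unfamiliar character
true_logit = rating_diff / 173.7 - 0.8 * off_char
y = rng.random(n) < 1 / (1 + np.exp(-true_logit))

X = np.column_stack([rating_diff, off_char])
model = LogisticRegression().fit(X, y)

# The learned off-character coefficient estimates how much switching
# characters is "worth" in win-probability terms.
print(model.coef_)  # roughly [1/173.7, -0.8], up to noise
```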
One thing to consider when it comes to digital games resetting Elo/rankings is that this often coincides with a patch that changes aspects of the game. I definitely agree on it being used to increase enjoyment (number go up!) but it also makes sense to reset the ratings of players when the mechanics of the game change. Additionally, as another commenter mentioned, it is also often used to try and match players of similar skill levels.
Really really nice! The part in which you explain the issues of Elo and the other systems was wonderful! The comment at 4:03 is legendary!!
I love your vids, and I'm not gonna lie, I love seeing that you actually have friends, and I mean that in the nicest way possible. "Teacher who spends his spare time, when not teaching or grading stuff, making content about video game math crunching" doesn't seem like the dude most likely to hang out with people, and I'm glad my assumption was dumb.
LOL I'm not sure whether to take that as an insult or a compliment
@dreamingsuntide1 The answer is yes 😂 Both nerdy enough to seem endearing and show genuine knowledge, but then you flexed on people by showing you're not *that* nerdy :P
18:00 this is what more online ranked players should understand
I clicked because you put Showdown in the thumbnail and was curious to see if you would talk about GXE. I've read the post at 16:38 many times 🙂
3b1b gave you a mention in his video
OMG I JUST SAW I’M SO EXCITED
Thanks a lot for the video. I'm trying to understand Elo in Age of Empires II; funny to see you mention the game and Spirit of the Law!
I would love a video that breaks down the rating deviation and rating volatility formulas, they're too dense for me to understand on my own.
Also, 15:00 must be the only time in the history of the Internet where the word "overrated" has been used to its true meaning
Well done video, clear explanations and good examples. Thank you!
Nice video. I had a bit of trouble mentally following what you're saying at around 11:25 and the data table at the same time, because the table's values update well after you've stated them. I assume that during editing you were updating the table to match the chess games in the timelapse, but I'm never going to be able to follow the details of a timelapsed video. This is just a minor nitpick in an overall well-edited video, but I would find that particular part easier to understand if the table updated to match your voiceover.
Thanks for watching!!! And yeah, that's basically exactly what happened; I edited the table to the timelapse and then never edited both to the audio properly (but did mostly edit the equations to the audio). I guess ideally I probably should've edited all four to the same thing instead of to two separate lines, but by that point I think I was getting lazy 😅
Why, in the logistic function, is the base chosen to be 10?
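For anyone else wondering: the base is purely conventional. Changing the base of a logistic only rescales the rating axis, and base 10 with the 400 divisor gives the convenient reading that a 400-point gap means 10:1 expected odds. A quick sketch showing the base-10 and base-e forms are the same curve (nothing assumed here beyond the standard Elo formula):

```python
import math

def expected_base10(diff):
    """Elo expected score with the conventional base-10 / 400 scale."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def expected_natural(diff):
    """Same curve in base e: only the scale constant changes,
    since 10 ** (x / 400) == e ** (x * ln(10) / 400)."""
    s = 400.0 / math.log(10.0)   # ~173.72
    return 1.0 / (1.0 + math.exp(-diff / s))

for d in (0, 100, 400):
    print(d, expected_base10(d), expected_natural(d))
# A 400-point gap gives 10/11 ~= 0.909, i.e. 10:1 expected odds,
# which is why base 10 pairs naturally with the 400 divisor.
```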
3b1b sent me here and I greatly enjoyed this video!
Brilliant explanation! :)
Okay, now I'm interested in the (AFAIK unpublished) Tetris Effect SR scoring.
The meaning of the ratings makes intuitive sense, but the linear update rule seems a bit arbitrary. Why not a Bayesian update? Perhaps still multiplied by a 'learning rate' (k)?
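It's less arbitrary than it looks: the (score - expected) term in the standard Elo update is proportional to the gradient of the log-loss of the logistic model, with the constant absorbed into K, so K really does play the learning-rate role; a proper Bayesian update is roughly what Glicko approximates. A minimal sketch of the textbook rule:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update. score_a is 1 for a win, 0.5 draw, 0 loss.
    (score_a - expected_a) is proportional to the log-loss gradient
    of the logistic model, so k acts as a learning rate."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Example: a 1500 player beats a 1700 player.
print(elo_update(1500, 1700, score_a=1.0))  # roughly (1524.3, 1675.7)
```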
I thought that, whether it's in chess or digital games, rating is more than anything for matchmaking. It's widely understood that competitive games are best when you're fighting someone on your skill level, and rating helps immensely with that.
I mean, I can see how it can make a game more addictive, but of course only because it makes the game funner. And in this case player and dev incentives align.
Excellent video.
I'm confused... the Glicko system has as a "benefit" that volatile players will have more volatile ratings and less volatile players will have less volatile ratings? I would say that, in Elo, consistent players will have stable ratings and inconsistent players will have inconsistent ratings, which is the same thing. I don't see the benefit in magnifying this effect, as you could see a volatile player on a hot streak suddenly hit number 1 in the world (conceptually at least). And if you were to apply Glicko to a game where RNG features in the matches themselves (Pokémon, TCGs, etc.), you'd have a rating system too sensitive to actually be useful.
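For concreteness, here's a minimal single-game sketch of the Glicko-1 update (formulas from Glickman's paper; the per-player volatility term mentioned above is Glicko-2's addition and isn't modelled here). The mechanism is that the rating deviation RD scales the step size, and RD shrinks as games are played, so updates get smaller for established players rather than being magnified without bound; the example values below are made up.

```python
import math

Q = math.log(10) / 400.0

def g(rd):
    """Attenuates an opponent's influence by their uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd) ** 2 / math.pi ** 2)

def glicko1_update(r, rd, r_opp, rd_opp, score):
    """One-game Glicko-1 update (score: 1 win, 0.5 draw, 0 loss)."""
    e = 1.0 / (1.0 + 10.0 ** (-g(rd_opp) * (r - r_opp) / 400.0))
    d2 = 1.0 / (Q ** 2 * g(rd_opp) ** 2 * e * (1.0 - e))
    denom = 1.0 / rd ** 2 + 1.0 / d2
    r_new = r + (Q / denom) * g(rd_opp) * (score - e)
    rd_new = math.sqrt(1.0 / denom)
    return r_new, rd_new

# Same win, different certainty: a high-RD (new) player moves far
# more than a low-RD (established) one, and RD shrinks either way.
print(glicko1_update(1500, 350, 1500, 50, 1.0))  # big jump
print(glicko1_update(1500, 50, 1500, 50, 1.0))   # small nudge
```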