Are Elo Systems Overrated? Everything you wanted to know about Rating Systems

dreamingsuntide

Views: 7,650


Published on 26 Jan 2025

Comments • 38

@dreamingsuntide1 a year ago ⁺²²
So the scope of this video actually changed MASSIVELY from when I first conceptualised doing an Elo video about a year or two back; I decided to generalise the topic a bit and enter it into 3b1b's SoME initiative instead. The original idea was to go way, way in depth into the rating systems of Pokémon Unite, League of Legends, etc. and try to figure out how they mathematically convert to Elo (basically a deep dive into the Elo quote about how rating systems are algebraically similar/identical in form). I unfortunately lost the data I made for that proposed video, but if there's interest I may revisit this topic eventuallyyy.
Anyways, thanks so much for watching and I hope you all enjoy it! This one definitely took a while.
@SmugLookingBarrel a year ago ⁺¹⁰
Instant like for the title alone, Glick-bait is brilliant. Love your videos.
@MATHsegnale a year ago ⁺³
Really really nice! The part in which you explain the issues of Elo and the other systems was wonderful! The comment at 4:03 is legendary!!
@ckq a year ago ⁺¹⁰
I saw this video mentioned by 3b1b and immediately had to watch. My trouble with finding a way to approximate the true likelihood of winning is that things like Elo are only approximations, since the true solution involves having a prior distribution and updating it, which I don't think has an elegant solution.
@Houshalter a year ago
You can approximate it pretty simply with importance sampling. But you also have to account for player ratings changing over time. Otherwise the ratings could settle to an out-of-date value and become very hard to budge.
@cube2fox a year ago
Prior should arguably be something like the average of new players in the game, which is easy to obtain if you have an online game.
Rating changing over time: That can also be approximated via online game user statistics.
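A minimal Python sketch of the importance-sampling idea from this thread, under stated assumptions: the prior is a normal distribution (mean 1500 and standard deviation 350 are illustrative choices, not values from the video), the win probability is the usual Elo logistic curve, and the function names are hypothetical. Candidate ratings are drawn from the prior and weighted by how well they explain the observed results.

```python
import random

def win_prob(r_player, r_opp):
    """Elo-style logistic win probability (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_player) / 400.0))

def posterior_mean(results, prior_mean=1500.0, prior_sd=350.0, n=20000, seed=0):
    """Estimate the posterior mean rating by importance sampling.

    results: list of (opponent_rating, score), score 1 = win, 0 = loss.
    The prior parameters are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    # Draw candidate ratings from the prior.
    samples = [rng.gauss(prior_mean, prior_sd) for _ in range(n)]
    weights = []
    for r in samples:
        # Weight each candidate by the likelihood of the observed results.
        w = 1.0
        for opp, score in results:
            p = win_prob(r, opp)
            w *= p if score == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    # Weighted average of the candidates approximates the posterior mean.
    return sum(r * w for r, w in zip(samples, weights)) / total

# Three wins against 1500-rated opponents pull the estimate above the prior mean.
estimate = posterior_mean([(1500.0, 1)] * 3)
```

This sketch treats skill as fixed. Handling ratings that change over time, as raised above, would need something extra, such as widening the prior between rating periods.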
@noahniederklein8038 25 days ago
Great video! Nice explanation and humor throughout
@DoubleATam a year ago ⁺²
The little mention 3b1b made of this video sent me on a heck of a rabbit hole learning about rating systems yesterday. Circling back to this vid, I think it's a little basic, but I'm still pretty grateful! Though the tangent into GXE without properly explaining what a rating estimate is (or comparing it with other Glicko rating estimates like the more common "conservative rating estimate") confused me until I looked into it myself.
As a total amateur, and disregarding ease of implementation, I think ultimately the real draw Elo has over Glicko is the same reason Glicko is "better": Glicko gets more confident about its estimate the more games are played, which leads to some frustration with rating changes being slower to reflect, say, a drastic change in an older player's skill level, or even a new opponent catching up to their level. There are a lot of things that can lead to a situation like that. Besides being frustrating to player psychology and their "personal narratives" (why ratings are best hidden by default), I do think that's a practical concern for Glicko's accuracy.
But also, I think this ties into a more general frustration I have with all current rating systems: they fail to account for context. This presents some clear challenges in games that are more complicated and/or in greater flux than chess:
If the game's rules are patched, often some kind of soft ladder reset is employed to account for that. But if I play a different character from usual, how much of my skill should be presumed to be transferred? What if a game is team-based, and I switch teams? And what about less objectively measurable things, like if I want to experiment with a new strategy, or did a lot of offscreen training? Heck, Glicko may prevent an off day from affecting my overall rating, but what if I *want* my matchmaking to reflect today's skill level?
I don't yet know what a solution to that would look like, let alone one that doesn't allow for on-demand smurfing. At least, besides the only reliable fallback: private matches and social agreements.
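To make the "Glicko gets more confident" point concrete, here is a hedged Python sketch of the Glicko-1 update for a single rating period, following the formulas in Glickman's published description of the system. The function names are hypothetical, and the conservative estimate is written as rating minus two rating deviations, which is a common convention assumed here rather than something stated in the video.

```python
import math

Q = math.log(10) / 400.0  # scale constant used throughout Glicko-1

def g(rd):
    """Discount factor for an opponent's rating deviation (RD)."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)

def glicko1_update(r, rd, games):
    """Update (rating, RD) from one rating period.

    games: list of (opponent_rating, opponent_rd, score),
    with score 1.0 = win, 0.5 = draw, 0.0 = loss.
    """
    if not games:
        return r, rd
    d2_inv = 0.0   # accumulates 1 / d^2
    delta = 0.0    # accumulates g_j * (score - expected)
    for r_j, rd_j, s in games:
        g_j = g(rd_j)
        e = 1.0 / (1.0 + 10 ** (-g_j * (r - r_j) / 400.0))
        d2_inv += Q**2 * g_j**2 * e * (1.0 - e)
        delta += g_j * (s - e)
    denom = 1.0 / rd**2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)

def conservative_estimate(r, rd):
    """Assumed convention: rating minus two rating deviations."""
    return r - 2.0 * rd

# Worked example from Glickman's Glicko description: roughly (1464, 151.4).
r_new, rd_new = glicko1_update(
    1500.0, 200.0,
    [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)],
)

# The same win moves an uncertain player (RD 350) more than a settled one (RD 50).
step_uncertain = glicko1_update(1500.0, 350.0, [(1500.0, 50.0, 1.0)])[0] - 1500.0
step_settled = glicko1_update(1500.0, 50.0, [(1500.0, 50.0, 1.0)])[0] - 1500.0
```

The shrinking step size is the slowdown described above. Glicko-1 counters it only by letting RD grow again during periods of inactivity, which this sketch does not model.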
@cube2fox a year ago
If you have a lot of players and a lot of data, you could estimate how various changes in behavior (like playing a different character) affect skill levels (predict win rates). It's all a big pile of statistics. You probably could even use machine learning.
@EugeneOneguine a year ago
Thanks a lot for the video. I'm trying to understand Elo in Age of Empires II, funny to see you mention the game and Spirit of the Law!
I would love a video that breaks down the rating deviation and rating volatility formulas; they're too dense for me to understand on my own.
Also, 15:00 must be the only time in the history of the Internet where the word "overrated" has been used in its true meaning
@irakyl a year ago
I clicked because you put Showdown in the thumbnail and was curious to see if you would talk about GXE. I've read the post at 16:38 many times 🙂
@samuelmillar9918 a year ago ⁺¹
One thing to consider when it comes to digital games resetting Elo/rankings is that this often coincides with a patch that changes aspects of the game. I definitely agree on it being used to increase enjoyment (number go up!) but it also makes sense to reset the ratings of players when the mechanics of the game change. Additionally, as another commenter mentioned, it is also often used to try and match players of similar skill levels.
@MrRyanSnyder a year ago ⁺¹
I love your vids and, I'm not gonna lie, I love seeing that you actually have friends, and I mean that in the nicest way possible. "Teacher who spends his spare time, when not teaching or grading stuff, making content for video game math crunching" doesn't seem like the dude most likely to hang out with people, and I'm glad my assumption was dumb
@dreamingsuntide1 a year ago
LOL I'm not sure whether to take that as an insult or a compliment
@MrRyanSnyder a year ago
@@dreamingsuntide1 The answer is yes 😂 Both nerdy enough to seem endearing and show genuine knowledge, but then flexed on people by showing you're not *that* nerdy :P
@Raymoclaus a year ago
Nice video. I had a bit of trouble mentally following what you're saying at around 11:25 and the data table at the same time because the table's values update well after you've stated its values. I assume during editing you were updating the table to match the chess games in the timelapse, but I'm never going to be able to follow the details of a timelapsed video. This is just a minor nitpick in an overall well-edited video, but I would find that particular part easier to understand if the table updated to match your voiceover.
@dreamingsuntide1 a year ago
Thanks for watching!!! And yeah that's basically exactly what happened; I edited the table to the timelapse and then never edited both to the audio properly (but did mostly edit the equations to the audio). I guess ideally I probably should've edited all four to the same thing instead of to two separate lines, but by that point I think I was getting lazy 😅
@notapplicable8957 a month ago
The thing you mentioned about rating resets is also correct. That feels really good for good players and really bad for bad players, because you have to play against people who are way better than you more often than you would otherwise. But it feels like a random reward to really good players, they get to stomp and that resets their schedule of reinforcement for future matches.
I think if people realized this kills player count and makes the game toxic, they'd try to do this less or mask it somehow. (Assuredly they already do, in ways.) But it undeniably keeps loyal (read as, addicted) players engaged long term.
@LowMelaninLife a month ago
18:00 this is what more online ranked players should understand
@375mnaylor 8 months ago
Well done video, clear explanations and good examples. Thank you!
@boas_ a year ago ⁺⁵
3b1b gave you a mention in his video
@dreamingsuntide1 a year ago ⁺⁵
OMG I JUST SAW I’M SO EXCITED
@abhaychandra2624 3 months ago
Why is the base of the logistic function chosen to be 10?
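One way to look at this question, sketched in Python with hypothetical function names: the base only sets the scale of the rating numbers. The standard Elo expected-score curve with base 10 and a 400-point divisor is the same function as a base-e logistic with a divisor of 400 / ln(10), so base 10 is a convention that makes a 400-point gap correspond to 10:1 odds.

```python
import math

def expected_base10(ra, rb):
    """Standard Elo expected score for player A: base 10, divisor 400."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def expected_base_e(ra, rb):
    """Same curve written with base e; only the divisor changes."""
    scale = 400.0 / math.log(10)  # about 173.7 rating points
    return 1.0 / (1.0 + math.exp((rb - ra) / scale))

# A 400-point favourite is expected to score 10/11 under either form.
p10 = expected_base10(1900.0, 1500.0)
pe = expected_base_e(1900.0, 1500.0)
```

Any other base would work equally well after rescaling the divisor; the predictions themselves would not change.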
@notapplicable8957 a month ago
I think one of the reasons people hate rating systems is because when they're coupled with AI, that AI can actually enforce the expected outcome by using extra data that makes those outcomes not just certain but also so that they happen in ways that are infuriating.
I've long suspected that most matches in League of Legends, for instance, are arranged so that if you have gained more than a certain amount of Elo recently, you get "checked", so to speak: the matchmaking puts 2-3 people on your team who are rated about 100 Elo lower, have lost recently, and are about to cross that same losing threshold, to virtually ensure that you lose. And it probably does so in the most mathematically likely way to make sure the average player rating is maintained in the match, by likewise putting several players rated 100 Elo higher on the enemy team who have virtually nothing to gain from winning against your entire team.
You can see how if you had to face these kinds of matches all the time, you'd start to lose your mind and say evil things you otherwise might not say to your teammates and hey, that's exactly what we see happening in the game.
Very likely scenario.
@mindvr a year ago
Okay, now I'm interested in the (afaik unpublished) Tetris Effect SR scoring.
@louispetrik7431 9 months ago
Brilliant explanation! :)
@Sadarac152 a year ago ⁺¹
I'm confused... The Glicko system has as a "benefit" that volatile players will have more volatile ratings and less volatile players have less volatile ratings? I would say that, in Elo, consistent players will have stable ratings and inconsistent players will have inconsistent ratings, which is the same. I don't see how there is benefit in magnifying this effect, as you could see a volatile player on a hot streak suddenly hit number 1 in the world (conceptually at least). And if you were to apply Glicko to a game where RNG features in the matches themselves (Pokémon, TCGs, etc.), you'd have a rating system too sensitive to actually be useful.
@guyraveh2712 a year ago
The meaning of the ratings makes intuitive sense, but the linear update rule seems a bit arbitrary. Why not a Bayesian update? Perhaps still multiplied by a 'learning rate' (k)?
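For reference, a short Python sketch of the linear rule being discussed, with hypothetical function names and K = 32 as an illustrative value. One reading of why it is less arbitrary than it looks: for the logistic model, the derivative of the log-likelihood of a single result with respect to the player's rating is (ln 10 / 400) · (score − expected), so the Elo step is a gradient step in which K acts as the learning rate.

```python
import math

def expected_score(ra, rb):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def elo_update(ra, rb, score_a, k=32.0):
    """Linear Elo update: move by K times (actual - expected)."""
    return ra + k * (score_a - expected_score(ra, rb))

def log_likelihood_gradient(ra, rb, score_a):
    """d/d(ra) of the log-likelihood of one result under the logistic model."""
    return (math.log(10) / 400.0) * (score_a - expected_score(ra, rb))

# Equal ratings, a win, K = 32: the winner gains 16 points.
new_rating = elo_update(1500.0, 1500.0, 1.0)  # 1516.0
```

A full Bayesian update would also track uncertainty about the rating, which is roughly what Glicko adds through its rating deviation.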
@xHyperElectric a year ago
3b1b sent me here and I greatly enjoyed this video!
@Fanaro 9 months ago
Excellent video.
@UriahDebean 17 days ago
The intro sounds like Je te veux by Erik Satie lol
@HazhMcMoor a year ago ⁺²
I thought, whether it's in chess or digital games, more than anything rating is for matchmaking. It's widely understood that competitive games are best when you're fighting with someone on your skill level, and rating helps immensely with that.
I mean I can see how it can make a game more addictive, but of course only because it makes a game more fun. And in this case player and dev incentives align.
@FaradayMerle-r7s 4 months ago
Makenzie Loop
@lmadlsc 26 days ago
Chess is for fun. We should make a club and an Elo that would eliminate all the professionals and psychopaths who spend all their time playing Chess. We should keep those who raise a family, have a job and spend a maximum of 4 or 5 hours/week playing Chess. In other words: normal people.
@ErinParent-y5q 3 months ago
Loren Fork
@ClemensTemple-f9v 3 months ago
Karlee Plaza
@CarriePettigrew-o5i 4 months ago
Nina Row
