I think the best way to interpret the Z-score here is 'how much better than the competition were they in their time'. If you use the Z-score to directly compare two teams, you could argue that someone in an amateur league with a better Z-score is a better team than a professional team.
Correct. We use IQ as an example in my classes because it is *always* M = 100 and SD = 15, because it is “normed” to the current population. What this ignores is what’s called the Flynn Effect, which is that IQ is constantly increasing (or we’re getting better at taking tests). So someone with an average IQ now (100) is smarter than someone with that same average IQ score 50 years ago.
Yes, there isn't an allowance for the whole league being stronger over time. You can kind of get away with it in this example, but it's the 'Babe Ruth vs modern baseball' or 'Muhammad Ali vs modern boxer' questions where it falls apart, never mind the 'who is the greatest motorsport driver' questions, where the actual demands and rules of the sport change heavily over time.
Yeah, the point about somehow getting rid of the comparative/relative nature of the data irritated me quite a bit. You cannot turn the information "how many times did they win/lose AGAINST OTHER TEAMS" into a data point that gives us information independent of the teams they played against.
@wobaguk Yeah, for a lot of sports that won't work. For some events like swimming, the sport has changed only a little, but times have improved a lot. Evidently, the average professional swimmer today is faster than the best professional swimmers from 50 years ago. So a Z-score comparison wouldn't be very helpful there.
To answer the question at the end, the lowest z-score for an eventual champion that I found was the 1900 VFL Melbourne team, who won from 6th with 24 points (league average of 28). They had a z-score of -0.33072.
Worth noting that in the early days of VFL - all teams qualified for the finals series. Which meant you could theoretically have won the championship even after losing every game in the regular season. Obviously that never happened - but early VFL rules allowed for incredibly low z-scores. The 1901 season completely overhauled the finals system.
A 1-in-25-billion performance! Edit: If there have been 117 billion humans in history, then only 4-5 humans have ever perfected their craft to this level.
I tried to get the percentage of players with a lower z-score than Bradman using Wolfram Alpha with CDF[NormalDistribution[0, 1], 6.5]. The answer is 1, which I think we can interpret to mean "all of them".
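The 1 here is most likely just rounding: the true value sits so close to 1 that a default display can't show the difference. A minimal sketch, assuming a standard normal model, that computes the upper tail directly with the complementary error function instead of subtracting from 1 (the name `upper_tail` is made up for illustration):

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal, computed without subtracting from 1."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p = upper_tail(6.5)
print(f"P(Z > 6.5) = {p:.3e}")
print(f"roughly 1 in {1 / p:,.0f}")
```

Under that assumption the tail comes out in the same ballpark as the 1-in-25-billion figure quoted above, so "all of them" is very nearly, but not exactly, right.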
7:45 No. It doesn't remove the issue at all, it just ignores it. It remains true that the same Z factor from different sets may mean totally different things as far as which team is better.
Yeah, it sounds like he has it backwards. You can't compare historical teams directly, but Z scoring lets you compare how they played relative to their competition.
And in the NBA it's virtually impossible because of how dramatically teams can change from one year to another. In other sports you can use an Elo rating idea to compare teams from different times. But the further apart they are in time, the harder it is to draw a sound conclusion.
If Matt Parker isn't already using this to comedically compare F1 driver Lewis Hamilton's career against NASCAR driver Jimmie Johnson's in order to troll as many people as possible, I bet he is after reading this.
As a football data analyst, I can confirm that z-scores are used everywhere, for instance for comparing metrics that are measured differently (e.g. goals scored and possession %). They also allow us to create compound metrics which are more reliable and meaningful. Normalizing data without losing the details of how it's distributed (as you would with percentile ranking) is fantastic.
@therealax6 If your data perfectly follows the normal distribution that is true, but not with real data. For example, the top competitor's percentile ranking will always be 100%, no matter how far ahead of the others they are. Z-scores, however, reflect that gap.
@matthiasgreen4042 That makes sense! I always assumed you'd just adjust fractiles for discrete data so that they represent the midpoint of a continuous range (e.g., you have four data points: use fractiles .5/4, 1.5/4, 2.5/4 and 3.5/4 instead of 0/3, 1/3, 2/3, 3/3), but that adjustment is probably too inaccurate to adequately capture the edges of the underlying distribution.
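The percentile-versus-z-score point in this sub-thread can be sketched in a few lines. The two score lists below are invented for illustration, and `percentile_ranks` uses one simple definition of percentile rank (share of values less than or equal to the one in question), which is an assumption rather than the only convention:

```python
from statistics import mean, pstdev

def z_scores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def percentile_ranks(values):
    """Share of values less than or equal to each value, in percent."""
    n = len(values)
    return [100 * sum(other <= v for other in values) / n for v in values]

close_race = [50, 52, 54, 56, 58]  # hypothetical scores, leader just ahead
runaway = [50, 52, 54, 56, 90]     # same field, leader far ahead

for name, data in [("close", close_race), ("runaway", runaway)]:
    top_percentile = percentile_ranks(data)[-1]
    top_z = z_scores(data)[-1]
    print(name, top_percentile, round(top_z, 2))
```

The leader's percentile rank is identical in both cases, while the z-score moves with the size of the gap.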
I just want to say that I recently got a job tutoring math to fellow college students - something that I would have never even considered before stumbling upon Numberphile! Math used to be a subject I lamented, now it's my favorite.
Throughout the years I've come to realise that people don't hate math; they hate math class. Math at general education levels (primary school, secondary school, entry-level courses at university, etc.) is taught in a way that makes it about as uninteresting and uninspiring as a subject can be. Students are taught to pointlessly memorise formulas that they then have to echo back in the correct order without knowing why or what for. Students are almost never exposed to actual problems that let them exercise their problem-solving skills, sadly.
@therealax6 Yes, I was very lucky to have parents who were huge math nerds and multiple teachers who made the subject engaging… and even then, I still had huge math anxiety and took computer science instead of math at university to fulfill the requirement.
I am actually quite unsure if you can use the analogy of z-score to portray greatness. An underlying assumption is that the observations are i.i.d. (independent and identically distributed), and they are clearly not independent. The total number of wins is fixed for each season: if one team wins, another one loses. And because you compare how much of an outlier you are relative to the seasonal distribution, you end up just comparing how much your win share differs from the others', while their win distributions are dependent on each other, which could lead to some weird constructs of high z-scores occurring more often than they actually should. I think a much better estimate is the combinatorial calculation of how likely that win number (with all the other teams' win numbers also happening) is to occur for a season!
I completely agree, using the Z-score of win totals across different sports leagues makes no sense. I don't think it's even very useful to compare across eras of a single sports league as many others have said. The wins aren't independent, the league average win rate is absolutely necessarily 41 wins in an 82 game season, and there is a limit on the standard deviation that's related to the number of games too. So comparing the biggest outlier for wins in the NBA vs the Premier league vs the MLB doesn't really work. Much less comparing Z-scores of wins in these leagues to independent statistics. My back of the napkin math says it's mathematically impossible for an NBA team's "win rate" Z-score to reach the Bradman batting score of 6.5 that's quoted. But that's not to discount that Bradman score as an incredible outlier. Use Z-score to show me how far above league average Jordan's midrange efficiency was and compare that to Curry's insane 3 point shooting and I'll listen, because you can use independent data points.
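The "mathematically impossible" claim can be made concrete. A known bound (Samuelson's inequality) says that among n values, no single one can sit more than sqrt(n - 1) population standard deviations from the mean. A minimal sketch, assuming a 30-team league and the population standard deviation; the "one team at 1, everyone else at 0" data set is purely illustrative and not a real season:

```python
import math
from statistics import mean, pstdev

def max_possible_z(n: int) -> float:
    """Upper bound on any single z-score among n values (population SD)."""
    return math.sqrt(n - 1)

n_teams = 30  # assumed league size
bound = max_possible_z(n_teams)

# The most lopsided data set possible: one outlier, everyone else identical.
extreme = [1.0] + [0.0] * (n_teams - 1)
z_top = (extreme[0] - mean(extreme)) / pstdev(extreme)

print(f"bound for {n_teams} teams: {bound:.3f}")
print(f"z of the lone outlier:   {z_top:.3f}")
```

With 30 teams the ceiling is about 5.39, so under these assumptions a 6.5 cannot come out of a single season's standings. Using the sample standard deviation (dividing by n - 1) makes the ceiling slightly lower still.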
Also, the Warriors literally lost the championship that season. They couldn't even win the most important game of the season against their contemporaries. Just saying.
Hey, would you be able to further expand on how you would use the combinatorial calculation? What would you define as the 'n' and 'r' for example in the case of the Bulls vs Warriors situation? Thanks!
@4DINK It is a little bit more complex than that, as we have to carry the consequences of each win/loss into the other teams' schedules as well. I think it is easiest to build on a permutation tree, where we have to assert some construction rules (e.g. equal amount of total wins and losses, numberwise equal amount of equals and odds, etc...) 🙂
Not comfortable with three aspects of the assumptions. First, we take discrete data (not a particularly large dataset) and fit it to a continuous function. It is not possible to win 13.2 games of basketball, and there is no possibility of winning 200 games in a season, even though the normal distribution says there is a very small but finite probability. Second, the wins could be distributed very differently in different seasons. In one extreme the overall winners could win every game, with all other teams winning just above or just below half of their games. In another extreme, the winning team could win by the narrowest of margins (on overall points scored, with the same number of wins as the team that came second). Third, the winning margins etc. could be identical, but it could be that all teams in the second dataset are ALL either much better or much worse than in the comparison dataset.
Yes, but the normal distribution is a close approximation to the discrete distribution (in this case probably a binomial) given large enough sample size and p not too close to 0 or 1. Agree with the other two points though :)
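How close the approximation is can be checked directly. A minimal sketch, assuming an 82-game season in which every game is a coin flip (a big simplification, since real teams differ in strength); the function names are made up for illustration:

```python
import math

def binom_tail(n: int, p: float, k: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def normal_tail(n: int, p: float, k: int) -> float:
    """Normal approximation to P(X >= k), with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

n_games, p_win = 82, 0.5  # assumed: every game is a coin flip
for wins in (41, 50, 60, 73):
    exact = binom_tail(n_games, p_win, wins)
    approx = normal_tail(n_games, p_win, wins)
    print(wins, f"{exact:.3e}", f"{approx:.3e}")
```

Near the middle of the distribution the two columns agree closely. In the far tail, which is where record seasons live, they drift apart in relative terms, so both points in this exchange have some merit.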
This approach is used all the time in physics. Theoretical physicists commonly plot in terms of dimensionless parameters to distill the physics without influence from individual parameters
Another interesting wrinkle in all this is that most sports statistics are competitive. Like, The 2015/16 Warriors getting so many wins reduced the number of available wins for all the other teams, because all of them played (and, mostly, lost) games against GSW. This makes it harder to meaningfully compare across eras, because your sense of how strong the competition is is, in part, self-correcting, so just dividing out the standard deviation doesn't necessarily account for it entirely. (For a toy example, consider a tournament where matches are decided by rolling a 6-sided die, but then one year they add a rule where every player adds 4 to their result. The win distribution should have exactly the same standard deviation, but even mediocre players from the +4 year will obviously have a huge advantage in hypothetical head-to-head matches against top players from other years.) This comes up a lot in baseball discussions: Modern pitching has advanced so far that it's pretty questionable whether Babe Ruth could even get to play in the major leagues today, whereas a replacement-level modern hitter sent back to his time would likely be a superstar because they'd trained against better opponents. I don't think any of this discounts Tom's actual point (in fact, I think it supports it) but it's interesting to think about how these statistics are shaped.
I don't think your toy example works. We're not just "just dividing out the standard deviation", since we take off the mean first. This method deals fine with your example, because taking off the mean accounts for the +4.
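The shift-invariance being argued about here is easy to check. A minimal sketch with made-up die-roll results standing in for the toy tournament:

```python
from statistics import mean, pstdev

def z_scores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

base_year = [1, 2, 3, 4, 5, 6]          # hypothetical die-roll results
plus_four = [v + 4 for v in base_year]  # the year with the +4 rule

print(z_scores(base_year))
print(z_scores(plus_four))
```

Both lines print the same z-scores. That cuts both ways: subtracting the mean does remove the +4, but for the same reason the z-score cannot reveal that the +4 players would win head to head.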
So, applying this technique, who wins then? The 1994 Brazilian world champion football team, Mike Tyson or the Jamaican bobsled team they made that movie about? On a more serious note: Does the quality of what makes a sport change little enough over time that these comparisons can be made? Is basketball played primarily for 3-pointers the same sport as basketball played mostly for 2-pointers?
The man compared two basketball teams in two different eras! The 3 points were there in the 1990s! While you are ridiculously trying to dismiss him by talking about three different sports! .. go away!
Dismiss 😅? It's just a question. A reply with yes or with no and maybe some reasoning are both fine. I guess you can also try to dunk (get it?) on other people for internet brownie points too though. Cheers! 🎉
You are not accounting for skill inflation. For example, there are many skilled chess players today who could defeat Deep Blue thanks to modern preparation methods, while back in the day it defeated the best chess player of its time.
Certainly, there are more skilled players of anything today than there were decades ago. This should have been taken into account (if possible to quantify) to draw a fairer comparison
Yes I thought of this, but couldn't think of a way to account for it... could you account for it by points per game per shot attempts? Maybe add in blocks per game per block attempts? Or more generally like VO2 Max of the athletes or their height, body fat?
That’s a separate argument. We’re comparing one championship team’s performance *relative to others at that time* to another team’s performance *relative to others at their own time*. Someone with an IQ of 100 today is smarter than someone with the same 50 years ago, but they’re both of *average* intelligence *for their time*.
Yeah, you can argue which team is better in the context of their season, but you can't say which is more likely to win in a match on the grounds of that. For example, it is easy to imagine the Chicago Bulls data set coming from a wheelchair basketball league team: identical numbers, but that team would lose badly against the actual Chicago Bulls unless by some miracle wheelchairs beat legs. The direct comparison cannot be made between the teams, because the conclusion is a function of the dataset, not the teams' ability to play. It just means they were better in their time, playing the teams they did :).
When you look into salary cap changes, you get a better feel for just how strong MJ was. There were two years where he got paid over 100% of the Bulls' normal salary cap, and all the other players signed for tiny contracts that were fully in the luxury tax area. If you gave LeBron James or Steph Curry 100% of a team's salary cap, and only put players around them that would sign for bare minimums, they would literally never win a single match.
As an engineer, you'll see some very impressive Z factors from manufacturing lines. Particularly in industries where performance is literally life or death. For example, in aviation engineering, a manufacturer may achieve Z factors of 7 or 8 for a given part made in large quantities. Anything above 6 is equivalent to a process making a faulty part less than one in a million times, hence there's an entire branch of engineering rigour known as Six Sigma.
The biggest problem with this (in my mind) is that the two teams were playing under different sets of rules. The NBA changed rules over this time, so how are you accounting for the different sets of rules? Would the 1990s Bulls have performed better under the new rules that promoted offense?
Indeed. It would follow that one could use the same mathematics to compare, say, a basketball team and a javelin thrower, because everything becomes a unitless Z-score.
Heck, you could compare a basketball team to a singer (where the number and length of time at number 1 in the charts is used as the metric), or to anything else that you can rank. I'm not sure I'm ok with that, though.
@phyphor That’s a lame way to dismiss the video! Dude compared the same sport and the same league in different eras, with no meaningful changes in the rules. I still put my eggs on the Bulls though!
This is a big thing in baseball stats, e.g. OPS+ and ERA+ are similar metrics, but they just divide by the average instead of taking the z-score. It would give the same ranking if you assume that the in-season personal variance is not useful.
The Z-Score compares unique data sets assuming the competition level is equal. It doesn’t account for modern teams being more developed from decades of improvement in sports science, conditioning, & play-calling. It might work for 2 teams 5-10 years apart with similar strength leagues but over time most sports drift. At 14:55 he does mention that it doesn’t directly say who would win in a modern head to head game.
I was expecting an Elo model video. I believe this is the best way to compare teams and performance across different leagues, sports, etc. Even adding some modifiers to the model can take into consideration home and away factors, best results against certain opponents and so on.
I really like the clarification of what you can actually compare using this methodology. And that there are a whole host of unlisted factors which contribute to a win in a head to head matchup. Love the video!
8:45 The z factor definitely does *not* tell you which is better, at least for the usual definition of "better" (aka, team A is better than team B if the former would beat the latter most of the time). It tells you which had the most exceptional result over its peers, which is a very different thing. There is no argument that those two are the same. I'd have a better win-loss ratio, and therefore a larger z factor, playing my nieces at chess than Magnus Carlsen would have against super-grandmasters, but I'm definitely not a better chess player than him. Tom touches on this at 5:50, but concludes by making the most common mistake that happens in mathematical modelling: rushing to jump into a quantitative method rather than asking if it is actually suited to the question at hand. Brushing away the flaws of using this model for this question as being just "opinions" is no good. Just because the problem can't easily be expressed mathematically doesn't mean that it's all subjective and there are no right or wrong answers. Now, I know that this is Numberphile and the real aim is to teach uses of the Gaussian distribution rather than to decide which basketball team is better. But it's still ironic that it commits the cardinal sin of modelling and statistics in doing so. Of course he does address this somewhat after the click-baity part of the video, but it's rather doing it all out of order.
So is it “impossible” to state which team is “better” crossing generations? Or do you need multiple statistical points like wins, winning margin, opponent wins, league record average, and potentially more to get a more rounded result?
I’d like to do baseball stuff but comparing Ted Williams’ 1941 season to Mike Trout’s 2015 is drastically different in the way baseball was played and skill level in the game, games played etc.
Just to be especially contrarian a month after my opinion would have been welcome: while I'm no basketball expert, my understanding is that the rule changes since Michael Jordan retired from basketball have also made the defense's job harder than it was when Jordan had to get in to score points. Back when the defense had more options for positioning, he was still putting up incredible numbers each game. So you need to factor in the fact that the 2016 Golden State Warriors also had fewer obstacles to scoring than the Bulls did when Jordan played, meaning if he were playing under 2016 rule sets, he'd likely be scoring more than he was when he was playing professionally.
He already explained clearly that this is just about one parameter (number of wins in the season) among peers of their period. If you want to level up the complexity, you could add more parameters to consider and then get their combined Z scores. For “changing rules” adding complexity alone: you need to establish how you would put that rule-changing factor on a numeric scale… that’s math for some other day. Again, you can’t ignore the bravery of them bringing this topic up while telling others his favorite Premier League team… and the comment with the most upvotes proves people went out of their way to be emotional over just, one, parameter’s Z-factor analysis… (that’s in the field of psychology of team sports)
Wait a second... so what happens when the stddev -> 0?? (i.e. all the teams tie, and the final winner is from a league-wide-tie-breaker)? That team, that simply won the tie-breaker, would have A Z-FACTOR THAT EXPLODES.. no..??
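A minimal sketch of what happens near a league-wide tie, assuming 30 teams and the population standard deviation. The win totals are hypothetical, and `z_of_top` is a made-up helper:

```python
from statistics import mean, pstdev

def z_of_top(wins):
    """z-score of the best record; None when every team is tied."""
    s = pstdev(wins)
    if s == 0:
        return None  # 0/0: the z-score is undefined, not infinite
    return (max(wins) - mean(wins)) / s

all_tied = [41] * 30
# Hypothetical near-tie: one team a game up, one a game down, the rest level.
near_tie = [42] + [41] * 28 + [40]

print(z_of_top(all_tied))
print(round(z_of_top(near_tie), 2))
```

Under these assumptions the score doesn't blow up. An exact tie gives 0/0, which is undefined, and anything short of that stays finite because the leader's own result is part of the spread it is measured against.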
Amateur NBA statistical analyst here! This concept is exactly what I use to compare player statistics across eras. I don't like using it for team comparisons, however, because throughout NBA history the number of teams in the league has changed a bunch of times, and when there are fewer teams, one particular team's dominance counts against them more than when there are more teams. They play opposing teams more often, and so if they're really good, opposing teams will naturally lose more of their games. For instance, in the 1967 NBA season, the Philadelphia 76ers set an NBA record by winning 84% of their games (68-13). In the 1960s it was infrequent for the best teams in the league to win even 60 games, let alone 68. There were only 10 teams in the league at the time, however, so the bottom teams that year, like the Baltimore Bullets, who went 20-61, might seem like a left-hand outlier in our distribution, but they played the 76ers 9 times and went 1-8 against them, whereas NBA teams today, with 30 teams, play each other between 2 and 4 times. For this reason you have to exclude the team in question from the distribution. Doing so, the '67 Sixers have a z-score of 2.82, compared to the '16 Warriors' score of 2.5. What I use z-scores for is individual player statistics, and doing "inflation adjustments" between eras. I think this works much better because there are way, way more data points, so you don't run into the problem of one outlier player inflating the standard deviation of the distribution and working against their own chances of producing a high z-score.
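The "exclude the team in question" adjustment can be sketched as a leave-one-out z-score. The win totals below are made up for a 10-team league and are NOT the real 1966-67 standings; both function names are hypothetical:

```python
from statistics import mean, pstdev

def z_including(values, index):
    """Ordinary z-score: the team is part of its own reference distribution."""
    return (values[index] - mean(values)) / pstdev(values)

def z_excluding(values, index):
    """Leave-one-out z-score: mean and SD come from the other teams only."""
    others = values[:index] + values[index + 1:]
    return (values[index] - mean(others)) / pstdev(others)

# Made-up win totals for a 10-team league, not real standings.
wins = [68, 60, 44, 39, 39, 36, 36, 33, 30, 20]

print(round(z_including(wins, 0), 2))
print(round(z_excluding(wins, 0), 2))
```

In this toy league the top team's score rises once its own result stops inflating the mean and standard deviation it is measured against, which is the effect described above. The adjustment matters most in small leagues, where a single team is a large share of the data.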
This is very flawed unless you intended to compare the relative performance compared to their respective fields. To me it sounded like you wanted to compare who would win in a match between the two basketball teams if they were to play against one another.
The 1996/97 season was not the Bulls' highest-winning season. They won 69 games that season, but 72 games the year prior. Brady's question around 13:10 is quite prescient, because for smaller sample sizes of games, average point differential is a better predictor of a team's future win record than their past win/loss record. I'm not sure where the crossing point is (i.e. is it more or less than 82 games). Doesn't this analysis require that each season, individually, is at least approximately normal? There are (well, at least since shortly before the Bulls' season in question) about 30 teams in the NBA, which is right on the cusp of the commonly used threshold to apply a normal approximation, but looking at the histograms around 6:00, these look only very roughly normal. A few other factors that one would want to take into consideration: -teams play more games against other teams in their conference (East or West) than in the other conference, and the 2 conferences are often of unequal strength. I believe both teams were in the stronger conference in their own time, but I don't have a good measure of by how much. I'm actually not sure how this affects their z-score: In some sense, it might be less "surprising" that a particularly good team would emerge from a group with a higher mean, but their performance is also further above the *league* average than their W/L implies. -many teams will deliberately rest late in the season, especially good teams who have secured a playoff spot. But a team that is close to setting a record might continue to play as hard as possible, and in fact I believe the Warriors seemed to do this, and it might have contributed to their comparatively poor performance in the playoffs.
You might want to look into the Elo rating system. It's been in use for over 60 years for rating chess players, and will probably do a better job of comparing across different time periods.
Let me understand this... On one side you have 29 players with an Elo rating under 1000 and one with a 2500 Elo, playing a double round-robin league. On the other side you have 29 players with Elo ratings between 2400 and 2490 and one with a 2500 Elo, in a similar league. In both cases the Z factor of the 2500 Elo player will be the same? Won't there be much more distance between the 29-player group and the winner in the first case than in the second? PS. If my memory is not wrong, for each 400 Elo points of difference, you have about a 90% chance to win the game. So, 800 points is 99.01%. 100 points more is about 64%, 50 points is 57%.
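The percentages in the PS match the standard Elo expected-score curve, which can be sketched as follows. Note that the expected score counts a draw as half a win, so reading it as a pure "chance to win" is an approximation:

```python
def elo_expected(diff: float) -> float:
    """Expected score for the player rated `diff` points higher (standard Elo logistic curve)."""
    return 1 / (1 + 10 ** (-diff / 400))

for diff in (50, 100, 400, 800, 1500):
    print(diff, f"{elo_expected(diff):.4f}")
```

On this curve the 2500-rated player would be expected to take nearly every point against a sub-1000 field but only a little over half against a 2400-2490 field, so the two leagues would very likely produce different win totals and different z-scores for that player.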
nope! lots of dads are not even remotely thought of in a positive light by their primary school aged children. There are even lots of examples in literature.
If you were talking about the actual best team, you would have to match up the '90s team against the 2020s teams, and in that case I think the old teams would lose every match. If you look at a more interesting and much more popular sport, like juggling for instance, almost no one could even do 7-club juggling in the '90s, but now "everyone" does it. I believe this massive improvement is probably occurring in lesser sports like football and basketball as well.
A lot happened in '96, so this all depends on whether Jordan has met and trained with the Cartoon All-Stars and saved the planet from the Monstars already. You'd also have to ban his world-destroying Chaos Dunk, for obvious reasons.
Soooo - my question. Is there a way of more directly comparing teams from different years by trending the z-factors of ALL the teams, to create a lineage from team from era A to team from era B?
Isn't it possible that ALL the other teams in 2001 were really bad, but the same amount of bad as each other? The distribution being tighter doesn't have to mean that the competition was tougher. In fact, I don't see any relation between those two things.
6:00 For someone who’s not into basketball this doesn’t make sense. In the season 96/97 Chicago Bulls won a certain percentage of games, so I have no idea what that graph represents. They either won between 70 and 80% of their games, or they didn’t, so to me there should be just a probability of 1 in one of these bins. Also, how come the probabilities don’t add up to 1. The mathematician guy just said they must add up to 1.
About Brady's last question. This is not the actual answer, but let me talk to you about 2004 Pumas (UNAM) in Liga MX (although I think it wasn't called that back then). In those times the Liga used to have a very weird format, in which 18 teams were split into 3 random groups, but everyone would play everyone (even if they were in a different group). Then, the best 2 of each group and the best 3rd places would go on to the playoffs. This made no sense, but it was the format. As a result, Pumas got second place in their group despite being actually 9th overall. They went on to win the playoffs and win the championship, with a Z-factor of -0.1 (yes, that's a "minus" sign in front). EDIT: Ok... I only counted the regular season and not the playoffs. If we add the playoffs the Z-factor goes up to 1.63. But it's a bit unfair as it gives some teams more games than others. If we take the playoffs into account and measure by (points gained)/(number of games) we get Z=1.07.
However, a quantity whose distribution is fundamentally asymmetric cannot be normally distributed. Take the height of people: if it had a normal distribution, it would mean that, although very rare, there are people with negative height. What the central limit theorem says is normally distributed is the *mean* value of people's height, as measured by different researchers or taken from various reference books, not the heights of different people in any set, even one as big as the UK.
It's also important to note that what you're really saying is that Golden State might beat Chicago if they played a statistically significant number of games. You really can never tell how one particular game is going to go.
So I have often wondered about "adding dimensions" to this process. Like, if you wanted to examine two different but connected measures (say total wins AND points differential), could you just slap the second measure down as an orthogonal axis and glean anything useful from that? And if so, can you just kind of Linear Algebra + MV Calculus your way to more robust conclusions? It naively feels like it should be an option, but I can't even begin to conceive of the methodology you'd use to ... do it.
My half-baked thought is: how does fitting a normal distribution to the Scottish Premier League work? AFAIK the SPL normally has 2 super teams (Rangers & Celtic) that win almost every game they play, and then 'the rest', who are scrapping amongst themselves for the lower positions.
Is there anything invalid about finding the Z-scores for multiple data points and then averaging them to create sort of a "z-index"? I know often times doing things like that has unintended effects lol, but on the surface it seems like it might provide an even more "informed" comparison between the teams.
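A minimal sketch of the averaging idea, assuming equal weights and a hypothetical four-team league with two metrics on very different scales (all numbers invented):

```python
from statistics import mean, pstdev

def z_scores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical four-team league with two metrics on very different scales.
wins = [60, 45, 40, 19]
point_diff = [8.5, 1.0, -2.0, -7.5]

composite = [mean(pair) for pair in zip(z_scores(wins), z_scores(point_diff))]
print([round(c, 2) for c in composite])
```

Nothing stops you from doing this, but two caveats apply. The average of two z-scores is not itself a z-score: its spread is smaller than 1 unless the two metrics are perfectly correlated, so it would need re-standardizing before being compared with a single-metric z-score. And when the metrics are strongly correlated, as wins and point differential usually are, equal weighting partly counts the same information twice.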
Could you do the NFL by breaking down the games played into each individual play? For example, could you measure the offense's performance in terms of American Yards gained on the field per play against the Yards prevented by the defense on each play? Obvious pitfalls here is it doesn't account for long drives from one end of the field toward the goal that does *not* result in a score, but I think it would help to more accurately determine the output of the team's performance in term of their athletic ability?
I don't like the 'wins' evaluation method because average wins will always be the same from one season to another, as long as the number of teams in the league and games played per season stays constant. I think it's basically just another way to report the % of games won. I think it would be more useful to compare points scored to the average for that season, and points allowed to the average for that season. That would give relative strength of offense and defense. It still wouldn't allow a direct comparison, but it would show how much stronger each team was relative to the average team in these areas for their particular year. You could compare years this way as well, using league average offense and defense, but if teams stopped playing defense then offenses would look outrageous, and if they stopped shooting then defenses would look unbeatable. I don't know of a way to directly or indirectly compare. Perhaps if you were to take years when there were rule changes and work out how much those impacted the game with the same general pool of players the following few years, then factor that in, maybe grow the difference slowly over time as the game optimizes for the new rule set until a plateau or new rules, then you could standardize the playing field in some way. I don't know. It's complicated.
This method lets you compare competitiveness, but it cannot answer the question as stated, because it cannot estimate technical skill (or team and individual performance: each time you try, you end up comparing their data against numbers from their opponents in their own era). To each generation its own. A young Pelé at his peak couldn't shine the same way in today's football. However you choose your dataset, the chosen numbers are anchored to their generational context. So after such a time gap, it's most likely that competitors have become stronger in their art. Adversity grows, competitiveness oscillates (it's also circumstantial), and technical skill grows. Thus statistics are super important and powerful over medium periods.
I would like to see an average of the Z factors between several stats, from important like wins to seemingly unimportant like hand size and team average height. Also Pele’s 3.8 is amazing but what about Michael Phelps and his Gold medals?
Sounds to me that to have an ultimate Z-factor for sports you should combine a number of Z-factors in all kinds of statistics which are really key for a sport and work that out in 1 Z-factor.
Acknowledging, as you said yourself, a limitation of this analysis is "based on level of skill at that time," let's just assume that the average level of competition in different times is equal, then maybe you could do the following statistical calculation: Given z1 = 2.06 (Bulls) and z2 = 2.54 (Warriors), what is the probability that Warriors would win over Bulls? Or for another way of looking at it, if the two teams would play 100 games, what's the most likely split of wins between them?
Years ago (early 2000s) I saw an article about this for bat-and-ball sports batting stats where I believe they normalized the z-factors of cricket and baseball to see who was the best batter/batsman. Bradman and DiMaggio were the answers. The problem was that the exact methods and measures used mattered, so it was possible to rig the answer if they wished. But both Bradman's average score and DiMaggio's hitting streak were mentioned as the only 2 stats unlikely to be matched or bettered within the next 150 years using their methods.
How about Ted Williams .406 batting average in 1941 (143/154 games played)? Closest to that is .394 by Tony Gwynn in 1994 but he only played 110/162 games. Next, .390 by George Brett in 1980 but he similarly only played 117/162 games. A similar batting average to games played as Williams is Rod Carew in 1977 155/161* with a .388. *should be 162 games but MIN and other teams didn’t play a 162 for one reason or another 🤷♂️
My statistics brain is a bit rusty... Does it make a difference that data points for "Win %" are very much not independent? In that, for one value to go up, some one else's value *must* go down? Compared to say, "Total Points" which could be higher or lower for a given team without necessarily affecting another team's outcome.
"The numbers outside the grid indicate how many skyscrapers are visible in that direction" has nothing to do with height - I was wondering if that makes the outside numbers completely meaningless!?
@@maxid87 It's a standard puzzle type but the way it's presented here is really confusing. The number indicates how many absolute increases are in that row or column. Or how many times a new height record is broken from that direction. So when the 6 is outside the grid that should indicate that there are 6 increases in that direction which is forced to be 1 2 3 4 5 6. But in this solve they've put a 6 in the first position so it can't increase any further.
Yeah, putting a 6 inside the grid next to the 6 outside makes it clear they're not actually attempting to solve the puzzle. Also, they have two 2s in that left column, which breaks the "Latin Square" rule..
This doesn't actually measure the actual skill of each player/team, only the comparative skill compared to their respective eras' players/teams. So it's completely meaningless in comparing two players from different eras on who would win.
If no one has corrected Tom yet re. his knowledge of the total regular season games in an NFL season, a couple/few years ago the League decided to add an additional game to the regular season; so, now it's 17 rather than 16 games -- not that that additional data point for the sample size is all that much more statistically significant, of course
it's not, it's really close because the mean is high and the standard deviation is small, so the non-negative effect is negligible, it's just another instance of central limit theorem
The only thing I can think of that has happened in baseball that might have a Z=3.5+ is two grand slam home runs, in the same inning, by the same player off the same pitcher. Fernando Tatis Sr in 1999. Even that isn’t 70 goals in 20 games level.
You can pretty much make any team the best by picking the right data to base the Z-factor on. As my statistics professor said: "Statistics are like a bikini, it shows a lot but hides the most important parts"
Could you check the Z factor of all teams in the league for each season and then measure who had better win records against the rest of their respective leagues' Z factors? That could answer the question better.
Is Ole Einar Bjørndalen the King of Biathlon? I wonder if there is a calculation of biathletes' z-factors per year and, since they are individual athletes, whether the evolution of the z-factor for all athletes can allow us to normalize an athlete's performance over time and hence make an actual inter-generational comparison (without the mess of players changing teams). Trouble is, I would have to work with ranks in races, not wins...
I think the best way to interpret the Z-score here is 'how much better than the competition were they in their time'.
If you use the Z-score to directly compare two teams, you could argue that someone in an amateur league with a better Z-score is a better team than a professional team.
Correct. We use IQ as an example in my classes because it is *always* M = 100 and SD = 15, because it is “normed” to the current population. What this ignores is what’s called the Flynn Effect, which is that IQ is constantly increasing (or we’re getting better at taking tests). So someone with an average IQ now (100) is smarter than someone with that same average IQ score 50 years ago.
Yes, there isn't an allowance for the whole league being stronger over time. You can kind of get away with it in this example, but it's the 'Babe Ruth vs Modern Baseball or Muhammad Ali vs Modern Boxer' questions that get posed where it falls apart, never mind the 'who is the greatest motorsport driver' questions where the actual demands and rules of the sport change heavily over time.
Yeah, the point about somehow getting rid of the comparative/relative nature of the data irritated me quite a bit. You cannot turn the information "how many times did they win/lose AGAINST OTHER TEAMS" into a datapoint that gives us information independent of the teams they played against.
That's what he says in the video.
@@wobaguk Yeah for a lot of sports that won't work. For some events like swimming, the sport has changed only a little, but times have improved a lot. Evidently, the average professional swimmer today is faster than the best professional swimmers from 50 years ago. So a Z-score comparison wouldn't be very helpful there.
To answer the question at the end, the lowest z-score to win in the end that I found was the 1900 VFL Melbourne team, who won from 6th with 24 points (league average of 28). They had a z-score of -0.33072.
You got the result for a regular system, where it's just 1st place points = 1 place, not like Belgium/Australia?
4 points for a win, 2 for a draw, 0 for a loss
Worth noting that in the early days of VFL - all teams qualified for the finals series.
Which meant you could theoretically have won the championship even after losing every game in the regular season.
Obviously that never happened - but early VFL rules allowed for incredibly low z-scores.
The 1901 season completely overhauled the finals system.
20:50 Yeah, just a 6.5, no big deal. For those who don't know, this has been cited as the most impressive statistic in all of sports.
Didn't the guy say he'd never seen one higher than 4? Then we end the video with a 6.5? Heck, the whole video maybe should have been about Bradman.
I plot Z-scores every day and all the axes cut off at 4. I've only seen a couple players barely above 4, 6.5 is ridiculous!
@@esotericVideos He never saw higher than 4 for team games won in a season. Bradman data is not a team winning games
I sat through the whole video waiting for them to get to Bradman.
He was beyond 6 sigmas away from the mean. There is a GE executive somewhere who just orgasmed talking about this.
That Bradman fact at the end. He was just totally god tier. 20:49
A 1-in-25-billion performance!
Edit: If there have been 117 billion humans in history, then only 4-5 humans have ever perfected their craft to this level.
I tried to get the percentage of players with lower z-score than Bradman using Wolfram Alpha with CDF[NormalDistribution[0, 1], 6.5]
The answer is 1, which I think we can interpret to mean "all of them"
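The "1" is most likely just display rounding: the CDF at 6.5 is so close to 1 that the interesting part is the upper tail. A minimal sketch, assuming a standard normal distribution (real batting averages need not follow one) and a hypothetical helper name `upper_tail`; the 117 billion figure is the one quoted in the comment above:

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p = upper_tail(6.5)
print(f"P(Z > 6.5) = {p:.3e}")            # tail probability beyond 6.5 sigma
print(f"about 1 in {1 / p:,.0f}")         # reciprocal of the tail probability
print(f"expected among 117 billion people: {117e9 * p:.1f}")
```

Computing the tail directly with `erfc` avoids subtracting a number very close to 1 from 1. The result should land near the 1-in-25-billion and 4-5 people figures quoted above.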
7:45 No. It doesn't remove the issue at all, it just ignores it. It remains true that the same Z factor from different sets may mean totally different things as far as which team is better.
Yeah, it sounds like he has it backwards. You can't compare historical teams directly, but Z scoring lets you compare how they played relative to their competition.
And in the NBA it's virtually impossible because of how dramatically teams can change from one year to another. In other sports you can use an Elo rating idea to compare teams from different times. But the further apart they are in time, the harder it is to draw a sound conclusion.
I spent half the video wondering about Don Bradman’s Z-score. Had a feeling it would be huge. Thanks for including that.
Don Bradman at Z=6.5 is insane!
If Matt Parker isn't already using this to comedically compare F1 driver Lewis Hamilton's career against NASCAR driver Jimmie Johnson's in order to troll as many people as possible, I bet he is after reading this.
The Z Factor sounds like a game show where teen contestants face each other in a battle royale-style TikTok dance-off.
As a football data analyst, I can confirm that z-scores are used everywhere, for instance comparing metrics that are measured differently (i.e. goals scored and possession %). They also allow us to create compound metrics which are more reliable and meaningful. Normalizing data without losing the details of how it's distributed (as you would with percentile ranking) is fantastic.
Why would you lose that data with a percentile ranking? A z-score can be trivially converted to a fractile and back, after all.
@@therealax6 if your data perfectly follows the normal distribution that is true, but not with real data. For example the top competitor's percentile ranking will always be 100%, no matter how far ahead of the others they are. Z-scores however reflect that gap.
@@matthiasgreen4042 That makes sense! I always assumed you'd just adjust fractiles for discrete data so that they represent the midpoint of a continuous range (e.g., you have four data points: use fractiles .5/4, 1.5/4, 2.5/4 and 3.5/4 instead of 0/3, 1/3, 2/3, 3/3), but that adjustment is probably too inaccurate to adequately capture the edges of the underlying distribution.
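A small sketch of the point made in this exchange, using made-up numbers (the two lists are hypothetical, not real match data): the top competitor's percentile rank is the same whether they win narrowly or by a mile, while the z-score moves with the gap.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z-score of each value relative to the whole list (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def percentile_ranks(values):
    """Fraction of values less than or equal to each value."""
    n = len(values)
    return [sum(w <= v for w in values) / n for v in values]

close_race = [10, 11, 12, 13, 14]   # hypothetical: leader barely ahead
runaway = [10, 11, 12, 13, 40]      # hypothetical: leader far ahead

for name, data in (("close race", close_race), ("runaway", runaway)):
    print(name,
          "top percentile:", percentile_ranks(data)[-1],
          "top z-score:", round(z_scores(data)[-1], 2))
```

With only five data points the z-score cannot grow without limit either, so the difference between the two cases is visible but bounded.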
Somewhere, Michael Jordan is taking this video personally...
Don Bradman having a Z Factor of 6.5 is insane
I just want to say that I recently have gotten a job tutoring math to fellow college students - something that i would have never even considered before stumbling upon Numberphile! math used to be a subject I lamented, now it's my favorite.
Throughout the years I've come to realise that people don't hate math; they hate math class. Math at general education levels (primary school, secondary school, entry-level courses at university, etc.) is taught in a way that makes it about as uninteresting and uninspiring as a subject can be. Students are taught to pointlessly memorise formulas that they then have to echo back in the correct order without knowing why or what for.
Students are almost never exposed to actual problems that let them exercise their problem-solving skills, sadly.
Same here!! I love getting opportunities to show people that math isn't so bad.
@@therealax6 yes, I was very lucky to have parents who were huge math nerds and multiple teachers who made the subject engaging… and even then, I still had huge math anxiety and took computer science instead of math at university to fulfill the requirement.
I am actually quite unsure if you can use the analogy of z-score to portray greatness. An underlying assumption is that the observations are i.i.d. (independent and identically distributed), and they are clearly not independent. The total number of wins is equal for each season: if one team wins, another one loses. And because you compare how much of an outlier you are relative to the seasonal distribution, you end up just comparing how much your win share differs from the others', while their win distributions are dependent on each other, which could lead to some weird constructs of high z-scores occurring more often than they actually should.
I think a much better estimate is the combinatorial calculation of how likely that win number (with all the other teams' win numbers also happening) is to occur for a season!
I completely agree, using the Z-score of win totals across different sports leagues makes no sense. I don't think it's even very useful to compare across eras of a single sports league as many others have said.
The wins aren't independent, the league average win rate is absolutely necessarily 41 wins in an 82 game season, and there is a limit on the standard deviation that's related to the number of games too. So comparing the biggest outlier for wins in the NBA vs the Premier league vs the MLB doesn't really work. Much less comparing Z-scores of wins in these leagues to independent statistics. My back of the napkin math says it's mathematically impossible for an NBA team's "win rate" Z-score to reach the Bradman batting score of 6.5 that's quoted.
But that's not to discount that Bradman score as an incredible outlier. Use Z-score to show me how far above league average Jordan's midrange efficiency was and compare that to Curry's insane 3 point shooting and I'll listen, because you can use independent data points.
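The napkin math can be sketched as follows. For any list of n numbers, no single value can sit more than sqrt(n - 1) population standard deviations from the mean (or (n - 1)/sqrt(n) with the sample standard deviation), and the extreme is reached when every other value is identical. The league below is hypothetical and not even schedule-feasible; it only illustrates the ceiling for a 30-team league.

```python
import math
from statistics import mean, pstdev

def max_z_population(n: int) -> float:
    """Largest possible z-score among n values, using the population SD."""
    return math.sqrt(n - 1)

def max_z_sample(n: int) -> float:
    """Largest possible z-score among n values, using the sample SD."""
    return (n - 1) / math.sqrt(n)

# Hypothetical extreme: one team wins all 82 games, the other 29 all tie.
wins = [82] + [39] * 29
z_top = (wins[0] - mean(wins)) / pstdev(wins)

print("ceiling, population SD:", round(max_z_population(30), 3))
print("ceiling, sample SD:    ", round(max_z_sample(30), 3))
print("z of the 82-win team:  ", round(z_top, 3))
```

Both ceilings for 30 teams come out below 6.5, which supports the comment. A statistic measured over a much larger pool of players has a much higher ceiling, so this bound does not stand in the way of an individual figure like Bradman's.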
Also, the Warriors literally lost the championship that season. They couldn't even win the most important game of the season against their contemporaries. Just saying.
@@B3Band bro this is a MATH video
Hey, would you be able to further expand on how you would use the combinatorial calculation? What would you define as the 'n' and 'r' for example in the case of the Bulls vs Warriors situation? Thanks!
@@4DINK It is a little bit more complex than that, as we have to calculate the consequences of each win or loss for the other teams' schedules as well. I think it is easiest to build on a permutation tree, where we have to assert some construction rules (e.g. equal amount of total wins and losses, numberwise equal amount of equals and odds, etc...) 🙂
Not comfortable with three aspects of the assumptions. First we take discrete data (not a particularly large dataset) and fit it to a continuous function. It is not possible to win 13.2 games of basketball and there is no possibility of winning 200 games in a season even though the normal distribution says there is a very small but finite probability.
Second, the wins could be distributed very differently in different seasons. In one extreme the overall winners could win every game with all other teams winning just above or just below half of their games. In another extreme, the winning team could win by the narrowest of margins (overall points scored and same number of wins as the team that came second).
Third, the winning margins etc could be identical but it could be that all teams in the second dataset are ALL either much better or much worse than in the comparison dataset.
There are discrete distributions that could be used as well.
@@soloban81 Agreed - though not mentioned in this video
Yes, but the normal distribution is a close approximation to the discrete distribution (in this case probably a binomial) given large enough sample size and p not too close to 0 or 1. Agree with the other two points though :)
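To get a feel for how close the approximation is, here is a minimal sketch that assumes every game is an independent coin flip (n = 82, p = 0.5). That is a simplification: real teams differ in strength, so real win totals are more spread out than this model suggests. The function names are illustrative.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))

n, p = 82, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

for k in (30, 41, 50):
    exact = binom_cdf(k, n, p)
    approx = normal_cdf(k + 0.5, mu, sigma)  # +0.5 is the continuity correction
    print(f"P(wins <= {k}): exact {exact:.4f}, normal approx {approx:.4f}")
```

The continuity correction is what reconciles "you cannot win 13.2 games" with a continuous curve: each whole number of wins is treated as the interval half a game either side of it.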
Revisit this in 100 years please someone, see what has changed..
It’s cute to see how Tom really knows some (mostly math) stuff but clearly also has no clue about other (mostly sports) stuff. :-)
This approach is used all the time in physics. Theoretical physicists commonly plot in terms of dimensionless parameters to distill the physics without influence from individual parameters
Another interesting wrinkle in all this is that most sports statistics are competitive. Like, The 2015/16 Warriors getting so many wins reduced the number of available wins for all the other teams, because all of them played (and, mostly, lost) games against GSW. This makes it harder to meaningfully compare across eras, because your sense of how strong the competition is is, in part, self-correcting, so just dividing out the standard deviation doesn't necessarily account for it entirely. (For a toy example, consider a tournament where matches are decided by rolling a 6-sided die, but then one year they add a rule where every player adds 4 to their result. The win distribution should have exactly the same standard deviation, but even mediocre players from the +4 year will obviously have a huge advantage in hypothetical head-to-head matches against top players from other years.) This comes up a lot in baseball discussions: Modern pitching has advanced so far that it's pretty questionable whether Babe Ruth could even get to play in the major leagues today, whereas a replacement-level modern hitter sent back to his time would likely be a superstar because they'd trained against better opponents. I don't think any of this discounts Tom's actual point (in fact, I think it supports it) but it's interesting to think about how these statistics are shaped.
I don't think your toy example works.
We're not just "just dividing out the standard deviation", since we take off the mean first. This method deals fine with your example, because taking off the mean accounts for the +4.
So, applying this technique, who wins then? The 1994 Brazilian world champion football team, Mike Tyson or the Nigerian bobsled team they made that movie about?
On a more serious note: Does the quality of what makes a sport change little enough over time that these comparisons can be made? Is basketball played primarily for 3-pointers the same sport as basketball played mostly for 2-pointers?
Really??? Nigerian bobsled team? Completely wrong continent dude
@@levibernard1838 actually Jamaican, found it after searching. Movie name is Cool Runnings 😅
The man compared two basketball teams in two different eras! The 3 points were there in the 1990s! While you are ridiculously trying to dismiss him by talking about three different sports! .. go away!
Dismiss 😅? It's just a question.
A reply with yes or with no and maybe some reasoning are both fine.
I guess you can also try to dunk (get it?) on other people for internet brownie points too though. Cheers! 🎉
This is why statistics is the best 😄 Great video!
What about the '95/'96 Bulls? That was supposed to be the best year for the Bulls.
It was an error. He meant to use the 95/96 team.
The usual problem with applying statistical methods to real-world things: It doesn't work well. The real world has too many "other factors".
Very nice. I can understand this math and it is also put into context of "how exceptional" performances were.
Would love to see the Z factors around Wayne Gretzky. Every statistical fact I've read of him seems unreal.
You are not accounting for skill inflation. For example, there are many skilled chess players today who could defeat Deep Blue thanks to modern preparation methods, while back in the day it defeated the best chess player of its time.
Certainly, there are more skilled players of anything today than there were decades ago. This should have been taken into account (if possible to quantify) to draw a fairer comparison.
Yes I thought of this, but couldn't think of a way to account for it... could you account for it by points per game per shot attempts? Maybe add in blocks per game per block attempts? Or more generally like VO2 Max of the athletes or their height, body fat?
That’s a separate argument.
We’re comparing one championship team’s performance *relatively to others at that time* to another team’s performance *relative to others at their own time*.
Someone with an IQ of 100 today is smarter than someone with the same 50 years ago, but they’re both *average* intelligence *for their time*.
The skill inflation favours modern teams anyway, so the answer wouldn't change
It's just a model.
Yeah, you can argue which team is better in the context of their season, but you can't say which is more likely to win in a match on the grounds of that. For example, it is easy to imagine the Chicago Bulls data set coming from a wheelchair basketball league team: identical numbers, but that team would lose badly against the actual Chicago Bulls unless by some miracle wheelchairs beat legs. The direct comparison cannot be made between the teams, because the conclusion is a function of the dataset, not the teams' ability to play. It just means they were better in their time, playing the teams they did :).
The rules were also a bit different back then, so it’s difficult to compare.
Now I kind of want to see Sachin Tendulkar against a bodyline bowler.
@@B3Band no meaningful changes in the rules. I still would put my eggs in the Bulls’ basket
When you look into salary cap changes, you get a better feel for just how strong MJ was. There were two years where he got paid over 100% of the Bulls' normal salary cap, and all the other players signed for tiny contracts that were fully in the luxury tax area. If you gave LeBron James or Steph Curry 100% of a team's salary cap, and only put players around them that would sign for bare minimums, they would literally never win a single match.
Assumption is that rules, athleticism, and technical skills have remained the same.
As an engineer, you'll see some very impressive Z factors from manufacturing lines. Particularly in industries where performance is literally life or death. For example, in aviation engineering, a manufacturer may achieve Z factors of 7 or 8 for a given part made in large quantities. Anything above 6 is equivalent to a process making a faulty part less than one in a million times, hence there's an entire branch of engineering rigour known as Six Sigma.
The biggest problem with this (in my mind) is that the two teams were playing under different sets of rules. The NBA changed rules over this time, so how are you accounting for the different sets of rules? Would the 1990s Bulls have performed better under the new rules that promoted offense?
Indeed. It would assist that one could use the same mathematics to compare, say, a basketball team and a javelin thrower, because everything becomes a unitless Z-score
Heck, you could compare a basketball team to a singer (where the number and length of time at number 1 in the charts is used as the metric), or to anything else that you can rank. I'm not sure I'm ok with that, though.
@@phyphor that’s a lame way to dismiss the video! Dude compared the same sport and same league but different eras. Not any meaningful changes in the rules. I still put my eggs on the Bulls though!
This is a big thing in baseball stats - IE OPS+, ERA+ are all similar type metrics but just divide by the average instead of taking the z score. It would be the same ranking if you assume that the in season personal variance is not useful
The 1995-96 Bulls, actually.
Both 95-6 and 96-7 Bulls. Easily, too.
The Z-Score compares unique data sets assuming the competition level is equal. It doesn’t account for modern teams being more developed from decades of improvement in sports science, conditioning, & play-calling.
It might work for 2 teams 5-10 years apart with similar strength leagues but over time most sports drift.
At 14:55 he does mention that it doesn’t directly say who would win in a modern head to head game.
I was expecting an Elo model video. I believe this is the best way to compare teams and performance across different leagues, sports, etc. Even adding some modifiers to the model can take into consideration home and away factors, best results against certain opponents and so on.
I really like the clarification of what you can actually compare using this methodology. And that there are a whole host of unlisted factors which contribute to a win in a head to head matchup. Love the video!
8:45 The z factor definitely does *not* tell you which is better, at least for the usual definition of "better" (aka, team A is better than team B if the former would beat the latter most of the time). It tells you which had the most exceptional result over its peers, which is a very different thing. There is no argument that those two are the same. I'd have a better win-loss ratio, and therefore a larger z factor, playing my nieces at chess than Magnus Carlsen would have against super-grandmasters, but I'm definitely not a better chess player than him.
Tom touches on this at 5:50, but concludes by making the most common mistake that happens in mathematical modelling: rushing to jump into a quantitative method rather than asking if it is actually suited to the question at hand. Brushing away the flaws of using this model for this question as being just "opinions" is no good. Just because the problem can't easily be expressed mathematically doesn't mean that it's all subjective and there are no right or wrong answers. Now I know that this is Numberphile and the real aim is to teach uses of the Gaussian distribution rather than deciding which basketball team is better. But it's still ironic that it commits the cardinal sin of modelling and statistics in doing so. Of course he does address this somewhat after the click-baity part of the video, but it's rather doing it all out of order.
So is it “impossible” to state which team is “better” crossing generations? Or do you need multiple statistical points like wins, winning margin, opponent wins, league record average, and potentially more to get a more rounded result?
I’d like to do baseball stuff but comparing Ted Williams’ 1941 season to Mike Trout’s 2015 is drastically different in the way baseball was played and skill level in the game, games played etc.
Just to be especially contrarian a month after my opinion would have been welcome: while I'm no basketball expert, my understanding is that the rule changes since Michael Jordan retired from basketball have also made the defense's job harder than it was when Jordan had to get in to score points. Back when the defense had more options for positioning, he was still putting up incredible numbers each game. So you need to factor in the fact that the 2016 Golden State Warriors also had fewer obstacles to scoring than the Bulls did when Jordan played, meaning if he were playing under 2016 rule sets, he'd likely be scoring more than he was when he was playing professionally.
You could argue that the strongest team was the one that won the season with the lowest standard deviation.
I think that doesn't account for the strength of the winner. Something like the ratio of Z-score to standard deviation might be better.
He already explained clearly that this is just about one parameter (number of wins in the season) among peers of their period. If you want to level up the complexity, you could add more parameters to consider then get their combined Z scores. For “changing rules” adding complexity alone: you need to establish how you would have gotten that rule-changing factor in a numeric scale… that’s math for some other days.
Again, you can’t ignore the bravery of them bringing this topic up while telling others his favorite Premier League team… and the comment with the most upvote proves people went out of their way to be emotional for just, one, parameter’s Z-factor analysis… (that’s in the field of psychology on team sports)
The Bulls’ 72-10 season was in 1995-1996 and not 1996-1997.
Wait a second... so what happens when the stddev -> 0??
(i.e. all the teams tie, and the final winner is from a league-wide-tie-breaker)?
That team, that simply won the tie-breaker, would have A Z-FACTOR THAT EXPLODES.. no..??
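A quick sketch of what happens in that limit, with made-up win totals. If every team has exactly the same number of wins, the tie-break winner's wins equal the mean, so the z-score is 0/0 (undefined) rather than infinite. If the spread is merely tiny, the z-score gets large but is still capped by the number of teams. The `None` return for the undefined case is a choice made for this example.

```python
from statistics import mean, pstdev

def z_score(value, population):
    """Z-score of value against population; None when the SD is zero."""
    s = pstdev(population)
    if s == 0:
        return None  # every team has the same total: 0/0, undefined
    return (value - mean(population)) / s

all_tied = [41] * 30               # hypothetical: all 30 teams finish 41-41
near_tie = [41] * 28 + [42, 40]    # hypothetical: one team a single win clear

print(z_score(41, all_tied))            # undefined case
print(round(z_score(42, near_tie), 3))  # large, but finite
```

So a near-tie league does hand a big z-score to a team that is only one win ahead, which is one more reason to read the number as "how unusual within that season" rather than "how good".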
Amateur NBA statistical analyst here! This concept is exactly what I use to compare player statistics across eras. I don't like using it for team comparisons, however, because throughout NBA history the number of teams in the league has changed a bunch of times, and when there are fewer teams, one particular team's dominance counts against them more than when there are more teams. They play opposing teams more often, and so if they're really good, opposing teams will naturally lose more of their games. For instance, in the 1967 NBA season, the Philadelphia 76ers set an NBA record by winning 84% of their games (68-13). In the 1960s it was infrequent for the best teams in the league to win more than even 60 games, let alone 68. There were only 10 teams in the league at the time, however, so the bottom teams that year, like the Baltimore Bullets, who went 20-61, might seem like a left-tail outlier in our distribution, but they played the 76ers 9 times and went 1-8 against them, whereas with 30 teams today, NBA teams play each other between 2 and 4 times. For this reason you have to exclude the team in question from the distribution. Doing so, the '67 Sixers have a z-score of 2.82, compared to the '16 Warriors' score of 2.5.
What I use z-score for is individual player statistics, and doing "inflation adjustments" between eras. I think this works much better because there are far more data points, so you don't run into the problem of one outlier player inflating the standard deviation of the distribution and working against their own chances of producing a high z-score.
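The "exclude the team in question" idea can be sketched like this. The win totals are invented for a 10-team league and are not the actual 1966-67 standings, and the use of the population standard deviation is an assumption, since the comment does not say which was used.

```python
from statistics import mean, pstdev

def z_with_self(values, i):
    """Ordinary z-score: team i is part of its own reference distribution."""
    return (values[i] - mean(values)) / pstdev(values)

def z_leave_one_out(values, i):
    """Z-score of team i measured against the other teams only."""
    others = values[:i] + values[i + 1:]
    return (values[i] - mean(others)) / pstdev(others)

# Hypothetical 10-team season (81 games each, 405 wins in total).
wins = [68, 60, 44, 39, 36, 36, 33, 30, 30, 29]

print("including the team:", round(z_with_self(wins, 0), 2))
print("excluding the team:", round(z_leave_one_out(wins, 0), 2))
```

Leaving the team out always raises the z-score of the top team, because its own total no longer drags the mean up or inflates the standard deviation. The effect is strongest in small leagues, which is the comment's point. It does not undo the fact that the other teams' losses were partly caused by the dominant team.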
This is very flawed unless you intended to compare each team's performance relative to its own field. To me it sounded like you wanted to compare who would win in a match between the two basketball teams if they were to play against one another.
Integrating from minus infinity to x1 makes sense for the mathematical function f(x) but not for the distribution of people's heights.
The 1996/97 bulls was not the Bulls' highest winning season. They won 69 games that season, but 72 games the year prior.
Brady's question around 13:10 is quite prescient, because for smaller sample sizes of games, average point differential is a better predictor of a team's future win record than their past win/loss record. I'm not sure where the crossing point is (i.e. is it more or less than 82 games).
Doesn't this analysis require that each season, individually, is at least approximately normal? There are (well, at least since shortly before the Bulls' season in question) about 30 teams in the NBA, which is right on the cusp of the commonly used threshold to apply a normal approximation, but looking at the histograms around 6:00, these look only very roughly normal.
A few other factors that one would want to take into consideration:
-teams play more games against other teams in their conference (East or West) than in the other conference, and the 2 conferences are often of unequal strength. I believe both teams were in the stronger conference in their own time, but I don't have a good measure of by how much. I'm actually not sure how this affects their z-score: In some sense, it might be less "surprising" that a particularly good team would emerge from a group with a higher mean, but their performance is also further above the *league* average than their W/L implies.
-many teams will deliberately rest late in the season, especially good teams who have secured a playoff spot. But a team that is close to setting a record might continue to play as hard as possible, and in fact I believe the Warriors seemed to do this, and it might have contributed to their comparatively poor performance in the playoffs.
You might want to look into the Elo rating system. It's been in use for over 60 years in rating chess players, and probably will do a better job of comparing across different time periods.
Let me understand this... On one side you have 29 players with an Elo ranking under 1000 and one with a 2500 Elo, playing a double round robin league. On another side you have 29 players with an Elo ranking between 2400 and 2490 and one with a 2500 Elo, in a similar league.
In both cases the Z factor of the 2500 Elo player will be the same?
Won't there be much more distance between the 29 players group and the winner in the first case than in the second?
PS. If my memory is not wrong, for each 400 Elo points of difference, you have about a 90% chance to win the game. So, 800 points is 99.01%. 100 points more is about 64%, 50 points is 57%.
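For anyone who wants to check those figures, here is a minimal sketch of the standard logistic Elo formula. Strictly speaking it gives the expected score (a draw counts as half a point), not the pure win probability. The function name `elo_expected_score` is just for illustration.

```python
def elo_expected_score(rating_diff):
    """Expected score (win = 1, draw = 0.5) for the higher-rated player
    under the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

for diff in (50, 100, 400, 800):
    print(diff, f"{elo_expected_score(diff):.2%}")
```

This comes out at roughly 57%, 64%, 91% and 99% for gaps of 50, 100, 400 and 800 points.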
I'd like a tour of the office.
I think the real champions are our dads when we were in primary school
nope! lots of dads are not even remotely thought of in a positive light by their primary school aged children. There are even lots of examples in literature.
If you were talking about the actual best team, you would have to team up the 90's team with the 2020's teams, and in that case I think the old teams would lose every match.
If you look at a more interesting and much more popular sport, like juggling for instance, almost no-one could even do 7-club juggling in the 90's, but now "everyone" does it. I believe this massive improvement is probably occurring in lesser sports like football and basketball as well.
Is Steve Kerr playing or coaching then?
Now I am interested in how strong Verstappen was in terms of the Z-factor.
A lot happened in '96, so this all depends on whether Jordan has met and trained with the Cartoon All-Stars and saved the planet from the Monstars already. You'd also have to ban his world-destroying Chaos Dunk, for obvious reasons.
Normally I have no clue wtf they’re talking about on numberphile. Today however, is intro to statistics stuff. A class I’ve taken 3 times. 😅
I assume that if you simply ranked using win percentage you'd get the same ranking as using the z factor?
Soooo - my question.
Is there a way of more directly comparing teams from different years by trending the z-factors of ALL the teams, to create a lineage from a team in era A to a team in era B?
Isn't it possible that ALL the other teams in 2001 were really bad, but the same amount of bad as each other? The distribution being tighter doesn't have to mean that the competition was tougher. In fact, I don't see any relation between those two things.
6:00 For someone who’s not into basketball this doesn’t make sense. In the season 96/97 Chicago Bulls won a certain percentage of games, so I have no idea what that graph represents. They either won between 70 and 80% of their games, or they didn’t, so to me there should be just a probability of 1 in one of these bins.
Also, how come the probabilities don’t add up to 1. The mathematician guy just said they must add up to 1.
About Brady's last question. This is not the actual answer, but let me talk to you about 2004 Pumas (UNAM) in Liga MX (although I think it wasn't called that back then). In those times the Liga used to have a very weird format, in which 18 teams were split into 3 random groups, but everyone would play everyone (even if they were in a different group). Then, the best 2 of each group and the best 3rd places would go on to the playoffs. This made no sense, but it was the format. As a result, Pumas got second place in their group despite being actually 9th overall. They went on to win the playoffs and win the championship, with a Z-factor of -0.1 (yes, that's a "minus" sign in front).
EDIT: Ok... I only counted the regular season and not the playoffs. If we add the playoffs the Z-factor goes up to 1.63.
But it's a bit unfair as it gives some teams more games than others. If we take the playoffs into account and measure by (points gained)/(number of games) we get Z=1.07.
However, there cannot be a normal distribution of values whose distribution is fundamentally asymmetric. For example, the height of people. If it had a normal distribution, it would mean that, although very rare, there are people with negative height.
The central limit theorem states that what is normally distributed is the *mean* value of people's heights as measured by different researchers or taken from various reference books. Not the heights of different people in any set, even one as big as the UK.
Goes to show just how impressive it was that Cleveland beat GS that year!
Baseball is the ultimate statistical nerd sport. Sheer number of data points is endless.
Chicago bulls with mj, any time, always. No math needed.
I would love to see an F1 comparison. Senna, Schumacher, Prost, Hamilton...
It's also important to note that what you're really saying is that Golden State might beat Chicago if they played a statistically significant number of games. You really can never tell how one particular game is going to go.
So I have often wondered about "adding dimensions" to this process. Like, if you wanted to examine two different but connected measures (say total wins AND points differential), could you just slap the second measure down as an orthogonal axis and glean anything useful from that? And if so, can you just kind of Linear Algebra + MV Calculus your way to more robust conclusions?
It naively feels like it should be an option, but I can't even begin to conceive of the methodology you'd use to ... do it.
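One established way to treat two measures as orthogonal axes is the Mahalanobis distance, which behaves like a z-score in several dimensions and accounts for the correlation between the measures. The sketch below is only an illustration with made-up (wins, point differential) pairs, not anything from the video; `mahalanobis_2d` is a hypothetical helper name.

```python
import math

def mahalanobis_2d(point, data):
    """Distance of `point` from the centre of 2-D `data`, in SD-like units
    that account for the correlation between the two measures."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Population covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in data) / n
    syy = sum((y - my) ** 2 for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data) / n
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = v^T S^-1 v, with the 2x2 inverse written out by hand.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Made-up (wins, average point differential) pairs for a small league.
league = [(60, 8.0), (50, 3.5), (45, 1.0), (41, 0.5),
          (38, -1.5), (30, -4.0), (23, -7.5)]
print(round(mahalanobis_2d(league[0], league), 2))
```

Note the distance is unsigned: it says how unusual a team is, not whether it is unusually good or unusually bad, so you would still need to look at the direction separately.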
My half baked thought is how does fitting a standard distribution to the Scottish premier league work - afaik the SPL normally has 2 super teams (Rangers & Celtic) that win almost every game they play and then 'the rest' who are scrapping amongst themselves for the lower positions.
Please please do a part 2 focusing on various z-scored stats for Messi v. Ronaldo
Can you please share datasets I can use to test this out?
Is there anything invalid about finding the Z-scores for multiple data points and then averaging them to create sort of a "z-index"? I know often times doing things like that has unintended effects lol, but on the surface it seems like it might provide an even more "informed" comparison between the teams.
Depends on if you care about the result being correct. If you just want a result then it's fine
Could you do the NFL by breaking down the games played into each individual play? For example, could you measure the offense's performance in terms of yards gained on the field per play against the yards prevented by the defense on each play? An obvious pitfall here is that it doesn't account for long drives from one end of the field toward the goal that do *not* result in a score, but I think it would help to more accurately determine the output of the team's performance in terms of their athletic ability?
I don't like the 'wins' evaluation method because average wins will always be the same from one season to another, as long as the number of teams in the league and games played per season stays constant. I think it's basically just another way to report the % of games won. I think it would be more useful to compare points scored to the average for that season, and points allowed to the average for that season. That would give relative strength of offense and defense. It still wouldn't allow a direct comparison, but it would show how much stronger each team was relative to the average team in these areas for their particular year.
You could compare years this way as well, using league average offense and defense, but if teams stopped playing defense then offenses would look outrageous, and if they stopped shooting then defenses would look unbeatable. I don't know of a way to directly or indirectly compare. Perhaps if you were to take years when there were rule changes and work out how much those impacted the game with the same general pool of players the following few years, then factor that in, maybe grow the difference slowly over time as the game optimizes for the new rule set until a plateau or new rules, then you could standardize the playing field in some way. I don't know. It's complicated.
This method lets you compare competitiveness, but it cannot answer the question as stated, because it cannot estimate technical level: whenever you try to measure team or personal performance, you end up comparing their data against the numbers of their opponents in their own era. To each generation its own. A young Pele at his peak couldn't shine the same way in today's football. However you choose your dataset, the chosen numbers are anchored to their generational context. After such a time gap, it's most likely that competitors have become stronger at their craft. Adversity grows, competitiveness oscillates (it's also circumstantial), and technical level grows. So statistics are very important and powerful over medium periods.
Bradman practiced with a golf ball and a stump as the bat
Genius
Liverpool's Z score the year they won the league. Nice.
I would like to see an average of the Z factors between several stats, from important like wins to seemingly unimportant like hand size and team average height.
Also Pele’s 3.8 is amazing but what about Michael Phelps and his Gold medals?
Sounds to me that to have an ultimate Z-factor for sports you should combine a number of Z-factors in all kinds of statistics which are really key for a sport and work that out in 1 Z-factor.
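A minimal sketch of what combining several z-factors into one could look like, assuming equal weights and a plain average. The stats and numbers are made up, and `composite_z` is a hypothetical name.

```python
import statistics

def z_scores(values):
    """Population z-score of every value in the list."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def composite_z(stat_table):
    """Average each team's z-scores across several stats.
    `stat_table` maps a stat name to a list of values, one per team."""
    per_stat = [z_scores(vals) for vals in stat_table.values()]
    return [statistics.fmean(team_zs) for team_zs in zip(*per_stat)]

# Made-up four-team league.
stats = {"wins": [60, 45, 41, 30], "point_diff": [7.5, 1.0, -0.5, -8.0]}
print([round(z, 2) for z in composite_z(stats)])
```

Two caveats: the average of several z-scores is not itself a z-score (its spread is below 1 unless the stats are perfectly correlated, so it would need re-standardising), and strongly correlated stats like wins and point differential end up counting much the same information twice.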
Acknowledging, as you said yourself, a limitation of this analysis is "based on level of skill at that time," let's just assume that the average level of competition in different times is equal, then maybe you could do the following statistical calculation: Given z1 = 2.06 (Bulls) and z2 = 2.54 (Warriors), what is the probability that Warriors would win over Bulls? Or for another way of looking at it, if the two teams would play 100 games, what's the most likely split of wins between them?
Where does Joe DiMaggio's 56-game hitting streak fit in the firmament of sports outliers?
Years ago (early 2000's) I saw an article about this for bat and ball sports batting stats where I believe they normalized the z-factors of cricket and baseball to see who was the best batter/batsman. Bradman and DiMaggio were the answers. The problem was that the exact methods and measures used mattered, so it was possible to rig the answer if they wished. But both Bradman's average score and DiMaggio's hitting streak were mentioned as the only 2 stats unlikely to be matched or bettered within the next 150 years using their methods.
How about Ted Williams' .406 batting average in 1941 (143/154 games played)? Closest to that is .394 by Tony Gwynn in 1994 but he only played 110/162 games. Next, .390 by George Brett in 1980 but he similarly only played 117/162 games. A similar batting average to games played as Williams is Rod Carew in 1977 155/161* with a .388. *should be 162 games but MIN and other teams didn’t play all 162 for one reason or another 🤷‍♂️
My statistics brain is a bit rusty... Does it make a difference that data points for "Win %" are very much not independent? In that, for one value to go up, some one else's value *must* go down? Compared to say, "Total Points" which could be higher or lower for a given team without necessarily affecting another team's outcome.
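The dependence is easy to see in a toy simulation. This sketch assumes a round robin with coin-flip results and no draws, which is not how any real league works, but it shows that the league-wide average win percentage is pinned at 0.5 whatever happens. `simulate_round_robin` is a hypothetical helper.

```python
import itertools
import random

def simulate_round_robin(n_teams, rounds, rng):
    """Play every pairing `rounds` times with coin-flip outcomes;
    return each team's fraction of games won."""
    wins = [0] * n_teams
    for _ in range(rounds):
        for a, b in itertools.combinations(range(n_teams), 2):
            wins[rng.choice((a, b))] += 1
    games_per_team = rounds * (n_teams - 1)
    return [w / games_per_team for w in wins]

rng = random.Random(0)
win_pct = simulate_round_robin(n_teams=30, rounds=2, rng=rng)
# Every game adds exactly one win somewhere, so the mean is 0.5
# (up to floating-point rounding) regardless of the individual results.
print(sum(win_pct) / len(win_pct))
```

So the values are slightly negatively correlated with each other rather than independent. How much that matters for the z-score presumably shrinks as the league gets larger.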
This doesn't count rule changes. I think it depends on what rule set the two teams have to play with.
That skyscraper solve at the end was dire.
"The numbers outside the grid indicate how many skyscrapers are visible in that direction" has nothing to do with height - I was wondering if that makes the outside numbers completely meaningless!?
@@maxid87 It's a standard puzzle type but the way it's presented here is really confusing. The number indicates how many absolute increases are in that row or column. Or how many times a new height record is broken from that direction. So when the 6 is outside the grid that should indicate that there are 6 increases in that direction which is forced to be 1 2 3 4 5 6. But in this solve they've put a 6 in the first position so it can't increase any further.
Yeah, putting a 6 inside the grid next to the 6 outside makes it clear they're not actually attempting to solve the puzzle. Also, they have two 2s in that left column, which breaks the "Latin Square" rule..
Beautiful as always!
I wonder how Max Verstappen's '23 season looks on the Normal Standard.
Pele mentioned, Brazil mentioned. Let's go!
This is incorrect. The 96/97 Bulls with Jordan are the greatest team to ever play the game
This doesn't actually measure the actual skill of each player/team, only the comparative skill compared to their respective eras' players/teams. So it's completely meaningless in comparing two players from different eras on who would win.
If no one has corrected Tom yet re. his knowledge of the total regular season games in an NFL season, a couple/few years ago the League decided to add an additional game to the regular season; so, now it's 17 rather than 16 games -- not that that additional data point for the sample size is all that much more statistically significant, of course
I always wondered how the positive values like human height can be normally distributed.
it's not, it's really close because the mean is high and the standard deviation is small, so the non-negative effect is negligible, it's just another instance of central limit theorem
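To put a number on "negligible", here is a small sketch that computes how much probability a normal model places below zero. The mean of 170 cm and standard deviation of 10 cm are assumed round figures for illustration, not measured data.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, written with erfc so that
    very small tail probabilities do not round to zero."""
    return 0.5 * math.erfc((mean - x) / (sd * math.sqrt(2)))

# Assumed illustrative values in centimetres, not measured data.
mean_height, sd_height = 170.0, 10.0
print(normal_cdf(0.0, mean_height, sd_height))
```

Zero is 17 standard deviations below the mean under these assumptions, so the model's probability of a negative height is astronomically small, far too small to affect any practical calculation.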
The only thing I can think of that has happened in baseball that might have a Z=3.5+ is two grand slam home runs, in the same inning, by the same player off the same pitcher. Fernando Tatis Sr in 1999. Even that isn’t 70 goals in 20 games level.
Maths Schmaths - Lovely t-shirt!
As someone who grew up in Chicago-land in the 90's, it's obviously the Bulls.
You can pretty much make any team the best by picking the right data to base the Z-factor on. As my statistics professor said: "Statistics are like a bikini, it shows a lot but hides the most important parts"
In MLB, there are 162 games every year. It's said each team wins 54, loses 54. It's what you do with the other 54 that matters.
Could you check the Z factor of all teams in the league for each season and then measure who had better win records against the rest of their respective leagues' Z factors? That could answer the question better.
Who's better Ninja Turtles or Street Sharks. Figure that out math boy
He ran out of paper so had to scribble on his arms.
@@flickingbollocks5542 On his arms and probably elsewhere too, but I'm not sure I want to know...
Tom, can we set sigma not just as a real value, but as the divisor function (perfect numbers?)
Is Ole Einar Bjørndalen the King of Biathlon? I wonder if there is a calculation of biathletes' z-factors per year and, since they are individual athletes, whether the evolution of the z-factor for all athletes can allow us to do two things: normalize an athlete's performance over time and hence make an actual inter-generational comparison (without the mess of players changing teams). Trouble is, I would have to work with ranks in races, not wins...