I shared this with several of my former professors and they loved it. By far the one who had the funniest reaction was my sociology professor, who said "ensure the false positive rate is less than 0.1%." I asked him why, at which point he smiled this giddy evil smile and said "Then have the cheaters get labeled as mangoes."
@@coruscaregames The blobs eat mangoes. My sociology prof was implying that, upon seeing a cheater eaten alive by other blobs, the cheating would stop because now the risk of being caught cheating was no longer just being called a cheater but a horrible gruesome end.
I love how you take (sometimes boring) statistics concepts and explain them in a way that makes it simple and enjoyable for non-maths guys. Good job, my dude.
@@NorroTaku Well, so do I since I'm a statistician myself. The thing about learning stats, tho, is that while it has some nice applications and ideas to explore, the more dense and theory-heavy subjects are really difficult to grasp and, as such, require a lot of effort to have a ok knowledge, let alone really understand. These subjects, that are brought up in Primer's videos are really cool, but basic. When you go down the rabbit hole about learning a lot of probability, inference, stochastic processes and such, you can get kinda annoyed by the elitist nature of the maths behind stats.
@@aurumble Basically, yes. Stats, as any maths field, is difficult by itself, but it becomes even harder to understand when 99/100 books/papers are written in the most complex and incomprehensible ways possible. To actually understand something about maths (by reading books or papers) you, most often than not, need to already have an understatement of the subject or to be a genius. Unfortunately, maths is written not to be understood but to "wow" the reader about how unbelievably difficult the subject you studied is.
Love this video. I teach a graduate-level class on experimental design, but many of the students have forgotten a lot of the basic statistics by the time they take it. I'm bookmarking this as a potential reminder/class aid. Thanks!
Let's say: if a cheater feels that someone is onto them, they will secretly switch to a fair coin. Then, we're suddenly very deep in the domain of games theory. A simple crossover that shows how quickly mathematical models become complex.
we could model this by a random chance that the cheater get this feeling, since it isnt based on anything that is contained in the game. Then it is again easy to model. But the mathematical model mentioned in the video IS already so complex that it defies the expectation of normal people who havent calculated this through. This is called the fallacy of small probability, or just the fallacy of small numbers.
@@activediamond7894 if this is a process that has a finite variance, it could be modelled just the same as the probability. If it doesnt, then theres no reason to even look for cheaters, since u cant get arbitrage from a process that is purely chaotic.
Given that the parameters are the game are: "The blob who flips heads feels happy," it might be possible that there are actually secret (deviant) tail-lovers amongst the blobs, who routinely cheat to "lose" more than expected.
2:08 right off the bat, there’s an extremely important perspective you’ve shared, which is to prioritize the experience of non cheaters over catching the cheaters. After all, why catch the cheaters if the game is no fun? Some video game developers could take a page out of your book!
I agree with the sentiment, but I think that even this 'low chance of accusing fair players of cheating" is not acceptable. The idea of having a false positive in a system that would most likely have very severe punishments is awful, especially because its very hard to prove that you were banned falsely. Sadly this is what a lot of systems do nowadays, and we've seen some of the consequences of people getting falsely banned and having it overturned, but most of them were big youtubers/streamers etc. Any other person without a big following is fucked if it happens to them.
I think it would have been interesting to show an example where there are very few cheaters. The result of this would be that even though we might accuse less than 5% of the non-cheaters of being cheaters and more than 80% of the cheaters of being cheaters, we’d accuse more non-cheaters than cheaters, which can make it feel like the test is bad.
Also i think philosophically if this were a real game in real world if you get 1 fair player as cheater you defeated completely the purpose of the "sport" labeling the real goat of the game as cheater for a communist greater goal.
"If we accuse one person of cheating who isn't it destroys the sport" Thanks, but I'll take the communist greater goal here. The goal should be to give the legitimate players the best chance possible, not to make sure that no individual legitimate players are ever wrongly accused. Calling thinking about the big picture objective rather than being obsessively concerned with liberal individual rights "communism" is an excellent endorsement of communism because there is literally never certainty and you cannot fully eliminate type 1 (or type 2) errors while also usefully detecting cheaters.
In other words this is a nonsense "philosophy" that would, for example, imply that courts would have no choice but to always assume innocence unless the probably of innocence was *exactly mathematically zero.* So i.e. If we have HD video evidence of a murder, your logic says that we should assume that the evidence is random chance and entirely due to compression artifacts and cosmic rays aligning the pixels just so.
Your philosophy forces a standard of evidence which is best described as "beyond unreasonable doubt." There is a reason absolute certainty is not required to convict in any court anywhere, because there is ALWAYS uncertainty.
@@petersmythe6462 Thanks for making that all up. I was about to go to bed and I needed a fairy tail to relax me communist. Communists always have the best fairy tails things like communism works.
@@RedTail1-1this is a really… odd(?) comment Like yeah everyone wants to be entertained, but this video is almost entirely focused on education. It’s not a textbook or a lecture but it’s still an educational video before it is entertainment, and sure you could argue that it’s much easier to digest because of the format it’s in, but that’s just how education kinda has to be if it wants to be effective.
@@RedTail1-1 he's a middle school at his story so i assume he's just starting to get to know mathematics as what it is in that part of the story You can't enjoy something you don't understand so this blobs is definitely a good headstart into understanding math Once math is understood you can now leave out the blobs and entertain yourself with other stuff that can be seen in math itself which interests you like visual geometry or statistics After all we can't enjoy math if math are just a bunch of set of rules without logic and without a way to spark imagination, curiosity, creativity, and many more which sparks emotion which is technically entertainment (But hey it's not like they're going to teach math if it's just a bunch of rules that doesn't do anything nor show logical reasons of its existence but that besides my point) So in summary we look for different ways of entertainment that is present in different fields of study or maybe just entertainment as a form of headstart to understanding and use that understanding to create your own choice of entertainment that can be found in that field of study Even expert mathematicians sometimes create new math problems that doesn't even exist in real life not because they need it but because it entertain their craving to look for more insight in math which is basically to satisfy their curiosity but not to change something for the better
This is what I call excellent science communication. You clearly explained the topic and used understandable terms. You got yourself a subscriber, and I look forward to more videos.
I was thinking of using stages where getting a certain percentage of heads puts you in a group that tests with significantly more flips. That brings the total number of flips needed down massively over having a whole group flip 50+ times.
I was thinking the same thing and saw this comment, it's like "random checks" they do at airports for suspicious individuals. Granted, the definition of our suspicious varies from the rather racist definition of "suspicious" that people who work security at an airport do.
In this case, maybe it would be good to ask them to flip their coin until they get 5 (or whatever number) tails. Then we check how many flips they needed. Blobs that got unlucky will quickly prove themselves innocent. We'll need to find the right numbers (how many tails do we ask for and which number of flips is our suspicion threshold) in a similar manner as described in the video. We could even simulate which test requires fewer flips overall.
@@dragoncurveenthusiast If an unfair coin has a heads probability of 58%, it's still relatively likely for them to get an unlucky streak. I thought in the video in the last sample they would just make one of the groups small enough that there were too many false positives or not enough true positives, rather than make the cheating odds a lot tougher to catch.
I took AP Stats a couple years ago, and although I did well enough to get college credit, this is really well made and does a great job of explaining statistics. I feel like I learned a couple of things, and got to see things like p-values as less arbitrary numbers. Really cool, you’re a great teacher, thanks for the lesson!
Great video, but I want to point out one thing: the true proportions are also very important. Let's say you have 1000 blobs, and only 10 cheaters. With a 5%/80% test you would get 50 false positive, and 8 true positive. So only ~14% of the accused cheaters are actually cheaters. In the real world big factors for the limits of such tests are the consequences: What happens to a patient with a false negative test? Will he die soon (eg. cancer), or will he just stay two days longer sick at home? And what happens with false positive test results? Will the person go to jail (eg. drug test), or undergo harsh medical treatment (eg. chemotherapy), or will he just take a day off? In praxis these issues are often solved by confirming the results with additional, different tests (in case of the blobs the manipulated coins might have a different weight...), but should still be considered, because these tests again have the same problems, and every additional test costs money and time.
We could consider this a test of the police catching a criminal. If the odds of being caught are high it will act as a deterent so fewer cheaters will actually risk being sent to jail by cheating. But if too many innocent people are sent to jail, the people will rebel. New name of game "Politics"
@@David-sp7gc Thats why he can modify the rates which causes the p-value to be much lower and your confidence rating in catching a cheater. The 80/5 is a generic prediction for the model, but by no means worthy of actual implementation. When they test for cancer, not 1/20 tests are a false positive. Your numbers would be extremely improved if you went to catch 99% of cheaters and accuse less than 1% of fair players. Then, 99% of your predictions will be correct. If you have 1000 fair players, a 14 fair players, the odds of you picking a fair player is extremely low.
I really thought the same as you did! And I first thought he had modified the number of cheater in his 60% exemple (but demonstrating assumption aren't fact was also educative)
That's why the assumed effect size and our acceptable p-values and a-values are really important and require careful thought. For refereeing a game, we'd want to assume a fairly large effect size and we would want a pretty small p-value. The reasoning here is that the test itself serves as a deterrent - you only need to catch the most egregious cheaters and the rest will take warning. You'd also want to assume that in a game setting, players are tested fairly frequently. They're probably 'tested' during every game they play. So when referreeing a game, it might me more effective to measure the false positive rate per tournament or per year (like the way we measure failure rates for contraceptives). Because what we're *really* interested in is "how likely is it for a fair player to be accused of cheating during the entire tournament." In addition, you can risk not catching all the cheaters using only one test, because they'll be tested again and again every match. Even at a 20% true positive rate per match, that amounts to a huge risk throughout an entire tournament.
I leverage significance testing pretty heavily in my day-to-day job, and really loved the way you broke it down here! Particularly appreciate framing statistical power (true positive rate) on the same level as significance (false negative rate), when in most cases the conversation/focus really only extends to significance/p-vals. Going to spend a few minutes scrolling through to purge some of these spambots clogging up the comments, then on to the next vid :)
Isnt there a way to have better results with less flips? Make it conditional? Like after 10 flips observe only the ones with 6 or more heads and then accuse those who had another 7/10. From rest of the group accuse those who had 9/10 in second round.
I was thinking some similar. As the number of flips increases, periodically remove the blobs that are statistically not cheating. The total number of flips (all flips for all blobs combined) would be lower but, since all blobs execute their flips concurrently, the time spent flipping would remain the same. In both methods, there would a group of blobs that flip the same maximum number of times. Using the method in the video, all 1000 blobs would flip that maximum number of times. Doing it in phases would only have a portion of the blobs flipping the maximum number of times. But the maximum number of flips would be the same in both methods and would take the same amount of time to accomplish. I’d be curious to know if the “phases” method would be more accurate, though.
@@robmoffett2700 This is typically how sussing out bad actors works in the real world! Usually to perform verification you need to do a pretty simple step, following that only in the case of likely fraud certain users are passed to a more in depth verification path while others are passed on directly to the service.
I know this is a few years late, but this was a phenomenal demonstration of intro and intermediate concepts of statistics. Absolutely plan on having my students watch this to generally understand the broad statistical test that medical articles/studies use for statistical significance and analysis. Thank you!
I think a good way to expand the simulation a bit would be to have different 'cheater' blobs that cheat at different percentages. Blatant cheaters at 100%, chronic cheaters at 75%, all the way down to situational cheaters at 25%. It's a big reason catching cheaters in games is so difficult; not everyone is always cheating in every scenario, and when they're choosing to cheat is often times super inconsistent. A really skilled player who also cheats is one of the most difficult things to detect, as well!
yeah for they have the skill, knowhow, and reputation to know: what cheating looks like/what legit play looks like, the skill to execute legit play, and the community goodwill to not immediately arise suspicion.
@Test to expand to that setting, it's a bit more complex because you'll need to explain Bayesian statistics - you'll have explain what prior distribution you're assuming for the cheaters is and derive/simulate a posterior distribution for the coin tosses and how exactly to calculate the probability that the player is a cheater. It's an entire topic of its own and really not that straightforward of an extension from the frequentist case, but a framework exists to do that.
You could expand it even farther by adding "scammer", "blatant/subtle", "sore loser" etc blobs to the mix. Make it so that instead of getting a fixed amount of mood improvement or souring per flip, they can make bets. Have a variety of cheaters that use coins that vary from 51% heads all the way up to ones that just blatantly get head 100% of the time. Have blobs that start cheating if they get a streak of bad luck. Have blobs that switch out their coin for a weighted one whenever they bet big. Etc. Then you could also factor in the net loss/gain of their mood over time into the cheating algorithm.
I love the application to real life sciences. Probabilities fill everything we do and it’s a good thing to have a process for determining results from a limited pool of testing. Great video it gives a very intuitive way of understanding probability, and also demonstrates how unintuitive probabilities can be.
okay so I have 3 points to make : 1) I didn't forget that the 75% chance of getting head for a trick coin was an assumption so I feel very proud about that. 2) I was studying binomial and bayesian distributions earlier this year and I would have LOVED to have this video available then because it's 10 times more clear than any lesson I had 3) That's a great video as expected from you, keep up the good work because I fricking love watching them
I wish you'd talked about what happens when most of the population is fair and there's only a handful of cheaters. In that case you end up accusing way more innocent people than guilty ones even if you set the false positive % very low. This is a very poor outcome. Unless you follow up accusations with another, more powerful test examining the coin itself to know for sure if it's fair. The followup test is more disruptive and probably more expensive so the purpose of the first test is to minimize the number of followup tests needed. In such a situation the first priority of the first test should probably be minimizing false negatives. It's inconvenient for the false positives to have to get another test to prove their innocence, so you still want to try and minimize false positives too, but you want to catch as many of the cheaters as possible.
No? We make em drop a coin a coin and record the distribution of it until only few persistent deviations from the rate of normalization remain. Those are the cheaters. 2020 US election stats was exactly that.
i thought the specific problem with that situation was that the cheater effect size was NOT what the test assumed it was, not how many cheaters there were relative to the population size
@@brasilballs yeah that was the issue in the vid, imay is mentioning a variant issue assuming a low percentage of cheaters... essentially it's the issue of reducing false-positives by as much as possible which is a pretty major real-world issue ^^ (false convictions for crimes or unneccessary medical treatments, unfair game bans etc etc)
19:12 F for the blob who lost their coin (bottom, center a bit to the left). He will never prove to anyone that his coin just fell though the floor and his life is ruined by realization of living in a simulation.
If this was a real game, where the players expect fifty percent win rate,this would majorly break the game. Getting accused of cheating is probably allot worse than losing, so every fair player would have to factor in that risk before they played. Depending on how bad it is to be accused, players might even decide to use coins that come up tails more often than heads.
It would be solved by some kind of manual checking of the coin itself on the request. Say, manual checks are too tedious to be done at large scale but efficient for anyone who is too lucky but still not a cheater
I'm in love with idea how you approach the concept of frequentist hypothesis testing, which confuses so many. (from a statistics teacher). I will definitely recommend your video whenever we come to this part of the curriculum.
The title made me laugh, because it reminded me of that truly ridiculous (but very entertaining) speedrunning drama around that minecraft guy, dream. Catching cheaters with math is a useful and very fun skill!
There were people involved with catching speedrunning cheaters that said that he probably was not really cheating (on purpose) because his actions didn't make sense if he was. Regardless, the game was modified (on purpose or by accident). The outcome was way way waaaaaaaayyyyyy too unlikely.
He also misbehaved pretty badly, trying to cast doubts on the statistical methods... With how low the odds were, even if the methods had high uncertainty the game would still be definitely modified
@@didack1419 That's true, you can't really prove evil intent with math - like whether the game was modified on purpose. You can only show _that_ it has been modified.
@@baguettegott3409 The reason it's decently likely that the game was modified by accident was because he already modified the game to make other content, so it's reasonable that he could have left the modification without realising. It's weird that he didn't say that from the beginning, because it was a reasonable excuse. I don't know. I guess he thought the mods were against him or something so they wrote a paper blaming him of cheating. Or he really cheated and thought people would not buy it. Or he was really sure he had the game unmodified, which is naive.
very cool work as always Looking forward to the Bayesian version. Would be cool to be able to estimate exactly what the distribution of cheaters is. Like, what if you don't just get a single type of cheater, but a distribution of them? Perhaps distributed by some Beta-distribution, with most cheaters going close to 50% and very few going for much more audacious near-100%
Bayesians are like vegans of science: they are a minority, but boy are they a vocal one, and seems like all of them gathered in this comment section! Yours sincerely, a frequentist.
It's kind of fun to watch videos like this after hearing all sorts of similar (yet less entertaining) lectures from a friend that's soon going to have a PhD in Biostats. Can't wait to see what you can pull up for the Bayesian method!
I would have love to have you as a teacher. You're just awesome at explaining, this is because you explain the basis and the "easy" stuff and I know how hard it can be for people who knows difficult stuff to forget how the basis can ben hard for novice. You don't, and I admire you for doing so
This is so well put. Can't explain how much I appreciate your ability to slow down and put this into a way anyone can understand. Compare this to creators like Veritasium (as much as I love him) and they're fairly content to speed through complexities, leaving laymen in utter confusion.
One thing you can do to lower the effective number of coin flips is to have dynamic decision making. You can establish a threshold to sort cheaters and non cheaters at every level while allowing uncertain players to continue testing. For example, if someone has the first 6 flips come up tails, you can stop testing because they are most likely not a cheater.
What about a cheater who doesn't always use a cheat coin? Cheats need to balance profit against chance of being caught. Runs of tails would throw algorithms. Just asking
@@tim40gabby25 what you are talking about is effectively changing the cheating effect probability. We aren't testing for cheaters, we are testing for cheating, so as long as they aren't swapping coins in the middle of a test, it doesn't change your approach. If you wanted to monitor cheating in a game in the real world, you would need to adapt your detection strategies to account for subterfuge like the inconsistent use of cheats.
Yeah, you'd need a more sophisticated algorithm to catch players who only cheat occasionally, and eliminating folks due to runs of 'bad luck' often doesn't work in real life scenarios. For a real world example, take Fall Guys, which has had its fair share of cheaters over it's lifespan. (Also using this example because I just lost a final round of Hex-a-Gone in which someone was using a flying cheat so it's fresh in my mind and I'm malding.) It's well known that many of the cheaters in that game often only use cheats if they make it to the final round, meaning there are less people left to report them, and they suffer enough pre-final losses to not look suspicious on paper. Of course, in this scenario where the cheating is obvious to their opponents a reporting system *can* work, but only if it's well implemented and has some protections against false reporting. Like a player is only marked 'sus' once they receive enough reports matching the same parameters, or something like that.
unrelated but i had an eraser shaving on my computer screen in the place of the blue blob's mouth during like around 2:17 so it looked like it was smiling the entire time and for some reason it made me happy :D
I learned more in 10 minutes of this video than in an entire stats class in university. You have a real talent! Thank you for all the amazing content! I have never missed a video and you have taught me so much
What i like most about this is that the various steps of hypothesis testing are not presented as "here's the rules, learn them and apply them" - but rather as a set of designed/engineered choices aimed a a specific goal, and showing what goes wrong when you leave out one of the engineered steps.
I struggle with statistics class and though this video does not directly tackle my questions, I really appreciate the time you take to explain them clearly, especially regarding the false positive ratio. I hope to see more blobs soon
Looking forward to the Bayesian vid - that would be my approach. A nice way to contrast the Bayesian vs frequentist perspective is that.. frequentists condition on the “true” parameter and integrate probabilities over the random data. Bayesians, on the other hand, condition on the data and integrate probabilities over the (assumed) distribution of the unknown parameter. To me, this makes the bayesian approach more attractive, because in reality, we only know the data - not the parameter.
I love this so much! As a game dev I'm desperate for anti-cheat. However gameplay time can prove dedicated people to get really lucky and having 1% false positives means that there's a much higher chance that actual dedicated players get lucky and falsely accused. That's why I believe it shouldn't be as brief as possible, but be constantly looked at. Like a history of average luck over all flips and over the last few flips, if the player has been playing for some time then the test becomes less brief assuming that they'll be more consistent with what they're doing. They won't randomly enable a cheat so theit test is more forgivìng and catches far less cheating scenarios, but if the history sees a sudden spike of luck then that's a sign to be investigated.
in addition to lowering the punishments for false positives, (imagine that content creators will not only be more likely to test positive(regardless of true or false), but also that they will have more runs at the test as they do it for a living) you may also find that you want to make a hidden strike system, and use that in there. some cheaters will cheat for a few runs and then play without them, which will mean that it starts to make it harder to catch them, so instead a trick you could use to catch them is to put strikes on a hidden list, if i get 10 heads by cheating a 100% chance, ill probably not go farther, and will return to normal flipping, to throw you off my scent, so you give me a strike for that, and even though its statistically insignificant for someone with thousands of flips, its still an oddity to be noted. as you perform larger oddities it can add larger sized strikes, until a certain number of points you just say "hey bud its pretty clear that youre cheating and then trying to hide it by playing badly afterwards, come in to our labs and do a game and do as well as you do in those good runs if you wanna not be banned" as an example; 10 wins in a row can be assumed to be about a 1/1024 chance regardless of the game, as long as the chance to win is 50%. if you win 10 times in a row, thats 1 point on your strike file, if you win, i dont know, 15 in a row(1/32768 chance), lets say its a 30 point strike (on top of the fact that 15 flips in a row as heads already will count as 6 points on their record, as it is 10 in a row, and 11, and 12, etc), and lets say that if you get 100 points on your record, then we give you the talk. make each incidence stick around for X flips where X is past the 1% false positive rate, and you will see that even so, the majority of cheaters that try to get around simple probability by playing with cheats very little of the time, they still will proc this method of detection.
You can create a model based on their ELO. Then use a standard distribution relative to the elo to preselect outliers. Then you can run a series of tests on these outliers.
@@nou5440 while that would work, the punishment would also have to be annoying Something like changing the player's name to "I cheated" for an hour, and preventing them from changing the name back for the hour could work
@@lavantant2731 the only thing about that is that it's usually not a flat 50% chance to win or lose. Even with some kind of matchmaking. There are a lot of other factors at play.
I first watched this video before taking a stats class. I now watch it halfway through a stats class and understand it so much more.(first learning about power this week!)
I was just about to suggest a Bayesian approach, and then you said you are going to cover that in the next video. That is good. I'm pretty sure that the best test method won't involve the same number of flips each time. It would look at the trend after each flip and accuse once the probabilities become strongly enough tilted in one direction or another. How far tilted depends on the cost of a wrong guess and the benefit of a right guess.
At the end of the video i was actually shocked that 22 minutes had aldeady passed. You made such a good job at explaining this topic in a very entertaining and instructional way. I learned a lot and you gained a new subscriber!
21:00 Yes, I actually did, which is funny because my brain was also struggling to process the quadrant chart for some reason, but I immediately thought of the assumed cheater coin probability when you said this run would be a real world application of the test. It's the most important variable when setting up the test. IMHO, labling each result quadrant independenly with Ff, Fc, Cf, Cc would be easier to process. This emphasizes the truth with a capital letter and makes it easy to check what result group your looking at
I've literraly had my propability and statistics exam yesterday. It was sooo boring to learn. However i love your videos and how they spark my motivation so i can implement these topics to programming. Thanks :D
My head hurts so much after few flips with those False Positive and False Negative . My mind went blank. Have watch again, and again and again. . . To infinity
I think the best solution is to compromise by having a higher false positive test, and then do another, more rigorous test with the smaller group of accused blobs. That was you can end up with a lower number of false positives overall, but still be expedient
Medical tests will sometimes do this - and because of the different composition of the resulting group of accused blobs, even just running the test a second time can yield pretty good results.
If this was a real world example, I'd expect the cheaters' coin's bias rate to vary from person to person. I wonder how much more complex that'd make the math. Thanks for the video.
14:36 Once a player gets 16 Heads or 8 Tails, there is no need to continue to the full 23 flips. 16 Heads = Cheater, 8 Tails = Fair. * I'm commenting as I watch = you might address this in a second...
I did actually remember the 75% assumption. I was thinking the final test would be to give a more realistic distribution of how cheating works, i.e. not the same for every blob. Of course it still averages out, but the ones cheating 'less' would be harder to catch.
8:00 I think the term you're looking for to describe the false positive rate is specificity - in statistics the specificity of a test is the probability that a test negative case represents a true negative, its inverse would be the false positive rate as defined here (this example test has a specificity of 97.6%)
I think the main outcome is that it is very difficult to catch someone cheating using statistics. You should use statistics to raise suspicions, but you shouldn't call them a cheater without further evidence. 5% or even 1% is just way too high of a false positive rate for 1000 players.
Ok this is amazing...the assumption of cheaters having 75% chance was the most fair. 50% is the norm and 100% would be too stupid from the cheaters, so taking the middle ground (75%) would be smart. Yet that assumption still lead 80% of the cheaters to get away
And yet, one COULD argue that the test still accomplishes its goal! Kinda. As a cheater, you want to get as many wins from your coin as you can. BUT, if you're caught cheating, you can't play anymore, and thus cannot win. Now, assuming universal testing, and because the test is based directly off of how many "wins" you get, the surest way to play and win, is to make sure you don't win TOO MUCH, to avoid being caught. This introduces a sort of "soft-cap" to how much cheating anyone can actually do. So, although more cheaters get through, they still have a notably deflated win-rate, meaning that cheating is less rewarding, even though they aren't caught. Now, is it still a favorable ratio of "Cheated Wins" being prevented? IE, cheaters caught and thus annulling all of their dishonest wins, versus the cheaters who got in anyway but the number of dishonest wins they had to give up to do it? I dunno, sounds like a LOT of math to me! I'll pass on that, but still a fun factor to consider.
I just realized. We dont have to technically make assumptions for rates of cheater coins to make an accurate test that fits our criteria. You could just make a test that tests for fair players since we know the rate at which a non cheating blob will get tails. Namely 50% of the time. So instead you could just design a test that labels a blob as fair more then 80% of the time, and only labels cheaters as fair less then 5% of the time. I mean, it would require more data then testing for cheaters, but you could get something fairly accurate.
This is in fact the most commonly used form of frequentist tests, where you only check against the “null hypothesis” and nothing else is explicitly used in the maths. Implicitly, there is still of course some true effect size, and your test has some power to detect it, depending on your chosen sample size. If you care, you can calibrate that sample size to detect effects of at least some given size. In practice, especially before people rang alarm bells about the “replication crisis”, you sometimes saw little care for sample sizes, beyond convenience or custom, as if the researchers were unaware how this limited their results’ possibility and accuracy.
Well it just depends on your starting point. You could start with the assumption that you're going to take 40 coin flips, and then calculate the power of your test at p=0.05. Or you can start with an effect size and a p-value and determine the sample size you need. Or, you can take a sample size and an effect size and calculate a p-value.
Unfortunately you still have to make the same assumptions about the effect size (probability of cheater coins getting heads) even with this test, as you still need to provide an assumed effect size to determine whether you test is labeling cheaters as fair 5% of the time. In fact, as the shape of the frequency distribution of cheater coins getting heads is dependent upon the effect size, and the probability of a given test not catching the cheater is dependent upon this distribution, it is not possible to provide the probability of not catching a cheater without first using some value for effect size to generate the frequencies of cheater coins getting heads.
i love how everyone says that the coin flipping game is simple and dull, completely forgetting that we were all obsessed with bottle flipping six years ago.
I love your careful explanation of the false positive rate and how you couched it in your experience of mixing up the terminology. A great tool for getting us to pay attention and notice the nuance without making us feel stupid!! Absolutely superb teaching!
Not me thinking this was gonna be an in depth animated analysis of how to find out if ur girl is cheating using math. This video was great but someone should make that video as well
I've just watcher 3Blue1Brown's video about "probability of probabilities" and I think this concept would fit very well for this problem. We can plot a probability density function of "coin's head probability". And then a player will flip the coin until we are satisfied with the graph and comfortable to label them.
Cheating in the cases of online games that don't rely on chance are far easier to catch than those that rely on chance. But because it doesn't boil down to how data skews, there's a lot more ways to hide it
Had a lot of fun with this one! I just took AP Stats this year so this was a nice refresher to what I learned there. I might be wrong on this (and if someone who knows more about stats than a wee high schooler can correct me in that case), would it be better to design a Chi-Squared GOF test for this? Since we don’t know the probability of heads on a biased coin, I feel like it might be slightly more effective even if a tiny bit more complex. Each blob would need to flip at least five times (although if this is being done on a calculator, it should be an even number since decimals make them grumpy for some reason), and we make the expected values half that number. That was my first thought when Primer asked us to make a test, and I know they can be super easy if done on a calculator. I could also be really stupid and that’s what the next video will cover 😂
If i recall correctly, chisq GoF tests are primarily used to determine whether or not our experimental results differ from our theoretical results (and whether it is reasonable then to suggest that an alternative hypothesis is true) In this problem, we already know that there are cheaters abound in our experiment, so our results should, no matter what, be skewed towards having a higher number of heads than tails.
@@derpsalot6853 That’s a good point! GOF might not work then, especially since I think it requires a list with more than one value, which obviously doesn’t work in this case. Chi-Squared could possibly still be used if a single contribution is calculated for a single blob and compare that to known critical values, but that needs a chart with all those written, and I doubt the tester in question would feel like lugging that around!
I love the idea that these blobs are dull enough to be entertained by flipping coins, but clever enough to cheat at it.
why would coin flips be dull?
@@NoNameAtAll2 It depends on how many repetions and individual tolerance.
Every time they get heads they eat 1 mango
You should meet the Australian blobs.
@@pablopereyra7126 Damn mangoers, inserting themselves into every conversation. I bet you like those simple, modern houses too, simpleton.
Imagine being the 3% of cheaters that had a weighted coin and still got only tails
skill issue
luck issue
NelsonMuntzhaha.jpg
get a better gaming chair
@@ozan____ *get (a) better (gaming c)hair
Am I dreaming? A new Primer video? Can’t be.
Ikr
Literally I thought I looked at the notification wrong
in 1 year another one releases
I had to double take when I saw primer in my notifications
It’s true. All of it. The blobs. The doves. It’s all true.
I shared this with several of my former professors and they loved it.
By far the one who had the funniest reaction was my sociology professor, who said "ensure the false positive rate is less than 0.1%." I asked him why, at which point he smiled this giddy evil smile and said "Then have the cheaters get labeled as mangoes."
I don't get it
Cannibalism!
in other videos on this channel the blobs eat mangoes 🥭@@coruscaregames
@@coruscaregames The blobs eat mangoes. My sociology prof was implying that, upon seeing a cheater eaten alive by other blobs, the cheating would stop because now the risk of being caught cheating was no longer just being called a cheater but a horrible gruesome end.
Jesus Christ...
the blobs have really evolved - from effectively wild creatures to gambling addicts
That’s the holy path for everyone who plays gacha games
xqc viewers be like
They are evolving, but backwards
Gambling with their feelings
Can't wait for the monkey/grape prostitution analysis.
I love how you take (sometimes boring) statistics concepts and explain them in a way that makes it simple and enjoyable for non-maths guys.
Good job, my dude.
This would have really helped in ap stats last year lol.
I find statistics pretty interesting on its own
p hacking and all that is pretty fascinating
@@NorroTaku Well, so do I since I'm a statistician myself. The thing about learning stats, tho, is that while it has some nice applications and ideas to explore, the more dense and theory-heavy subjects are really difficult to grasp and, as such, require a lot of effort to have a ok knowledge, let alone really understand.
These subjects, that are brought up in Primer's videos are really cool, but basic. When you go down the rabbit hole about learning a lot of probability, inference, stochastic processes and such, you can get kinda annoyed by the elitist nature of the maths behind stats.
@@matheusborges7944 by elitist do you mean that people keep stats intentionally convoluted in a gatekeeping way or something else?
@@aurumble Basically, yes. Stats, as any maths field, is difficult by itself, but it becomes even harder to understand when 99/100 books/papers are written in the most complex and incomprehensible ways possible.
To actually understand something about maths (by reading books or papers) you, most often than not, need to already have an understatement of the subject or to be a genius.
Unfortunately, maths is written not to be understood but to "wow" the reader about how unbelievably difficult the subject you studied is.
Love this video. I teach a graduate-level class on experimental design, but many of the students have forgotten a lot of the basic statistics by the time they take it. I'm bookmarking this as a potential reminder/class aid. Thanks!
I feel the pain. I'm starting a master in a few months so I'm trying to relearn all the forgotten info from the last 4 years.
Almost like statistics and probability is dumpster information.
I love this blob society. They only have one cop, and that cop’s only job is to find people who cheat at coin-flipping.
Let's say: if a cheater feels that someone is onto them, they will secretly switch to a fair coin.
Then, we're suddenly very deep in the domain of games theory.
A simple crossover that shows how quickly mathematical models become complex.
we could model this by a random chance that the cheater get this feeling, since it isnt based on anything that is contained in the game. Then it is again easy to model. But the mathematical model mentioned in the video IS already so complex that it defies the expectation of normal people who havent calculated this through. This is called the fallacy of small probability, or just the fallacy of small numbers.
what if the cheater has an army of fans that defend him or he admits to cheating
@@lilyliao9521 HAHAHA
May I propose an even simpler case that would complicated this; The weighted coins may change the odds of a heads anywhere within the range [0, 1].
@@activediamond7894 if this is a process that has a finite variance, it could be modelled just the same as the probability. If it doesnt, then theres no reason to even look for cheaters, since u cant get arbitrage from a process that is purely chaotic.
My heart aches for those wrongly accused blobs
Chuckled
@LoveSweetDreams ok then let's hear your better idea.
@@zakir2815 what? They just asked if thats how the system works. They said no critique of it.
That one blob that got 23 heads out of 23 flips but wasn't a cheater
@LoveThemDames yeah basically
Given that the parameters are the game are: "The blob who flips heads feels happy," it might be possible that there are actually secret (deviant) tail-lovers amongst the blobs, who routinely cheat to "lose" more than expected.
blobdsm
Blobdsm
Blobdsm
BlobDSM: flip me harder
@@anastasiaklyuch2746 i regret so much
My Chance of getting Head is 0
Hope you get the best head one day my guy 🙏🏿
same bro
Damn 😭🙏
They (not) like us fr
💀
2:08 right off the bat, there’s an extremely important perspective you’ve shared, which is to prioritize the experience of non cheaters over catching the cheaters. After all, why catch the cheaters if the game is no fun?
Some video game developers could take a page out of your book!
*cough*nintendo*cough*
@@rayquaza1053 **cough** any root-level anti-cheat **cough**
and governments..
I agree with the sentiment, but I think that even this 'low chance of accusing fair players of cheating" is not acceptable. The idea of having a false positive in a system that would most likely have very severe punishments is awful, especially because its very hard to prove that you were banned falsely. Sadly this is what a lot of systems do nowadays, and we've seen some of the consequences of people getting falsely banned and having it overturned, but most of them were big youtubers/streamers etc. Any other person without a big following is fucked if it happens to them.
@@johntravoltage959 “the idea of a false positive is awful”
Literally impossible to avoid
Primer: "How To Catch A Cheater With Math
"
Dream: sweating profusely.
Technoblade: sleeping like a baby.
@@bungeethehuman6292 💀you just did not-
@@bungeethehuman6292 Not funny
@@bungeethehuman6292 bruh moment
@BungeeTheHuman 🤡
I think it would have been interesting to show an example where there are very few cheaters. The result of this would be that even though we might accuse less than 5% of the non-cheaters of being cheaters and more than 80% of the cheaters of being cheaters, we’d accuse more non-cheaters than cheaters, which can make it feel like the test is bad.
Also i think philosophically if this were a real game in real world if you get 1 fair player as cheater you defeated completely the purpose of the "sport" labeling the real goat of the game as cheater for a communist greater goal.
"If we accuse one person of cheating who isn't it destroys the sport"
Thanks, but I'll take the communist greater goal here. The goal should be to give the legitimate players the best chance possible, not to make sure that no individual legitimate players are ever wrongly accused.
Calling thinking about the big picture objective rather than being obsessively concerned with liberal individual rights "communism" is an excellent endorsement of communism because there is literally never certainty and you cannot fully eliminate type 1 (or type 2) errors while also usefully detecting cheaters.
In other words this is a nonsense "philosophy" that would, for example, imply that courts would have no choice but to always assume innocence unless the probably of innocence was *exactly mathematically zero.*
So i.e. If we have HD video evidence of a murder, your logic says that we should assume that the evidence is random chance and entirely due to compression artifacts and cosmic rays aligning the pixels just so.
Your philosophy forces a standard of evidence which is best described as "beyond unreasonable doubt." There is a reason absolute certainty is not required to convict in any court anywhere, because there is ALWAYS uncertainty.
@@petersmythe6462 Thanks for making that all up. I was about to go to bed and I needed a fairy tail to relax me communist.
Communists always have the best fairy tails things like communism works.
Middle school me wishes there was a youtube channel or a teacher that made math as entertaining or as understandable as Primer does
Im in middle school and this help so much in my classes 🤣
Basically you'd rather be entertained than learn
@@RedTail1-1this is a really… odd(?) comment
Like yeah everyone wants to be entertained, but this video is almost entirely focused on education.
It’s not a textbook or a lecture but it’s still an educational video before it is entertainment, and sure you could argue that it’s much easier to digest because of the format it’s in, but that’s just how education kinda has to be if it wants to be effective.
@@RedTail1-1
he's a middle school at his story so i assume he's just starting to get to know mathematics as what it is in that part of the story
You can't enjoy something you don't understand so this blobs is definitely a good headstart into understanding math
Once math is understood you can now leave out the blobs and entertain yourself with other stuff that can be seen in math itself which interests you like visual geometry or statistics
After all we can't enjoy math if math are just a bunch of set of rules without logic and without a way to spark imagination, curiosity, creativity, and many more which sparks emotion which is technically entertainment
(But hey it's not like they're going to teach math if it's just a bunch of rules that doesn't do anything nor show logical reasons of its existence but that besides my point)
So in summary we look for different ways of entertainment that is present in different fields of study or maybe just entertainment as a form of headstart to understanding and use that understanding to create your own choice of entertainment that can be found in that field of study
Even expert mathematicians sometimes create new math problems that doesn't even exist in real life not because they need it but because it entertain their craving to look for more insight in math which is basically to satisfy their curiosity but not to change something for the better
This is what I call excellent science communication. You clearly explained the topic and used understandable terms. You got yourself a subscriber, and I look forward to more videos.
Are you going to watch old videos?
I was thinking of using stages where getting a certain percentage of heads puts you in a group that tests with significantly more flips. That brings the total number of flips needed down massively over having a whole group flip 50+ times.
I was thinking the same thing and saw this comment, it's like "random checks" they do at airports for suspicious individuals. Granted, the definition of our suspicious varies from the rather racist definition of "suspicious" that people who work security at an airport do.
In this case, maybe it would be good to ask them to flip their coin until they get 5 (or whatever number) tails. Then we check how many flips they needed. Blobs that got unlucky will quickly prove themselves innocent. We'll need to find the right numbers (how many tails do we ask for and which number of flips is our suspicion threshold) in a similar manner as described in the video.
We could even simulate which test requires fewer flips overall.
@@dragoncurveenthusiast If an unfair coin has a heads probability of 58%, it's still relatively likely for them to get an unlucky streak. I thought in the video in the last sample they would just make one of the groups small enough that there were too many false positives or not enough true positives, rather than make the cheating odds a lot tougher to catch.
Same
So maybe you wait for the bayesian testing video, which will teach you exactly that....
I took AP Stats a couple years ago, and although I did well enough to get college credit, this is really well made and does a great job of explaining statistics. I feel like I learned a couple of things, and got to see things like p-values as less arbitrary numbers. Really cool, you’re a great teacher, thanks for the lesson!
Came down hear to say something similar, glad somebody did it for me. Took the actual class in college is really only difference lol
I know right. Where was this guy a year ago when I needed him
I dont even watch these to learn a subject better, i just watch and listen to this for fun! Its actually pretty entertaining!
19:24 some poor blobs coin fell off the platform, look at bottom of screen
Lol
noo
*cues the climax scene from Jumanji*
Blob lost his "lucky" coin
Lamo
Great video, but I want to point out one thing:
the true proportions are also very important. Let's say you have 1000 blobs, and only 10 cheaters. With a 5%/80% test you would get 50 false positive, and 8 true positive. So only ~14% of the accused cheaters are actually cheaters.
In the real world big factors for the limits of such tests are the consequences: What happens to a patient with a false negative test? Will he die soon (eg. cancer), or will he just stay two days longer sick at home?
And what happens with false positive test results? Will the person go to jail (eg. drug test), or undergo harsh medical treatment (eg. chemotherapy), or will he just take a day off?
In praxis these issues are often solved by confirming the results with additional, different tests (in case of the blobs the manipulated coins might have a different weight...), but should still be considered, because these tests again have the same problems, and every additional test costs money and time.
We could consider this a test of the police catching a criminal. If the odds of being caught are high it will act as a deterent so fewer cheaters will actually risk being sent to jail by cheating. But if too many innocent people are sent to jail, the people will rebel. New name of game "Politics"
They usually do a second different test. One is
@@David-sp7gc Thats why he can modify the rates which causes the p-value to be much lower and your confidence rating in catching a cheater.
The 80/5 is a generic prediction for the model, but by no means worthy of actual implementation. When they test for cancer, not 1/20 tests are a false positive.
Your numbers would be extremely improved if you went to catch 99% of cheaters and accuse less than 1% of fair players.
Then, 99% of your predictions will be correct. If you have 1000 fair players, a 14 fair players, the odds of you picking a fair player is extremely low.
I really thought the same as you did! And I first thought he had modified the number of cheater in his 60% exemple (but demonstrating assumption aren't fact was also educative)
That's why the assumed effect size and our acceptable p-values and a-values are really important and require careful thought.
For refereeing a game, we'd want to assume a fairly large effect size and we would want a pretty small p-value. The reasoning here is that the test itself serves as a deterrent - you only need to catch the most egregious cheaters and the rest will take warning.
You'd also want to assume that in a game setting, players are tested fairly frequently. They're probably 'tested' during every game they play. So when referreeing a game, it might me more effective to measure the false positive rate per tournament or per year (like the way we measure failure rates for contraceptives). Because what we're *really* interested in is "how likely is it for a fair player to be accused of cheating during the entire tournament."
In addition, you can risk not catching all the cheaters using only one test, because they'll be tested again and again every match. Even at a 20% true positive rate per match, that amounts to a huge risk throughout an entire tournament.
I leverage significance testing pretty heavily in my day-to-day job, and really loved the way you broke it down here! Particularly appreciate framing statistical power (true positive rate) on the same level as significance (false negative rate), when in most cases the conversation/focus really only extends to significance/p-vals. Going to spend a few minutes scrolling through to purge some of these spambots clogging up the comments, then on to the next vid :)
Isnt there a way to have better results with less flips? Make it conditional?
Like after 10 flips observe only the ones with 6 or more heads and then accuse those who had another 7/10.
From rest of the group accuse those who had 9/10 in second round.
I was thinking some similar. As the number of flips increases, periodically remove the blobs that are statistically not cheating. The total number of flips (all flips for all blobs combined) would be lower but, since all blobs execute their flips concurrently, the time spent flipping would remain the same. In both methods, there would a group of blobs that flip the same maximum number of times. Using the method in the video, all 1000 blobs would flip that maximum number of times. Doing it in phases would only have a portion of the blobs flipping the maximum number of times. But the maximum number of flips would be the same in both methods and would take the same amount of time to accomplish. I’d be curious to know if the “phases” method would be more accurate, though.
@@robmoffett2700 This is typically how sussing out bad actors works in the real world! Usually to perform verification you need to do a pretty simple step, following that only in the case of likely fraud certain users are passed to a more in depth verification path while others are passed on directly to the service.
@@erikk17so you just cheat after 10 flips
I know this is a few years late, but this was a phenomenal demonstration of intro and intermediate concepts of statistics. Absolutely plan on having my students watch this to generally understand the broad statistical test that medical articles/studies use for statistical significance and analysis. Thank you!
I think a good way to expand the simulation a bit would be to have different 'cheater' blobs that cheat at different percentages. Blatant cheaters at 100%, chronic cheaters at 75%, all the way down to situational cheaters at 25%. It's a big reason catching cheaters in games is so difficult; not everyone is always cheating in every scenario, and when they're choosing to cheat is often times super inconsistent. A really skilled player who also cheats is one of the most difficult things to detect, as well!
yeah for they have the skill, knowhow, and reputation to know: what cheating looks like/what legit play looks like, the skill to execute legit play, and the community goodwill to not immediately arise suspicion.
@Test to expand to that setting, it's a bit more complex because you'll need to explain Bayesian statistics - you'll have explain what prior distribution you're assuming for the cheaters is and derive/simulate a posterior distribution for the coin tosses and how exactly to calculate the probability that the player is a cheater.
It's an entire topic of its own and really not that straightforward of an extension from the frequentist case, but a framework exists to do that.
You could expand it even farther by adding "scammer", "blatant/subtle", "sore loser" etc blobs to the mix. Make it so that instead of getting a fixed amount of mood improvement or souring per flip, they can make bets. Have a variety of cheaters that use coins that vary from 51% heads all the way up to ones that just blatantly get head 100% of the time. Have blobs that start cheating if they get a streak of bad luck. Have blobs that switch out their coin for a weighted one whenever they bet big. Etc. Then you could also factor in the net loss/gain of their mood over time into the cheating algorithm.
And of course, blobs who are stupidly generous and choose to lose on purpose. Which actually does happen sometimes whilst cheating in video games
I love the application to real life sciences. Probabilities fill everything we do and it’s a good thing to have a process for determining results from a limited pool of testing.
Great video it gives a very intuitive way of understanding probability, and also demonstrates how unintuitive probabilities can be.
okay so I have 3 points to make :
1) I didn't forget that the 75% chance of getting head for a trick coin was an assumption so I feel very proud about that.
2) I was studying binomial and bayesian distributions earlier this year and I would have LOVED to have this video available then because it's 10 times more clear than any lesson I had
3) That's a great video as expected from you, keep up the good work because I fricking love watching them
I literally have my statistics exam tomorrow and this is a beautiful video to sum up one of the chapters wish he posted this earlier
real life nerd emoji
1) You should feel proud! Good for you!
I was super proud of remembering that too :)
2) Same, only it was a couple years ago for me
3) Agreed
Cute visuals+ well explained math+ the idea that it's ok to mix up words even after years+ a well written script= amazing video
I wish you'd talked about what happens when most of the population is fair and there's only a handful of cheaters. In that case you end up accusing way more innocent people than guilty ones even if you set the false positive % very low. This is a very poor outcome. Unless you follow up accusations with another, more powerful test examining the coin itself to know for sure if it's fair. The followup test is more disruptive and probably more expensive so the purpose of the first test is to minimize the number of followup tests needed. In such a situation the first priority of the first test should probably be minimizing false negatives. It's inconvenient for the false positives to have to get another test to prove their innocence, so you still want to try and minimize false positives too, but you want to catch as many of the cheaters as possible.
No? We make em drop a coin a coin and record the distribution of it until only few persistent deviations from the rate of normalization remain. Those are the cheaters. 2020 US election stats was exactly that.
This would be a good follow up video to this one.
i thought the specific problem with that situation was that the cheater effect size was NOT what the test assumed it was, not how many cheaters there were relative to the population size
@@brasilballs yeah that was the issue in the vid, imay is mentioning a variant issue assuming a low percentage of cheaters...
essentially it's the issue of reducing false-positives by as much as possible which is a pretty major real-world issue ^^ (false convictions for crimes or unneccessary medical treatments, unfair game bans etc etc)
That was exactly what I was thinking.. lol
Blob Maths and Blob Biology are cool, but I want to see Blob Society more.
they ask what is blob but they never ask how is blob
@@iamnotahuman2172 We saw happy blobs, sad blobs and dead blobs. I guess that is how blobs can be.
Me too
I want to see more blobs interacting
"We live in a society."
- Blob Joker
just commenting to say I have seen what you have done there with the cheetah at minute 12:50. Great video, enjoying it so far!
19:12 F for the blob who lost their coin (bottom, center a bit to the left). He will never prove to anyone that his coin just fell though the floor and his life is ruined by realization of living in a simulation.
LOL i was going to comment that
If you’re wondering, it’s the red beanie blob at position (12,4).
Also at 19:23, 2 below the previous blob 😭
If this was a real game, where the players expect fifty percent win rate,this would majorly break the game. Getting accused of cheating is probably allot worse than losing, so every fair player would have to factor in that risk before they played. Depending on how bad it is to be accused, players might even decide to use coins that come up tails more often than heads.
The unit on Game Theory and maximal strategies comes later (but is also fascinating) :)
It would be solved by some kind of manual checking of the coin itself on the request. Say, manual checks are too tedious to be done at large scale but efficient for anyone who is too lucky but still not a cheater
And thus cheating aswell...
19:24 Shoutout to this red dude at X=12, Y=2 throwing his coin off the platform.
yea lol thanks for giving out the coordinate of that blob
@@lephantriduc There is also another red blob at 19:11, X=12, Y=4 that yeets it off
I've played this every year for a year, brilliant game.
maybe he is saying that he only played once.
You meant to say "every year for a century"
every year for a day
it's been up for less than 3 months, so... what they said could be technically correct, but not "every day for a year" or "every year for X years" 🤓
With that sample size, the game MUST be brilliant. I'm off to go play it now. Just once.
I'm in love with idea how you approach the concept of frequentist hypothesis testing, which confuses so many. (from a statistics teacher). I will definitely recommend your video whenever we come to this part of the curriculum.
The title made me laugh, because it reminded me of that truly ridiculous (but very entertaining) speedrunning drama around that minecraft guy, dream. Catching cheaters with math is a useful and very fun skill!
There were people involved with catching speedrunning cheaters that said that he probably was not really cheating (on purpose) because his actions didn't make sense if he was.
Regardless, the game was modified (on purpose or by accident). The outcome was way way waaaaaaaayyyyyy too unlikely.
He also misbehaved pretty badly, trying to cast doubts on the statistical methods...
With how low the odds were, even if the methods had high uncertainty the game would still be definitely modified
@@didack1419 That's true, you can't really prove evil intent with math - like whether the game was modified on purpose. You can only show _that_ it has been modified.
@@baguettegott3409 The reason it's decently likely that the game was modified by accident was because he already modified the game to make other content, so it's reasonable that he could have left the modification without realising.
It's weird that he didn't say that from the beginning, because it was a reasonable excuse. I don't know. I guess he thought the mods were against him or something so they wrote a paper blaming him of cheating.
Or he really cheated and thought people would not buy it.
Or he was really sure he had the game unmodified, which is naive.
@@didack1419 he admitted later on that he did purposely cheat.
19:04 Keep an eye slightly below the stages, you'll see a few coins being flipped too far, falling off the stage entirely and into the void below.
lmao
This guy takes the most simple of topics, makes them easy to understand, and makes learning fun!
I thought I was gonna watch blobs with infidelity.
very cool work as always
Looking forward to the Bayesian version. Would be cool to be able to estimate exactly what the distribution of cheaters is. Like, what if you don't just get a single type of cheater, but a distribution of them? Perhaps distributed by some Beta-distribution, with most cheaters going close to 50% and very few going for much more audacious near-100%
Like setting your headshot percentage in a cheat for a shooter game
Yes
Thomas Bayes was a god amongst men, probably.
@@maracachucho8701 *slow claps*
Bayesians are like vegans of science: they are a minority, but boy are they a vocal one, and seems like all of them gathered in this comment section! Yours sincerely, a frequentist.
It's kind of fun to watch videos like this after hearing all sorts of similar (yet less entertaining) lectures from a friend that's soon going to have a PhD in Biostats. Can't wait to see what you can pull up for the Bayesian method!
I would have love to have you as a teacher. You're just awesome at explaining, this is because you explain the basis and the "easy" stuff and I know how hard it can be for people who knows difficult stuff to forget how the basis can ben hard for novice. You don't, and I admire you for doing so
I love how your the only TH-camr that makes the zoom to fit look good! Thank you for making my day!
2:37 that is amazing
This is so well put. Can't explain how much I appreciate your ability to slow down and put this into a way anyone can understand.
Compare this to creators like Veritasium (as much as I love him) and they're fairly content to speed through complexities, leaving laymen in utter confusion.
1:02 that blob with 0 heads be like me fr
IS THAT THE ECHDEATH FROM TERRARIA?
We need a blob history or lore drop about like how blob society works, and how the simulation effects their future life.
15:56 The froggy hat this blob has is absolutely adorable!
One thing you can do to lower the effective number of coin flips is to have dynamic decision making. You can establish a threshold to sort cheaters and non cheaters at every level while allowing uncertain players to continue testing. For example, if someone has the first 6 flips come up tails, you can stop testing because they are most likely not a cheater.
What about a cheater who doesn't always use a cheat coin? Cheats need to balance profit against chance of being caught. Runs of tails would throw algorithms. Just asking
@@tim40gabby25 what you are talking about is effectively changing the cheating effect probability. We aren't testing for cheaters, we are testing for cheating, so as long as they aren't swapping coins in the middle of a test, it doesn't change your approach. If you wanted to monitor cheating in a game in the real world, you would need to adapt your detection strategies to account for subterfuge like the inconsistent use of cheats.
Yeah, you'd need a more sophisticated algorithm to catch players who only cheat occasionally, and eliminating folks due to runs of 'bad luck' often doesn't work in real life scenarios.
For a real world example, take Fall Guys, which has had its fair share of cheaters over it's lifespan. (Also using this example because I just lost a final round of Hex-a-Gone in which someone was using a flying cheat so it's fresh in my mind and I'm malding.) It's well known that many of the cheaters in that game often only use cheats if they make it to the final round, meaning there are less people left to report them, and they suffer enough pre-final losses to not look suspicious on paper.
Of course, in this scenario where the cheating is obvious to their opponents a reporting system *can* work, but only if it's well implemented and has some protections against false reporting. Like a player is only marked 'sus' once they receive enough reports matching the same parameters, or something like that.
@@LowPolyPigeon Interesting. Thanks
@@tylerhale8679 Interesting. Thanks.
When you're a mathematician but but yo man being suspicious:
yo
sus
unrelated but i had an eraser shaving on my computer screen in the place of the blue blob's mouth during like around 2:17 so it looked like it was smiling the entire time and for some reason it made me happy :D
Omg I've watched people treat assumptions as facts and screw themselves.
Great video! Love all your content.
I learned more in 10 minutes of this video than in an entire stats class in university. You have a real talent! Thank you for all the amazing content! I have never missed a video and you have taught me so much
What i like most about this is that the various steps of hypothesis testing are not presented as "here's the rules, learn them and apply them" - but rather as a set of designed/engineered choices aimed a a specific goal, and showing what goes wrong when you leave out one of the engineered steps.
I struggle with statistics class and though this video does not directly tackle my questions, I really appreciate the time you take to explain them clearly, especially regarding the false positive ratio. I hope to see more blobs soon
Looking forward to the Bayesian vid - that would be my approach. A nice way to contrast the Bayesian vs frequentist perspective is that.. frequentists condition on the “true” parameter and integrate probabilities over the random data. Bayesians, on the other hand, condition on the data and integrate probabilities over the (assumed) distribution of the unknown parameter. To me, this makes the bayesian approach more attractive, because in reality, we only know the data - not the parameter.
I love this so much! As a game dev I'm desperate for anti-cheat. However gameplay time can prove dedicated people to get really lucky and having 1% false positives means that there's a much higher chance that actual dedicated players get lucky and falsely accused. That's why I believe it shouldn't be as brief as possible, but be constantly looked at. Like a history of average luck over all flips and over the last few flips, if the player has been playing for some time then the test becomes less brief assuming that they'll be more consistent with what they're doing. They won't randomly enable a cheat so theit test is more forgivìng and catches far less cheating scenarios, but if the history sees a sudden spike of luck then that's a sign to be investigated.
if the false positive rate is high then make the punishment mild like a 5 min ban or something
in addition to lowering the punishments for false positives, (imagine that content creators will not only be more likely to test positive(regardless of true or false), but also that they will have more runs at the test as they do it for a living) you may also find that you want to make a hidden strike system, and use that in there. some cheaters will cheat for a few runs and then play without them, which will mean that it starts to make it harder to catch them, so instead a trick you could use to catch them is to put strikes on a hidden list, if i get 10 heads by cheating a 100% chance, ill probably not go farther, and will return to normal flipping, to throw you off my scent, so you give me a strike for that, and even though its statistically insignificant for someone with thousands of flips, its still an oddity to be noted. as you perform larger oddities it can add larger sized strikes, until a certain number of points you just say "hey bud its pretty clear that youre cheating and then trying to hide it by playing badly afterwards, come in to our labs and do a game and do as well as you do in those good runs if you wanna not be banned" as an example; 10 wins in a row can be assumed to be about a 1/1024 chance regardless of the game, as long as the chance to win is 50%. if you win 10 times in a row, thats 1 point on your strike file, if you win, i dont know, 15 in a row(1/32768 chance), lets say its a 30 point strike (on top of the fact that 15 flips in a row as heads already will count as 6 points on their record, as it is 10 in a row, and 11, and 12, etc), and lets say that if you get 100 points on your record, then we give you the talk. make each incidence stick around for X flips where X is past the 1% false positive rate, and you will see that even so, the majority of cheaters that try to get around simple probability by playing with cheats very little of the time, they still will proc this method of detection.
You can create a model based on their ELO. Then use a standard distribution relative to the elo to preselect outliers. Then you can run a series of tests on these outliers.
@@nou5440 while that would work, the punishment would also have to be annoying
Something like changing the player's name to "I cheated" for an hour, and preventing them from changing the name back for the hour could work
@@lavantant2731 the only thing about that is that it's usually not a flat 50% chance to win or lose. Even with some kind of matchmaking. There are a lot of other factors at play.
I first watched this video before taking a stats class. I now watch it halfway through a stats class and understand it so much more.(first learning about power this week!)
Man I had a feeling you were gonna switch up the cheater odds with the mystery test, also great video mate I love these math lessons.
I was just about to suggest a Bayesian approach, and then you said you are going to cover that in the next video. That is good.
I'm pretty sure that the best test method won't involve the same number of flips each time. It would look at the trend after each flip and accuse once the probabilities become strongly enough tilted in one direction or another. How far tilted depends on the cost of a wrong guess and the benefit of a right guess.
At the end of the video i was actually shocked that 22 minutes had aldeady passed. You made such a good job at explaining this topic in a very entertaining and instructional way. I learned a lot and you gained a new subscriber!
21:00 Yes, I actually did, which is funny because my brain was also struggling to process the quadrant chart for some reason, but I immediately thought of the assumed cheater coin probability when you said this run would be a real world application of the test. It's the most important variable when setting up the test.
IMHO, labling each result quadrant independenly with Ff, Fc, Cf, Cc would be easier to process. This emphasizes the truth with a capital letter and makes it easy to check what result group your looking at
guys can we have a f in chat for those 3% cheaters getting 5 tails , probably purposefully did it like that cuz they depressed
It's like walling in a shooter game but still getting shit on. lol
I've literraly had my propability and statistics exam yesterday. It was sooo boring to learn. However i love your videos and how they spark my motivation so i can implement these topics to programming. Thanks :D
hah I finished that course last week! (5 days ago)
at 19:12 you can see a coin fall off the map XD
I love the idea of these simplistic blobs becoming gambling addicts, relegating the detectives to statisticians
Can’t believe you uploaded what is essentially a really interesting intro to statistics lecture
I need a looping gif of the blob flipping coins
same they're so scrung
Honestly loved the little lesson about assumptions. So easy to forget, yet so impactful on our perception.
My head hurts so much after few flips with those False Positive and False Negative . My mind went blank. Have watch again, and again and again. . . To infinity
I think the best solution is to compromise by having a higher false positive test, and then do another, more rigorous test with the smaller group of accused blobs. That was you can end up with a lower number of false positives overall, but still be expedient
Medical tests will sometimes do this - and because of the different composition of the resulting group of accused blobs, even just running the test a second time can yield pretty good results.
Years have passed, but the videos still have the same kind of value in them. That's really impressive
If this was a real world example, I'd expect the cheaters' coin's bias rate to vary from person to person. I wonder how much more complex that'd make the math.
Thanks for the video.
Another variable would be ratio of cheaters. What if it's only 10%? More false positives for one?
If there are alot less cheater (like less than 5%) you would end up catch more fair player than cheater even though the percentage is still the same.
@@koharaisevo3666 A (much) larger sample set is then required, but conceivable.
14:36 Once a player gets 16 Heads or 8 Tails, there is no need to continue to the full 23 flips. 16 Heads = Cheater, 8 Tails = Fair.
* I'm commenting as I watch = you might address this in a second...
17:11 I just love how some coins fall into the void.
Every video, we’re uncovering the culture and customs of these blobs. I guess in this one, we learn they’re gamblers by nature.
Never seen a more clear explaination of hypothesis testing in my whole life. Thank you.
"It's very easy to forget that assumptions is an assumptions and instead treat is as a fact", it's so true
Detective: are you cheating?
Blob: no
*proceeds to flip 23 edges in a row*
Detective: *understandable, have a nice day*
I did actually remember the 75% assumption. I was thinking the final test would be to give a more realistic distribution of how cheating works, i.e. not the same for every blob. Of course it still averages out, but the ones cheating 'less' would be harder to catch.
Love watching the coins fall off of the tables in the simulations ❤
i love how you make math so accesible and understandable to everyone!
I love how the blob society has a entire detective agency for a coin flipping game 😂
8:00 I think the term you're looking for to describe the false positive rate is specificity - in statistics the specificity of a test is the probability that a test negative case represents a true negative, its inverse would be the false positive rate as defined here (this example test has a specificity of 97.6%)
Are you an actual teacher aswell? These video’s are incredibly comprehensive and engaging. I love this!
Probably not, knowing how TH-cam works
I think the main outcome is that it is very difficult to catch someone cheating using statistics. You should use statistics to raise suspicions, but you shouldn't call them a cheater without further evidence. 5% or even 1% is just way too high of a false positive rate for 1000 players.
Ok this is amazing...the assumption of cheaters having 75% chance was the most fair. 50% is the norm and 100% would be too stupid from the cheaters, so taking the middle ground (75%) would be smart. Yet that assumption still lead 80% of the cheaters to get away
And yet, one COULD argue that the test still accomplishes its goal! Kinda.
As a cheater, you want to get as many wins from your coin as you can. BUT, if you're caught cheating, you can't play anymore, and thus cannot win.
Now, assuming universal testing, and because the test is based directly off of how many "wins" you get, the surest way to play and win, is to make sure you don't win TOO MUCH, to avoid being caught. This introduces a sort of "soft-cap" to how much cheating anyone can actually do.
So, although more cheaters get through, they still have a notably deflated win-rate, meaning that cheating is less rewarding, even though they aren't caught.
Now, is it still a favorable ratio of "Cheated Wins" being prevented? IE, cheaters caught and thus annulling all of their dishonest wins, versus the cheaters who got in anyway but the number of dishonest wins they had to give up to do it?
I dunno, sounds like a LOT of math to me! I'll pass on that, but still a fun factor to consider.
I just realized. We dont have to technically make assumptions for rates of cheater coins to make an accurate test that fits our criteria. You could just make a test that tests for fair players since we know the rate at which a non cheating blob will get tails. Namely 50% of the time. So instead you could just design a test that labels a blob as fair more then 80% of the time, and only labels cheaters as fair less then 5% of the time. I mean, it would require more data then testing for cheaters, but you could get something fairly accurate.
This is in fact the most commonly used form of frequentist tests, where you only check against the “null hypothesis” and nothing else is explicitly used in the maths. Implicitly, there is still of course some true effect size, and your test has some power to detect it, depending on your chosen sample size. If you care, you can calibrate that sample size to detect effects of at least some given size. In practice, especially before people rang alarm bells about the “replication crisis”, you sometimes saw little care for sample sizes, beyond convenience or custom, as if the researchers were unaware how this limited their results’ possibility and accuracy.
Well it just depends on your starting point.
You could start with the assumption that you're going to take 40 coin flips, and then calculate the power of your test at p=0.05.
Or you can start with an effect size and a p-value and determine the sample size you need.
Or, you can take a sample size and an effect size and calculate a p-value.
Unfortunately you still have to make the same assumptions about the effect size (probability of cheater coins getting heads) even with this test, as you still need to provide an assumed effect size to determine whether you test is labeling cheaters as fair 5% of the time. In fact, as the shape of the frequency distribution of cheater coins getting heads is dependent upon the effect size, and the probability of a given test not catching the cheater is dependent upon this distribution, it is not possible to provide the probability of not catching a cheater without first using some value for effect size to generate the frequencies of cheater coins getting heads.
Outstanding video! Assumptions, models, technical knowledge, evaluation, and beautiful communication - a mathematical masterpiece!
i love how everyone says that the coin flipping game is simple and dull, completely forgetting that we were all obsessed with bottle flipping six years ago.
Great review of some of the content I learned in statistics class in college! Love it!
12:56 The cheeta graph 😅😂
I love your careful explanation of the false positive rate and how you couched it in your experience of mixing up the terminology. A great tool for getting us to pay attention and notice the nuance without making us feel stupid!! Absolutely superb teaching!
Not me thinking this was gonna be an in depth animated analysis of how to find out if ur girl is cheating using math. This video was great but someone should make that video as well
I've just watcher 3Blue1Brown's video about "probability of probabilities" and I think this concept would fit very well for this problem. We can plot a probability density function of "coin's head probability". And then a player will flip the coin until we are satisfied with the graph and comfortable to label them.
"İ see that he kills you through the wall but i have to check"
30 minutes later
"Ye i think he cheated"
Cheating in the cases of online games that don't rely on chance are far easier to catch than those that rely on chance. But because it doesn't boil down to how data skews, there's a lot more ways to hide it
i have valorant brain so i just thought 'wdym thats normal' 💀
Can we get a behind-the-scenes video walking us through how you animate these videos? They're so incredibly well done.
Probably Blender (for the blobs) and any usual professional video editor.
5:17 the grid is 40x25, and it equals to 1000
Had a lot of fun with this one! I just took AP Stats this year so this was a nice refresher to what I learned there. I might be wrong on this (and if someone who knows more about stats than a wee high schooler can correct me in that case), would it be better to design a Chi-Squared GOF test for this? Since we don’t know the probability of heads on a biased coin, I feel like it might be slightly more effective even if a tiny bit more complex. Each blob would need to flip at least five times (although if this is being done on a calculator, it should be an even number since decimals make them grumpy for some reason), and we make the expected values half that number. That was my first thought when Primer asked us to make a test, and I know they can be super easy if done on a calculator. I could also be really stupid and that’s what the next video will cover 😂
If i recall correctly, chisq GoF tests are primarily used to determine whether or not our experimental results differ from our theoretical results (and whether it is reasonable then to suggest that an alternative hypothesis is true)
In this problem, we already know that there are cheaters abound in our experiment, so our results should, no matter what, be skewed towards having a higher number of heads than tails.
@@derpsalot6853 That’s a good point! GOF might not work then, especially since I think it requires a list with more than one value, which obviously doesn’t work in this case. Chi-Squared could possibly still be used if a single contribution is calculated for a single blob and compare that to known critical values, but that needs a chart with all those written, and I doubt the tester in question would feel like lugging that around!