I shared this with several of my former professors and they loved it. By far the one who had the funniest reaction was my sociology professor, who said "ensure the false positive rate is less than 0.1%." I asked him why, at which point he smiled this giddy evil smile and said "Then have the cheaters get labeled as mangoes."
@@coruscaregames The blobs eat mangoes. My sociology prof was implying that, upon seeing a cheater eaten alive by other blobs, the cheating would stop because now the risk of being caught cheating was no longer just being called a cheater but a horrible gruesome end.
Love this video. I teach a graduate-level class on experimental design, but many of the students have forgotten a lot of the basic statistics by the time they take it. I'm bookmarking this as a potential reminder/class aid. Thanks!
I love how you take (sometimes boring) statistics concepts and explain them in a way that makes it simple and enjoyable for non-maths guys. Good job, my dude.
@@NorroTaku Well, so do I since I'm a statistician myself. The thing about learning stats, tho, is that while it has some nice applications and ideas to explore, the more dense and theory-heavy subjects are really difficult to grasp and, as such, require a lot of effort to have a ok knowledge, let alone really understand. These subjects, that are brought up in Primer's videos are really cool, but basic. When you go down the rabbit hole about learning a lot of probability, inference, stochastic processes and such, you can get kinda annoyed by the elitist nature of the maths behind stats.
@@aurumble Basically, yes. Stats, as any maths field, is difficult by itself, but it becomes even harder to understand when 99/100 books/papers are written in the most complex and incomprehensible ways possible. To actually understand something about maths (by reading books or papers) you, most often than not, need to already have an understatement of the subject or to be a genius. Unfortunately, maths is written not to be understood but to "wow" the reader about how unbelievably difficult the subject you studied is.
@@RedTail1-1this is a really… odd(?) comment Like yeah everyone wants to be entertained, but this video is almost entirely focused on education. It’s not a textbook or a lecture but it’s still an educational video before it is entertainment, and sure you could argue that it’s much easier to digest because of the format it’s in, but that’s just how education kinda has to be if it wants to be effective.
@@RedTail1-1 he's a middle school at his story so i assume he's just starting to get to know mathematics as what it is in that part of the story You can't enjoy something you don't understand so this blobs is definitely a good headstart into understanding math Once math is understood you can now leave out the blobs and entertain yourself with other stuff that can be seen in math itself which interests you like visual geometry or statistics After all we can't enjoy math if math are just a bunch of set of rules without logic and without a way to spark imagination, curiosity, creativity, and many more which sparks emotion which is technically entertainment (But hey it's not like they're going to teach math if it's just a bunch of rules that doesn't do anything nor show logical reasons of its existence but that besides my point) So in summary we look for different ways of entertainment that is present in different fields of study or maybe just entertainment as a form of headstart to understanding and use that understanding to create your own choice of entertainment that can be found in that field of study Even expert mathematicians sometimes create new math problems that doesn't even exist in real life not because they need it but because it entertain their craving to look for more insight in math which is basically to satisfy their curiosity but not to change something for the better
Let's say: if a cheater feels that someone is onto them, they will secretly switch to a fair coin. Then, we're suddenly very deep in the domain of games theory. A simple crossover that shows how quickly mathematical models become complex.
we could model this by a random chance that the cheater get this feeling, since it isnt based on anything that is contained in the game. Then it is again easy to model. But the mathematical model mentioned in the video IS already so complex that it defies the expectation of normal people who havent calculated this through. This is called the fallacy of small probability, or just the fallacy of small numbers.
@@activediamond7894 if this is a process that has a finite variance, it could be modelled just the same as the probability. If it doesnt, then theres no reason to even look for cheaters, since u cant get arbitrage from a process that is purely chaotic.
Given that the parameters are the game are: "The blob who flips heads feels happy," it might be possible that there are actually secret (deviant) tail-lovers amongst the blobs, who routinely cheat to "lose" more than expected.
I leverage significance testing pretty heavily in my day-to-day job, and really loved the way you broke it down here! Particularly appreciate framing statistical power (true positive rate) on the same level as significance (false negative rate), when in most cases the conversation/focus really only extends to significance/p-vals. Going to spend a few minutes scrolling through to purge some of these spambots clogging up the comments, then on to the next vid :)
Isnt there a way to have better results with less flips? Make it conditional? Like after 10 flips observe only the ones with 6 or more heads and then accuse those who had another 7/10. From rest of the group accuse those who had 9/10 in second round.
I was thinking some similar. As the number of flips increases, periodically remove the blobs that are statistically not cheating. The total number of flips (all flips for all blobs combined) would be lower but, since all blobs execute their flips concurrently, the time spent flipping would remain the same. In both methods, there would a group of blobs that flip the same maximum number of times. Using the method in the video, all 1000 blobs would flip that maximum number of times. Doing it in phases would only have a portion of the blobs flipping the maximum number of times. But the maximum number of flips would be the same in both methods and would take the same amount of time to accomplish. I’d be curious to know if the “phases” method would be more accurate, though.
@@robmoffett2700 This is typically how sussing out bad actors works in the real world! Usually to perform verification you need to do a pretty simple step, following that only in the case of likely fraud certain users are passed to a more in depth verification path while others are passed on directly to the service.
I was thinking of using stages where getting a certain percentage of heads puts you in a group that tests with significantly more flips. That brings the total number of flips needed down massively over having a whole group flip 50+ times.
I was thinking the same thing and saw this comment, it's like "random checks" they do at airports for suspicious individuals. Granted, the definition of our suspicious varies from the rather racist definition of "suspicious" that people who work security at an airport do.
In this case, maybe it would be good to ask them to flip their coin until they get 5 (or whatever number) tails. Then we check how many flips they needed. Blobs that got unlucky will quickly prove themselves innocent. We'll need to find the right numbers (how many tails do we ask for and which number of flips is our suspicion threshold) in a similar manner as described in the video. We could even simulate which test requires fewer flips overall.
@@dragoncurveenthusiast If an unfair coin has a heads probability of 58%, it's still relatively likely for them to get an unlucky streak. I thought in the video in the last sample they would just make one of the groups small enough that there were too many false positives or not enough true positives, rather than make the cheating odds a lot tougher to catch.
This is what I call excellent science communication. You clearly explained the topic and used understandable terms. You got yourself a subscriber, and I look forward to more videos.
2:08 right off the bat, there’s an extremely important perspective you’ve shared, which is to prioritize the experience of non cheaters over catching the cheaters. After all, why catch the cheaters if the game is no fun? Some video game developers could take a page out of your book!
I agree with the sentiment, but I think that even this 'low chance of accusing fair players of cheating" is not acceptable. The idea of having a false positive in a system that would most likely have very severe punishments is awful, especially because its very hard to prove that you were banned falsely. Sadly this is what a lot of systems do nowadays, and we've seen some of the consequences of people getting falsely banned and having it overturned, but most of them were big youtubers/streamers etc. Any other person without a big following is fucked if it happens to them.
I think it would have been interesting to show an example where there are very few cheaters. The result of this would be that even though we might accuse less than 5% of the non-cheaters of being cheaters and more than 80% of the cheaters of being cheaters, we’d accuse more non-cheaters than cheaters, which can make it feel like the test is bad.
Also i think philosophically if this were a real game in real world if you get 1 fair player as cheater you defeated completely the purpose of the "sport" labeling the real goat of the game as cheater for a communist greater goal.
"If we accuse one person of cheating who isn't it destroys the sport" Thanks, but I'll take the communist greater goal here. The goal should be to give the legitimate players the best chance possible, not to make sure that no individual legitimate players are ever wrongly accused. Calling thinking about the big picture objective rather than being obsessively concerned with liberal individual rights "communism" is an excellent endorsement of communism because there is literally never certainty and you cannot fully eliminate type 1 (or type 2) errors while also usefully detecting cheaters.
In other words this is a nonsense "philosophy" that would, for example, imply that courts would have no choice but to always assume innocence unless the probably of innocence was *exactly mathematically zero.* So i.e. If we have HD video evidence of a murder, your logic says that we should assume that the evidence is random chance and entirely due to compression artifacts and cosmic rays aligning the pixels just so.
Your philosophy forces a standard of evidence which is best described as "beyond unreasonable doubt." There is a reason absolute certainty is not required to convict in any court anywhere, because there is ALWAYS uncertainty.
@@petersmythe6462 Thanks for making that all up. I was about to go to bed and I needed a fairy tail to relax me communist. Communists always have the best fairy tails things like communism works.
I took AP Stats a couple years ago, and although I did well enough to get college credit, this is really well made and does a great job of explaining statistics. I feel like I learned a couple of things, and got to see things like p-values as less arbitrary numbers. Really cool, you’re a great teacher, thanks for the lesson!
I'm in love with idea how you approach the concept of frequentist hypothesis testing, which confuses so many. (from a statistics teacher). I will definitely recommend your video whenever we come to this part of the curriculum.
I wish you'd talked about what happens when most of the population is fair and there's only a handful of cheaters. In that case you end up accusing way more innocent people than guilty ones even if you set the false positive % very low. This is a very poor outcome. Unless you follow up accusations with another, more powerful test examining the coin itself to know for sure if it's fair. The followup test is more disruptive and probably more expensive so the purpose of the first test is to minimize the number of followup tests needed. In such a situation the first priority of the first test should probably be minimizing false negatives. It's inconvenient for the false positives to have to get another test to prove their innocence, so you still want to try and minimize false positives too, but you want to catch as many of the cheaters as possible.
No? We make em drop a coin a coin and record the distribution of it until only few persistent deviations from the rate of normalization remain. Those are the cheaters. 2020 US election stats was exactly that.
i thought the specific problem with that situation was that the cheater effect size was NOT what the test assumed it was, not how many cheaters there were relative to the population size
@@brasilballs yeah that was the issue in the vid, imay is mentioning a variant issue assuming a low percentage of cheaters... essentially it's the issue of reducing false-positives by as much as possible which is a pretty major real-world issue ^^ (false convictions for crimes or unneccessary medical treatments, unfair game bans etc etc)
Great video, but I want to point out one thing: the true proportions are also very important. Let's say you have 1000 blobs, and only 10 cheaters. With a 5%/80% test you would get 50 false positive, and 8 true positive. So only ~14% of the accused cheaters are actually cheaters. In the real world big factors for the limits of such tests are the consequences: What happens to a patient with a false negative test? Will he die soon (eg. cancer), or will he just stay two days longer sick at home? And what happens with false positive test results? Will the person go to jail (eg. drug test), or undergo harsh medical treatment (eg. chemotherapy), or will he just take a day off? In praxis these issues are often solved by confirming the results with additional, different tests (in case of the blobs the manipulated coins might have a different weight...), but should still be considered, because these tests again have the same problems, and every additional test costs money and time.
We could consider this a test of the police catching a criminal. If the odds of being caught are high it will act as a deterent so fewer cheaters will actually risk being sent to jail by cheating. But if too many innocent people are sent to jail, the people will rebel. New name of game "Politics"
@@David-sp7gc Thats why he can modify the rates which causes the p-value to be much lower and your confidence rating in catching a cheater. The 80/5 is a generic prediction for the model, but by no means worthy of actual implementation. When they test for cancer, not 1/20 tests are a false positive. Your numbers would be extremely improved if you went to catch 99% of cheaters and accuse less than 1% of fair players. Then, 99% of your predictions will be correct. If you have 1000 fair players, a 14 fair players, the odds of you picking a fair player is extremely low.
I really thought the same as you did! And I first thought he had modified the number of cheater in his 60% exemple (but demonstrating assumption aren't fact was also educative)
That's why the assumed effect size and our acceptable p-values and a-values are really important and require careful thought. For refereeing a game, we'd want to assume a fairly large effect size and we would want a pretty small p-value. The reasoning here is that the test itself serves as a deterrent - you only need to catch the most egregious cheaters and the rest will take warning. You'd also want to assume that in a game setting, players are tested fairly frequently. They're probably 'tested' during every game they play. So when referreeing a game, it might me more effective to measure the false positive rate per tournament or per year (like the way we measure failure rates for contraceptives). Because what we're *really* interested in is "how likely is it for a fair player to be accused of cheating during the entire tournament." In addition, you can risk not catching all the cheaters using only one test, because they'll be tested again and again every match. Even at a 20% true positive rate per match, that amounts to a huge risk throughout an entire tournament.
I love the application to real life sciences. Probabilities fill everything we do and it’s a good thing to have a process for determining results from a limited pool of testing. Great video it gives a very intuitive way of understanding probability, and also demonstrates how unintuitive probabilities can be.
I think a good way to expand the simulation a bit would be to have different 'cheater' blobs that cheat at different percentages. Blatant cheaters at 100%, chronic cheaters at 75%, all the way down to situational cheaters at 25%. It's a big reason catching cheaters in games is so difficult; not everyone is always cheating in every scenario, and when they're choosing to cheat is often times super inconsistent. A really skilled player who also cheats is one of the most difficult things to detect, as well!
yeah for they have the skill, knowhow, and reputation to know: what cheating looks like/what legit play looks like, the skill to execute legit play, and the community goodwill to not immediately arise suspicion.
@Test to expand to that setting, it's a bit more complex because you'll need to explain Bayesian statistics - you'll have explain what prior distribution you're assuming for the cheaters is and derive/simulate a posterior distribution for the coin tosses and how exactly to calculate the probability that the player is a cheater. It's an entire topic of its own and really not that straightforward of an extension from the frequentist case, but a framework exists to do that.
You could expand it even farther by adding "scammer", "blatant/subtle", "sore loser" etc blobs to the mix. Make it so that instead of getting a fixed amount of mood improvement or souring per flip, they can make bets. Have a variety of cheaters that use coins that vary from 51% heads all the way up to ones that just blatantly get head 100% of the time. Have blobs that start cheating if they get a streak of bad luck. Have blobs that switch out their coin for a weighted one whenever they bet big. Etc. Then you could also factor in the net loss/gain of their mood over time into the cheating algorithm.
okay so I have 3 points to make : 1) I didn't forget that the 75% chance of getting head for a trick coin was an assumption so I feel very proud about that. 2) I was studying binomial and bayesian distributions earlier this year and I would have LOVED to have this video available then because it's 10 times more clear than any lesson I had 3) That's a great video as expected from you, keep up the good work because I fricking love watching them
very cool work as always Looking forward to the Bayesian version. Would be cool to be able to estimate exactly what the distribution of cheaters is. Like, what if you don't just get a single type of cheater, but a distribution of them? Perhaps distributed by some Beta-distribution, with most cheaters going close to 50% and very few going for much more audacious near-100%
Bayesians are like vegans of science: they are a minority, but boy are they a vocal one, and seems like all of them gathered in this comment section! Yours sincerely, a frequentist.
If this was a real game, where the players expect fifty percent win rate,this would majorly break the game. Getting accused of cheating is probably allot worse than losing, so every fair player would have to factor in that risk before they played. Depending on how bad it is to be accused, players might even decide to use coins that come up tails more often than heads.
It would be solved by some kind of manual checking of the coin itself on the request. Say, manual checks are too tedious to be done at large scale but efficient for anyone who is too lucky but still not a cheater
Looking forward to the Bayesian vid - that would be my approach. A nice way to contrast the Bayesian vs frequentist perspective is that.. frequentists condition on the “true” parameter and integrate probabilities over the random data. Bayesians, on the other hand, condition on the data and integrate probabilities over the (assumed) distribution of the unknown parameter. To me, this makes the bayesian approach more attractive, because in reality, we only know the data - not the parameter.
What i like most about this is that the various steps of hypothesis testing are not presented as "here's the rules, learn them and apply them" - but rather as a set of designed/engineered choices aimed a a specific goal, and showing what goes wrong when you leave out one of the engineered steps.
I was just about to suggest a Bayesian approach, and then you said you are going to cover that in the next video. That is good. I'm pretty sure that the best test method won't involve the same number of flips each time. It would look at the trend after each flip and accuse once the probabilities become strongly enough tilted in one direction or another. How far tilted depends on the cost of a wrong guess and the benefit of a right guess.
One thing you can do to lower the effective number of coin flips is to have dynamic decision making. You can establish a threshold to sort cheaters and non cheaters at every level while allowing uncertain players to continue testing. For example, if someone has the first 6 flips come up tails, you can stop testing because they are most likely not a cheater.
What about a cheater who doesn't always use a cheat coin? Cheats need to balance profit against chance of being caught. Runs of tails would throw algorithms. Just asking
@@tim40gabby25 what you are talking about is effectively changing the cheating effect probability. We aren't testing for cheaters, we are testing for cheating, so as long as they aren't swapping coins in the middle of a test, it doesn't change your approach. If you wanted to monitor cheating in a game in the real world, you would need to adapt your detection strategies to account for subterfuge like the inconsistent use of cheats.
Yeah, you'd need a more sophisticated algorithm to catch players who only cheat occasionally, and eliminating folks due to runs of 'bad luck' often doesn't work in real life scenarios. For a real world example, take Fall Guys, which has had its fair share of cheaters over it's lifespan. (Also using this example because I just lost a final round of Hex-a-Gone in which someone was using a flying cheat so it's fresh in my mind and I'm malding.) It's well known that many of the cheaters in that game often only use cheats if they make it to the final round, meaning there are less people left to report them, and they suffer enough pre-final losses to not look suspicious on paper. Of course, in this scenario where the cheating is obvious to their opponents a reporting system *can* work, but only if it's well implemented and has some protections against false reporting. Like a player is only marked 'sus' once they receive enough reports matching the same parameters, or something like that.
I would have love to have you as a teacher. You're just awesome at explaining, this is because you explain the basis and the "easy" stuff and I know how hard it can be for people who knows difficult stuff to forget how the basis can ben hard for novice. You don't, and I admire you for doing so
I first watched this video before taking a stats class. I now watch it halfway through a stats class and understand it so much more.(first learning about power this week!)
I love this so much! As a game dev I'm desperate for anti-cheat. However gameplay time can prove dedicated people to get really lucky and having 1% false positives means that there's a much higher chance that actual dedicated players get lucky and falsely accused. That's why I believe it shouldn't be as brief as possible, but be constantly looked at. Like a history of average luck over all flips and over the last few flips, if the player has been playing for some time then the test becomes less brief assuming that they'll be more consistent with what they're doing. They won't randomly enable a cheat so theit test is more forgivìng and catches far less cheating scenarios, but if the history sees a sudden spike of luck then that's a sign to be investigated.
in addition to lowering the punishments for false positives, (imagine that content creators will not only be more likely to test positive(regardless of true or false), but also that they will have more runs at the test as they do it for a living) you may also find that you want to make a hidden strike system, and use that in there. some cheaters will cheat for a few runs and then play without them, which will mean that it starts to make it harder to catch them, so instead a trick you could use to catch them is to put strikes on a hidden list, if i get 10 heads by cheating a 100% chance, ill probably not go farther, and will return to normal flipping, to throw you off my scent, so you give me a strike for that, and even though its statistically insignificant for someone with thousands of flips, its still an oddity to be noted. as you perform larger oddities it can add larger sized strikes, until a certain number of points you just say "hey bud its pretty clear that youre cheating and then trying to hide it by playing badly afterwards, come in to our labs and do a game and do as well as you do in those good runs if you wanna not be banned" as an example; 10 wins in a row can be assumed to be about a 1/1024 chance regardless of the game, as long as the chance to win is 50%. if you win 10 times in a row, thats 1 point on your strike file, if you win, i dont know, 15 in a row(1/32768 chance), lets say its a 30 point strike (on top of the fact that 15 flips in a row as heads already will count as 6 points on their record, as it is 10 in a row, and 11, and 12, etc), and lets say that if you get 100 points on your record, then we give you the talk. make each incidence stick around for X flips where X is past the 1% false positive rate, and you will see that even so, the majority of cheaters that try to get around simple probability by playing with cheats very little of the time, they still will proc this method of detection.
You can create a model based on their ELO. Then use a standard distribution relative to the elo to preselect outliers. Then you can run a series of tests on these outliers.
@@nou5440 while that would work, the punishment would also have to be annoying Something like changing the player's name to "I cheated" for an hour, and preventing them from changing the name back for the hour could work
@@lavantant2731 the only thing about that is that it's usually not a flat 50% chance to win or lose. Even with some kind of matchmaking. There are a lot of other factors at play.
I think the main outcome is that it is very difficult to catch someone cheating using statistics. You should use statistics to raise suspicions, but you shouldn't call them a cheater without further evidence. 5% or even 1% is just way too high of a false positive rate for 1000 players.
This is so well put. Can't explain how much I appreciate your ability to slow down and put this into a way anyone can understand. Compare this to creators like Veritasium (as much as I love him) and they're fairly content to speed through complexities, leaving laymen in utter confusion.
The title made me laugh, because it reminded me of that truly ridiculous (but very entertaining) speedrunning drama around that minecraft guy, dream. Catching cheaters with math is a useful and very fun skill!
There were people involved with catching speedrunning cheaters that said that he probably was not really cheating (on purpose) because his actions didn't make sense if he was. Regardless, the game was modified (on purpose or by accident). The outcome was way way waaaaaaaayyyyyy too unlikely.
He also misbehaved pretty badly, trying to cast doubts on the statistical methods... With how low the odds were, even if the methods had high uncertainty the game would still be definitely modified
@@didack1419 That's true, you can't really prove evil intent with math - like whether the game was modified on purpose. You can only show _that_ it has been modified.
@@baguettegott3409 The reason it's decently likely that the game was modified by accident was because he already modified the game to make other content, so it's reasonable that he could have left the modification without realising. It's weird that he didn't say that from the beginning, because it was a reasonable excuse. I don't know. I guess he thought the mods were against him or something so they wrote a paper blaming him of cheating. Or he really cheated and thought people would not buy it. Or he was really sure he had the game unmodified, which is naive.
This is great. I would like to use this to introduce my kids to frequentist statistical tests and related notions. It also seems like it could be used to make a pretty good illustration of the scientific method if it were retooled just a bit at the end to explicitly state a hypothesis that you were testing and to explicitly discuss the steps (already implied) that you take to comply with the scientific method. Apologies if there is something in your catalog along those lines that I've missed. But I would love it if you could do this. Also, am I right to assume that there will someday be a Bayesian version of this video? Waiting with bated breath. These are great conceptual frameworks to use for education. Thanks so much for doing them.
It's kind of fun to watch videos like this after hearing all sorts of similar (yet less entertaining) lectures from a friend that's soon going to have a PhD in Biostats. Can't wait to see what you can pull up for the Bayesian method!
I learned more in 10 minutes of this video than in an entire stats class in university. You have a real talent! Thank you for all the amazing content! I have never missed a video and you have taught me so much
It might be a cool idea to narrow down the blobs after a certain amount of heads, that being if their initial 5 or so coin flips is suspicious and what a cheater might have, place them into a group for further testing, which would make it not bother so many of the innocent blobs while at the same time providing more accurate results
Don't forget that (even at 0.75) only 24% of cheaters get 5 heads in a row... so if you use 5 heads then your method can NEVER catch more than 24%, even if subsequent tests are perfect. If you widen the net to include 3 out of 5 you get the 80%+, but also half the fair players. Ultimately you will need about as many iterations to get to your result... in fact without doing the math I'm guessing you cannot reduce the total number of flips in this way, but would be interesting to see for sure.
@@fabio_fco Did you link this timestamp with no additional context as an arrogant way to "educate" the person you replied to? Or maybe you did this to showcase future reads, like me, the moment in the video which helps support Gren's post. JJ's idea is great - what's if we cut the group after a certain threshold? We stop watching the people who are doing poorly because they couldn't possibly be cheating. This is a good idea because our next group of players will have more cheaters than fair players, and we can increase our "cheater catcher" threshold for this second group because we know for a fact there's more cheaters. Gren replied with a confirmation that cutting the poorly performing participants will not INCREASE our True Negatives or True Positives - it will only help narrow those down. So if we were to focus on the 5 out of 5 table, we'll have a group consist of many cheaters and a few fair players, but we've already excused a few cheaters because they were performing "poorly" (enough). I like both of these ideas, but feel like 3 out of 5 is too inclusive. In an even group of 1000 blobs (50 : 50 ratio), performing this split will leave us with 700 blobs in a 36:64 ratio. That's 36% of the 700 are fairplayers and 64% are cheaters. With this new group, we can perform a less forgiving threshold to the fair players that catch more cheaters, but that's still a lot of fair players being caught. I think if we increase the initial threshold (instead of 3 out of 5, maybe 4 out of 7) and we create a less forgiving second threshold (11 out of 16, for example, I just randomly picked) - we can try to fit within our 5% and 80% rules. Notably, we if we combine the maximum number of flips for these two groups, we'll get 23 flips. I'll do the math later, but we could already pass our test with 23 flips - so, as Gren mentioned, this is only a valid strategy if we can use it to reduce the number of total flips.
@@grenvthompson or for the first round, 3/5 heads leads to further analasys. or use the first 10 flips, and if someone gets 7/10 heads, then investigate further.
I did actually remember the 75% assumption. I was thinking the final test would be to give a more realistic distribution of how cheating works, i.e. not the same for every blob. Of course it still averages out, but the ones cheating 'less' would be harder to catch.
Your out-of-the-box thinking and unique perspective turned an otherwise mediocre presentation into a fantastic one *johnson spy* . You did a good job of catching the mistakes and keeping us from wasting time and by taking the wrong path. Your attention to detail really sets you apart from the crowd. Great work! Jack, Your great work has resulted in tangible, beneficial results to me. You’re a force to be reckoned
Don't go through your partner's phone. Not only is it a huge violation of your partner's autonomy but it's also just plain pathetic. Besides what will it get you? You'll either ruin your relationship by finding out they're cheating or ruin your relationship because they weren't but you still didn't trust them and went behind their back. If you feel the need to do this *JohnsonSpy* all this I saw it's just heart wrecking.
@@saturnsandjupiters358Probably because it has ‘catch a cheater’ in the title so the bots are trying to capitalise on people who want to know if their partner is cheating
I struggle with statistics class and though this video does not directly tackle my questions, I really appreciate the time you take to explain them clearly, especially regarding the false positive ratio. I hope to see more blobs soon
I think the best solution is to compromise by having a higher false positive test, and then do another, more rigorous test with the smaller group of accused blobs. That was you can end up with a lower number of false positives overall, but still be expedient
Medical tests will sometimes do this - and because of the different composition of the resulting group of accused blobs, even just running the test a second time can yield pretty good results.
My head hurts so much after few flips with those False Positive and False Negative . My mind went blank. Have watch again, and again and again. . . To infinity
If this was a real world example, I'd expect the cheaters' coin's bias rate to vary from person to person. I wonder how much more complex that'd make the math. Thanks for the video.
I've literraly had my propability and statistics exam yesterday. It was sooo boring to learn. However i love your videos and how they spark my motivation so i can implement these topics to programming. Thanks :D
8:00 I think the term you're looking for to describe the false positive rate is specificity - in statistics the specificity of a test is the probability that a test negative case represents a true negative, its inverse would be the false positive rate as defined here (this example test has a specificity of 97.6%)
I love your careful explanation of the false positive rate and how you couched it in your experience of mixing up the terminology. A great tool for getting us to pay attention and notice the nuance without making us feel stupid!! Absolutely superb teaching!
At the end of the video i was actually shocked that 22 minutes had aldeady passed. You made such a good job at explaining this topic in a very entertaining and instructional way. I learned a lot and you gained a new subscriber!
19:12 F for the blob who lost their coin (bottom, center a bit to the left). He will never prove to anyone that his coin just fell though the floor and his life is ruined by realization of living in a simulation.
I know this is a few years late, but this was a phenomenal demonstration of intro and intermediate concepts of statistics. Absolutely plan on having my students watch this to generally understand the broad statistical test that medical articles/studies use for statistical significance and analysis. Thank you!
I've just watcher 3Blue1Brown's video about "probability of probabilities" and I think this concept would fit very well for this problem. We can plot a probability density function of "coin's head probability". And then a player will flip the coin until we are satisfied with the graph and comfortable to label them.
Had a lot of fun with this one! I just took AP Stats this year so this was a nice refresher to what I learned there. I might be wrong on this (and if someone who knows more about stats than a wee high schooler can correct me in that case), would it be better to design a Chi-Squared GOF test for this? Since we don’t know the probability of heads on a biased coin, I feel like it might be slightly more effective even if a tiny bit more complex. Each blob would need to flip at least five times (although if this is being done on a calculator, it should be an even number since decimals make them grumpy for some reason), and we make the expected values half that number. That was my first thought when Primer asked us to make a test, and I know they can be super easy if done on a calculator. I could also be really stupid and that’s what the next video will cover 😂
If i recall correctly, chisq GoF tests are primarily used to determine whether or not our experimental results differ from our theoretical results (and whether it is reasonable then to suggest that an alternative hypothesis is true) In this problem, we already know that there are cheaters abound in our experiment, so our results should, no matter what, be skewed towards having a higher number of heads than tails.
@@derpsalot6853 That’s a good point! GOF might not work then, especially since I think it requires a list with more than one value, which obviously doesn’t work in this case. Chi-Squared could possibly still be used if a single contribution is calculated for a single blob and compare that to known critical values, but that needs a chart with all those written, and I doubt the tester in question would feel like lugging that around!
Thinking about it, this video also demonstrates why math isn’t always the answer, because perfectly spinning the coin around the center on its side would always reveal if the coin was fair, because the weighted ones would wobble.
could you work with 2 thresholds, 1 where if you get below it you're assumed to not be a cheater, and 1 if you get above it you are assumed to be a cheater; if you fall between them, additional coins will be flipped. This way, the amount of coins that need to be flipped can be lower for the very suspicious and very unsuspicious groups, and only the more uncertain group needs to flip additional coins.
Came here to say this, but thresholding would be more difficult as we have to figure out an "optimal" way of distributing the 80% true positive and 5% false positive
Absolutely, and this is indeed how we design tests in the real world. The simplest example would be, if we have someone who hits 16 heads in a row from the start, we don't need to continue with any more flips :)
You add a second step retesting the coin after accusing a player of cheating to pull false positives lower at the 16-17 threshold. The chance of hitting 16+ more than once is so small that it would essentially be a 99% chance they were a cheater, and if it was just a lucky go there would be about a 2% chance you would falsely accuse someone.
I really enjoyed seeing a lot of the stuff again I learned back in middle and high school in mathmatics. Statistics can be tricky sometimes, but they are so much fun, too. Such an essential skill if you are interested in any kind of science. Also fun to learn the english terms as well, since I actually did not know a handful of them.
Yup, the most treacherous thing about statistics is that every instance is independent from all others. To take the coins, yes they would have a 50/50 chance for either side (or rather both slightly below 50% since a coin is a cylinder and has three sides) and if you flip it a million times, each side should come up equally, but each individual flip happens independent of the others (each flip doesn't change the odds for subsequent flips), so having 1000 heads followed by 1000 tails is improbable, but not impossible.
I just realized. We dont have to technically make assumptions for rates of cheater coins to make an accurate test that fits our criteria. You could just make a test that tests for fair players since we know the rate at which a non cheating blob will get tails. Namely 50% of the time. So instead you could just design a test that labels a blob as fair more then 80% of the time, and only labels cheaters as fair less then 5% of the time. I mean, it would require more data then testing for cheaters, but you could get something fairly accurate.
This is in fact the most commonly used form of frequentist tests, where you only check against the “null hypothesis” and nothing else is explicitly used in the maths. Implicitly, there is still of course some true effect size, and your test has some power to detect it, depending on your chosen sample size. If you care, you can calibrate that sample size to detect effects of at least some given size. In practice, especially before people rang alarm bells about the “replication crisis”, you sometimes saw little care for sample sizes, beyond convenience or custom, as if the researchers were unaware how this limited their results’ possibility and accuracy.
Well it just depends on your starting point. You could start with the assumption that you're going to take 40 coin flips, and then calculate the power of your test at p=0.05. Or you can start with an effect size and a p-value and determine the sample size you need. Or, you can take a sample size and an effect size and calculate a p-value.
Unfortunately you still have to make the same assumptions about the effect size (probability of cheater coins getting heads) even with this test, as you still need to provide an assumed effect size to determine whether you test is labeling cheaters as fair 5% of the time. In fact, as the shape of the frequency distribution of cheater coins getting heads is dependent upon the effect size, and the probability of a given test not catching the cheater is dependent upon this distribution, it is not possible to provide the probability of not catching a cheater without first using some value for effect size to generate the frequencies of cheater coins getting heads.
I love these illustrations :) And yes, I remembered that assumption of 75%, so I expected the results to be skewed in that regard in the final example. Maybe it was just because that was a question that bugged me from the very beginning, before you even mentioned the 75% assumption, how one would go about figuring out that threshold Salute to the 4 poor blobs, that genuinely got a lucky streak of coin tosses, but were accused of cheating anyways ;_;7
To get the threshold, you just gotta do a lot of flips on some blobs until you get someone who is definitely cheating. Maybe create a watchlist of the blobs who are most suspicious (won too many competitions) and pick those. If you flip a coin like 1000 times and the guy gets heads 550 times, there's a very high chance they're cheating. You also have a very good approximate of how rigged their coin is (around 55% in this case). Find a couple of cheaters and see if they also land heads around 55% of the time or if different cheaters rig their coins differently. If you find out it's the same for most cheaters, you're set. If it turns out that different cheaters are rigging by different amounts, you would have to adjust your test to catch the most obvious cheaters. Logically, it's more important to rid the blob world of a cheater who rigged his coin to get heads 100% of the times than one who rigged their coin to get heads 51% of the time. Lots of video games take this approach. Cheaters have to use subtler cheats to avoid getting banned. Can't enable god mode or something these days in most games.
In the real world those blobs would get a closer examination. Like the anti cheating blob checking if their coins are legit physically. You should never face real punishment on a 5% threshold, but it's reasonable to face an inconvenient closer examination on that basis.
Lovely video. Missed opportunity to examine what happens when 99% of the blobs are playing fairly. What is the probability that a blob who is accused is a cheater?
I think in these case the overestimated side becomes the positive one, you will get like 95-5 instead of 99-1? That means that if you have no true cheaters you will still get a result of 95-5 even if you actually have 100-0
@@PlaXer Not the case! When half the blobs are cheating, an accused blob is usually guilty. But when only 1% of the blobs are cheating, then even a fairly "good" test (i.e. one with low Type I and Type II error rates) will accuse more honest blobs than cheaters. This is why, famously, women in their forties who get positive mammogram results (with no family history) usually do not in fact have breast cancer -- it's not that mammograms are terrible, it's just that breast cancer is a rare condition in that age group.
I had previously misinterpreted your question. If there are few cheaters I guess almost all the accused are not, so the probability gets quite low, also considering that many cheaters will be overlooked. Odds are you would have more false positives than actual cheaters. Like, times more
Ok this is amazing...the assumption of cheaters having 75% chance was the most fair. 50% is the norm and 100% would be too stupid from the cheaters, so taking the middle ground (75%) would be smart. Yet that assumption still lead 80% of the cheaters to get away
And yet, one COULD argue that the test still accomplishes its goal! Kinda. As a cheater, you want to get as many wins from your coin as you can. BUT, if you're caught cheating, you can't play anymore, and thus cannot win. Now, assuming universal testing, and because the test is based directly off of how many "wins" you get, the surest way to play and win, is to make sure you don't win TOO MUCH, to avoid being caught. This introduces a sort of "soft-cap" to how much cheating anyone can actually do. So, although more cheaters get through, they still have a notably deflated win-rate, meaning that cheating is less rewarding, even though they aren't caught. Now, is it still a favorable ratio of "Cheated Wins" being prevented? IE, cheaters caught and thus annulling all of their dishonest wins, versus the cheaters who got in anyway but the number of dishonest wins they had to give up to do it? I dunno, sounds like a LOT of math to me! I'll pass on that, but still a fun factor to consider.
I love the idea that these blobs are dull enough to be entertained by flipping coins, but clever enough to cheat at it.
why would coin flips be dull?
@@NoNameAtAll2 It depends on how many repetions and individual tolerance.
Every time they get heads they eat 1 mango
You should meet the Australian blobs.
@@pablopereyra7126 Damn mangoers, inserting themselves into every conversation. I bet you like those simple, modern houses too, simpleton.
Imagine being the 3% of cheaters that had a weighted coin and still got only tails
skill issue
luck issue
NelsonMuntzhaha.jpg
get a better gaming chair
@@ozan____ *get (a) better (gaming c)hair
Am I dreaming? A new Primer video? Can’t be.
Ikr
Literally I thought I looked at the notification wrong
in 1 year another one releases
I had to double take when I saw primer in my notifications
It’s true. All of it. The blobs. The doves. It’s all true.
I shared this with several of my former professors and they loved it.
By far the one who had the funniest reaction was my sociology professor, who said "ensure the false positive rate is less than 0.1%." I asked him why, at which point he smiled this giddy evil smile and said "Then have the cheaters get labeled as mangoes."
I don't get it
Cannibalism!
in other videos on this channel the blobs eat mangoes 🥭@@coruscaregames
@@coruscaregames The blobs eat mangoes. My sociology prof was implying that, upon seeing a cheater eaten alive by other blobs, the cheating would stop because now the risk of being caught cheating was no longer just being called a cheater but a horrible gruesome end.
Jesus Christ...
Love this video. I teach a graduate-level class on experimental design, but many of the students have forgotten a lot of the basic statistics by the time they take it. I'm bookmarking this as a potential reminder/class aid. Thanks!
I feel the pain. I'm starting a master in a few months so I'm trying to relearn all the forgotten info from the last 4 years.
Almost like statistics and probability is dumpster information.
I love how you take (sometimes boring) statistics concepts and explain them in a way that makes it simple and enjoyable for non-maths guys.
Good job, my dude.
This would have really helped in ap stats last year lol.
I find statistics pretty interesting on its own
p hacking and all that is pretty fascinating
@@NorroTaku Well, so do I since I'm a statistician myself. The thing about learning stats, tho, is that while it has some nice applications and ideas to explore, the more dense and theory-heavy subjects are really difficult to grasp and, as such, require a lot of effort to have a ok knowledge, let alone really understand.
These subjects, that are brought up in Primer's videos are really cool, but basic. When you go down the rabbit hole about learning a lot of probability, inference, stochastic processes and such, you can get kinda annoyed by the elitist nature of the maths behind stats.
@@matheusborges7944 by elitist do you mean that people keep stats intentionally convoluted in a gatekeeping way or something else?
@@aurumble Basically, yes. Stats, as any maths field, is difficult by itself, but it becomes even harder to understand when 99/100 books/papers are written in the most complex and incomprehensible ways possible.
To actually understand something about maths (by reading books or papers) you, most often than not, need to already have an understatement of the subject or to be a genius.
Unfortunately, maths is written not to be understood but to "wow" the reader about how unbelievably difficult the subject you studied is.
the blobs have really evolved - from effectively wild creatures to gambling addicts
That’s the holy path for everyone who plays gacha games
xqc viewers be like
They are evolving, but backwards
Gambling with their feelings
Can't wait for the monkey/grape prostitution analysis.
Middle school me wishes there was a youtube channel or a teacher that made math as entertaining or as understandable as Primer does
Im in middle school and this help so much in my classes 🤣
Basically you'd rather be entertained than learn
@@RedTail1-1this is a really… odd(?) comment
Like yeah everyone wants to be entertained, but this video is almost entirely focused on education.
It’s not a textbook or a lecture but it’s still an educational video before it is entertainment, and sure you could argue that it’s much easier to digest because of the format it’s in, but that’s just how education kinda has to be if it wants to be effective.
@@RedTail1-1
he's a middle school at his story so i assume he's just starting to get to know mathematics as what it is in that part of the story
You can't enjoy something you don't understand so this blobs is definitely a good headstart into understanding math
Once math is understood you can now leave out the blobs and entertain yourself with other stuff that can be seen in math itself which interests you like visual geometry or statistics
After all we can't enjoy math if math are just a bunch of set of rules without logic and without a way to spark imagination, curiosity, creativity, and many more which sparks emotion which is technically entertainment
(But hey it's not like they're going to teach math if it's just a bunch of rules that doesn't do anything nor show logical reasons of its existence but that besides my point)
So in summary we look for different ways of entertainment that is present in different fields of study or maybe just entertainment as a form of headstart to understanding and use that understanding to create your own choice of entertainment that can be found in that field of study
Even expert mathematicians sometimes create new math problems that doesn't even exist in real life not because they need it but because it entertain their craving to look for more insight in math which is basically to satisfy their curiosity but not to change something for the better
My heart aches for those wrongly accused blobs
Chuckled
@LoveSweetDreams ok then let's hear your better idea.
@@zakir2815 what? They just asked if thats how the system works. They said no critique of it.
That one blob that got 23 heads out of 23 flips but wasn't a cheater
@LoveThemDames yeah basically
Let's say: if a cheater feels that someone is onto them, they will secretly switch to a fair coin.
Then, we're suddenly very deep in the domain of games theory.
A simple crossover that shows how quickly mathematical models become complex.
we could model this by a random chance that the cheater get this feeling, since it isnt based on anything that is contained in the game. Then it is again easy to model. But the mathematical model mentioned in the video IS already so complex that it defies the expectation of normal people who havent calculated this through. This is called the fallacy of small probability, or just the fallacy of small numbers.
what if the cheater has an army of fans that defend him or he admits to cheating
@@lilyliao9521 HAHAHA
May I propose an even simpler case that would complicated this; The weighted coins may change the odds of a heads anywhere within the range [0, 1].
@@activediamond7894 if this is a process that has a finite variance, it could be modelled just the same as the probability. If it doesnt, then theres no reason to even look for cheaters, since u cant get arbitrage from a process that is purely chaotic.
Given that the parameters are the game are: "The blob who flips heads feels happy," it might be possible that there are actually secret (deviant) tail-lovers amongst the blobs, who routinely cheat to "lose" more than expected.
blobdsm
Blobdsm
Blobdsm
BlobDSM: flip me harder
@@anastasiaklyuch2746 i regret so much
I leverage significance testing pretty heavily in my day-to-day job, and really loved the way you broke it down here! Particularly appreciate framing statistical power (true positive rate) on the same level as significance (false negative rate), when in most cases the conversation/focus really only extends to significance/p-vals. Going to spend a few minutes scrolling through to purge some of these spambots clogging up the comments, then on to the next vid :)
Isnt there a way to have better results with less flips? Make it conditional?
Like after 10 flips observe only the ones with 6 or more heads and then accuse those who had another 7/10.
From rest of the group accuse those who had 9/10 in second round.
I was thinking some similar. As the number of flips increases, periodically remove the blobs that are statistically not cheating. The total number of flips (all flips for all blobs combined) would be lower but, since all blobs execute their flips concurrently, the time spent flipping would remain the same. In both methods, there would a group of blobs that flip the same maximum number of times. Using the method in the video, all 1000 blobs would flip that maximum number of times. Doing it in phases would only have a portion of the blobs flipping the maximum number of times. But the maximum number of flips would be the same in both methods and would take the same amount of time to accomplish. I’d be curious to know if the “phases” method would be more accurate, though.
@@robmoffett2700 This is typically how sussing out bad actors works in the real world! Usually to perform verification you need to do a pretty simple step, following that only in the case of likely fraud certain users are passed to a more in depth verification path while others are passed on directly to the service.
I was thinking of using stages where getting a certain percentage of heads puts you in a group that tests with significantly more flips. That brings the total number of flips needed down massively over having a whole group flip 50+ times.
I was thinking the same thing and saw this comment, it's like "random checks" they do at airports for suspicious individuals. Granted, the definition of our suspicious varies from the rather racist definition of "suspicious" that people who work security at an airport do.
In this case, maybe it would be good to ask them to flip their coin until they get 5 (or whatever number) tails. Then we check how many flips they needed. Blobs that got unlucky will quickly prove themselves innocent. We'll need to find the right numbers (how many tails do we ask for and which number of flips is our suspicion threshold) in a similar manner as described in the video.
We could even simulate which test requires fewer flips overall.
@@dragoncurveenthusiast If an unfair coin has a heads probability of 58%, it's still relatively likely for them to get an unlucky streak. I thought in the video in the last sample they would just make one of the groups small enough that there were too many false positives or not enough true positives, rather than make the cheating odds a lot tougher to catch.
Same
So maybe you wait for the bayesian testing video, which will teach you exactly that....
This is what I call excellent science communication. You clearly explained the topic and used understandable terms. You got yourself a subscriber, and I look forward to more videos.
Are you going to watch old videos?
2:08 right off the bat, there’s an extremely important perspective you’ve shared, which is to prioritize the experience of non cheaters over catching the cheaters. After all, why catch the cheaters if the game is no fun?
Some video game developers could take a page out of your book!
*cough*nintendo*cough*
@@rayquaza1053 **cough** any root-level anti-cheat **cough**
and governments..
I agree with the sentiment, but I think that even this 'low chance of accusing fair players of cheating" is not acceptable. The idea of having a false positive in a system that would most likely have very severe punishments is awful, especially because its very hard to prove that you were banned falsely. Sadly this is what a lot of systems do nowadays, and we've seen some of the consequences of people getting falsely banned and having it overturned, but most of them were big youtubers/streamers etc. Any other person without a big following is fucked if it happens to them.
@@johntravoltage959 “the idea of a false positive is awful”
Literally impossible to avoid
I dont even watch these to learn a subject better, i just watch and listen to this for fun! Its actually pretty entertaining!
I think it would have been interesting to show an example where there are very few cheaters. The result of this would be that even though we might accuse less than 5% of the non-cheaters of being cheaters and more than 80% of the cheaters of being cheaters, we’d accuse more non-cheaters than cheaters, which can make it feel like the test is bad.
Also i think philosophically if this were a real game in real world if you get 1 fair player as cheater you defeated completely the purpose of the "sport" labeling the real goat of the game as cheater for a communist greater goal.
"If we accuse one person of cheating who isn't it destroys the sport"
Thanks, but I'll take the communist greater goal here. The goal should be to give the legitimate players the best chance possible, not to make sure that no individual legitimate players are ever wrongly accused.
Calling thinking about the big picture objective rather than being obsessively concerned with liberal individual rights "communism" is an excellent endorsement of communism because there is literally never certainty and you cannot fully eliminate type 1 (or type 2) errors while also usefully detecting cheaters.
In other words this is a nonsense "philosophy" that would, for example, imply that courts would have no choice but to always assume innocence unless the probably of innocence was *exactly mathematically zero.*
So i.e. If we have HD video evidence of a murder, your logic says that we should assume that the evidence is random chance and entirely due to compression artifacts and cosmic rays aligning the pixels just so.
Your philosophy forces a standard of evidence which is best described as "beyond unreasonable doubt." There is a reason absolute certainty is not required to convict in any court anywhere, because there is ALWAYS uncertainty.
@@petersmythe6462 Thanks for making that all up. I was about to go to bed and I needed a fairy tail to relax me communist.
Communists always have the best fairy tails things like communism works.
19:24 some poor blobs coin fell off the platform, look at bottom of screen
Lol
noo
*cues the climax scene from Jumanji*
Blob lost his "lucky" coin
Lamo
I took AP Stats a couple years ago, and although I did well enough to get college credit, this is really well made and does a great job of explaining statistics. I feel like I learned a couple of things, and got to see things like p-values as less arbitrary numbers. Really cool, you’re a great teacher, thanks for the lesson!
Came down hear to say something similar, glad somebody did it for me. Took the actual class in college is really only difference lol
I know right. Where was this guy a year ago when I needed him
I'm in love with idea how you approach the concept of frequentist hypothesis testing, which confuses so many. (from a statistics teacher). I will definitely recommend your video whenever we come to this part of the curriculum.
I wish you'd talked about what happens when most of the population is fair and there's only a handful of cheaters. In that case you end up accusing way more innocent people than guilty ones even if you set the false positive % very low. This is a very poor outcome. Unless you follow up accusations with another, more powerful test examining the coin itself to know for sure if it's fair. The followup test is more disruptive and probably more expensive so the purpose of the first test is to minimize the number of followup tests needed. In such a situation the first priority of the first test should probably be minimizing false negatives. It's inconvenient for the false positives to have to get another test to prove their innocence, so you still want to try and minimize false positives too, but you want to catch as many of the cheaters as possible.
No? We make em drop a coin a coin and record the distribution of it until only few persistent deviations from the rate of normalization remain. Those are the cheaters. 2020 US election stats was exactly that.
This would be a good follow up video to this one.
i thought the specific problem with that situation was that the cheater effect size was NOT what the test assumed it was, not how many cheaters there were relative to the population size
@@brasilballs yeah that was the issue in the vid, imay is mentioning a variant issue assuming a low percentage of cheaters...
essentially it's the issue of reducing false-positives by as much as possible which is a pretty major real-world issue ^^ (false convictions for crimes or unneccessary medical treatments, unfair game bans etc etc)
That was exactly what I was thinking.. lol
Great video, but I want to point out one thing:
the true proportions are also very important. Let's say you have 1000 blobs, and only 10 cheaters. With a 5%/80% test you would get 50 false positive, and 8 true positive. So only ~14% of the accused cheaters are actually cheaters.
In the real world big factors for the limits of such tests are the consequences: What happens to a patient with a false negative test? Will he die soon (eg. cancer), or will he just stay two days longer sick at home?
And what happens with false positive test results? Will the person go to jail (eg. drug test), or undergo harsh medical treatment (eg. chemotherapy), or will he just take a day off?
In praxis these issues are often solved by confirming the results with additional, different tests (in case of the blobs the manipulated coins might have a different weight...), but should still be considered, because these tests again have the same problems, and every additional test costs money and time.
We could consider this a test of the police catching a criminal. If the odds of being caught are high it will act as a deterent so fewer cheaters will actually risk being sent to jail by cheating. But if too many innocent people are sent to jail, the people will rebel. New name of game "Politics"
They usually do a second different test. One is
@@David-sp7gc Thats why he can modify the rates which causes the p-value to be much lower and your confidence rating in catching a cheater.
The 80/5 is a generic prediction for the model, but by no means worthy of actual implementation. When they test for cancer, not 1/20 tests are a false positive.
Your numbers would be extremely improved if you went to catch 99% of cheaters and accuse less than 1% of fair players.
Then, 99% of your predictions will be correct. If you have 1000 fair players, a 14 fair players, the odds of you picking a fair player is extremely low.
I really thought the same as you did! And I first thought he had modified the number of cheater in his 60% exemple (but demonstrating assumption aren't fact was also educative)
That's why the assumed effect size and our acceptable p-values and a-values are really important and require careful thought.
For refereeing a game, we'd want to assume a fairly large effect size and we would want a pretty small p-value. The reasoning here is that the test itself serves as a deterrent - you only need to catch the most egregious cheaters and the rest will take warning.
You'd also want to assume that in a game setting, players are tested fairly frequently. They're probably 'tested' during every game they play. So when referreeing a game, it might me more effective to measure the false positive rate per tournament or per year (like the way we measure failure rates for contraceptives). Because what we're *really* interested in is "how likely is it for a fair player to be accused of cheating during the entire tournament."
In addition, you can risk not catching all the cheaters using only one test, because they'll be tested again and again every match. Even at a 20% true positive rate per match, that amounts to a huge risk throughout an entire tournament.
I love the application to real life sciences. Probabilities fill everything we do and it’s a good thing to have a process for determining results from a limited pool of testing.
Great video it gives a very intuitive way of understanding probability, and also demonstrates how unintuitive probabilities can be.
I love this blob society. They only have one cop, and that cop’s only job is to find people who cheat at coin-flipping.
I think a good way to expand the simulation a bit would be to have different 'cheater' blobs that cheat at different percentages. Blatant cheaters at 100%, chronic cheaters at 75%, all the way down to situational cheaters at 25%. It's a big reason catching cheaters in games is so difficult; not everyone is always cheating in every scenario, and when they're choosing to cheat is often times super inconsistent. A really skilled player who also cheats is one of the most difficult things to detect, as well!
yeah for they have the skill, knowhow, and reputation to know: what cheating looks like/what legit play looks like, the skill to execute legit play, and the community goodwill to not immediately arise suspicion.
@Test to expand to that setting, it's a bit more complex because you'll need to explain Bayesian statistics - you'll have explain what prior distribution you're assuming for the cheaters is and derive/simulate a posterior distribution for the coin tosses and how exactly to calculate the probability that the player is a cheater.
It's an entire topic of its own and really not that straightforward of an extension from the frequentist case, but a framework exists to do that.
You could expand it even farther by adding "scammer", "blatant/subtle", "sore loser" etc blobs to the mix. Make it so that instead of getting a fixed amount of mood improvement or souring per flip, they can make bets. Have a variety of cheaters that use coins that vary from 51% heads all the way up to ones that just blatantly get head 100% of the time. Have blobs that start cheating if they get a streak of bad luck. Have blobs that switch out their coin for a weighted one whenever they bet big. Etc. Then you could also factor in the net loss/gain of their mood over time into the cheating algorithm.
And of course, blobs who are stupidly generous and choose to lose on purpose. Which actually does happen sometimes whilst cheating in video games
I've played this every year for a year, brilliant game.
maybe he is saying that he only played once.
You meant to say "every year for a century"
every year for a day
it's been up for less than 3 months, so... what they said could be technically correct, but not "every day for a year" or "every year for X years" 🤓
With that sample size, the game MUST be brilliant. I'm off to go play it now. Just once.
okay so I have 3 points to make :
1) I didn't forget that the 75% chance of getting head for a trick coin was an assumption so I feel very proud about that.
2) I was studying binomial and bayesian distributions earlier this year and I would have LOVED to have this video available then because it's 10 times more clear than any lesson I had
3) That's a great video as expected from you, keep up the good work because I fricking love watching them
I literally have my statistics exam tomorrow and this is a beautiful video to sum up one of the chapters wish he posted this earlier
real life nerd emoji
1) You should feel proud! Good for you!
I was super proud of remembering that too :)
2) Same, only it was a couple years ago for me
3) Agreed
I love how your the only TH-camr that makes the zoom to fit look good! Thank you for making my day!
very cool work as always
Looking forward to the Bayesian version. Would be cool to be able to estimate exactly what the distribution of cheaters is. Like, what if you don't just get a single type of cheater, but a distribution of them? Perhaps distributed by some Beta-distribution, with most cheaters going close to 50% and very few going for much more audacious near-100%
Like setting your headshot percentage in a cheat for a shooter game
Yes
Thomas Bayes was a god amongst men, probably.
@@maracachucho8701 *slow claps*
Bayesians are like vegans of science: they are a minority, but boy are they a vocal one, and seems like all of them gathered in this comment section! Yours sincerely, a frequentist.
If this was a real game, where the players expect fifty percent win rate,this would majorly break the game. Getting accused of cheating is probably allot worse than losing, so every fair player would have to factor in that risk before they played. Depending on how bad it is to be accused, players might even decide to use coins that come up tails more often than heads.
The unit on Game Theory and maximal strategies comes later (but is also fascinating) :)
It would be solved by some kind of manual checking of the coin itself on the request. Say, manual checks are too tedious to be done at large scale but efficient for anyone who is too lucky but still not a cheater
And thus cheating aswell...
Primer: "How To Catch A Cheater With Math
"
Dream: sweating profusely.
Technoblade: sleeping like a baby.
@@bungeethehuman6292 💀you just did not-
@@bungeethehuman6292 Not funny
@@bungeethehuman6292 bruh moment
@BungeeTheHuman 🤡
We need a blob history or lore drop about like how blob society works, and how the simulation effects their future life.
just commenting to say I have seen what you have done there with the cheetah at minute 12:50. Great video, enjoying it so far!
Looking forward to the Bayesian vid - that would be my approach. A nice way to contrast the Bayesian vs frequentist perspective is that.. frequentists condition on the “true” parameter and integrate probabilities over the random data. Bayesians, on the other hand, condition on the data and integrate probabilities over the (assumed) distribution of the unknown parameter. To me, this makes the bayesian approach more attractive, because in reality, we only know the data - not the parameter.
19:04 Keep an eye slightly below the stages, you'll see a few coins being flipped too far, falling off the stage entirely and into the void below.
lmao
What i like most about this is that the various steps of hypothesis testing are not presented as "here's the rules, learn them and apply them" - but rather as a set of designed/engineered choices aimed a a specific goal, and showing what goes wrong when you leave out one of the engineered steps.
I was just about to suggest a Bayesian approach, and then you said you are going to cover that in the next video. That is good.
I'm pretty sure that the best test method won't involve the same number of flips each time. It would look at the trend after each flip and accuse once the probabilities become strongly enough tilted in one direction or another. How far tilted depends on the cost of a wrong guess and the benefit of a right guess.
One thing you can do to lower the effective number of coin flips is to have dynamic decision making. You can establish a threshold to sort cheaters and non cheaters at every level while allowing uncertain players to continue testing. For example, if someone has the first 6 flips come up tails, you can stop testing because they are most likely not a cheater.
What about a cheater who doesn't always use a cheat coin? Cheats need to balance profit against chance of being caught. Runs of tails would throw algorithms. Just asking
@@tim40gabby25 what you are talking about is effectively changing the cheating effect probability. We aren't testing for cheaters, we are testing for cheating, so as long as they aren't swapping coins in the middle of a test, it doesn't change your approach. If you wanted to monitor cheating in a game in the real world, you would need to adapt your detection strategies to account for subterfuge like the inconsistent use of cheats.
Yeah, you'd need a more sophisticated algorithm to catch players who only cheat occasionally, and eliminating folks due to runs of 'bad luck' often doesn't work in real life scenarios.
For a real world example, take Fall Guys, which has had its fair share of cheaters over it's lifespan. (Also using this example because I just lost a final round of Hex-a-Gone in which someone was using a flying cheat so it's fresh in my mind and I'm malding.) It's well known that many of the cheaters in that game often only use cheats if they make it to the final round, meaning there are less people left to report them, and they suffer enough pre-final losses to not look suspicious on paper.
Of course, in this scenario where the cheating is obvious to their opponents a reporting system *can* work, but only if it's well implemented and has some protections against false reporting. Like a player is only marked 'sus' once they receive enough reports matching the same parameters, or something like that.
@@LowPolyPigeon Interesting. Thanks
@@tylerhale8679 Interesting. Thanks.
I would have love to have you as a teacher. You're just awesome at explaining, this is because you explain the basis and the "easy" stuff and I know how hard it can be for people who knows difficult stuff to forget how the basis can ben hard for novice. You don't, and I admire you for doing so
I first watched this video before taking a stats class. I now watch it halfway through a stats class and understand it so much more.(first learning about power this week!)
Blob Maths and Blob Biology are cool, but I want to see Blob Society more.
they ask what is blob but they never ask how is blob
@@iamnotahuman2172 We saw happy blobs, sad blobs and dead blobs. I guess that is how blobs can be.
Me too
I want to see more blobs interacting
"We live in a society."
- Blob Joker
19:24 Shoutout to this red dude at X=12, Y=2 throwing his coin off the platform.
yea lol thanks for giving out the coordinate of that blob
@@lephantriduc There is also another red blob at 19:11, X=12, Y=4 that yeets it off
2:37 that is amazing
I thought I was gonna watch blobs with infidelity.
I love this so much! As a game dev I'm desperate for anti-cheat. However gameplay time can prove dedicated people to get really lucky and having 1% false positives means that there's a much higher chance that actual dedicated players get lucky and falsely accused. That's why I believe it shouldn't be as brief as possible, but be constantly looked at. Like a history of average luck over all flips and over the last few flips, if the player has been playing for some time then the test becomes less brief assuming that they'll be more consistent with what they're doing. They won't randomly enable a cheat so theit test is more forgivìng and catches far less cheating scenarios, but if the history sees a sudden spike of luck then that's a sign to be investigated.
if the false positive rate is high then make the punishment mild like a 5 min ban or something
in addition to lowering the punishments for false positives, (imagine that content creators will not only be more likely to test positive(regardless of true or false), but also that they will have more runs at the test as they do it for a living) you may also find that you want to make a hidden strike system, and use that in there. some cheaters will cheat for a few runs and then play without them, which will mean that it starts to make it harder to catch them, so instead a trick you could use to catch them is to put strikes on a hidden list, if i get 10 heads by cheating a 100% chance, ill probably not go farther, and will return to normal flipping, to throw you off my scent, so you give me a strike for that, and even though its statistically insignificant for someone with thousands of flips, its still an oddity to be noted. as you perform larger oddities it can add larger sized strikes, until a certain number of points you just say "hey bud its pretty clear that youre cheating and then trying to hide it by playing badly afterwards, come in to our labs and do a game and do as well as you do in those good runs if you wanna not be banned" as an example; 10 wins in a row can be assumed to be about a 1/1024 chance regardless of the game, as long as the chance to win is 50%. if you win 10 times in a row, thats 1 point on your strike file, if you win, i dont know, 15 in a row(1/32768 chance), lets say its a 30 point strike (on top of the fact that 15 flips in a row as heads already will count as 6 points on their record, as it is 10 in a row, and 11, and 12, etc), and lets say that if you get 100 points on your record, then we give you the talk. make each incidence stick around for X flips where X is past the 1% false positive rate, and you will see that even so, the majority of cheaters that try to get around simple probability by playing with cheats very little of the time, they still will proc this method of detection.
You can create a model based on their ELO. Then use a standard distribution relative to the elo to preselect outliers. Then you can run a series of tests on these outliers.
@@nou5440 while that would work, the punishment would also have to be annoying
Something like changing the player's name to "I cheated" for an hour, and preventing them from changing the name back for the hour could work
@@lavantant2731 the only thing about that is that it's usually not a flat 50% chance to win or lose. Even with some kind of matchmaking. There are a lot of other factors at play.
1:02 that blob with 0 heads be like me fr
IS THAT THE ECHDEATH FROM TERRARIA?
This guy takes the most simple of topics, makes them easy to understand, and makes learning fun!
I think the main outcome is that it is very difficult to catch someone cheating using statistics. You should use statistics to raise suspicions, but you shouldn't call them a cheater without further evidence. 5% or even 1% is just way too high of a false positive rate for 1000 players.
This is so well put. Can't explain how much I appreciate your ability to slow down and put this into a way anyone can understand.
Compare this to creators like Veritasium (as much as I love him) and they're fairly content to speed through complexities, leaving laymen in utter confusion.
The title made me laugh, because it reminded me of that truly ridiculous (but very entertaining) speedrunning drama around that minecraft guy, dream. Catching cheaters with math is a useful and very fun skill!
There were people involved with catching speedrunning cheaters that said that he probably was not really cheating (on purpose) because his actions didn't make sense if he was.
Regardless, the game was modified (on purpose or by accident). The outcome was way way waaaaaaaayyyyyy too unlikely.
He also misbehaved pretty badly, trying to cast doubts on the statistical methods...
With how low the odds were, even if the methods had high uncertainty the game would still be definitely modified
@@didack1419 That's true, you can't really prove evil intent with math - like whether the game was modified on purpose. You can only show _that_ it has been modified.
@@baguettegott3409 The reason it's decently likely that the game was modified by accident was because he already modified the game to make other content, so it's reasonable that he could have left the modification without realising.
It's weird that he didn't say that from the beginning, because it was a reasonable excuse. I don't know. I guess he thought the mods were against him or something so they wrote a paper blaming him of cheating.
Or he really cheated and thought people would not buy it.
Or he was really sure he had the game unmodified, which is naive.
@@didack1419 he admitted later on that he did purposely cheat.
"It's very easy to forget that assumptions are assumptions and instead just treat them as facts."
Truer words have never been spoken.
This is great. I would like to use this to introduce my kids to frequentist statistical tests and related notions. It also seems like it could be used to make a pretty good illustration of the scientific method if it were retooled just a bit at the end to explicitly state a hypothesis that you were testing and to explicitly discuss the steps (already implied) that you take to comply with the scientific method. Apologies if there is something in your catalog along those lines that I've missed. But I would love it if you could do this. Also, am I right to assume that there will someday be a Bayesian version of this video? Waiting with bated breath. These are great conceptual frameworks to use for education. Thanks so much for doing them.
If you force it on them they’ll hate it 😂
It's kind of fun to watch videos like this after hearing all sorts of similar (yet less entertaining) lectures from a friend that's soon going to have a PhD in Biostats. Can't wait to see what you can pull up for the Bayesian method!
15:56 The froggy hat this blob has is absolutely adorable!
I learned more in 10 minutes of this video than in an entire stats class in university. You have a real talent! Thank you for all the amazing content! I have never missed a video and you have taught me so much
It might be a cool idea to narrow down the blobs after a certain amount of heads, that being if their initial 5 or so coin flips is suspicious and what a cheater might have, place them into a group for further testing, which would make it not bother so many of the innocent blobs while at the same time providing more accurate results
Don't forget that (even at 0.75) only 24% of cheaters get 5 heads in a row... so if you use 5 heads then your method can NEVER catch more than 24%, even if subsequent tests are perfect. If you widen the net to include 3 out of 5 you get the 80%+, but also half the fair players. Ultimately you will need about as many iterations to get to your result... in fact without doing the math I'm guessing you cannot reduce the total number of flips in this way, but would be interesting to see for sure.
@@grenvthompson 13:00
@@fabio_fco Did you link this timestamp with no additional context as an arrogant way to "educate" the person you replied to? Or maybe you did this to showcase future reads, like me, the moment in the video which helps support Gren's post.
JJ's idea is great - what's if we cut the group after a certain threshold? We stop watching the people who are doing poorly because they couldn't possibly be cheating. This is a good idea because our next group of players will have more cheaters than fair players, and we can increase our "cheater catcher" threshold for this second group because we know for a fact there's more cheaters.
Gren replied with a confirmation that cutting the poorly performing participants will not INCREASE our True Negatives or True Positives - it will only help narrow those down. So if we were to focus on the 5 out of 5 table, we'll have a group consist of many cheaters and a few fair players, but we've already excused a few cheaters because they were performing "poorly" (enough).
I like both of these ideas, but feel like 3 out of 5 is too inclusive. In an even group of 1000 blobs (50 : 50 ratio), performing this split will leave us with 700 blobs in a 36:64 ratio. That's 36% of the 700 are fairplayers and 64% are cheaters. With this new group, we can perform a less forgiving threshold to the fair players that catch more cheaters, but that's still a lot of fair players being caught.
I think if we increase the initial threshold (instead of 3 out of 5, maybe 4 out of 7) and we create a less forgiving second threshold (11 out of 16, for example, I just randomly picked) - we can try to fit within our 5% and 80% rules. Notably, we if we combine the maximum number of flips for these two groups, we'll get 23 flips. I'll do the math later, but we could already pass our test with 23 flips - so, as Gren mentioned, this is only a valid strategy if we can use it to reduce the number of total flips.
@@grenvthompson or for the first round, 3/5 heads leads to further analasys. or use the first 10 flips, and if someone gets 7/10 heads, then investigate further.
guys can we have a f in chat for those 3% cheaters getting 5 tails , probably purposefully did it like that cuz they depressed
It's like walling in a shooter game but still getting shit on. lol
I did actually remember the 75% assumption. I was thinking the final test would be to give a more realistic distribution of how cheating works, i.e. not the same for every blob. Of course it still averages out, but the ones cheating 'less' would be harder to catch.
Omg I've watched people treat assumptions as facts and screw themselves.
Great video! Love all your content.
Your out-of-the-box thinking and unique perspective turned an otherwise mediocre presentation into a fantastic one *johnson spy* . You did a good job of catching the mistakes and keeping us from wasting time and by taking the wrong path. Your attention to detail really sets you apart from the crowd. Great work! Jack, Your great work has resulted in tangible, beneficial results to me. You’re a force to be reckoned
Don't go through your partner's phone. Not only is it a huge violation of your partner's autonomy but it's also just plain pathetic. Besides what will it get you? You'll either ruin your relationship by finding out they're cheating or ruin your relationship because they weren't but you still didn't trust them and went behind their back. If you feel the need to do this *JohnsonSpy* all this I saw it's just heart wrecking.
@@samuelbennett6981 are we still talking about blobs here?
Wtf why are there so many of these bot comments on this video????
How tf does this have so many likes lmao
@@saturnsandjupiters358Probably because it has ‘catch a cheater’ in the title so the bots are trying to capitalise on people who want to know if their partner is cheating
I struggle with statistics class and though this video does not directly tackle my questions, I really appreciate the time you take to explain them clearly, especially regarding the false positive ratio. I hope to see more blobs soon
I think the best solution is to compromise by having a higher false positive test, and then do another, more rigorous test with the smaller group of accused blobs. That was you can end up with a lower number of false positives overall, but still be expedient
Medical tests will sometimes do this - and because of the different composition of the resulting group of accused blobs, even just running the test a second time can yield pretty good results.
My head hurts so much after few flips with those False Positive and False Negative . My mind went blank. Have watch again, and again and again. . . To infinity
If this was a real world example, I'd expect the cheaters' coin's bias rate to vary from person to person. I wonder how much more complex that'd make the math.
Thanks for the video.
Another variable would be ratio of cheaters. What if it's only 10%? More false positives for one?
If there are alot less cheater (like less than 5%) you would end up catch more fair player than cheater even though the percentage is still the same.
@@koharaisevo3666 A (much) larger sample set is then required, but conceivable.
Honestly loved the little lesson about assumptions. So easy to forget, yet so impactful on our perception.
I've literraly had my propability and statistics exam yesterday. It was sooo boring to learn. However i love your videos and how they spark my motivation so i can implement these topics to programming. Thanks :D
hah I finished that course last week! (5 days ago)
i love how you make math so accesible and understandable to everyone!
8:00 I think the term you're looking for to describe the false positive rate is specificity - in statistics the specificity of a test is the probability that a test negative case represents a true negative, its inverse would be the false positive rate as defined here (this example test has a specificity of 97.6%)
I love your careful explanation of the false positive rate and how you couched it in your experience of mixing up the terminology. A great tool for getting us to pay attention and notice the nuance without making us feel stupid!! Absolutely superb teaching!
At the end of the video i was actually shocked that 22 minutes had aldeady passed. You made such a good job at explaining this topic in a very entertaining and instructional way. I learned a lot and you gained a new subscriber!
19:12 F for the blob who lost their coin (bottom, center a bit to the left). He will never prove to anyone that his coin just fell though the floor and his life is ruined by realization of living in a simulation.
LOL i was going to comment that
If you’re wondering, it’s the red beanie blob at position (12,4).
Also at 19:23, 2 below the previous blob 😭
Man I had a feeling you were gonna switch up the cheater odds with the mystery test, also great video mate I love these math lessons.
I need a looping gif of the blob flipping coins
same they're so scrung
Can’t believe you uploaded what is essentially a really interesting intro to statistics lecture
I know this is a few years late, but this was a phenomenal demonstration of intro and intermediate concepts of statistics. Absolutely plan on having my students watch this to generally understand the broad statistical test that medical articles/studies use for statistical significance and analysis. Thank you!
When you're a mathematician but but yo man being suspicious:
yo
sus
I've just watcher 3Blue1Brown's video about "probability of probabilities" and I think this concept would fit very well for this problem. We can plot a probability density function of "coin's head probability". And then a player will flip the coin until we are satisfied with the graph and comfortable to label them.
Had a lot of fun with this one! I just took AP Stats this year so this was a nice refresher to what I learned there. I might be wrong on this (and if someone who knows more about stats than a wee high schooler can correct me in that case), would it be better to design a Chi-Squared GOF test for this? Since we don’t know the probability of heads on a biased coin, I feel like it might be slightly more effective even if a tiny bit more complex. Each blob would need to flip at least five times (although if this is being done on a calculator, it should be an even number since decimals make them grumpy for some reason), and we make the expected values half that number. That was my first thought when Primer asked us to make a test, and I know they can be super easy if done on a calculator. I could also be really stupid and that’s what the next video will cover 😂
If i recall correctly, chisq GoF tests are primarily used to determine whether or not our experimental results differ from our theoretical results (and whether it is reasonable then to suggest that an alternative hypothesis is true)
In this problem, we already know that there are cheaters abound in our experiment, so our results should, no matter what, be skewed towards having a higher number of heads than tails.
@@derpsalot6853 That’s a good point! GOF might not work then, especially since I think it requires a list with more than one value, which obviously doesn’t work in this case. Chi-Squared could possibly still be used if a single contribution is calculated for a single blob and compare that to known critical values, but that needs a chart with all those written, and I doubt the tester in question would feel like lugging that around!
Thinking about it, this video also demonstrates why math isn’t always the answer, because perfectly spinning the coin around the center on its side would always reveal if the coin was fair, because the weighted ones would wobble.
Never seen a more clear explaination of hypothesis testing in my whole life. Thank you.
could you work with 2 thresholds, 1 where if you get below it you're assumed to not be a cheater, and 1 if you get above it you are assumed to be a cheater; if you fall between them, additional coins will be flipped.
This way, the amount of coins that need to be flipped can be lower for the very suspicious and very unsuspicious groups, and only the more uncertain group needs to flip additional coins.
Came here to say this, but thresholding would be more difficult as we have to figure out an "optimal" way of distributing the 80% true positive and 5% false positive
Absolutely, and this is indeed how we design tests in the real world. The simplest example would be, if we have someone who hits 16 heads in a row from the start, we don't need to continue with any more flips :)
Sequential analysis is a whole other can of worms, especially in the frequentist realm.
@@harrytsang1501 I agree that this is harder to optimize, but you don't have to go very far to make it better than the test mentioned in the video.
at 19:12 you can see a coin fall off the map XD
You add a second step retesting the coin after accusing a player of cheating to pull false positives lower at the 16-17 threshold. The chance of hitting 16+ more than once is so small that it would essentially be a 99% chance they were a cheater, and if it was just a lucky go there would be about a 2% chance you would falsely accuse someone.
Every video, we’re uncovering the culture and customs of these blobs. I guess in this one, we learn they’re gamblers by nature.
Great review of some of the content I learned in statistics class in college! Love it!
Years have passed, but the videos still have the same kind of value in them. That's really impressive
I love the idea of these simplistic blobs becoming gambling addicts, relegating the detectives to statisticians
I really enjoyed seeing a lot of the stuff again I learned back in middle and high school in mathmatics. Statistics can be tricky sometimes, but they are so much fun, too.
Such an essential skill if you are interested in any kind of science.
Also fun to learn the english terms as well, since I actually did not know a handful of them.
Yup, the most treacherous thing about statistics is that every instance is independent from all others.
To take the coins, yes they would have a 50/50 chance for either side (or rather both slightly below 50% since a coin is a cylinder and has three sides) and if you flip it a million times, each side should come up equally, but each individual flip happens independent of the others (each flip doesn't change the odds for subsequent flips), so having 1000 heads followed by 1000 tails is improbable, but not impossible.
I just realized. We dont have to technically make assumptions for rates of cheater coins to make an accurate test that fits our criteria. You could just make a test that tests for fair players since we know the rate at which a non cheating blob will get tails. Namely 50% of the time. So instead you could just design a test that labels a blob as fair more then 80% of the time, and only labels cheaters as fair less then 5% of the time. I mean, it would require more data then testing for cheaters, but you could get something fairly accurate.
This is in fact the most commonly used form of frequentist tests, where you only check against the “null hypothesis” and nothing else is explicitly used in the maths. Implicitly, there is still of course some true effect size, and your test has some power to detect it, depending on your chosen sample size. If you care, you can calibrate that sample size to detect effects of at least some given size. In practice, especially before people rang alarm bells about the “replication crisis”, you sometimes saw little care for sample sizes, beyond convenience or custom, as if the researchers were unaware how this limited their results’ possibility and accuracy.
Well it just depends on your starting point.
You could start with the assumption that you're going to take 40 coin flips, and then calculate the power of your test at p=0.05.
Or you can start with an effect size and a p-value and determine the sample size you need.
Or, you can take a sample size and an effect size and calculate a p-value.
Unfortunately you still have to make the same assumptions about the effect size (probability of cheater coins getting heads) even with this test, as you still need to provide an assumed effect size to determine whether you test is labeling cheaters as fair 5% of the time. In fact, as the shape of the frequency distribution of cheater coins getting heads is dependent upon the effect size, and the probability of a given test not catching the cheater is dependent upon this distribution, it is not possible to provide the probability of not catching a cheater without first using some value for effect size to generate the frequencies of cheater coins getting heads.
17:11 I just love how some coins fall into the void.
Title goes with your user picture so well
Outstanding video! Assumptions, models, technical knowledge, evaluation, and beautiful communication - a mathematical masterpiece!
Love watching the coins fall off of the tables in the simulations ❤
Are you an actual teacher aswell? These video’s are incredibly comprehensive and engaging. I love this!
Probably not, knowing how TH-cam works
I want to see what this guy's studio looks like. He does this with just a laptop?! Unbelievable work.
I love these illustrations :)
And yes, I remembered that assumption of 75%, so I expected the results to be skewed in that regard in the final example. Maybe it was just because that was a question that bugged me from the very beginning, before you even mentioned the 75% assumption, how one would go about figuring out that threshold
Salute to the 4 poor blobs, that genuinely got a lucky streak of coin tosses, but were accused of cheating anyways ;_;7
To get the threshold, you just gotta do a lot of flips on some blobs until you get someone who is definitely cheating. Maybe create a watchlist of the blobs who are most suspicious (won too many competitions) and pick those. If you flip a coin like 1000 times and the guy gets heads 550 times, there's a very high chance they're cheating. You also have a very good approximate of how rigged their coin is (around 55% in this case). Find a couple of cheaters and see if they also land heads around 55% of the time or if different cheaters rig their coins differently. If you find out it's the same for most cheaters, you're set. If it turns out that different cheaters are rigging by different amounts, you would have to adjust your test to catch the most obvious cheaters. Logically, it's more important to rid the blob world of a cheater who rigged his coin to get heads 100% of the times than one who rigged their coin to get heads 51% of the time. Lots of video games take this approach. Cheaters have to use subtler cheats to avoid getting banned. Can't enable god mode or something these days in most games.
In the real world those blobs would get a closer examination. Like the anti cheating blob checking if their coins are legit physically. You should never face real punishment on a 5% threshold, but it's reasonable to face an inconvenient closer examination on that basis.
Lovely video. Missed opportunity to examine what happens when 99% of the blobs are playing fairly. What is the probability that a blob who is accused is a cheater?
I think in these case the overestimated side becomes the positive one, you will get like 95-5 instead of 99-1? That means that if you have no true cheaters you will still get a result of 95-5 even if you actually have 100-0
the probability is the same no matter the number of cheaters
@@PlaXer Not the case! When half the blobs are cheating, an accused blob is usually guilty. But when only 1% of the blobs are cheating, then even a fairly "good" test (i.e. one with low Type I and Type II error rates) will accuse more honest blobs than cheaters. This is why, famously, women in their forties who get positive mammogram results (with no family history) usually do not in fact have breast cancer -- it's not that mammograms are terrible, it's just that breast cancer is a rare condition in that age group.
I had previously misinterpreted your question. If there are few cheaters I guess almost all the accused are not, so the probability gets quite low, also considering that many cheaters will be overlooked. Odds are you would have more false positives than actual cheaters. Like, times more
Can we get a behind-the-scenes video walking us through how you animate these videos? They're so incredibly well done.
Probably Blender (for the blobs) and any usual professional video editor.
"It's very easy to forget that assumptions is an assumptions and instead treat is as a fact", it's so true
Ok this is amazing...the assumption of cheaters having 75% chance was the most fair. 50% is the norm and 100% would be too stupid from the cheaters, so taking the middle ground (75%) would be smart. Yet that assumption still lead 80% of the cheaters to get away
And yet, one COULD argue that the test still accomplishes its goal! Kinda.
As a cheater, you want to get as many wins from your coin as you can. BUT, if you're caught cheating, you can't play anymore, and thus cannot win.
Now, assuming universal testing, and because the test is based directly off of how many "wins" you get, the surest way to play and win, is to make sure you don't win TOO MUCH, to avoid being caught. This introduces a sort of "soft-cap" to how much cheating anyone can actually do.
So, although more cheaters get through, they still have a notably deflated win-rate, meaning that cheating is less rewarding, even though they aren't caught.
Now, is it still a favorable ratio of "Cheated Wins" being prevented? IE, cheaters caught and thus annulling all of their dishonest wins, versus the cheaters who got in anyway but the number of dishonest wins they had to give up to do it?
I dunno, sounds like a LOT of math to me! I'll pass on that, but still a fun factor to consider.