Great game. My only wish is that the pitchers had righty/lefty splits. Fantastic game, though.
Thanks for putting this together. It's very informative.
Having created several sports games, I think the best we can do as designers is to be true to the type of player or team on each card or roster sheet. In the end, even if we could replicate each and every situation, the variability inherent in throwing dice or drawing random numbers will break any ability to recreate a moment exactly. Then, when the first variation happens, it throws off everything after it.
Even if we could recreate all conditions and outcomes on the field, there is gamer variability: each gamer will choose differently how to play the game.
Just think of all the things that could have been different in one week of baseball for one team. Then think about how that affects who is on the mound, in the field, or in the box. It's mind-boggling.
So I quickly learned that relative accuracy is worth shooting for. That said, how do you distinguish between how a player performed and who he is? How do you make a pitcher with a 3.80 ERA different from one with a 4.00 ERA? They are essentially the same pitcher. Either one could easily have switched places with the other after one bad or good start.
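To put numbers on the 3.80 vs. 4.00 point, here is a rough sketch assuming a hypothetical 180-inning season. Since ERA is earned runs per nine innings, the gap between the two pitchers comes to only a handful of earned runs.

```python
def earned_runs(era: float, innings: float) -> float:
    """Earned runs implied by an ERA over a given number of innings (ERA = 9 * ER / IP)."""
    return era * innings / 9.0

IP = 180.0  # hypothetical season workload, not any real pitcher's
gap = earned_runs(4.00, IP) - earned_runs(3.80, IP)
print(f"{earned_runs(3.80, IP):.0f} ER vs {earned_runs(4.00, IP):.0f} ER, gap {gap:.1f}")
# prints: 76 ER vs 80 ER, gap 4.0
```

Four earned runs over a full season is roughly what one bad start can add.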
I think it's good to distinguish between absolute and relative accuracy. Gamers must realize that absolute accuracy is not really possible, if for no other reason than the statistical variability you mentioned.
Even if it were possible, is the goal to recreate the game exactly as it was? That would be a historical recreation, not a game. We love games because of the unknown, and I think many gamers get that.
I think the only problems arise when there are consistent, glaring discrepancies from reality that are the fault of the game engine, not user error or variability.
Good stuff. It's enjoyable to hear this sort of breakdown.
Thank you so much for your comment. I agree with everything you said. It's nice to find someone who is of a similar mind.
If one based a game on sampling without replacement, the sequences would differ from real life. For example, if one had all 400 PA results of a Bonds and drew from those results, they wouldn't come out in the same order as in real life, and the same goes for pitchers. Covering the parameters of the sampling method (pitcher/batter) is a topic for another day.
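A small sketch of that idea, using a made-up 400-PA line rather than any real player's results: drawing every result exactly once (sampling without replacement) preserves the season totals, while the order comes out different from real life.

```python
import random
from collections import Counter
from typing import List, Optional

def draw_season(outcomes: List[str], seed: Optional[int] = None) -> List[str]:
    """Return every PA outcome exactly once, in random order (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(outcomes, k=len(outcomes))  # does not modify the input list

# Hypothetical 400-PA season; the counts are illustrative only.
season = (["1B"] * 60 + ["2B"] * 20 + ["HR"] * 30 + ["BB"] * 80
          + ["K"] * 60 + ["OUT"] * 150)

replay = draw_season(season, seed=1)
print(Counter(replay) == Counter(season))  # True: identical totals
changed = sum(a != b for a, b in zip(replay, season))
print(f"{changed} of {len(season)} positions differ from the original order")
```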
Regarding ERA, it correlates strongly with how a pitcher pitched with men in scoring position. That correlation can't be avoided, since 67-70% of all runs are scored with RISP. Hitting with RISP is a smaller sample with a higher chance of variance, but it has a large effect on win probability, so it can't be ignored where it applies. If it is ignored, you get the complaints about the 1969 Mets having trouble performing as they did in real life, because of the big difference between their hitting with RISP and their overall average. Their real-life RISP performance is missed more often than it is reproduced, so some of their win probability is lost. So, for accuracy, the variance of the smaller RISP sample is a better representation of the 1969 Mets.
Interesting analysis looking at the basic probabilities.
I love the game. Especially the card quality if ordered from the company.
A fine analysis. Something I've wanted to see for a year. I would think you'd want to do it again with something more like 25 or 30 random pitchers and their averages, then analyze a few batter cards against those averages.
Thanks for your comment. Even though I had/have no intention of making my own cards, this specific analysis started out to see if it was possible to do so. (I think the answer is "maybe".) I decided not to spend a whole lot more time on the analysis because I concluded that the cards appeared to be directionally accurate. You may also want to take a look at my two other videos that I did on Payoff Pitch, where the primary goal was to see if the pitcher cards are accurate.
I've played 80 games now with Payoff Pitch, and for accuracy, flow, and ease of play, it is the best ever. I am a 60-year-old man who started playing APBA in 1978 and Strat in 2012. I have enjoyed Payoff Pitch to the point where I now have the 1963, 1969, 1976, and 2022 seasons and the fictional 1983 season. I really enjoy how well the cards are made, how nice they look, and how fun the game is to play. Do yourself a favor: grab your favorite year, give it about 20 games, and you will be hooked for life.
It is on my project list to do the Pittsburgh Pirates 1972 replay. I think once I get over the learning curve of baserunner advancement and a few other aspects, I’ll be good to go. My current thinking is to use physical dice for the pitcher-batter interaction and the FACs for everything else, to the extent possible.
Thanks for the overview, as I know nothing about this game.
Thanks. It's a pretty good game, especially the pitcher-batter interaction. I roll the dice for the results but have been relying on the Fast Action Cards for baserunner advancement and Error resolution.
Steve, what do you think about the outfield arm ratings for Payoff Pitch? Apparently they are based on how often a runner tried to take an extra base on the fielder. To me, basing them on actual assist data (with reputation factored in) would make more sense. If you have a great arm, nobody will try to take an extra base on you, and he would rate that fielder a 10 because of that. For instance, Cesar Geronimo (in 1976) was known to have a cannon, and he rates him a 10. Mickey Rivers was known to have an awful arm, and he also rates him a 10. That makes no sense.
In fairness, I think outfield arm is the hardest thing to rate. I haven't studied POP arm ratings, but I have heard that there are at least a few questionable defensive ratings in the game. While there is some logic to the POP approach, as you pointed out, it can lead to some unusual ratings. I think part of the issue is small sample size. When making my own cards from other sims, I'll either use the Strat-O-Matic rating for reference or rely on outfield assist data, though using assist data isn't perfect. In summary, rating arms based only on statistics is problematic, as you suggest. Unfortunately, when you are rating literally a thousand players for a season, using stats may be your only choice.
Some of the best things for me while playing with PoP are the automatic options for stolen base attempts and sacrifice bunting.
The differences could also be due to how you played with the Brewers, because the smaller the project, the more susceptible the gamer is to the dice. A bigger project, like a full MLB replay, would give the "dice" more of a chance to go through hot and cold spells and even out. Also, with any replay, some players will overperform and some will underperform, and the only way to see if the set is accurate is to compare the averages (like H/9, BB/9, SO/9) against the MLB totals.
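Comparing a replay against real totals on H/9, BB/9, and SO/9 is simple arithmetic. A minimal sketch, assuming box-score innings notation (where .1 and .2 mean thirds of an inning); the totals below are purely hypothetical placeholders for your replay and the real season.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings ("200.1" = 200 1/3 IP) to outs recorded."""
    whole, _, frac = ip.partition(".")
    thirds = int(frac) if frac else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"bad innings value: {ip!r}")
    return int(whole) * 3 + thirds

def per_nine(count: int, outs: int) -> float:
    """Rate per nine innings; nine innings = 27 outs."""
    return 27.0 * count / outs

def rate_line(hits: int, walks: int, strikeouts: int, ip: str) -> dict:
    outs = ip_to_outs(ip)
    return {"H/9": per_nine(hits, outs),
            "BB/9": per_nine(walks, outs),
            "SO/9": per_nine(strikeouts, outs)}

# Hypothetical totals for illustration only.
replay = rate_line(hits=1450, walks=520, strikeouts=1010, ip="1450.0")
actual = rate_line(hits=1400, walks=500, strikeouts=1050, ip="1445.2")
for stat in replay:
    print(f"{stat}: replay {replay[stat]:.2f}, actual {actual[stat]:.2f}")
```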
Thanks for your comment. In a small simulation, even a few thousand plate appearances, statistical variability will certainly be a factor. I believe you saw my follow-up video where I showed how you can determine the outcome of any pitcher-batter matchup with direct formulas. At least that approach takes the “dice” element out of it. From all I’ve read, POP is pretty accurate. The designer is a smart guy and has put a lot of thought into the game. I saw a post where a guy did a replay of all of Yaz’s plate appearances for a season. The stats came out pretty close. Thanks for watching and subscribing.
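The size of that dice effect can be estimated without rolling anything. A sketch that treats each at bat as an independent trial with a fixed hit probability (a simplification that ignores matchups, parks, and game situations) shows how slowly the noise shrinks as the sample grows.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a rate like batting average after n independent trials (binomial model)."""
    return math.sqrt(p * (1.0 - p) / n)

TRUE_AVG = 0.270  # hypothetical "true" hit rate
for n in (100, 500, 3000, 50000):
    se = standard_error(TRUE_AVG, n)
    # Roughly 95% of replays should land within about +/- 1.96 standard errors.
    print(f"n={n:>6}: {TRUE_AVG:.3f} +/- {1.96 * se:.3f}")
```

Because the error falls with the square root of n, quadrupling the number of at bats only halves the spread.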
I appreciate your analysis. It’s a fine game. Very enjoyable. Relative accuracy is more important to me than anything else. In other words, are the game events plausible to what might have happened in real life? It would be foolish to try and make a game that replicates the real season exactly, not to mention impossible. And more importantly for me, how is the player experience? If a game is going to grind me down with multiple rolls and charts for every at bat then I’m not going to bother with it no matter how accurate it might be.
Well said. Although I tend to do a deep dive into the math behind these games, in the end all I want to know is whether they are directionally accurate enough in an absolute sense to produce plausible results (to use your words). And I concur about the importance of relative accuracy. Thanks for your comment.
The largest issue I have with this game is the limited variability on the pitcher's card: you're reading off that card's single column with a roll of two six-sided dice every at bat. On Steve Carlton's card, over half the results are the same. That's too much predictability for me, especially when the wheelhouse is a 4 or 9 on some pitcher cards; I had 4 HRs off one pitcher in one inning. They need another column of results, or a different way of reading the two d6, like 11-66, to allow more randomness and less predictable results off the pitcher's card. The batters' card allocations are fine; they're based on actual data and displayed that way. I don't know what formula is used for the pitchers' cards, but in my opinion it doesn't work.
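The single-column 2d6 versus 11-66 question can be checked exactly by enumerating all 36 dice combinations instead of rolling, in the spirit of computing matchup outcomes with formulas rather than simulation. The card below is hypothetical, not an actual Payoff Pitch card; it only illustrates how a single 2d6 column concentrates outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

DIE = range(1, 7)

def two_d6_sum_probs() -> dict:
    """Exact probability of each 2d6 total (2..12); the totals are not equally likely."""
    counts = Counter(a + b for a, b in product(DIE, repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}

def d66_probs() -> dict:
    """Read the two dice as tens and units digits (11..66): 36 equally likely results."""
    return {10 * a + b: Fraction(1, 36) for a, b in product(DIE, repeat=2)}

def outcome_probs(card: dict, roll_probs: dict) -> dict:
    """Exact chance of each result on a card, with no dice rolled."""
    totals: Counter = Counter()
    for roll, p in roll_probs.items():
        totals[card[roll]] += p
    return dict(totals)

# Hypothetical single-column pitcher card keyed by 2d6 total.
HYPOTHETICAL_CARD = {2: "HR", 3: "BB", 4: "HR", 5: "OUT", 6: "OUT", 7: "OUT",
                     8: "OUT", 9: "1B", 10: "K", 11: "K", 12: "2B"}

for result, p in sorted(outcome_probs(HYPOTHETICAL_CARD, two_d6_sum_probs()).items()):
    print(f"{result:>3}: {p} ({float(p):.3f})")
```

On this made-up card, the four middle totals (5 through 8) alone produce OUT 5/9 of the time. With an 11-66 reading, every slot carries exactly 1/36, which gives a card designer finer and more even granularity.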
Verlander allowed 4 HRs in one inning in 2016.
@@wombat7366 Funny how playing just ONE game of any simulation is enough for people to claim the game is "broken."
No righty lefty for batter and pitcher??
The cards come in two versions: with and without L/R splits. Note, however, that only the batter cards have splits, not the pitchers.
The game needs to be proven by running 20 to 30 replays on a PC. Of course, this game isn't a PC game yet. The 50/50 model, or a game based on the influence of the pitcher faced, is about as accurate as one can get. For example, say Carlton gives up a .215 average in a .250 league. Every time he faces a batter, he would reduce that batter's normal season average by .035; in this case, Monday's average would drop by .035. This second method isn't exactly 50/50, but it always applies the pitcher's strength or weakness in every PA. Plus, it ensures his pitching performance can't be skewed, at least not by dice rolls on loaded batter cards. I haven't seen this in their game design so far.
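The pitcher-influence method described above is an additive shift. A minimal sketch, assuming batting average is the only stat adjusted; the .215 and .250 figures come from the comment, while the .280 batter is hypothetical.

```python
def pitcher_adjusted_avg(batter_avg: float, pitcher_opp_avg: float, league_avg: float) -> float:
    """Shift the batter's average by the pitcher's distance from league average on every PA."""
    adjusted = batter_avg + (pitcher_opp_avg - league_avg)
    return min(max(adjusted, 0.0), 1.0)  # clamp so the result stays a valid probability

CARLTON_OPP_AVG = 0.215  # from the comment
LEAGUE_AVG = 0.250       # from the comment

for batter_avg in (0.220, 0.280, 0.330):  # hypothetical batters
    print(f"{batter_avg:.3f} -> {pitcher_adjusted_avg(batter_avg, CARLTON_OPP_AVG, LEAGUE_AVG):.3f}")
```

Every batter loses the same .035 against this pitcher (so the .280 hitter becomes .245), and a league-average pitcher leaves each batter's average unchanged.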
Hi. Thanks for your insightful comments; they are much appreciated, as I'm always trying to learn more. By replays, I assume you mean season replays. In any case, I am familiar with the math behind the 50/50 model and agree with your comments about the PoP engine. Based on what you've said, I believe you have seen my other videos on PoP. In one of them (th-cam.com/video/-_PHvLSDHQM/w-d-xo.html), I figured out that a simulation wasn't required to compute the expected outcomes of a specific pitcher-batter matchup. Regarding the game engine, my primary conclusion was that there may be more going on behind the scenes with the design and formulas than I could discern. The developer has not shared his formulas, but I respect that, because otherwise guys like me would be tempted to build our own cards. One other thing I found was an online article where a guy (Ron Bernier?) did some sort of Yastrzemski replay analysis. His results were directionally accurate, although some specific stats were a bit "out of whack".
@Steve Etzel Another thing missing from the game design is the actual importance of hitting with RISP. It's not a huge weakness or discrepancy, but it matters for win probability. For example, a team like the 1969 Mets, which hit so much better with RISP than in other at bats, gets hurt when RISP hitting isn't modeled separately. Teams with the most hits with RISP in a game win 70% of the time, and roughly 67-70% of all runs are scored with RISP. The other 30% of wins come from solo homers or homers with only a runner on first. That gap in the game design is another weakness for the 1969 Mets, who were not a homer-laden team.
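One way an engine could model what this comment asks for is to pick a different hit rate when a runner is in scoring position. The sketch below is hypothetical (not how Payoff Pitch works), and the split numbers are made up to resemble a team that hit far better with RISP than overall.

```python
from dataclasses import dataclass
from typing import Tuple

Bases = Tuple[bool, bool, bool]  # (first, second, third) occupancy

@dataclass(frozen=True)
class BatterSplit:
    overall_avg: float
    risp_avg: float  # average with a runner on second and/or third

def risp(bases: Bases) -> bool:
    """Runners in scoring position means a runner on second or third."""
    return bases[1] or bases[2]

def hit_chance(batter: BatterSplit, bases: Bases, model_risp: bool = True) -> float:
    """Use the RISP split when it applies (hypothetical engine rule)."""
    if model_risp and risp(bases):
        return batter.risp_avg
    return batter.overall_avg

clutch_team = BatterSplit(overall_avg=0.242, risp_avg=0.280)  # hypothetical numbers
print(hit_chance(clutch_team, (False, True, False)))          # RISP split used
print(hit_chance(clutch_team, (False, True, False), False))   # overall average used
```

A real engine would also need to lower the non-RISP rate so the season total still matches the overall average; this sketch skips that step.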
If we take Strat-O-Matic as an example, the 50-50 system is inadequate in many ways. On the "basic" side, batters who never hit a home run all season can easily hit several off the pitchers' cards during the season. Even at "super advanced," there is no accounting whatsoever for the individual matchup being accurate as to strikeouts, walks, or extra-base hits (except for making some hitters "Weak" so a home run becomes a single off the pitcher's card). A 50-50 system is too "flat" without some cross adjustments. That a 50-50 system can average a pitcher's and a batter's batting averages isn't saying much.
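The first point follows from the arithmetic of a coin-flip system. A simplified sketch, assuming a fair coin decides which card is read and ignoring Strat-O-Matic's actual card construction, ballpark ratings, and the "Weak" hitter rule; the 3% pitcher-card home-run chance is hypothetical.

```python
def coin_flip_rate(batter_card_rate: float, pitcher_card_rate: float) -> float:
    """Chance of a result when a fair coin picks which card to read."""
    return 0.5 * batter_card_rate + 0.5 * pitcher_card_rate

def expected_count(rate: float, plate_appearances: int) -> float:
    return rate * plate_appearances

# Hypothetical: a batter whose own card has no home runs, facing pitcher cards
# that average a 3% home-run chance, over 400 plate appearances.
hr_rate = coin_flip_rate(0.0, 0.03)
print(f"expected HR: {expected_count(hr_rate, 400):.1f}")  # prints: expected HR: 6.0
```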
@somguy3872 The super advanced game addresses 50/50 anomalies in the PC game and some in the C & D game. If a game isn't 50/50, then to address anomalies it has to take from one side and give back to the other. If this isn't done correctly, it too will have flaws and skewed data. As I said before, the pitcher cards could use 4 columns and the batter cards just 2. In that case, the batters would have cards loaded with hits on just two columns. That is a 67-to-33 system, but it requires an adjustment in the formulas to obtain the same accuracy as a 50/50 system. One could have 5-column pitchers with 1-column batters, but then some batter performances would depend on the luck of getting more of their performance from the pitcher card rolls, which is not good. Now batter A, a .400 hitter, is at a disadvantage in repeating his performance compared to batter B, a .250 hitter. If a game gives pitchers more control over the non-BABIP data, then the batters have to control the Ks, BBs, and HRs when the result goes to their card. However, there will still be variance in getting the right pitcher matchup against the right batter at the right time in the rolls.
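The 67-to-33 and 5-to-1 cases can be made concrete. A simplified sketch, assuming the batter faces a league-average pitcher card and that a column share is simply the probability the roll is read off his own card (real games layer more adjustments on top). As his share shrinks, the .400 hitter's card must carry more and more hits, until at one column in six it would need more than a 100% hit chance.

```python
def matchup_rate(batter_card: float, pitcher_card: float, batter_share: float) -> float:
    """Chance of a hit when the batter's card is read with probability batter_share."""
    return batter_share * batter_card + (1.0 - batter_share) * pitcher_card

def required_batter_card(target: float, league: float, batter_share: float) -> float:
    """Hit chance the batter's card must carry to hit `target` against a league-average pitcher card."""
    return (target - (1.0 - batter_share) * league) / batter_share

LEAGUE = 0.250  # hypothetical league batting average
for label, share in (("50/50", 1 / 2), ("67/33", 2 / 6), ("5 vs 1 columns", 1 / 6)):
    for target in (0.400, 0.250):
        need = required_batter_card(target, LEAGUE, share)
        note = "" if 0.0 <= need <= 1.0 else "  <- impossible, exceeds 100%"
        print(f"{label:>15}: {target:.3f} hitter needs a {need:.3f} card{note}")

# Sanity check: plugging the required card back in recovers the target average.
print(f"{matchup_rate(required_batter_card(0.400, LEAGUE, 1 / 3), LEAGUE, 1 / 3):.3f}")
```

The .250 hitter needs a .250 card under every split, so only the stars are squeezed by a small batter share.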
@@johnnysmoke612 Essentially, in Strat, one (figuratively) flips a coin to decide whether to read off the batter or pitcher card. And then all you get is a result of how the batter (or pitcher) did against the league average pitcher (or batter). Strat may be statistically the most accurate game out there (or so claimed by its fans), but it does not try to do much at all. It is a very flat system.
Other games, which do try to capture some meaningful interaction, make the batter-pitcher interaction much more meaningful, even if they aren't as statistically accurate (let's just say).
Not a good thing to print the fringe players. Just look at Cesar Cedeno for the 1985 Cardinals.
Yes, if I were printing from the PDF, I would not bother with the fringe players either. When you buy the pre-printed cards, you get everybody.
Yeah, I liked how Statis Pro had their fringe players on a sheet of paper. I like having all players available, so it worked great for me.
In situations like this, you can put a hard stop on ABs (76) in season replays, or give a 17% chance to play using percentile dice in a single-game replay. Easy. I'd rather have all the fringe players than not have them.
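Both limits from this comment are easy to apply at the table or in a script. A minimal sketch; the 76-AB cap and the 17% figure are the commenter's, and the function names are hypothetical.

```python
import random

AB_CAP = 76        # season-replay hard stop on at bats (from the comment)
PLAY_CHANCE = 17   # percent chance to appear in a single-game replay (from the comment)

def can_play_season(at_bats_so_far: int) -> bool:
    """Season replay: the player stays available until he reaches the cap."""
    return at_bats_so_far < AB_CAP

def can_play_single_game(rng: random.Random) -> bool:
    """Single-game replay: roll percentile dice (1-100); 01-17 means he plays."""
    return rng.randint(1, 100) <= PLAY_CHANCE

rng = random.Random(2024)
games = sum(can_play_single_game(rng) for _ in range(10_000))
print(f"available in {games} of 10,000 simulated games")  # about 17% on average
```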