As always, Steve, nice analysis of pitching in a sim. Enjoyed it.
The largest samples for which events runs score on:
Era: early '70s to early '90s
Single 34%
Homer 29%
Double 15%
Generic Out 13%
Triple 4%
Error only 2%
Etc. 13% small sample events
RISP 78%
Non-RISP 22%
Runs scored on a given play:
1 - 64%
2 - 26%
3 - 8%
4 - 2%
Home run effect, by number of runs scored on each HR event:
R (runs scored on the HR) - R% (share of all runs scored on HRs) - HR% (share of all HRs)
2 - 37% - 29%
1 - 36% - 57%
3 - 22% - 11%
4 - 6% - 2%
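The R% column appears to follow from the HR% column by weighting each type of homer by the runs it scores. Here is a minimal Python sketch of that arithmetic, assuming that reading of the columns is right. The names `hr_share` and `run_share_from_event_share` are illustrative, not from the original data. The results land within about a point of the R% figures above, and the gap is presumably rounding in the HR% inputs.

```python
# Share of all home runs by runs scored on the play (the HR% column above).
hr_share = {1: 0.57, 2: 0.29, 3: 0.11, 4: 0.02}


def run_share_from_event_share(event_share):
    """Convert the share of HR events into the share of runs scored on HRs.

    Each HR type is weighted by the runs it scores, then the weights are
    normalized so the shares sum to 1.
    """
    weighted = {runs: runs * share for runs, share in event_share.items()}
    total = sum(weighted.values())
    return {runs: w / total for runs, w in weighted.items()}


run_share = run_share_from_event_share(hr_share)

# Print from largest to smallest share of HR runs, like the table above.
for runs in sorted(run_share, key=run_share.get, reverse=True):
    print(f"{runs}-run HR: {run_share[runs]:.1%} of runs scored on HRs")
```

So even though solo shots are more than half of all homers, two-run homers account for slightly more of the runs.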
Of course, this data means that if you use ERA to analyze pitchers, 78% of it reflects how they pitch with RISP.
If a game isn't going to be built on the 24 different base-out situations, then it's best to use the whole and just randomize or vary from the whole.
Of course, a lot of these new game designs are built on smaller samples like FIP, BABIP, ERA, etc., hoping to get back to the whole over a full season.
Yes, Strat-O-Matic does try to randomize or vary from the whole to get back to the whole, but the outliers created by the pure nature of randomness always need to be slightly addressed.
My only objection to DIPS, FIP, or BABIP game creation is that it takes both the pitcher and the batter to create the result of a given matchup. No matter what the outcome, it took both to get that result at that time.
For example, a pitcher may show giving up a triple, and a batter may get that result even though he didn't hit a triple. In the majority of cases, the batters or pitchers with the most triples as results will get them from the whole. Of course, the outliers can be caught with game excess adjustments more so than by trying to put the genie back in the bottle.
In all fairness, getting the overall RISP stat correct will produce a more realistic performance out of a team like the '69 Mets. Take note: 78% of all runs are scored with RISP. The teams or players that perform as an outlier from their whole with RISP, up or down, will come out in the wash overall in a 50/50 system. However, a team like the '69 Mets, or their players, may perform as they should in the whole but have their RISP stats skewed into the larger non-RISP sample. Thus teams like the '69 Mets will mis-perform, up or down, depending on how their overall performance compares to their RISP performance.
Yes, some call it clutch luck, but there is no way around it: RISP performance is the key to win probability, 70% or more.
So, in the grand scheme of things, other nuances are interesting but not really necessary for changing win probability in the proper places.
Very helpful, thanks Steve!
Thanks Steve. I find the Aaron data most interesting. I’ll have to poke around BBref & see if they have this data aggregated in any way to support any larger conclusions. Appreciate the always great research…
Thank you. Apparently there is still much for me to learn. Fortunately, this topic has been studied by many people and they have chosen to document their thoughts.
Good stuff, Steve. One point re multiple regression. Simple regression is written as y = a + bx, where y is the dependent (outcome) variable, a is the constant, b is the coefficient, and x is the independent (predictor) variable. Excel reverses the order of the constant and the slope term when it places the equation on the plot (y = bx + a). Multiple regression is written as y = a + b1x1 + b2x2, where x1 is the first independent variable, x2 is the second, and each has its own coefficient. What you did was multiply x1 * x2 and treat the product as a single variable. That shows interaction.
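To make the distinction concrete, here is a minimal Python sketch that fits both forms to the same data: a true multiple regression with separate coefficients for x1 and x2, and a fit that uses the product x1 * x2 as one predictor. It is only an illustration. The numbers are synthetic and made up (they are not Steve's WHIP or HR% data), the helper names `solve` and `least_squares` are my own, and it uses plain least squares through the normal equations, which is not necessarily how Excel or any online tool computes it.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(columns, y):
    """Fit y = a + b1*col1 + b2*col2 + ... and return [a, b1, b2, ...]."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]  # leading 1 = constant
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(predicted, actual):
    """Sum of squared errors between predictions and actual values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, actual))


# Synthetic data: y is constructed exactly as 1.0 + 2.0*x1 + 30.0*x2.
x1 = [1.10, 1.25, 1.30, 1.45, 1.20, 1.35]        # stand-in for WHIP
x2 = [0.020, 0.030, 0.025, 0.035, 0.040, 0.015]  # stand-in for HR%
y = [1.0 + 2.0 * u + 30.0 * v for u, v in zip(x1, x2)]

# Multiple regression: two predictors, each with its own coefficient.
a, b1, b2 = least_squares([x1, x2], y)
sse_multi = sse([a + b1 * u + b2 * v for u, v in zip(x1, x2)], y)

# Product as a single predictor: one coefficient on x1 * x2.
product = [u * v for u, v in zip(x1, x2)]
a_p, b_p = least_squares([product], y)
sse_prod = sse([a_p + b_p * p for p in product], y)

print("multiple regression:", a, b1, b2, "SSE:", sse_multi)
print("product as one variable:", a_p, b_p, "SSE:", sse_prod)
```

Because the synthetic y was built additively, the two-coefficient fit recovers the coefficients it was built from and the product-only fit leaves a visible error. With real data either form could fit better, so this shows only that the two models are different, not which one is right.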
Thanks for the feedback. I am probably out on a limb when it comes to multiple regression. For this video I just went to the answer. To do the WHIP + HR% correlation, I found an online tool that would process two independent variables. I explain this further starting at about 9:20 of this video: th-cam.com/video/M6FfmYtXeZA/w-d-xo.htmlsi=Ydx78-otx4p9L04B It's been quite a while since I did that work, so I may not have portrayed it correctly in this video.
@steve_etzel You can do MLR in Excel. Send me a note.
Wonderful. Thanks. Have you analyzed Replay on pitching? That’d be interesting from your perspective.
Thanks. Unfortunately, I haven't done any analysis on Replay at all. Same for Inside Pitch.