Rockets fan left the following comment on Andres Perezchica’s recent article for the Wages of Wins Journal concerning the D-League:
To echo a question I’ve raised elsewhere — but haven’t seen addressed — what is a reasonable estimate of the [Wins Produced model’s] error margin? There are some obvious problems with the metric. (For example, it can’t attach a number to plays where a defender’s defense makes an offensive player miss a shot, it values all assists the same, and it does not account for charges drawn.) To be clear, I’m not saying the metric is bunk. But, I think it’s beyond dispute, that it isn’t perfect. Given that it’s not perfect, how imperfect is it? Is [.07]1 really worse than [.09]1? Can the WS make such fine distinctions? I don’t know, and I’d be interested in reading an answer.
One of my biggest complaints about [The Wages of Wins Journal] is that, even though we all accept there’s some [margin of error], in nearly all posts the implicit assumption is that a higher [wins produced] necessarily means the individual contributed more wins. In other words, a player with a .150 [WP48] will be treated as obviously better than a player with a .135. I’m not sure that’s the case. Sorry to (try to) highjack a thread, but I feel like my question comes up in nearly every post — including this one.
Perhaps I can offer a bit of clarification to Rockets fan and anyone else who is unsure of the implications involved in comparing players using the Wins Produced family of production metrics.
The effect of minutes played
As the sample of minutes a player has played increases, the WP48 calculated for that period will more closely reflect that player’s ability, and its implications become larger.
To show this point, here’s a table that lists the difference in production (Δ Wins Produced) between two players for a given number of minutes (assumed to be the same) at various differences in their rates of production (Δ WP48).
This table shows that subtle differences in WP48 (Δ WP48 of .020 or less) don’t have a large effect on wins produced until the two players approach starter’s minutes. So if two starters play 2800 minutes each, and the first has a WP48 of 0.100 and the second has a WP48 of 0.120, then the second player will produce 1.17 more wins over the course of the season (0.020 × 2800 / 48), which I would argue is significant. But if those same players post the same WP48s (0.100 and 0.120) but play only 400 minutes each, then the second player will produce only 0.17 more wins, which certainly is not very significant.
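The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, assuming that Wins Produced equals WP48 × minutes / 48 (which is what the 1.17 and 0.17 figures above imply). The function names and the grid of minutes and Δ WP48 values are hypothetical choices for this sketch, not the rows and columns of the original table.

```python
def delta_wins(delta_wp48, minutes):
    """Difference in Wins Produced between two players who each play
    `minutes` and whose WP48s differ by `delta_wp48`."""
    return delta_wp48 * minutes / 48.0


def build_table(deltas, minutes_grid):
    """One row per minutes value, one column per WP48 difference."""
    return [[round(delta_wins(d, m), 2) for d in deltas] for m in minutes_grid]


# Illustrative grid only (an assumption, not the original table's values).
DELTAS = [0.001, 0.005, 0.010, 0.020, 0.050]
MINUTES = [400, 1200, 2000, 2800]

if __name__ == "__main__":
    print("minutes " + " ".join(f"{d:>7.3f}" for d in DELTAS))
    for m, row in zip(MINUTES, build_table(DELTAS, MINUTES)):
        print(f"{m:>7} " + " ".join(f"{v:>7.2f}" for v in row))
```

Reading across a row shows how the same gap in WP48 matters more as minutes grow; the 0.020 column reproduces the two cases discussed above.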
On the precision of WP48
WP48 is a precise calculation. All else being equal, having a WP48 of 0.150 is (very slightly) preferable to a WP48 of 0.149. The reason for this is that WP48 is the best model available to describe the rate of production of players in the NBA, and an increase in WP48 in isolation is very likely to lead to more wins2. To say that a player produced at a WP48 of .150 instead of .149 over the course of a season is akin to saying that he got 1003 rebounds rather than 1000. It’s not a big difference, but everything else being equal, you would take the 1003 over the 1000.
One of the strengths of WP48, however, is that over a season’s worth of minutes, player production as expressed in terms of WP48 is relatively consistent (unlike adjusted plus/minus, for example). This tells us that if a player is productive this year, he is likely to be productive next year3. In practical application, the Wins Produced model will generally explain a team’s win/loss record for a given season to within 2 wins. Usually there will be a couple of outliers that under- or over-perform the win/loss record predicted by the Wins Produced model by about 4 wins. For more on this, see the following posts by Dr. Berri: Proof and the NBA and The Differing Stories on Durant – and a Brief Thunder Review.
In summary, to say that player 1 has a higher WP48 than player 2 is to say that, when considering only the factors included in the Wages of Wins model, player 1 was more productive on a per-48-minute basis than player 2. This is true regardless of whether there is a difference of .001 WP48 or of .300 WP48 between the two players. There are other factors outside the scope of WP48 that could mean that player 2 is more productive than player 1 in absolute terms, but these factors are both unknown and of relatively small impact. Therefore, when evaluating the production of players in the NBA, it is best to assume that player 1 is more productive than player 2, at least until the Wages of Wins model is improved to have a smaller error, or until another model with a smaller margin of error becomes available.
1 Rockets fan’s question actually used the numbers .7 and .9, but I’m assuming that .07 and .09 were meant, as the former numbers only come about in very small sample sizes and are not really reflective of a player’s actual ability.
2 Note that I am using this number for pedagogical purposes; in reality, if a player increases his WP48 by .001 over, say, 2400 minutes of play, he will have helped his team by only 0.050 wins (.001 × 2400 / 48), which is not likely to have any practical effect on the team’s win/loss record.
3 There are some well-known caveats to this generalization. Very early career production (i.e. the first couple of seasons a player plays in the NBA) is often much more volatile than production from mid-career seasons. Players are also less likely to maintain production after the age of 30, and especially after the age of 32.
Update:
Alex asks:
I’m assuming that Rockets fan’s question was actually in regards to the statistical error associated with wp48. For example, not only does Dr. Berri not like adjusted plus/minus because it doesn’t correlate well across seasons, but within a season the errors are so large that it’s difficult to compare players. I’m making numbers up, but Kevin Durant might be a +6 but the error term is +- 5, meaning he could be anywhere from amazing to average. What is that number, the +-5, like for wp48? If a player posts a .100 one year, what would he have to post the next year for me to be pretty sure he got better, as opposed to there being a good chance he played just as well? .101 seems non-significant to me, but .105? .110?
Interesting question, Alex. I don’t think that there’s a really solid answer to that. Mostly, WP48 is a summation of individual player production, so I think that my assertion that any increase in WP48 is good, all else being equal, stands. To find an area of the Wins Produced model that would allow for the possibility that a player with a .100 WP48 is really more productive than a player with a .101 WP48, you would have to look at the parts of the model that are not specifically tied to the box score numbers produced by a particular player.
The area of the model which has the largest potential to lead to some inaccuracy in a player’s WP48, in my opinion, is the way that individual defense is incorporated. In case you are unaware, WP48 does incorporate team defense, and distributes this among the team’s players based on minutes. It should be noted, however, that adding individual defense would have a relatively small effect, even in extreme cases. For example, if a player has a WP48 of 0.000, then even if that player is the best individual defender in the league, he would not be able to approach an average WP48 of .100 if individual defense were incorporated into WP48. In fact, defense in general has a relatively small impact compared to shooting efficiency, rebounds, and turnovers, all of which are well accounted for in WP48. All of the factors that most affect wins are incorporated into WP48 already. The reason that individual defense is left out of WP48 is that it would add a lot of complexity to the model without increasing its explanatory power by much. For more discussion on this topic, see Dr. Berri’s article Incorporating Defense from The Wages of Wins Journal. Here is a relevant excerpt:
Models are not supposed to be “perfect” (whatever that means). When I and my colleagues construct models, we are trying to construct a simplified version of reality that allows us to focus on what is important (and answer the various questions we pose in our research). That is what I think Wins Produced does. It is a simple and accurate measure of performance, based on the theoretically sound idea that wins are determined by a team’s offensive and defensive efficiency. This model ultimately tells us that wins are primarily determined by shooting efficiency, rebounds, and turnovers. Yes, other issues matter. But players who do not score efficiently, who fail to rebound (given their position), and/or turn the ball over excessively, will not help you win games.
So, my answer is that we might conservatively estimate that WP48 is within 0.030 of “true” win production per 48 minutes, even for players who excel in, or conversely are extremely poor with regard to, all of the areas that are not considered in the calculation of WP48. Any given player’s WP48 will necessarily be close to his “true” win production per 48 minutes. If he is a great individual defender, then WP48 may slightly undervalue him. If his assists are better than the average assist, then again, WP48 may (very, very slightly) undervalue him. If one wishes to take those areas which are not explained by WP48 into account, then it is one’s prerogative to do so, but be warned that you are deviating from the science, and unless you know the true impact on wins of the variable you are adjusting, you are more likely to get a less accurate picture of the player’s true production than if you had assumed that WP48 was the player’s true production.


14 comments
July 26, 2010 at 4:23 pm
robbieomalley
Dr. Berri
FTFY
July 26, 2010 at 5:13 pm
Shawn Ryan
Oops, not sure how that happened. Thanks for the correction, Robbie.
July 28, 2010 at 1:37 am
Alex
I’m assuming that Rockets fan’s question was actually in regards to the statistical error associated with wp48. For example, not only does Dr. Berri not like adjusted plus/minus because it doesn’t correlate well across seasons, but within a season the errors are so large that it’s difficult to compare players. I’m making numbers up, but Kevin Durant might be a +6 but the error term is +- 5, meaning he could be anywhere from amazing to average. What is that number, the +-5, like for wp48? If a player posts a .100 one year, what would he have to post the next year for me to be pretty sure he got better, as opposed to there being a good chance he played just as well? .101 seems non-significant to me, but .105? .110?
July 28, 2010 at 8:31 am
Shawn Ryan
-Alex
Interesting question. I don’t think that there is a really solid answer to that. Mostly, WP48 is a summation of individual player production, so I think that my assertion that any increase is good, all else being equal, stands. To find an area of the Wins Produced model that would allow for the possibility that a player with a .100 WP48 is really more productive than a player with a .101 WP48, you would have to look at the parts of the model that are not specifically tied to the box score numbers produced by a particular player.
The area of the model which has the largest potential to lead to some inaccuracy in a player’s WP48, in my opinion, is the way that individual defense is incorporated. In case you are unaware, WP48 does incorporate team defense, and distributes this among the team’s players based on minutes. It should be noted, however, that adding individual defense would have a relatively small effect, even in extreme cases. For example, if a player has a WP48 of 0.000, then even if that player is the best individual defender in the league, he would not be able to approach an average WP48 of .100 if individual defense were incorporated into WP48. In fact, defense in general has a relatively small impact compared to shooting efficiency, rebounds, and turnovers, all of which are well accounted for in WP48. All of the factors that most affect wins are incorporated into WP48 already. The reason that individual defense is left out of WP48 is that it would add a lot of complexity to the model without increasing its explanatory power by much. For more discussion on this topic, see Dr. Berri’s article Incorporating Defense from The Wages of Wins Journal. Here is a relevant excerpt:
So, my answer is that we might conservatively estimate that WP48 is within 0.030 of “true” win production per 48 minutes, even for players who excel in, or conversely are extremely poor with regard to, all of the areas that are not considered in the calculation of WP48. Any given player’s WP48 will necessarily be close to his “true” win production per 48 minutes. If he is a great individual defender, then WP48 may slightly undervalue him. If his assists are better than the average assist, then again, WP48 may (very, very slightly) undervalue him. If one wishes to take those areas which are not explained by WP48 into account, then it is one’s prerogative to do so, but be warned that you are deviating from the science, and unless you know the true impact on wins of the variable you are adjusting, you are more likely to get a less accurate picture of the player’s true production than if you had assumed that WP48 was the player’s true production.
July 28, 2010 at 8:33 am
Shawn Ryan
Thanks for the question, Alex. I’m going to add it and my answer to the article.
July 28, 2010 at 7:52 pm
Jimbo
Hi, this article and the comments touch on something that I’d been meaning to ask Prof Berri – how does Shane Battier rate on the Wins Produced model? He is someone that seems to win wherever he goes, without putting up great “conventional” stats. Thanks, James.
August 4, 2010 at 9:06 pm
Shawn Ryan
-Jimbo, sorry for taking so long to approve your comment, I hadn’t seen it.
The best year of his that I have numbers for was his last season with Memphis. He had roughly a 0.171 WP48, which is of course 71% better than an average player (0.100). That’s a pretty good mark and certainly above what was generally perceived about him at the time. In Houston, he hasn’t reproduced that mark, but he’s basically been in the 0.120 to 0.150 range. Battier also seems to be able to cause the man he defends to lose a lot of productivity, but there hasn’t been a Wages of Wins based analysis of this, so I can’t really give specific numbers. He seems to be one of the players that would benefit most if individual defensive stats were implemented into the WoW model. Hope that helps, and sorry I can’t offer more on his tenure in Memphis, because I’m pretty sure he was consistently above 0.150 if memory serves.
August 3, 2010 at 11:29 am
tgt
You missed the point again, and you can’t just throw up .030 without any evidence. It doesn’t appear you have the statistical background to answer this question, so you might not want to represent that you did answer it or any values for the error margin.
Completely subjectively, I think the margin of error will be dependent on the value. I think it’s likely that the outliers (.300, -.100) will have a larger margin of error than the people at .075, but that belief has as much statistical backing as your .030 number.
August 4, 2010 at 8:47 pm
Shawn Ryan
-tgt
**sigh** It is not a question of statistical background, it is a question of logic. When you say that the output of a model has a statistical error of x, you are saying that in aggregate, the results of that model will deviate from reality by x percentage. To say such a thing, and this is what that whole “I don’t think that there is a really solid answer to that” bit was about, you have to have a frame of reference within reality.
When you look at the Wages of Wins model on the whole there is such a frame of reference within reality to compare the model to, and that is how many wins the various teams in the NBA achieve. So it is possible to calculate how much the output of the Wages of Wins model deviates from that reality. WP48 however is a derived metric. It has no objective mark against which it can be compared, and thus it is quite impossible to calculate the statistical error of WP48. Put another way, and rest assured that this implies quite the same thing, WP48 is a metric derived from a model of a specific domain of reality, it is not in itself a model of reality. This is the argument I was pursuing in my comments to Alex, but I suppose it was not explicit enough. I’ll try to keep that in mind in the future.
On a more personal note, and I’ll be brief here: Come on, it’s really not good form to call into question someone’s credibility on an issue when you don’t have a firm handle on the logic of that issue. By all means, seek other sources for your answers, but you really do yourself a disservice by so quickly going the ad hominem route.
November 3, 2010 at 7:57 pm
Man of Steele
I’ll try to contribute yet another interpretation of Rockets fan’s question. It seems to me that he is asking about both the precision and the accuracy of the win score model. As you have said, Dr. Berri’s model is very precise. We can say that, over a season’s worth of minutes, player 1 (.150 wp48) is better than player 2 (.149) on the basis of box score statistics. I think this point is basically beyond dispute, at least among members of the WoW community.
Accuracy, on the other hand, is a different matter. Since the box score statistics do not reflect everything that happens in a basketball game, we cannot know that the numbers .150 and .149 perfectly reflect the value of these two players.
The matter of accuracy is of course a matter that is open to dispute, especially as pertains to the relevance of the information/events which the model does not take into account. Where many of the conversations on the WoW and sister blogs come to loggerheads is at precisely this point. While we may all agree that Dr. Berri’s model is the most accurate one available, there are some who also think it is not sufficiently accurate so as to be beyond the possibility of improvement.
Finally, I feel as though I should emphasize that all these opinions are more about addressing inadequacies in a model that everyone likes, not attacks on a totally inadequate system.
November 3, 2010 at 9:14 pm
Shawn Ryan
Thanks for the comment, Greg. It inspired some thoughts.
I agree with all of your points, and of course Berri’s metrics aren’t perfect. One point that often seems to get lost, though, is that modelling a complex system is as much about figuring out which factors to ignore as it is about accounting for every detail. There comes a point of greatly diminishing returns when accounting for more factors. As for the prospect of improving Wins Produced, I think that you probably have to go outside of box score statistics, meaning you would have to do a lot more work. One of the great boons of Wins Produced is that it is derived purely from box score statistics.
Really, I think that Wins Produced is as good as it needs to be and is useful in its current state. I think there are some much more glaring gaps in our knowledge of the NBA that we should be moving on to, and so I do find it a bit counterproductive to try to figure out how to improve Wins Produced at the margins.
I think that more research should be done on the NCAA, Euroleagues, and D-League and how numbers produced in those leagues can translate to productivity in the NBA. The question of injury-proneness is another area that I think needs to be borne out quantitatively. Arturo is obviously doing interesting work addressing these sorts of issues, and I believe that that is where our mental effort should tend to go.
**************************
Maybe I can also clarify some of my thinking on the precision/accuracy thing. In my previous arguments, I was going for clarity of logic, so I was somewhat tightly confined by that. In reality, if you have 2 players, one of whom had a WP48 of .149 last year and the other a WP48 of .150, then WP48 is not going to be the factor that leads you to a decision between the two players. There is no shortage of factors that go into personnel decisions in the NBA. There is year-to-year fluctuation, and past performance never proves future performance. I believe that it is pretty clear that in the above case, however, last year’s production doesn’t really need to be considered in the decision, so you would look at previous years and how consistent the player has been in the past, how each player gets his production, the relative ages of the players, whether one of the players appears to be an injury risk, if they play different positions, the relative productivity of the player that each will be replacing, and the list goes on. You might also look at defensive footage and try to figure out how often each player makes an error on defense, since that is an area that WP doesn’t speak to. There are so many other considerations that really you have to look at everything, and it’s still a judgement call.
November 4, 2010 at 2:36 pm
Man of Steele
I agree that our knowledge of other leagues (NCAA, Europe, NBDL) is severely lacking. Here is one area where, if we could correct the model to achieve greater accuracy, a team could make much better draft decisions. In the NBA, wins produced is basically adequate for the vast majority of the population. If it misrates a couple of players at the margins, we can live with that kind of error. Compare the size of the NBA with the size of the prospective talent pool, though. The NBA has somewhere in the neighborhood of 450 players (30 teams * 15 players), while NCAA (div. 1 alone) has something like 3000 (300 teams * 10 players), not to mention several hundred players from junior colleges, the NBDL, and Euroleague. If there is significant error at the margins of the model on a sample size that large, it can be a problem. Something like failing to take defense into account can become a big problem when you project the model out across such a large population. So, perhaps what we have said goes together: if we can find ways to increase the accuracy of Dr. Berri’s metric, we can vastly improve the way we analyze the correspondence between performance in “minor league” ball (meaning all the leagues listed above) and performance in the NBA. Any team with a very clear idea of who among the 3000-4000 “minor league” players could succeed in the NBA would seem (to me at least) to have a huge leg up on the competition. Dr. Berri’s recent article on Landry Fields brings to mind a whole host of guys like Ben Wallace and Horace Grant who were simply overlooked by the NBA talent evaluation process.
All that being said, if it turns out that we cannot improve the accuracy of the model without significantly more information (which I suspect is the case), using win score for talent and performance evaluation is certainly the best way to go. I have thought for some time that if a team began using win score/wins produced as the only rubric for player evaluation and personnel decisions, they could build a championship team in 3 years (I guess this is kind of the point of some of Arturo’s work, really).
As a footnote, I understand what you mean about the precision between .150 and .149 being basically irrelevant in player evaluation. I was simply using your example to (try to) demonstrate the difference between precision and accuracy. I completely agree with your response to that point.
June 19, 2011 at 8:53 pm
calc of variation
Late question, but still:
What about sensitivity? For example if the average player has a variation in rebounds of some % per 48 per season say, the model (which is linear right?) would have a variation in wp of some % per 48 per season say.
You might then argue that this natural variation is not due to skill say, but random factors, and that a player of fixed skill would be expected to get x+- some number of rebounds per 48 per season.
Then any number within this range would indicate a player of the same skill. For example 1 rebound per 48 per season variation might produce a 0.05 variation in wp/48 (no idea if these are accurate values). So two players, one with wp/48 of 0.095 and the other of 0.100 could be said to have the same skill within the natural variation in production of a typical player.
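The sensitivity argument above can be sketched as follows. This is a rough illustration only: the 0.05 WP48 per rebound figure is the commenter’s own made-up number, not a coefficient from the Wins Produced model, and the function names are hypothetical.

```python
# Hypothetical sensitivity from the comment above ("no idea if these are
# accurate values"); it is NOT a real Wins Produced coefficient.
WP48_PER_REBOUND = 0.05


def wp48_band(rebound_variation_per48, wp48_per_rebound=WP48_PER_REBOUND):
    """Half-width of the WP48 range implied by natural rebound variation,
    assuming WP48 responds linearly to rebounds per 48 minutes."""
    return abs(rebound_variation_per48 * wp48_per_rebound)


def same_within_band(wp48_a, wp48_b, band):
    """True if two WP48 values differ by no more than the band."""
    return abs(wp48_a - wp48_b) <= band


if __name__ == "__main__":
    band = wp48_band(1.0)  # 1 rebound per 48 of natural variation
    print(band, same_within_band(0.095, 0.100, band))
```

Under these assumed numbers, a 0.095 player and a 0.100 player fall inside the same band, which is the commenter’s point; with a real coefficient and a measured variation the band could be quite different.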
June 19, 2011 at 11:38 pm
Shawn Ryan
Hey calc,
Had to reread the article, it’s been quite a while since I wrote it. A couple points regarding your comment:
-yes, the Wins Produced model was arrived at using linear regression. It has become nearly a truism that the model over-estimates wins for very good teams, and underestimates wins for very bad teams. But for the vast majority of teams it will predict wins within +/- 2 for any given season.
-A little bit of the reasoning for this article has been lost due to the extent to which it’s been removed from context. Basically, I was responding to a very specific confusion that was common among commenters at The Wages of Wins blog at the time this was written. Your point is correct. A 0.095 player is roughly equivalent to a 0.100 player, and you’ll even see some players that vary by +/-0.100 or more between two seasons. One point that Dr. Berri often makes is that Wins Produced isn’t meant to quantify a player’s, as you put it, skill, but rather to summarize a player’s production. So a player that attains a wp48 of 0.095 for a given season could very well be more skilled than one that attains a 0.100 for the same season, but the 0.100 player was slightly more productive. There could be many reasons for the disparity in skill and production, and the disparity very well may correct itself in the following season.
Personally, I still don’t like saying something akin to “Player X earned a WP48 of 0.100 +/-0.010.” I think that this gives a false sense of certainty because it implies that we know the effect, on a player level, of the forces that remain unmeasured by the model, and this is simply not the case as far as I’m concerned. The Wins Produced model explains roughly 95% of the variation in wins on a team level. Giving credit to particular players for those wins introduces more uncertainty (most notably the effect of individual defense).
-One more note on production vs. skill. Say that the gambling fairy gives gambler x and gambler y each $100 to gamble with. The two gamblers promptly jaunt over to the roulette table. Gambler x lets his $100 ride on black, while gambler y puts his on red. After the spin, it turns out that black wins, and gambler x doubles his money, while gambler y loses his. Gambler x’s production for the night (he stopped while he was ahead of course :-D ) was $100, while gambler y’s production was -$100. Neither gambler was more skilled than the other, but it would not be correct to say that both gamblers’ production was +/- $100, because what each gambler actually produced is a provable proposition. It has happened, and at the end of the night, gambler x has winnings to cash, and gambler y had better come up with some quick cash, because the gambling fairy has guys that take care of guys who renege on their debts (one should never gamble with other people’s money).
Anyway, production as revealed through wins produced should be thought of in the same way. Each of the figures in that wins produced or wp48 number can be tracked back to actual production (a shot made, a steal, a rebound, etc.) and therefore each increase is good and each decrease is bad.
If you want to get scientifically precise about it though, one standard is to use rules of significant digits. Since players accumulate stats over the season, the more a player plays, the more precision you would expect to be significant, and it would likely be capped at 2 or 3 significant figures. I think however that this approach is rare in models such as this, and significance is laid more wholly at the feet of sample size without regard to whether or not there is a term that would limit significant figures of the entire calculation. Plus, there are a lot of terms that go into the calculation of these models, so that wouldn’t really make much sense, speaking in practical terms, anyway.