Introduction
This report demonstrates how I used standard deviation, team offence and team defence to produce an accurate model for predicting the results of NBA playoff series, and subsequently probabilities for the entire playoffs. It builds on similar work in basketball and ice hockey, and adds my own previous work on playoff series probability and standard deviation. Predictions are then made about the results of the NBA playoffs, and an overall NBA champion is predicted. Limitations of the model and possible ways of improving it are then discussed. The spreadsheet used can be found as a Numbers document here, as an Excel spreadsheet (unsure about formatting) here, and as a PDF (view only) here. You can also listen to a podcast where I walk through and explain the workings of the model here.
Developing the Model
This analysis is based on an article published by basketball researcher Dean Oliver and builds on the principles discussed in that article. Oliver took the scoring average and standard deviation of a team, compared them to the defensive average and standard deviation of that team, and calculated how many wins the team should have earned. Using a normal distribution, Oliver calculated the probability of a team scoring a particular number of points in one game and compared that to the probability of their opponent scoring more points than them. This can then be boiled down to one number: the overall win probability for a team on any given night.
The one problem with this model is that it does not account for a team playing up or down to the level of its competition. Garbage time is one example: it allows one team to close the margin of victory without the result ever being in doubt. This effect is small, however, so it is reasonable to exclude it. What is relevant to my research is that the approach can be extended to individual game results. If the scoring average and standard deviation of two teams can be calculated, the principles from Oliver’s paper can be used to determine the probability of either team winning.
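As a rough illustration of the idea, the sketch below computes a single-game win probability from two scoring averages and standard deviations. It assumes the two teams' scores are independent and normally distributed and ignores ties and overtime; these are simplifying assumptions of this sketch, not necessarily the exact formula in Oliver's article or in the spreadsheet. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def game_win_probability(mean_a, sd_a, mean_b, sd_b):
    """P(team A outscores team B) for one game.

    Assumption: scores are independent normal variables, so the
    score difference A - B is normal with mean (mean_a - mean_b)
    and variance (sd_a^2 + sd_b^2).
    """
    spread_sd = sqrt(sd_a ** 2 + sd_b ** 2)
    return normal_cdf((mean_a - mean_b) / spread_sd)


# Hypothetical inputs: team A averages 103 points, team B 100, both with SD 10.
p_a = game_win_probability(103, 10, 100, 10)
p_b = game_win_probability(100, 10, 103, 10)
print(p_a, p_b)  # the two probabilities sum to 1
```

Because only the difference in averages and the combined spread matter, two evenly matched teams come out at exactly 50% regardless of how volatile their scoring is.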
Alone, this model does not account for the effect of each team’s defence. To do so, I used the same technique as is used in the following article. Simply adjust each team’s scoring average based on their opponent’s defence compared to the league mean. For example, a team with an above-average defence that holds its opponents to 97 points on average (compared to a league average of 100) has a defensive impact of 3 points, so 3 points are subtracted from its opponent’s scoring average.
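The adjustment can be written out in a couple of lines. This is a minimal sketch using the 97-versus-100 example from the paragraph above; the function names and the opponent's 105-point scoring average are hypothetical, chosen only for illustration.

```python
def defensive_impact(points_allowed_avg, league_avg):
    """Points a defence takes away relative to the league mean.

    Positive for an above-average defence, negative for a below-average one.
    """
    return league_avg - points_allowed_avg


def adjusted_offence(scoring_avg, opp_points_allowed_avg, league_avg):
    """Scoring average adjusted for the opponent's defensive impact."""
    return scoring_avg - defensive_impact(opp_points_allowed_avg, league_avg)


impact = defensive_impact(97, 100)          # 3 points
adjusted = adjusted_offence(105, 97, 100)   # hypothetical 105 PPG team -> 102
print(impact, adjusted)
```

A below-average defence produces a negative impact, which raises the opponent's adjusted scoring average instead of lowering it.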
From these calculations, the adjusted offensive average and standard deviation can be determined, and these are the figures used to calculate the win probability for either team. The next step is to account for home-court advantage (HCA). By collating the scoring outputs of every playoff team and separating home performances from away performances, one can easily break down the effect of HCA on every team. By doing this, and treating Team A on the road as a separate entity to Team A at home, the win probability (WP) for a team at home and away can be calculated. For example, in a series between Boston and Cleveland, Boston at home is pitted against Cleveland away, and alternately, Cleveland at home is pitted against Boston away. From this, the effective win probability for every game of a playoff series can be calculated.
The formulae and techniques used can be seen in the spreadsheet in the sheet named “Bell Curve Analysis”. After all of those stats have been collated, probabilities can be calculated for a game between any two teams, accounting for home or away. Once these WPs have been calculated for both scenarios, a model can be constructed to calculate how they would play out over a 7-game series in the format HH-AA-H-A-H (other formats could be used, but this is the format for all current NBA playoff series).
The calculations are a simple weighted probability tree, which can be found on the sheet named “7 Game Series”. This model has various benefits: it properly accounts for the effect of the change in HCA throughout a series, and it also provides a good estimate of what the exact result of the series will be (how many games will be played). The one drawback of the model is that it will almost never predict a 4-0 series sweep, because the 4-1 result will almost always be more likely, as the favourite will be back on their home court for Game 5.
However, this is not as much of a limitation as I previously thought. It makes sense that 4-1 is almost always the more likely result; a 4-0 sweep is a surprise no matter the difference in quality between the two teams. The spreadsheet has also been modified to make these series calculations easier. On the “Bell Curve Analysis” sheet, the two teams at the top can be chosen using a drop-down menu, meaning it only takes two team selections to calculate the probability of a series between those two teams. This allows quick calculations to be made at the click of a button, speeding up the following processes for calculating overall post-season results.
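The weighted probability tree for a seven-game series can be sketched as follows. This is an illustrative reimplementation rather than the spreadsheet's actual formulae: it walks every branch of the HH-AA-H-A-H format from the point of view of the team with home-court advantage, assuming each game is independent. The names `HOME_PATTERN`, `series_outcomes` and `series_win_probability`, and the 0.8 and 0.5 inputs, are hypothetical.

```python
# True where the team with home-court advantage plays at home (HH-AA-H-A-H).
HOME_PATTERN = (True, True, False, False, True, False, True)


def series_outcomes(p_home, p_away):
    """Return {(wins_a, wins_b): probability} for a best-of-seven series.

    p_home: team A's single-game win probability on its own court.
    p_away: team A's single-game win probability on the opponent's court.
    Team A is the side with home-court advantage.
    """
    outcomes = {}

    def walk(game, wins_a, wins_b, prob):
        if wins_a == 4 or wins_b == 4:
            key = (wins_a, wins_b)
            outcomes[key] = outcomes.get(key, 0.0) + prob
            return
        p = p_home if HOME_PATTERN[game] else p_away
        walk(game + 1, wins_a + 1, wins_b, prob * p)
        walk(game + 1, wins_a, wins_b + 1, prob * (1.0 - p))

    walk(0, 0, 0, 1.0)
    return outcomes


def series_win_probability(p_home, p_away):
    """Probability that team A wins the series, summed over 4-0 to 4-3."""
    results = series_outcomes(p_home, p_away)
    return sum(p for (wins_a, _), p in results.items() if wins_a == 4)


# Hypothetical favourite: 80% at home, 50% on the road.
results = series_outcomes(0.8, 0.5)
print(results[(4, 0)], results[(4, 1)])
print(series_win_probability(0.8, 0.5))
```

With these hypothetical inputs the 4-1 result comes out more likely than the 4-0 sweep, which matches the drawback described above: the favourite gets a fifth game at home to finish the series.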
Once the playoff match-ups were determined, actual predictions about the post-season could be made. It was decided that two models would be used to predict the outcomes of the playoffs: an absolute winner system and a probabilistic system. In the absolute winner system, each round is analysed using the match-up system devised earlier, and from those results a winner of each match-up is decided upon. The winner is then assumed to advance to the next round of the playoffs, and this process continues until the final. The probabilistic system, however, refrains from choosing a winner, and instead looks at the probability of every possible outcome up to the NBA Finals.
The advantage of the probabilistic system is that it takes into account the possibility of any team winning. The weakness of the absolute winner system is that it doesn’t account for the relative probability of a team advancing. For example, let’s say a team has a tough match-up in the first round and only has a 52% chance of advancing, but from then on it has much easier match-ups. In the absolute winner system this team would be treated just like every other team when it advances to the next round, whereas the probabilistic system regards the team as suitably weaker because of the relatively high likelihood that it does not move on.
This is of great benefit when predicting the NBA champion, as it accounts for the relative strengths of the conferences. For example, the East is widely regarded as a two-horse race between the Hawks and Cavaliers, which means that they both have a relatively high chance of making the Finals. In the West, on the other hand, there are arguably 5 or 6 teams that are strong competitors for making the Finals. The result is that the eventual winner of the West has a relatively low chance of making the Finals, because of their tough match-ups in the first 3 rounds.
Predictions
This concludes the explanation of the models used for the predictions that follow. From here on I will describe the predictions made by the model, discuss the reasons for those results and look for explanations for any surprising results.
Absolute Winner
The first half of the predictions covers the absolute winner model, which takes the winner of each series and assumes that win is guaranteed when predicting the results of the next round. This is the first way of using the predictor that comes to mind, but it’s not the best way. It does, however, make sense immediately when explaining it to people. The predicted series results are obtained by looking at the most likely series victory. The first round results are as follows.
WEST
Seed | Team | Series win probability | Predicted games won
1 | GOLDEN STATE | 0.934543071188391 | 4
8 | NEW ORLEANS | 0.0654569288116088 | 1
4 | PORTLAND | 0.559836924919051 | 4
5 | MEMPHIS | 0.440163075080949 | 3
2 | HOUSTON | 0.526764101359044 | 4
7 | DALLAS | 0.473235898640956 | 3
3 | CLIPPERS | 0.527213575928355 | 4
6 | SAN ANTONIO | 0.472786424071645 | 3

EAST
Seed | Team | Series win probability | Predicted games won
1 | ATLANTA | 0.873393492868339 | 4
8 | BROOKLYN | 0.126606507131661 | 1
4 | TORONTO | 0.672036807777706 | 4
5 | WASHINGTON | 0.327963192222294 | 1
2 | CLEVELAND | 0.699351173308979 | 4
7 | BOSTON | 0.300648826691021 | 1
3 | CHICAGO | 0.642347277706967 | 4
6 | MILWAUKEE | 0.357652722293033 | 2
Golden State vs New Orleans is not a particularly surprising result, with the Warriors taking it easily. The Pelicans only have a 40.8% chance to win at home, so of all the predicted 4-1 results, this is one of the most likely to become a 4-0 sweep. Portland vs Memphis, on the other hand, is a surprising result: Memphis has the better regular season record and HCA in the series, but Portland comes away with a slight advantage. Both teams average ~99 points at home in adjusted offence, but Portland has a 2-point advantage in adjusted offence on the road. This prediction, however, rests on shaky ground, as Portland has lost one of their best defenders, Wesley Matthews, to injury for the season. Without getting too far into match-ups and straying from the statistics, no shooting guard from Memphis will be able to take advantage of that. Regardless, this prediction is reasonably valid, as Memphis has been on a long streak of mediocrity lately.
The result of Houston vs Dallas is not particularly surprising, but the fact that the model predicts such a close series is intriguing. Both teams have potent offences that average over 100 PPG at home and on the road, and both have average to subpar defence. The advantage eventually falls to Houston because of Dallas’ -3 defensive impact on the road.
Finally in the West, Clippers vs San Antonio promises to be a close match-up. The model predicts the Clippers to take it 52.7% of the time, which is hardly a conclusive result. Both teams have been on a surge in the second half of the season, so that factor cancels out in this match-up. Over in the East, Atlanta vs Brooklyn looks to be an easy victory for the Hawks, who take a 4-1 win. They also have a 67.6% WP against the Nets in Brooklyn, so this is also quite likely to be a 4-0 sweep.
Toronto vs Washington gives the advantage to the Raptors, who have a strong home-court advantage in Toronto. Cleveland vs Boston is not particularly surprising when you look at the result alone, a 4-1 win to the Cavaliers. But the fact that the model gives the Celtics a 30% chance of winning is shocking: it’s a 2/7-seed match-up, people are considering the Cavaliers strong championship contenders, and the Celtics only locked up a playoff spot with their penultimate game.
This prediction is most likely affected by the fact that the Cavaliers have been resting their star players over the later stretch of games. However, equally valid is the fact that the Celtics have had a vastly improved squad over the second half of the season, so this prediction is probably reliable enough that I’m confident the Celtics can win one game.
Finally, Chicago vs Milwaukee gives the advantage to the Bulls, and nothing surprising or interesting can be taken from the match-up. From the first round results, the team with the higher series WP is assumed to win the series and advances to the Conference Semi-finals. The second round match-ups are as follows:
CONFERENCE SEMI-FINALS
Team | Series win probability | Predicted games won
GOLDEN STATE | 0.840149175403901 | 4
PORTLAND | 0.1598508245961 | 1
HOUSTON | 0.308351539555116 | 2
CLIPPERS | 0.691648460444884 | 4
ATLANTA | 0.63170970735141 | 4
TORONTO | 0.36829029264859 | 1
CLEVELAND | 0.569015616287925 | 4
CHICAGO | 0.430984383712075 | 3
Golden State vs Portland is not a surprising result, with the Warriors taking it easily. Portland has a ~50% chance of winning at home according to the model, so a 4-0 sweep isn’t a highly likely result. Houston vs the Clippers is an interesting result, with a dominant win for the Clippers over the 2-seed Rockets, despite the Rockets having HCA. A 4-2 result is likely here, giving the Clippers the chance to win their fourth game on their home court.
Atlanta vs Toronto is not surprising at all, with the Hawks taking it 4-1. The Raptors have a 49% chance of winning at home, so I don’t predict this one to be a 4-0 sweep. Finally, Cleveland vs Chicago looks to be a close series. Cleveland has HCA, which is their big advantage, but the series is close because Chicago has a strong WP at home thanks to Cleveland’s scattershot offence on the road.
CONFERENCE FINALS
Team | Series win probability | Predicted games won
GOLDEN STATE | 0.768005289699203 | 4
CLIPPERS | 0.231994710300796 | 1
ATLANTA | 0.570691565152009 | 4
CLEVELAND | 0.429308434847991 | 3
Golden State vs the Clippers gives another relatively easy victory to the Warriors, who steamroll into the Finals after 3 dominant victories against Western Conference opponents. Atlanta vs Cleveland will be a much closer match-up, but Atlanta still takes the series 4-3. Both teams suffer from having rested players at the end of the season, but this has little effect in this match-up because they both did it to similar extents.
NBA FINALS
Team | Series win probability | Predicted games won
GOLDEN STATE | 0.825800485478469 | 4
ATLANTA | 0.174199514521531 | 1
In the Finals, the model predicts another dominant win for the Warriors over the Hawks. Overall, the absolute model demonstrates how dominant the Warriors are going to be, but not a great deal else can be determined from it because it predicts chalk in almost every situation. A better model for looking at each team’s actual chances in the playoffs is the probabilistic model.
Probabilistic Model
The probabilistic model is far more complex, but provides more accurate predictions for the later rounds, where it properly accounts for the likelihood of upsets and their effects. It is exactly the same as the absolute model in the first round, because there is only one possible match-up for each team. The second round is where the power of the probabilistic model comes into play.
Look at the Warriors, for example. They have a 93.5% chance of beating the Pelicans. In the second round they have two possible opponents, Portland and Memphis. The Blazers have a 56.0% chance of making the second round and the Grizzlies have a 44.0% chance. All that is needed now is the probability of Golden State beating each of those teams. The Warriors have an 84.0% chance of beating the Blazers and an 89.4% chance of beating the Grizzlies.
The general rule for this probabilistic model is as follows:
P(GSW winning WCSF) = P(GSW making WCSF) * (P(X-opponent making WCSF) * P(GSW beating X-opponent) + P(Y-opponent making WCSF) * P(GSW beating Y-opponent))
This formula can be expanded inside the brackets for the following rounds to incorporate 4 possible opponents in the Conference Finals, and 8 possible opponents in the NBA Finals. By using this approach, the probabilistic model accounts for all possibilities and the WPs in those situations, and produces an overall probability for each team at each stage of the competition. The results are as follows for each team winning in the conference semi-finals.
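As a worked check of this rule, the sketch below plugs in the rounded Warriors figures quoted above (93.5%, 56.0%, 44.0%, 84.0% and 89.4%). The function name `advance_probability` is hypothetical, and because the inputs are rounded the result only approximates the spreadsheet's value.

```python
def advance_probability(p_reach, opponents):
    """Probability of a team winning a given round.

    p_reach:   probability the team reaches this round.
    opponents: list of (p_opponent_reaches, p_team_beats_opponent) pairs,
               one per possible opponent in this round.
    """
    return p_reach * sum(p_opp * p_beat for p_opp, p_beat in opponents)


# Golden State in the Western Conference Semi-finals (rounded inputs).
gsw_wcsf = advance_probability(
    0.935,               # P(GSW beats New Orleans in round one)
    [
        (0.560, 0.840),  # Portland advances, GSW beats Portland
        (0.440, 0.894),  # Memphis advances, GSW beats Memphis
    ],
)
print(round(gsw_wcsf, 3))
```

The result lands close to the 0.807 listed for Golden State in the conference semi-finals table; the small difference comes from using rounded percentages rather than the full-precision spreadsheet values.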
WIN CONFERENCE SEMIS
Seed | Team | Probability
1 | GOLDEN STATE | 0.807311899807801
1 | ATLANTA | 0.596362337436809
2 | CLEVELAND | 0.421077674814479
3 | CLIPPERS | 0.364985875739439
3 | CHICAGO | 0.323246681407111
6 | SAN ANTONIO | 0.315181007951462
4 | TORONTO | 0.282709257814962
2 | HOUSTON | 0.172676812018262
7 | DALLAS | 0.147156304290838
6 | MILWAUKEE | 0.142381775180314
7 | BOSTON | 0.113293868598096
4 | PORTLAND | 0.111375747718758
5 | WASHINGTON | 0.0907226581332247
5 | MEMPHIS | 0.0627824269196012
8 | BROOKLYN | 0.0302057466150039
8 | NEW ORLEANS | 0.0185299255538402
TOTAL | 4
The first 3 teams on the list should come as no surprise, as they’re 1 and 2 seeds; the Clippers are the first surprise. The strangest result, however, is the Rockets, who fall all the way down to the 8th spot for most likely to win the conference semis, despite being the 2-seed in the West. This is because their second round match-up is either the Clippers or the Spurs, and because they’re only just more likely to make the second round than their opponent, the Mavericks.
The other large drop is by Portland and Memphis, because whichever of them advances is almost guaranteed to play the Warriors in the second round, and neither has a very good chance of beating the Warriors. The probabilities add up to 4, which shows that they are valid: in the conference semis there are 4 series, and therefore 4 winners, so logically the probabilities of all possible teams should add up to 4. The same method of checking has been used for the following rounds as well. The next table shows the probability of each team winning the conference.
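This sanity check is easy to reproduce outside the spreadsheet. The snippet below simply sums the sixteen conference semi-final probabilities copied from the table above; the variable names are hypothetical.

```python
# Probability of each team winning its conference semi-final,
# copied from the "WIN CONFERENCE SEMIS" table.
win_conference_semis = {
    "GOLDEN STATE": 0.807311899807801,
    "ATLANTA": 0.596362337436809,
    "CLEVELAND": 0.421077674814479,
    "CLIPPERS": 0.364985875739439,
    "CHICAGO": 0.323246681407111,
    "SAN ANTONIO": 0.315181007951462,
    "TORONTO": 0.282709257814962,
    "HOUSTON": 0.172676812018262,
    "DALLAS": 0.147156304290838,
    "MILWAUKEE": 0.142381775180314,
    "BOSTON": 0.113293868598096,
    "PORTLAND": 0.111375747718758,
    "WASHINGTON": 0.0907226581332247,
    "MEMPHIS": 0.0627824269196012,
    "BROOKLYN": 0.0302057466150039,
    "NEW ORLEANS": 0.0185299255538402,
}

# Four semi-final series means four winners, so the total should be 4
# (up to floating-point rounding).
total = sum(win_conference_semis.values())
print(total)
```

The same check applies to the later rounds with a different target: 2 for the conference champions and 1 for the NBA champion.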
CONFERENCE CHAMPION
Seed | Team | Probability
1 | GOLDEN STATE | 0.647094236077852
1 | ATLANTA | 0.386623211564788
2 | CLEVELAND | 0.211619506412052
4 | TORONTO | 0.146381258601121
3 | CHICAGO | 0.140248919616856
3 | CLIPPERS | 0.115361180951387
6 | SAN ANTONIO | 0.0965804660331421
4 | PORTLAND | 0.0488693209995067
6 | MILWAUKEE | 0.043289820764252
2 | HOUSTON | 0.0335877442944217
5 | WASHINGTON | 0.0334126854890256
7 | BOSTON | 0.0309490476913957
7 | DALLAS | 0.0301673431854398
5 | MEMPHIS | 0.0237346401527711
8 | BROOKLYN | 0.00747554986050949
8 | NEW ORLEANS | 0.00460506830548049
The next round shows pretty similar results, but every Western team other than the Warriors has dropped from the previous round’s predictions, because they are most likely going to play the Warriors, who are favoured to beat every possible opponent in the Western Conference Finals. The Clippers are the second most likely team to win the West after the Warriors, yet they are only the 6th team on this list.
One of the big benefits of the probabilistic model is demonstrated here: the Clippers and Spurs are the 2nd and 3rd most likely teams to win the West respectively, even though they play each other in the first round. This is because they have a roughly equal chance of winning that first round, and are therefore the two best teams in the conference outside of the Warriors. The absolute model just isn’t able to show results like these. From that result one can conclude that whoever wins out of the Clippers and Spurs in the first round is going to be a strong competitor for the Western Conference title. It’s also clear once again that Houston is not the title competitor its 2nd seed would suggest, coming 10th overall on this list. The final table shows the probability of each team winning the NBA Championship.
CHAMPION
Seed | Team | Probability
1 | GOLDEN STATE | 0.545841074123992
1 | ATLANTA | 0.114055617034654
2 | CLEVELAND | 0.0640097445844359
4 | TORONTO | 0.0315915815725008
3 | CHICAGO | 0.027749653060396
3 | CLIPPERS | 0.0745191789804979
6 | SAN ANTONIO | 0.061011156654803
4 | PORTLAND | 0.0262498033155244
6 | MILWAUKEE | 0.0054887060342587
2 | HOUSTON | 0.0159077314906383
5 | WASHINGTON | 0.00360481199049952
7 | BOSTON | 0.00338745113256729
7 | DALLAS | 0.0135183798284877
5 | MEMPHIS | 0.0110845416541319
8 | BROOKLYN | 0.000494779696178371
8 | NEW ORLEANS | 0.00148578884643376
The most shocking result from these probabilities is obviously that the Golden State Warriors have a 54.6% chance of winning the championship. This means that the other 15 teams competing in the playoffs have to share the remaining 45.4%. Golden State has a higher chance of winning the championship than every other team combined. Atlanta is the only other team with a probability greater than 10%. The order of the table is unchanged from the previous one, but this table speaks volumes about how much of a favourite the Warriors should be to win the title this year.
Limitations
The main problem with this model is that it relies on results over the entire season. This creates plenty of problems for teams that have suffered injuries over the season, teams that have rested players, teams that have made trades, teams that have had an increase in production in the second half of the season, or teams that have been on the slide recently. However, this is not as big a problem as one might think. It’s clear that overall the predictions make a lot of sense, and the fact that these sorts of things happen to every team suggests that they even out and don’t have a massive impact.
Injuries, which have been brought up as a limitation of the model, do not actually have as big an effect as one might think. Because the model relies entirely on probability, teams that suffered from injuries over the season but are now healthy might still be modelled correctly. The reason for this is that the model is based on regular season stats, so it accounts for the probability that a player is injured for any particular game during the regular season. One can assume that the likelihood that a player gets injured in the regular season is roughly similar to the likelihood that they get injured in the post-season, and therefore the model is accounting for the probability that the team suffers from an injury again.
For teams whose play has improved or diminished in the second half of the season, the model is simply accounting for the likelihood that the team drops off or gets back to its previous form again. Therefore one can conclude that it is rested players and trades that should be accounted for in the model.
Possible Improvements
As mentioned before, the model does not account for the possibility of resting starters during the regular season. This could be done by looking at the starting line-up currently used by the team (accounting for injured players) and removing games from the team results sheet that feature 2 or more starters who were listed as DNP (coach’s decision). This would include players in the current starting line-up who were traded to the team during the season. The reason this has not been done in this model is that for the probabilistic modelling I had to go through and calculate the WP for all 120 different possible match-ups, and I am unaware of any system that would speed up this process. My spreadsheet is available, so anybody who can provide a method for calculating the probabilistic results more quickly is welcome to improve on the spreadsheet.
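One possible way to automate the probabilistic results is sketched below. It is only a suggestion, not part of the spreadsheet: it assumes the teams are listed in bracket order (so adjacent pairs meet in round one, adjacent groups of two in round two, and so on) and that a function `beat(a, b)` returning the series WP of `a` over `b` is available. Both `bracket_probabilities` and `beat` are hypothetical names, and `beat` would itself need to decide which side has home-court advantage.

```python
def bracket_probabilities(teams, beat):
    """Round-by-round advancement probabilities for a knockout bracket.

    teams: list of team names in bracket order; length must be a power of 2.
    beat:  function beat(a, b) -> probability that a wins a series against b.
    Returns a list with one dict per round, mapping team -> P(team wins that round).
    """
    reach = {team: 1.0 for team in teams}  # everyone starts in round one
    rounds = []
    group = 1  # number of teams on each side of a match-up slot
    while group < len(teams):
        won = {}
        for start in range(0, len(teams), 2 * group):
            left = teams[start:start + group]
            right = teams[start + group:start + 2 * group]
            for side, other in ((left, right), (right, left)):
                for team in side:
                    # P(reach round) * sum over opponents of
                    # P(opponent reaches round) * P(team beats opponent)
                    won[team] = reach[team] * sum(
                        reach[opp] * beat(team, opp) for opp in other
                    )
        rounds.append(won)
        reach = won
        group *= 2
    return rounds


# Tiny hypothetical example: four evenly matched teams.
demo = bracket_probabilities(["A", "B", "C", "D"], lambda a, b: 0.5)
print(demo)
```

Applied to the full 16-team field, a loop like this would evaluate each of the 120 possible pairings inside `beat` rather than by hand, and the sum of each round's probabilities can be checked against the expected number of winners (8, 4, 2, 1).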