Section 2: Handicapping Theory 1/3 (Model Handicapping)
There are three general theories of how a bettor can gain an edge handicapping sports: Model Handicapping, Fundamental Analysis and Technical Analysis. In this three-part article, I explain each of these theories independently, and how I combine them to produce my Best Bets.
The core of my handicapping comes from the mathematical models I have built, which predict the results of games more accurately than the public or the Las Vegas odds makers. Less sophisticated simulators that try to come up with a formula to predict future games tend to make the same mistake: they use regression analysis to find the correlation between different statistics and point differential. While that exercise is very useful for explaining which statistics impact a game’s result, regression on past statistical averages is not necessarily useful for predicting future results, since some important statistics simply don’t correlate very highly with their own future values. For example, turnovers are the number one factor in point differential in football, but turnovers are also the least predictable statistic. A model that is based on regression analysis will weight turnovers very highly, but since past turnovers do not correlate highly with future turnovers, such a model will overweight the effect of past turnovers, creating a model that is good at explaining what has happened but not very good at predicting what will happen.
Fumbles in particular are random, as they are about 90% due to variance. That is to say that historically, if you took all of the teams that fumbled a lot over the first half of a season, and all of the teams that fumbled very little over the first half of the season, those two groups of teams fumbled at a similar rate in the last half of the season. In other words, when the talking heads on ESPN praise teams that ‘hold on to the football’ and criticize teams with ‘fumbilitis,’ one must realize that such labels are just fooling you with randomness, and that in future games, the ‘hold on to the football’ teams will not necessarily fumble less than the ‘fumbilitis’ teams. (Of course, when this happens, the talking heads then say, “Iowa fumbled 10 times in the first 5 games, but has only fumbled once in the 5 games since then. They have learned how to take care of the football!”) This is just the most obvious of literally hundreds of different metrics which are factored into my mathematical model, and is one of the reasons that my model is much better than regression-based models and has a consistent, winning track record to prove it.
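As a rough illustration of weighting a statistic by how well it predicts itself, the sketch below shrinks an observed rate toward the league average. This is not the model described in this article: the function name `project_rate`, the league average, and the sample fumble rates are all hypothetical, and reading "about 90% due to variance" as a predictability of 0.10 is an assumption.

```python
def project_rate(observed, league_avg, predictability):
    """Shrink an observed rate toward the league average.

    predictability is the assumed share of a team's past deviation from
    average that carries forward (0 = pure noise, 1 = fully repeatable).
    """
    return league_avg + predictability * (observed - league_avg)


# Hypothetical first-half fumble rates, in fumbles per game.
league_avg = 1.5
fumble_prone = project_rate(2.5, league_avg, predictability=0.10)
sure_handed = project_rate(0.5, league_avg, predictability=0.10)

# The two projections end up close together despite very different pasts.
print(round(fumble_prone, 2), round(sure_handed, 2))
```

With a predictability this low, a team that fumbled five times as often as another in the first half of the season projects to fumble only slightly more often going forward.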
My math models incorporate how well each past statistic predicts future games and use each team’s compensated statistics rather than its raw stats, which adds to the accuracy of my predictions. Compensated statistics are derived by comparing a team’s statistics to the statistics of the opponents that it has faced.
For instance, if Oregon is averaging 3.6 yards per carry, and Rutgers is averaging 4.0 yards per carry, but Oregon’s opponents (when adjusted for schedule strength) only project to allow a combined 3.4 ypc against an average opponent, and Rutgers’ opponents project to allow a combined 4.2 ypc against an average opponent, then compensated statistical analysis (which I have tested over a sample size of tens of thousands of games) predicts that Oregon is actually likely to fare better rushing against an average run defense than would Rutgers, despite the fact that Rutgers is running at a rate of 4.0 ypc to Oregon’s 3.6. Using compensated statistics in combination with the predictive nature of each statistic used in my model produces an accurate measure of the true difference between two teams’ future performances, not the difference between their past performances.
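The Oregon and Rutgers comparison can be worked through in a few lines. This is a minimal sketch that assumes a compensated statistic is a simple difference between a team's average and what its opponents project to allow; the actual adjustment may be a ratio or something more involved, and the function name `compensated` is hypothetical.

```python
def compensated(team_avg, opponents_allow):
    """Team average relative to what its opponents project to allow
    against an average opponent (simple-difference assumption)."""
    return team_avg - opponents_allow


# Yards-per-carry figures from the example above.
oregon = compensated(3.6, 3.4)   # better than its schedule would suggest
rutgers = compensated(4.0, 4.2)  # worse than its schedule would suggest

print(round(oregon, 2), round(rutgers, 2))
print("Oregon rates higher" if oregon > rutgers else "Rutgers rates higher")
```

Oregon comes out ahead once the schedule is taken into account, even though its raw average is lower.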
I also adjust my projected numbers based on current personnel for each team and those extra hours of statistical work have paid off handsomely over the years (and I get better each year at making those adjustments). A lot of my edge comes from some complex defensive player analysis models which I have built to evaluate the effects of defensive injuries. I also remove meaningless plays from my data set such as kneel downs at the end of a half or game and quarterback spikes, in addition to lessening the impact of stats in ‘garbage time’, so the game statistics that I use are more representative of a team’s performance than the statistics used by other handicappers who take a lazier approach and just plug in box scores.
I have been using my current NCAA math model since 2001 (with occasional upgrades and adjustments), and my long-term win percentage in College Football (56% over 29 years) is why my Best Bets move the odds within minutes of each Best Bet release.
One of the critical advantages of Model Handicapping is that it allows me to quantify my edge. That is to say, with many years of results behind it, my model not only identifies advantageous lines but can also give me a percentage estimate of how likely a given team is to cover the spread. Quantifying my edge allows me to adjust my bet sizes for optimal bankroll growth, which allows my customers to make more money.
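One widely used way to turn a cover probability into a bet size is the Kelly criterion. The article does not say which sizing method is used, so the sketch below is only an assumption about how such a calculation might look. It also assumes standard -110 pricing (risk 110 to win 100), and the function name `kelly_fraction` is hypothetical.

```python
def kelly_fraction(p_win, risk=110.0, to_win=100.0):
    """Kelly fraction of bankroll to stake on a bet won with probability p_win.

    Assumes -110 style pricing by default. Returns 0.0 when the bet has
    no positive expected value.
    """
    b = to_win / risk          # net payout per unit staked
    q = 1.0 - p_win            # probability of losing
    f = (b * p_win - q) / b    # classic Kelly formula
    return max(0.0, f)


print(round(kelly_fraction(0.55), 4))  # stake for a 55% estimate
print(kelly_fraction(0.50))            # no edge at -110, so no bet
```

Many bettors stake only a fraction of the full Kelly amount, because the estimated probability is itself uncertain.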
It takes years of careful tweaking and analysis to really determine how much value each point of difference between a bettor’s own lines and the Vegas lines is worth. The easiest way to test the validity of a model is to use statistical software to fit a regression equation that predicts the home team’s spread result (1 for a cover, 0 for a loss, or 0.5 for a push) as a function of the line differential, meaning the difference between the power ratings/math model line and the actual line (in terms of the home team). If the slope of that equation is positive, then your model/ratings are better than the Vegas line (a negative slope indicates your model doesn’t work), and the more games you use to test your model, the more likely it is that the slope is a true indicator of how well your model will work going forward. For instance, I have 16 years of results using my current College Football math model, and the equation to predict the chance that the home team covers the spread is 0.500 + 0.01 x LD, where LD is the line differential between my math model prediction and the line. So, for every point of differential, I can add 1% to my chance of winning. Each College game has a hypothetical ‘perfect’ line where each side would cover exactly 50% of the time, and I try my best to arrive at that line. If I had ‘perfect’ lines, then I would have about a 3% advantage per point of differential between my lines and the Vegas lines, but no model can achieve the perfect line. I spend the majority of each summer researching my methods and fine-tuning my analysis, and my lines have become more accurate over time. Unfortunately, the Vegas odds makers have also become better at making lines, so I have to continue to fine-tune my model.
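The validity test described above can be sketched with an ordinary least-squares fit. The six games below are made-up toy data, used only to show the mechanics; a real test needs many seasons of results. The function names are hypothetical, and `cover_probability` simply encodes the 0.500 + 0.01 x LD equation quoted above.

```python
def ols_slope_intercept(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cover_probability(line_diff, intercept=0.500, slope=0.01):
    """Chance the home team covers, using the equation quoted above."""
    return intercept + slope * line_diff


# Toy data (not real results): line differential in home-team terms,
# and spread result (1 = home covered, 0 = did not, 0.5 = push).
line_diffs = [-4.0, -2.0, -1.0, 1.0, 2.0, 4.0]
results = [0, 0, 1, 0.5, 1, 1]

slope, intercept = ols_slope_intercept(line_diffs, results)
print("slope is positive" if slope > 0 else "slope is not positive")
print(round(cover_probability(4.0), 3))  # 4-point differential
```

A positive fitted slope is the signal to look for. With a sample this small the estimate means nothing, which is why the number of games tested matters so much.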
Remember, it doesn’t matter how much of a differential there is between your ratings/math model and the actual point spread if your predicted line is not proven to be better than that spread, as my lines have been!
Many handicappers have a set of ratings, most often referred to as power ratings, that gauge the overall strength of each team in comparison to every other team. They then take the difference in ratings between two teams as the predicted point differential between the teams if they met on a neutral field. Of course, teams don’t usually meet on a neutral field so points are added to the home team to compensate for the advantage that most teams have playing at home. The home field advantage can be a set amount for all teams (such as 2.5 or 3 points in the NFL and 3.5 to 4 points in college football) or can vary from team to team depending on their individual variance in their level of play at home and on the road – although I find it best to not add or subtract more than 1/2 a point for any team’s home field advantage as past differences in level of play between home and road games that vary from the standard home field advantage are most often nothing more than variance and unlikely to continue.
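The power-rating arithmetic above can be sketched as follows. The ratings in the usage lines are hypothetical, the 3.0-point default is one of the NFL values mentioned above, and the function name `predicted_home_margin` is an assumption made for illustration. The team-specific adjustment is capped at half a point, following the advice above.

```python
def predicted_home_margin(home_rating, away_rating,
                          base_hfa=3.0, team_hfa_adj=0.0):
    """Predicted point margin for the home team.

    base_hfa is the standard home field advantage; team_hfa_adj is a
    team-specific tweak, capped here at plus or minus half a point.
    """
    capped_adj = max(-0.5, min(0.5, team_hfa_adj))
    neutral_margin = home_rating - away_rating
    return neutral_margin + base_hfa + capped_adj


# Hypothetical ratings: home team rated 10, visitor rated 6.
print(predicted_home_margin(10.0, 6.0))                    # standard HFA
print(predicted_home_margin(10.0, 6.0, team_hfa_adj=2.0))  # tweak capped
```

Comparing this predicted margin with the posted spread gives the line differential for the game.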
While the concept of power ratings is rather simple, it is very difficult to come up with a set of accurate ratings. The problem with most power ratings methods is that the ratings are generated using some sort of mathematical process based on the past performance of each team and the level of opposition that they have faced. An example of this is the Sagarin ratings seen in USA Today each week. I’ve talked to many amateur handicappers that use the Sagarin ratings to figure out if the point spread is too high or low on a particular game. What is important to remember is that the Sagarin ratings, and any other mathematically produced set of ratings using only scores, explain what has already happened rather than what will happen. In other words, while it is true that these ratings accurately reflect the difference in the performance of each team up to that point of the season they are not a predictive tool to be used to forecast the future performance level of teams, which is what we are truly interested in as handicappers.
If beating the point spread were as easy as checking the Sagarin ratings and making your wagers based on that, then everyone would be winning and sports books would all be out of business. Obviously, that is not the case. So, while the Sagarin ratings can be used to see how teams have performed up to that point of the season, do not depend on them to forecast how teams will perform in their next game.
Power ratings are typically based on the final scores of games. That works okay in basketball (although it won’t be better than the Vegas line), but in football there is a lot of ‘noise’ and ‘variance’ in scoring, and points are not nearly as useful for predicting the outcomes of games. Furthermore, power ratings which reduce every team to a single number ignore the enormous importance of matchups. If Texas Tech and Georgia Tech have similarly rated offenses, then you would expect them to fare similarly against a defense that had an average rating across the board in all defensive metrics. However, against a defense with an average overall rating but, on a more specific level, a very high run-defense rating (allowing 3.1 ypc against opponents who combine for an adjusted 4.5 ypc) and a very bad pass-defense rating (allowing 8.8 ypa against opponents who combine for an adjusted 6.4 ypa), you would expect Texas Tech’s pass-heavy offense to perform much better than Georgia Tech’s run-heavy offense, even though the two offenses are rated similarly overall. Obviously, analyzing matchups is much deeper and more complex than this, and often gets into very technical data concerning advantages at individual positions, but this simple example illustrates the overall concept of how power ratings do not factor in matchups.
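The matchup example can be made concrete with a small calculation. The defensive figures come from the example above. The run/pass play mixes for each offense are hypothetical, and weighting the defense's run and pass deviations by play mix is a deliberate simplification, not the matchup analysis described in this article.

```python
def matchup_edge(run_share, run_def_delta, pass_def_delta):
    """Expected change in yards per play against this defense, weighting
    the defense's run and pass deviations by the offense's play mix."""
    pass_share = 1.0 - run_share
    return run_share * run_def_delta + pass_share * pass_def_delta


# Defense from the example: allows 3.1 ypc to offenses that average 4.5,
# and 8.8 ypa to offenses that average 6.4.
run_def_delta = 3.1 - 4.5    # negative: strong against the run
pass_def_delta = 8.8 - 6.4   # positive: weak against the pass

# Hypothetical play mixes (share of plays that are runs).
texas_tech = matchup_edge(0.30, run_def_delta, pass_def_delta)
georgia_tech = matchup_edge(0.80, run_def_delta, pass_def_delta)

print(round(texas_tech, 2), round(georgia_tech, 2))
```

Under these assumed play mixes the pass-heavy offense gains from the matchup while the run-heavy offense loses, which a single overall rating cannot show.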