Using Advanced Analytics to Predict the Georgia vs. Clemson Game

    By Nathan Lawrence


    This Sunday, Ross of r2sports metrics posted a tweet that caused some consternation. The tweet linked to his first CBCR Quickie preview of the Clemson game. The reaction was a sort of positive confusion, like when your dog watches you play video games: they want you to succeed, but they have no idea what’s going on. To that end, I’m here today to answer a few questions. First, why are we doing this? Second, how are we doing it? Third, what do these funny numbers mean? And finally, what do these numbers tell us about this weekend’s game?

    Question 1: Why are we doing this? 

    There are a lot of really good resources for predictive stats. I’m partial to gameonpaper and CFBData, which both have very good pre-game breakdowns. Generally, these sites compare the two teams using past results in stats that are predictive of winning. What none of them do, and the niche Ross is trying to fill, is project what those key stats will actually look like in a specific upcoming game. Using a massive number of statistics, including some developed privately by Ross and available nowhere else, our previews predict the statistical results of a game without predicting a score. We use cutting-edge linear regression and machine learning on a large data set to make these projections. While we can derive a score differential from CBCR2 (for instance, we have Georgia in this game by ~12.5 points), the model isn’t really intended to produce a specific final score. Instead, we’re trying to advance the ball by providing projections that inspire conversation, debate, and, hopefully, a better understanding of the stats that matter.

     

    Question 2: How are we doing this? 

    The most consistent reaction to the charts we produced was one of confusion. I’ll admit, they aren’t your typical charts. However, I still think they are useful. Let’s take a look at our summary chart below: 

    Contrary to what you might expect, these graphs don’t show the predicted value of each statistic; they show the percentile of performance we predict. In other words, out of all the projections we’ve made this season, what percentile does each team’s projection fall in? Take pass yards per play, for example: we project UGA to have a very good day through the air compared with our projections for other teams. These quick graphs are really helpful because they give us a snapshot of what the model believes, not just about this matchup, but about how it expects Georgia to play in comparison to all of its peers.

    [Chart: CBCR Quickie summary of projected percentiles, Georgia vs. Clemson]
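
    The percentile framing above can be sketched in a few lines of Python. This is a minimal illustration, not CBCR2’s actual code: it assumes the comparison pool is simply every projection made this season for a given stat, that a team’s percentile is the share of that pool it matches or beats, and that the season values below are made-up placeholders. The `higher_is_better` flag is a hypothetical addition for stats such as turnovers or negative plays allowed, where a lower number is the better outcome.

```python
def percentile_rank(value: float, pool: list[float], higher_is_better: bool = True) -> float:
    """Percent of projections in `pool` that `value` matches or beats (0-100).

    Hypothetical helper: CBCR2's exact percentile convention is not published.
    """
    if not pool:
        raise ValueError("pool of projections must not be empty")
    if higher_is_better:
        beaten = sum(1 for p in pool if p <= value)
    else:
        # For "bad" stats (turnovers, negative plays), fewer is better.
        beaten = sum(1 for p in pool if p >= value)
    return 100.0 * beaten / len(pool)


# Made-up season pool of projected pass yards per play (placeholder data).
season_pool = [5.1, 5.8, 6.2, 6.6, 7.0, 7.4, 7.9, 8.3, 8.8, 9.5]

print(percentile_rank(8.3, season_pool))                          # 80.0
print(percentile_rank(5.8, season_pool, higher_is_better=False))  # 90.0
```

    Other percentile conventions (for example, splitting ties or interpolating between ranks) would shift these numbers slightly; the "matches or beats" rule is just the simplest one to read.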

    The other kind of chart we produce for this preview is a more direct comparison of each individual stat. These look like this: 

    [Chart: projected yards per play, Georgia vs. Clemson]

    This bell curve (IT’S THE NAME OF THE PODCAST WOWOWOW) is essentially a zoomed-in version of the summary chart. It shows not only where both teams’ projections sit relative to their peers’ projections, but also the projected total for the stat itself.

    Question 3: What do these funny numbers mean? 

    Hopefully, given that you have visited a website dedicated to a college football program, you understand the yards-based stats that we project. While pass yards, rush yards, and their yards-per-play brethren are generally considered surface-level stats by most stats nerds, they give you a frame of reference for how the numbers expect the game to flow. In the example above, out-gaining Clemson by 2 yards per play doesn’t guarantee that Georgia will win, but it does tell you that 1) UGA has the opportunity to pass on this Clemson defense, and 2) if they aren’t passing successfully, something is causing it. These aren’t the kind of stats that unlock the keys to understanding football at a galaxy-brain level, but they do give us a baseline of expectation from which we can understand, on a deeper level, what is happening as the game unfolds.

    Here are our other yard stat projections for the Clemson game: 

    [Chart: projected rush yards]

    [Chart: projected rush yards per play]

    [Chart: projected pass yards]

    A couple of things stand out to me here. First, the general perception, at least among people I trust when it comes to CFB (Read: Graham), is that UGA will lean into the run game this season. That is probably true across all 12 games, but CBCR2 seems to think the Dawgs have more of a chance in the air in this particular matchup. It’s worth noting that CBCR2 can’t see ETN’s suspension or lack thereof. It also can’t see Roderick Robinson’s turf toe. Even without accounting for either, the model leans toward the passing game, which makes me think that Clemson’s defense might have some shakiness in the defensive backfield. (I’m pretty sure scouting reports bear this out, but I’m not confident enough to say so outside of parentheses.) The other thing I see here is that the model thinks UGA will do a decent job of slowing down Clemson’s run game. 147 yards at 4.6 yards per carry isn’t anything to sneeze at, but it doesn’t suggest the kind of dominance Clemson will be looking for to keep the pressure off a depleted WR room.

    Now, let’s get into the fun stuff. In addition to yard stats, we also project opportunities, points per opportunity, field position, available yards, sacks created, turnovers, and negative plays allowed. Some of those (field position, turnovers, negative plays, and sacks) are pretty easy to understand, and they are interesting in their own right. For instance, the model thinks UGA will surrender a relatively large number of negative plays.

    [Chart: projected field position]

    [Chart: projected negative plays allowed]

    [Chart: projected sacks]

    [Chart: projected turnovers]

    That being said, I find the more esoteric stats to be the most interesting and important. Let’s take a look at them one by one.

    First, we have opportunities. Opportunities are offensive possessions that have at least one snap inside the opponent’s 40. Statistically, your chance of scoring goes up once you get past the opponent’s 40, so we count drives that reach that mark as “successful.” Opportunities are a great statistic because they help us understand how well an offense is doing situationally, not just on a play-by-play basis. If an offense has a lot of explosive plays but every drive ends at the 50, it’s not doing its job. Opportunities and opportunity rate (the share of a team’s drives that become opportunities) help us weed out that kind of junk production.
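
    A minimal sketch of the opportunity definition, assuming each drive is stored as a list of snap spots measured in yards from the opponent’s goal line (so 40 means the ball is on the opponent’s 40). The `Drive` class, the field name `snap_spots`, and the choice to count a snap exactly at the 40 as an opportunity are illustrative assumptions, and the drives themselves are invented.

```python
from dataclasses import dataclass

# Assumption: a snap spotted exactly at the opponent's 40 counts as "inside".
OPPORTUNITY_LINE = 40


@dataclass
class Drive:
    # Yards from the opponent's goal line at the start of each offensive snap
    # (hypothetical representation: 75 = own 25, 40 = opponent's 40).
    snap_spots: list[int]


def is_opportunity(drive: Drive) -> bool:
    """True if at least one snap of the drive came at or inside the opponent's 40."""
    return any(spot <= OPPORTUNITY_LINE for spot in drive.snap_spots)


def opportunity_rate(drives: list[Drive]) -> float:
    """Share of drives that became opportunities."""
    if not drives:
        raise ValueError("need at least one drive")
    return sum(is_opportunity(d) for d in drives) / len(drives)


drives = [
    Drive([75, 52, 50]),     # explosive play, then stalls at midfield: not an opportunity
    Drive([60, 38, 22, 4]),  # crosses the 40: opportunity
    Drive([80, 77, 71]),     # three-and-out: not an opportunity
    Drive([45, 40, 40]),     # reaches the 40 exactly: opportunity under our assumption
]

print(sum(is_opportunity(d) for d in drives))  # 2
print(opportunity_rate(drives))                # 0.5
```

    The first drive is the "explosive plays, but every drive ends at the 50" case: it moves the ball, yet it never earns an opportunity.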

    [Chart: projected opportunities]

    This is an informative projection to have, but it’s also an example of how percentile can be a little bit misleading. Even though UGA sits in the 79th percentile of all our projections, more than 50% higher than our projection for the Tigers, we only project them to have 1.7 more opportunities. This happens when results are clumped together around a certain point: when most projections sit in a narrow range, a small absolute gap becomes a big percentile gap, and percentile can make the Dawgs’ advantage look bigger than the model actually projects it to be. Still, even a small separation matters. Two more opportunities can radically change the course of a game. That’s a possible 14-point swing (two extra touchdowns), and beyond the points, it represents time off the clock, which puts pressure on the team with fewer opportunities.
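
    The clumping effect is easy to reproduce with invented numbers. The sketch below assumes a simple percentile convention (the share of the pool a value matches or beats) and ranks the same two hypothetical projections, 6.4 and 5.7 opportunities, against two made-up pools that share the same best and worst values. Only the spacing in between differs, yet the percentile gap changes dramatically while the absolute gap stays at 0.7 opportunities.

```python
def percentile_rank(value: float, pool: list[float]) -> float:
    """Percent of projections in `pool` that `value` matches or beats (0-100)."""
    return 100.0 * sum(1 for p in pool if p <= value) / len(pool)


# Two invented pools of projected opportunities with the same min (4.0) and max (8.5).
spread_out = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
clumped = [4.0, 5.6, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.5, 8.5]

team_a, team_b = 6.4, 5.7  # hypothetical projections, 0.7 opportunities apart

for name, pool in [("spread out", spread_out), ("clumped", clumped)]:
    gap = percentile_rank(team_a, pool) - percentile_rank(team_b, pool)
    print(f"{name}: {gap:.0f} percentile points")
# spread out: 10 percentile points
# clumped: 60 percentile points
```

    Nothing about the two teams changed between the two runs; only the crowding of the other projections did.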

    Related to opportunities is points per opportunity (PPO). This stat is a simple extension of the previous one: when you have the ball inside your opponent’s 40, how many points do you score on average? A good PPO is generally above 3.5, which indicates that a team is more likely to get a TD than a field goal when it’s knocking on the door. We think of PPO as a drive-finishing stat. Great offenses consistently have a high PPO.
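
    Points per opportunity is total points on opportunities divided by the number of opportunities. The sketch below assumes, for illustration, that only points scored on drives that qualified as opportunities count, so a long touchdown that never had a snap inside the 40 is excluded; whether CBCR2 treats that case the same way isn’t stated here. `DriveResult` and the game data are hypothetical.

```python
from typing import NamedTuple


class DriveResult(NamedTuple):
    was_opportunity: bool  # at least one snap inside the opponent's 40
    points: int            # offensive points scored on the drive (e.g. 7, 3, or 0)


def points_per_opportunity(drives: list[DriveResult]) -> float:
    """Average points scored on drives that reached the opponent's 40."""
    scores = [d.points for d in drives if d.was_opportunity]
    if not scores:
        raise ValueError("no opportunities, so PPO is undefined")
    return sum(scores) / len(scores)


game = [
    DriveResult(True, 7),
    DriveResult(False, 0),
    DriveResult(True, 3),
    DriveResult(True, 7),
    DriveResult(False, 7),  # long TD from outside the 40: excluded under our assumption
    DriveResult(True, 0),   # e.g. turnover on downs at the 30
]

print(points_per_opportunity(game))  # 4.25, above the 3.5 benchmark
```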

    [Chart: projected points per opportunity]

    Finally, let’s talk about available yards. When we talk about available yards gained or available yards rate, we’re talking about how many of the yards between a team’s starting field position and the goal line it actually gains. If you start a possession at your own 25, you have 75 yards available. Gain all of them, and your available yards rate is 100% and you’ve gained 75 available yards. This is a quick, easy way to measure how successful an offense is from series to series, and how well a defense limits that success.
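
    Using the own-25 example above, available yards can be sketched as follows. The `Possession` class and its fields are hypothetical; spots are measured in yards from the opponent’s goal line, a drive’s end spot is assumed to be where it finished (0 for a touchdown), and drives that lose ground are allowed to count negative yards. The function reports both the per-possession average and the overall rate.

```python
from dataclasses import dataclass


@dataclass
class Possession:
    start_to_goal: int  # yards to the opponent's goal line at the first snap (own 25 -> 75)
    end_to_goal: int    # yards to the goal line where the drive ended (0 = touchdown)

    @property
    def available(self) -> int:
        # Every yard between the starting spot and the goal line is "available".
        return self.start_to_goal

    @property
    def gained(self) -> int:
        # Net yards toward the goal line; negative if the drive lost ground.
        return self.start_to_goal - self.end_to_goal


def available_yards_summary(possessions: list[Possession]) -> tuple[float, float]:
    """Return (available yards gained per possession, available yards rate)."""
    if not possessions:
        raise ValueError("need at least one possession")
    total_gained = sum(p.gained for p in possessions)
    total_available = sum(p.available for p in possessions)
    return total_gained / len(possessions), total_gained / total_available


# Start at your own 25 and gain every available yard.
print(available_yards_summary([Possession(75, 0)]))  # (75.0, 1.0)

# Made-up three-drive sample: a touchdown, a stall at your own 40, a stall at the opponent's 40.
sample = [Possession(75, 0), Possession(80, 60), Possession(65, 40)]
per_possession, rate = available_yards_summary(sample)
print(f"{per_possession:.1f} AY/possession, {rate:.1%} of available yards")
# 40.0 AY/possession, 54.5% of available yards
```

    The rate pools yards across drives rather than averaging per-drive percentages, so long fields weigh more than short ones; that is one reasonable choice, not necessarily the one CBCR2 makes.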

    [Chart: projected available yards]

    This is probably the starkest demonstration of the gap CBCR2 sees between these two teams. It’s not a direct indicator of results, but it is very difficult to win a football game when your average drive is 10 yards shorter than your opponent’s. If the field position battle is even, or worse, if you’re losing it, overcoming a deficit of 10+ available yards per possession becomes nearly insurmountable.

    Question 4: What do these numbers tell us about this weekend’s game? 

    Given that each of the projections above shows a significant percentile advantage for UGA, it’s easy to say that these numbers tell us Georgia will win. The Dawgs are favored, after all, and most prognosticators are forecasting a win. That’s easy analysis, and more importantly, it doesn’t add anything to the conversation started by people who know more about ball than I do. So what new information can we learn from these stats? It’s important to note that, while our model sees Georgia performing at a much more elite level relative to the rest of the field, the actual gaps in many of these metrics are relatively small. In absolute terms, Georgia is projected to have a small-to-medium advantage in each of the statistics we think are indicators of a win. In my mind, this reflects the nature of the game we’ll see on Saturday and is, more generally, a reminder of what it takes to beat a good team. Great teams beat good teams by capitalizing on a series of small but significant advantages. Wins, at the level Georgia plays on, are made of small moments where bounces go your way. What separates the GOATs from the just-goods-of-all-time is the ability to train a team to affect those “bounces” on a consistent basis. These metrics tell a story: the Dawgs will likely hold a close lead for most of the game, and the final score, even if they cover, will not reflect the relative quality of the two teams. Having said that, maybe we get two pick sixes and blow them out. I’d be happy to be wrong.

    By the way, if you liked what you read here today, pretty please check out my podcast, Chapel Bell Curve. Imagine this, but less dry and more stupid, and you’ve got the CBC aesthetic. You can find us anywhere you find podcasts and at our linktree here.

    Additionally, if you’d like to see Ross and me break down these very same numbers, check out our stream from this week here, or Ross’ blog article here.

     

    I’ll catch you this weekend in the Benz (I’ll be the one in the polo with the walkie talkie), but until then, Go Dawgs! 
