Predicting batting performance

The rapid growth of T20 cricket has accelerated the use of data and reformed how those within the game use statistics. Mainstream media often lags behind - I recently read a report on the BBC describing "a thrilling one-wicket win over Surrey", as if the margin in wickets were particularly relevant in T20 cricket. The match was indeed extremely tight, but mainly because it was decided on the last ball rather than because of the number of wickets left

Even as a fairly well-informed fan, it can be hard to find stats in T20 that are as reliable and informative as batting and bowling averages in Test cricket. Mostly, I am guided by strike rates and economy but I still need to contextualise these due to variable scoring rates over the course of a T20 match, as we progress from the Powerplay, through the middle overs, and into the death overs

My aim in this article is to explore which statistics in T20 are most consistent and predictive indicators of future value. At White Ball Analytics, Runs Added and Win Probability Added are often used to evaluate performances but whilst they are good descriptive statistics (i.e. they tell us what happened in the past), they may not be the best predictive statistics (i.e. they don't necessarily tell us what is likely to happen in the future)

Chris Gayle hitting a six as a symbol of batting performance

Predictive statistics for batsmen

The first feature of a good predictive statistic is usually that it is consistent from game to game. I arranged each batsman's performances into chronological order, divided them into 10-match blocks, and then looked at how different statistics varied from one 10-match block to the next (a batsman therefore needed at least 20 innings to be included in this analysis)
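The block-to-block comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the input layout (a dict mapping each batsman to a chronological list of per-innings values), the helper names, and the choice to pair each block with the one immediately after it are all hypothetical.

```python
from math import sqrt

BLOCK_SIZE = 10  # innings per block, as described in the article


def split_into_blocks(values, size=BLOCK_SIZE):
    """Split a chronological list into consecutive full blocks.

    Any leftover innings that do not fill a whole block are dropped.
    """
    n_full = len(values) // size
    return [values[i * size:(i + 1) * size] for i in range(n_full)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Assumes both sequences vary (non-zero variance).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def block_pairs(innings_by_batsman, size=BLOCK_SIZE):
    """Pair each block total with the next block total for the same batsman.

    Assumption: adjacent blocks are paired. A batsman with fewer than
    two full blocks (under 20 innings) contributes no pairs.
    """
    pairs = []
    for innings in innings_by_batsman.values():
        totals = [sum(block) for block in split_into_blocks(innings, size)]
        pairs.extend(zip(totals, totals[1:]))
    return pairs


def block_to_block_correlation(innings_by_batsman, size=BLOCK_SIZE):
    """Correlation between a statistic in one block and in the next."""
    pairs = block_pairs(innings_by_batsman, size)
    earlier = [a for a, _ in pairs]
    later = [b for _, b in pairs]
    return pearson(earlier, later)
```

Feeding in balls faced per innings for every batsman would give a single consistency figure for that statistic; repeating the call with runs scored, or any other per-innings count, gives comparable figures for each.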

One of the more consistent stats is the total number of balls a batsman faces. The correlation from one 10-match block to another was 44%. This suggests a definite, though far from perfect, positive relationship across different time periods

For example, in 10 T20Is between 2012 and 2014, Kane Williamson faced 204 balls in total; in his next 10 matches, between 2014 and 2016, his time at the crease increased to 323 balls

Weak but definite positive correlation between balls faced in different innings by the same batsman. Correlation coefficient is 44%

Clearly, the number of balls that a batsman faces is unlikely, on its own, to predict high levels of future performance. Facing a ball is almost always a negative unless runs are scored from it. Total runs scored is also weakly correlated from one 10-match block to another (37%)

But we also know that some batsmen are capable of scoring many runs whilst not actually increasing the team's chances of winning the match. So let's look at my aforementioned go-to statistic. Strike rate is much less consistent than the other two (22%) but it does have the advantage that a high strike rate is very rarely a bad thing

Correlations between the most common T20 statistics and Runs Added (RA) and Win Probability Added (WPA)

What I really want to do is look at the correlation between the strike rate in one 10-match block and a robust measure of performance in the next 10-match block - Runs Added (RA) or Win Probability Added (WPA)
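A minimal sketch of this cross-block comparison is shown here, pairing strike rate in one block with Runs Added in the following block. The record layout (dicts with `runs`, `balls` and `runs_added` keys), the function names, and the decision to sum Runs Added over a block are assumptions made for illustration; the article does not specify how RA is aggregated, and the RA values themselves are taken as given inputs.

```python
from math import sqrt

BLOCK_SIZE = 10  # innings per block, as described in the article


def pearson(xs, ys):
    """Pearson correlation; assumes both sequences have non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def block_strike_rate(block):
    """Runs per 100 balls over a block, or None if no balls were faced."""
    runs = sum(innings["runs"] for innings in block)
    balls = sum(innings["balls"] for innings in block)
    if balls == 0:
        return None
    return 100.0 * runs / balls


def block_runs_added(block):
    """Total Runs Added over a block (assumption: RA is summed, not averaged)."""
    return sum(innings["runs_added"] for innings in block)


def predictive_pairs(innings_by_batsman, size=BLOCK_SIZE):
    """Pair strike rate in block k with Runs Added in block k + 1."""
    pairs = []
    for innings in innings_by_batsman.values():
        n_full = len(innings) // size
        blocks = [innings[i * size:(i + 1) * size] for i in range(n_full)]
        for earlier, later in zip(blocks, blocks[1:]):
            strike_rate = block_strike_rate(earlier)
            if strike_rate is None:
                continue  # skip blocks in which no ball was faced
            pairs.append((strike_rate, block_runs_added(later)))
    return pairs


def predictive_correlation(innings_by_batsman, size=BLOCK_SIZE):
    """Correlation between earlier strike rate and later Runs Added."""
    pairs = predictive_pairs(innings_by_batsman, size)
    return pearson([a for a, _ in pairs], [b for _, b in pairs])
```

Swapping `block_runs_added` for an equivalent Win Probability Added helper, or `block_strike_rate` for any other earlier-block statistic, would give the other comparisons in the same way.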

By this method, strike rate is more predictive than balls faced and roughly on par with total runs scored (8% / 8%)

The best predictor of both RA and WPA is Runs Added itself. This is a decent validation that creating such a complicated statistic has some value. That the correlations are so weak speaks to the mercurial nature of batting - we already know that batting performances are incredibly inconsistent from one match to the next, even for the very best players

Broaden the search

Let's broaden the search to include some other statistics which aren't always easily found in the traditional scorecard... I looked at roughly 150 different stats but the vast majority were not great predictors of future value. The table below shows some of the best, alongside the other statistics we were looking at before

Best predictors of future value in T20 are: Runs Added, 6s / sixes, strike rate, wides

Some conclusions

  • T20 batting statistics are very inconsistent; Runs Added is still the best predictor of itself, and yet its correlation between 10-match blocks is weak, at just 12%

  • Win Probability Added is so inconsistent, being sensitive to match situation and swings at the death, that it has almost no predictive value whatsoever

  • The total number of 6s hit is surprisingly consistent (31%). And if we add 4s into the mix then it is actually more consistent than runs scored (38% vs. 37%). Furthermore, total 6s is actually the best predictor of future WPA and the impact a player has on winning the game

  • From an analysis perspective, free hits are valuable events during a T20 innings. They seem to have high predictive power despite occurring so infrequently. This is probably because a free hit is the only time that the outcome is almost completely independent of context; a pure battle between bat and ball

  • The number of wides that a player faces has some predictive power, despite the batsman having no direct control over them. I expect that this is the result of bowlers fearing some batsmen and taking additional risks to avoid getting hit for a big score. Chris Gayle is a good example, facing more wides than any other batsman

So let's use this information to make some extremely naive predictions... As I write this article, most teams in the T20 Blast have now played 2 or 3 matches. Below are the batsmen leading three different statistical lists: sixes, strike rate, runs

My analysis suggests that the batsmen in the first list will contribute most to their teams' run totals, that those in the second list will contribute most to wins, and that those in the third list will provide the least value of all

Below are a final few charts / tables that I created whilst doing the analysis but didn't use during the article:

Chart: runs per free hit