Predicting bowling performance

I have already written about the consistency (or inconsistency) of various batting statistics and how well they predict future performance. In this post, it is the turn of the bowlers

This time, I will jump quickly into the results rather than spend too much time walking through the key concepts. The basic aim is to identify which T20 statistics are the most consistent and the most predictive indicators of future value (as measured by Runs Added and Win Probability Added). For a more complete explanation, see my earlier post on batting, as the approach here is exactly the same

Predictive statistics for bowlers

The table below shows a mixture of traditional statistics used widely throughout cricket (e.g. bowling average) and some bespoke statistics that I found to be good predictors of future bowling performance (e.g. % deliveries bowled wide)

The percentage values show the correlation between a bowler's figures in one block of ten T20 matches and the subsequent block of ten T20 matches. The first column shows each statistic's correlation with itself across the two blocks - how consistent it is over time
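To make the block structure concrete, here is a minimal sketch of how a bowler's matches could be grouped into ten-match blocks and paired up. The field names (`deliveries`, `wides`) and the choice to drop an incomplete final block are my assumptions for illustration, not a description of the actual data pipeline behind the table

```python
def split_into_blocks(matches, block_size=10):
    """Group chronologically ordered matches into full blocks.

    Any leftover matches that do not fill a whole block are dropped
    (an assumption made here for simplicity).
    """
    full = len(matches) - len(matches) % block_size
    return [matches[i:i + block_size] for i in range(0, full, block_size)]


def wide_rate(block):
    """Share of deliveries in a block that were wides (hypothetical fields)."""
    deliveries = sum(m["deliveries"] for m in block)
    wides = sum(m["wides"] for m in block)
    return wides / deliveries if deliveries else 0.0


def consecutive_pairs(values):
    """Pair each block's value with the value from the following block."""
    return list(zip(values, values[1:]))


# Made-up example: 25 matches, so only two full ten-match blocks survive
matches = (
    [{"deliveries": 24, "wides": 1}] * 10
    + [{"deliveries": 24, "wides": 2}] * 10
    + [{"deliveries": 24, "wides": 3}] * 5
)
rates = [wide_rate(b) for b in split_into_blocks(matches)]
pairs = consecutive_pairs(rates)
```

Each pair holds a bowler's figure in one block and in the next; pooling these pairs across all bowlers gives the two series that are then correlated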

The next two columns show the correlation between a statistic and future performance, as measured by Runs Added (RA) and Win Probability Added (WPA). The final column is simply an average of the previous two
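As a rough illustration of how one row of such a table could be computed, the sketch below uses a plain Pearson correlation. I am assuming Pearson here; the function names and the toy numbers are hypothetical and are not the real figures

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def table_row(stat_now, stat_next, ra_next, wpa_next):
    """Build the four columns for one statistic.

    stat_now  : the statistic in each ten-match block
    stat_next : the same statistic in the following block
    ra_next   : Runs Added in the following block
    wpa_next  : Win Probability Added in the following block
    """
    vs_ra = pearson(stat_now, ra_next)
    vs_wpa = pearson(stat_now, wpa_next)
    return {
        "consistency": pearson(stat_now, stat_next),
        "vs_ra": vs_ra,
        "vs_wpa": vs_wpa,
        "average": (vs_ra + vs_wpa) / 2,
    }


# Toy data only: a perfectly consistent stat that hurts future value
row = table_row(
    stat_now=[1, 2, 3, 4],
    stat_next=[2, 4, 6, 8],
    ra_next=[4, 3, 2, 1],
    wpa_next=[8, 6, 4, 2],
)
```

In this toy case the statistic is perfectly consistent and perfectly negatively related to future value, which is the same direction (though far stronger) as the wides result discussed below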

Correlations between various bowling statistics and future performance (as measured by RA and WPA)

Some conclusions

  • T20 bowling performance (RA & WPA) is far more consistent than batting performance. My hypothesis is that this is because a bowler has more influence over the outcome of a delivery than the batsman (they choose what they bowl, whereas the batsman usually just reacts). So a good bowler can adapt their game to the situation. I proposed a similar theory when looking at whether batsmen or bowlers win tournaments

  • Bowlers who concede a lot of wides are likely to continue bowling wides (48% consistency) and this has a predictably negative impact on both RA and WPA. Wides are the best predictor that I found for both performance measures

  • Byes are often the fault of the bowler. The number of byes a bowler concedes is relatively consistent from one 10-match block to the next (27%). This seems to suggest that bowlers who have a high degree of control when bowling (i.e. few wides and byes) are valuable players


It was not surprising to see wickets as one of the best predictors of both measures of performance (although I was expecting economy to be an even better predictor). But it was also interesting to see which types of wickets are the best predictors of future value and which types are most consistent from one 10-match block to another

Caught and bowled appears to be the best predictor of future value but we should bear in mind that there is only a small sample. This is also why the stat is so inconsistent - caught and bowled doesn't happen often in T20

Stumpings are the most consistent wicket type. Bowlers who get wickets in this way are likely to continue to do so, whilst those who do not are unlikely to suddenly start. This is largely because spinners are generally more likely to pick up stumpings. That it is a decent predictor of future value is a bit more surprising, especially as it is unlikely to capture quality pace bowling

One seemingly peculiar result is that bowlers who see batsmen dismissed through run outs off their bowling tend to deliver value in the future, despite having little to do with this outcome. My suspicion is that this merely reflects the fact that death bowlers are both the most likely to see run outs and the most likely to be the best bowlers

The other interesting feature of the table is that the number of wickets caught is not very predictive of either RA or WPA. Bowlers who get hit for a lot of runs might occasionally luck into a catch on the boundary, weakening the correlation between catches and future value

There is clearly a lot more digging that can be done into these correlations to identify the true causal relationships between variables. Better data could also massively enhance these insights; for example, richer data could allow us to differentiate between caught behind and caught on the boundary. This is enough for now, however, and helps me to understand what historic information is most important to factor into my predictive models