Trialling a T20 Captaincy Metric

The importance of captaincy in cricket is greater than in almost any other sport. In football, for just one example, the captain has almost no influence whatsoever over team selection or strategy. This is not true in cricket. Not only are cricket captains involved in most strategic and tactical decision-making; they also carry more responsibility for the other players in the team, their mindset and their morale. Measuring the value of a good captain is incredibly difficult.

One aspect of captaincy that might be measurable is the decisions that captains make on the field. Here, the decision-making is observable by an outsider. Obvious, even, in the case of bowling changes: they occur 20 times in an innings and at regular intervals. However, my experience is that my evaluation of these decisions is extremely susceptible to personal biases. If I already think that a captain is good, then I might interpret a risky decision as a calculated gamble; if I already think that they are bad, then I might interpret the same decision as a foolish blunder.

Evaluations will always be clouded by hindsight bias. The outcomes can persuade us whether decisions were good ones or bad ones. In their first match of IPL 2018, Dinesh Karthik introduced Nitish Rana to bowl at AB de Villiers and Virat Kohli. It paid off: both were gone in successive balls. To the neutral, Karthik had taken a calculated risk and was rewarded. To a KKR supporter, it was a stroke of genius. To an RCB fan, bloody lucky.

What we really need is an objective benchmark to measure against. I have been experimenting with a model to predict what an average captain would do in a specific match situation. I built a Neural Network that can anticipate the correct bowling decision 50% of the time.

What if we assume that the model is always right? It definitely isn't... but indulge me for a moment. Let's assume that on the 50% of occasions when the model predicts incorrectly, it is actually the captain who has erred. In the heat of battle, his decision-making is impaired and he has miscalculated. Then we can measure a captain by how often they deviate from the model.

This isn't completely outlandish. I would argue that an average captain is probably a good captain. There seem to be enough obvious mistakes during T20 matches that teams could benefit simply from finding somebody who can consistently make sensible decisions. There is a small correlation between how closely captains follow the model and their win percentage. No model will ever be infallible, especially one with no insight into pitch conditions, but it doesn't need to be. It only needs to be correct most of the time and, importantly, it needs to be objective.
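The deviation idea above boils down to a simple ratio: of all the bowling decisions a captain made, how many differed from the model's top prediction? A minimal sketch, where the data structure and bowler names are purely illustrative rather than taken from the real pipeline:

```python
# Sketch of the deviation metric: the share of bowling decisions where the
# captain's actual choice differs from the model's most likely bowler.
# The (predicted, actual) pairs here are hypothetical examples.

def deviation_rate(decisions):
    """decisions: list of (model_top_choice, actual_choice), one per over."""
    if not decisions:
        return 0.0
    deviations = sum(1 for predicted, actual in decisions if predicted != actual)
    return deviations / len(decisions)

# Toy example: a captain who agrees with the model in 3 of 4 overs
sample = [("Bumrah", "Bumrah"), ("Chahal", "Chahal"),
          ("Bumrah", "Krunal"), ("Chahal", "Chahal")]
print(deviation_rate(sample))  # 0.25
```

A real version would build the pairs from ball-by-ball data and the trained model's per-over output; the metric itself stays this simple.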

I looked at how often each captain in franchise/club cricket made a bowling change despite the model predicting a more than 60% chance that a different bowler would be picked. In other words, when there was an obvious bowling choice (60%+) and the captain went in a different direction.

I also measured how the win probability changed during those overs. If the captain saw a big swing in their favour then maybe we should be slightly more forgiving. Perhaps it was one of those strokes of genius that the model did not see coming. And if it turns out to be a blunder then the captain should be penalised for making a predictable mistake.

The chart below shows the 100 captains in franchise/club cricket who have made the most bowling changes in my database since 2011. The x-axis shows how often they make 'surprising' decisions - either they didn't make the change that the model thought was more than 60% likely to happen, or they did make a change that the model thought was less than 5% likely to happen. The y-axis then shows the win-probability change for those captains when they made these 'surprising' decisions.

[Chart: captain bubbles version 2.PNG]

If we take the results at face value then we can see that, of the top 5 decision-makers, Dhoni is the most likely to make a left-field decision (13% of the time). And for each of those left-field decisions, the subsequent over sees a 1% reduction in win probability for his team. At the other end of the spectrum is Rohit Sharma, who disagrees with the model only 9% of the time.
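The two axes in the chart could be computed along these lines. This is a sketch with hypothetical field names and made-up numbers, not the real pipeline: an over is flagged as 'surprising' if the captain ignored a bowler the model rated above 60%, or picked one the model rated below 5%.

```python
# Sketch: per-captain 'surprise' rate (x-axis) and mean win-probability
# change in those surprising overs (y-axis). Data shapes are hypothetical.

def surprising_overs(overs, high=0.60, low=0.05):
    """overs: dicts with the model's bowler probabilities, the actual
    choice, and the win-probability change over that over."""
    flagged = []
    for over in overs:
        probs = over["model_probs"]            # e.g. {"Chahal": 0.68, ...}
        actual_prob = probs.get(over["actual"], 0.0)
        favourite_prob = max(probs.values())
        ignored_favourite = favourite_prob > high and actual_prob < favourite_prob
        long_shot = actual_prob < low
        if ignored_favourite or long_shot:
            flagged.append(over)
    rate = len(flagged) / len(overs)
    mean_wp_change = (sum(o["wp_change"] for o in flagged) / len(flagged)
                      if flagged else 0.0)
    return rate, mean_wp_change

overs = [
    {"model_probs": {"Chahal": 0.68, "Sundar": 0.22}, "actual": "Sundar", "wp_change": -0.03},
    {"model_probs": {"Siraj": 0.55, "Woakes": 0.40}, "actual": "Siraj", "wp_change": 0.01},
]
rate, wp = surprising_overs(overs)
print(rate, wp)  # 0.5 -0.03
```

One bubble per captain would then be (rate, mean_wp_change), sized by the number of bowling changes in the database.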

However, I have extremely little faith in the validity of these results. The model is not yet good enough at imitating an average captain for its verdicts to be trusted. It does identify decisions that look strange in retrospect, but it also flags enough perfectly defensible decisions that the results in the chart above are mostly random. For an example from either side of the coin, let's look at two of Kohli's decisions from IPL 2018...

Decision 1: Bowling Mohammed Siraj in over 18 against Mumbai Indians. Going into the final three overs, with Hardik Pandya and a well-set Rohit Sharma at the crease, Chris Woakes had 2 overs remaining and Mohammed Siraj had just 1 over remaining. The model gave a 63% chance that Woakes would bowl over 18. Instead, Kohli went for a Siraj-Woakes-Anderson combination and suffered 21 runs from a disastrous last over. This time, Kohli doesn't actually get penalised, as my metric doesn't look at the damage caused in Anderson's over - it only counts the over from Siraj, which was actually quite good.

Decision 2: Bowling Washington Sundar in overs 12 and 14 against Delhi Daredevils. The team's premier spin bowler, Yuzvendra Chahal, had already bowled two overs in the Powerplay, successfully brought on early to dismiss Jason Roy (who has a terrible record against Chahal). It was therefore completely reasonable for Kohli to switch to a different spinner during the middle overs. The model disagreed, becoming more and more certain that Chahal would come back into the game: 54% in over 10, 68% in over 12, 75% in over 14.

Another common flaw in the model is assuming the wrong role for each bowler. Many disagreements between Kohli and the model were about precisely which pace bowler should bowl. It is extremely difficult, from the data alone, to determine which one should be the team's death bowler. Ajinkya Rahane was another captain that the model disagreed with a lot (way to the right in the chart above). This was caused mostly by situations involving Krishnappa Gowtham. Rahane often used him in the Powerplay, rather than in the more traditional middle-overs role that the model expected from the team's best spin bowler.

Clearly there is a lot of room for improvement. This idea is still in its infancy. A more mature version of the model would hopefully be better at identifying strange bowling changes. If it gets good enough, we might have a useful metric: one that focuses specifically on the actions of the captains alone, stripping out the role of the other players and the vagaries of T20 outcomes.

Here are a few ways that I might move towards a more credible version:

  • Change the target variable - currently the model is trained to rank the best seamers and spinners in the team and then bases its decisions on those rankings. Three possible alternatives:
    1. Use a different method to rank bowlers (currently the model ranks on past performance, but it could conceivably work just as well by ranking on how often the bowlers have been used in the past)
    2. Order them randomly - this would destroy some information but would help to avoid over-fitting
    3. Predict something completely different, like the bowling style or simply spin vs. pace
  • Change the modelling approach - I used a Neural Network because I didn't want to spend too much time imposing a structure on the data. Decision tree modelling has that same advantage and isn't so dependent on the huge datasets that Neural Networks typically require.

  • Use a different approach to attribute credit / penalties for 'surprising' decisions. As we saw above, Kohli was not properly penalised for confusing the order of his bowlers at the death against Mumbai, because the damage happened two overs after the first decision was made. I could potentially group overs, look several overs into the future, or take different approaches depending on when the decision occurs.

  • Create separate models for different situations. This is the manual effort that I was trying to avoid. Unfortunately, this is also where the biggest improvements are likely to be found. It probably makes sense to structure the model to predict bowling changes at the death differently from the model that picks bowlers through the middle overs. That way, the modeller (me) can point the computer towards the data which is likely to be most pertinent to each specific scenario.

  • Develop a completely different metric. I have some more ideas, and I'm interested to hear thoughts from other people too. One theory I have is that the best captains should be able to find a Nash equilibrium between bowling styles. Seam and spin should deliver exactly the same impact as each other; otherwise, one form of bowling is being under-utilised. Good captains should show only a small performance difference between when they use their spinners and their seam bowlers.
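The last idea in the list is easy to prototype: compare the average per-over impact of spin against pace for a given captain, and treat a large gap as a sign that one style is being misused. A sketch, assuming we already have a per-over impact number such as win-probability change (the figures below are invented):

```python
# Sketch of the spin/seam balance metric: mean spin impact minus mean
# pace impact. Near zero suggests both styles are being used optimally;
# a large gap suggests one is under-utilised. Data is hypothetical.

def style_imbalance(overs):
    """overs: list of (style, wp_change), style is 'spin' or 'pace'."""
    spin = [wp for style, wp in overs if style == "spin"]
    pace = [wp for style, wp in overs if style == "pace"]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(spin) - mean(pace)

overs = [("spin", 0.02), ("spin", -0.01), ("pace", 0.01), ("pace", 0.03)]
print(style_imbalance(overs))  # ≈ -0.015: spin under-performing pace here
```

Whether win-probability change is the right impact measure, and how many overs are needed before the gap is meaningful, are open questions.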