Simulating T20 Matches: Pinching

The debate on whether Sunil Narine should open is one of my favourites in T20 cricket. I stand resolutely on team ‘yes’. It is hard to imagine situations in which a team suffers from losing his wicket and, at the same time, extremely easy to imagine his team gaining an immediate advantage. First-hand experience of Narine scoring 50 from 19 balls against RCB, in their first game of the 2018 season, has entrenched me even more firmly in my position… but standing on team ‘yes’ does not mean that I don’t still have doubts. It wouldn’t be a debate if I didn’t see merit in the arguments on both sides.

Sunil Narine’s T20 strike rate (SR) by batting position: his SR is highest when he opens the batting.

We simply don’t have enough evidence to say, with 100% certainty, that pinch-hitting with Narine is the optimal strategy. My database contains 50+ matches featuring him as an opening batsman. That is a good sample in T20 cricket, but not enough for certainty. It is enough to tell us he can do remarkable things with a bat, but not enough to be confident that another batsman couldn’t produce more value over the course of an entire innings.

One argument in favour is the low value of his wicket, low enough to be risked at the top of the order. However, irrational or not, going a wicket down affects how the rest of the batting order approach their innings. They should be just as aggressive but, psychologically, they might feel that the tail is slightly more exposed with one wicket already down. They would no doubt feel more secure if a batsman such as Narine were still lurking in the tail before they reached the other bowlers.

One argument against is that bona fide batsmen have more reliable routes to producing value. Most have the ability to achieve exactly the same results, and the reason they don’t is that they can do even better. Perhaps Chris Lynn in the Big Bash is a good example of what happens when a brilliant batsman throws caution to the wind: the results can be spectacular. But a top batsman knows that a slow start can bring rewards later. Shane Watson’s innings in the IPL final is a case in point. Top batsmen can play each delivery on its merits rather than attack every time, as Narine must in order to be productive.

My theory is that the best use of Sunil Narine is as an optional #3 or #4 batsman: he moves up the order only if early wickets fall. If the top order makes runs, then batting depth isn’t as much of a problem anyway and Narine isn’t needed. Unfortunately, I have almost zero evidence to support my theory. My database contains only 2 innings where Narine has batted at #4, and on neither occasion did he see any of the Powerplay. So whilst my theory sounds plausible, it is as yet untested.

We need a tool that allows us to play these ‘what if’ games, something to mitigate the poor sample sizes in T20 cricket. As a solution, I have built a simulator that plays out T20 games ball-by-ball, thousands of times, so that we can see the impact of playing in a certain way, or with a certain line-up. This means that we can set up a game between the Kolkata Knight Riders and the Sunrisers Hyderabad (say) and simulate what would happen with Narine at different spots in the line-up. We could even include a rule that says, “if a wicket falls within the first two overs, or two wickets within the first four, then send in Narine at #3 or #4; otherwise, keep him at the #8 spot”. This would let us play out my theory above.
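The conditional rule can be expressed as a small decision function that a ball-by-ball simulator might call whenever a wicket falls. This is a minimal Python sketch, not the simulator’s actual code: the function name `next_batter`, the ball-count thresholds (12 and 24 legal deliveries for two and four overs) and the batting order in the example are assumptions for illustration.

```python
def next_batter(wickets_down: int, balls_bowled: int, waiting: list[str],
                pinch_hitter: str = "Narine") -> tuple[str, list[str]]:
    """Choose the incoming batter when a wicket falls (hypothetical helper).

    wickets_down: wickets fallen, including the one that has just fallen.
    balls_bowled: legal deliveries bowled when the wicket fell.
    waiting: batters still to come, in the default order (pinch hitter at #8).
    Returns the incoming batter and the batters still waiting after him.
    """
    # A wicket within the first two overs, or a second wicket within the
    # first four overs, triggers the promotion of the pinch hitter.
    early_collapse = balls_bowled <= 12 or (wickets_down >= 2 and balls_bowled <= 24)
    if early_collapse and pinch_hitter in waiting:
        incoming = pinch_hitter
    else:
        incoming = waiting[0]  # otherwise keep the default order
    still_waiting = [name for name in waiting if name != incoming]
    return incoming, still_waiting


# Illustrative (hypothetical) order for #3 to #11, with Narine in his default #8 slot.
order = ["Uthappa", "Rana", "Karthik", "Russell", "Gill",
         "Narine", "Chawla", "Kuldeep", "Prasidh"]
print(next_batter(1, 9, order)[0])   # first wicket in over 2: Narine comes in at #3
print(next_batter(1, 30, order)[0])  # first wicket in over 5: Uthappa comes in at #3
```

Because the rule only reorders the waiting list, a fixed line-up and the optional-pinching line-up could share the same simulation loop and differ only in this one decision.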

The simulator is a way to move from treating each game as a datapoint to treating each ball as a datapoint. Rather than look for entire games where Narine has started at #4, we look at all the deliveries that he has faced in similar situations. He has never batted at #3, but we have plenty of evidence about what Narine does in the first 10 overs, at various scorelines, against various bowling types, with various batting partners, batting first and batting second. We still need to be careful extrapolating into the unknown (we have little evidence about how he might fare when facing his first ball in the 4th over, for example), but the simulator allows us to extrapolate much further than we can by analysing historical results on a game-by-game basis.
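As a rough illustration of the ball-as-datapoint idea, the sketch below pools every delivery a batter has faced that matches a given context, regardless of which match or batting position it came from, and turns the pooled outcomes into per-ball probabilities. The `Delivery` record, its fields and the sample data are hypothetical; a real model would condition on far more context (score, partner, innings) and would need smoothing for thin samples.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    """Hypothetical ball-level record."""
    batter: str
    over: int          # 0-based over number, 0 to 19
    bowler_type: str   # e.g. "pace" or "spin"
    outcome: str       # e.g. "0", "1", "4", "6", "W"


def outcome_distribution(deliveries, batter, max_over, bowler_type=None):
    """Per-ball outcome frequencies for one batter in a given context.

    Every matching delivery counts, whichever match or batting slot it came from.
    """
    matching = [
        d.outcome for d in deliveries
        if d.batter == batter
        and d.over < max_over
        and (bowler_type is None or d.bowler_type == bowler_type)
    ]
    if not matching:
        return {}
    counts = Counter(matching)
    return {outcome: n / len(matching) for outcome, n in counts.items()}


# Made-up sample: three first-10-over balls and one death-overs ball for Narine.
sample = [
    Delivery("Narine", 0, "pace", "4"),
    Delivery("Narine", 1, "spin", "6"),
    Delivery("Narine", 2, "spin", "0"),
    Delivery("Narine", 15, "spin", "1"),
    Delivery("Lynn", 0, "pace", "6"),
]
print(outcome_distribution(sample, "Narine", max_over=10, bowler_type="spin"))
```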

The ideal would be to watch these thought experiments play out in real life. A team desperate enough might be willing to trial some radical strategies, such as Worcestershire’s ‘no-wicketkeeper’ tactic in 2015. But in the absence of an Oakland A’s of cricket, and of sample sizes boosted by a 162-game season, a simulator will have to do.

As always, there are an overwhelming number of factors to consider. The England selectors rarely use basic averages to assess performance in limited-overs cricket, because the raw numbers depend so much on the context of each individual innings. We have rain-affected games, and occasions like the CPL playoff between Trinbago and Guyana, in which one team had nothing to play for whilst the other needed to chase a target within 16 overs. Sometimes the personal incentives of individual players are misaligned with the objective of winning, even before considering the potential implications of match-fixing. I can account for some of these factors in the modelling: the context of an innings, who is bowling, player-specific tendencies, and removing rain-affected matches from the sample. Other factors are impossible to address, and we are left to hope that sample sizes are large enough to drown out these anomalies.


Another challenge in T20 is to accurately assess overall team strength based on historical data and, as a corollary, how likely one team is to beat another. Previously, I had trialled several different approaches to predicting the outcome of individual games: an Elo-based model (here is an example, but not mine), models based on specific team strengths (Powerplay batting, death bowling, etc.), and models that evaluate team strength by taking my own measures of player value and adding them up in some way.
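For readers unfamiliar with Elo, the standard formulation converts a rating gap into an expected result and nudges both ratings after each match. The sketch below uses the conventional 400-point scale and a commonly used K = 20; both are assumptions, not the settings of the Elo model mentioned above, which is not described here.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability) for team A against team B."""
    # Conventional Elo scale, assumed here: a 400-point gap means 10:1 odds.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool,
               k: float = 20.0) -> tuple[float, float]:
    """Return both ratings after one match; K = 20 is an assumed default.

    Points gained by one team are lost by the other.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - expected_a)
    return rating_a + delta, rating_b - delta


print(round(elo_expected(1600, 1400), 3))  # a 200-point favourite
print(elo_update(1500, 1500, a_won=True))  # (1510.0, 1490.0)
```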

The results were generally disappointing. I usually treat the markets on Betfair as the best objective, publicly available source of truth. Even so, the perfect model should be able to beat those markets. I never expected my models to beat the market, but I did hope that my predictions would at least correlate with the implied probabilities on Betfair. Yet even with the best models, I always felt that I could outperform them on a hunch, simply by looking at the names in each line-up. None of my models could beat a hunch, and there are many people out there with better hunches than me.
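Comparing a model with the market needs the market’s odds expressed as probabilities. A minimal sketch, assuming decimal odds for each outcome of a match: invert each price, then divide by the total so that the probabilities sum to 1 (removing any overround). The helper names and the example prices are hypothetical, and a Pearson correlation is used here as one simple way to check whether model predictions track the market.

```python
from math import sqrt


def implied_probabilities(decimal_odds: list[float]) -> list[float]:
    """Turn decimal odds for every outcome into probabilities summing to 1."""
    raw = [1.0 / odds for odds in decimal_odds]
    book = sum(raw)  # slightly above 1 when the prices include an overround
    return [p / book for p in raw]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


print(implied_probabilities([1.8, 2.2]))  # approximately [0.55, 0.45]

# Hypothetical model win probabilities and hypothetical market prices
# for the same four matches (first price is the team the model rates).
model = [0.62, 0.48, 0.55, 0.40]
market = [implied_probabilities(pair)[0]
          for pair in ([1.7, 2.4], [2.1, 1.9], [1.9, 2.1], [2.6, 1.6])]
print(round(pearson(model, market), 3))
```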

The new simulation is a significant upgrade on those previous efforts. Whilst the primary objective was to create a tool for thought experiments such as the Narine example outlined above, I also wanted something that could accurately predict win probability from the very start of a match. These two objectives are aligned: if I want to calculate the impact of pinch-hitting on a team’s chances, then my estimate of those chances needs to be reasonably accurate. Unfortunately, they are not perfectly aligned, and I sometimes needed to compromise on one to achieve the other. The main point of difference was the number of moving parts in the modelling: I needed as many as possible to achieve the flexibility required for those thought experiments but, sadly, each additional moving part would usually reduce the overall accuracy.

Forced to choose, I always sacrificed predictive ability: I was more interested in strategic thought experiments than in accurate prediction. But the predictions were still encouraging. A huge amount rests on having good estimates of player ability, but the model did OK on its first run out of the box.

I plan to explore my success against the Betfair markets in a future article. For now, the short story is that the simulation model performed well with newly constructed teams, before the world had a chance to see the players together, and worse as the season progressed and bettors developed a sense for which teams performed well as a collective. As I continue to use the model, I will be able to adjust some of its many moving parts and keep up as markets mature throughout a season.


In the meantime, let’s return to the Narine question…

I set up two scenarios pitting the Kolkata Knight Riders against an average IPL team (some good batsmen, some good bowlers, a variety of bowling styles). In the first scenario, Sunil Narine was at #2, opening with Chris Lynn. In the second scenario, Robin Uthappa was promoted to open and everybody else moved up a spot, with Narine slotting back in at #7 behind Andre Russell and Shubman Gill. I also played out a very similar scenario between the Trinbago Knight Riders and an average CPL team.

[Figure: team sheets]

After simulating each match-up 6000 times, the results suggest very little difference between pinching and not pinching, with not pinching actually holding a slight, but non-significant, advantage. Taking the results at face value, we would be forced to conclude that pinching is generally not a great idea: if it doesn’t work with Narine, then it probably isn’t going to work with many other bowlers either. However, there is enough real-life evidence to believe that the model dramatically underestimates his Powerplay impact in these simulations…
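Even 6000 simulations leave some noise in each estimated win percentage. A quick way to gauge how much, assuming each simulated match is an independent win/loss trial, is a normal-approximation (Wald) interval. The helper below is a sketch, and the win count in the example is illustrative rather than an output of the simulator.

```python
from math import sqrt


def win_rate_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Win rate from n simulated matches, with a normal-approximation interval.

    z = 1.96 gives an approximate 95% interval.
    """
    p = wins / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


p, low, high = win_rate_interval(4068, 6000)  # illustrative: 67.8% of 6000 runs
print(f"{p:.1%} (95% interval {low:.1%} to {high:.1%})")
```

With 6000 runs and a win rate near 68%, the half-width is roughly 1.2 percentage points, so gaps between line-ups of around a point or less sit comfortably within simulation noise.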

The fact that Chris Lynn features in both the KKR and TKR line-ups has a big effect. My model treats Lynn as some sort of super-duper-pinch-hitter (approx. 180 SR), so the impact of Narine on the run rate is softened. Anybody who watched KKR play in IPL 2018 knows that Chris Lynn actually took a much more reserved approach than he had in the previous season, and than he does for the Brisbane Heat. Partnering Narine enabled him to realise his value later in the innings, just like many other top batsmen. I also set up an extra KKR simulation in which Lynn was replaced by a second Robin Uthappa (the beauty of a computer simulation is that you can have two of the same person in your team). Without Chris Lynn, the difference between playing Narine at #2 and #7 completely disappeared; both scenarios were equally successful (67.8% vs. 67.9%).

Narine is underestimated by the simulation: his real-life Powerplay strike rate is much higher than his simulated one.

Whilst the model might overestimate the abilities of Chris Lynn (the simulated opposition threw a few too many RA bowlers at him), it might also underestimate the effect that Narine can have in the Powerplay. His average simulated innings was 22 runs from 15 balls (approx. 150 SR). We have enough evidence from real life to suggest that this is too low: Narine has a 160 SR in the Powerplay across 45 matches as an opener in major T20 leagues, and a 177 SR in the IPL alone. Partly, the simulations underestimate Narine because they struggle with extreme phenomena (more on that in a bit), but they also underestimate him for the same reason that they overestimate Lynn: he faced more RA bowling in the simulations than usual. This is what makes the Lynn-Narine partnership so effective at both KKR and TKR: opposition captains limit the amount of RA bowling against Chris Lynn, and so they are forced to deploy more spin against Narine.
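The bowling-mix argument is essentially a weighted average: a batter’s overall strike rate is his strike rate against each bowling type, weighted by how often he faces that type. The sketch below uses made-up strike rates purely to show the mechanism; the two bowling categories and all of the numbers are hypothetical, not Narine’s or Lynn’s real splits.

```python
def expected_strike_rate(sr_by_type: dict[str, float],
                         exposure: dict[str, float]) -> float:
    """Blend per-bowling-type strike rates by the share of balls faced against each."""
    total = sum(exposure.values())
    return sum(sr_by_type[kind] * share / total for kind, share in exposure.items())


# Hypothetical batter who scores much faster against spin than against RA bowling.
splits = {"RA": 120.0, "spin": 200.0}
print(expected_strike_rate(splits, {"RA": 0.75, "spin": 0.25}))  # more RA: 140.0
print(expected_strike_rate(splits, {"RA": 0.60, "spin": 0.40}))  # more spin: about 152
```

The same batter, with unchanged ability against each bowling type, looks noticeably worse when a simulation feeds him more of the bowling he scores slowly against.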

The model also failed to account for the fact that Narine was playing in the IPL. The simulations do make some adjustments for playing at Eden Gardens… but those adjustments are applied equally to all players. Hitters like Narine benefit more from the batsman-friendly conditions in India, and his rating needs to be boosted in future. In the CPL, however, with the Trinbago Knight Riders, the results of the simulation are probably more accurate: Narine had a simulated 122 SR in the Powerplay at Queen’s Park Oval, just 10 points lower than his real-life 132 SR.

Sunil Narine is a T20 outlier. The algorithms used by the model are trained to treat outliers with scepticism. They have a tough time believing that Narine is real, that he can possibly hit sixes and get caught as often as he does. But we know better, because we understand that the context surrounding Narine the batsman is unique. The algorithms struggle because they start with the basic assumption that he should be like all the other opening batsmen, and they are slow to accept the reality. In future, I can teach my model to know better but, for now, I think it is reasonable to keep believing that Narine should pinch-hit for both KKR and TKR. However, the results of these simulations have led me to soften my stance: there are clearly some situations where a more traditional batting line-up is more effective.
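The article doesn’t say which algorithms the model uses, but one standard technique that behaves in this way is empirical-Bayes shrinkage with a Beta prior: each player’s observed rate is pulled towards the average for comparable players, and the pull weakens as evidence accumulates. This is a swapped-in illustration, not the model’s actual method, and the prior strength and six counts below are hypothetical.

```python
def shrunk_rate(events: int, balls: int, prior_rate: float, prior_balls: float) -> float:
    """Posterior-mean per-ball rate under a Beta prior (beta-binomial shrinkage).

    prior_rate: the rate for a typical comparable batter (e.g. other openers).
    prior_balls: how many balls of evidence the prior is worth; larger values
    make the estimate more sceptical of outliers. Both are hypothetical inputs.
    """
    alpha = prior_rate * prior_balls
    beta = (1.0 - prior_rate) * prior_balls
    return (events + alpha) / (balls + alpha + beta)


# Hypothetical batter hitting sixes off 10% of balls, where typical openers manage 5%.
print(shrunk_rate(30, 300, prior_rate=0.05, prior_balls=200))    # about 0.08
print(shrunk_rate(300, 3000, prior_rate=0.05, prior_balls=200))  # about 0.097
```

The more balls the prior is worth, the longer an extreme hitter has to keep producing before the estimate believes him.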


The simulations suggest that pinching works better when chasing than when setting a target. For both KKR and TKR, there was no significant difference in winning percentage when they batted second (approx. 75%), but not pinching had a 3% advantage when they batted first. We can also see that pinch-hitting was the best strategy when chasing an extremely high score in the IPL.

[Figure: results by chase]

This result surprised me. Pinching is essentially a risk-averse strategy: you avoid risking the wickets of your best batsmen early. But when you need to produce a big score, logic would suggest that getting your big guns in early makes sense. It appears that the habits of batsmen are so ingrained psychologically that they do not play aggressively enough from the start when chasing a huge score. They need an early boost from a pinch-hitter while they rev up their engines. On the other hand, when chasing low scores, perhaps there is something to the argument that the top batsmen are able to control the chase, keeping the run rate manageable and plenty of wickets in hand, whilst the Narine blunderbuss lacks the same level of precision.

For KKR, I also simulated a scenario to test whether Narine as an optional #3 or #4 is a good strategy. The results vindicated my theory. Whilst the model suggested that Narine opening is less effective when batting first, it had no problem with bringing Narine in early when the top order was dismissed quickly. And we have already seen that the simulator tends to underestimate Narine’s Powerplay impact… in reality, the optional pinching strategy would probably significantly outperform sticking to a traditional line-up.


[Figure: all results]

So far, I have focused on pinch-hitting solely from the perspective of teams with the luxury of playing Narine. For both the Kolkata and Trinbago Knight Riders, the evidence seems to suggest that opening with him is usually the right call in the IPL, and also in the CPL, unless chasing a low score or facing a bowling attack heavily staffed with quality seam bowlers. The simulation suggests a close call, but we have evidence from real life to believe that the model underestimates his true Powerplay strike rate.

[Figure: ‘shape’]

We can also use the model to test whether other teams might benefit from promoting a different bowler to the top of the order. In many ways, this is actually more useful, as we have little real-life evidence about whether other bowlers can achieve the same effect as Narine. Let’s look at the Rajasthan Royals, who very briefly experimented with Jofra Archer at the top of the order against RCB in their last match of the season.

One input into the model is a ‘star rating’, which I use to assess each player’s ability (based on historical performance, but tweaked if necessary). The ratings also vary by bowling type, but that doesn’t matter so much for this experiment. Jofra Archer has 2.4 stars, actually pretty good for a bowler. Whilst Narine initially gets a ridiculous 5 stars, the model definitely does not treat him like a typical 5-star batsman. Each player also comes with a second input, which I call their ‘shape’. Narine’s shape shows that he is more than twice as likely as the typical 5-star batsman to be caught on any given ball, and 47% more likely to hit a six.
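One way to picture the star rating and the ‘shape’ working together is a baseline per-ball outcome distribution for each star rating, with shape multipliers applied to particular outcomes. The sketch below is hypothetical throughout: the baseline probabilities are invented, only two outcomes are shaped, and 2.1 is simply a value chosen to illustrate “more than twice as likely to be caught” (the 1.47 six multiplier follows the 47% figure above). The unshaped outcomes are rescaled together, so the shaped outcomes keep exactly their multiplied probabilities.

```python
# Invented per-ball outcome probabilities for a typical 5-star batsman.
BASELINE_5_STAR = {"dot": 0.35, "single": 0.35, "two": 0.07,
                   "four": 0.12, "six": 0.06, "caught": 0.05}

# Hypothetical shape: caught 2.1x and six 1.47x the baseline rate.
NARINE_SHAPE = {"caught": 2.1, "six": 1.47}


def apply_shape(baseline: dict[str, float], shape: dict[str, float]) -> dict[str, float]:
    """Apply shape multipliers, rescaling the unshaped outcomes to keep a total of 1."""
    shaped = {outcome: baseline[outcome] * m for outcome, m in shape.items()}
    shaped_mass = sum(shaped.values())
    if shaped_mass >= 1.0:
        raise ValueError("shape leaves no probability for the other outcomes")
    rest = {o: p for o, p in baseline.items() if o not in shape}
    scale = (1.0 - shaped_mass) / sum(rest.values())
    return {**{o: p * scale for o, p in rest.items()}, **shaped}


narine = apply_shape(BASELINE_5_STAR, NARINE_SHAPE)
for outcome, prob in narine.items():
    print(f"{outcome:>6}: {prob:.3f}")
```

Under this framing, teaching a bowler to ‘play like Sunil’ would mean applying the same multipliers to the baseline for his own star rating.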

What we can do is assume that we have somehow trained Jofra Archer to ‘play like Sunil’. We can manually change his ‘shape’ to match Narine’s ‘shape’ when opening the batting. We then re-run the simulation another 12000 times: 6000 with him opening the batting and 6000 with the normal order.

Rajasthan were worse with Archer at the top, winning 50.5% of the time against an average IPL team. With Tripathi and Rahane opening, as they would have done against RCB, that win percentage increased to 61.4%: a statistically significant jump of 10.9 percentage points. The simulation suggests that, despite the artificial boost to his propensity to hit sixes, he still doesn’t hit enough sixes (4.1%) or fours (12.8%) to make the experiment work. The only real-life evidence we have backs this up: against RCB, he faced 4 balls and scored nothing. Each delivery in the Powerplay is too valuable to waste on a poor batsman, even one who is highly focused on hitting sixes (and getting out).
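Whether a gap between two simulated win rates is statistically significant can be checked with a standard pooled two-proportion z-test, treating each simulated match as an independent trial. The article doesn’t say which test was used, so this is an illustrative sketch; the win counts are reconstructed from the quoted percentages, assuming 6000 simulations per scenario (stated for the Rajasthan runs, assumed for the KKR runs without Lynn).

```python
from math import erfc, sqrt


def two_proportion_z_test(wins_a: int, n_a: int, wins_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (wins_b / n_b - wins_a / n_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Rajasthan: Archer opening (50.5%) vs Tripathi and Rahane opening (61.4%),
# assuming 6000 simulated matches each.
print(two_proportion_z_test(3030, 6000, 3684, 6000))  # z around 12
# KKR without Lynn: Narine at #2 vs #7 (67.8% vs 67.9%), assuming 6000 each.
print(two_proportion_z_test(4068, 6000, 4074, 6000))  # z around 0.1
```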


I’m excited by the range of questions that the simulation model can answer and the range of insights that it can provide. The optimal strategy for DRS reviews, optimal batting orders, optimal bowling allocations. Whether batsmen or bowlers are overvalued by franchises. The value of an all-rounder. The value of a keeper. The impact of a deep tail to the batting line-up. Does a single superstar batsman or a single superstar bowler have a bigger impact? And can we measure that impact team by team? What is the best recruitment fit for an existing team? The cost of the wrong person being on strike to start the final over. The optimal strategy to maximise runs when only one true batsman remains. When to retire a player who can’t get going. Which teams over- or under-perform, given the players selected? What would an innings of one hundred balls look like…

Many of these questions can be approached in other ways. It is also possible to tackle them all via specific simulations. Mostly, it makes sense to do both: use the simulator to trial scenarios that are rarely seen, and use actual match data to check that the outputs correlate with reality.