Series 9: A Numbery-Wumbery Breakdown (Part 1)
Joshua Yetman presents a statistical breakdown of how Series 9 performed overall.
It seems like only yesterday that Series 9 first tumbled onto our screens, but here we stand on the other side of yet another series of Doctor Who. Hasn’t time gone fast! With 12 episodes, 6 stories, 7 writers, 6 directors, precisely 34,182 seconds of content (in other words, 569.7 minutes, or essentially 9 and a half hours), 2 stunning lead actors, 1 clockwork squirrel and countless tears, it’s been a hell of a ride, and it received considerable acclaim from fans and critics alike.
Now, to business. For me personally, Doctor Who provides a double whammy of exciting material these days: its episodes (of course) and the fascinating and abundant statistics buzzing around them. Over the past few weeks, DWTV held polls asking you to rate each episode of Series 9, and you responded in your tens of thousands. Alongside those polls, I provided 12 articles summarising each episode’s statistics in turn, in order to make sense of the colossal 67,770 votes cast in total and to try to provide meaningful insight into how Series 9 was performing against the other series in the revival.
These three articles (yes, three! As the start of a trilogy, I had to resist calling this article The Stats Awakens) will complement the Your Verdict On series by seeking to provide a comprehensive, hopefully interesting examination and summary of the statistics disseminated this year. Part 1 will condense the main results of Series 9 and then compare the series average, along with other information, to that of other series in the revival, in order to determine exactly how Series 9 places in the grand scheme of things. Part 2 will consider the divisiveness and consistency of the series, along with the current state of the revival and the best writer of the series. Finally, Part 3 will break away from the DWTV poll results and provide an analysis of the audience and AI figures of Series 9.
Now, before I begin, I will make one very important disclaimer, which I intend to state only once so as not to repeat myself constantly throughout the article or in the comment section. Fundamentally, this article makes the assumption that it is fair and justified practice for initial community scores (specifically, the ones for Series 9) to be used in conjunction with long-term community scores (for Series 1-8) for the purpose of making comparisons. Yes, there is an evident propensity for initial community scores to decrease over time (an effect colloquially referred to as “recent episode syndrome”), but it is not a universal rule, has had occasional violations in the past, and, ultimately, the initial scores reflect the community opinion of Series 9 at the current point in time. However, initial scores for earlier series will also be used where they exist.
(1) The full rankings
For reference, the Series 9 community ratings are restated below in a graphical format, complete with their overall placings in the 129-episode strong revival at the top of each bar:
(2) The standalone statistics and results of Series 9
The Series 9 mean was 8.171, a figure which puts the series 8.9% above the pre-Series 9 revival average. The median score was a much higher 8.380, implying the Series 9 scores were skewed towards the more positive end of the score scale, as they commonly are with poll results.
A grand total of 8 episodes scored above 8/10 (so two thirds of the series), and 3 episodes scored above 8.75/10, the threshold for an episode to be “acclaimed” as per my personal definition. Amongst these, Heaven Sent became the second highest rated episode in the history of the revival after The Day of the Doctor, scoring a sensational 9.344, even managing to beat the legendary Blink. Only one episode this series scored below 7/10: the poorly received Sleep No More, which unfortunately became the 9th worst episode of the revival (in a noteworthy inversion, The Zygon Inversion became the 9th best episode of the revival).
If you average together all the 1/10, 2/10, etc. votes respectively for each episode in the series, then the “typical” Series 9 episode looked like this:
So, the modal score – i.e. the score which received the highest overall percentage of the votes – for Series 9 was the revered, full-marks 10/10 option. Some may attribute this to community overpositivity and post-broadcast hype, but I believe this is a rather cynical point of view. The DWTV community is generally harsher than most other communities I have come across, and tends to use up much more of the scoring scale. It should also be noted that the last series to have a modal score of 10 was Series 5 back in 2010. Series 6, 7 and 8 all had modal scores of 8/10. Thus, we can reasonably conclude that Series 9’s success stems from its intrinsic quality, and not from an overly generous community.
From the average score distribution above, we can also calculate that 91.5% of people, on average, gave an episode of Series 9 half-marks or more (5/10 or more). The 2/10 option was the least popular score, as per usual.
The standard deviation of the “average” Series 9 episode was 2.197. Average is put in inverted commas because this figure is not calculated by simply averaging the standard deviation of each individual episode (in which case you’d get a figure of 1.981), but by averaging all the 1/10, 2/10, etc. votes (as above) and then calculating the standard deviation of that combined distribution. This is the better, more comprehensive approach. To clarify, standard deviation is a useful statistic that can be used in this context to measure the “divisiveness” (i.e. the extent of disagreement and discord amongst the fanbase) of an episode or series. The higher the standard deviation, the more divisive the episode. We shall compare this standard deviation figure of 2.197 to other series in Part 2 of this trilogy. However, we can already deduce that Series 9 will be quite high up, as Series 9 contains the two most divisive episodes in the history of the revival (Sleep No More and Hell Bent), a shocking result which led to the consumption of many hats.
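To make the difference between the two calculations concrete, here is a minimal Python sketch. The vote shares for the two episodes are invented purely for illustration (they are not DWTV poll data), the helper name `mean_and_sd` is hypothetical, and the sketch assumes a population (rather than sample) standard deviation, which may not match how the figures above were produced.

```python
from math import sqrt

SCORES = range(1, 11)  # the 1/10 to 10/10 voting options

def mean_and_sd(shares):
    """Mean and population standard deviation of a 1-10 vote distribution.

    `shares` lists the fraction of votes for scores 1..10 and should sum to 1.
    """
    mean = sum(s * p for s, p in zip(SCORES, shares))
    variance = sum(p * (s - mean) ** 2 for s, p in zip(SCORES, shares))
    return mean, sqrt(variance)

# Hypothetical vote shares for two episodes (NOT real poll results).
episode_a = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.10, 0.30, 0.60]
episode_b = [0.10, 0.05, 0.05, 0.10, 0.20, 0.20, 0.15, 0.10, 0.03, 0.02]
episodes = [episode_a, episode_b]

# Approach 1: average the standard deviations of the individual episodes.
mean_of_sds = sum(mean_and_sd(e)[1] for e in episodes) / len(episodes)

# Approach 2: average the vote shares score by score to build the "typical"
# episode, then take the standard deviation of that combined distribution.
typical = [sum(column) / len(episodes) for column in zip(*episodes)]
typical_mean, pooled_sd = mean_and_sd(typical)

# Share of votes at half-marks or more (5/10 and above) for the typical episode.
half_marks_or_more = sum(typical[4:])

print(f"mean of SDs: {mean_of_sds:.3f}, pooled SD: {pooled_sd:.3f}")
```

The pooled figure can never be smaller than the average of the individual standard deviations, because it also picks up how far apart the episodes’ mean scores sit from one another. That is consistent with 2.197 exceeding 1.981.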
Lastly, Heaven Sent commanded the highest proportion of 10/10 votes of the series with a staggering 71.7%, the second highest on record after the 50th anniversary special The Day of the Doctor. Sleep No More had the lowest proportion of 10/10 votes, with just 10.9%. Sleep No More also had the highest proportion of 1/10 votes, at 10.6%. The lowest proportion of 1/10 votes did not go to Heaven Sent, however, but to Under the Lake, at 0.8%.
(3) Considering the average of each series
As said before, Series 9 averaged 8.171 overall. Ignoring all specials, how does Series 9 currently rank against the other eight series of the revival? The official rankings using long-term scores for Series 1-8 are as follows:
- Series 9 – 8.171
- Series 4 – 7.832
- Series 5 – 7.780
- Series 8 – 7.665
- Series 1 – 7.587
- Series 3 – 7.481
- Series 6 – 7.334
- Series 7 – 7.224
- Series 2 – 7.049
Expressing this data in a diagram:
For reference, if you did include Christmas specials (as I have done in the past, but I have decided to omit them this year as there is a large degree of subjectivity about which Christmas special goes in which series) then Series 5 would rank higher than Series 4. As a run of 13 episodes, however, Series 4 pips Series 5 for the silver medal. The gold medal, though, is definitively claimed by Series 9, which hasn’t just smashed all the other series of the revival, but (under the current scores) is the only series to average above 8/10.
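For anyone who wants to check the ranking for themselves, the comparison can be reproduced with a short Python sketch. The averages are the nine figures quoted in the list above (specials excluded); the variable names are purely illustrative.

```python
# Long-term series averages as quoted in this article (specials excluded).
series_averages = {
    "Series 1": 7.587, "Series 2": 7.049, "Series 3": 7.481,
    "Series 4": 7.832, "Series 5": 7.780, "Series 6": 7.334,
    "Series 7": 7.224, "Series 8": 7.665, "Series 9": 8.171,
}

# Rank from highest to lowest average.
ranking = sorted(series_averages.items(), key=lambda item: item[1], reverse=True)

# Which series average above 8/10?
above_eight = [name for name, average in ranking if average > 8]

# Gap between first and second place.
lead = round(ranking[0][1] - ranking[1][1], 3)

for position, (name, average) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {average:.3f}")
print("Above 8/10:", above_eight, "| lead over second place:", lead)
```

On these figures, Series 9 leads second-placed Series 4 by 0.339 points.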
However, how about we consider initial scores (i.e. immediately after broadcast) instead of long-term scores in our comparison? Unfortunately, initial scores are only available as far back as The Impossible Astronaut (after a long, thorough search through the jungle archives of DWTV, I finally found them!), so we can only consider Series 6-9 on an initial basis. But even with this limited information, they paint an interesting picture:
- Series 6 – 8.308
- Series 9 – 8.171
- Series 8 – 8.094
- Series 7 – 8.012
Again, showing this graphically:
Again, these averages omit specials. Using initial scores, then, Series 9 ranks higher than Series 7 and Series 8 but, rather unexpectedly, Series 6 has the highest initial average of the four. This is potentially because of what I did to bring the data “up to scratch”, so to speak, as the Series 6 data was seriously lacking in places and so some strong assumptions had to be made during the analysis. Furthermore, Series 6 and 7 were polled on a five-point system (i.e. the community was asked to rate on a scale from 1 to 5, as opposed to the 1 to 10 scale used from Series 8 onwards), which required further adjustments. Still, Series 6’s strong result kind of makes sense in context, as it did have some episodes that would naturally perform exceptionally well immediately after broadcast. If the Series 6 initial average above were true, though, it would represent the biggest fall from initial average to long-term average on record.
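The size of each fall can be worked out directly from the two lists in this section. The sketch below is only arithmetic on the averages quoted above for Series 6, 7 and 8 (the three series with both an initial and a long-term figure); the variable names are illustrative.

```python
# Initial and long-term series averages as quoted in this article.
initial = {"Series 6": 8.308, "Series 7": 8.012, "Series 8": 8.094}
long_term = {"Series 6": 7.334, "Series 7": 7.224, "Series 8": 7.665}

# Fall from initial average to long-term average, per series.
drops = {name: round(initial[name] - long_term[name], 3) for name in initial}

# The series with the largest fall among these three.
biggest = max(drops, key=drops.get)

for name, drop in drops.items():
    print(f"{name}: fell by {drop:.3f}")
print("Largest fall:", biggest)
```

On these figures, Series 6 falls by 0.974, Series 7 by 0.788 and Series 8 by 0.429.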
For the final statistic in this section, let’s return to the long-term scores, and consider how many episodes in each series had an average above 8/10. 8/10 is seen as a pretty important threshold for an episode to beat, and Series 9 did very well in this regard with no fewer than 8 of its episodes scoring above 8/10. As for the rest of the revival:
Even with a 12-episode run as opposed to the 13-episode runs of most of the other series, Series 9 has the most episodes averaging above 8/10, taking that title away from Series 5. Series 7 brings up the rear, unfortunately, with just 2 episodes going over this threshold.
(4) Revised long-term projections
Before Series 9 even started, I developed a model capable of estimating what the Series 9 initial scores would be based purely on historical behaviour, e.g. extrapolating an estimate for Whithouse’s episodes this series based on the community reaction to his previous work. The model was – ultimately – a fairly catastrophic failure, delivering an average error of 8% from the actual results and presenting me with the clear conclusion that such projections will lack accuracy due to the sheer amount of unforeseeable events that have influence over the final score.
However, now that we have some initial scores for the series, I can revise my projections and actually base them on something tangible, evident and numerical instead of something no more sophisticated than speculation, and create some long-term projections, i.e. what the score of the episode will be in the future after any potential recent episode syndrome has taken effect.
Now, I considered two separate models to make these projections, and then I averaged the results together. Without wanting to go into too much detail due to the complexity of the models involved, the first model looks at how older episodes have fallen over time here on DWTV, fitting a line of best fit between how each episode scored initially and how it scored after a few years, and then applying this line of best fit to Series 9. The second model considers the IMDb community and its scores for Series 9, creating long-term projections for these scores using regression modelling, and then mapping them over to DWTV scores using a further model.
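As a rough illustration of the first model’s general idea (and emphatically not the actual model used for these projections), a line of best fit can be sketched in Python as follows. Every number in it is hypothetical: the four (initial, long-term) pairs are made up, and the second model’s estimate is just a stand-in value, since the IMDb-based model is not described in enough detail to reproduce.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical initial and long-term scores for four older episodes
# (invented for illustration, NOT real DWTV data).
initial_scores = [6.0, 7.0, 8.0, 9.0]
long_term_scores = [5.4, 6.5, 7.6, 8.7]

slope, intercept = fit_line(initial_scores, long_term_scores)

# Model 1: apply the fitted line to a new episode's initial score.
new_initial_score = 8.5  # hypothetical initial score
model_one = slope * new_initial_score + intercept

# Model 2: stand-in for an IMDb-based estimate (hypothetical value).
model_two = 8.05

# Final projection: the simple average of the two models' estimates.
projection = (model_one + model_two) / 2
print(f"slope={slope:.2f}, intercept={intercept:.2f}, projection={projection:.2f}")
```

With a fitted line like this one, a higher initial score maps to a higher projected long-term score, and averaging with a second, independent estimate tempers whatever quirks either model has on its own.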
Anyway, without further ado, the official Series 9 long-term projections are:
- The Magician’s Apprentice – 8.461
- The Witch’s Familiar – 8.521
- Under the Lake – 8.312
- Before the Flood – 8.116
- The Girl Who Died – 7.210
- The Woman Who Lived – 7.541
- The Zygon Invasion – 7.786
- The Zygon Inversion – 8.814
- Sleep No More – 5.065 (this is the only projection I hugely disagree with; I think this episode has real potential to go up considerably, not go down).
- Face the Raven – 8.707
- Heaven Sent – 9.458
- Hell Bent – 8.182
If these scores were true, every episode in Series 9 would fall in the long term except Heaven Sent and Hell Bent, which are both projected to go up, as illustrated below:
The Series 9 average, if these scores were true, would be 8.014, still making it the highest rated series of the revival by a huge margin. Hopefully in the future we can determine how accurate these projections are!
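The projected average can be checked with a few lines of Python. The twelve values are the projections listed above, and 7.832 is the Series 4 long-term average quoted earlier; the variable names are illustrative.

```python
# Long-term projections for Series 9 as listed in this article.
projections = {
    "The Magician's Apprentice": 8.461,
    "The Witch's Familiar": 8.521,
    "Under the Lake": 8.312,
    "Before the Flood": 8.116,
    "The Girl Who Died": 7.210,
    "The Woman Who Lived": 7.541,
    "The Zygon Invasion": 7.786,
    "The Zygon Inversion": 8.814,
    "Sleep No More": 5.065,
    "Face the Raven": 8.707,
    "Heaven Sent": 9.458,
    "Hell Bent": 8.182,
}

# Simple mean of the twelve projected scores.
projected_average = round(sum(projections.values()) / len(projections), 3)

# Compare against the next-best series average (Series 4, long-term).
series_4_average = 7.832
margin = round(projected_average - series_4_average, 3)

print(f"Projected Series 9 average: {projected_average}")
print(f"Margin over Series 4: {margin}")
```

This reproduces the 8.014 figure, which sits 0.182 points above Series 4.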
That marks the end of Part 1 of this trilogy of articles. Join us for Part 2, when we will consider the overall divisiveness of Series 9 and how consistent it was, compare its divisiveness and consistency with that of other series, and then consider how the overall revival, the Moffat era and the Capaldi era currently stand. I hope you’ve found this interesting!