My last post looked at the highest number of goals scored in the final fixtures of a season, but this got me thinking: what about the highest scoring single matchday? In other words, what was the best Match of the Day ever? In this post I’ll examine trends showing a diminishing number of goals scored per matchday in the top division over recent decades and look at the effects of declining ‘fixture density’ (the number of games played per day) to see if we can blame this on fixtures becoming more ‘spread out’ and the death of Saturday football in the top flight. (By the way, keep an eye out for inline footnotes in this and future posts, indicated by these numbers ^{1}.)
To begin, we’ll subset the top tier of the england dataframe from the R package engsoccerdata as in the last post:
devtools::install_github("jalapic/engsoccerdata")
library(engsoccerdata)
library(dplyr)
library(ggplot2)
library(DT)
# Update 'england' dataframe if there are new results
england <- rbind(england, subset(england_current(), !(Date %in% england$Date & home %in% england$home)))
# subset all-time top flight and prettify the season variable for plotting (e.g. '2016' -> '2016-17')
topflight <- subset(england, tier==1) %>%
  arrange(Date) %>%
  mutate(season = as.factor(paste0(Season, "-", substr(Season+1, 3, 4))))
Best ever Match Of The Day?
First the easiest question: most goals scored in a single day?
d1 <- topflight %>%
  group_by(Date) %>%
  summarise(Season = Season[1], goals = sum(totgoal), matches = n()) %>%
  arrange(desc(goals))
# show the ten highest scoring matchdays
head(d1, 10)
## # A tibble: 10 x 4
##    Date       Season goals matches
##    <date>      <dbl> <int>   <int>
##  1 1963-12-26   1963    66      10
##  2 1931-11-28   1931    62      11
##  3 1960-12-10   1960    60      11
##  4 1925-09-26   1925    59      11
##  5 1930-09-13   1930    59      11
##  6 1930-12-13   1930    59      11
##  7 1955-02-12   1954    59      11
##  8 1930-01-04   1929    58      11
##  9 1926-09-11   1926    57      11
## 10 1929-02-23   1928    57      11
That’s 66 goals gifted to us on Boxing Day 1963, and just look at those results:
subset(topflight, Date=="1963-12-26") %>%
  select(Date, home, visitor, FT)
##             Date                    home           visitor   FT
## 25309 1963-12-26               Blackpool           Chelsea  1-5
## 25310 1963-12-26                 Burnley Manchester United  6-1
## 25311 1963-12-26                  Fulham      Ipswich Town 10-1
## 25312 1963-12-26          Leicester City           Everton  2-0
## 25313 1963-12-26               Liverpool        Stoke City  6-1
## 25314 1963-12-26       Nottingham Forest  Sheffield United  3-3
## 25315 1963-12-26     Sheffield Wednesday  Bolton Wanderers  3-0
## 25316 1963-12-26    West Bromwich Albion Tottenham Hotspur  4-4
## 25317 1963-12-26         West Ham United  Blackburn Rovers  2-8
## 25318 1963-12-26 Wolverhampton Wanderers       Aston Villa  3-3
The only thing is, Match Of The Day (MotD) wasn’t on the air then, although it did start only 8 months later, so it’s possible this Boxing Day bonanza helped push the BBC towards televising football highlights. Before we continue, I still feel obliged to answer the original question: what is the highest scoring matchday since MotD began, i.e. post-August 1964?
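The filtering step boils down to keeping only the matchdays after a cutoff date and re-sorting by goals. Here is a minimal sketch in base R, run on a tiny stand-in for d1 (the four rows are copied from the tables in this post). The object names d1_toy, motd_cutoff and motd_era are my own, and the exact cutoff is an assumption based on a literal reading of "post-August 1964".

```r
# Toy stand-in for the per-matchday summary `d1`; rows copied from the tables in this post
d1_toy <- data.frame(
  Date    = as.Date(c("1963-12-26", "1931-11-28", "1965-12-11", "1964-12-05")),
  goals   = c(66L, 62L, 56L, 52L),
  matches = c(10L, 11L, 11L, 11L)
)

# Assumed cutoff: everything after 31 August 1964 counts as the MotD era
motd_cutoff <- as.Date("1964-08-31")

# Keep MotD-era matchdays only, then sort from most to fewest goals
motd_era <- subset(d1_toy, Date > motd_cutoff)
motd_era <- motd_era[order(-motd_era$goals), ]

head(motd_era, 10)
```

Swapping d1_toy for the real d1 should give a ranking like the one shown next, provided the cutoff matches.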
## # A tibble: 10 x 3
##    Date       goals matches
##    <date>     <int>   <int>
##  1 1965-12-11    56      11
##  2 1964-12-05    52      11
##  3 1966-10-01    50      11
##  4 1982-09-25    50      11
##  5 1965-09-18    49      11
##  6 1964-09-26    48      11
##  7 1965-10-16    48      11
##  8 1967-05-06    47      11
##  9 1993-05-08    47       9
## 10 1966-11-05    46      11
The slightly-less-exciting sum of 56 goals, scored in December of the 1965-66 season:
subset(topflight, Date=="1965-12-11") %>%
  select(Date, home, visitor, FT)
##             Date              home              visitor  FT
## 26190 1965-12-11       Aston Villa              Everton 3-2
## 26191 1965-12-11  Blackburn Rovers     Northampton Town 6-1
## 26192 1965-12-11         Blackpool           Stoke City 1-1
## 26193 1965-12-11            Fulham              Burnley 2-5
## 26194 1965-12-11      Leeds United West Bromwich Albion 4-0
## 26195 1965-12-11    Leicester City  Sheffield Wednesday 4-1
## 26196 1965-12-11         Liverpool              Arsenal 4-2
## 26197 1965-12-11  Sheffield United    Nottingham Forest 1-1
## 26198 1965-12-11        Sunderland    Manchester United 2-3
## 26199 1965-12-11 Tottenham Hotspur              Chelsea 4-2
## 26200 1965-12-11   West Ham United     Newcastle United 4-3
So teams are scoring less nowadays, right?
The first thing I notice when looking at the top table is that almost all of the highest scoring matchdays are from over 50 years ago; in fact, you have to look all the way down to the 83rd highest entry to find a fixture from the 1970s or later. If we plot the data we can visualise this decline in goals per matchday (red line shows a linear regression fitted to the data, blue line a smoothed Loess curve):
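For anyone wanting to draw a plot like this themselves, here is a rough sketch using ggplot2. It runs on made-up data standing in for d1 (the goal counts are random numbers, not real results), and the colours simply follow the description above: red for the linear fit, blue for the Loess curve.

```r
library(ggplot2)

# Made-up stand-in for `d1`: one row per matchday, random goal counts for illustration only
set.seed(1)
d1_toy <- data.frame(
  Date  = seq(as.Date("1950-01-07"), by = "week", length.out = 200),
  goals = rpois(200, lambda = 35)
)

# Goals per matchday over time with a linear fit (red) and a Loess curve (blue)
p <- ggplot(d1_toy, aes(x = Date, y = goals)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "blue") +
  labs(x = "Date", y = "Goals per matchday")

p
```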
So why might this be happening? The most obvious explanation would be that there are simply fewer goals scored nowadays; after all, the modern game has evolved a lot in the last few decades in terms of both physicality and tactics. Let’s check to see if the data supports this idea:
d2 <- topflight %>%
  group_by(Season) %>%
  summarise(goals.sum = sum(totgoal), goals.per.game = sum(totgoal) / n(), games = n())
The total number of goals scored has definitely declined since the 1950s…
…but standardising for the number of games played per season shows this decline is less dramatic in terms of goals per game.
In fact, it looks like the period from the 1950s to the mid-1960s was exceptionally high scoring, but that things have been relatively stable since the 1970s (there’s actually a trend for a slight increase):
So the number of goals per matchday looks to be decreasing but the number of goals per game doesn’t? Surely then that suggests there are fewer games per matchday in the modern game compared to the pre-1970s… It seems to me nowadays that most weekends in the EPL feature only a handful of Saturday fixtures, with a few being played instead on Sunday and sometimes Monday and/or Friday too. (The penultimate week of this season saw games played on Friday, Saturday, Sunday, Monday, Tuesday, Wednesday and Thursday!) Also, 3pm-5pm Saturday kick-offs are now the only Premier League fixtures not to be televised live in the UK, a blackout introduced by the Football League in the 1960s which aimed to protect attendances at lower league games.
Fixture density
I can’t recall ever seeing any data to support this idea though, so let’s investigate using our historical results. A straightforward way of measuring whether fixtures have become more ‘spread out’ could be to calculate a ‘fixture density score’ by dividing the expected number of games per season by the number of unique matchdays that season. So in the modern 20-team EPL featuring 38 gameweeks, the highest score would be 10.0 if all 10 matches each gameweek were played on a Saturday (380 / (38 × 1) = 380 / 38 = 10.0), and the lowest possible score would be 1.4 if the 10 matches each gameweek were spread out over all 7 days that week (380 / (38 × 7) = 380 / 266 = 1.43). To make it more intuitive, let’s divide this number by the expected number of matches per gameweek (in the example above, 10 games per week) to let our score fall between 0 and 1 ^{2}.
fixdens <- england %>%
  group_by(division, Season) %>%
  summarise(gameweeks = (n_distinct(home) - 1) * 2,
            matchdays = n_distinct(Date)) %>%
  mutate(fixdens = gameweeks / matchdays)
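To sanity-check the scaled score against the 20-team example, here is a small worked sketch. The helper fixture_density() is hypothetical (my own name, not part of engsoccerdata or dplyr) and assumes a league where every team plays every other team home and away.

```r
# Hypothetical helper: scaled fixture density for an n-team league
# whose fixtures are played on `matchdays` unique dates in a season
fixture_density <- function(n_teams, matchdays) {
  gameweeks <- (n_teams - 1) * 2  # each team plays every other team home and away
  gameweeks / matchdays
}

# 20-team league, every gameweek played on a single day: 38 / 38 = 1
fixture_density(20, 38)

# 20-team league, every gameweek spread over all 7 days: 38 / 266 = 1/7, roughly 0.14
fixture_density(20, 38 * 7)
```

The two calls reproduce the upper and lower bounds from the example (10.0 and 1.43) once those are divided by the 10 games per gameweek.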
Let’s visualise fixture density over the years in the top flight first:
There looks to be a trend of decreasing fixture density since the 1950s, and a particularly sharp dip since around 1992. Is this something to do with TV rights and increasing commercialisation of the top division in England? Let’s have a look at fixture density in the lower divisions to compare ^{3} ^{4}:
fixdens <- fixdens %>%
  mutate(division2 = recode(division, "3a" = "3", "3b" = "3"))
It looks like the second tier (present-day EFL Championship) has experienced a similar decline in fixture density since the 1950s, although this seems to be offset by increased fixture density in the 3rd and 4th tiers.
What about the distribution of matchdays across days of the week? Can we see whether this decline comes from a decrease in Saturday fixtures? We can use the wday() function from the handy lubridate package to infer days of the week from our dates.
matchdays <- england %>%
  mutate(day = lubridate::wday(Date, label=TRUE)) %>%
  group_by(division, Season) %>%
  mutate(games = n()) %>%
  group_by(division, Season, day) %>%
  summarise(prop = n() / games[1], games = n()) %>%
  mutate(division2 = recode(division, "3a" = "3", "3b" = "3"))
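As a quick check of what wday() gives back, here is a short sketch using the two record matchdays from earlier in this post. The variable names are mine, and the numeric coding assumes lubridate's default week start (1 = Sunday through 7 = Saturday); the labelled version depends on your locale.

```r
library(lubridate)

# The two record matchdays from the tables earlier in this post
dates <- as.Date(c("1963-12-26", "1965-12-11"))

# Numeric day of the week, assuming the default week start (1 = Sunday ... 7 = Saturday)
day_num <- wday(dates)

# Labelled version, as used in the pipeline above (abbreviations depend on locale)
day_lab <- wday(dates, label = TRUE)

# Proportion of these matchdays that fell on a Saturday
sat_prop <- mean(day_num == 7)
sat_prop
```

Boxing Day 1963 was a Thursday and 11 December 1965 a Saturday, so only one of the two record days was a traditional Saturday programme.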
It looks like there is a decline in the proportion of matches played on a Saturday in the top two divisions; in the top flight, from around three-quarters in the 1950s to around half now. And this seems to be accounted for by an increased proportion of Sunday fixtures in the top division and Tuesday fixtures in the second division. The lower leagues seem to tell the same story except with an increasing proportion of Saturdays being replaced by Tuesdays, but I’m thinking that this is probably due to an increased occurrence of two fixtures per week due to the increased number of teams in their leagues. So let’s look at the absolute number of games instead of proportions:
Just as I suspected: there appears to be a greater total number of games in the lower leagues, but the number of Saturday fixtures in the top division has definitely been declining since around 1980, although we can now see this has not happened in the second tier.
In conclusion…
So whilst the number of goals per game is much the same in the EPL now as it was in the First Division in the 1970s, the absolute number of goals scored per matchday (the important thing) is decreasing as a result of decreasing fixture density, most noticeably fewer Saturday fixtures. Here’s the last figure to show that Saturday’s MotD is offering up fewer goals.
# subset Saturday fixtures only
d3 <- topflight %>%
  mutate(day = lubridate::wday(Date, label=TRUE)) %>%
  subset(day == "Sat") %>%
  group_by(Season) %>%
  summarise(games = n(), goals = sum(totgoal))

Inline footnotes pop up here thanks to a jQuery plugin, bigfoot. ↩

The equation explicitly: for a league consisting of $n$ teams, fixture density is calculated as the number of games per season ($n(n-1)$) divided by the number of unique matchdays ($d$) divided by the expected number of games per gameweek ($n/2$). This simplifies down to the expected number of gameweeks per season divided by the number of unique matchdays per season ($2(n-1)/d$). ↩

The roots of the third and fourth tiers of English football are a bit of a nightmare: the Third Division lasted only a single season before the league split into North and South divisions in 1921-22 and again merged in 1958-1959, with the top half of the league that season going on to form the new Third Division and the bottom half the new Fourth Division. To save on headaches, I’ve therefore combined division 3 (1920-21 Third Division + new Third Division) with 3a (Third Division North) and 3b (Third Division South), and left division 4 alone. ↩
I’ve sprung for a third-order polynomial to fit the data, as this curve appears to fit the data better and seems less biased by single outliers than a Loess curve. ↩