The Soccer Factor Model
The Soccer Factor Model (SFM) (Andorra and Göbel, 2024) is a Bayesian regression model, intended to elicit the individual skill (\(\alpha\)) of a football (or for Americans: soccer) player. The model is inspired by the literature on financial asset pricing, in particular, by the performance attribution of portfolio managers (see e.g. Coggin, Fabozzi and Rahman, 1993; Fama and French, 2010; Berg and van Binsbergen, 2015).
In Andorra and Göbel (2024), we focus exclusively on strikers. The model is set up to predict the number of goals (\(n = 0,1,2,3+\)) a striker is expected to score in the upcoming match. As the observed performance (OP) of a player is a convolution of individual player-specific skill and team effort, the SFM is designed to disentangle the former from the latter with the use of factors. These factors are proxies for the strength differential between the team that the player is playing for and the opponent to be faced in the upcoming match: $$ {P\left(Y_{i,m} = n \, | \, \alpha_i,\mathbf{X}_{i,m-1}\right) = g\left(\alpha_i,\mathbf{X}_{i,m-1} \, | \, \theta \right)} $$
where \(Y_{i,m}\) is the observed performance of player \(i\) in match \(m\) (denoted by OP in the paper). \(P\left(Y_{i,m} = n \, | \, \cdot \right)\) thus denotes the probability of player \(i\) scoring \(n\) goals in match \(m\), conditional on the information encoded in \(\alpha_i\) and \(\mathbf{X}_{i,m-1}\). The conditioning set is composed of the skill of the player, \(\alpha_i\), and a vector of \(F\) factors, \(\mathbf{X}_{i,m-1}\), containing information that was available before the kick-off of match \(m\). The (potentially) nonlinear function \(g\left(\cdot\right)\) is parameterized by the vector \(\theta\) and may feature, for example, a collection of Gaussian processes (see e.g. Riutort-Mayol et al. (2022)) for the skill component. It aggregates \(\alpha_i\) and \(\mathbf{X}_{i,m-1}\) to produce a best guess for \(P\left(Y_{i,m} = n \, | \, \alpha_i,\mathbf{X}_{i,m-1}\right)\).
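To make the role of \(g\left(\cdot\right)\) concrete, here is a minimal Python sketch that maps a skill value and a factor vector to probabilities for \(n = 0,1,2,3+\). It is not the specification used in the paper: the Poisson link, the linear factor loadings `betas`, and all numbers are assumptions chosen purely for illustration, and the Gaussian-process skill component is replaced by a fixed scalar `alpha`.

```python
import math

def goal_probabilities(alpha, factors, betas):
    """Return [P(0), P(1), P(2), P(3+)] under an illustrative Poisson link.

    Assumption: log scoring rate = alpha + betas . factors. The actual SFM
    may use a different (nonlinear) function g.
    """
    # Combine player skill with the pre-match strength-differential factors.
    log_rate = alpha + sum(b * x for b, x in zip(betas, factors))
    rate = math.exp(log_rate)
    # Poisson probabilities for exactly 0, 1 and 2 goals.
    probs = [math.exp(-rate) * rate**n / math.factorial(n) for n in range(3)]
    # The "3+" bucket absorbs the remaining probability mass.
    probs.append(max(0.0, 1.0 - sum(probs)))
    return probs

# Hypothetical inputs: one player's skill and two pre-match factors.
probs = goal_probabilities(alpha=-1.0, factors=[0.5, -0.2], betas=[0.4, 0.3])
for label, p in zip(["0", "1", "2", "3+"], probs):
    print(f"P(goals = {label}) = {p:.3f}")
```

Holding the factors fixed and raising `alpha` shifts probability mass away from zero goals, which is the sense in which skill is separated from the team context in this toy version.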
Goal of the SFM
The SFM thus delivers two notable insights at once:
a quantification of a player's skill (\(\alpha\)),
a prediction of the dependent variable (the target), which in our case is the probability of scoring a given number of goals.
The latter is what you can find on this website: scoring probabilities for upcoming fixtures of hundreds of players in Europe's top leagues. Those interested in a comparison of the skill of prominent players who defined the first quarter of the twenty-first century of European football should see the section below.
Player Evaluation: Skill- & Performance Above Replacement
The concepts of Skill Above Replacement (SAR) and Performance Above Replacement (PAR) are adapted from the framework of wins above replacement (WAR), a popular metric in baseball and American football (see e.g. Baumer et al. (2015)). Both are defined relative to a replacement-level player (RLP):
RLP: a replacement-level player, i.e. a player that you could think of as replacing your player of interest (note: this is not necessarily an average player!).
SAR: the number of goals per game that a given player is expected to score relative to an RLP, based solely on the skill of the players.
PAR: the number of goals per game that a given player is expected to score relative to an RLP, taking also the strength of the team into account.
If SAR > PAR, the player is undervalued, i.e. the performance of the team is dragging down the player’s performance. Said differently, based on skill alone, the player could perform better than the plain statistics suggest.
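The comparison between SAR and PAR can be illustrated with a small Python sketch. Everything here is hypothetical: the log-linear expected-goals formula, the single `team_effect` term standing in for the factors, the choice to place the RLP in the same team context as the player, and the numbers are assumptions for illustration, not the computation used in the paper.

```python
import math

def expected_goals(alpha, team_effect=0.0):
    # Assumed log-linear rate: skill plus a team-strength contribution.
    return math.exp(alpha + team_effect)

def sar(alpha_player, alpha_rlp):
    # Skill Above Replacement: team context switched off (team_effect = 0).
    return expected_goals(alpha_player) - expected_goals(alpha_rlp)

def par(alpha_player, alpha_rlp, team_effect):
    # Performance Above Replacement: both players placed in the same team context.
    return (expected_goals(alpha_player, team_effect)
            - expected_goals(alpha_rlp, team_effect))

# Hypothetical striker on a weak team (negative team effect).
alpha_player, alpha_rlp, team_effect = -0.5, -1.2, -0.3
skill_gap = sar(alpha_player, alpha_rlp)
perf_gap = par(alpha_player, alpha_rlp, team_effect)

if skill_gap > perf_gap:
    print("SAR > PAR: the team context is dragging the player's output down.")
else:
    print("SAR <= PAR: the team context is not holding the player back.")
```

In this toy setup a negative `team_effect` scales both players' expected goals down, so the gap measured with team context (PAR) is smaller than the gap based on skill alone (SAR).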
For the Geeks who want to Dig Deeper
For additional material and access to an accompanying dashboard, check out Maximilian Göbel's website.