The Soccer Factor Model


The Soccer Factor Model (SFM; Andorra and Göbel, 2024) is a Bayesian regression model intended to elicit the individual skill (\(\alpha\)) of a football (or, for Americans, soccer) player. The model is inspired by the literature on financial asset pricing, in particular by the performance attribution of portfolio managers (see e.g. Coggin, Fabozzi and Rahman, 1993; Fama and French, 2010; Berk and van Binsbergen, 2015).

In Andorra and Göbel (2024), we focus exclusively on strikers. The model is set up to predict the number of goals (\(n = 0, 1, 2, 3+\)) a striker is expected to score in the upcoming match. Because the observed performance (OP) of a player is a convolution of individual, player-specific skill and team effort, the SFM uses factors to disentangle the former from the latter. These factors are proxies for the strength differential between the player's team and the opponent in the upcoming match:

$$ P\left(Y_{i,m} = n \, | \, \alpha_i,\mathbf{X}_{i,m-1}\right) = g\left(\alpha_i,\mathbf{X}_{i,m-1} \, | \, \theta \right) $$

where \(Y_{i,m}\) is the observed performance of player \(i\) in match \(m\) (denoted by OP in the paper). \(P\left(Y_{i,m} = n \, | \, \cdot \right)\) thus denotes the probability of player \(i\) scoring \(n\) goals in match \(m\), conditional on the information encoded in \(\alpha_i\) and \(\mathbf{X}_{i,m-1}\). The conditioning set consists of the player's skill, \(\alpha_i\), and a vector of \(F\) factors, \(\mathbf{X}_{i,m-1}\), containing only information available before kick-off of match \(m\). The (potentially) nonlinear function \(g\left(\cdot\right)\), parameterized by the vector \(\theta\), aggregates \(\alpha_i\) and \(\mathbf{X}_{i,m-1}\) into a best guess for \(P\left(Y_{i,m} = n \, | \, \alpha_i,\mathbf{X}_{i,m-1}\right)\). It may feature, for example, a collection of Gaussian processes for the skill component (see e.g. Riutort-Mayol et al., 2022).
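The description above leaves \(g\) deliberately general. As a minimal sketch, assume \(g\) is an ordered-logit (cumulative-logit) link: a linear predictor \(\eta = \alpha_i + \boldsymbol{\beta}^\top \mathbf{X}_{i,m-1}\) is compared against three increasing cutpoints, so that \(\theta = (\boldsymbol{\beta}, c_1, c_2, c_3)\). This link, the function name `goal_probabilities`, and all numbers are illustrative assumptions, not the specification used by Andorra and Göbel (2024). In particular, \(\alpha_i\) is a plain number here rather than a Gaussian-process component.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def goal_probabilities(alpha: float, x: Sequence[float], beta: Sequence[float],
                       cutpoints: Sequence[float]) -> List[float]:
    """P(Y = n) for n = 0, 1, 2, 3+ under an ASSUMED ordered-logit choice of g.

    alpha     -- player skill (a fixed number in this sketch)
    x         -- the F pre-match factors X_{i,m-1}
    beta      -- hypothetical factor loadings, one per factor
    cutpoints -- three strictly increasing thresholds c_1 < c_2 < c_3
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must both have F entries")
    if len(cutpoints) != 3 or any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("need three strictly increasing cutpoints")
    # Linear predictor: skill plus weighted factors known before kick-off.
    eta = alpha + sum(b * xf for b, xf in zip(beta, x))
    # Cumulative probabilities P(Y <= k) for k = 0, 1, 2; P(Y <= 3+) = 1.
    cdf = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, 4)]
```

For example, `goal_probabilities(0.3, [0.5], [0.8], [1.0, 2.5, 3.5])` returns four positive probabilities that sum to one. Raising `alpha` (more skill) or the factor value (a stronger team relative to the opponent) shifts probability mass away from zero goals.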

Player Evaluation: Skill and Performance Above Replacement

The concepts of skill above replacement (SAR) and performance above replacement (PAR) are adapted from the wins-above-replacement (WAR) framework, a popular metric in baseball and American football (see e.g. Baumer et al., 2015).

  • RLP (replacement-level player):   a player who could plausibly replace your player of interest (note: this is not necessarily an average player!).

  • SAR:   the number of goals per game that a given player is expected to score relative to an RLP, based solely on the skill of the two players.

  • PAR:   the number of goals per game that a given player is expected to score relative to an RLP, also taking the strength of the player's team into account.

If SAR > PAR, the player is undervalued: the performance of the team is dragging down the player's output. Put differently, based on skill alone, the player could perform better than the plain statistics suggest.
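The definitions above do not pin down exactly how SAR and PAR are computed. The following is a minimal sketch under three assumptions. First, some model returns expected goals per game given \(\alpha\) and \(\mathbf{X}\), for instance \(\sum_n n \, P(Y = n \mid \cdot)\) with the 3+ category counted as 3. Second, an all-zero factor vector represents a neutral match. Third, the RLP is always evaluated at that neutral baseline. The names `sar_par` and `toy_expected_goals`, and all numbers, are hypothetical. The toy model is a log-linear stand-in, not the SFM.

```python
import math
from typing import Callable, Sequence, Tuple

# E[goals per game | alpha, X]: any expected-goals model can be plugged in here.
ExpectedGoals = Callable[[float, Sequence[float]], float]


def sar_par(expected_goals: ExpectedGoals, alpha_player: float, alpha_rlp: float,
            x_team: Sequence[float]) -> Tuple[float, float]:
    """Return (SAR, PAR) in goals per game relative to a replacement-level player.

    Assumption: an all-zero factor vector is a neutral match (no strength
    differential), and the RLP is always evaluated at that neutral baseline.
    """
    neutral = [0.0] * len(x_team)
    baseline = expected_goals(alpha_rlp, neutral)
    # SAR: skill only, both players evaluated in the same neutral setting.
    sar = expected_goals(alpha_player, neutral) - baseline
    # PAR: the player's expected output given the actual team-strength factors.
    par = expected_goals(alpha_player, x_team) - baseline
    return sar, par


def toy_expected_goals(alpha: float, x: Sequence[float]) -> float:
    """Hypothetical stand-in model, log-linear in skill and factors (weight 0.5 is made up)."""
    return math.exp(alpha + 0.5 * sum(x))


# Hypothetical striker on a team weaker than its opponents (negative strength differential).
sar, par = sar_par(toy_expected_goals, alpha_player=-0.5, alpha_rlp=-1.0, x_team=[-0.8])
print(f"SAR = {sar:.3f}, PAR = {par:.3f}, undervalued: {sar > par}")
```

With a negative strength differential, the toy striker's PAR falls below his SAR, which is the "undervalued" case described above. The choice of baseline matters: evaluating the RLP with the same `x_team` would instead measure skill relative to an RLP on the same team.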