A Jupyter notebook (`Final_Project.ipynb`) that models English Premier League matches with a bivariate Poisson model (shared component), using Elo as a covariate, then runs Monte Carlo simulations to produce full-season distributions (points, positions, and title/top-4/relegation probabilities). Includes lightweight backtesting.
- Overview
- Project Structure
- Installation
- Data
- Quickstart
- What It Does
- Outputs & Visualizations
- Project Poster
- Configuration
- Troubleshooting
- Roadmap / Future Work
- References & Data
- Contributing
- License
- Contact
- Scoring model: a bivariate Poisson with a common latent component $\lambda_3$, which captures dependence between home and away goals.
- Team effects: attack ($\alpha$), defence ($\beta$), and team-specific home advantage ($\eta_h$). We apply ridge regularization and sum-to-zero constraints to $\alpha$ and $\beta$; $\eta_h$ is currently unpenalized.
- Elo with decay: pre-match `HomeElo`/`AwayElo` ratings; the Elo ratio scales the goal rates via the exponent $\gamma$.
- Monte Carlo: entire seasons are simulated to obtain distributions over points, positions, and key outcomes.
- Backtests: simulated mean points are compared with actual points on historical fixtures to obtain the MAE.
```text
project-acm40690-monte-carlo-simulation-of-epl/
├── data/
│   ├── E0_19_20.csv
│   ├── E0_20_21.csv
│   ├── E0_21_22.csv
│   ├── epl_22_23_fixtures.csv
│   ├── epl_23_24_fixtures.csv
│   ├── epl_24_25_fixtures.csv
│   └── epl_25_26_fixtures.csv
├── points/
│   ├── epl_2022_23_points.csv
│   └── epl_2023_24_points.csv
├── images/
│   ├── heatmap.png
│   ├── points_boxplot_output.png
│   └── relegation_output.png
├── Final_Project.ipynb   # main notebook
├── README.md
└── LICENSE
```
Use a clean environment (venv or conda).
```bash
# clone
git clone https://github.com/your-org/project-acm40690-monte-carlo-simulation-of-epl.git
cd project-acm40690-monte-carlo-simulation-of-epl

# create & activate venv
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
# source .venv/bin/activate

python -m pip install --upgrade pip
pip install -r requirements.txt
```

Example `requirements.txt`:
```text
numpy>=1.24
scipy>=1.10
matplotlib>=3.7
pandas>=2.0
seaborn>=0.13
jupyter
```

We used football-data-style match CSVs with these columns:
- `Date` (DD/MM/YYYY or otherwise parseable), `HomeTeam`, `AwayTeam`, `FTHG` (full-time home goals), `FTAG` (full-time away goals)
Example:

```csv
Date,HomeTeam,AwayTeam,FTHG,FTAG
10/08/2019,Liverpool,Norwich,4,1
10/08/2019,West Ham,Man City,0,5
```
Backtest "actual points" files:
- Columns: `Team`, `Points` (the notebook renames `Points` → `ActualPts` internally).

```csv
Team,Points
Manchester City,91
Arsenal,89
...
```
An exploratory data analysis (EDA) step was performed to:
- Check for missing values in the critical columns (`Date`, `HomeTeam`, `AwayTeam`, `FTHG`, `FTAG`)
- Remove rows with missing or invalid values
- Strip extra spaces from team names
- Ensure goals are non-negative integers

Backtest/forecast fixture files:
- Columns: `HomeTeam`, `AwayTeam` (no dates are required for simulation).
Name consistency: verify that team names are spelled identically across all training, fixture, and points files. Even simple mismatches ("Man City" vs. "Manchester City") will degrade the results. To reduce these discrepancies, we made a few manual adjustments to our CSV files.
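The EDA and name-consistency steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's code: `load_matches` and the `NAME_MAP` aliases are hypothetical (the actual manual fixes are not listed in this README), and dates are assumed to be day-first.

```python
import io

import pandas as pd

REQUIRED = ["Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG"]
# Hypothetical alias map; the notebook's actual manual fixes are not listed here.
NAME_MAP = {"Man City": "Manchester City", "Man United": "Manchester United"}


def load_matches(source):
    """Load a football-data style CSV and apply the EDA cleaning steps."""
    df = pd.read_csv(source).dropna(subset=REQUIRED).copy()   # drop rows missing key columns
    for col in ("HomeTeam", "AwayTeam"):
        df[col] = df[col].str.strip().replace(NAME_MAP)      # trim spaces, unify aliases
    df["Date"] = pd.to_datetime(df["Date"], dayfirst=True, errors="coerce")
    for col in ("FTHG", "FTAG"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df = df.dropna(subset=["Date", "FTHG", "FTAG"])           # unparseable dates or goals
    valid = (df["FTHG"] >= 0) & (df["FTAG"] >= 0) & (df["FTHG"] % 1 == 0) & (df["FTAG"] % 1 == 0)
    df = df[valid].astype({"FTHG": int, "FTAG": int})         # non-negative integer goals
    return df.sort_values("Date", kind="mergesort").reset_index(drop=True)


sample = io.StringIO(
    "Date,HomeTeam,AwayTeam,FTHG,FTAG\n"
    "10/08/2019,Liverpool,Norwich,4,1\n"
    "10/08/2019,West Ham , Man City,0,5\n"
    "11/08/2019,Arsenal,,1,0\n"
)
matches_df = load_matches(sample)
print(matches_df)
```

The third sample row is dropped because `AwayTeam` is empty, and `" Man City"` is both trimmed and mapped to `"Manchester City"`.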
- Open `Final_Project.ipynb` in Jupyter Notebook/JupyterLab or VS Code.
- Execute all cells in order:
  - Load & clean data
  - Compute Elo (with decay)
  - Fit the bivariate Poisson model
  - Backtest using fixtures & actual points (prints MAE)
  - Simulate future seasons
  - View plots inline

The notebook reads and writes files relative to the repository root, so run it from the project folder.
Data engineering:
- Parse dates, cast goals to `int`, and sort chronologically.
- Run the EDA check for missing values and remove invalid rows.
- Strip extra spaces from team names.
- Collect all unique teams from the training and fixture files to ensure complete parameter vectors.
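Collecting teams from both training and fixture files matters because a team that appears only in a fixture list (for example, a newly promoted side) still needs a slot in the parameter vectors. A minimal sketch, with a hypothetical `build_team_index` helper and made-up records:

```python
def build_team_index(*match_lists):
    """Map every team in any (home, away, ...) record to a stable integer index."""
    teams = sorted({team for records in match_lists for rec in records for team in rec[:2]})
    return {team: i for i, team in enumerate(teams)}


# Made-up records: training rows carry goals, fixture rows only (home, away)
training = [("Liverpool", "Norwich", 4, 1), ("West Ham", "Manchester City", 0, 5)]
fixtures = [("Luton", "Liverpool"), ("Manchester City", "Burnley")]
team_index = build_team_index(training, fixtures)
print(team_index)
```

Sorting the names keeps the index stable across runs, so parameter position `i` always refers to the same team.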
Elo (with decay):
- Baseline rating of 1500 per team, updated after every match.
- K-factor grid (e.g., 20/30/40) and decay factor (e.g., 0.995).
- Expected home result $E_h = \frac{1}{1+10^{(R_a - R_h)/400}}$; both teams are then updated with a decayed step.
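A minimal sketch of the Elo pass, assuming matches arrive in date order as `(home, away, home_goals, away_goals)` tuples. The exact decay rule in the notebook is not spelled out here (the configuration notes describe an effective step of roughly `(1 - decay) * K`); this sketch instead *assumes* that both ratings are shrunk toward the 1500 baseline by `decay` before a standard K-factor update, so treat it as one plausible variant. `run_elo` is a hypothetical name.

```python
def run_elo(matches, K=20, decay=0.995, base=1500.0):
    """Sequential Elo over (home, away, home_goals, away_goals) tuples in date order.

    Returns the pre-match (HomeElo, AwayElo) pairs and the final ratings.
    """
    ratings, pre_match = {}, []
    for home, away, hg, ag in matches:
        r_h, r_a = ratings.get(home, base), ratings.get(away, base)
        pre_match.append((r_h, r_a))                          # covariates for this match
        e_h = 1.0 / (1.0 + 10 ** ((r_a - r_h) / 400))          # expected home result
        s_h = 1.0 if hg > ag else 0.5 if hg == ag else 0.0     # actual home result
        # Assumed decay rule: shrink both ratings toward the baseline, then apply the K update
        r_h, r_a = base + decay * (r_h - base), base + decay * (r_a - base)
        ratings[home] = r_h + K * (s_h - e_h)
        ratings[away] = r_a - K * (s_h - e_h)
    return pre_match, ratings


pre, final = run_elo([("A", "B", 2, 0), ("A", "B", 1, 1)], K=20, decay=0.995)
print(pre, final)
```

The pre-match ratings are recorded before each update, so the model never sees information from the match it is predicting.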
Model fit (bivariate Poisson):
- For a match $(h, a)$:

  $$\lambda_1 = \exp(\alpha_h - \beta_a + \eta_h)\cdot\left(\frac{\mathrm{Elo}_h}{\mathrm{Elo}_a}\right)^{\gamma}$$

  $$\lambda_2 = \exp(\alpha_a - \beta_h)\cdot\left(\frac{\mathrm{Elo}_h}{\mathrm{Elo}_a}\right)^{-\gamma}$$

  $$\lambda_3 = \exp(\theta) \quad \text{(shared component)}$$

  Here, $\eta_h$ is a team-specific home-advantage term (one per home team).
- Penalized log-likelihood with a ridge penalty on $\alpha$ and $\beta$, fitted via BFGS with zero-mean constraints on $\alpha$ and $\beta$.
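A minimal sketch of the penalized fit on made-up toy data, using the standard bivariate Poisson probability mass function for $X = X_1 + X_3$, $Y = X_2 + X_3$. The helper names (`bp_logpmf`, `match_rates`, `objective`) are hypothetical, and two choices are assumptions rather than documented behaviour: the zero-mean constraint is imposed by centring $\alpha$ and $\beta$ inside the rates (so BFGS can stay unconstrained), and the ridge term is `lam_ridge` times the sum of squared raw $\alpha$ and $\beta$ values. $\gamma$ is held fixed during the fit because the grid search chooses it.

```python
import math

import numpy as np
from scipy.optimize import minimize


def bp_logpmf(x, y, l1, l2, l3):
    """log P(X=x, Y=y) where X = X1 + X3, Y = X2 + X3 and X1, X2, X3 are independent Poissons."""
    ratio = l3 / (l1 * l2)
    s = sum(math.comb(x, k) * math.comb(y, k) * math.factorial(k) * ratio**k
            for k in range(min(x, y) + 1))
    return (-(l1 + l2 + l3) + x * math.log(l1) - math.lgamma(x + 1)
            + y * math.log(l2) - math.lgamma(y + 1) + math.log(s))


def match_rates(params, h, a, elo_h, elo_a, gamma, n):
    """(lambda_1, lambda_2, lambda_3) for home team h vs away team a, per the formulas above."""
    alpha = params[:n] - params[:n].mean()            # assumed: zero mean via centring
    beta = params[n:2 * n] - params[n:2 * n].mean()
    eta, theta = params[2 * n:3 * n], params[3 * n]   # home advantage, shared-component log-rate
    elo_ratio = elo_h / elo_a
    l1 = math.exp(alpha[h] - beta[a] + eta[h]) * elo_ratio**gamma
    l2 = math.exp(alpha[a] - beta[h]) * elo_ratio**(-gamma)
    return l1, l2, math.exp(theta)


def objective(params, matches, n, gamma, lam_ridge):
    """Negative penalized log-likelihood; assumed ridge form lam_ridge * (sum alpha^2 + sum beta^2)."""
    nll = -sum(bp_logpmf(x, y, *match_rates(params, h, a, eh, ea, gamma, n))
               for h, a, x, y, eh, ea in matches)
    return nll + lam_ridge * (np.sum(params[:n] ** 2) + np.sum(params[n:2 * n] ** 2))


# Made-up toy data: (home_idx, away_idx, home_goals, away_goals), plus per-team Elo
elo = [1550.0, 1500.0, 1480.0, 1470.0]
scores = [(0, 1, 2, 1), (0, 2, 1, 1), (0, 3, 3, 0), (1, 0, 1, 2), (1, 2, 2, 0), (1, 3, 1, 1),
          (2, 0, 0, 1), (2, 1, 1, 1), (2, 3, 2, 1), (3, 0, 0, 2), (3, 1, 1, 0), (3, 2, 1, 2)]
matches = [(h, a, x, y, elo[h], elo[a]) for h, a, x, y in scores]
n = len(elo)
x0 = np.zeros(3 * n + 1)  # alpha, beta, eta (n each) and theta
res = minimize(objective, x0, args=(matches, n, 0.06, 0.02), method="BFGS")
alpha_hat = res.x[:n] - res.x[:n].mean()
print(res.fun, alpha_hat)
```

Without the ridge term, adding a constant to every $\alpha$ (or $\beta$) would leave the likelihood unchanged; centring removes that redundant direction.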
Hyper-parameter search:
- Small grid for speed: $K \in \{20, 30, 40\}$, $\gamma \in \{0.04, 0.06\}$, $\lambda_{\text{ridge}} = 0.02$.
- Backtests on two seasons; objective = average MAE between simulated mean points and actual points.
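The search loop and its MAE objective might look like this. `points_mae` and `grid_search` are hypothetical helpers, and `dummy_backtest` is a stand-in objective for illustration only; a real backtest would run Elo, the fit, and the simulation for each season before comparing against the `ActualPts` column.

```python
import itertools

import numpy as np


def points_mae(sim_mean_pts, actual_pts):
    """Mean absolute error between simulated mean points and actual points, over all teams."""
    return float(np.mean([abs(sim_mean_pts[t] - actual_pts[t]) for t in actual_pts]))


def grid_search(backtest_mae, seasons, K_grid=(20, 30, 40), gamma_grid=(0.04, 0.06), lam_ridge=0.02):
    """Return (avg_mae, K, gamma) with the lowest MAE averaged over the backtest seasons."""
    best = None
    for K, gamma in itertools.product(K_grid, gamma_grid):
        avg = float(np.mean([backtest_mae(s, K, gamma, lam_ridge) for s in seasons]))
        if best is None or avg < best[0]:
            best = (avg, K, gamma)
    return best


def dummy_backtest(season, K, gamma, lam_ridge):
    """Dummy objective for illustration only (not a real backtest)."""
    return abs(K - 30) / 10 + abs(gamma - 0.06) * 100


best = grid_search(dummy_backtest, seasons=["2022-23", "2023-24"])
print(best)
```

Passing the backtest as a callable keeps the loop independent of how each season is fitted and simulated.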
Monte Carlo:
- Sample $k \sim \mathrm{Poisson}(\lambda_3)$, $x \sim \mathrm{Poisson}(\lambda_1)$, $y \sim \mathrm{Poisson}(\lambda_2)$; the simulated score is $(x+k,\; y+k)$.
- Simulate each fixture list $N$ times and aggregate the results into points tables and rank distributions.
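A vectorized sketch of the simulation step using NumPy's `Generator.poisson`. The `rate_fn` callback and the constant rates in the usage lines are hypothetical placeholders for the fitted $\lambda$ values, and standard 3/1/0 points are assumed.

```python
import numpy as np


def simulate_season(fixtures, rate_fn, n_sims, rng):
    """Simulate every (home, away) fixture n_sims times; return team names and a points matrix.

    rate_fn(home, away) -> (lambda_1, lambda_2, lambda_3) is a hypothetical callback.
    """
    teams = sorted({team for pair in fixtures for team in pair})
    idx = {team: i for i, team in enumerate(teams)}
    points = np.zeros((n_sims, len(teams)), dtype=int)   # rows = simulations, cols = teams
    for home, away in fixtures:
        l1, l2, l3 = rate_fn(home, away)
        k = rng.poisson(l3, size=n_sims)                  # shared component
        home_goals = rng.poisson(l1, size=n_sims) + k     # x + k
        away_goals = rng.poisson(l2, size=n_sims) + k     # y + k
        draw = home_goals == away_goals
        home_win = home_goals > away_goals
        points[:, idx[home]] += np.where(home_win, 3, np.where(draw, 1, 0))
        points[:, idx[away]] += np.where(~home_win & ~draw, 3, np.where(draw, 1, 0))
    return teams, points


_rng = np.random.default_rng(1)   # match-simulation RNG, seeded as in the notebook
club_names = ["Arsenal", "Chelsea", "Liverpool"]
fixture_list = [(h, a) for h in club_names for a in club_names if h != a]
teams, points = simulate_season(fixture_list, lambda h, a: (1.5, 1.1, 0.1), n_sims=500, rng=_rng)
median_pts = dict(zip(teams, np.median(points, axis=0)))
print(median_pts)
```

Because the shared draw `k` is added to both sides, it cancels in the goal difference: $\lambda_3$ changes simulated scorelines and goal totals but not which side wins.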
Reproducibility:
- Simulation RNGs are seeded: `_rng = np.random.default_rng(1)` for match simulations and a separate `default_rng(0)` for tie-break jitter in ranking.
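Finishing positions can then be derived from the simulated points, with the separate jitter RNG breaking ties. This sketch assumes ties on points are broken purely at random (no goal-difference tie-break) and uses jitter below 0.5 so it can never reorder teams whose points differ; `finishing_positions` is a hypothetical name.

```python
import numpy as np


def finishing_positions(points, jitter_rng):
    """points: (n_sims, n_teams) array. Returns 1-based positions with random tie-breaks."""
    jitter = jitter_rng.uniform(0.0, 0.5, size=points.shape)  # < 1 point: only reorders ties
    order = np.argsort(-(points + jitter), axis=1)            # team indices, best first
    positions = np.empty_like(order)
    rows = np.arange(points.shape[0])[:, None]
    positions[rows, order] = np.arange(1, points.shape[1] + 1)
    return positions


tie_rng = np.random.default_rng(0)   # separate RNG for tie-break jitter
pts = np.array([[3, 1, 1, 0], [0, 6, 3, 3]])
print(finishing_positions(pts, tie_rng))
```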
Outputs & visualizations:
- Predicted table (printed): median simulated points for the current league cohort.
- Points distribution: a horizontal boxplot of each team's simulated points.
- Finish-position heatmap: best teams at the top, positions 1…N from left to right; darker cells mean higher probability.
- Outcome probabilities: bar charts of P(Title), P(Top-4), and P(Relegation).
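The heatmap matrix and the three outcome probabilities can all be read off the simulated finishing positions. A minimal sketch, assuming the bottom three are relegated and using a made-up `positions` array (rows = simulations, columns = teams); `outcome_probabilities` is a hypothetical name.

```python
import numpy as np


def outcome_probabilities(positions, teams, n_relegated=3):
    """Return the team-by-position probability matrix and per-team outcome probabilities."""
    n_teams = positions.shape[1]
    # Heatmap matrix: row = team, column = finishing position 1..N
    pos_probs = np.stack([(positions == p).mean(axis=0) for p in range(1, n_teams + 1)], axis=1)
    summary = {
        team: {
            "P(Title)": pos_probs[i, 0],
            "P(Top-4)": pos_probs[i, :4].sum(),
            "P(Relegation)": pos_probs[i, n_teams - n_relegated:].sum(),
        }
        for i, team in enumerate(teams)
    }
    return pos_probs, summary


positions = np.array([[1, 2, 3, 4, 5, 6], [2, 1, 3, 4, 6, 5]])   # 2 made-up simulations
pos_probs, summary = outcome_probabilities(positions, list("ABCDEF"))
print(summary["A"], summary["E"])
```

Each row of `pos_probs` sums to 1 (every team finishes somewhere) and so does each column (every position is filled once per simulation).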
Edit the first Config cell in `Final_Project.ipynb` (the example below mirrors the notebook variables):

```python
from pathlib import Path

ROOT = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd().resolve()
DATA_DIR = ROOT / "data"
POINTS_DIR = ROOT / "points"

training_csvs = [DATA_DIR / "E0_19_20.csv", DATA_DIR / "E0_20_21.csv", DATA_DIR / "E0_21_22.csv"]
backtest_fixtures_csvs = [DATA_DIR / "epl_22_23_fixtures.csv", DATA_DIR / "epl_23_24_fixtures.csv"]
backtest_actual_pts_csvs = [POINTS_DIR / "epl_2022_23_points.csv", POINTS_DIR / "epl_2023_24_points.csv"]
future_fixtures_csvs = [DATA_DIR / "epl_24_25_fixtures.csv", DATA_DIR / "epl_25_26_fixtures.csv"]

# Monte Carlo simulations per season
N_SIMS = 500
```

Model / search defaults (inside the notebook):
- Elo decay: `0.995` (effective step ≈ `(1 - decay) * K`)
- Search grids (for speed; expand for final tuning): K ∈ {20, 30, 40}, γ ∈ {0.04, 0.06}, λ_ridge = 0.02
Tips:
- Runtime scales roughly linearly with `N_SIMS`.
- Keep `N_SIMS` small while iterating; increase it for final figures.
- Seaborn theming: use `import seaborn as sns` followed by `sns.set_theme(style="whitegrid", rc={"figure.dpi": 120})`.
- Pandas `observed` warning: for `pivot_table`, pass `observed=False` (already set), or switch to `.pivot()` if the categories are fixed.
- Plots not showing: in some environments, add `%matplotlib inline` at the top of the notebook.
- Paths / working directory: the notebook uses project-relative paths via `pathlib.Path`. Run it from the repo root (`cd your-repo`) or adjust `ROOT`.
- Team name mismatches: make sure the training, fixture, and points files use identical team-name strings.
- Dixon–Coles time decay in the likelihood (down-weight older matches).
- More covariates: injuries, transfers, schedule congestion (rest days).
- Bayesian / hierarchical variants for team effects and $\lambda_3$.
- Calibration checks (50/80/95% interval coverage) & reliability plots.
- Expanded hyper-parameter search and cross-league support.
- Dixon & Coles (1997). Modelling Association Football Scores and Inefficiencies in the Football Betting Market. JRSS C 46(2): 265–280. DOI: 10.2307/2986290.
- Related reading: https://royalsocietypublishing.org/doi/10.1098/rsos.210617
- Football-Data.co.uk: England (EPL) match results & odds
- Fixture Download: Premier League results/fixtures 2024–25

Accessed: August 12, 2025. Check each site's terms before redistribution.
PRs are welcome: loaders, metrics (rank correlation, Brier score), visual polish, or tuning refactors.
Steps
- Fork.
- Create a branch.
- Commit changes with examples.
- Open a Pull Request.
Released under the MIT License. See LICENSE.
Authors: Anusha Sarla & Sanmesh Shintre
Emails: anusha.sarla@ucdconnect.ie · sanmesh.shintre@ucdconnect.ie



