This repository contains a small, modular Python project for processing annual CPI (consumer price index) data exported from the Hungarian Central Statistical Office (KSH). The code reads a KSH CSV, computes annual inflation as CPI_index - 100, produces diagnostic plots, and fits a simple linear regression of inflation on year.
Repository layout:
- main.py: the top-level orchestrator; parses CLI arguments, loads the data, prepares it, runs the regression, saves outputs and produces plots.
- inflacio/: package of small modules (io.py, prep.py, analysis.py, plots.py).
- stadat-ara0001-1.1.1.1-hu.csv: example KSH CSV (included in the repository).
- requirements.txt: Python dependencies.
- README.md: original README (Hungarian).
- README_en.md: this English README.
- output/: example output files produced by main.py (cleaned CSV, regression metrics, plots).
The project depends on these Python packages (also listed in requirements.txt):
- pandas
- numpy
- matplotlib
- scikit-learn
- statsmodels
Install dependencies (PowerShell):
    python -m pip install -r .\requirements.txt
If python is not available on your PATH, use your system Python executable instead.
Run the analysis from the project root:
    python .\main.py --input stadat-ara0001-1.1.1.1-hu.csv --outdir output
Options:
- --input, -i: path to the input CSV (optional; the script also tries a couple of default filenames).
- --outdir, -o: output directory (default: output).
- --no-show: do not display plots interactively (useful for headless runs).
- --debug: enable debug logging.
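The options listed above could be declared with argparse roughly as follows. This is a minimal sketch, not the actual contents of main.py: the function name build_parser, the help strings and the None default for --input are assumptions made for illustration; only the flag names and the output default come from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four options documented in this README.

    Hypothetical helper: the real main.py may organise its CLI differently.
    """
    parser = argparse.ArgumentParser(description="Process annual KSH CPI data.")
    parser.add_argument("--input", "-i", default=None,
                        help="path to the input CSV (optional)")
    parser.add_argument("--outdir", "-o", default="output",
                        help="output directory")
    parser.add_argument("--no-show", action="store_true",
                        help="do not display plots interactively")
    parser.add_argument("--debug", action="store_true",
                        help="enable debug logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-i", "data.csv", "--no-show"])
print(args.input, args.outdir, args.no_show, args.debug)
```

Note that argparse exposes --no-show as the attribute args.no_show, and that omitted store_true flags default to False.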
What the script does:
- Loads the KSH CSV (tries several encodings; expects ; as the separator and , as the decimal mark).
- Detects the CPI column and the year column, converts types and drops rows with missing values.
- Computes inflation percentage as: Inflacio_pct = CPI_index - 100.
- Saves cleaned data to output/cleaned_data.csv.
- Fits a linear regression of inflation (%) on centered year (sklearn + statsmodels), computes R² and RMSE.
- Saves the OLS summary to output/regression_summary.txt and a compact JSON metrics file to output/regression_metrics.json.
- Produces and saves plots to output/plots/ (line plot, scatter + regression, residuals, residuals histogram).
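The loading and preparation steps above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the code in inflacio/: the function names read_ksh_bytes and add_inflation, the candidate encoding list, the column names Year and CPI_index, and the sample rows are all hypothetical. Only the ; separator, the , decimal mark and the formula Inflacio_pct = CPI_index - 100 come from this README.

```python
import io

import pandas as pd

# Made-up sample in the described layout (';' separator, ',' decimal mark).
# The values are illustrative only, not real KSH figures.
SAMPLE = "Year;CPI_index\n2001;104,9\n2002;\n2003;102,5\n".encode("utf-8")


def read_ksh_bytes(raw, encodings=("utf-8", "cp1250", "latin-1")):
    """Try each candidate encoding in turn (the list itself is an assumption)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw), sep=";", decimal=",", encoding=enc)
        except UnicodeDecodeError as exc:
            last_error = exc
    raise ValueError("could not decode the CSV with any candidate encoding") from last_error


def add_inflation(df, year_col="Year", cpi_col="CPI_index"):
    """Coerce both columns to numbers, drop incomplete rows, add Inflacio_pct."""
    out = df[[year_col, cpi_col]].copy()
    out[year_col] = pd.to_numeric(out[year_col], errors="coerce")
    out[cpi_col] = pd.to_numeric(out[cpi_col], errors="coerce")
    out = out.dropna()
    out["Inflacio_pct"] = out[cpi_col] - 100  # index 104.9 -> roughly 4.9 %
    return out


cleaned = add_inflation(read_ksh_bytes(SAMPLE))
print(cleaned)
```

In this sample the 2002 row has no CPI value, so it is dropped and two rows remain.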
Output files:
- output/cleaned_data.csv: prepared dataset used in the analysis.
- output/regression_summary.txt: full statsmodels OLS summary.
- output/regression_metrics.json: JSON with slope, intercept, R², RMSE and slope p-value.
- output/plots/*.png: generated plot images.
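Interpreting the results:
- CPI_index is the annual index (100 = previous year). For example, 104.9 indicates a ~4.9% increase over the previous year.
- Inflation (%) = CPI_index - 100.
- Linear regression fits a trend line: Inflation(%) = slope * Year + intercept. The slope is the change in inflation, in percentage points, per year.
- R² measures the share of the variance in inflation that the linear model explains.
- RMSE is the root-mean-square error of the fit: the typical size of a residual, in percentage points.
- The slope p-value tests the null hypothesis of no linear trend (slope = 0); a small value indicates a statistically significant trend.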
- Inspect residual plots: if residuals show patterns, a simple linear trend may be insufficient.
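To make the metrics above concrete, here is a small NumPy sketch of a least-squares trend fitted on centered year, with R² and RMSE computed from the residuals. It is not the project's run_regression (which, per this README, uses sklearn and statsmodels): the function name fit_centered_trend, the returned dictionary keys and the input numbers are assumptions for illustration, and RMSE is taken here as the square root of the mean squared residual.

```python
import numpy as np


def fit_centered_trend(years, inflation):
    """Least-squares line of inflation on (year - mean year), plus fit metrics."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(inflation, dtype=float)
    xc = x - x.mean()                        # centering the predictor
    slope, intercept = np.polyfit(xc, y, 1)  # degree-1 fit: [slope, intercept]
    residuals = y - (slope * xc + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
    }


# Made-up inflation values (percentage points), purely illustrative.
metrics = fit_centered_trend([2000, 2001, 2002, 2003, 2004], [1.0, 3.0, 2.0, 4.0, 5.0])
print(metrics)
```

With a centered predictor the fitted intercept equals the mean of the inflation values, that is, the fitted inflation at the mean year. The slope p-value is not computed here; statsmodels reports it in the OLS summary.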
Possible next steps:
- Allow explicit CLI selection of the year and CPI columns if automatic detection fails.
- Add unit tests for load_ksh_csv, detect_cpi_column, prepare_dataframe, and run_regression.
- Add confidence intervals or prediction bands to the regression plot.
- Add time series models (ARIMA/Holt-Winters) for forecasting.
Troubleshooting:
- If the script cannot find the CSV, pass it with --input or place it in the project root.
- If plotting fails on headless servers, use --no-show and check the output/plots/ folder for saved PNGs.
- If dependencies are missing, re-run pip install -r requirements.txt.