Magnetika/ksh_analyze_python

KSH Inflation Analysis

This repository contains a small, modular Python project for processing annual CPI (consumer price index) data exported from the Hungarian Central Statistical Office (KSH). The code reads a KSH CSV, computes annual inflation as CPI_index - 100, produces diagnostic plots, and fits a simple linear regression of inflation on year.

Contents

  • main.py — the top-level orchestrator: parses CLI args, loads data, prepares it, runs regression, saves outputs and produces plots.
  • inflacio/ — package with small modules: io.py, prep.py, analysis.py, plots.py.
  • stadat-ara0001-1.1.1.1-hu.csv — example KSH CSV (included in repository).
  • requirements.txt — Python dependencies.
  • README.md — original README (Hungarian).
  • README_en.md — this English README.
  • output/ — example output files produced by main.py (cleaned CSV, regression metrics, plots).

Requirements

The project depends on these Python packages (also listed in requirements.txt):

  • pandas
  • numpy
  • matplotlib
  • scikit-learn
  • statsmodels

Install dependencies (PowerShell):

python -m pip install -r .\requirements.txt

If python is not on your PATH, run the same command with the full path to your Python executable.

Usage

Run the analysis from the project root:

python .\main.py --input stadat-ara0001-1.1.1.1-hu.csv --outdir output

Options:

  • --input, -i : path to the input CSV (optional — the script also tries a couple of default filenames).
  • --outdir, -o : output directory (default: output).
  • --no-show : do not display plots interactively (useful for headless runs).
  • --debug : enable debug logging.
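The documented options map directly onto Python's standard argparse module. The sketch below is not the repository's main.py: build_parser is a hypothetical helper, and only the flag names and the output default are taken from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options (sketch only)."""
    parser = argparse.ArgumentParser(description="KSH inflation analysis (sketch)")
    # Optional: when omitted, the real script tries default filenames instead.
    parser.add_argument("--input", "-i", default=None, help="path to the input CSV")
    parser.add_argument("--outdir", "-o", default="output", help="output directory")
    # store_true flags are False unless given; "--no-show" is stored as args.no_show.
    parser.add_argument("--no-show", action="store_true", help="do not display plots")
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-i", "data.csv", "--no-show"])
    print(args.input, args.outdir, args.no_show, args.debug)
    # prints: data.csv output True False
```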

What the script does:

  • Loads the KSH CSV, trying several text encodings; it expects a semicolon (;) as the field separator and a comma (,) as the decimal mark.
  • Detects the CPI column and the year column, converts types and drops missing rows.
  • Computes inflation percentage as: Inflacio_pct = CPI_index - 100.
  • Saves cleaned data to output/cleaned_data.csv.
  • Fits a linear regression of inflation (%) on the centered year (with both scikit-learn and statsmodels) and computes R² and RMSE.
  • Saves OLS summary to output/regression_summary.txt and a compact JSON metrics file to output/regression_metrics.json.
  • Produces and saves plots to output/plots/ (line plot, scatter + regression, residuals, residuals histogram).
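The loading and preparation steps can be sketched with pandas roughly as follows. This is an illustration under stated assumptions, not the code in inflacio/io.py or inflacio/prep.py: the helper names read_ksh_csv and prepare, the encoding list, and the output column names Year/CPI_index are hypothetical. Only the separator, the decimal mark, and the CPI_index - 100 formula come from this README.

```python
import os
import tempfile

import pandas as pd

# Assumption: an illustrative list of encodings, not the project's actual list.
# iso-8859-2 maps every byte, so it acts as a last-resort fallback.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1250", "iso-8859-2")


def read_ksh_csv(path):
    """Read a ';'-separated CSV with ',' decimals, trying each encoding in turn."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return pd.read_csv(path, sep=";", decimal=",", encoding=enc)
        except UnicodeDecodeError:
            continue  # wrong guess; try the next encoding
    raise ValueError(f"could not decode {path} with {CANDIDATE_ENCODINGS}")


def prepare(df, year_col, cpi_col):
    """Keep year and CPI columns, coerce to numbers, drop gaps, add Inflacio_pct."""
    out = df[[year_col, cpi_col]].copy()
    out.columns = ["Year", "CPI_index"]
    # errors="coerce" turns unparseable cells into NaN so dropna() removes them.
    out["Year"] = pd.to_numeric(out["Year"], errors="coerce")
    out["CPI_index"] = pd.to_numeric(out["CPI_index"], errors="coerce")
    out = out.dropna().astype({"Year": int})
    out["Inflacio_pct"] = out["CPI_index"] - 100  # e.g. 104.9 -> 4.9
    return out.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny cp1250-encoded sample (made-up values): the UTF-8 attempt fails,
    # cp1250 succeeds, and the 2023 row with a missing index is dropped.
    raw = "Év;Index\n2021;105,1\n2022;114,5\n2023;\n"
    with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as fh:
        fh.write(raw.encode("cp1250"))
    try:
        print(prepare(read_ksh_csv(fh.name), "Év", "Index"))
    finally:
        os.remove(fh.name)
```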

Output files

  • output/cleaned_data.csv — prepared dataset used in the analysis.
  • output/regression_summary.txt — full statsmodels OLS summary.
  • output/regression_metrics.json — JSON with slope, intercept, R², RMSE and slope p-value.
  • output/plots/*.png — generated plot images.
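Writing a metrics file like output/regression_metrics.json needs only the standard json module. In this sketch the function save_metrics and the key names (slope, intercept, r2, rmse, slope_pvalue) are hypothetical; the README lists which metrics the file holds but not its exact keys.

```python
import json
import os
import tempfile


def save_metrics(metrics, outdir="output", filename="regression_metrics.json"):
    """Write a flat dict of numeric metrics to <outdir>/<filename> as indented JSON."""
    os.makedirs(outdir, exist_ok=True)
    path = os.path.join(outdir, filename)
    with open(path, "w", encoding="utf-8") as fh:
        # float() converts numpy scalars such as np.float32, which json cannot encode.
        json.dump({key: float(value) for key, value in metrics.items()}, fh, indent=2)
    return path


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not results from the KSH data.
    example = {"slope": 0.8, "intercept": 2.5, "r2": 0.64, "rmse": 0.67, "slope_pvalue": 0.2}
    with tempfile.TemporaryDirectory() as tmp:
        path = save_metrics(example, outdir=tmp)
        with open(path, encoding="utf-8") as fh:
            print(json.load(fh)["r2"])  # prints: 0.64
```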

Interpretation (short)

  • CPI_index is the annual index, with the previous year = 100. For example, 104.9 means prices rose by 4.9% compared with the previous year.
  • Inflation (%) = CPI_index - 100.
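  • Linear regression fits a trend line on the centered year: Inflation(%) = slope * (Year - mean Year) + intercept. The slope is the average change in inflation (in percentage points) per year; because the year is centered, the intercept is the fitted inflation at the mean year rather than at year 0.
    • R² measures the share of the variance in inflation that the trend line explains (from 0 to 1).
    • RMSE (root mean squared error) is the typical size of the model's errors, in percentage points.
    • The slope p-value tests whether the slope differs from zero; a small value (conventionally below 0.05) suggests a statistically significant trend.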
  • Inspect residual plots: if residuals show patterns, a simple linear trend may be insufficient.
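The trend fit and its summary numbers can be reproduced with NumPy alone. This is a minimal sketch, assuming mean-centering of the year; fit_trend is a hypothetical name, and the RMSE here divides by n (as scikit-learn's mean_squared_error does), which may differ from the project's convention. The slope p-value is not computed; statsmodels provides it via sm.OLS(y, sm.add_constant(x_c)).fit().pvalues.

```python
import numpy as np


def fit_trend(years, inflation):
    """Least-squares line of inflation (%) on the mean-centered year."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(inflation, dtype=float)
    x_c = x - x.mean()  # centering: the intercept becomes the fit at the mean year
    slope, intercept = np.polyfit(x_c, y, deg=1)  # coefficients, highest power first
    residuals = y - (slope * x_c + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "slope": float(slope),          # change in inflation (pp) per year
        "intercept": float(intercept),  # fitted inflation (%) at the mean year
        "r2": 1.0 - ss_res / ss_tot,    # share of variance explained
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),  # in percentage points
        "residuals": residuals,         # inspect these for patterns
    }


if __name__ == "__main__":
    # Toy numbers, not KSH data.
    m = fit_trend([2020, 2021, 2022, 2023], [1.0, 3.0, 2.0, 4.0])
    print(round(m["slope"], 3), round(m["intercept"], 3), round(m["r2"], 3), round(m["rmse"], 3))
    # prints: 0.8 2.5 0.64 0.671
```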

Development & Improvements

  • Allow explicit CLI selection of the year and CPI columns if automatic detection fails.
  • Add unit tests for load_ksh_csv, detect_cpi_column, prepare_dataframe, and run_regression.
  • Add confidence intervals or prediction bands to the regression plot.
  • Add time series models (ARIMA/Holt-Winters) for forecasting.
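For the first improvement, one possible shape is a column picker that prefers an explicitly named column and otherwise falls back to keyword matching. The repository's detect_cpi_column logic is not described here, so pick_column, the keyword lists, and the sample headers are all hypothetical; the explicit argument could be wired to CLI flags such as a hypothetical --year-col or --cpi-col.

```python
# Hypothetical header keywords; check them against the actual KSH export.
YEAR_HINTS = ("év", "year")
CPI_HINTS = ("index", "cpi")


def pick_column(columns, hints, explicit=None):
    """Return `explicit` if given (validated), else the first column containing a hint."""
    columns = list(columns)
    if explicit is not None:
        if explicit not in columns:
            raise KeyError(f"column {explicit!r} not found; available: {columns}")
        return explicit
    for col in columns:
        # Case-insensitive substring match on the header text.
        if any(hint in str(col).lower() for hint in hints):
            return col
    raise KeyError(f"no column matches {hints}; pass the column name explicitly")


if __name__ == "__main__":
    # Hypothetical headers in the style of a Hungarian export.
    cols = ["Év", "Fogyasztóiár-index, előző év = 100"]
    print(pick_column(cols, YEAR_HINTS))  # prints: Év
    print(pick_column(cols, CPI_HINTS))   # prints: Fogyasztóiár-index, előző év = 100
```

Keyword matching is fragile: if the CPI column came first, the "év" hint would also match its header ("előző év"), which is exactly the case an explicit override covers.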

Troubleshooting

  • If the script cannot find the CSV, pass it with --input or place it in the project root.
  • If plotting fails on headless servers, use --no-show and check the output/plots/ folder for saved PNGs.
  • If dependencies are missing, re-run python -m pip install -r requirements.txt.
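On machines without a display, a common matplotlib approach is to select the non-interactive Agg backend and only save figures to files; setting the environment variable MPLBACKEND=Agg has the same effect. The sketch below is not taken from inflacio/plots.py: save_residual_histogram and the file name residuals_hist.png are hypothetical.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # file-only backend: no display or GUI toolkit needed
import matplotlib.pyplot as plt  # noqa: E402  (imported after selecting the backend)


def save_residual_histogram(residuals, outdir, filename="residuals_hist.png"):
    """Draw a histogram of residuals and save it as a PNG; returns the file path."""
    os.makedirs(outdir, exist_ok=True)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(residuals, bins=10, edgecolor="black")
    ax.set_xlabel("Residual (percentage points)")
    ax.set_ylabel("Count")
    ax.set_title("Residuals histogram")
    path = os.path.join(outdir, filename)
    fig.savefig(path, dpi=100)
    plt.close(fig)  # release the figure; no plt.show() call is needed
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = save_residual_histogram([-0.3, 0.9, -0.9, 0.3], os.path.join(tmp, "plots"))
        print(os.path.getsize(p) > 0)  # prints: True
```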
