Fix swapped data files, regenerate data, and add frictionless validate to CI #12

Open
olayway wants to merge 1 commit into main from regenerate-data

olayway commented May 22, 2026

Summary

Bug fixed: data files were swapped

scripts/process.py defines two parallel lists:

SOURCES = [
    'https://api.worldbank.org/v2/en/indicator/NY.GDP.DEFL.KD.ZG?downloadformat=csv',  # GDP deflator
    'https://api.worldbank.org/v2/en/indicator/FP.CPI.TOTL.ZG?downloadformat=csv',      # CPI
]
FILE_NAMES = ['inflation-consumer.csv', 'inflation-gdp.csv']  # ← swapped!

SOURCES[0] is the GDP deflator indicator but FILE_NAMES[0] is inflation-consumer.csv. As a result:

  • data/inflation-consumer.csv contained GDP deflator values
  • data/inflation-gdp.csv contained consumer price (CPI) values
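
One way to make this kind of mismatch harder to reintroduce is to keep each output name next to the indicator it is built from, and derive both lists from that single mapping. The sketch below is only an illustration and is not what this PR changes in scripts/process.py: the names DATASETS, BASE and source_url are hypothetical, and only the two indicator codes, the URL pattern and the two file names come from the script.

```python
# Hypothetical restructuring; the PR itself only reorders FILE_NAMES.
BASE = 'https://api.worldbank.org/v2/en/indicator'

# Each output file name is declared together with its World Bank indicator,
# so the pairing no longer depends on two lists staying in the same order.
DATASETS = {
    'inflation-gdp.csv': 'NY.GDP.DEFL.KD.ZG',    # GDP deflator
    'inflation-consumer.csv': 'FP.CPI.TOTL.ZG',  # CPI
}


def source_url(indicator):
    """Build the CSV download URL for one indicator code."""
    return f'{BASE}/{indicator}?downloadformat=csv'


# The two parallel lists are derived from the mapping, in the same order.
SOURCES = [source_url(code) for code in DATASETS.values()]
FILE_NAMES = list(DATASETS)
```

With this layout, adding or reordering an indicator touches one line, and SOURCES[i] always corresponds to FILE_NAMES[i].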

This was confirmed by cross-checking the archive CSVs against the data files: Aruba's first value in archive/NY.GDP.DEFL.KD.ZG.csv (GDP deflator, 1987 = 3.591…) matched data/inflation-consumer.csv, not data/inflation-gdp.csv.
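
The cross-check can be sketched roughly as follows. This is a minimal illustration, not the script used for the PR, and it rests on assumptions about both layouts: the archive file is taken to be in the wide World Bank format (a few preamble lines, then a header with a 'Country Code' column and one column per year), and the data file is taken to be in a long format whose column names ('Country Code', 'Year', 'Inflation') are hypothetical defaults that may need adjusting.

```python
import csv
import math


def first_archive_value(archive_text, country_code, skip_lines=4):
    """Return (year, value) for the earliest non-empty year of one country.

    Assumes the wide layout: `skip_lines` preamble lines, then a header
    row with 'Country Code' and one column per year.
    """
    lines = archive_text.splitlines()[skip_lines:]
    for row in csv.DictReader(lines):
        if row.get('Country Code') != country_code:
            continue
        # Keep only year-named columns, then scan them in ascending order.
        years = sorted(int(k) for k in row if k and k.strip().isdigit())
        for year in years:
            cell = row[str(year)]
            if cell not in ('', None):
                return year, float(cell)
    return None


def first_tidy_value(tidy_text, country_code, code_col='Country Code',
                     year_col='Year', value_col='Inflation'):
    """Return (year, value) for the earliest row of one country.

    Column names are assumptions about the long-format data file.
    """
    rows = [r for r in csv.DictReader(tidy_text.splitlines())
            if r[code_col] == country_code and r[value_col] != '']
    if not rows:
        return None
    first = min(rows, key=lambda r: int(r[year_col]))
    return int(first[year_col]), float(first[value_col])


def same_first_value(archive_text, tidy_text, country_code, tol=1e-9):
    """True when both files agree on the country's first year and value."""
    a = first_archive_value(archive_text, country_code)
    b = first_tidy_value(tidy_text, country_code)
    return (a is not None and b is not None
            and a[0] == b[0] and math.isclose(a[1], b[1], abs_tol=tol))
```

Calling same_first_value with the contents of an archive CSV and each of the two data files in turn, for a country code such as Aruba's, shows which data file actually carries that indicator.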

Fix

Swapped FILE_NAMES to ['inflation-gdp.csv', 'inflation-consumer.csv']. Regenerated both data files. frictionless validate passes on the regenerated output.

CI addition

Added a frictionless validate datapackage.json step to .github/workflows/actions.yml, run after make data. The workflow will now fail if the descriptor and data files drift out of sync.

Bug: SOURCES and FILE_NAMES in scripts/process.py were ordered such that
NY.GDP.DEFL.KD.ZG (GDP deflator) was written to inflation-consumer.csv
and FP.CPI.TOTL.ZG (CPI) was written to inflation-gdp.csv — the two
output file names were swapped.

Fix: swap FILE_NAMES order so index 0 (NY.GDP.DEFL.KD.ZG) writes to
inflation-gdp.csv and index 1 (FP.CPI.TOTL.ZG) writes to
inflation-consumer.csv.

Data regenerated: inflation-gdp.csv now contains GDP deflator values
and inflation-consumer.csv now contains CPI values, consistent with
their names and the resource descriptions in datapackage.json.

CI: added 'frictionless validate datapackage.json' step to actions.yml
to catch any future descriptor/data drift before it is merged.