The Archive of Mass ENvironmental Data (AMEND) is a project to assemble and analyze data related to environmental regulation, focused on water policy in Massachusetts.
The website for the project is openamend.org.
This git repository contains code for data acquisition (see get_data/), analysis (see analysis/), and the jekyll site (see docs/).
Data is refreshed automatically every Monday at 6am UTC via two GitHub Actions workflows:
- Update Data: Fetches all active data sources, validates row counts and schema, assembles the SQLite database, commits updated CSVs, and regenerates the AI Analysis semantic context. If any step fails, a GitHub Issue is opened automatically.
- Update Charts: Runs after a successful data update to regenerate Chart.js visualizations. The PySTAN-based CSO regression analysis (NECIR_CSO_map.py) is excluded from CI and must be run locally.
If a workflow fails, a GitHub Issue is opened with a link to the failed run.
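The validation step in the Update Data workflow checks row counts and schema before the database is assembled. The following is a minimal sketch of what such a check could look like, not the repository's actual code: the function name validate_csv, the column names, and the row threshold are all hypothetical.

```python
import csv
import io


def validate_csv(text, expected_columns, min_rows):
    """Return a list of problems found in CSV text; an empty list means it passed."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    # Schema check: every expected column must appear in the header row.
    header = reader.fieldnames or []
    missing = [c for c in expected_columns if c not in header]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
    # Row-count check: a sharp drop usually means a failed or partial fetch.
    n_rows = sum(1 for _ in reader)
    if n_rows < min_rows:
        problems.append(f"only {n_rows} rows, expected at least {min_rows}")
    return problems


# Hypothetical data: the column names and values are illustrative only.
sample = "town,year,value\nBOSTON,2020,1.5\nLOWELL,2021,2.0\n"
print(validate_csv(sample, ["town", "year", "value"], min_rows=2))  # prints []
```

Returning a list of problems rather than raising on the first one lets a CI job report every failed check in a single issue.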
To run a full update locally:
bash update_all.sh
This script will not update ECOS budget records or the SSA wage table, which require manual data entry.
Large files (SQLite database, full drinking water CSV, permit PDFs) are stored on Google Cloud Storage.
The site is hosted via GitHub Pages from the docs/ directory.
To run locally (use --host localhost so sidebar links resolve correctly in the browser):
conda env create -f amend_jekyll_env.yml
conda activate amend_jekyll
cd docs
bundle exec jekyll serve --host localhost --port 4000 --baseurl ""
For faster rebuilds while editing, add the --incremental flag to rebuild only the files that have changed:
bundle exec jekyll serve --host localhost --port 4000 --baseurl "" --incremental
To run data fetches and most chart scripts (no PySTAN/geopandas):
pip install -r requirements-ci.txt
To run all scripts, including the PySTAN CSO regression analysis:
conda env create -f amend_python_env.yml
conda activate amend_python
The AI Analysis page lets users ask natural-language questions about the AMEND database. An LLM generates SQL, which the page executes client-side via sql.js and renders as interactive Plotly charts, so the database itself is queried entirely in your browser.
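The generate-then-execute flow can be illustrated outside the browser. The sketch below swaps in Python's sqlite3 module for sql.js and a hard-coded string for the model's output; the table, its values, and the run_generated_sql helper are hypothetical. The SELECT-only guard is a precaution added for this sketch, not something the page is documented to do.

```python
import sqlite3


def run_generated_sql(conn, sql):
    """Run model-generated SQL if it is a single SELECT; return (columns, rows)."""
    statement = sql.strip().rstrip(";")
    # Crude guard (an assumption of this sketch): reject anything but one SELECT.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    cursor = conn.execute(statement)
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


# Hypothetical in-memory table standing in for the AMEND database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cso (town TEXT, discharges INTEGER)")
conn.executemany("INSERT INTO cso VALUES (?, ?)", [("BOSTON", 12), ("LOWELL", 7)])

# Stand-in for the SQL text an LLM might return.
generated = "SELECT town, discharges FROM cso ORDER BY discharges DESC;"
columns, rows = run_generated_sql(conn, generated)
```

The returned column names and row tuples are the shape a charting layer needs: one list for axis labels and one for values.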
The LLM is given a rich schema description — docs/assets/db_semantic_context.txt — instead of bare CREATE TABLE statements. This file includes:
- Table descriptions and row counts
- 5 sample rows per table (showing actual value formats, e.g. ALL-CAPS town names)
- Per-column notes (typos, date formats, join keys)
- Cross-table join relationships
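To make the contents listed above concrete, here is a rough sketch of how a per-table description could be assembled from a SQLite database. It is not the code in generate_semantic_context.py: the structure assumed for TABLE_DESCRIPTIONS and COLUMN_NOTES, the describe_table helper, and the sample table are all hypothetical, and join relationships are left out for brevity.

```python
import sqlite3

# Hypothetical stand-ins for the TABLE_DESCRIPTIONS and COLUMN_NOTES mappings.
TABLE_DESCRIPTIONS = {"towns": "One row per municipality."}
COLUMN_NOTES = {"towns": {"name": "ALL-CAPS town name; join key to other tables."}}


def describe_table(conn, table, n_samples=5):
    """Build a plain-text description of one table: summary, columns, samples."""
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    lines = [f"TABLE {table}: {TABLE_DESCRIPTIONS.get(table, '')} ({count} rows)"]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    for cid, name, col_type, *rest in conn.execute(f'PRAGMA table_info("{table}")'):
        note = COLUMN_NOTES.get(table, {}).get(name, "")
        lines.append(f"  {name} {col_type} {note}".rstrip())
    # Sample rows show the model real value formats, not just column types.
    lines.append("  sample rows:")
    for row in conn.execute(f'SELECT * FROM "{table}" LIMIT {int(n_samples)}'):
        lines.append(f"    {row}")
    return "\n".join(lines)


# Illustrative table; the population figures are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE towns (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO towns VALUES (?, ?)", [("BOSTON", 100), ("LOWELL", 200)])
context = describe_table(conn, "towns")
print(context)
```

Including sample rows is what lets a model learn, for example, that town names are stored in capitals and must be matched that way in a WHERE clause.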
The semantic context must be regenerated whenever data sources change (new tables, renamed columns, schema changes). It is regenerated automatically by assemble_db.py on each weekly data update. To regenerate manually:
cd get_data
conda run -n amend_python python generate_semantic_context.py
When adding or changing a data source:
- Update TABLE_DESCRIPTIONS and COLUMN_NOTES in get_data/generate_semantic_context.py
- Run generate_semantic_context.py to regenerate docs/assets/db_semantic_context.txt
- Commit both files
To create an automated screen-capture demo of the Ask AMEND AI feature:
# First time: create the Playwright environment
conda env create -f environment-playwright.yml
# Record a demo with Groq (free API key)
GROQ_API_KEY=sk_... conda run -n amend_playwright python record_ai_demo.py
# Or with OpenAI
OPENAI_API_KEY=sk_... conda run -n amend_playwright python record_ai_demo.py
# Or with Google Gemini
GOOGLE_API_KEY=... conda run -n amend_playwright python record_ai_demo.py
# Specify starter question (0-8)
GROQ_API_KEY=sk_... conda run -n amend_playwright python record_ai_demo.py --question 2
# Custom output path
GROQ_API_KEY=sk_... conda run -n amend_playwright python record_ai_demo.py --output my-demo.webm
Prerequisites:
- Jekyll server running: cd docs && bundle exec jekyll serve --host localhost --port 4000 --baseurl ""
- API key from Groq (free), OpenAI, or Google Gemini
The script records a ~30–45 second demo showing a user asking a natural-language question, the generated SQL, and an interactive chart rendering. The API key is only held in memory during the session and is not persisted to disk. Output is saved as WebM video.
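The commands above pass the API key through an environment variable and select options with --question and --output. The sketch below shows one way a script could handle that input while keeping the key in memory only. It is an illustration, not the contents of record_ai_demo.py: the provider precedence, the default output name, and both helper functions are assumptions.

```python
import argparse

# Environment variables named in the commands above, checked in an assumed order.
PROVIDER_ENV_VARS = [
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
]


def pick_provider(environ):
    """Return (provider, key) for the first API key found in the mapping."""
    for provider, var in PROVIDER_ENV_VARS:
        key = environ.get(var)
        if key:
            return provider, key  # held in memory only, never written to disk
    raise SystemExit("Set GROQ_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY")


def parse_args(argv):
    """Parse the two options shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Record an AI demo (sketch)")
    parser.add_argument("--question", type=int, choices=range(0, 9), default=0,
                        help="starter question index (0-8)")
    parser.add_argument("--output", default="demo.webm",  # default is an assumption
                        help="path of the WebM video to write")
    return parser.parse_args(argv)


# Example with explicit inputs; a real script would pass sys.argv[1:] and os.environ.
args = parse_args(["--question", "2", "--output", "my-demo.webm"])
provider, _key = pick_provider({"OPENAI_API_KEY": "placeholder"})
```

Taking the environment as a parameter rather than reading os.environ directly makes the key handling easy to test without exporting a real key.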
The Ask AI page stores API keys in browser localStorage (plain text, client-side). The keys are not encrypted, but this keeps them off your servers. On shared machines, any JavaScript running on the page, or a browser extension, could read the key.
Recommendations:
- Use restricted/temporary API keys if your provider supports them (rate limits, expiration, scope limitations)
- On shared machines, clear localStorage after use: open the browser console (F12), run localStorage.clear(), then refresh
- Or use private/incognito browsing to isolate keys from your regular browsing history
All major providers (Groq, OpenAI, Google Gemini) allow creating limited-scope API keys.