CUSTOM_GEOGRAPHY.md (Original)
Custom Geography Zone Systems — Architecture Plan
Overview
The app supports two independent geography choices:
- Subject area — the zone whose residents or workers are being analyzed (currently: a named city or county)
- Display geography — how destination flows are spatially aggregated and rendered (currently: city or county boundaries)
This plan extends both to support user-defined options: a drawn polygon for subject area, and a user-uploaded GIS file for display geography.
Dependencies: DuckDB spatial extension (for all spatial data I/O and PIP operations) + Turf.js (centroid computation only, for arc positioning). No other spatial or format libraries. If the DuckDB spatial extension proof-of-concept does not pass, this feature is not built.
Validation Gate (Do This First)
Before any implementation, run a proof-of-concept in the current app's DuckDB WASM environment to confirm the spatial extension works end-to-end. All three steps must pass:
Step 1 — Extension loads in the EH bundle on GitHub Pages:
```javascript
await conn.query("INSTALL spatial; LOAD spatial;");
```
Step 2 — ST_Read reads a browser-uploaded file via registerFileBuffer:
```javascript
const buf = await uploadedFile.arrayBuffer();
await db.registerFileBuffer('test.gpkg', new Uint8Array(buf));
const result = await conn.query("SELECT * FROM ST_Read('test.gpkg') LIMIT 5;");
// Must return rows with a geometry column (geom)
```
Step 3 — Spatial predicate works on the loaded geometry:
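```javascript
// zone_name must be an attribute that exists in the test file.
// ST_Point takes (lon, lat), so the layer must be in WGS84 lon/lat.
await conn.query(`
  SELECT zone_name, ST_Within(ST_Point(-111.89, 40.60), geom) AS inside
  FROM ST_Read('test.gpkg')
  LIMIT 10;
`);
```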
Test with a real GeoPackage and a real Shapefile (zipped). If any step fails, this feature is not pursued.
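The three gates can be wrapped in a small harness that records which step failed. That way a GeoPackage run and a zipped-Shapefile run produce comparable reports. This is a sketch under assumptions:

- `runSpatialPoc` is a hypothetical name.
- `db` and `conn` are an already-instantiated DuckDB-WASM `AsyncDuckDB` and its connection.
- Query results are Arrow tables exposing `numRows` and `schema.fields`.
- The geometry column is expected to be named `geom`, as in the other queries in this plan.

Step 3 drops `zone_name` so it runs against any test file.

```javascript
// Hypothetical POC harness. Assumes `db` is a DuckDB-WASM AsyncDuckDB,
// `conn` an open connection, and `file` has `name` and `arrayBuffer()`.
async function runSpatialPoc(db, conn, file, [lon, lat] = [-111.89, 40.6]) {
  const results = [];
  // Runs one step, records pass/fail, and reports whether to continue.
  const run = async (step, fn) => {
    try {
      results.push({ step, ok: true, detail: await fn() });
      return true;
    } catch (err) {
      results.push({ step, ok: false, detail: String(err) });
      return false;
    }
  };
  // Escape single quotes for use inside a SQL string literal.
  const path = `'${file.name.replace(/'/g, "''")}'`;

  // Step 1: the spatial extension installs and loads in this bundle.
  if (!(await run(1, () => conn.query('INSTALL spatial; LOAD spatial;')))) return results;

  // Step 2: ST_Read reads the registered buffer and returns a geom column.
  const readOk = await run(2, async () => {
    const buf = await file.arrayBuffer();
    await db.registerFileBuffer(file.name, new Uint8Array(buf));
    const table = await conn.query(`SELECT * FROM ST_Read(${path}) LIMIT 5;`);
    const cols = table.schema.fields.map((f) => f.name);
    if (table.numRows === 0 || !cols.includes('geom')) {
      throw new Error(`got ${table.numRows} rows with columns [${cols.join(', ')}]`);
    }
    return cols;
  });
  if (!readOk) return results;

  // Step 3: a spatial predicate evaluates against the loaded geometry.
  await run(3, async () => {
    const table = await conn.query(
      `SELECT ST_Within(ST_Point(${lon}, ${lat}), geom) AS inside FROM ST_Read(${path}) LIMIT 10;`
    );
    return table.numRows;
  });
  return results;
}
```

Keeping the original file name (extension included) when registering the buffer lets GDAL choose the driver from it.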
The Full Interaction Matrix
| | Display: City / County | Display: Custom upload |
|---|---|---|
| Subject: Named city or county | ① Current behavior | ② New |
| Subject: Drawn polygon | ③ New | ④ New |
Case ① is untouched — uses the existing pre-processed city_flows.parquet pipeline and stays on the fast path.
Cases ②③④ all require block-level OD data and the DuckDB spatial extension. The trigger is: any time the user either draws a polygon subject or uploads custom display zones, the app switches to block mode.
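The routing rule above reduces to two booleans. A minimal sketch, assuming hypothetical descriptor objects for the two selections: `{ kind: 'named' | 'drawn' }` for the subject and `{ kind: 'builtin' | 'upload' }` for the display.

```javascript
const SUBJECT_KINDS = new Set(['named', 'drawn']);
const DISPLAY_KINDS = new Set(['builtin', 'upload']);

// Maps the two selections onto matrix cases 1-4; any custom input means block mode.
function classifyCase(subject, display) {
  if (!SUBJECT_KINDS.has(subject.kind) || !DISPLAY_KINDS.has(display.kind)) {
    throw new Error(`unknown selection: ${subject.kind} / ${display.kind}`);
  }
  const drawn = subject.kind === 'drawn';
  const uploaded = display.kind === 'upload';
  const caseNumber = drawn ? (uploaded ? 4 : 3) : (uploaded ? 2 : 1);
  return { caseNumber, blockMode: drawn || uploaded };
}
```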
Data Files Required
Always committed to the repo
| File | Size | Description |
|---|---|---|
| data/block_centroids.parquet | ~500 KB | One row per WFRC census block: geocode, lat, lon. Year-invariant (Census 2020 blocks). Loaded eagerly on app start. |
| data/city_boundaries.geojson | 256 KB | Existing city polygons — also used as built-in display zone file in block mode |
| data/county_boundaries.geojson | 53 KB | Existing county polygons — same |
| data/lehd/{year}/city_flows.parquet | ~284 KB/year | Fast-path for case ① only |
Loaded on demand (block mode)
| File | Size | Description |
|---|---|---|
| data/lehd/{year}/block_od.parquet | ~18–22 MB/year | Block-level OD pairs for WFRC region: h_geocode, w_geocode, S000, SA01–03, SE01–03, SI01–03, d0_10, d10_25, d25_50, d50p |
Spatial Stack
| Operation | Tool | Reason |
|---|---|---|
| Read uploaded GIS file (any format) | DuckDB ST_Read via spatial extension | GDAL-backed, handles GeoPackage, Shapefile, GeoJSON, FlatGeobuf, KML, and more without any additional JS libraries |
| Point-in-polygon (blocks → zones) | DuckDB ST_Within | Runs inside DuckDB alongside the aggregation query — no JS ↔ DuckDB data transfer of 44K block points |
| Arc origin (drawn polygon centroid) | turf.centroid(polygon) | Synchronous, lightweight, no DuckDB round-trip needed for a single geometry |
| Arc endpoints (zone centroids) | turf.centroid(zoneFeature) | Same — read from already-loaded GeoJSON features in JS |
No other spatial or format libraries are used. If a user uploads a format that GDAL cannot read (confirmed via ST_Drivers()), the app shows a clear error message listing supported formats.
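The unsupported-format check can run before anything is registered. Map the upload's extension to a GDAL driver short name, then test it against the drivers the WASM build reports. A sketch, assuming the set of names is read once with `SELECT short_name FROM ST_Drivers();`. The extension table, `checkUpload`, and treating `.zip` as a zipped Shapefile are assumptions, not part of the plan.

```javascript
// Hypothetical mapping from file extension to GDAL driver short name.
const DRIVER_BY_EXTENSION = {
  gpkg: 'GPKG',
  shp: 'ESRI Shapefile',
  zip: 'ESRI Shapefile', // assumption: a .zip upload is a zipped Shapefile
  geojson: 'GeoJSON',
  json: 'GeoJSON',
  fgb: 'FlatGeobuf',
  kml: 'KML',
};
const SUPPORTED_HINT = 'GeoPackage, Shapefile (zipped), GeoJSON, FlatGeobuf';

// `availableDrivers` is a Set of GDAL short names reported by ST_Drivers().
function checkUpload(filename, availableDrivers) {
  const dot = filename.lastIndexOf('.');
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : '';
  const driver = DRIVER_BY_EXTENSION[ext];
  if (!driver || !availableDrivers.has(driver)) {
    return { ok: false, message: `Unsupported file "${filename}". Supported formats: ${SUPPORTED_HINT}.` };
  }
  return { ok: true, driver };
}
```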
Architecture: How All Four Cases Work
Unified block-mode pipeline (cases ②③④)
All three new cases use the same DuckDB query. The only differences are the source of the subject_area geometry and the display zone file.
```sql
-- Load spatial extension (once per session)
INSTALL spatial; LOAD spatial;

-- Load block OD (once per year selection in block mode)
CREATE OR REPLACE VIEW block_od AS
SELECT * FROM read_parquet('block_od_2023.parquet');

-- Load block centroids (always available)
CREATE OR REPLACE VIEW block_centroids AS
SELECT * FROM read_parquet('block_centroids.parquet');

-- Load display zones from uploaded file (or from built-in boundary file).
-- The layer's label attribute must be exposed as zone_name, and its geometries
-- must be lon/lat (WGS84) to match the block centroids.
CREATE OR REPLACE TABLE display_zones AS
SELECT zone_name, geom FROM ST_Read('uploaded.gpkg');
-- or: SELECT name AS zone_name, geom FROM ST_Read('city_boundaries.geojson')

-- Load subject area geometry (drawn polygon as GeoJSON, or named place boundary)
CREATE OR REPLACE TABLE subject_area AS
SELECT geom FROM ST_Read('subject.geojson'); -- single-feature GeoJSON

-- Full query: PIP + aggregation in one shot.
-- The CTEs run PIP once per block instead of once per OD row:
-- home blocks are filtered by the subject area, work blocks labeled by display zone.
WITH home_blocks AS (
  SELECT bc.geocode
  FROM block_centroids bc
  JOIN subject_area sa ON ST_Within(ST_Point(bc.lon, bc.lat), sa.geom)
),
work_zones AS (
  SELECT bc.geocode, dz.zone_name
  FROM block_centroids bc
  JOIN display_zones dz ON ST_Within(ST_Point(bc.lon, bc.lat), dz.geom)
)
SELECT
  wz.zone_name AS dest_name,
  SUM(od.S000) AS S000,
  SUM(od.SA01) AS SA01, SUM(od.SA02) AS SA02, SUM(od.SA03) AS SA03,
  SUM(od.SE01) AS SE01, SUM(od.SE02) AS SE02, SUM(od.SE03) AS SE03,
  SUM(od.SI01) AS SI01, SUM(od.SI02) AS SI02, SUM(od.SI03) AS SI03,
  SUM(od.d0_10) AS d0_10, SUM(od.d10_25) AS d10_25,
  SUM(od.d25_50) AS d25_50, SUM(od.d50p) AS d50p
FROM block_od od
JOIN home_blocks hb ON hb.geocode = od.h_geocode
JOIN work_zones wz ON wz.geocode = od.w_geocode
GROUP BY wz.zone_name
ORDER BY S000 DESC;
```
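For small hand-built fixtures, a plain-JavaScript version of the same logic can serve as a test oracle for `queryBlockFlows()`. This is a sketch, not the DuckDB implementation:

- `aggregateFlows`, `pointInPolygon`, and the input shapes are hypothetical.
- It sums a single field.
- It uses even-odd ray casting, so points exactly on a boundary may be classified differently from `ST_Within`.
- Like the SQL join, it counts a block once for every display zone that contains it.

```javascript
// Even-odd ray casting for one ring of [lon, lat] pairs.
function pointInRing([x, y], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses = (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// `polygon` uses GeoJSON Polygon coordinates: [outerRing, ...holes].
function pointInPolygon(pt, polygon) {
  const [outer, ...holes] = polygon;
  return pointInRing(pt, outer) && !holes.some((hole) => pointInRing(pt, hole));
}

// Sums `field` for OD rows whose home block lies in the subject polygon,
// grouped by each display zone containing the work block, largest first.
function aggregateFlows(odRows, centroids, subjectPolygon, zones, field = 'S000') {
  const inSubject = new Set();
  const zonesOf = new Map(); // geocode -> containing zone names (PIP once per block)
  for (const [geocode, pt] of centroids) {
    if (pointInPolygon(pt, subjectPolygon)) inSubject.add(geocode);
    zonesOf.set(geocode, zones.filter((z) => pointInPolygon(pt, z.polygon)).map((z) => z.zone_name));
  }
  const totals = new Map();
  for (const row of odRows) {
    if (!inSubject.has(row.h_geocode)) continue;
    for (const dest of zonesOf.get(row.w_geocode) || []) {
      totals.set(dest, (totals.get(dest) || 0) + row[field]);
    }
  }
  return [...totals].sort((a, b) => b[1] - a[1]);
}
```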
Case-by-case breakdown
Case ①: Named city/county subject + city/county display
- Uses existing city_flows.parquet and current SQL query logic — unchanged
- Spatial extension not used, block data not loaded
Case ②: Named city/county subject + custom upload display
- subject_area: single-feature GeoJSON for the selected named place, taken from city_boundaries.geojson (or county_boundaries.geojson) and registered as a buffer
- display_zones: user-uploaded file registered via registerFileBuffer, read by ST_Read
- Triggers block mode load
Case ③: Drawn polygon subject + city/county display
- subject_area: user-drawn polygon serialized to GeoJSON, registered as buffer
- display_zones: built-in city_boundaries.geojson or county_boundaries.geojson, registered as buffer
- City/county boundaries are treated identically to user-uploaded zones — same ST_Read mechanism
Case ④: Drawn polygon subject + custom upload display
- subject_area: user-drawn polygon as GeoJSON buffer
- display_zones: user-uploaded file registered via registerFileBuffer
- Both sides resolved through ST_Read
Key insight: City and county boundaries are just the built-in default zone files. They go through the same ST_Read pipeline as user-uploaded files. The display geography mechanism is fully unified.
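Both subject sources end up as a single-feature GeoJSON buffer, so one helper can cover both the drawn polygon and the named-place boundary. A sketch under these assumptions:

- Boundary features carry their place name in a `name` property, as the `name AS zone_name` alias in the unified query suggests.
- `pickNamedFeature` and `featureToGeoJsonBytes` are hypothetical names.
- Wrapping the feature in a FeatureCollection is a conservative choice for GDAL's GeoJSON reader.

The returned bytes would be passed to `db.registerFileBuffer('subject.geojson', bytes)`.

```javascript
// Finds the boundary feature for a named city or county (hypothetical helper).
function pickNamedFeature(collection, placeName, nameProp = 'name') {
  const match = collection.features.find(
    (f) => f.properties && f.properties[nameProp] === placeName
  );
  if (!match) throw new Error(`No boundary feature named "${placeName}"`);
  return match;
}

// Wraps one feature in a FeatureCollection and UTF-8 encodes it for registerFileBuffer.
function featureToGeoJsonBytes(feature) {
  if (!feature || feature.type !== 'Feature' || !feature.geometry) {
    throw new TypeError('expected a GeoJSON Feature with a geometry');
  }
  const collection = {
    type: 'FeatureCollection',
    features: [{ type: 'Feature', properties: { ...(feature.properties || {}) }, geometry: feature.geometry }],
  };
  return new TextEncoder().encode(JSON.stringify(collection));
}
```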
Flow Arc Origin and Destination
- Subject area (arc origin): turf.centroid(subjectPolygon) — centroid of the drawn polygon or of the selected named-place boundary feature. Computed once in JS when the subject changes.
- Display zones (arc destinations): turf.centroid(zoneFeature) per zone — computed from the uploaded or built-in GeoJSON features already in JS memory.
Case ① continues to use pre-computed centroids from city_meta.json / county_meta.json as today.
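Turf documents `turf.centroid` as the mean of all vertices, not the area-weighted centroid; `turf.centerOfMass` provides the latter. Densely digitized edges therefore pull an arc endpoint toward them. The sketch below reproduces vertex-mean behavior in plain JavaScript for Polygon and MultiPolygon features, which helps when testing the `{zone_name: [lon, lat]}` output of `getZoneCentroids()` without loading Turf. `vertexCentroid` and `zoneCentroids` are hypothetical names. Skipping each ring's closing coordinate is assumed to match Turf's handling.

```javascript
// Vertex-mean centroid of a Polygon or MultiPolygon feature, skipping each
// ring's closing coordinate (which repeats the first one).
function vertexCentroid(feature) {
  const { type, coordinates } = feature.geometry;
  const polygons = type === 'Polygon' ? [coordinates] : type === 'MultiPolygon' ? coordinates : null;
  if (!polygons) throw new TypeError(`unsupported geometry type: ${type}`);
  let sumX = 0;
  let sumY = 0;
  let n = 0;
  for (const polygon of polygons) {
    for (const ring of polygon) {
      for (let i = 0; i < ring.length - 1; i++) {
        sumX += ring[i][0];
        sumY += ring[i][1];
        n++;
      }
    }
  }
  if (n === 0) throw new Error('geometry has no vertices');
  return [sumX / n, sumY / n];
}

// Builds { zone_name: [lon, lat] } for arc endpoints.
function zoneCentroids(features, nameProp = 'zone_name') {
  return Object.fromEntries(features.map((f) => [f.properties[nameProp], vertexCentroid(f)]));
}
```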
Block Mode UX
The app starts in fast mode (case ①). Block mode activates when the user:
- Draws a polygon on the map, or
- Uploads a custom zone file
On first activation for a given year, the app loads block_od_{year}.parquet (~18–22 MB) with a visible progress indicator. Subsequent queries within that year reuse the loaded data and skip the download.
Year scrubber in block mode: Switching years triggers a new block OD load. Two options (decide during implementation):
- Lock the year selector when block mode is active (simplest)
- Allow year switching with a reload indicator
Exiting block mode: Clear the drawn polygon and dismiss the uploaded zone file → reverts to case ① fast path, block OD unloaded from DuckDB.
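Three behaviors fit in a small session object: loading once per year, reusing a resident year, and unloading on exit. A sketch, assuming `loadYear(year)` and `unloadYear(year)` are hypothetical async callbacks, for example wrapping `loadBlockOD(year)` and a `DROP VIEW` plus buffer release. Only one year is kept resident, and repeat requests for that year reuse the same promise.

```javascript
// Keeps at most one year of block OD resident. `loadYear` and `unloadYear`
// are hypothetical async callbacks supplied by the caller.
class BlockOdSession {
  constructor(loadYear, unloadYear) {
    this.loadYear = loadYear;
    this.unloadYear = unloadYear;
    this.year = null;    // year currently loaded (or loading)
    this.pending = null; // promise for that load
  }

  // Resolves when `year` is loaded; repeat calls for the same year reuse the load.
  async ensure(year) {
    if (this.year === year) return this.pending;
    if (this.year !== null) await this.exit();
    this.year = year;
    const attempt = this.loadYear(year).catch((err) => {
      // Forget a failed load so a later call can retry.
      if (this.pending === attempt) {
        this.year = null;
        this.pending = null;
      }
      throw err;
    });
    this.pending = attempt;
    return attempt;
  }

  // Exiting block mode: drop the resident year, if any.
  async exit() {
    if (this.year === null) return;
    const year = this.year;
    this.year = null;
    this.pending = null;
    await this.unloadYear(year);
  }
}
```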
ACS charts: ACS transport/travel-time panels use Census FIPS lookups — unavailable for custom display zones. Hide those panels when custom zones are active.
Data Pipeline Changes (scripts/process_data.py)
The existing city/county aggregation pipeline is unchanged. New additions only:
export_block_od(od_wfrc, year_dir)
Writes block_od_{year}.parquet from the WFRC-filtered block-level OD dataframe (already computed mid-pipeline before city/county aggregation — currently discarded). Columns: h_geocode, w_geocode, S000, SA01–03, SE01–03, SI01–03, d0_10, d10_25, d25_50, d50p.
export_block_centroids(xwalk, output_path)
Writes data/block_centroids.parquet once from the LEHD crosswalk (h_blk_lat/lon already present). Year-invariant; only needs to run once.
manifest.json
Add a block_od_years key listing which years have block OD committed (e.g., recent years only to manage repo size).
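On the frontend, the new manifest key can gate block mode per year. A sketch, assuming:

- `block_od_years` holds plain year values.
- Files follow the data/lehd/{year}/block_od.parquet layout from the Data Files table.
- `blockOdSource` and the `registeredName` convention are hypothetical. The convention matches the block_od_{year}.parquet name used elsewhere in this plan.

```javascript
// Returns where to fetch block OD for `year`, or null if that year is not committed.
// Assumes manifest.block_od_years is an array of years (numbers or numeric strings).
function blockOdSource(manifest, year) {
  const y = Number(year);
  const years = Array.isArray(manifest.block_od_years) ? manifest.block_od_years.map(Number) : [];
  if (!years.includes(y)) return null;
  return {
    url: `data/lehd/${y}/block_od.parquet`,  // layout from the Data Files table
    registeredName: `block_od_${y}.parquet`, // hypothetical DuckDB file name
  };
}
```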
No changes to custom_places.py, fetch_acs.py, or any existing aggregation functions.
Frontend Changes
New file: src/draw.js (~80 lines)
- Integrates @mapbox/mapbox-gl-draw for polygon drawing on the map
- Exposes getDrawnPolygon() → current GeoJSON feature or null
- Serializes drawn polygon to single-feature GeoJSON for registerFileBuffer
- Fires callback on polygon change → triggers block mode activation
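A sketch of the src/draw.js wiring, assuming `draw` is an `@mapbox/mapbox-gl-draw` instance already added with `map.addControl(draw)`. It relies on Draw's `draw.create`, `draw.update`, and `draw.delete` map events and on its `getAll()` and `deleteAll()` methods. `createDrawController` and the keep-the-latest-polygon rule are hypothetical. Programmatic `deleteAll()` is not assumed to emit an event, so `clear()` notifies explicitly.

```javascript
const DRAW_EVENTS = ['draw.create', 'draw.update', 'draw.delete'];

// Wires Draw events to `onChange(polygonOrNull)`; hypothetical helper.
function createDrawController(map, draw, onChange) {
  // Latest Polygon feature, or null when nothing usable is drawn.
  function getDrawnPolygon() {
    const polygons = draw.getAll().features.filter(
      (f) => f.geometry && f.geometry.type === 'Polygon'
    );
    return polygons.length ? polygons[polygons.length - 1] : null;
  }
  const notify = () => onChange(getDrawnPolygon());
  for (const evt of DRAW_EVENTS) map.on(evt, notify);

  return {
    getDrawnPolygon,
    // Clears the drawing (e.g. when exiting block mode) and reports null.
    clear() {
      draw.deleteAll();
      notify();
    },
    destroy() {
      for (const evt of DRAW_EVENTS) map.off(evt, notify);
    },
  };
}
```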
New file: src/block_query.js (~150 lines)
- initBlockMode(year) — loads block_od_{year}.parquet and block_centroids.parquet into DuckDB; installs and loads spatial extension
- loadSubjectArea(geoJSON) — registers single-feature GeoJSON buffer as subject_area table
- loadDisplayZones(fileBuffer, filename) — registers uploaded file buffer, runs CREATE OR REPLACE TABLE display_zones AS SELECT {name_field} AS zone_name, geom FROM ST_Read(filename), mapping the user-chosen label attribute to the zone_name column the unified query expects
- loadBuiltinZones(geoJSON) — same as above but from pre-committed boundary files
- queryBlockFlows(direction) — runs the unified PIP + aggregation query
- getZoneCentroids() — returns {zone_name: [lon, lat]} via turf.centroid on loaded zone features for arc rendering
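`loadDisplayZones` has to map an arbitrary attribute to `zone_name`, since uploaded files rarely use that exact field. A sketch under these assumptions:

- `db` and `conn` are the DuckDB-WASM instance and connection.
- `nameField` is a hypothetical extra parameter chosen by the user.
- The zone count is read through Arrow's `getChild(...).get(0)`. COUNT returns a BigInt, hence the `Number` conversion.

The file name and field name are quoted with standard SQL escaping. The returned count can feed the "zones loaded" indicator.

```javascript
// SQL literal and identifier quoting (double the embedded quote character).
const sqlString = (s) => `'${String(s).replace(/'/g, "''")}'`;
const sqlIdent = (s) => `"${String(s).replace(/"/g, '""')}"`;

// Registers the upload and materializes display_zones(zone_name, geom).
async function loadDisplayZones(db, conn, fileBuffer, filename, nameField) {
  await db.registerFileBuffer(filename, new Uint8Array(fileBuffer));
  await conn.query(`
    CREATE OR REPLACE TABLE display_zones AS
    SELECT CAST(${sqlIdent(nameField)} AS VARCHAR) AS zone_name, geom
    FROM ST_Read(${sqlString(filename)});`);
  const result = await conn.query('SELECT COUNT(*) AS n FROM display_zones;');
  return Number(result.getChild('n').get(0));
}
```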
Modified: src/db.js
- Add installSpatialExtension() — INSTALL spatial; LOAD spatial; (called once on block mode entry)
- Add loadBlockOD(year) — registers and creates view for block_od_{year}.parquet
- Existing query functions (queryFlows, querySelfFlow, etc.) unchanged
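`installSpatialExtension()` is reached on every block-mode entry, so memoizing it keeps INSTALL/LOAD to one round per session. A sketch, assuming a `conn` with an async `query` method. `makeSpatialInstaller` is hypothetical. A failed attempt is cleared so the next entry can retry.

```javascript
// Returns an idempotent installSpatialExtension() bound to `conn`.
function makeSpatialInstaller(conn) {
  let ready = null; // shared promise for the first successful install
  return function installSpatialExtension() {
    if (!ready) {
      ready = (async () => {
        await conn.query('INSTALL spatial;');
        await conn.query('LOAD spatial;');
      })().catch((err) => {
        ready = null; // allow a retry on the next call
        throw err;
      });
    }
    return ready;
  };
}
```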
Modified: src/map.js
- Add addUploadedZoneLayer(geoJSON) / removeUploadedZoneLayer() — dynamic GeoJSON source for custom display zones
- Arc layer: accept computed [lon, lat] origin and per-zone destination centroids in block mode
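A sketch of the two layer functions. It uses standard MapLibre/Mapbox GL map methods: `getSource`, `addSource`, `addLayer`, `getLayer`, `removeLayer`, `removeSource`, and the GeoJSON source's `setData`. The source and layer ids, the paint values, and passing `map` as a parameter are assumptions.

```javascript
// Hypothetical ids for the custom display zone source and its layers.
const ZONE_SOURCE = 'uploaded-zones';
const ZONE_FILL = 'uploaded-zones-fill';
const ZONE_LINE = 'uploaded-zones-line';

function addUploadedZoneLayer(map, geoJSON) {
  const existing = map.getSource(ZONE_SOURCE);
  if (existing) {
    existing.setData(geoJSON); // re-upload: swap data, keep layers
    return;
  }
  map.addSource(ZONE_SOURCE, { type: 'geojson', data: geoJSON });
  map.addLayer({ id: ZONE_FILL, type: 'fill', source: ZONE_SOURCE, paint: { 'fill-color': '#888888', 'fill-opacity': 0.1 } });
  map.addLayer({ id: ZONE_LINE, type: 'line', source: ZONE_SOURCE, paint: { 'line-color': '#444444', 'line-width': 1 } });
}

function removeUploadedZoneLayer(map) {
  // Layers must be removed before the source they reference.
  for (const id of [ZONE_LINE, ZONE_FILL]) {
    if (map.getLayer(id)) map.removeLayer(id);
  }
  if (map.getSource(ZONE_SOURCE)) map.removeSource(ZONE_SOURCE);
}
```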
Modified: src/sidebar.js
- Add file upload control in the Map Zones section (alongside City/County toggle)
- Add drawn polygon status indicator ("Custom area — 847 blocks inside drawn polygon")
- Add custom zone status indicator ("Custom zones active — 47 zones loaded")
- Hide ACS chart panels when custom display zones are active
- Show supported format hint: "GeoPackage, Shapefile, GeoJSON, FlatGeobuf"
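The sidebar additions are derived from state, so they can be computed in one place. A sketch: `sidebarStatus` is hypothetical, and `blockCount` and `zoneCount` are assumed to come from a count over the subject-area and display-zone tables.

```javascript
const FORMAT_HINT = 'GeoPackage, Shapefile, GeoJSON, FlatGeobuf';

function sidebarStatus(state, counts = {}) {
  const { blockCount = 0, zoneCount = 0 } = counts;
  return {
    // Drawn polygon indicator, or null when no polygon is drawn.
    drawText: state.drawnPolygon ? `Custom area: ${blockCount} blocks inside drawn polygon` : null,
    // Custom zone indicator, or null when built-in zones are shown.
    zoneText: state.uploadedZones ? `Custom zones active: ${zoneCount} zones loaded` : null,
    // ACS panels rely on Census FIPS lookups, so hide them for custom zones.
    showAcsPanels: !state.uploadedZones,
    formatHint: FORMAT_HINT,
  };
}
```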
Modified: src/main.js
- State additions: state.drawnPolygon (GeoJSON feature or null), state.uploadedZones (File or null), state.blockMode (boolean)
- refreshVisualization() branches: state.blockMode → block_query.js pipeline; otherwise → existing db.js path
- Year scrubber: disable or prompt reload when state.blockMode is true
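A sketch of the `refreshVisualization()` branch under these assumptions:

- `pipelines.block` and `pipelines.fast` are hypothetical stand-ins for the block_query.js and db.js paths.
- `state.blockMode` is recomputed from the two user inputs on every refresh, so it cannot drift out of sync.
- The year selector is locked in block mode, the simpler of the two options listed under Block Mode UX.

```javascript
// Chooses the query path from state; `pipelines` holds hypothetical async functions.
async function refreshVisualization(state, pipelines) {
  // Block mode follows directly from the two user-defined inputs.
  state.blockMode = Boolean(state.drawnPolygon || state.uploadedZones);
  const flows = state.blockMode
    ? await pipelines.block(state) // cases 2-4: PIP + aggregation in DuckDB
    : await pipelines.fast(state); // case 1: existing city_flows.parquet path
  // Assumed policy: lock the year scrubber while block mode is active.
  return { flows, yearLocked: state.blockMode };
}
```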
Summary: What Changes, What Doesn't
| Component | Status |
|---|---|
| city_flows.parquet pipeline | Unchanged |
| city_meta.json, county_meta.json | Unchanged |
| city_boundaries.geojson, county_boundaries.geojson | Unchanged (reused as built-in zone source files) |
| db.js existing query functions | Unchanged |
| map.js existing city/county layers | Unchanged |
| sidebar.js city/county toggle | Unchanged |
| custom_places.py, fetch_acs.py | Unchanged |
| process_data.py | Add export_block_od() and export_block_centroids() only |
| src/draw.js | New — MapLibre GL Draw integration |
| src/block_query.js | New — block mode pipeline (spatial extension + unified query) |
| src/db.js | Add installSpatialExtension() and loadBlockOD() |
| src/map.js | Add uploaded zone layer management |
| src/sidebar.js | Add upload control and draw indicator |
| src/main.js | Add block mode state and branch in refreshVisualization() |
Effort Estimate (Contingent on POC Passing)
| Task | Days |
|---|---|
| Proof-of-concept validation (spatial extension + ST_Read + registerFileBuffer) | 0.5 |
| process_data.py: export block OD + centroids | 0.5 |
| src/draw.js: MapLibre GL Draw integration | 1.0 |
| src/block_query.js: block mode pipeline + unified SQL query | 2.0 |
| src/db.js: spatial extension init + block OD loading | 0.5 |
| src/map.js: uploaded zone layer + arc centroid updates | 1.0 |
| src/sidebar.js + src/main.js: state, UI controls, mode switching | 1.5 |
| Testing all 4 matrix cases end-to-end with real GIS files | 1.0 |
| Total | ~8 days |
The 0.5-day POC is the decision gate. No other work begins until it passes.