
Add Overture data source support #541

Open
migurski wants to merge 59 commits into protomaps:main from migurski:migurski/add-overture-basemap-source

Conversation

@migurski
Collaborator

@migurski migurski commented Jan 3, 2026

Add basemap support for Overture Maps input data as an alternative to OSM extracts. Port kind=/kind_detail=/min_zoom= mappings with no changes to MapLibre styles.

Layers

All Protomaps layers except for Boundaries and Transit have some coverage in this PR.

POIs

  • Overture theme=places basic_category mapped to Protomaps kind, with limited exceptions
  • No OSM-style polygon area grading is available for POI importance
  • Future work:
    • Can we use QRank to find high-priority POIs?
    • Try filtering by Overture confidence scores to eliminate bad POIs
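
The confidence-filtering idea could be prototyped roughly as follows. This is a minimal sketch, not code from this PR: the `Poi` record, the `filterByConfidence` helper, and the 0.7 threshold are hypothetical, and it assumes only that each Overture place carries a `confidence` value between 0 and 1.

```java
import java.util.List;

public class ConfidenceFilter {
    /** Hypothetical stand-in for an Overture place; not a class from this PR. */
    public record Poi(String name, double confidence) {}

    /** Keeps only POIs whose confidence is at least the given threshold. */
    public static List<Poi> filterByConfidence(List<Poi> pois, double minConfidence) {
        return pois.stream()
            .filter(p -> p.confidence() >= minConfidence)
            .toList();
    }

    public static void main(String[] args) {
        List<Poi> pois = List.of(
            new Poi("Example Cafe", 0.95),
            new Poi("Doubtful Airport", 0.63));
        // The 0.7 threshold is an arbitrary illustration, not a recommended value.
        System.out.println(filterByConfidence(pois, 0.7));
    }
}
```

A suitable threshold would have to be found by experiment, since a cutoff that is too high also removes legitimate POIs.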

Roads

  • Rendering includes theme=transportation type=segment subtype=road, subtype=rail, and subtype=water
  • Overture class and subclass mapped to Protomaps kind, kind_detail, and internal highway
  • Uses linear referencing from Overture array properties like road_flags to split linestrings on bridge, tunnel, and level flags to match rendering of OSM basemap
  • No features for airport runways are available
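
The linear-referencing step can be illustrated with a small sketch. It assumes each flag applies between two fractions of the line length, gathers every fraction as a split point, and then checks which flags cover each resulting piece. The names `FlagRange`, `collectSplitPoints`, and `flagsFor` are hypothetical here and this is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class LinearRanges {
    /** Hypothetical flag range: the flag applies from fraction start to fraction end. */
    public record FlagRange(String flag, double start, double end) {}

    /** Collects sorted, de-duplicated split positions, always including 0.0 and 1.0. */
    public static List<Double> collectSplitPoints(List<FlagRange> ranges) {
        TreeSet<Double> points = new TreeSet<>(List.of(0.0, 1.0));
        for (FlagRange r : ranges) {
            points.add(r.start());
            points.add(r.end());
        }
        return new ArrayList<>(points);
    }

    /** Returns the flags whose range fully covers the piece [segStart, segEnd]. */
    public static Set<String> flagsFor(List<FlagRange> ranges, double segStart, double segEnd) {
        Set<String> flags = new TreeSet<>();
        for (FlagRange r : ranges) {
            if (r.start() <= segStart && r.end() >= segEnd) {
                flags.add(r.flag());
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<FlagRange> ranges = List.of(
            new FlagRange("is_bridge", 0.25, 0.75),
            new FlagRange("is_tunnel", 0.75, 1.0));
        List<Double> pts = collectSplitPoints(ranges);
        // Each consecutive pair of split points becomes one output piece.
        for (int i = 0; i + 1 < pts.size(); i++) {
            System.out.println(pts.get(i) + " to " + pts.get(i + 1) + ": "
                + flagsFor(ranges, pts.get(i), pts.get(i + 1)));
        }
    }
}
```

Because every range boundary is itself a split point, each piece is either fully inside or fully outside any given range, so the coverage test never has to handle partial overlap.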

Places

  • Rendering includes high-zoom places like cities and neighborhoods from theme=places
  • Future work:
    • Include low-zoom places like states and countries

Buildings

  • Rendering includes both theme=buildings type=building and type=building_part to match Protomaps visual style

Landuse

  • Rendering includes theme=base type=land_use to match Protomaps visual style

Earth, Water, and Land Cover

  • Rendering uses theme=base type=land, type=water, type=land_cover to match Protomaps visual style
  • Future work:
    • Reduce excessive labels displayed for water bodies
    • Fix visual appearance of grainy land cover at low zooms

Testing

Extract Overture data with e.g. DuckDB:

COPY (
    SELECT *
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2025-12-17.0/**/*.parquet',
        hive_partitioning=1, filename=1, union_by_name=1
    )
    WHERE theme IN ('transportation', 'places', 'base', 'buildings', 'divisions')
      AND bbox.xmin <= -121
      AND bbox.xmax >= -123
      AND bbox.ymin <= 38
      AND bbox.ymax >= 37
) TO 'data/sources/bay-area.parquet' (FORMAT PARQUET);
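
The four bbox comparisons in the query above form a standard rectangle-overlap test against a window spanning x from -123 to -121 and y from 37 to 38. The same predicate, sketched in Java with a hypothetical `BBox` record and `intersects` helper (neither is part of this PR):

```java
public class BBoxOverlap {
    /** Hypothetical bounding box in degrees, mirroring the xmin/xmax/ymin/ymax fields in the query. */
    public record BBox(double xmin, double ymin, double xmax, double ymax) {}

    /** True when the two boxes overlap or touch, which is what the WHERE clause expresses. */
    public static boolean intersects(BBox feature, BBox window) {
        return feature.xmin() <= window.xmax()
            && feature.xmax() >= window.xmin()
            && feature.ymin() <= window.ymax()
            && feature.ymax() >= window.ymin();
    }

    public static void main(String[] args) {
        BBox bayArea = new BBox(-123, 37, -121, 38);
        BBox oakland = new BBox(-122.3, 37.7, -122.2, 37.9); // approximate, for illustration
        BBox zurich = new BBox(8.4, 47.3, 8.6, 47.4);        // approximate, for illustration
        System.out.println(intersects(oakland, bayArea));
        System.out.println(intersects(zurich, bayArea));
    }
}
```

Note that this keeps any feature whose bbox touches the window, so features extending beyond the window are included whole rather than clipped.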

Generate PMTiles from Overture data:

java -jar target/protomaps-basemap-HEAD-with-deps.jar \
    --overture=data/sources/bay-area.parquet --download --force

Run the app/ map frontend or one of the HTML examples to preview.

Screenshots

Taken from interactive preview at mike.teczno.com; compare with OSM data at maps.protomaps.com.

[Six screenshots of the interactive preview, 2026-01-03]

Flavor Compatibility

The changes in this PR map Overture properties to existing Protomaps conventions with no changes to styles, so all five of the included flavors are compatible: black, grayscale, white, light, and dark.

[Five flavor screenshots, 2026-01-03]

@migurski migurski force-pushed the migurski/add-overture-basemap-source branch from a905299 to 03a5d92 on January 5, 2026 19:09
@danabauer

This is awesome, @migurski. /cc @jonahadkins

migurski and others added 28 commits January 12, 2026 08:52
…and line splitting

- Updated 6 existing tests to include access_restrictions and road_flags data
- Added 6 new splitting tests for partial bridge/tunnel/oneway/level application
- Tests use simple geometries (0,0)-(1,0) for easy verification
- All 12 tests failing as expected (property extraction and splitting not yet implemented)
- References real Overture feature IDs and OSM way IDs in comments
…way/level properties

Major Changes:
- Created com.protomaps.basemap.geometry.Linear utility class for line splitting operations
- Rewrote Roads.processOverture() to handle fractional 'between' ranges from Overture data
- Implemented collectSplitPoints() to gather all split positions from road_flags, access_restrictions, and level_rules
- Implemented extractSegmentProperties() to determine which properties apply to each split segment
- Added emitRoadFeature() to create features with custom split geometries

Results:
- 15/21 tests now passing (6 original property extraction tests now pass)
- 6 splitting tests create correct features with correct attributes and geometries
- Only remaining issue: cosmetic Norm{} wrapper in test assertions (geometries are actually correct)

Implementation handles:
- Partial bridges via road_flags with 'is_bridge' flag
- Partial tunnels via road_flags with 'is_tunnel' flag
- Partial oneway restrictions via access_restrictions with heading='backward'
- Partial level changes via level_rules
- Overlapping property ranges (e.g., bridge + oneway on same segment)
- Multiple split points creating 2-5 output features per input feature
…urved roads

- Add comprehensive unit tests for line splitting with curves
- Rewrite Linear.splitAtFractions() to preserve all vertices between split points
- Add coordinate transformation from lat/lon to world coordinates before emitting
- All 9 new Linear tests pass, roads render correctly with curves preserved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
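
Splitting a curved line at fractions while keeping its interior vertices can be sketched as below. This is an illustration under stated assumptions, not the PR's `Linear.splitAtFractions()`: the `LineSplit` class and `sliceByFraction` method are hypothetical, coordinates are treated as planar `[x, y]` pairs, and the fractions are assumed to satisfy 0 <= start < end <= 1.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplit {
    /** Euclidean distance between two [x, y] points. */
    static double dist(double[] a, double[] b) {
        return Math.hypot(b[0] - a[0], b[1] - a[1]);
    }

    /** Linear interpolation between a and b at parameter t in [0, 1]. */
    static double[] lerp(double[] a, double[] b, double t) {
        return new double[] {a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t};
    }

    /** Returns the part of a polyline between two length fractions, keeping interior vertices. */
    public static List<double[]> sliceByFraction(List<double[]> line, double startFrac, double endFrac) {
        double total = 0;
        for (int i = 0; i + 1 < line.size(); i++) {
            total += dist(line.get(i), line.get(i + 1));
        }
        double startLen = startFrac * total;
        double endLen = endFrac * total;

        List<double[]> out = new ArrayList<>();
        double walked = 0;
        for (int i = 0; i + 1 < line.size(); i++) {
            double[] a = line.get(i);
            double[] b = line.get(i + 1);
            double segLen = dist(a, b);
            double segStart = walked;
            double segEnd = walked + segLen;
            walked = segEnd;
            if (segLen == 0 || segEnd < startLen || segStart > endLen) {
                continue; // segment lies entirely outside the requested range
            }
            // Clamp the requested range to this segment and interpolate its endpoints.
            double t0 = Math.max(0, (startLen - segStart) / segLen);
            double t1 = Math.min(1, (endLen - segStart) / segLen);
            if (out.isEmpty()) {
                out.add(lerp(a, b, t0));
            }
            double[] end = lerp(a, b, t1);
            if (dist(out.get(out.size() - 1), end) > 0) {
                out.add(end); // skip points that duplicate the previous one
            }
        }
        return out;
    }
}
```

For an L-shaped line through (0,0), (1,0), and (1,1), slicing from 0.25 to 0.75 keeps the corner vertex at (1,0) between the two interpolated endpoints instead of cutting straight across.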
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@migurski migurski changed the title from "[WIP] Add Overture data source support" to "Add Overture data source support" on Jan 13, 2026
@migurski migurski force-pushed the migurski/add-overture-basemap-source branch from 5a435aa to a459963 on January 14, 2026 02:42

@bdon
Member

bdon commented Jan 15, 2026

COPY (
    SELECT *
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2025-12-17.0/**/*.parquet',
        hive_partitioning=1, filename=1, union_by_name=1
    )
    WHERE theme IN ('transportation', 'places', 'base', 'buildings', 'divisions')
      AND bbox.xmin <= -121
      AND bbox.xmax >= -123
      AND bbox.ymin <= 38
      AND bbox.ymax >= 37
) TO 'data/sources/bay-area.parquet' (FORMAT PARQUET);

Curious, how long does this take for you to download? From how long it takes it seems to require some enumeration over the overture .parquet files

@wipfli
Collaborator

wipfli commented Jan 15, 2026

I am just downloading the entire December release now with something like:

aws s3 cp --no-sign-request --recursive s3://overturemaps-us-west-2/release/2025-12-17.0/ .

With about 10 MiB/s it is a bit pedestrian but if the whole thing is only roughly 500 GB then the download should still complete within 14 hours. Sometimes those downloads also get faster over time. Let's see...
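
As a quick check of that estimate, assuming "500 GB" means 500 × 10^9 bytes and the rate holds steady at 10 MiB/s:

```latex
t = \frac{500 \times 10^{9}\ \text{B}}{10 \times 2^{20}\ \text{B/s}}
  \approx 4.77 \times 10^{4}\ \text{s}
  \approx 13.2\ \text{h}
```

That is consistent with the 14-hour upper bound, with little margin if the rate drops.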

@wipfli
Collaborator

wipfli commented Jan 15, 2026

Thanks for making the online demo @migurski! The first thing that I saw when opening it was the Emirate Kush Cargo Airport in the middle of Oakland. According to the overture explorer this POI has a confidence score of roughly 63 percent. To experiment with thresholding it might make sense to temporarily add the confidence score in the tiles and then we could add a slider to the frontend and change the maplibre style with a filter or so based on the slider value.

@migurski
Collaborator Author

I am just downloading the entire December release now with something like:

aws s3 cp --no-sign-request --recursive s3://overturemaps-us-west-2/release/2025-12-17.0/ .

With about 10 MiB/s it is a bit pedestrian but if the whole thing is only roughly 500 GB then the download should still complete within 14 hours. Sometimes those downloads also get faster over time. Let's see...

Try the DuckDB sample code in the description! The OMF Parquet files are already clustered spatially, and small areas can take just minutes to extract.

@migurski
Collaborator Author

COPY (
    SELECT *
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2025-12-17.0/**/*.parquet',
        hive_partitioning=1, filename=1, union_by_name=1
    )
    WHERE theme IN ('transportation', 'places', 'base', 'buildings', 'divisions')
      AND bbox.xmin <= -121
      AND bbox.xmax >= -123
      AND bbox.ymin <= 38
      AND bbox.ymax >= 37
) TO 'data/sources/bay-area.parquet' (FORMAT PARQUET);

Curious, how long does this take for you to download? From how long it takes it seems to require some enumeration over the overture .parquet files

For me, about a minute or two.

@migurski
Collaborator Author

Thanks for making the online demo @migurski! The first thing that I saw when opening it was the Emirate Kush Cargo Airport in the middle of Oakland. According to the overture explorer this POI has a confidence score of roughly 63 percent. To experiment with thresholding it might make sense to temporarily add the confidence score in the tiles and then we could add a slider to the frontend and change the maplibre style with a filter or so based on the slider value.

I agree, and incorporating QRank and confidence would be a good way to improve the POI quality since we don't have way areas or building heights like in OSM.

@wipfli
Collaborator

wipfli commented Jan 17, 2026

I was able to run a local build of Zürich:

[image: local build of Zürich]

@wipfli
Collaborator

wipfli commented Jan 17, 2026

I have a local copy now of the roughly 500 GB overture data.
Is it possible to run this on the full Overture data planet-wide?

@wipfli
Collaborator

wipfli commented Jan 17, 2026

@msbarry uses the following I think to read a parquet directory:

https://github.com/onthegomap/planetiler-examples/blob/8901f77284712fd8f8f35f3fface712962d6287c/Overture.java#L42-L46

@migurski
Collaborator Author

I have a local copy now of the roughly 500 GB overture data. Is it possible to run this on the full Overture data planet-wide?

I have not tried yet at this scale, but I’m not sure it would be meaningfully different from a planet-scale OSM import.

@wipfli
Collaborator

wipfli commented Feb 4, 2026

The ideal way to interact with Overture data from my perspective would be that you specify an area as defined by the Geofabrik polygons with --area=monaco for example and you say that you want Overture instead of OpenStreetMap by adding a flag like --use-overture or so.
If those arguments are present, our program should download the polygon for Monaco from Geofabrik and then get an extract from the Overture parquets. @msbarry can the parquet reader already ingest polygons for clipping? Or is it maybe possible with bboxes?

@wipfli
Collaborator

wipfli commented Feb 4, 2026

If no --area flag is provided then it should just make the whole planet.

@msbarry
Contributor

msbarry commented Feb 4, 2026

The overture feature filter only works with a bounding box, although if you specify --polygon=country.poly it will initialize the bounding box to what covers that polygon and only process tiles that overlap the polygon.
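
Deriving a covering bounding box from a polygon amounts to taking the min and max of its vertex coordinates. A minimal sketch of that idea, with a hypothetical `PolygonEnvelope.envelope` helper that is not Planetiler's actual code and assumes a single ring of `[x, y]` vertices:

```java
import java.util.List;

public class PolygonEnvelope {
    /** Computes [xmin, ymin, xmax, ymax] covering all vertices of one polygon ring. */
    public static double[] envelope(List<double[]> ring) {
        double xmin = Double.POSITIVE_INFINITY;
        double ymin = Double.POSITIVE_INFINITY;
        double xmax = Double.NEGATIVE_INFINITY;
        double ymax = Double.NEGATIVE_INFINITY;
        for (double[] p : ring) {
            xmin = Math.min(xmin, p[0]);
            ymin = Math.min(ymin, p[1]);
            xmax = Math.max(xmax, p[0]);
            ymax = Math.max(ymax, p[1]);
        }
        return new double[] {xmin, ymin, xmax, ymax};
    }
}
```

A box built this way is usually larger than the polygon itself, which is why a bbox-only filter still reads some features outside the requested area.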

Another possibility is to implement this feature on planetiler: onthegomap/planetiler#1455 which would let it just read the relevant parts of the parquet file over http while processing it so you don't need to store anything on disk. Overture just started publishing https://stac.overturemaps.org/ that would make this much simpler and faster.

@migurski
Collaborator Author

I’m reluctant to formalize the Geofabrik area naming conventions for non-OSM data, but maybe there’s a way to do that cleanly by referencing the content of tiles/src/main/resources/borders.json ?

I do like the idea of referencing live Overture data and avoiding local storage. Perhaps in a followup PR?

bdon added a commit that referenced this pull request Feb 13, 2026
@bdon
Member

bdon commented Feb 13, 2026

I would prefer for now that the Overture developer experience is analogous to the OSM one - a single input file that you can acquire however you like. The DuckDB script that creates a single .parquet from remote Overture data seems ideal.

Making that work with named areas would need a gazetteer. I think that would be an interesting external project, maybe using Overture boundaries themselves with a cloud-native storage format?

@bdon
Member

bdon commented Feb 13, 2026

It would be useful if planetiler.addParquetSource read the bounds of the Parquet file (is this standardized in GeoParquet?) and then passed it through to the bounds header of the tile archive output. This way "fit bounds" on maps.protomaps.com would zoom to the precise area.

@migurski
Collaborator Author

It would be useful if planetiler.addParquetSource read the bounds of the Parquet file (is this standardized in GeoParquet?) and then passed it through to the bounds header of the tile archive output. This way "fit bounds" on maps.protomaps.com would zoom to the precise area.

I’ll research what’s needed for this in the current PR.

migurski and others added 2 commits February 16, 2026 13:38
When using --overture with a Parquet file, the basemap now reads the
bounding box from the GeoParquet metadata and uses it to set the bounds
in the output PMTiles archive. This enables "fit bounds" functionality
on maps.protomaps.com to zoom to the precise area covered by the data.

The bounds are only applied when no --bounds argument is provided,
allowing users to still override with manual bounds if needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@migurski
Collaborator Author

It would be useful if planetiler.addParquetSource read the bounds of the Parquet file (is this standardized in GeoParquet?) and then passed it through to the bounds header of the tile archive output. This way "fit bounds" on maps.protomaps.com would zoom to the precise area.

I’ll research what’s needed for this in the current PR.

The adjustment for this was small, implemented in 160a21c. The input Parquet files don’t consistently have bounding boxes defined when they're generated by DuckDB, so in my tests the output bounds are accurately set to match the whole-earth bbox of the input data. A way around this might be to read the actual envelope from Parquet row groups?
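
The row-group idea could look roughly like this: take the minimum and maximum of the bbox columns recorded for each row group and union them into one envelope. This sketch deliberately uses no Parquet library calls. The `Stats` record is a hypothetical container for values that would come from Parquet column statistics, and whether those statistics are reliably present in DuckDB output is an open assumption.

```java
import java.util.List;

public class RowGroupBounds {
    /** Hypothetical per-row-group extremes of the bbox columns. */
    public record Stats(double minXmin, double minYmin, double maxXmax, double maxYmax) {}

    /** Unions row-group extents into one [west, south, east, north] envelope. */
    public static double[] union(List<Stats> groups) {
        if (groups.isEmpty()) {
            throw new IllegalArgumentException("no row groups to read bounds from");
        }
        double west = Double.POSITIVE_INFINITY;
        double south = Double.POSITIVE_INFINITY;
        double east = Double.NEGATIVE_INFINITY;
        double north = Double.NEGATIVE_INFINITY;
        for (Stats s : groups) {
            west = Math.min(west, s.minXmin());
            south = Math.min(south, s.minYmin());
            east = Math.max(east, s.maxXmax());
            north = Math.max(north, s.maxYmax());
        }
        return new double[] {west, south, east, north};
    }
}
```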

I resolved the merge conflicts.

Refactors bounds reading logic into a separate testable method and adds comprehensive test coverage including valid bounds extraction, missing file handling, and invalid file handling. Includes test GeoParquet file with Alcatraz Island buildings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>



5 participants