Skip to content

Update wording in Geospatial spec#560

Open
milastdbx wants to merge 13 commits intoapache:masterfrom
milastdbx:updateGeoSpec
Open

Update wording in Geospatial spec#560
milastdbx wants to merge 13 commits intoapache:masterfrom
milastdbx:updateGeoSpec

Conversation

@milastdbx
Copy link
Copy Markdown

@milastdbx milastdbx commented Mar 31, 2026

Rationale for this change

Current geospatial spec has wording that different parties interpret differently. This PR aims to resolve the ambiguity.

What changes are included in this PR?

In this PR spec is updated Geospatial types such that wording is more precise in what is allowed and what is suggested.

As per discussion on https://lists.apache.org/thread/r5x0do8f241bpf565rx8s5s3wc9ogp0f it seems that most of the writers already behave in a way thats descirbed in this PR, so spec should be crisp about that.

Do these changes have PoC implementations?

Yes, writers are already implementing this

@wgtmac
Copy link
Copy Markdown
Member

wgtmac commented Apr 1, 2026

cc geo experts @paleolimbot @jiayuasu

@jiayuasu
Copy link
Copy Markdown
Member

jiayuasu commented Apr 1, 2026

The overall idea looks ok to me. But as we mentioned in the Parquet mailing list, this sounds like a fairly significant change. Should we inform the geo community? CC @cholmes

@milastdbx milastdbx marked this pull request as ready for review April 1, 2026 12:22
@milastdbx
Copy link
Copy Markdown
Author

Thanks @jiayuasu ,

Yes can you please add relevant members ?

Also, I have updated examples, please take another look

@milastdbx milastdbx changed the title Updated geospatial doc Update wording in Geospatial spec Apr 1, 2026
Co-authored-by: Milan Stefanovic <150366084+milastdbx@users.noreply.github.com>
Copy link
Copy Markdown
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @milastdbx ! Agree that the standard should not be too presecriptive to what all engine can put there , but still recommend the format

Copy link
Copy Markdown

@uros-db uros-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment, otherwise LGTM

Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking this on! I think this is a much improved section that reflects the values that are likely to be written and that implementors can reference when building readers. Other than the typos this looks good to go from my end!

I can add an authority:code example file to parquet-examples as well (there is already an inline PROJJSON one there).

@jiayuasu
Copy link
Copy Markdown
Member

jiayuasu commented Apr 2, 2026

CC a few geo folks: @cholmes @jorisvandenbossche @kylebarron

Co-authored-by: Uros Bojanic <uros.bojanic@databricks.com>
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Copy link
Copy Markdown

@mkaravel mkaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One minor suggestion to add one more example for EPSG.

Co-authored-by: Menelaos Karavelas <mkaravel@users.noreply.github.com>
Copy link
Copy Markdown
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully on board with the clarification! This is a big improvement of the document on that topic. Just a few smaller textual comments.

Geospatial.md Outdated

Custom CRS can be specified by a string value. It is recommended to use an
identifier-based approach like [Spatial reference identifier][srid].
Non-default CRS values are specified by any case insensitive string that uniquely identifies a coordinate reference system associated with this type.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to include the "case insensitive" part here?
AFAIK the projjson keys are certainly case sensitive

Or maybe only mention this for the relevant options, as the "authority" in authority:code can indeed be case insensitive?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good point - ill move it to authority:code section

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually ill remove case-insensitivity altogether, it does not hurt writers thinking that this is case insensitive, its up to readers to be more flexible, and they will do sensible thing

Copy link
Copy Markdown

@mkaravel mkaravel Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding case-insensitivity, I agree it does not make sense for PROJJSON. However, I think it would be helpful to have it for the other three cases. It specifically removes the ambiguity as to whether SRID:<identifier> or srid:<identifier> are both valid or not, and same for the <authority>:<code> and projjson:<table_property_name>.

Geospatial.md Outdated
* `authority:code` - where `authority` represents some well known authorities, and `code` is the code used by the authority to identify the CRS - examples are `OGC:CRS84`, `OGC:CRS83`, `OGC:CRS27`, `EPSG:4326`, `EPSG:3857`. See [https://spatialreference.org/](https://spatialreference.org/) for definitions of coordinate reference systems provided by some well known authorities.
* `srid:identifier` - A reference using a [Spatial reference identifier (SRID)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), where identifier is the numeric SRID value. For example: `SRID:0`.
* `projjson:table_property_name` - where `table_property_name` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key or table property.
* `projjson` - A complete CRS definition embedded directly using the [PROJJSON](https://proj.org/en/stable/specifications/projjson.html) specification. Example for `OGC:CRS83`:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be clear from the fact that the custom CRS value should be a string, but the example below showing the JSON object might make it confusing again. So wondering if there is some formulation to make it clearer this is the string representation of the JSON to be used here.

(I know I had made that mistake in the past for the GeoParquet spec, although there this crs value is included inside a json metadata key:value store as well, so there was this potential ambiguity. That might not actually exist here, since it is a string keyword of the type)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just before going into suggested formats we say that CRS is any string value, so hopefully that makes it unambigous. WDYT ?

What I think might be better is to do somethign like * <projjson> instead of projjson making it clear that this is not a literal, but a placeholder for what is about to be described.

However i do not see anybody else in spec doing it this way

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I think might be better is to do somethign like * instead of projjson making it clear that this is not a literal, but a placeholder for what is about to be described.

That might actually also make it clearer if used for the other entries as well. For example for projjson:table_property_name, to make it clearer that "projjson" is literal and "table_property_name" is a placeholder. So I would be in favor of using it for all entries then

Copy link
Copy Markdown

@mkaravel mkaravel Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically?

  • <projjson>
  • <authority>:<code>
  • srid:<identifier>
  • projjson:<table_property_name>

I also think this is cleaner as to what is expected verbatim and what is a placeholder.

Geospatial.md Outdated
Non-default CRS values are specified by any string that uniquely identifies a coordinate reference system associated with this type.
To maximize interoperability, suggested (but not limited to) formats for CRS are:
* `authority:code` - where `authority` represents some well known authorities, and `code` is the code used by the authority to identify the CRS. Examples are - `OGC:CRS84`, `OGC:CRS83`, `OGC:CRS27`, `EPSG:4326`, `EPSG:3857`. See [https://spatialreference.org/](https://spatialreference.org/) for definitions of coordinate reference systems provided by some well known authorities.
* `srid:identifier` - A reference using a [Spatial reference identifier (SRID)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), where identifier is the numeric SRID value. For example: `SRID:0`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without an authority, what does an SRID mean? Does the existence of an SRID:X reference imply the existence of a spatial_ref_sys table to dereference it into something actionable?

Copy link
Copy Markdown

@mkaravel mkaravel Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good question. From my perspective SRID:X means that it is not standard so the engine needs to figure out how to deal with it. Do we have to specify the requirements on the engine side here? My thinking/preference is that we should not do that.

Practically speaking the main reason for this entry is SRID:0.

Geospatial.md Outdated
* `<projjson>` - A complete CRS definition embedded directly using the [PROJJSON](https://proj.org/en/stable/specifications/projjson.html) specification. Example for `OGC:CRS83`: `{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json","type": "GeographicCRS","name": "NAD83 (CRS83)","datum": {"type": "GeodeticReferenceFrame"...`
* `<authority>:<code>` - where `<authority>` represents some well known authorities, and `code` is the code used by the authority to identify the CRS. Examples are - `OGC:CRS84`, `OGC:CRS83`, `OGC:CRS27`, `EPSG:4326`, `EPSG:3857`, `IGNF:ATI`. See [https://spatialreference.org/](https://spatialreference.org/) for definitions of coordinate reference systems provided by some well known authorities.
* `srid:<identifier>` - A reference using a [Spatial reference identifier (SRID)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), where <identifier> is the numeric SRID value. For example: `SRID:0`.
* `projjson:<table_property_name>` - where `<table_property_name>` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key or table property.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `projjson:<table_property_name>` - where `<table_property_name>` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key or table property.
* `projjson:<table_property_name>` - where `<table_property_name>` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key.

For Parquet, it is just the file-level metadata (I don't think the spec talks about "table properties")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworded it, LMK what you think

Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all for iterating on this! I think the current text reflects the current reality and is far less confusing than the original text.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants