Conversation
|
cc geo experts @paleolimbot @jiayuasu |
|
The overall idea looks ok to me. But as we mentioned in the Parquet mailing list, this sounds like a fairly significant change. Should we inform the geo community? CC @cholmes |
|
Thanks @jiayuasu , Yes can you please add relevant members ? Also, I have updated examples, please take another look |
Co-authored-by: Milan Stefanovic <150366084+milastdbx@users.noreply.github.com>
szehon-ho
left a comment
There was a problem hiding this comment.
Thanks @milastdbx ! Agree that the standard should not be too presecriptive to what all engine can put there , but still recommend the format
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you for taking this on! I think this is a much improved section that reflects the values that are likely to be written and that implementors can reference when building readers. Other than the typos this looks good to go from my end!
I can add an authority:code example file to parquet-examples as well (there is already an inline PROJJSON one there).
|
CC a few geo folks: @cholmes @jorisvandenbossche @kylebarron |
Co-authored-by: Uros Bojanic <uros.bojanic@databricks.com> Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
mkaravel
left a comment
There was a problem hiding this comment.
LGTM. One minor suggestion to add one more example for EPSG.
Co-authored-by: Menelaos Karavelas <mkaravel@users.noreply.github.com>
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Fully on board with the clarification! This is a big improvement of the document on that topic. Just a few smaller textual comments.
Geospatial.md
Outdated
|
|
||
| Custom CRS can be specified by a string value. It is recommended to use an | ||
| identifier-based approach like [Spatial reference identifier][srid]. | ||
| Non-default CRS values are specified by any case insensitive string that uniquely identifies a coordinate reference system associated with this type. |
There was a problem hiding this comment.
Is there a reason to include the "case insensitive" part here?
AFAIK the projjson keys are certainly case sensitive
Or maybe only mention this for the relevant options, as the "authority" in authority:code can indeed be case insensitive?
There was a problem hiding this comment.
thats a good point - ill move it to authority:code section
There was a problem hiding this comment.
actually ill remove case-insensitivity altogether, it does not hurt writers thinking that this is case insensitive, its up to readers to be more flexible, and they will do sensible thing
There was a problem hiding this comment.
Regarding case-insensitivity, I agree it does not make sense for PROJJSON. However, I think it would be helpful to have it for the other three cases. It specifically removes the ambiguity as to whether SRID:<identifier> or srid:<identifier> are both valid or not, and same for the <authority>:<code> and projjson:<table_property_name>.
Geospatial.md
Outdated
| * `authority:code` - where `authority` represents some well known authorities, and `code` is the code used by the authority to identify the CRS - examples are `OGC:CRS84`, `OGC:CRS83`, `OGC:CRS27`, `EPSG:4326`, `EPSG:3857`. See [https://spatialreference.org/](https://spatialreference.org/) for definitions of coordinate reference systems provided by some well known authorities. | ||
| * `srid:identifier` - A reference using a [Spatial reference identifier (SRID)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), where identifier is the numeric SRID value. For example: `SRID:0`. | ||
| * `projjson:table_property_name` - where `table_property_name` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key or table property. | ||
| * `projjson` - A complete CRS definition embedded directly using the [PROJJSON](https://proj.org/en/stable/specifications/projjson.html) specification. Example for `OGC:CRS83`: |
There was a problem hiding this comment.
It might be clear from the fact that the custom CRS value should be a string, but the example below showing the JSON object might make it confusing again. So wondering if there is some formulation to make it clearer this is the string representation of the JSON to be used here.
(I know I had made that mistake in the past for the GeoParquet spec, although there this crs value is included inside a json metadata key:value store as well, so there was this potential ambiguity. That might not actually exist here, since it is a string keyword of the type)
There was a problem hiding this comment.
Just before going into suggested formats we say that CRS is any string value, so hopefully that makes it unambigous. WDYT ?
What I think might be better is to do somethign like * <projjson> instead of projjson making it clear that this is not a literal, but a placeholder for what is about to be described.
However i do not see anybody else in spec doing it this way
There was a problem hiding this comment.
What I think might be better is to do somethign like * instead of projjson making it clear that this is not a literal, but a placeholder for what is about to be described.
That might actually also make it clearer if used for the other entries as well. For example for projjson:table_property_name, to make it clearer that "projjson" is literal and "table_property_name" is a placeholder. So I would be in favor of using it for all entries then
There was a problem hiding this comment.
So basically?
<projjson><authority>:<code>srid:<identifier>projjson:<table_property_name>
I also think this is cleaner as to what is expected verbatim and what is a placeholder.
Geospatial.md
Outdated
| Non-default CRS values are specified by any string that uniquely identifies a coordinate reference system associated with this type. | ||
| To maximize interoperability, suggested (but not limited to) formats for CRS are: | ||
| * `authority:code` - where `authority` represents some well known authorities, and `code` is the code used by the authority to identify the CRS. Examples are - `OGC:CRS84`, `OGC:CRS83`, `OGC:CRS27`, `EPSG:4326`, `EPSG:3857`. See [https://spatialreference.org/](https://spatialreference.org/) for definitions of coordinate reference systems provided by some well known authorities. | ||
| * `srid:identifier` - A reference using a [Spatial reference identifier (SRID)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), where identifier is the numeric SRID value. For example: `SRID:0`. |
There was a problem hiding this comment.
Without an authority, what does an SRID mean? Does the existence of an SRID:X reference imply the existence of a spatial_ref_sys table to dereference it into something actionable?
There was a problem hiding this comment.
That's a very good question. From my perspective SRID:X means that it is not standard so the engine needs to figure out how to deal with it. Do we have to specify the requirements on the engine side here? My thinking/preference is that we should not do that.
Practically speaking the main reason for this entry is SRID:0.
Geospatial.md
Outdated
| * `<projjson>` - A complete CRS definition embedded directly using the [PROJJSON](https://proj.org/en/stable/specifications/projjson.html) specification. Example for `OGC:CRS83`: `{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json","type": "GeographicCRS","name": "NAD83 (CRS83)","datum": {"type": "GeodeticReferenceFrame"...` | ||
| * `<authority>:<code>` - where `<authority>` represents some well known authorities, and `code` is the code used by the authority to identify the CRS. Examples are - `OGC:CRS84`, `OGC:CRS83`, `OGC:CRS27`, `EPSG:4326`, `EPSG:3857`, `IGNF:ATI`. See [https://spatialreference.org/](https://spatialreference.org/) for definitions of coordinate reference systems provided by some well known authorities. | ||
| * `srid:<identifier>` - A reference using a [Spatial reference identifier (SRID)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), where <identifier> is the numeric SRID value. For example: `SRID:0`. | ||
| * `projjson:<table_property_name>` - where `<table_property_name>` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key or table property. |
There was a problem hiding this comment.
| * `projjson:<table_property_name>` - where `<table_property_name>` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key or table property. | |
| * `projjson:<table_property_name>` - where `<table_property_name>` is a reference to a CRS definition in [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), stored in a file level key/value metadata key. |
For Parquet, it is just the file-level metadata (I don't think the spec talks about "table properties")
There was a problem hiding this comment.
I reworded it, LMK what you think
paleolimbot
left a comment
There was a problem hiding this comment.
Thanks all for iterating on this! I think the current text reflects the current reality and is far less confusing than the original text.
Rationale for this change
Current geospatial spec has wording that different parties interpret differently. This PR aims to resolve the ambiguity.
What changes are included in this PR?
In this PR spec is updated Geospatial types such that wording is more precise in what is allowed and what is suggested.
As per discussion on https://lists.apache.org/thread/r5x0do8f241bpf565rx8s5s3wc9ogp0f it seems that most of the writers already behave in a way thats descirbed in this PR, so spec should be crisp about that.
Do these changes have PoC implementations?
Yes, writers are already implementing this