Description
For example, attempting to ingest:
https://unhcollection.unh.edu/database/content/dwca/UNHC-UNHC_DwC-A.zip
The published Darwin Core Archive includes a meta.xml which has a blank encoding value:
encoding=""
The rest of that line looks like:
<core dateFormat="YYYY-MM-DD" encoding="" fieldsTerminatedBy="," linesTerminatedBy="\n" fieldsEnclosedBy="&quot;" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
The encoding value tells consumers of the occurrence file how to decode the file properly.
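As a rough illustration of how a consumer might detect this condition, the sketch below reads the encoding attribute from the core element of a meta.xml document and treats a blank value the same as a missing one. The function name `core_encoding` is hypothetical and is not taken from the ingestion code; the sketch assumes the whole meta.xml has already been read into a string.

```python
import xml.etree.ElementTree as ET


def core_encoding(meta_xml_text):
    """Return the core element's encoding attribute, or None if missing or blank.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(meta_xml_text)
    for elem in root.iter():
        # Strip any XML namespace prefix, e.g. "{namespace}core" -> "core".
        if elem.tag.rsplit("}", 1)[-1] == "core":
            value = (elem.get("encoding") or "").strip()
            return value or None
    # No core element found at all.
    return None
```

A return value of None covers both a blank attribute, as in the archive above, and an absent one, so the caller can decide in one place whether to fail or fall back.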
The data provider has been unable to resolve the situation in over a year.
https://redmine.idigbio.org/issues/3002
Consider whether it is worth assuming UTF-8 when the encoding is blank so the data can be ingested, or whether it still makes sense to hard fail, since there is a chance of "bad things" (for example, garbled characters in the ingested records) if the encoding turns out to be mismatched.
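One possible middle ground, sketched below under stated assumptions, is to fall back to UTF-8 only when the declared encoding is blank, and to decode strictly so that bytes which are not valid UTF-8 still cause a hard failure. The function name `decode_occurrence_bytes` and its return shape are hypothetical, not part of the existing ingestion code.

```python
def decode_occurrence_bytes(raw, declared_encoding=None, fallback="utf-8"):
    """Decode occurrence file bytes, assuming `fallback` if no encoding is declared.

    Hypothetical helper for illustration only. Returns a tuple of
    (decoded text, encoding used, whether the encoding was assumed).
    """
    declared = (declared_encoding or "").strip()
    encoding = declared or fallback
    # errors="strict" raises UnicodeDecodeError on invalid byte sequences
    # instead of silently replacing or dropping them.
    text = raw.decode(encoding, errors="strict")
    return text, encoding, not declared
```

Strict decoding catches many mismatches, for example Latin-1 bytes with accented characters, but not all of them: some byte sequences from other encodings are also valid UTF-8, so the "assumed" flag is returned to let the caller log or mark such records.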