Consider using UTF-8 when encoding is unspecified

For example, attempting to ingest:

https://unhcollection.unh.edu/database/content/dwca/UNHC-UNHC_DwC-A.zip

The published Darwin Core Archive includes a meta.xml which has a blank encoding value:

`encoding=""`

The rest of that line looks like:

```
<core dateFormat="YYYY-MM-DD" encoding="" fieldsTerminatedBy="," linesTerminatedBy="\n" fieldsEnclosedBy="&quot;" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
```

The encoding value tells consumers of the occurrence file how to decode it correctly.
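To make the blank case concrete, here is a minimal sketch of how a consumer might read that attribute and treat an empty string the same as a missing one. The function name `read_core_encoding` and the `SAMPLE` document are hypothetical, not taken from the ingestion code; the sample is a trimmed-down stand-in for the real meta.xml.

```python
import xml.etree.ElementTree as ET


def read_core_encoding(meta_xml_text):
    """Return the core element's encoding attribute, or None if absent or blank."""
    root = ET.fromstring(meta_xml_text)
    for elem in root.iter():
        # Compare local names so this works with or without an XML namespace.
        if elem.tag.rsplit("}", 1)[-1] == "core":
            value = (elem.get("encoding") or "").strip()
            return value or None
    return None


# Hypothetical, trimmed-down meta.xml for illustration only.
SAMPLE = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="" fieldsTerminatedBy="," ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence"/>
</archive>"""

declared = read_core_encoding(SAMPLE)  # blank attribute is reported as None
```

Returning `None` for both the blank and the missing case leaves the caller with a single "unspecified" condition to decide on.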

The data provider has been unable to resolve the situation in over a year.

https://redmine.idigbio.org/issues/3002

Consider whether it is worth applying UTF-8 encoding in this situation so the data can be ingested, or whether it still makes sense to hard fail since there is a chance of "bad things" if the encoding turns out to be mismatched.
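
One possible middle ground, sketched below under assumptions: fall back to UTF-8 only when the encoding is unspecified, decode strictly so that bytes that are not valid UTF-8 still raise an error instead of being silently mangled, and report that the encoding was assumed so it can be logged. The name `decode_with_fallback` and its return shape are hypothetical. Strict decoding catches many mismatches, but not all of them, so this reduces the risk rather than removing it.

```python
def decode_with_fallback(raw, declared=None, fallback="utf-8"):
    """Decode bytes with the declared encoding, or strictly with a fallback.

    Returns (text, encoding_used, assumed). Raises UnicodeDecodeError when the
    bytes are not valid in the chosen encoding, so a mismatch still fails.
    """
    name = (declared or "").strip()
    assumed = not name  # True when the archive did not specify an encoding
    encoding = name or fallback
    text = raw.decode(encoding, errors="strict")
    return text, encoding, assumed


# Usage: a blank declared encoding falls back to UTF-8 and is flagged as assumed.
text, used, assumed = decode_with_fallback("café".encode("utf-8"), declared="")
```

With this shape the ingest can proceed for well-formed UTF-8 files while still hard failing on files whose bytes contradict the assumption.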


Consider using UTF-8 when encoding is unspecified #200
