Context
The bespoke MaRESS harvester (added in #192, commit `add Mountain Wetlands Repository (MaRESS) harvester`) takes a roundabout path to DOIs because every record currently returned by the public API has `DOI=null` and an empty `url`.
We compensate by calling `build_openalex_fields(title, doi=None, author=)` and post-classifying the result into `provenance.openalex_match.status` ∈ {`verified`, `candidate`, `none`}. When verified, we extract the DOI from `openalex_ids` and persist it on the Work.
This works, but it produces a noticeable fraction of `candidate` and `none` matches, because title+author-only matching is fragile, especially for older records (1990s) and short titles.
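The DOI-extraction step of the current path might look roughly like the sketch below. `doi_from_openalex_match` is a hypothetical name, not taken from the codebase. The sketch assumes `openalex_ids` mirrors the OpenAlex Work `ids` object, whose `doi` entry is a resolver URL such as `https://doi.org/10.1234/abc`. It does not model how the status itself is classified.

```python
from typing import Mapping, Optional

# Status values named in this issue for provenance.openalex_match.status.
MATCH_STATUSES = frozenset({"verified", "candidate", "none"})


def doi_from_openalex_match(
    status: str, openalex_ids: Mapping[str, Optional[str]]
) -> Optional[str]:
    """Return a bare DOI to persist on the Work, or None.

    Hypothetical helper: assumes `openalex_ids` is shaped like the OpenAlex
    Work `ids` object, where "doi" is a URL like "https://doi.org/10.1234/abc".
    """
    if status not in MATCH_STATUSES:
        raise ValueError(f"unexpected match status: {status!r}")
    # Only verified matches are trusted enough to persist a DOI.
    if status != "verified":
        return None
    raw = openalex_ids.get("doi")
    if not raw:
        return None
    # The cleanup this issue proposes dropping later: strip the resolver prefix.
    return raw.split("doi.org/", 1)[-1]
```

For example, a `verified` match with `{"doi": "https://doi.org/10.1234/abc"}` yields `10.1234/abc`, while `candidate` and `none` matches persist nothing.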
The MaRESS maintainers have indicated that DOIs will likely be added to the API records soon. Once that lands, the harvester can be simplified considerably.
What to revisit when DOIs land
Use DOI as the primary OpenAlex match key — pass `build_openalex_fields(title, doi=<api_doi>)` and rely on the existing DOI-match strategy. Title+author becomes a fallback, not the primary signal.
Skip OpenAlex matching entirely when both DOI and authors come from the API: there is no extra metadata to recover, and the API call only adds rate-limit pressure.
Persist DOIs from the API directly, not via the OpenAlex `openalex_ids` round-trip. Drop the `raw.split('doi.org/', 1)[-1]` cleanup once we can trust the raw value.
Backfill existing harvested works whose `provenance.openalex_match.status` is `candidate` or `none`: re-run enrichment against the now-DOI-bearing records and upgrade matches where possible. The verbatim API record stored in `provenance.harvest.original_record` is the input — no re-fetch needed.
Drop the "et al." / null-firstName special-casing in `_mwr_first_author_surname` if DOI matching makes the surname signal redundant.
Document the simplified flow in MANAGE.md under "Mountain Wetlands Repository (MaRESS) — `mountain-wetlands` source type".
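Taken together, the DOI-first, skip, direct-persist, and backfill items above might reduce to a small decision step like the one below. This is a sketch under stated assumptions. `EnrichmentPlan`, `api_doi`, `plan_enrichment`, and `backfill_input` are hypothetical names. The API's DOI is assumed to arrive under the `DOI` key used in the verification query. The dict layout of `provenance` is inferred from the dotted paths in this issue (`openalex_match.status`, `harvest.original_record`).

```python
from enum import Enum
from typing import Any, Mapping, Optional

# Statuses the backfill step should revisit (from this issue).
BACKFILL_STATUSES = frozenset({"candidate", "none"})


class EnrichmentPlan(Enum):
    """Hypothetical labels for the three paths discussed in this issue."""

    SKIP_OPENALEX = "skip_openalex"  # DOI and authors both come from the API
    MATCH_BY_DOI = "match_by_doi"  # DOI is the primary OpenAlex match key
    MATCH_BY_TITLE_AUTHOR = "match_by_title_author"  # fallback: today's path


def api_doi(raw: Optional[str]) -> Optional[str]:
    """Use the API's DOI value directly, treating null or blank as missing.

    No `doi.org/` split; only surrounding whitespace is trimmed.
    """
    if raw is None:
        return None
    value = raw.strip()
    return value or None


def plan_enrichment(raw_doi: Optional[str], has_api_authors: bool) -> EnrichmentPlan:
    """Choose how (or whether) to call OpenAlex for one API record."""
    doi = api_doi(raw_doi)
    if doi is not None and has_api_authors:
        # Nothing left to recover, so skipping saves an OpenAlex request.
        return EnrichmentPlan.SKIP_OPENALEX
    if doi is not None:
        # e.g. build_openalex_fields(title, doi=doi)
        return EnrichmentPlan.MATCH_BY_DOI
    return EnrichmentPlan.MATCH_BY_TITLE_AUTHOR


def backfill_input(provenance: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the stored API record to re-enrich, or None if no backfill applies."""
    match = provenance.get("openalex_match") or {}
    if match.get("status") not in BACKFILL_STATUSES:
        return None
    harvest = provenance.get("harvest") or {}
    # The verbatim API record is the input, so no re-fetch is needed.
    return harvest.get("original_record")
```

In this sketch, `MATCH_BY_TITLE_AUTHOR` is the existing path, so records that still lack a DOI fall back to today's title+author matching.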
Out of scope
Replacing the harvester architecture itself: even with DOIs, the Zotero-shaped `study_sites` list differs enough from OAI-PMH/RSS/Crossref to warrant a bespoke harvester.
Authenticated API access (BibTeX export, Zotero sync endpoints): a separate question, for a separate issue if pursued.
How to verify the API is ready
Run `curl -s 'https://andes.mountain-wetlands-repository.info/api/v1/items/?limit=500&scope=all' | jq '[.data[] | select(.DOI != null)] | length'` and check that the result is a meaningful fraction of `.count`. If most records carry a DOI, this issue is actionable.
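The `curl`/`jq` check above can also be scripted, for example as a periodic job. The following is a minimal sketch using only the Python standard library. `doi_coverage`, `fetch_payload`, and `API_URL` are hypothetical names, and the response shape (a `data` list plus a `count` total) is taken from the `jq` expression. Like the `jq` query, it only sees the records returned in a single response. Assuming `limit=500` acts as a page size, records beyond the first 500 are not counted, so the fraction understates coverage whenever `.count` exceeds 500.

```python
import json
import urllib.request
from typing import Any, Mapping, Tuple

API_URL = (
    "https://andes.mountain-wetlands-repository.info"
    "/api/v1/items/?limit=500&scope=all"
)


def doi_coverage(payload: Mapping[str, Any]) -> Tuple[int, int, float]:
    """Mirror the jq check: count records in `data` whose DOI is not null."""
    records = payload.get("data") or []
    # A missing "DOI" key counts as null, matching jq's `.DOI != null`.
    with_doi = sum(1 for r in records if r.get("DOI") is not None)
    total = int(payload.get("count") or 0)
    fraction = with_doi / total if total else 0.0
    return with_doi, total, fraction


def fetch_payload(url: str = API_URL, timeout: float = 30.0) -> Mapping[str, Any]:
    """Perform the same GET request as the curl command and decode the JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `doi_coverage(fetch_payload())` returns `(records_with_doi, count, fraction)`.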