Adds Zenodo data archival functionality by nuest · Pull Request #249 · GeoinformationSystems/optimap

nuest · 2026-05-11T21:29:26Z

Enables automatic data package generation and deposition to Zenodo, ensuring long-term preservation and citability for OPTIMAP data.

This comprehensive feature introduces:

Data Package Generation

A render_zenodo command that builds essential artifacts: a dynamic README.md (generated from a Jinja2 template with live statistics and source information), an optimap-main.zip containing the project's source code snapshot, and a zenodo_dynamic.json file for flexible metadata updates.
This process also manages versioning via last_version.txt.

Zenodo Deposition Management

A deposit_zenodo command for updating existing Zenodo draft depositions. It intelligently merges metadata, protecting crucial identifiers like DOIs, and ensures a clean slate by deleting previous files before uploading new ones.
A combined zenodo_deposit command simplifies the workflow by executing both the rendering and deposition steps sequentially.
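
A minimal sketch of what a non-destructive metadata merge could look like. The function name `merge_metadata`, the `patch_fields` argument shape and the `PROTECTED_FIELDS` set are assumptions for illustration, not the code in this PR:

```python
from copy import deepcopy

# Hypothetical set of identifier fields that a merge must never overwrite.
PROTECTED_FIELDS = frozenset({"doi", "prereserve_doi", "conceptdoi"})


def merge_metadata(existing, incoming, patch_fields):
    """Return a copy of `existing` where only `patch_fields` come from `incoming`."""
    merged = deepcopy(existing)
    for field in patch_fields:
        if field in PROTECTED_FIELDS:
            continue  # identifiers already assigned on the draft stay untouched
        if field in incoming:
            merged[field] = deepcopy(incoming[field])
    return merged
```

Fields that are not listed in `patch_fields` keep whatever value the draft already has, which is what makes the merge safe to repeat.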

Logging, Monitoring, and Notifications

A new ZenodoDepositionLog model records every deposition attempt, tracking status, uploaded files, total size, duration, and any errors encountered.
This log is accessible and viewable in the Django admin, offering detailed insights into each archival event.
Staff users receive email notifications detailing the outcome of each deposition, including direct links to the Zenodo draft for review.
The /data public page now prominently displays information about the latest successful Zenodo deposition, with environment-aware display (sandbox in DEBUG, production otherwise).

Streamlined Administration

An admin action "Trigger Zenodo Deposition" is available for Works, allowing a full render and deposit cycle to be initiated directly from the admin interface.

Configuration

New settings (ZENODO_API_TOKEN, ZENODO_SANDBOX_DEPOSITION_ID, ZENODO_API_BASE) are introduced for flexible environment configuration, supported by a tests/.env.template.
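
As a rough illustration, the three settings could be read from the environment like this. The helper name `load_zenodo_settings`, the returned dict shape and the sandbox default for `ZENODO_API_BASE` are assumptions, not taken from optimap/settings.py:

```python
import os

# Assumed default: the Zenodo sandbox REST endpoint.
DEFAULT_API_BASE = "https://sandbox.zenodo.org/api"


def load_zenodo_settings(env=None):
    """Collect the Zenodo settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    token = env.get("ZENODO_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("ZENODO_API_TOKEN is not set")
    return {
        "api_token": token,
        # Strip a trailing slash so endpoint paths can be appended safely.
        "api_base": env.get("ZENODO_API_BASE", DEFAULT_API_BASE).rstrip("/"),
        # Empty or missing value means "no deposition ID configured".
        "deposition_id": env.get("ZENODO_SANDBOX_DEPOSITION_ID") or None,
    }
```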

Enhanced Testing

Includes dedicated unit tests for the rendering and deposition logic, alongside robust integration tests that run against the actual Zenodo sandbox API, ensuring reliable end-to-end functionality.

Relates to #63

Implements functionality to deposit OPTIMAP data to Zenodo by creating/updating draft records. This feature enables automated archival and versioning of research data for long-term preservation and citation.

Features:
- Two Django management commands:
  - `render_zenodo`: Generates metadata files and data archives
  - `deposit_zenodo`: Uploads files and merges metadata to Zenodo drafts
- Updates existing drafts only (requires deposition ID)
- Never publishes automatically - manual approval required in Zenodo UI
- Uploads: README.md, optimap-main.zip, latest GeoJSON and GeoPackage files
- Merges metadata non-destructively without overwriting stable fields
- Configurable via environment variables (ZENODO_API_TOKEN, etc.)
- Comprehensive test coverage for rendering and deposition

New files:
- works/management/commands/deposit_zenodo.py - Upload to Zenodo
- works/management/commands/render_zenodo.py - Generate metadata/archives
- works/templates/README.md.j2 - Jinja2 template for README
- data/README.md, data/last_version.txt, data/zenodo_dynamic.json
- tests/test_deposit_zenodo.py - Deposition tests
- tests/test_render_zenodo.py - Render tests

Modified files:
- .gitignore - Ignore Zenodo artifacts
- optimap/settings.py - Add Zenodo configuration
- requirements.txt - Add zenodo-client, markdown, jinja2 dependencies

This implementation is adapted from PR #214 to work with the refactored codebase (publications/ → works/ directory structure).

Closes ifgi#63

Co-authored-by: BharatVe <bharatveauli@live.com>
Co-authored-by: BharatVe <150399011+BharatVe@users.noreply.github.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Adds comprehensive integration test suite for Zenodo deposition functionality with support for testing against the actual Zenodo sandbox API.

Changes:
- Fixed model references in tests (Publication → Work, publications → works)
- Added tests/.env.template with configuration instructions
- Created test_zenodo_integration.py with tagged integration tests
- Tests can run against real Zenodo sandbox API with proper credentials
- Added .env file to .gitignore to protect secrets

Test categories:
- Unit tests: Mock-based tests (existing)
- Integration tests: Real API tests (new, tagged as 'integration')
- Full deposit tests: End-to-end upload tests (tagged as 'slow' and 'upload')

Usage:
    # Run only unit tests (no API calls):
    python manage.py test tests.test_deposit_zenodo tests.test_render_zenodo
    # Run integration tests (requires tests/.env):
    python manage.py test tests.test_zenodo_integration
    # Run specific test tags:
    python manage.py test --tag=integration
    python manage.py test --exclude-tag=slow

Setup:
1. Copy tests/.env.template to tests/.env
2. Add Zenodo sandbox API token from https://sandbox.zenodo.org
3. Create a draft deposition and add its ID to .env
4. Run: python manage.py test tests.test_zenodo_integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implements automated data archival to Zenodo for long-term preservation and citability.

- Introduces a new `zenodo` app with functions for rendering metadata, depositing data, and managing Zenodo records.
- Creates new management commands (`render_zenodo`, `deposit_zenodo`, and `zenodo_deposit`) for simplified workflow.
- Adds a new `ZenodoDepositionLog` model to track deposition history and status.
- Enhances the Django admin interface with actions to trigger depositions and view logs.
- Includes comprehensive documentation in `README.md` on setting up and using the Zenodo integration.

Refs ifgi#63.

- untrack data/README.md, data/zenodo_dynamic.json, data/last_version.txt (sandbox render output from local runs leaked into the branch); extend .gitignore to cover them plus CSV dump variants
- fix the README.md.j2 sources loop: it was unpacking dicts as (label, url) tuples, so every entry rendered as "[name](url)" with no newline between items; iterate over Source dicts properly
- switch tests/test_deposit_zenodo.py and tests/test_render_zenodo.py from unittest.TestCase to django.test.TestCase so the in-test ZenodoDepositionLog.save() and ORM-created Source rows hit a real test DB instead of crashing (deposit) or polluting the dev DB (render)
- refresh the 0009 migration header timestamp
- CHANGELOG entry under Unreleased describing the deposit groundwork

Refs ifgi#63 (item 5).

The render step now overwrites `related_identifiers` on every invocation with the three live download endpoints on optimap.science (geojson / geopackage / csv), derived from settings.BASE_URL + the URL config. Any stale identifiers from a previous render (e.g. localhost URLs left over from a dev run) are discarded, so a deposit can never publish links that only work on a developer's machine.

Each entry uses scheme=url, relation=isSupplementTo, resource_type=dataset. Source-level "describes" entries land in a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
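
A small sketch of how the download entries could be rebuilt from scratch on each render. The function name and the example paths are hypothetical; the real paths come from the project's URL config, which this page does not show:

```python
def download_related_identifiers(base_url, paths):
    """Build one related_identifiers entry per download endpoint."""
    base = base_url.rstrip("/")
    return [
        {
            "identifier": f"{base}/{path.lstrip('/')}",
            "scheme": "url",
            "relation": "isSupplementTo",
            "resource_type": "dataset",
        }
        for path in paths
    ]


# Hypothetical endpoint paths, for illustration only.
EXAMPLE_PATHS = ["/download/geojson/", "/download/geopackage/", "/download/csv/"]

metadata = {"related_identifiers": [{"identifier": "http://localhost:8000/old"}]}
# Assigning (not extending) discards stale entries from earlier renders.
metadata["related_identifiers"] = download_related_identifiers(
    "https://optimap.science/", EXAMPLE_PATHS
)
```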

Refs ifgi#63 (item 6 / 2025-07-14 comment).

Per harvested Source, the render step now adds a related_identifiers entry with relation=describes, resource_type=publication (wording straight from nuest's 2025-07-14 comment: "This record describes Journal X"). Scheme picked in order:

1. issn: Source.issn_l (linking ISSN)
2. url: Source.homepage_url canonicalised
3. url: Source.url_field canonicalised

Self-references to optimap.science are skipped (the portal isn't a journal it describes), and duplicates collapse on the resolved (scheme, identifier) pair so two Source rows pointing at the same journal collapse to one entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
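
The selection order and de-duplication described in this commit could be sketched as follows. Sources are modelled as plain dicts instead of Django model rows, and the canonicalisation rules (lower-case scheme and host, no trailing slash) are an assumption; the commit does not spell them out:

```python
from urllib.parse import urlsplit

PORTAL_HOST = "optimap.science"


def canonicalise(url):
    """Assumed canonical form: lower-case scheme/host, no trailing slash."""
    parts = urlsplit(url.strip())
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def is_portal(url):
    host = urlsplit(url).hostname or ""
    return host == PORTAL_HOST or host.endswith("." + PORTAL_HOST)


def describes_entries(sources):
    """One 'describes' entry per distinct (scheme, identifier) pair."""
    seen, entries = set(), []
    for src in sources:
        if src.get("issn_l"):
            key = ("issn", src["issn_l"])
        elif src.get("homepage_url"):
            key = ("url", canonicalise(src["homepage_url"]))
        elif src.get("url_field"):
            key = ("url", canonicalise(src["url_field"]))
        else:
            continue  # nothing usable to identify this source
        if key[0] == "url" and is_portal(key[1]):
            continue  # skip self-references to the portal
        if key in seen:
            continue  # two rows for the same journal collapse to one entry
        seen.add(key)
        entries.append({
            "identifier": key[1],
            "scheme": key[0],
            "relation": "describes",
            "resource_type": "publication",
        })
    return entries
```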

Refs ifgi#63 (item 4).

The deposit's file list now covers every output of regenerate_data_dumps: geojson, geojson.gz, gpkg, csv, and csv.gz. Previously only geojson(.gz) and gpkg shipped; CSV (issue #206) had been added on main but no one told Zenodo about it. The helper now also picks the newest cycle by timestamp when several co-exist in the same dir, so a deposit can't ship a stale .gpkg next to a fresh .geojson.

README.md and optimap-main.zip still come from data_dir (where render writes them); data dumps prefer data_dir first (tests / single-dir layouts) and fall back to /tmp/optimap_cache (the default cache dir for production regenerate runs). dump_dir is a parameter so other callers can override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
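
One way to pick a single dump cycle, sketched under assumptions: file names are taken to look like `optimap_data_dump_<digits>.<ext>` with a numerically sortable timestamp, and the helper names are hypothetical. The real naming scheme and helper are not shown on this page:

```python
import re
from pathlib import Path

# Assumed file name layout: optimap_data_dump_<digits>.<ext>
DUMP_RE = re.compile(
    r"^optimap_data_dump_(?P<stamp>\d+)\.(?P<ext>geojson\.gz|geojson|gpkg|csv\.gz|csv)$"
)


def latest_dump_files(directory):
    """Return all dump files that belong to the newest timestamp in `directory`."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    cycles = {}
    for path in directory.iterdir():
        match = DUMP_RE.match(path.name)
        if match:
            cycles.setdefault(match["stamp"], []).append(path)
    if not cycles:
        return []
    newest = max(cycles, key=int)
    return sorted(cycles[newest])


def find_dump_files(data_dir, dump_dir):
    """Prefer dumps in data_dir, fall back to dump_dir (e.g. a cache directory)."""
    return latest_dump_files(data_dir) or latest_dump_files(dump_dir)
```

Grouping by timestamp before choosing is what prevents files from two different cycles from being mixed in one upload.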

Refs ifgi#63 (last checklist item).

The render step previously swallowed every error from `git archive HEAD` and then wrote a 0-byte optimap-main.zip as a "fallback", so a missing git binary, a non-repo working directory, or a `CalledProcessError` would all produce an empty zip that the deposit then uploaded to Zenodo under a "success" status.

Now:
- FileNotFoundError (`git` not on PATH) → RuntimeError with a clear hint.
- CalledProcessError → RuntimeError including the exit code and stderr.
- subprocess.run exits 0 but the file is missing or 0 bytes → RuntimeError with the stderr (covers SIGPIPE / corrupt repo / empty tree cases).

The tests are adjusted to write a small non-empty stub zip in the patched subprocess.run, and gain two new cases for the failure paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
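
The three failure paths could look roughly like this. The function name `build_source_zip`, the exact `git archive` arguments and the injectable `run` parameter (added here so the sketch can be exercised without git) are assumptions, not the PR's code:

```python
import subprocess
from pathlib import Path


def build_source_zip(repo_dir, out_path, run=subprocess.run):
    """Write a zip of HEAD to out_path, or raise RuntimeError on any failure."""
    out = Path(out_path)
    cmd = ["git", "archive", "--format=zip", "-o", str(out), "HEAD"]
    try:
        result = run(cmd, cwd=repo_dir, check=True, capture_output=True, text=True)
    except FileNotFoundError as exc:
        raise RuntimeError(
            "git executable not found on PATH; install git to build the source zip"
        ) from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"git archive failed with exit code {exc.returncode}: {exc.stderr}"
        ) from exc
    # Exit code 0 is not enough: the archive must exist and be non-empty.
    if not out.is_file() or out.stat().st_size == 0:
        raise RuntimeError(f"git archive produced no usable zip: {result.stderr}")
    return out
```

Raising instead of writing a placeholder file means a broken render can no longer be recorded as a successful deposit.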

Refs ifgi#63 (comment 2025-07-14, comment 2025-07-21).

README codebook expands to cover every Work field that ends up in the data dumps, including the ones added since the original Zenodo branch landed: `type`, `authors`, `keywords`, `topics`, `bok_concepts`, `placename`, `country_code`, `volume`/`issue`/`first_page`/`last_page`, `openalex_*`. A short note up front states that the same field names appear verbatim as GeoJSON `Feature.properties`, CSV column headers and GeoPackage attribute columns, with CSV using `WKT` for geometry.

Default keywords now include `Open Research Information` alongside `ORI` so the record is findable under either label, per the issue comment.

A new `additional_descriptions[type=notes]` entry documents the CC0-1.0 / GPL-3.0 license split with the actual file scopes: README + optimap_data_dump_*.{geojson,geojson.gz,gpkg,csv,csv.gz} under CC0, optimap-main.zip under GPL-3.0. Default `patch_fields` in `deposit_to_zenodo` (and the deposit_zenodo command) is extended so the note actually gets pushed.

The render test now copies the real README.md.j2 from the source tree into the tmp project root instead of using a tiny stub, so codebook and prose assertions exercise the production template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Refs ifgi#63 (2025-08-21 issue comment, Q2 decision).

Renders now include a structured `grants` block with the two OPTIMAP grant IDs in OpenAIRE format:

- OPTIMETA: 10.13039/501100002347::16TOA028B (BMBF)
- KOMET: 10.13039/501100002347::16KOA009A (BMFTR)

NFDI4Earth is deliberately excluded per the August comment.

Zenodo's curated grants vocabulary doesn't cover every grant. When the metadata PUT returns 400 mentioning `grants`, the deposit now retries once with `grants` removed and prepends a free-text "Funding: …" paragraph to `metadata.notes`, so the funding info is still discoverable even if Zenodo can't resolve the IDs structurally. The fallback is recorded on ZenodoDepositionLog.notes for the admin email.

`grants` is added to the default `--patch` list on `deposit_zenodo`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
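
The retry-once behaviour could be sketched like this. The HTTP call is abstracted as a `put` callable returning `(status_code, body_text)`, and both the function name and the exact wording of the funding paragraph are assumptions for illustration:

```python
def put_metadata_with_grants_fallback(put, metadata):
    """Try the PUT as-is; on a 400 that mentions grants, retry once without them.

    Returns (status_code, metadata_actually_sent, used_fallback).
    """
    status, body = put(metadata)
    if status == 400 and "grants" in body and "grants" in metadata:
        reduced = {k: v for k, v in metadata.items() if k != "grants"}
        # Assumed wording: list the grant IDs as free text.
        funding = "Funding: " + "; ".join(g["id"] for g in metadata["grants"])
        notes = reduced.get("notes", "")
        reduced["notes"] = f"{funding}\n\n{notes}" if notes else funding
        status, body = put(reduced)
        return status, reduced, True
    return status, metadata, False
```

The returned flag gives the caller something to record in the deposition log when the fallback was used.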

…file

Refs ifgi#63.

The version counter (v1, v2, v3, …) is now read from the latest successful ZenodoDepositionLog row for the current api_base instead of data/last_version.txt. The file had three problems:

- it lived in the project tree but was never committed, so a fresh checkout silently restarted at v1
- sandbox and production runs shared the same counter, so a stream of sandbox renders would jump production's next version into double digits
- a failed deposit still bumped the file, burning a version number that never reached Zenodo

The new logic filters ZenodoDepositionLog by (status='success', api_base=…), takes the latest `version`, and emits N+1. Sandbox and production increment independently. Failed deposits don't advance the counter.

render_zenodo_package gains an optional api_base argument with the same env/settings cascade as deposit_to_zenodo. deposit_to_zenodo now reads log_entry.version from the rendered zenodo_dynamic.json instead of the tracking file.

The model and migration help_text are updated to match; .gitignore drops the now-obsolete data/last_version.txt entry; the integration tests stop seeding the file too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
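
A sketch of the N+1 rule using plain dicts in place of ZenodoDepositionLog rows. The `created` key used for ordering and the function name are hypothetical; the real code queries the Django model:

```python
import re


def next_version(log_rows, api_base):
    """Return 'v<N+1>' based on the latest successful row for this api_base."""
    successes = [
        row for row in log_rows
        if row["status"] == "success" and row["api_base"] == api_base
    ]
    if not successes:
        return "v1"  # nothing deposited yet for this environment
    latest = max(successes, key=lambda row: row["created"])
    match = re.fullmatch(r"v(\d+)", latest["version"] or "")
    current = int(match.group(1)) if match else 0
    return f"v{current + 1}"
```

Because the filter includes `api_base`, sandbox and production rows never influence each other, and failed rows are ignored entirely.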

Refs ifgi#63

Make deposition_id optional in deposit_to_zenodo(): if not passed, fall back to the latest successful ZenodoDepositionLog for the same api_base; if there is no prior log either, bootstrap a fresh draft via POST /deposit/depositions. When the resolved record is already published (submitted=true + state="done"), POST .../actions/newversion and switch to the new draft from links.latest_draft before uploading. The admin action and both management commands drop their "no deposition ID" guards.

Wrap the full cycle (regenerate dumps → render package → deposit) in works.tasks.run_zenodo_deposition and add a `schedule_zenodo_deposit` management command that idempotently registers it as a yearly Django-Q schedule for Dec 31 23:59. Publishing remains manual.
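
The resolution order described in this commit can be written as a small decision function. This is a sketch only: the function name and the returned action labels are hypothetical, and the HTTP calls themselves are left out:

```python
def plan_deposit(deposition_id, last_successful_id, record=None):
    """Decide which draft to upload to.

    `record` is the deposition JSON fetched for the chosen ID, if any.
    Returns (action, deposition_id) where action is one of
    'create_draft', 'new_version' or 'use_draft' (labels are illustrative).
    """
    chosen = deposition_id if deposition_id is not None else last_successful_id
    if chosen is None:
        # No explicit ID and no prior log: bootstrap a fresh draft.
        return ("create_draft", None)
    if record and record.get("submitted") and record.get("state") == "done":
        # Already published: a new version draft has to be opened first.
        return ("new_version", chosen)
    return ("use_draft", chosen)
```

In every branch the outcome is a draft, consistent with publishing remaining a manual step.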

nuest and others added 12 commits May 11, 2026 12:24

nuest wants to merge 12 commits into GeoinformationSystems:main from nuest:feature/zenodo-deposit
