MDF Agent: v2 client with auth, curation, streaming, and agentic handlers#43

Open
blaiszik wants to merge 18 commits into master from mdf-agent

Conversation

@blaiszik
Contributor

Summary

  • New mdf_agent package: full Python client + CLI for the MDF Connect v2 backend
  • BackendClient.authenticated() with multi-path auth resolution: explicit token → confidential client credentials → dev user → interactive Globus OAuth
  • CLI commands: mdf login/logout/whoami, mdf publish, mdf backend *, mdf stream *, mdf search
  • Curation CLI: curation-pending, curation-detail, curation-approve, curation-reject
  • Dataset discovery: preview, files, sample, dataset cards, citations
  • Streaming: create, append, close (with DOI minting), snapshot, clone
  • Agent-safe skill handlers for all operations (structured error handling)
  • Data source URL normalization, domains, and external import metadata
  • Comprehensive test suites: auth routing, client sync, CLI, models, extractors, repository, submission normalization
  • Legacy mdf_forge / mdf_connect_client code preserved in legacy/
  • Example scripts including E2E staging tests and Globus Search verification
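The multi-path auth resolution described above tries each credential source in a fixed priority order. A minimal sketch of that chain is below; the function name `resolve_token` and the placeholder return values are assumptions for illustration, not the actual `BackendClient.authenticated()` implementation.

```python
from typing import Callable, Optional, Tuple

def resolve_token(
    explicit_token: Optional[str] = None,
    client_credentials: Optional[Tuple[str, str]] = None,
    dev_user: Optional[str] = None,
    interactive_login: Optional[Callable[[], str]] = None,
) -> str:
    """Try each auth source in priority order and return the first token found."""
    if explicit_token:
        # 1. Caller passed a token directly -- highest priority.
        return explicit_token
    if client_credentials:
        # 2. Confidential client (client_id, client_secret); a real
        #    implementation would perform a client-credentials grant here.
        client_id, _client_secret = client_credentials
        return f"cc:{client_id}"
    if dev_user:
        # 3. Dev-user shortcut for local testing (placeholder token).
        return f"dev:{dev_user}"
    if interactive_login:
        # 4. Fall back to an interactive Globus OAuth flow.
        return interactive_login()
    raise RuntimeError("No authentication method available")
```

The key property is that an explicit token always wins, so scripts can override any cached or configured credentials.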

Key files

| Path | Description |
| --- | --- |
| `src/mdf_agent/core/backend_client.py` | HTTP client with auth, all v2 endpoints |
| `src/mdf_agent/cli/main.py` | CLI entry point (login, publish, search) |
| `src/mdf_agent/cli/backend.py` | Backend subcommands (curation, preview) |
| `src/mdf_agent/cli/stream.py` | Stream subcommands |
| `src/mdf_agent/skill/handlers.py` | Agent-safe handlers for MCP/agentic use |
| `src/mdf_agent/auth/globus.py` | Globus OAuth with token caching |

Test plan

  • python -m pytest tests/ -v — all client tests pass
  • mdf login --service prod — interactive Globus auth works
  • mdf backend health --service staging — staging health check
  • E2E: submit → approve → verify DOI + search via examples/test_staging_e2e.py

🤖 Generated with Claude Code

blaiszik and others added 18 commits January 31, 2026 10:13
- add authenticated BackendClient with service URL resolution and auth header injection
- add login/logout/whoami commands and unify publish/search auth flows
- add backend/stream subcommand auth callbacks and authenticated client routing
- migrate agent/skill handlers and standalone publish CLI to authenticated backend path
- deprecate legacy submit_submission path while keeping backward compatibility
- add focused tests for backend auth routing and auth lifecycle commands
- update v2-review.md with phase completion and verification
… submission model

- Add normalize_data_source() to convert Globus File Manager URLs and
  data.materialsdatafacility.org URLs to canonical globus:// URIs
- Update resolve_data_sources() to normalize URLs and handle stream:// pass-through
- Add standalone DataCite test minting script (examples/test_datacite_mint.py)
- Add demo scripts for full lifecycle and staging Globus integration
- Extend submission model and backend client for v2 API compatibility
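The URL normalization above can be sketched as follows. This is a minimal illustration, assuming Globus File Manager links carry `origin_id`/`origin_path` query parameters; it is not the actual `normalize_data_source()` implementation.

```python
from urllib.parse import urlparse, parse_qs

def normalize_data_source(url: str) -> str:
    """Normalize a data-source URL to a canonical globus:// URI (sketch)."""
    if url.startswith(("globus://", "stream://")):
        # Already canonical, or a stream:// URI that passes through unchanged.
        return url
    parsed = urlparse(url)
    if parsed.netloc == "app.globus.org" and "file-manager" in parsed.path:
        # parse_qs percent-decodes values, so encoded paths like
        # %2Fdata%2F come back as /data/ without extra work.
        qs = parse_qs(parsed.query)
        endpoint = qs["origin_id"][0]
        path = qs.get("origin_path", ["/"])[0]
        return f"globus://{endpoint}{path}"
    # Unknown schemes pass through unchanged.
    return url
```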

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…refix 10.23677)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add test_submission_normalize.py: 14 tests for normalize_data_source()
  and resolve_data_sources() covering Globus File Manager URLs, MDF data
  domain, passthrough cases, and encoded paths
- Add test_staging_e2e.py: full submit → approve → publish E2E test
  against deployed staging backend

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests the full version lifecycle against deployed staging:
- v1.0 with mint_doi=True (dataset DOI)
- v1.1 with mint_doi=False (inherit, update metadata)
- v1.2 with mint_doi=True (version-specific DOI)
- Queries DataCite test API for relationship metadata
- Prints pass/fail assertions for post-implementation verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add `domains` (List[str]) for scientific domain categorization and
external import provenance fields (external_doi, external_url,
external_source) for datasets imported from other repositories.

Wired through ManifestConfig → to_metadata_payload() → Submission →
to_payload(). Includes 12 unit tests and an E2E staging test script.
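A rough sketch of how the new fields might flow into a payload is below; the class and field names mirror the commit message, but the shape of `to_payload()` is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DatasetMetadata:
    """Sketch of the new categorization and provenance fields."""
    domains: List[str] = field(default_factory=list)
    external_doi: Optional[str] = None
    external_url: Optional[str] = None
    external_source: Optional[str] = None

    def to_payload(self) -> dict:
        # Only emit fields that are actually set, keeping payloads minimal
        # for datasets that were not imported from elsewhere.
        payload: dict = {}
        if self.domains:
            payload["domains"] = self.domains
        for key in ("external_doi", "external_url", "external_source"):
            value = getattr(self, key)
            if value is not None:
                payload[key] = value
        return payload
```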

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix test_domains_external_e2e.py to read fields from dataset_mdata
(not top-level submission). Add domains/external import coverage to
test_staging_search_e2e.py with full pipeline verification (submit →
approve → publish → search index).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update v2-review.md with Part II+III implementation details (BackendClient
sync, curation/preview methods, confidential client auth, skill handlers).
Add check_search_index.py example. Add .DS_Store to .gitignore.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover install, CLI commands (auth, publish, stream, backend, search),
Python SDK usage, auth resolution, service targeting, and connection
to the backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace read_bytes() with 8MB chunked streaming to avoid OOM on large files
- Use granular httpx.Timeout (connect=30, read/write=300, pool=30)
- Upload to /mdf_open/{source_id}/ when source_id is available, falling back to _uploads/{uuid}/
- Add progress_callback parameter threaded from publish() through _upload_local_files
- Show rich progress bar (filename, bar, size, speed) in CLI publish --submit
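The chunked-streaming idea above can be sketched with a generator that yields fixed-size chunks and reports cumulative bytes to a progress callback. The helper names are hypothetical; the 8 MB chunk size matches the commit message.

```python
from typing import Callable, Iterator, Optional

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, per the PR's streaming chunk size

def iter_file_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield a file in fixed-size chunks so large uploads never load fully into RAM."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk

def iter_with_progress(
    path: str,
    callback: Optional[Callable[[int], None]] = None,
    chunk_size: int = CHUNK_SIZE,
) -> Iterator[bytes]:
    """Wrap iter_file_chunks, reporting cumulative bytes sent to a callback."""
    sent = 0
    for chunk in iter_file_chunks(path, chunk_size):
        sent += len(chunk)
        if callback:
            callback(sent)
        yield chunk
```

An HTTP client that accepts an iterable request body (httpx does, via `content=`) can consume this generator directly, which is what lets the CLI drive a progress bar without buffering the file.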

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cuts

- Add GlobalConfig (~/.config/mdf_agent/config.json) with resolve_service(),
  dotted-key get/set, and record_publish() for last-published state
- Add `mdf config show/set/get/path` commands
- Rewrite `mdf publish` with dual mode: direct (data args + --title/--author)
  and repo mode; saves last_publish to config on success
- Rewrite `mdf status` to accept optional source_id arg, default from config,
  show both repo state and backend status
- Add top-level `mdf pending` (formatted curation list), `mdf approve <id>`,
  `mdf reject <id> --reason`
- Change all --service defaults from "staging" to None, resolve via config
- Fix HTTPS upload base URL (data.materialsdatafacility.org)
- Add Transfer API mkdir before HTTPS PUT uploads
- Plumb transfer_token through BackendClient for mkdir operations
- Remove dead cli/publish.py (was never registered)
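The dotted-key get/set mentioned above is a small nested-dict traversal. A minimal sketch, with hypothetical helper names (the real GlobalConfig methods may differ):

```python
from typing import Any

def get_dotted(config: dict, key: str, default: Any = None) -> Any:
    """Look up a dotted key like 'backend.service' in nested dicts."""
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

def set_dotted(config: dict, key: str, value: Any) -> None:
    """Set a dotted key, creating intermediate dicts as needed."""
    parts = key.split(".")
    node = config
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value
```

This is what makes `mdf config set backend.service prod` style commands possible over a single JSON file.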

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set no_wrap=True on Source ID/ID columns so Rich never truncates them.
Move title truncation from hard [:40] slice to max_width=40 on the
column, letting Rich wrap titles to give ID columns room.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add list/show/versions/update commands with Rich formatting, shared
formatting layer, retry with backoff on 429/502/503/504, upload retry,
SDK methods (search/pending/approve/reject/versions/show/cite),
interactive init, config validation, backend CI job, and docs.
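The retry-with-backoff behavior on 429/502/503/504 can be sketched as exponential backoff with jitter around any request callable. This is an illustration of the pattern, not the client's actual retry code.

```python
import random
import time
from typing import Callable

RETRYABLE = {429, 502, 503, 504}

def request_with_retry(
    send: Callable[[], "object"],
    max_attempts: int = 5,
    base_delay: float = 0.5,
):
    """Retry `send` on transient HTTP errors with exponential backoff + jitter.

    `send` is any zero-arg callable returning a response with a
    `.status_code` attribute.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < max_attempts - 1:
            # Delay doubles each attempt; jitter avoids thundering-herd retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
    return response  # exhausted attempts; return the last (failing) response
```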

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When local directories are uploaded, a data.zip is created and uploaded
to .mdf/data.zip alongside individual files. The archive URI is set as
download_url for one-click download. Includes 12 GB size guardrail and
symlink safety.
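The size guardrail and symlink safety described above amount to a pre-scan before zipping. A minimal sketch, assuming the 12 GB limit and skip-symlinks policy from the commit message (function name hypothetical):

```python
import os
import zipfile

MAX_ARCHIVE_BYTES = 12 * 1024**3  # 12 GB guardrail from the PR description

def build_data_zip(src_dir: str, zip_path: str) -> None:
    """Zip a directory, skipping symlinks and refusing oversized trees (sketch)."""
    total = 0
    files = []
    for root, dirs, names in os.walk(src_dir):
        # Prune symlinked directories so the walk never follows links
        # out of the source tree.
        dirs[:] = [d for d in dirs if not os.path.islink(os.path.join(root, d))]
        for name in names:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # symlink safety: skip file symlinks too
            total += os.path.getsize(path)
            if total > MAX_ARCHIVE_BYTES:
                raise ValueError(
                    f"Directory exceeds {MAX_ARCHIVE_BYTES} byte archive limit"
                )
            files.append(path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, os.path.relpath(path, src_dir))
```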

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the custom staging/commit tracking system with real git
operations. Repository.init_repo() now calls git init, stage()
calls git add, commit() calls git commit, and publish uses
git ls-files for file enumeration.

- Rewrite repository.py to use subprocess git calls
- Add get_status(), get_tracked_files(), tag(), has_commits()
- Update agent.py to work with git-backed Repository
- Update submission.py to use git ls-files instead of commit history
- Update CLI status to show staged/modified/untracked from git
- Gut models/state.py (Commit, RepositoryState no longer needed)
- Rewrite test_repository.py with 27 git-aware tests
- Update test_cli.py and test_mdf_agent_basic.py
- All 220 tests pass
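Backing the repository onto real git typically means shelling out with `subprocess`. A minimal sketch of that pattern, including the `git ls-files` enumeration the publish path uses (helper names are assumptions, not the actual `Repository` API):

```python
import subprocess
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout, stripped."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a nonzero exit code
    )
    return result.stdout.strip()

def tracked_files(repo: Path) -> list:
    """Enumerate tracked files via `git ls-files`, as the publish path does."""
    out = git(repo, "ls-files")
    return out.splitlines() if out else []
```

Using `git -C` keeps the helper free of `os.chdir`, and `check=True` turns git failures into exceptions the caller can surface as structured errors.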

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>