ENH: Add ExternalDataUpload skill for local developer and AI agent testing content-link upload workflow #6111

Draft
thewtex wants to merge 7 commits into InsightSoftwareConsortium:main from thewtex:external-data-upload-skill

@thewtex
Member

@thewtex thewtex commented Apr 23, 2026

Adds Utilities/Maintenance/ExternalDataUpload/ with a Claude Code skill that uploads test data to IPFS under the UnixFS v1 2025 profile, pins on the redundant itk-pinata and itk-filebase remote services, optionally mirrors bytes into an ITKTestingData clone at CID/<cid> (with a 50 MB guard for GitHub's per-file push limit), maintains a new Testing/Data/content-links.manifest index, batch-pins every manifest CID, and normalizes existing .md5 / .sha256 / .cid links by fetching through the gateway templates parsed directly from CMake/ITKExternalData.cmake and re-uploading under the current UnixFS profile. Documents the one-time Kubo + IPFS Desktop setup and references the skill from Testing/Data/README.md.

WIP Todos:

  • Run on existing data in the repository
  • Document in Documentation/docs

@thewtex thewtex requested a review from hjmjohnson April 23, 2026 16:46
@github-actions github-actions Bot added the type:Infrastructure, type:Enhancement, and type:Testing labels Apr 23, 2026
Comment thread Utilities/Maintenance/ExternalDataUpload/content-link-normalize.sh Outdated
thewtex added a commit to InsightSoftwareConsortium/ITKTestingData that referenced this pull request Apr 23, 2026
@github-actions github-actions Bot added the area:Filtering and type:Data labels Apr 23, 2026
@thewtex thewtex force-pushed the external-data-upload-skill branch 2 times, most recently from 826ac24 to 1c13696 Compare April 23, 2026 17:36
@thewtex
Member Author

thewtex commented Apr 23, 2026

@dzenanz related improvements made in 1c13696

@github-actions github-actions Bot added the area:Documentation label Apr 23, 2026
@thewtex thewtex force-pushed the external-data-upload-skill branch 2 times, most recently from da60d98 to 1e287d1 Compare April 23, 2026 18:22
@hjmjohnson
Member

@greptileai, please review so this can be taken out of draft mode.

@greptile-apps

This comment was marked as resolved.

Comment thread Utilities/Maintenance/ExternalDataUpload/ipfs-pin-all.sh Outdated
Comment thread Utilities/Maintenance/ExternalDataUpload/content-link-normalize.sh Outdated
Comment thread Utilities/Maintenance/ExternalDataUpload/ipfs-upload.sh Outdated
Comment thread Utilities/Maintenance/ExternalDataUpload/ipfs-pin-all.sh Outdated
@hjmjohnson
Member

@thewtex. FYI, I can't get the upload access to work for either of the recommended services. I am using this skill and resources to help me configure the upload mechanisms, but I seem to be running into roadblocks:

Pinata

# Pinata — app.pinata.cloud/developers/api-keys                                                                                                                                                               
# Create a new JWT with BOTH scopes: pinByHash AND pinFileToIPFS                                                                                                                                              
ipfs pin remote service rm itk-pinata                                                                                                                                                                         
printf "Pinata JWT: " && read -rs PINATA_JWT && echo                                                                                                                                                          
ipfs pin remote service add itk-pinata https://api.pinata.cloud/psa "$PINATA_JWT"         


                                                                                                                                                                                                          
❯ └─[0]  ipfs pin remote add --service=itk-pinata --name=smoketest --background bafkreicq27yz3am4sq4oixpqgtwnjkahhmwnwcgtdyvhrutaeu6a3zqwie                                                                     
  Error: reason: "PAID_FEATURE_ONLY", details: "You must be on a paid plan to pin by CID": 403 Forbidden                                                                                                        
  ⎿  UserPromptSubmit says: [memsearch] Memory available                                                                                                                                                        
                                                                                                                                                                                                                
⏺ Important architectural news: Pinata's free plan has disabled pin-by-CID entirely. The PSA (Pinning Service API) pin remote add <cid> endpoint is now a paid-only feature. This isn't a token scope issue — 
  it's a plan-tier restriction.                                                                                                                                                                                 
                                                                                                                                                                                                                
  ★ Insight ─────────────────────────────────────                                                                                                                                                               
  - Pinata's free tier now only accepts direct file uploads via pinFileToIPFS (their proprietary REST endpoint), not pin-by-CID via the standard PSA. ipfs pin remote add exclusively uses PSA, so no amount of 
  scripting around the Kubo CLI can work on a free Pinata plan.                                                                                                                                                 
  - The ITK skill's architecture assumes PSA works on both services (for CID-addressed pinning without uploading bytes). If Pinata-free is now PSA-locked, the skill's assumption is broken and the README needs
   updating too.                                                                                                                                                                                                
  - Filebase does support PSA pin-by-CID on its free tier (last I knew) — worth confirming before declaring full failure.                                                                                       
  ─────────────────────────────────────────────────                                                    

Filebase

  # Filebase — console.filebase.com/keys, under the IPFS bucket's pinning API endpoint                                                                                                                          
  ipfs pin remote service rm itk-filebase                                                                                                                                                                       
  printf "Filebase token: " && read -rs FILEBASE_TOKEN && echo                                                                                                                                                  
  ipfs pin remote service add itk-filebase https://api.filebase.io/v1/ipfs "$FILEBASE_TOKEN"
  
                                                                                                                                          
  If it still fails with ERR_INVALID_TOKEN, the token itself is bad (revoked/rotated/expired) — regenerate from the same card.                                                                                  
                                                                                                                                                                                                                
  If Filebase is permanently dead for you   

thewtex added 4 commits April 26, 2026 19:19
Adds Utilities/Maintenance/ExternalDataUpload/ with a Claude Code skill
that uploads test data to IPFS under the UnixFS v1 2025 profile, pins on
the redundant itk-pinata and itk-filebase remote services, optionally
mirrors bytes into an ITKTestingData clone at CID/<cid> (with a 50 MB
guard for GitHub's per-file push limit), maintains a new
Testing/Data/content-links.manifest index, batch-pins every manifest CID,
and normalizes existing .md5 / .sha256 / .cid links by fetching through
the gateway templates parsed directly from CMake/ITKExternalData.cmake
and re-uploading under the current UnixFS profile. Documents the one-time
Kubo + IPFS Desktop setup and references the skill from
Testing/Data/README.md.

Add `--background` to both `ipfs-upload.sh` and `content-link-normalize.sh`
to submit remote pin requests asynchronously via `ipfs pin remote add
--background`. The default remains synchronous (surfaces failures
immediately, safer for one-off uploads); `--background` is intended for
batch runs where waiting for each remote to reach `pinned` (minutes per
file on Filebase) would be impractical.

Also dedup remote-pin submission: before calling `ipfs pin remote add`,
query `ipfs pin remote ls --status=queued,pinning,pinned` for the CID
and skip the add if a pin already exists on that service. This avoids
Pinata's `DUPLICATE_OBJECT` (400) error on re-runs of previously
uploaded content, and prevents Filebase from accumulating duplicate
queue entries.
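
The check-then-submit step described above can be sketched in Python as follows. This is only an illustrative analogue of the shell logic, not the code in `ipfs-upload.sh`: the names `run_ipfs` and `pin_remote_once` are hypothetical, and the sketch assumes that a non-empty `ipfs pin remote ls` listing means the service already tracks the CID.

```python
import subprocess


def run_ipfs(argv):
    """Run a Kubo CLI command and return its stdout (raises on a non-zero exit)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def pin_remote_once(cid, service, background=False, run=run_ipfs):
    """Submit a remote pin unless the service already tracks this CID.

    Returns True if a new pin request was submitted, False if it was skipped.
    The `run` parameter is injectable so the logic can be exercised without Kubo.
    """
    listing = run([
        "ipfs", "pin", "remote", "ls",
        f"--service={service}",
        f"--cid={cid}",
        "--status=queued,pinning,pinned",
    ])
    if listing.strip():
        # A queued, pinning, or pinned entry already exists: skip the add.
        return False
    argv = ["ipfs", "pin", "remote", "add", f"--service={service}", f"--name={cid}"]
    if background:
        argv.append("--background")
    argv.append(cid)
    run(argv)
    return True
```

Calling `pin_remote_once(cid, "itk-filebase", background=True)` twice for the same CID would submit one request and skip the second, provided the service reports the first one as queued, pinning, or pinned.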

README.md and SKILL.md document the new flag, the synchronous vs.
asynchronous tradeoff, and the post-run verification command
(`ipfs pin remote ls --status=...`).

Convert the 24 `.md5` content links in
Modules/Filtering/AnisotropicDiffusionLBR/test/{Input,Baseline}/ to
`.cid` links under the UnixFS v1 2025 profile, produced by
`Utilities/Maintenance/ExternalDataUpload/content-link-normalize.sh
--hash-only --background`. Bytes were fetched through the gateway
templates in CMake/ITKExternalData.cmake, verified against each
declared MD5 hash, and re-uploaded; all new CIDs are pinned locally
plus on `itk-pinata` and `itk-filebase`.

Record the 24 new CIDs in Testing/Data/content-links.manifest along
with two additional entries picked up as a `--cid-only` sampling run
(CurvatureAnisotropicDiffusionImageFilter.2.png and warp3D.nii.gz),
both of which re-hashed to identical CIDs — confirming existing `.cid`
links in the tree are already compatible with the 2025 profile.

No test semantics change: `CMake/ITKExternalData.cmake` resolves
`DATA{...}` references by whichever `.md5` / `.sha256` / `.cid` link
sits next to the referenced path, so the filter tests continue to
fetch the same bytes.

In content-link-normalize.sh, the prerequisite warning pre-check was
iterating every sha variant (sha1/224/256/384/512) and requiring GNU
coreutils `*sum` binaries. Two issues:

  1. ITK content links in practice are only .md5 (legacy) and .sha512
     (current), so warning about missing sha224/sha384 tools was noise.
     Narrow the pre-check to md5 and sha512.

  2. macOS ships BSD `md5` and `shasum`, not coreutils `md5sum` /
     `sha512sum`. Warning on their absence was a false positive for
     macOS contributors, and the verification path invoked them by
     name ("$tool" "$file") so it would actually fail.

Replace `hash_tool_for_ext` (name-only) with `hash_cmd_for_ext` that
returns a full command line — preferring GNU `md5sum` / `shaNsum`
when present, falling back to `md5 -r` (BSD md5 with md5sum-compatible
output) and `shasum -a NNN` (BSD shasum). `verify_bytes` uses
intentional word-splitting so the multi-word fallback
(e.g. "shasum -a 256") expands to distinct argv entries.
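
For readers who want to see the fallback order in isolation, here is a hedged Python analogue of the idea. It is not the shell function from `content-link-normalize.sh`; the `which` parameter and the `hash_argv` helper are assumptions added so the selection can be tested without depending on the host's tools.

```python
import shlex
import shutil


def hash_cmd_for_ext(ext, which=shutil.which):
    """Return a hashing command line for a content-link extension, or None.

    GNU coreutils tools are preferred; the BSD/macOS fallbacks are multi-word.
    """
    ext = ext.lstrip(".").lower()
    if ext == "md5":
        if which("md5sum"):
            return "md5sum"
        if which("md5"):
            return "md5 -r"  # -r prints md5sum-compatible "hash file" output
        return None
    if ext.startswith("sha") and ext[3:].isdigit():
        bits = ext[3:]
        if which(f"sha{bits}sum"):
            return f"sha{bits}sum"
        if which("shasum"):
            return f"shasum -a {bits}"
    return None


def hash_argv(ext, path, which=shutil.which):
    """Split the command line into argv entries and append the file path."""
    cmd = hash_cmd_for_ext(ext, which)
    if cmd is None:
        raise RuntimeError(f"no hashing tool available for .{ext}")
    # shlex.split plays the role of the shell's intentional word-splitting.
    return shlex.split(cmd) + [path]
```

Keeping the path as a separate list element means a file name containing spaces stays a single argv entry even though the command itself is split into words.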

Addresses review at
https://github.com/InsightSoftwareConsortium/ITK/pull/6111/files#r3132434963
@hjmjohnson hjmjohnson force-pushed the external-data-upload-skill branch from 1e287d1 to 65b04cd Compare April 27, 2026 00:19
@dzenanz
Member

dzenanz commented Apr 27, 2026

Matt should take a look now.

Rewrite Documentation/docs/contributing/upload_binary_data.md and
data.md to describe the new Kubo + pinning-service workflow driven by
Utilities/Maintenance/ExternalDataUpload/ipfs-upload.sh, replacing the
obsolete web3.storage / w3cli and content-link-upload.itk.org
instructions. Document the one-time Kubo + itk-pinata / itk-filebase
setup, the upload script's behavior (CIDv1 under the UnixFS v1 2025
profile, synchronous vs. --background pinning, manifest update),
the optional --testing-data-repo mirror step with the 50 MB GitHub
limit, and the content-link-normalize.sh conversion workflow for
legacy .md5 / .sha256 / .sha512 links. Refresh the storage-location
list and testing-data figure caption to match the gateways enumerated
in CMake/ITKExternalData.cmake, and remove the now-orphaned
content-link-upload.png screenshot of the retired web app.
@thewtex thewtex force-pushed the external-data-upload-skill branch from 65b04cd to 65f7c84 Compare April 28, 2026 19:59
Pinata's `pin remote add` endpoint (the IPFS Pinning Service API) is
gated to paid plans — the free plan rejects pin-by-CID with
PAID_FEATURE_ONLY (HTTP 403), as reported by @hjmjohnson while
exercising the new ExternalDataUpload skill. Filebase's free tier still
accepts PSA pin-by-CID, so it remains the baseline pin provider for
contributors who don't have a paid Pinata account.

ipfs-upload.sh now splits its remote-pinning configuration into a
required list (`itk-filebase`) and an optional list (`itk-pinata`):
the script aborts if Filebase isn't registered, but logs an
informational notice and continues if Pinata isn't. The remote-pin
loop walks the merged ACTIVE_SERVICES list so Pinata is still pinned
to whenever it is configured. The reorder also surfaces Filebase
first in every user-facing list (storage locations, log lines,
manifest-skipped warnings, README setup section, contributor docs)
to match the new "required first, optional second" hierarchy.

Documentation in README.md, SKILL.md, Documentation/docs/contributing/
upload_binary_data.md, and Documentation/docs/contributing/data.md is
updated to reorder Filebase ahead of Pinata, mark Pinata as optional,
and explain the paid-plan restriction. README.md gains a troubleshooting
entry for the PAID_FEATURE_ONLY error pointing at
`ipfs pin remote service rm itk-pinata` as the cleanest fix when no
paid plan is available.

Agent-Session-Id: 40f8eba4-dc94-4d4f-94bd-ff3d2fccf04f

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@thewtex
Member Author

thewtex commented Apr 30, 2026

@hjmjohnson @dzenanz — addressed the Pinata issue in 0e9dd0b.

itk-pinata is now optional in both the upload scripts and the docs. The change:

  • ipfs-upload.sh splits its remote-pinning configuration into REQUIRED_SERVICES=(itk-filebase) and OPTIONAL_SERVICES=(itk-pinata). Filebase is required (its free tier accepts PSA pin-by-CID); Pinata is attempted only if it's been registered with ipfs pin remote service add. If it isn't, the script logs ==> Optional pinning service 'itk-pinata' is not configured; skipping. and continues without aborting.
  • The remote-pin loop iterates over the merged ACTIVE_SERVICES list, so Pinata is still pinned to whenever a paid plan is configured.
  • ipfs-pin-all.sh already discovered remote services dynamically — its header comment is updated for accuracy but no logic change was needed.
  • README.md, SKILL.md, Documentation/docs/contributing/upload_binary_data.md, and Documentation/docs/contributing/data.md reorder Filebase ahead of Pinata in every list and explicitly mark Pinata as optional, paid plan only, with a callout that the free Pinata tier rejects PSA pin-by-CID with PAID_FEATURE_ONLY (HTTP 403).
  • README.md gains a troubleshooting entry recommending ipfs pin remote service rm itk-pinata as the cleanest fix when no paid plan is available, so the upload script just skips it.

So contributors without a paid Pinata plan can now configure only itk-filebase and the upload still succeeds — Filebase + the GitHub Pages mirror provide the redundancy, with public IPFS gateways as additional read paths.
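
The required/optional split described in this comment can be sketched as below. This is a minimal Python illustration of the selection logic, not the shell code in `ipfs-upload.sh`; the function name `resolve_active_services` and its parameters are hypothetical, while the service names and the log message come from the comment above.

```python
def resolve_active_services(
    registered,
    required=("itk-filebase",),
    optional=("itk-pinata",),
    log=print,
):
    """Return the services to pin on: required first, then configured optional ones.

    Aborts if a required service is not registered; logs and skips a missing
    optional service.
    """
    registered = set(registered)
    missing = [name for name in required if name not in registered]
    if missing:
        raise SystemExit(
            "Required pinning service(s) not configured: " + ", ".join(missing)
        )
    active = list(required)
    for name in optional:
        if name in registered:
            active.append(name)
        else:
            log(f"==> Optional pinning service '{name}' is not configured; skipping.")
    return active
```

With only `itk-filebase` registered, the result is `["itk-filebase"]` plus one informational log line; with both registered, Filebase still comes first in the returned list.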

@hjmjohnson
Member

@thewtex

FYI: I cannot get the pinning services to work. On both Pinata and Filebase I get API restrictions requiring a paid account.

I was trying to mirror the recent ITKTestingData additions to these external services for redundancy.

┌─[johnsonhj@ENGR-ECE-M030] - [~/src/XXX/ITK_REMOTE_MODULES_STABLE] - [2026-05-01 07:55:29]
└─[0] find . -name "*.md5" cd /Users/johnsonhj/src/XXX/ITK_REMOTE_MODULES_STABLE                                                                                  
  # Smoke (no --background to surface auth errors immediately):                                                                                                   
  ls ~/src/XXX/ITKTestingData/CID | head -5 | while read cid; do                                                                                                  
    ipfs pin remote add --service=itk-filebase --name="$cid" "$cid"                                                                                               
  done
find: cd: unknown primary or operator
Error: reason: "FORBIDDEN", details: "The Pinning Service API requires a paid account": 403 Forbidden
Error: reason: "FORBIDDEN", details: "The Pinning Service API requires a paid account": 403 Forbidden
Error: reason: "FORBIDDEN", details: "The Pinning Service API requires a paid account": 403 Forbidden
Error: reason: "FORBIDDEN", details: "The Pinning Service API requires a paid account": 403 Forbidden
Error: reason: "FORBIDDEN", details: "The Pinning Service API requires a paid account": 403 Forbidden

@github-actions github-actions Bot added the area:Python wrapping label May 1, 2026
Drops the local Kubo / IPFS-Desktop daemon, the `ipfs config profile
apply unixfs-v1-2025` setup step, the `ipfs pin remote service add`
PSA registrations (`itk-filebase`, `itk-pinata`), and the bash upload
trio (`ipfs-upload.sh`, `content-link-normalize.sh`, `ipfs-pin-all.sh`)
that drove them. The new contributor flow is pure Python on top of a
small pixi environment:

  1. `npx ipfs-car pack <file> --no-wrap` builds a CARv1 locally.
     ipfs-car v1+ defaults (1 MiB chunks, 1024 children, raw leaves,
     CIDv1) match the unixfs-v1-2025 / IPIP-0499 profile, so no extra
     flags are needed to produce a reproducible CID.
  2. `boto3` PUTs the CAR to a Filebase IPFS bucket through Filebase's
     S3-compatible REST API with `x-amz-meta-import: car`. Filebase
     imports the CAR server-side and exposes the resulting CID via
     `head_object` metadata.
  3. The local CID and the CID Filebase reports are compared, and on
     success the file is replaced with `<file>.cid`, the manifest at
     `Testing/Data/content-links.manifest` is updated, and the optional
     `--testing-data-repo` mirror step still copies the bytes into a
     local ITKTestingData clone (subject to the same 50 MB GitHub push
     limit as before).
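
Steps 2 and 3 can be sketched as a small round-trip check. This is a hedged illustration, not the contents of `upload.py`: the function name `upload_car_and_verify` is hypothetical, and the assumption that the imported CID appears under the `cid` key of the `head_object` metadata should be confirmed against Filebase's documentation. The `s3` argument is any object with boto3-style `put_object` and `head_object` methods, so a real client would be created separately (for example with `boto3.client("s3", endpoint_url=...)` pointed at Filebase's S3 endpoint and the `FILEBASE_ACCESS_KEY` / `FILEBASE_SECRET_KEY` credentials).

```python
def upload_car_and_verify(s3, bucket, key, car_bytes, local_cid):
    """PUT a CAR with the import marker, then compare the CID the service reports.

    `s3` is a boto3-style S3 client. boto3 sends Metadata={"import": "car"}
    as the `x-amz-meta-import: car` request header.
    """
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=car_bytes,
        Metadata={"import": "car"},
    )
    head = s3.head_object(Bucket=bucket, Key=key)
    # Assumption: the server-side import exposes the CID as user metadata "cid".
    remote_cid = head.get("Metadata", {}).get("cid")
    if not remote_cid:
        raise RuntimeError("Filebase did not return a CID")
    if remote_cid != local_cid:
        raise RuntimeError(f"CID mismatch: local {local_cid}, remote {remote_cid}")
    return remote_cid
```

Passing the client in rather than constructing it inside the function keeps credentials out of the upload logic and lets the comparison be tested with a stand-in object.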

Concretely:

- Add `boto3`, `nodejs`, and `requests` to a new
  `[tool.pixi.feature.external-data-upload]` feature plus an
  `external-data-upload` environment in `pyproject.toml`. Run
  `pixi install -e external-data-upload` once, then
  `pixi run -e external-data-upload python ...` for every upload.
- New `Utilities/Maintenance/ExternalDataUpload/upload.py` is the
  single-file uploader: input validation (in-repo, no whitespace, not
  already a content link), CAR build, boto3 put_object with the
  `import: car` metadata header, head_object CID round-trip, manifest
  update, optional ITKTestingData mirror, and the same `git rm` /
  `git add` instructions as before.
- New `Utilities/Maintenance/ExternalDataUpload/normalize.py` parses
  `ExternalData_URL_TEMPLATES` from `CMake/ITKExternalData.cmake` with a
  paren-aware scanner (the `%(hash)` / `%(algo)` substrings break naive
  `re.DOTALL` lazy matching), fetches each `.md5` / `.shaNNN` / `.cid`
  link via the gateway templates, verifies the bytes
  algorithmically (or via the `/ipfs/` server-side guarantee for
  CID links), and re-uploads through `upload.upload_file_to_filebase`.
- `Utilities/Maintenance/ExternalDataUpload/README.md` is rewritten end
  to end: pixi setup, Filebase S3-key creation, `FILEBASE_ACCESS_KEY` /
  `FILEBASE_SECRET_KEY` / `FILEBASE_BUCKET` env-var contract, new
  troubleshooting section (missing npx, missing credentials, Filebase
  did not return a CID, CID mismatch).
- `Utilities/Maintenance/ExternalDataUpload/SKILL.md` updated to
  describe the same flow for the AI agent: pixi env + Filebase
  credentials prerequisites; no Kubo, no PSA service registration.
- `Documentation/docs/contributing/upload_binary_data.md` and
  `Documentation/docs/contributing/data.md` rewrite the
  one-time-setup, upload-a-file, mirror, and normalize sections
  around the pixi + Filebase workflow. The storage-locations list and
  testing-data-figure caption are reworded so Filebase appears as the
  upload destination and Kubo / Pinata only show up as build-time read
  paths (gateways, not pinning targets).
- `Testing/Data/content-links.manifest` header rewritten to credit
  `upload.py` as the maintainer (previously named
  `ipfs-upload.sh`).
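
The paren-aware scanning mentioned for `normalize.py` can be illustrated with the sketch below. It is not the actual parser: the helper names `command_body` and `url_templates` are hypothetical, and it assumes the templates are double-quoted arguments of `set(...)` or `list(APPEND ...)` calls on `ExternalData_URL_TEMPLATES`.

```python
import re


def command_body(text, open_idx):
    """Return the text between the '(' at open_idx and its matching ')'."""
    depth = 0
    in_string = False
    i = open_idx
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#":
            # Skip a CMake line comment so its parentheses are not counted.
            newline = text.find("\n", i)
            i = len(text) if newline == -1 else newline
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return text[open_idx + 1 : i]
        i += 1
    raise ValueError("unbalanced parentheses")


def url_templates(cmake_text, variable="ExternalData_URL_TEMPLATES"):
    """Collect quoted arguments containing %(hash) from set()/list(APPEND) calls."""
    pattern = re.compile(
        r"\b(?:set|list)\s*\(\s*(?:APPEND\s+)?" + re.escape(variable) + r"\b"
    )
    templates = []
    for match in pattern.finditer(cmake_text):
        open_idx = cmake_text.index("(", match.start())
        body = command_body(cmake_text, open_idx)
        templates += [s for s in re.findall(r'"([^"]*)"', body) if "%(hash)" in s]
    return templates
```

A lazy pattern such as `\((.*?)\)` stops at the first `)`, which falls inside `%(hash)` or a comment, so tracking string state and nesting depth is what lets the whole argument list be recovered.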

The Filebase free tier supports the S3 import-as-CAR path used here,
so the workflow needs no paid subscription, which addresses the original
Pinata `PAID_FEATURE_ONLY` blocker reported by @hjmjohnson. CI runners
can use the same env-var contract via GitHub Actions secrets.
@thewtex thewtex force-pushed the external-data-upload-skill branch from 7c8594a to 05be188 Compare May 1, 2026 20:46
@thewtex
Member Author

thewtex commented May 1, 2026

@hjmjohnson thanks for the note and testing 🥇. I have paid accounts for Pinata and Filebase and did not know that these features are not available 🤦.

I pushed an update that uses only Filebase, through its S3 API, which is available on the free tier. It also uses dependencies provided via pixi, and it removes the Kubo / IPFS Desktop installation, setup, and run requirement.

@hjmjohnson
Member

@thewtex I pushed many objects manually to ITKTestingData so that I could move forward on the remote module conversions. Would you mind uploading the cache of ITKTestingData blobs to the various resources that you have available for the purposes of replication?

@thewtex
Member Author

thewtex commented May 5, 2026

@hjmjohnson I'll add a script that syncs the ITKTestingData Git repo to the content link manifest here, and I'll run the pinning script.

Labels

area:Documentation: Issues affecting the Documentation module
area:Filtering: Issues affecting the Filtering module
area:Python wrapping: Python bindings for a class
type:Data: Changes to testing data
type:Enhancement: Improvement of existing methods or implementation
type:Infrastructure: Infrastructure/ecosystem related changes, such as CMake or buildbots
type:Testing: Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

3 participants