Skip to content

WIP: Add v4 remote-module ingestion (two-phase, per-commit hooks)#6204

Draft
hjmjohnson wants to merge 19 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson:ingest-strategy-v4
Draft

WIP: Add v4 remote-module ingestion (two-phase, per-commit hooks)#6204
hjmjohnson wants to merge 19 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson:ingest-strategy-v4

Conversation

@hjmjohnson
Copy link
Copy Markdown
Member

@hjmjohnson hjmjohnson commented May 4, 2026

v4 remote-module ingestion pipeline. Splits the v3 single-driver flow into two phases — Phase A is local + reversible; Phase B publishes upstream only after the Phase A PR has merged. Every historical commit is independently reformatted (clang-format / black / gersemi / trailing-ws / EOF / line-ending) and ghostflow-conformed (subject ≤ 78 chars, blank line after subject, exec-bit cleared on text), so each commit independently passes ITK's CI gates.

WIP — design feedback welcome. Validated on Cuberille (#6205), SubdivisionQuadEdgeMeshFilter (#6229, merged), AdaptiveDenoising (#6235), TextureFeatures (#6238), PrincipalComponentsAnalysis (#6240). Ghostflow now reports only the unavoidable upstream-root-commit error.

PROMPT: Use ~/src/ITK/Utilities/Maintenance/RemoteModuleIngest tooling to
ingest the remote module on the latest upstream/main branch for <<MODULE>>.
Two-phase rationale
Phase Script Local-clone fate Upstream side-effects
A — Ingestion ingest-module-v4.sh Throwaway None
B — Upstream archival archive-remote-module.sh Fresh clone Pushes removal commit + promotes MIGRATION_README.md to README.md + flips archived=true

Phases share one persistent artifact: the whitelist at Utilities/Maintenance/RemoteModuleIngest/whitelists/<Module>.list. Correctness property: upstream is never touched until ITK has accepted the migration.

Pre-commit + ghostflow coverage
Hook v4 coverage
clang-format (.c .cc .cxx .h .hxx) Content sniff
black (.py) Content sniff
gersemi (.cmake .wrap CMakeLists.txt) Content sniff
trailing-whitespace, end-of-file-fixer, mixed-line-ending Universal text fixer
Ghostflow rule v4 coverage
Subject ≤ 78 chars truncate_subject_to_body() — overflow moves to body, prefers word-break
Blank line after subject ensure_blank_line_after_subject()
No *.orig / *.rej / *.BACKUP.* / *.LOCAL.* / *.REMOTE.* / *.BASE.* Deny-pass
No exec-bit on text files commit_callback clears 100755 → 100644 for known text exts
No CI scaffolding under module subtree Deny-pass strips Dockerfile*, .github/*, azure-pipelines*, appveyor.yml, etc.
Root commit Unfixable — every upstream module has one
Heuristic commit-prefix mapping

When a historical commit subject lacks a recognized ITK prefix, sanitize-history.py auto-prepends one based on keywords in the subject. Decisions logged for operator audit.

Subject contains... Prefix
fix, bug, crash, segfault, leak, regression, corrupt, deadlock BUG:
cmake, compil, warning, build, ci, clang , gcc, msvc COMP:
doc, readme, comment, license, sphinx, doxygen DOC:
style, whitespace, format, rename, lint, clean up, reorder STYLE:
perf, optim, speed, faster, efficien PERF:
anything else ENH: (default)
Files added
  • Utilities/Maintenance/RemoteModuleIngest/INGESTION_STRATEGY_v4.md — design
  • Utilities/Maintenance/RemoteModuleIngest/ingest-module-v4.sh — Phase A driver
  • Utilities/Maintenance/RemoteModuleIngest/archive-remote-module.sh — Phase B driver
  • Utilities/Maintenance/RemoteModuleIngest/sanitize-history.pygit_filter_repo callback driver
  • Utilities/Maintenance/RemoteModuleIngest/whitelists/<Module>.list — per-module whitelist (one per consumer PR)

V3 scripts retained until v4 has produced two successful ingests.

Open review questions
  1. Local gersemi 0.24.0 vs ITK pre-commit-pinned 0.19.3 — post-merge gate currently catches the diff in a STYLE: follow-up. Acceptable, or verify version match before running?
  2. Subject truncation can fall mid-word when the first word is long (rare). Full subject preserved in body. Raise word-break minimum, or accept?
  3. check-shebang-scripts-are-executable deferred to post-merge gate. Add now?

@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots area:Python wrapping Python bindings for a class labels May 4, 2026
@@ -0,0 +1,25 @@
# IOMeshSTL — files that migrate into ITK at Modules/IO/MeshSTL/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there reason this list is kept per remote?

@@ -0,0 +1,539 @@
#!/usr/bin/env python3
"""sanitize-history.py — Apply ITK formatting and commit-prefix conventions
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like "sanitize history" script already exists somewhere.

hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 4, 2026
Many ITK remote modules' itk-module.cmake reads
  file(READ "${MY_CURRENT_DIR}/README.rst" DOCUMENTATION)
to populate the CMake DESCRIPTION variable from the upstream README.

The v4 whitelist intentionally excludes the upstream README (the
operator ships a new README.md as part of the ingest PR), so without
this rewrite every commit's CMakeLists fails to configure post-ingest
until the operator manually patches itk-module.cmake.

sanitize-history.py now rewrites the file(READ ... README.rst ...)
reference to README.md as part of the cmake-blob transform, so every
historical commit on the rewritten branch is independently buildable.

Surfaced on the IOMeshSTL ingest (PR following PR InsightSoftwareConsortium#6204).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
….txt

In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the
per-module declaration is dead code.  Pre-emptive cleanup matching the
v4 ingest-pipeline rule codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
…mage

In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version.  The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code.  Pre-emptive cleanup
matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the
per-module declaration is dead code.  Matches the v4 ingest-pipeline
rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the
per-module declaration is dead code.  Matches the v4 ingest-pipeline
rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
…shMZ3

In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version.  The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code.  The 'set(ITK_DIR
${CMAKE_BINARY_DIR})' line that was inside the in-tree else branch
is preserved (now unconditional) since it actually had effect during
the in-tree build path.  Matches the v4 ingest-pipeline rules in
PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
@hjmjohnson hjmjohnson added this to the ITK 6.0 Release Candidate 1 milestone May 5, 2026
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
…shMZ3

In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version.  The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code.  The 'set(ITK_DIR
${CMAKE_BINARY_DIR})' line that was inside the in-tree else branch
is preserved (now unconditional) since it actually had effect during
the in-tree build path.  Matches the v4 ingest-pipeline rules in
PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson added a commit that referenced this pull request May 5, 2026
Four already-merged remote-module ingests still carry their
upstream-original boilerplate at the top of their module CMakeLists:

  - cmake_minimum_required(VERSION X.Y.Z) — redundant; ITK's top-level
    CMakeLists pins a higher minimum.
  - if(NOT ITK_SOURCE_DIR) ... find_package(ITK) ... else()
    itk_module_impl() endif() — dead code in-tree because
    ITK_SOURCE_DIR is always defined.

Strip both from:

  Modules/Filtering/AnisotropicDiffusionLBR
  Modules/Filtering/Cuberille
  Modules/Filtering/LabelErodeDilate
  Modules/Registration/Montage

Lines that had effect inside the in-tree else() branch
(set(ITK_DIR ${CMAKE_BINARY_DIR}) on AnisotropicDiffusionLBR and
Montage) are preserved unconditionally.

The other four merged ingests (GenericLabelInterpolator, MGHIO,
FastBilateral, MeshNoise) were already cleaned up during their
respective ingest PRs.

Matches the v4 ingest-pipeline rules now codified in PR #6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard); future ingests will not introduce
this idiom.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
…mage

In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version.  The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code.  Pre-emptive cleanup
matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
…mage

In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version.  The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code.  Pre-emptive cleanup
matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 5, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the
per-module declaration is dead code.  Matches the v4 ingest-pipeline
rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 6, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the
per-module declaration is dead code.  Matches the v4 ingest-pipeline
rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 6, 2026
Four already-merged remote-module ingests still carry their
upstream-original boilerplate at the top of their module CMakeLists:

  - cmake_minimum_required(VERSION X.Y.Z) — redundant; ITK's top-level
    CMakeLists pins a higher minimum.
  - if(NOT ITK_SOURCE_DIR) ... find_package(ITK) ... else()
    itk_module_impl() endif() — dead code in-tree because
    ITK_SOURCE_DIR is always defined.

Strip both from:

  Modules/Filtering/AnisotropicDiffusionLBR
  Modules/Filtering/Cuberille
  Modules/Filtering/LabelErodeDilate
  Modules/Registration/Montage

Lines that had effect inside the in-tree else() branch
(set(ITK_DIR ${CMAKE_BINARY_DIR}) on AnisotropicDiffusionLBR and
Montage) are preserved unconditionally.

The other four merged ingests (GenericLabelInterpolator, MGHIO,
FastBilateral, MeshNoise) were already cleaned up during their
respective ingest PRs.

Matches the v4 ingest-pipeline rules now codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard); future ingests will not introduce
this idiom.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 6, 2026
….txt

In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the
per-module declaration is dead code.  Pre-emptive cleanup matching the
v4 ingest-pipeline rule codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 6, 2026
…shMZ3

In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version.  The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code.  The 'set(ITK_DIR
${CMAKE_BINARY_DIR})' line that was inside the in-tree else branch
is preserved (now unconditional) since it actually had effect during
the in-tree build path.  Matches the v4 ingest-pipeline rules in
PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 6, 2026
…mage

In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version.  The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code.  Pre-emptive cleanup
matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 6, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the
per-module declaration is dead code.  Matches the v4 ingest-pipeline
rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 8, 2026
Phase A v4 ingest of the standalone remote module
ntustison/ITKAdaptiveDenoising into Modules/Filtering/AdaptiveDenoising/.
Whitelisted upstream history was sanitized via the v4 pipeline
(see InsightSoftwareConsortium#6204), then stripped of all raw
.nrrd test fixtures via 'git filter-repo --invert-paths' so that no
raw binary blobs land in ITK's git history.  Sibling .cid stubs are
added in a follow-up commit; their fetch path is published via
ITKTestingData PR InsightSoftwareConsortium#48.

Stacked on the pixi 0.68 multi-line cmd compatibility fix from InsightSoftwareConsortium#6236.
hjmjohnson added 15 commits May 8, 2026 11:47
Splits v3's single-driver flow into two phases.
Phase A (ingest-module-v4.sh): local + reversible; no upstream pushes.
Phase B (archive-remote-module.sh): only after Phase A merges; pushes
upstream removal commit + promotes MIGRATION_README.md to README.md
+ flips archived=true.

Each historical commit re-formatted by sanitize-history.py
(clang-format / black / gersemi / trailing-ws / EOF / line-ending)
and ghostflow-conformed (subject <=78, blank line, exec-bit cleared
on text). Persistent artifact: per-module whitelist.
Strips Dockerfile*, .github/*, azure-pipelines*, appveyor.yml,
circle.yml, .gitlab-ci.yml, .travis.yml from history so CI
scaffolding doesn't leak into the ingested subtree.
Validated on Cuberille (InsightSoftwareConsortium#6205): ghostflow now reports only the
unavoidable upstream root-commit error.
- truncate_subject_to_body() for subjects >78 chars
- ensure_blank_line_after_subject() inserts missing blank line
- commit_callback clears exec-bit on text-classified blobs
- deny-pass strips *.orig / *.rej / *.BACKUP.* / *.LOCAL.* /
  *.REMOTE.* / *.BASE.*
Many remote modules' itk-module.cmake reads README.rst via file(READ).
After ingest the .rst is stripped (README.md takes over per the
archival convention), so sanitize rewrites the file(READ) target to
README.md to keep itk-module.cmake configuring cleanly.
…terns

Documents the two recurring review patterns:
- ghostflow's single unfixable root-commit error per ingest
- Greptile draft-review request before promoting to ready-for-review
Strip ExternalData/ caches from history so cached binaries don't
leak through the whitelist boundary.
Many remote modules wrap their CMakeLists.txt in
'if(NOT ITK_SOURCE_DIR) ... else() ... endif()' so the standalone
build differs from the ITK-embedded build. After ingest the
standalone branch is dead code; sanitize unwraps the else-branch
(the ITK-embedded path) and drops the guard.
Adds a table separating what sanitize-history handles automatically
from what the operator must fix manually after Phase A merges.
After ingest the module's cmake_minimum_required is dead (top-level
ITK CMakeLists owns it). Sanitize strips it so multiple floors
don't compete.
…cmake

Replace 'file(READ ${...}/README.md DESCRIPTION)' in module
CMakeLists with a static set(DOCUMENTATION ...) so the module
configures cleanly even before the README is laid down.
@hjmjohnson hjmjohnson force-pushed the ingest-strategy-v4 branch from c7768a5 to d43cb90 Compare May 8, 2026 16:48
hjmjohnson added 2 commits May 8, 2026 13:24
Adds five fixups that reviewers have flagged on every recent ingest
(PRs InsightSoftwareConsortium#6238 TextureFeatures, InsightSoftwareConsortium#6240 PrincipalComponentsAnalysis):

  * itk_module_examples()   — examples/ is not ingested
  * cmake_policy(CMP...)    — ITK top-level pins all policies
  * cmake_dependent_option(Module_${X}_BUILD_DOCUMENTATION ...)
    and the matching if(...) add_subdirectory(doc) endif() block
  * per-module LICENSE / COPYING — ITK's root LICENSE applies in-tree
  * ${DOCUMENTATION} kept as-is when sourced from a literal
    set(DOCUMENTATION ...) block; only the file(READ README.md ...)
    form is replaced by a static one-liner.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 9, 2026
Replace the set(DOCUMENTATION "...") + DESCRIPTION "${DOCUMENTATION}"
indirection with a static one-liner literal.

itk_module() is a CMake macro, so ${ARGN} re-tokenizes its arguments
and list-splits any embedded ";". A future edit adding a semicolon or
"[" to the DOCUMENTATION string would silently produce spurious
"Unknown argument" AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111
on every configure (see PRs InsightSoftwareConsortium#6220, InsightSoftwareConsortium#6245).

The v4 ingestion pipeline (PR InsightSoftwareConsortium#6204) enforces this via
sanitize-history.py:patch_dynamic_description.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 9, 2026
Replace the set(DOCUMENTATION "...") + DESCRIPTION "${DOCUMENTATION}"
indirection with a static one-liner literal.

itk_module() is a CMake macro, so ${ARGN} re-tokenizes its arguments
and list-splits any embedded ";". A future edit adding a semicolon or
"[" to the DOCUMENTATION string would silently produce spurious
"Unknown argument" AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111
on every configure (see PRs InsightSoftwareConsortium#6220, InsightSoftwareConsortium#6245).

The v4 ingestion pipeline (PR InsightSoftwareConsortium#6204) enforces this via
sanitize-history.py:patch_dynamic_description.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 9, 2026
Replace the set(DOCUMENTATION "...") + DESCRIPTION "${DOCUMENTATION}"
indirection with a static one-liner literal.

itk_module() is a CMake macro, so ${ARGN} re-tokenizes its arguments
and list-splits any embedded ";". A future edit adding a semicolon or
"[" to the DOCUMENTATION string would silently produce spurious
"Unknown argument" AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111
on every configure (see PRs InsightSoftwareConsortium#6220, InsightSoftwareConsortium#6245).

The v4 ingestion pipeline (PR InsightSoftwareConsortium#6204) enforces this via
sanitize-history.py:patch_dynamic_description.
Two ghostflow-check-main violations slipped through v4 sanitization on
TextureFeatures ingest (PR InsightSoftwareConsortium#6238):

1. .sha512 / .cid content-link sidecars without trailing newline.
   is_skip_content() returned True for any single-line hex hash blob,
   bypassing apply_universal_text_fixers (and therefore fix_end_of_file).
   Hash content has zero risk from the universal fixers (rstrip + single
   '\n' append), so drop the hex-hash skip branch and let those blobs
   flow through the normal text-fixer path.

2. Editor backup files (*~) survived ingestion. Adds **/*~ to the
   filter-repo --invert-paths deny-pass alongside *.orig / *.rej /
   *.BACKUP.* / *.LOCAL.* / *.REMOTE.* / *.BASE.*.

Documented both in INGESTION_STRATEGY_v4.md's sanitizer table.
hjmjohnson added a commit that referenced this pull request May 9, 2026
Drop the file(READ README.md DOCUMENTATION) preamble and replace
DESCRIPTION "${DOCUMENTATION}" with a static one-liner.

itk_module() is a CMake macro, so ARGN re-tokenizes its arguments and
list-splits any embedded ';' characters. The PolarTransform README
contains 4 semicolons, producing 4 spurious "Unknown argument"
AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111 on every
configure.

Same canonical fix applied in PR #6220 to RLEImage, SplitComponents,
IOFDF, IOMeshMZ3, IOMeshSTL; PolarTransform was missed because it
landed in parallel. The v4 ingestion pipeline (PR #6204) prevents
this regression via sanitize-history.py:patch_dynamic_description.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Python wrapping Python bindings for a class type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants