
switch external-snapshotter to downstream VSC-only branch #32

Open
kneumoin wants to merge 41 commits into main from 63742164


@kneumoin

Description

Why do we need it, and what problem does it solve?

What is the expected result?

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

AleksZimin and others added 30 commits December 15, 2025 15:42
Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
- Read StorageClass via APIReader only
- Remove local Condition/Reason constants
- Enforce UID barrier for OwnerReferences
- Unify terminal Ready semantics
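
One plausible reading of the "UID barrier" item above is that ownership is confirmed by UID and not by name alone, so a parent that was deleted and recreated under the same name is not mistaken for the original owner. The sketch below illustrates that reading only; `OwnerRef` and `ownedBy` are hypothetical stand-ins and are not taken from this PR.

```go
package main

// OwnerRef is a simplified, hypothetical stand-in for an owner reference:
// only the fields needed for the comparison are modelled.
type OwnerRef struct {
	Kind string
	Name string
	UID  string
}

// ownedBy reports whether refs contains an owner whose kind, name AND UID
// all match. Comparing the UID is the "barrier": a same-named object with a
// different UID (for example, one recreated after deletion) does not match.
func ownedBy(refs []OwnerRef, kind, name, uid string) bool {
	for _, r := range refs {
		if r.Kind == kind && r.Name == name && r.UID == uid {
			return true
		}
	}
	return false
}
```

With a reference recorded for UID `uid-old`, `ownedBy(refs, "ObjectKeeper", "keeper-1", "uid-new")` returns false even though kind and name match.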
Replace all occurrences of fox.flant.com/deckhouse/storage/storage-foundation
with github.com/deckhouse/storage-foundation in go.mod files and imports.

- Update module paths in api/go.mod and images/controller/go.mod
- Update all import statements across the codebase
- Fix missing imports in test files
- Remove unused getVolumeMode, getSize, checkAndHandleTTL functions
- Fix TTL scanner runnable logic in AddVolumeRestoreRequestControllerToManager
- Improve test reliability: strict terminal state checks, proper Status subresource usage
- Update ensureObjectKeeper comment for UID handling
…sioner.

This change allows external-provisioner to react to VolumeRestoreRequest objects and perform a restore operation by creating a CSI volume and corresponding Kubernetes PersistentVolume and PersistentVolumeClaim.

Key changes:
  • Introduce a VRR handler wired via a dynamic informer
  • Restore volumes from:
      • VolumeSnapshotContent using CSI VolumeContentSource_Snapshot
      • PersistentVolume using CSI clone (VolumeContentSource_Volume)
  • Enforce strict driver filtering: VRR is processed only when StorageClass.provisioner matches the current provisioner
  • Extract and reuse PV creation logic from existing provisioner code paths
  • Ensure idempotent behavior:
      • Handle cases where PV already exists from a previous attempt
      • Safely continue to PVC creation even after partial execution
      • Tolerate concurrent PV creation races
  • Correctly propagate CSI semantics:
      • Volume capabilities, access modes, volume ing for MULTI_NODE_READER_ONLY
      • Topology information to PV node affinity
  • Emit Kubernetes Events for success and failure
  • Do not update VRR status or manage restore lifecycle

The external-provisioner performs best-effort execution only; retry policy, lifecycle management, and terminal state handling are delegated to higher-level controllers.
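
The driver filter and the idempotent PV creation described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `pvStore` is an in-memory stand-in for the Kubernetes API, and `ErrAlreadyExists`, `shouldProcess`, and `ensurePV` are hypothetical names.

```go
package main

import (
	"errors"
	"sync"
)

// ErrAlreadyExists mimics an "already exists" API error (hypothetical).
var ErrAlreadyExists = errors.New("already exists")

// shouldProcess is the driver filter: a request is handled only when the
// StorageClass provisioner is set and equals this provisioner's driver name.
func shouldProcess(scProvisioner, driverName string) bool {
	return scProvisioner != "" && scProvisioner == driverName
}

// pvStore is an in-memory stand-in for the API server (assumption).
type pvStore struct {
	mu  sync.Mutex
	pvs map[string]string // PV name -> CSI volume handle
}

func newPVStore() *pvStore { return &pvStore{pvs: map[string]string{}} }

// Create stores the PV, or fails with ErrAlreadyExists if the name is taken.
func (s *pvStore) Create(name, handle string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.pvs[name]; ok {
		return ErrAlreadyExists
	}
	s.pvs[name] = handle
	return nil
}

// ensurePV creates the PV and treats "already exists" as success, so a retry
// after partial execution, or a lost creation race, can continue to the next
// step. created is true only when this call actually created the PV.
func ensurePV(s *pvStore, name, handle string) (created bool, err error) {
	err = s.Create(name, handle)
	if errors.Is(err, ErrAlreadyExists) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```

Calling `ensurePV` twice with the same name returns `(true, nil)` and then `(false, nil)`; in both cases the caller may proceed to PVC creation.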
Transform VRRHandler from stateless executor to operation controller that
owns the restore lifecycle. This aligns with ADR v2 where external-provisioner
is responsible for restore execution, terminal decisions, and VRR.status updates.

Key changes:
- Add waitForPVCBound() to poll PVC status until Bound or timeout
- Add updateVRRStatus() with retry-on-conflict for robust status updates
- Add updateVRRReady() and updateVRRFailed() helpers
- Add isTerminalVRR() check to skip already-terminal VRRs
- Fix execution order: driver filter MUST be first executable step after
  minimal validation (security invariant for multi-CSI clusters)
- Update RBAC to grant write permissions for volumerestorerequests/status
- Add terminal VRR check after driver filter to avoid logging about
  VRRs from other drivers
- Add idempotency check (PVC Get) after driver filter to prevent accessing
  resources from other drivers
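
The polling and retry-on-conflict behaviour listed above can be sketched with the standard library alone. The helpers below only share the spirit of `waitForPVCBound()` and `updateVRRStatus()`; their signatures, the `ErrConflict` sentinel, and the `"Bound"` string check are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// ErrConflict mimics an optimistic-concurrency conflict (hypothetical).
var ErrConflict = errors.New("conflict")

// waitForBound calls getPhase immediately and then once per interval until
// it reports "Bound", returns an error, or the timeout expires.
func waitForBound(timeout, interval time.Duration, getPhase func() (string, error)) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		phase, err := getPhase()
		if err != nil {
			return err
		}
		if phase == "Bound" {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // timeout reached before Bound
		case <-ticker.C:
		}
	}
}

// retryOnConflict calls update up to attempts times. Only ErrConflict is
// retried; success or any other error is returned immediately. If every
// attempt conflicts, the last conflict error is returned.
func retryOnConflict(attempts int, update func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = update()
		if !errors.Is(err, ErrConflict) {
			return err
		}
	}
	return err
}
```

In a real controller the `update` closure would typically re-read the object before each attempt so that the write is based on the latest resource version.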

Testing:
- Add fake dynamic client support for VRR status updates in tests
- Add PVC reactor to automatically set Phase=Bound when VolumeName is set
- Add addVRRToDynamicClient() helper for test setup
- All existing tests pass with new controller behavior

Documentation:
- Update vrr-restore-implementation-plan.md to reflect operation controller role
- Remove "single-writer contract" and "stateless executor" references
- Clarify retry semantics and terminal state ownership
- Document polling rationale for waitForPVCBound

Vendor:
- Add storage-foundation/api module to vendor for -mod=vendor compatibility

This completes Stage 6 (PVC Bound waiting and status updates) from the
implementation plan.
Signed-off-by: Pavel Karpov <pavel.karpov@flant.com>
Signed-off-by: Pavel Karpov <pavel.karpov@flant.com>
krpsh123 and others added 8 commits January 20, 2026 14:23
Signed-off-by: Pavel Karpov <pavel.karpov@flant.com>
Document targets[] and status.dataRefs[], aggregate Ready semantics, one VCR per logical SnapshotContent, and the contract that unblocks state-snapshotter PR-4.
Loop spec.targets with one VSC per target, incremental status.dataRefs[],
aggregate Ready (Completed / TargetsPending / whole-VCR failure), and
retainer-vcr-{vcrUID} ObjectKeeper naming. Detach stays single-target.
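
As an illustration of the aggregate Ready semantics named in this commit message, the sketch below folds per-target states into one condition. Only the reasons `Completed` and `TargetsPending` come from the message; the `targetState` type, the `"Failed"` reason, the status strings, and the treatment of an empty target list are assumptions.

```go
package main

// targetState is a simplified per-target outcome (hypothetical type).
type targetState int

const (
	targetPending targetState = iota // no data reference recorded yet
	targetDone                       // data reference recorded for this target
	targetFailed                     // unrecoverable error for this target
)

// aggregateReady folds per-target states into one Ready status and reason:
//   - any failed target fails the whole request;
//   - all targets done (and at least one target) means Completed;
//   - otherwise the request is still pending.
func aggregateReady(states []targetState) (status, reason string) {
	done := 0
	for _, s := range states {
		switch s {
		case targetFailed:
			return "False", "Failed"
		case targetDone:
			done++
		}
	}
	if len(states) > 0 && done == len(states) {
		return "True", "Completed"
	}
	return "False", "TargetsPending"
}
```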
ObjectKeeper UID guard before VSC create, merge dataRefs inside status
patch retries, sha256-based VSC names, remove markFailedSnapshot wrapper,
and two-artifact TTL cleanup tests.
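
A deterministic, hash-derived name lets a retried reconcile address the same VSC instead of creating a second one, and merging data references by target keeps the incremental status free of duplicates. The sketch below shows one way to do both; the `vsc-` prefix, the hash input (request UID plus target name), the 32-character truncation, and the `dataRef` shape are assumptions, not the naming scheme used in this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// vscName derives a stable name from the request UID and the target name.
// The same inputs always produce the same name.
func vscName(vcrUID, target string) string {
	sum := sha256.Sum256([]byte(vcrUID + "/" + target))
	return "vsc-" + hex.EncodeToString(sum[:])[:32]
}

// dataRef is a simplified stand-in for one status.dataRefs[] entry.
type dataRef struct {
	Target  string
	VSCName string
}

// mergeDataRefs returns existing followed by those entries of added whose
// Target is not already present. Existing entries win on conflict, and the
// original order is preserved.
func mergeDataRefs(existing, added []dataRef) []dataRef {
	seen := make(map[string]bool, len(existing))
	merged := make([]dataRef, 0, len(existing)+len(added))
	for _, r := range existing {
		seen[r.Target] = true
		merged = append(merged, r)
	}
	for _, r := range added {
		if !seen[r.Target] {
			seen[r.Target] = true
			merged = append(merged, r)
		}
	}
	return merged
}
```

Running the merge inside each status-patch retry, against the freshly read object, is what keeps entries written by an earlier attempt from being lost.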
@kneumoin kneumoin self-assigned this May 22, 2026
@kneumoin kneumoin force-pushed the 63742164 branch 5 times, most recently from fa96c0c to a11be14 Compare June 1, 2026 07:51
Ship the VolumeRestoreRequest executor into the csi-provisioner binary
via 002-vrr-executor.patch (applied on upstream external-provisioner
v6.2.0). The executor performs the restore (CSI CreateVolume -> PV/PVC ->
events) and never writes VRR status, which stays owned by the
storage-foundation VRR controller.
Document the temporary vendor-only API bootstrap and the follow-up
needed because werf currently runs rm -rf vendor before go mod download.


3 participants