Skip to content

Introduce a generic reconciler abstraction to reduce controller boilerplate #257

@felix-kaestner

Description

@felix-kaestner

Summary

All 23+ controllers in this project share a substantial amount of identical boilerplate in their Reconcile() methods. This makes the codebase harder to maintain: any cross-cutting change (e.g. introducing pausing, adding the device locking mechanism, changing how finalizers are handled, adjusting how provider configs are fetched) must be replicated manually across every single controller. This issue proposes introducing a generic reconciler abstraction that encapsulates all common setup, leaving each controller to implement only its domain-specific Reconcile/Finalize logic, and migrating all existing controllers to use it.

Problem

Every controller currently follows the same boilerplate structure in its Reconcile() method:

  1. Fetch the resource (r.Get)
  2. Assert the provider implements the required interface
  3. Fetch the referenced Device object
  4. Check the pause annotation/flag (annotations.IsPaused)
  5. Acquire the per-device lock (r.Locker.AcquireLock) and defer its release
  6. Fetch the device connection (deviceutil.GetDeviceConnection)
  7. Optionally fetch the ProviderConfig
  8. Build a scope object containing device, connection, resource, provider, and config
  9. Handle deletion: check finalizer, call finalize(), remove finalizer
  10. Add finalizer if missing
  11. Initialize conditions if absent
  12. Defer metadata/status patch
  13. Call the domain-specific reconcile() method

This is ~120–150 lines of near-identical code per controller. Past examples of the pain this causes:

  • When the pausing mechanism was introduced, every controller needed the same check added.
  • When the resource locker was introduced for mutual-exclusion on device operations, it had to be wired into every controller independently.
  • When condition initialization semantics changed, all controllers needed updating.

Adding a new controller today means copy-pasting this boilerplate and hoping nothing is missed or subtly different.

Proposed Solution

Introduce a Reconciler in a shared internal/controller package that encodes the full common reconcile flow. Individual controllers implement a small two-method interface:

type Reconciler[O Object, P provider.Provider] interface {
    Reconcile(context.Context, *TypedScope[O, P]) error
    Finalize(context.Context, *TypedScope[O, P]) error
}

The generic reconciler handles all of the common setup steps (1–12 above) and calls Reconcile/Finalize at the appropriate point. A controller is wired in via:

func AsReconciler[T Object, P provider.Provider](
    c client.Client,
    p provider.ProviderFunc,
    locker *resourcelock.ResourceLocker,
    lockName string,
    rec Reconciler[T, P],
) reconcile.Reconciler

The TypedScope struct passed to both methods carries all pre-resolved context: the device, the device connection, the resource itself, the provider instance (already type-asserted), and the optional provider config.

type TypedScope[T client.Object, P provider.Provider] struct {
    Device         *v1alpha1.Device
    Connection     *deviceutil.Connection
    Resource       T
    Provider       P
    ProviderConfig *provider.ProviderConfig
}

The Object interface constrains the generic type parameter to the types this project knows how to handle:

type Object interface {
    client.Object
    metav1.ObjectMetaAccessor
    conditions.Setter

    GetDeviceRef() v1alpha1.LocalObjectReference
    GetProviderConfigRef() *v1alpha1.TypedLocalObjectReference
    GetStatus() any
    InitializeConditions() bool
}

Each API type implements InitializeConditions() bool to initialize its own set of conditions, returning true if any were changed, allowing the generic reconciler to handle the early-return status update without needing to know which condition types each resource uses.

Reference implementation sketch
// SPDX-FileCopyrightText: 2025 SAP SE or an SAP affiliate company and IronCore contributors
// SPDX-License-Identifier: Apache-2.0

package controller

import (
    "context"
    "errors"
    "reflect"
    "time"

    "k8s.io/apimachinery/pkg/api/equality"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    kerrors "k8s.io/apimachinery/pkg/util/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    "github.com/ironcore-dev/network-operator/api/v1alpha1"
    "github.com/ironcore-dev/network-operator/internal/annotations"
    "github.com/ironcore-dev/network-operator/internal/conditions"
    "github.com/ironcore-dev/network-operator/internal/deviceutil"
    "github.com/ironcore-dev/network-operator/internal/provider"
    "github.com/ironcore-dev/network-operator/internal/resourcelock"
)

type Object interface {
    client.Object
    metav1.ObjectMetaAccessor
    conditions.Setter

    GetDeviceRef() v1alpha1.LocalObjectReference
    GetProviderConfigRef() *v1alpha1.TypedLocalObjectReference
    GetStatus() any
    InitializeConditions() bool
}

// Reconciler is a specialized reconciler that acts on instances of [Object].
// Depending on whether the object is being created/updated or deleted, either Reconcile
// or Finalize will be called.
type Reconciler[O Object, P provider.Provider] interface {
    Reconcile(context.Context, *TypedScope[O, P]) error
    Finalize(context.Context, *TypedScope[O, P]) error
}

type reconciler[O Object, P provider.Provider] struct {
    client.Client

    // Provider is the driver that will be used to create & delete the resource.
    Provider provider.ProviderFunc

    // Locker is used to synchronize operations on resources targeting the same device.
    Locker *resourcelock.ResourceLocker

    // LockName is the name used when acquiring/releasing the device lock.
    LockName string

    // Reconciler is the actual reconciler that will be called by this generic reconciler.
    Reconciler Reconciler[O, P]
}

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
//
// For more details about the method shape, read up here:
// - https://ahmet.im/blog/controller-pitfalls/#reconcile-method-shape
func (r *reconciler[O, P]) Reconcile(ctx context.Context, req ctrl.Request) (_ ctrl.Result, reterr error) {
    log := ctrl.LoggerFrom(ctx)
    log.Info("Reconciling resource")

    obj := reflect.New(reflect.TypeOf(*new(O)).Elem()).Interface().(O)
    if err := r.Get(ctx, req.NamespacedName, obj); err != nil {
        if apierrors.IsNotFound(err) {
            log.Info("Resource not found. Ignoring since object must be deleted")
            return ctrl.Result{}, nil
        }
        log.Error(err, "Failed to get resource")
        return ctrl.Result{}, err
    }

    prov, ok := r.Provider().(P)
    if !ok {
        cond := obj.GetConditions()
        if meta.SetStatusCondition(&cond, metav1.Condition{
            Type:    v1alpha1.ReadyCondition,
            Status:  metav1.ConditionFalse,
            Reason:  v1alpha1.NotImplementedReason,
            Message: "Provider does not implement provider.Provider",
        }) {
            obj.SetConditions(cond)
            return ctrl.Result{}, r.Status().Update(ctx, obj)
        }
        return ctrl.Result{}, nil
    }

    device, err := deviceutil.GetDeviceByName(ctx, r, obj.GetNamespace(), obj.GetDeviceRef().Name)
    if err != nil {
        log.Error(err, "Failed to get device")
        return ctrl.Result{}, err
    }

    if annotations.IsPaused(device, obj) {
        log.Info("Reconciliation is paused for this object")
        return ctrl.Result{}, nil
    }

    if err := r.Locker.AcquireLock(ctx, device.Name, r.LockName); err != nil {
        if errors.Is(err, resourcelock.ErrLockAlreadyHeld) {
            log.Info("Device is already locked, requeuing reconciliation")
            return ctrl.Result{RequeueAfter: time.Second * 5}, nil
        }
        log.Error(err, "Failed to acquire device lock")
        return ctrl.Result{}, err
    }
    defer func() {
        if err := r.Locker.ReleaseLock(ctx, device.Name, r.LockName); err != nil {
            log.Error(err, "Failed to release device lock")
            reterr = kerrors.NewAggregate([]error{reterr, err})
        }
    }()

    if !obj.GetDeletionTimestamp().IsZero() {
        if controllerutil.ContainsFinalizer(obj, v1alpha1.FinalizerName) {
            s, err := r.getScope(ctx, obj, device, prov)
            if err != nil {
                log.Error(err, "Failed to get scope for resource")
                return ctrl.Result{}, err
            }
            if err := r.Reconciler.Finalize(ctx, s); err != nil {
                log.Error(err, "Failed to finalize resource")
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(obj, v1alpha1.FinalizerName)
            if err := r.Update(ctx, obj); err != nil {
                log.Error(err, "Failed to remove finalizer from resource")
                return ctrl.Result{}, err
            }
        }
        log.Info("Resource is being deleted, skipping reconciliation")
        return ctrl.Result{}, nil
    }

    if !controllerutil.ContainsFinalizer(obj, v1alpha1.FinalizerName) {
        controllerutil.AddFinalizer(obj, v1alpha1.FinalizerName)
        if err := r.Update(ctx, obj); err != nil {
            log.Error(err, "Failed to add finalizer to resource")
            return ctrl.Result{}, err
        }
        log.Info("Added finalizer to resource")
        return ctrl.Result{}, nil
    }

    orig := obj.DeepCopyObject().(O)
    if obj.InitializeConditions() {
        log.Info("Initializing status conditions")
        return ctrl.Result{}, r.Status().Update(ctx, obj)
    }

    // Always attempt to update the metadata/status after reconciliation
    defer func() {
        if !equality.Semantic.DeepEqual(orig.GetObjectMeta(), obj.GetObjectMeta()) {
            if err := r.Patch(ctx, obj, client.MergeFrom(orig)); err != nil {
                log.Error(err, "Failed to update resource metadata")
                reterr = kerrors.NewAggregate([]error{reterr, err})
            }
            return
        }

        if !equality.Semantic.DeepEqual(orig.GetStatus(), obj.GetStatus()) {
            if err := r.Status().Patch(ctx, obj, client.MergeFrom(orig)); err != nil {
                log.Error(err, "Failed to update status")
                reterr = kerrors.NewAggregate([]error{reterr, err})
            }
        }
    }()

    s, err := r.getScope(ctx, obj, device, prov)
    if err != nil {
        log.Error(err, "Failed to get scope for resource")
        return ctrl.Result{}, err
    }

    if err := r.Reconciler.Reconcile(ctx, s); err != nil {
        log.Error(err, "Failed to reconcile resource")
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

func (r *reconciler[O, P]) getScope(ctx context.Context, obj O, device *v1alpha1.Device, prov P) (*TypedScope[O, P], error) {
    conn, err := deviceutil.GetDeviceConnection(ctx, r, device)
    if err != nil {
        return nil, err
    }

    var cfg *provider.ProviderConfig
    if ref := obj.GetProviderConfigRef(); ref != nil {
        cfg, err = provider.GetProviderConfig(ctx, r, obj.GetNamespace(), ref)
        if err != nil {
            return nil, err
        }
    }

    return &TypedScope[O, P]{
        Device:         device,
        Connection:     conn,
        Resource:       obj,
        Provider:       prov,
        ProviderConfig: cfg,
    }, nil
}

// AsReconciler creates a [reconcile.Reconciler] based on the given [Reconciler].
func AsReconciler[T Object, P provider.Provider](c client.Client, p provider.ProviderFunc, locker *resourcelock.ResourceLocker, lockName string, rec Reconciler[T, P]) reconcile.Reconciler {
    return &reconciler[T, P]{
        Client:     c,
        Provider:   p,
        Locker:     locker,
        LockName:   lockName,
        Reconciler: rec,
    }
}

// TypedScope holds the different objects that are read and used during the reconcile.
type TypedScope[T client.Object, P provider.Provider] struct {
    Device         *v1alpha1.Device
    Connection     *deviceutil.Connection
    Resource       T
    Provider       P
    ProviderConfig *provider.ProviderConfig
}

Acceptance Criteria

  • The Reconciler interface and its unexported reconciler implementation are added to internal/controller/ (or a sub-package).
  • The Object interface is defined, and all existing network resource types satisfy it (via code generation or explicit method implementations on the API types).
  • AsReconciler constructor wires a typed Reconciler[O, P] into the generic machinery.
  • All 23 existing controllers are migrated to use the generic reconciler, with no behavior change.
  • All existing unit/integration tests continue to pass after the migration.
  • A brief note is added to the contributing guide (or controller README) explaining how to write a new controller using the generic reconciler.

References

Metadata

Metadata

Labels

area/switch-automationAutomation processes for network switch management and operations.enhancementNew feature or request
No fields configured for Feature.

Projects

Status
No status

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions