Add config-driven readiness check that fails deployment when an OSGi plugin does not start

## Description

dotCMS starts OSGi (Felix) bundles asynchronously during boot. `InitServlet` submits `OSGIUtil.initializeFramework()` to a background thread (when `START_CLIENT_OSGI_IN_SEPARATE_THREAD=true`, the default), and Felix file-install then resolves and starts each `.jar` in the deploy folder in parallel. Because of this, **`InitServlet` completing does not imply that plugins reached `ACTIVE`** — a regressed plugin in a new image can produce a pod that appears "up" while the plugin is silently stuck in `INSTALLED` or `RESOLVED`.

In a rolling deployment (Kubernetes, etc.), this is a cutover hazard: the new pod passes readiness, traffic shifts to it, and a critical plugin is dark.

We need a configuration switch that prevents a new deployment from going live if an installed OSGi plugin has not successfully started.

## Acceptance Criteria

- [ ] New readiness health check (`osgi-bundles`) that walks the OSGi framework and requires every non-system, non-fragment bundle to be `Bundle.ACTIVE`
- [ ] Configurable grace period (default 5 minutes) that begins when `OSGIUtil.initializeFramework()` completes — during the grace window the check reports `UP` even if bundles are still starting
- [ ] After the grace window expires, any bundle not `ACTIVE` returns `DOWN` with the failing bundle's symbolic name and state in the message
- [ ] Fragment bundles excluded (they legitimately never reach `ACTIVE`)
- [ ] Optional `required.bundles` allow-list (CSV of symbolic names) for cases where only specific plugins are deployment-critical
- [ ] Registered as a **readiness** check only — never liveness — so a broken plugin blocks the rollout but does not restart-loop the pod
- [ ] Honors existing `HealthCheckMode` convention: `PRODUCTION` / `MONITOR_MODE` / `DISABLED`
- [ ] Structured health data exposes `bundlesByState`, `notActiveBundles`, `elapsedSinceInitMs`, `gracePeriodMs` for ops visibility
- [ ] Unit test coverage for the four key transitions: pre-init, within-grace, post-grace-with-failure, post-grace-all-active, plus the fragment-exclusion case and required-bundles missing case

## Configuration

All keys are settable as `DOT_`-prefixed environment variables — dots and hyphens both map to underscores via `Config.envKey()`.

| Property key | Env var | Default | Purpose |
|---|---|---|---|
| `health.check.osgi-bundles.mode` | `DOT_HEALTH_CHECK_OSGI_BUNDLES_MODE` | `PRODUCTION` | Standard health-check safety mode (`PRODUCTION` / `MONITOR_MODE` / `DISABLED`) |
| `health.check.osgi-bundles.grace.period.ms` | `DOT_HEALTH_CHECK_OSGI_BUNDLES_GRACE_PERIOD_MS` | `300000` | Time window after OSGi init in which bundles may still be starting |
| `health.check.osgi-bundles.required.bundles` | `DOT_HEALTH_CHECK_OSGI_BUNDLES_REQUIRED_BUNDLES` | _(empty)_ | Optional CSV of symbolic names; when empty, every non-system, non-fragment bundle is required |

`MONITOR_MODE` is the recommended initial rollout path: the check logs and reports the failure for one or two releases without actually blocking pods, then ops flips to `PRODUCTION` once confident.

## Why readiness, not hard fail

`System.exit` on init-thread failure is a worse contract than failing readiness:

- Readiness failure: Kubernetes/the LB simply never sends traffic to the new pod, the old (working) pod keeps serving, and rollout halts on its own. No request loss.
- Hard fail: forces a CrashLoopBackOff state, can trigger restart-storm alerting, and is harder to recover from cleanly.

Both achieve "do not cut over." Readiness is strictly better.

## Out of scope

- Detecting bundles that are `ACTIVE` but whose internal services are broken (a deeper plugin-specific contract — not enforceable from outside the bundle).

## Priority

Medium

Property key	Env var	Default	Purpose
`health.check.osgi-bundles.mode`	`DOT_HEALTH_CHECK_OSGI_BUNDLES_MODE`	`PRODUCTION`	Standard health-check safety mode (`PRODUCTION` / `MONITOR_MODE` / `DISABLED`)
`health.check.osgi-bundles.grace.period.ms`	`DOT_HEALTH_CHECK_OSGI_BUNDLES_GRACE_PERIOD_MS`	`300000`	Time window after OSGi init in which bundles may still be starting
`health.check.osgi-bundles.required.bundles`	`DOT_HEALTH_CHECK_OSGI_BUNDLES_REQUIRED_BUNDLES`	(empty)	Optional CSV of symbolic names; when empty, every non-system, non-fragment bundle is required

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add config-driven readiness check that fails deployment when an OSGi plugin does not start #35806

Description

Acceptance Criteria

Configuration

Why readiness, not hard fail

Out of scope

Priority

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Add config-driven readiness check that fails deployment when an OSGi plugin does not start #35806

Description

Description

Acceptance Criteria

Configuration

Why readiness, not hard fail

Out of scope

Priority

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions