Skip to content

feat(health): add VelocityHealthCheck for K8s readiness probe (#35602)#35771

Open
dsolistorres wants to merge 2 commits into
mainfrom
issue-35602-velocity-health-check
Open

feat(health): add VelocityHealthCheck for K8s readiness probe (#35602)#35771
dsolistorres wants to merge 2 commits into
mainfrom
issue-35602-velocity-health-check

Conversation

@dsolistorres
Copy link
Copy Markdown
Member

Summary

Closes #35602 (follow-up implementation from spike #35329, companion to #35601 / #35768).

Adds a readiness-only health check that verifies the Velocity engine's global macro library is registered. When the library failed to load (see spike #35329), engine.init() can return successfully while every public page renders #renderMarks / #editContentlet as literal text. This check surfaces that state through /readyz so K8s can stop routing traffic to the affected pod.

Defense-in-depth alongside the fail-loud flag in #35601: that flag handles the common case (init throws → existing ServletContainerHealthCheck gates traffic via dotcms.started.up). This check covers the opt-out case (velocimacro.library.fail-on-missing=false) and any future post-init macro-registry corruption.

Changes

  • VelocityHealthCheck (new)com.dotcms.health.checks.cdi.VelocityHealthCheck extends HealthCheckBase. Evaluates the probe #renderMarks($null) through VelocityUtil.getEngine() and returns DOWN when the rendered output contains the literal #renderMarks( marker.
  • CoreHealthCheckProvider — register the new check alongside the existing CDI checks (database, cache, elasticsearch).
  • VelocityHealthCheckTest (new) — three unit tests covering the UP path, the literal-rendering DOWN path, and a propagated-exception DOWN path.

Design notes

  • isLivenessCheck() returns false — a one-time init failure should remove the pod from the load balancer, not trigger a restart loop.
  • isReadinessCheck() returns true.
  • getOrder() returns 110 — runs after database (default 100), cache (30), and elasticsearch (40).
  • Default mode is PRODUCTION (no monitor-mode downgrade) — when this is DOWN the system genuinely cannot render correctly.
  • Skips during shutdown.
  • The check is small and cheap — 0ms in practice on a healthy engine.

Test plan

  • VelocityHealthCheckTest — 3/3 pass:
    • UP when probe renders macro body output
    • DOWN when probe renders the literal directive (macro not registered)
    • DOWN when VelocityUtil.getEngine() throws
  • Local smoke test on a real dotCMS startup: just dev-run, then GET /dotmgt/health:
    {
      "name": "velocity",
      "status": "UP",
      "durationMs": 0,
      "message": "Velocity macro library registered (#renderMarks resolved) [0ms]",
      "monitorModeApplied": false
    }
    Confirmed alongside the other 10 health checks (database, cache, elasticsearch, etc.).

🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 20, 2026

Claude finished @dsolistorres's task in 3m 27s —— View job


🔍 dotCMS Backend Review — Complete

  • Load dotCMS conventions (CLAUDE.md + backend docs)
  • Get Java diff between base and head commits (e1f9d744...946a40f6)
  • Spawn 4 specialized sub-agents in parallel (Security, Database, Java Standards, REST API)
  • Consolidate findings and post review comment (review)
  • Submit formal GitHub review — skipped: this integration cannot submit formal PR approvals (FAQ)

Sub-agent results

Domain Result
Security NO_FINDINGS
Database NO_FINDINGS
Java Standards NO_FINDINGS
REST API 1 Low-severity observation (below Critical/High/Medium threshold — not surfaced)

The REST API sub-agent noted that the IllegalStateException message in VelocityHealthCheck.java:87-89 references public GitHub issue numbers (#35329 / #35601) which propagate to /readyz and /dotmgt/health responses. The issues are public in an open-source repo and serve a valid diagnostic purpose, so this was rated Low and falls below the reporting threshold. Mentioned here only for transparency — no action required.

Outcome

No Critical or High severity issues. The standalone backend review comment shows: ✅ no issues found.
· Branch: issue-35602-velocity-health-check

@dsolistorres dsolistorres self-assigned this May 20, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 20, 2026

dotCMS Backend Review: no issues found.

@dsolistorres
Copy link
Copy Markdown
Member Author

Heads up — the intermittent Postman failures in Template_Resource and AI collections (with language_id = -1 FK violation) are unrelated to this PR. Tracked separately in #35780, which includes the root-cause analysis and three suggested fixes (preferred: fail-fast 400 in WorkflowResource instead of silent -1L default).

Integration Tests - Junit5 Suite 1 passed in the same run (it needs dotCMS up and serving requests), so startup isn't impacted by the new VelocityHealthCheck. A re-run should clear the matrix.

@dsolistorres dsolistorres force-pushed the issue-35602-velocity-health-check branch from 1600fcb to 2503fb3 Compare May 21, 2026 16:31
dsolistorres added a commit that referenced this pull request May 21, 2026
…35602)

Addresses PR #35771 review feedback — a specific unchecked type
communicates intent more clearly than a raw checked Exception. The
surrounding measureExecution(...) machinery still catches it as Exception,
so behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dsolistorres added a commit that referenced this pull request May 21, 2026
…35602)

Addresses PR #35771 review feedback — a specific unchecked type
communicates intent more clearly than a raw checked Exception. The
surrounding measureExecution(...) machinery still catches it as Exception,
so behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dsolistorres dsolistorres force-pushed the issue-35602-velocity-health-check branch from 771e368 to c166ce0 Compare May 21, 2026 17:21
dsolistorres added a commit that referenced this pull request May 21, 2026
…35602)

Addresses PR #35771 review feedback — a specific unchecked type
communicates intent more clearly than a raw checked Exception. The
surrounding measureExecution(...) machinery still catches it as Exception,
so behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dsolistorres dsolistorres force-pushed the issue-35602-velocity-health-check branch from c166ce0 to dd5f22b Compare May 21, 2026 17:26
@dsolistorres dsolistorres force-pushed the issue-35602-velocity-health-check branch from dd5f22b to 02679cb Compare May 21, 2026 19:52
dsolistorres added a commit that referenced this pull request May 21, 2026
…35602)

Addresses PR #35771 review feedback — a specific unchecked type
communicates intent more clearly than a raw checked Exception. The
surrounding measureExecution(...) machinery still catches it as Exception,
so behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dsolistorres dsolistorres force-pushed the issue-35602-velocity-health-check branch from 02679cb to ebe1e38 Compare May 21, 2026 22:20
dsolistorres added a commit that referenced this pull request May 21, 2026
…35602)

Addresses PR #35771 review feedback — a specific unchecked type
communicates intent more clearly than a raw checked Exception. The
surrounding measureExecution(...) machinery still catches it as Exception,
so behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dsolistorres and others added 2 commits May 21, 2026 18:49
Adds a readiness-only health check that verifies the Velocity engine's global
macro library is registered by evaluating "#renderMarks(\$null)" through
VelocityUtil.getEngine() and detecting the literal directive marker in the
output. When the macro library failed to load (see spike #35329), the engine
reports init success while every public page renders macros as literal text;
this check surfaces that state through /readyz so K8s can stop routing traffic.

isLivenessCheck() returns false — a one-time init failure should remove the
pod from the load balancer, not trigger a restart loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…35602)

Addresses PR #35771 review feedback — a specific unchecked type
communicates intent more clearly than a raw checked Exception. The
surrounding measureExecution(...) machinery still catches it as Exception,
so behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dsolistorres dsolistorres force-pushed the issue-35602-velocity-health-check branch from ebe1e38 to 946a40f Compare May 22, 2026 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Add VelocityHealthCheck for Kubernetes readiness probe (follow-up to spike #35329)

2 participants