Skip to content

Add TableConfigValidator and InstanceConfigValidator SPI for batch restart enforcement#18167

Open
suvodeep-pyne wants to merge 2 commits intoapache:masterfrom
suvodeep-pyne:spyne/data-707-batch-restart-validation-spi
Open

Add TableConfigValidator and InstanceConfigValidator SPI for batch restart enforcement#18167
suvodeep-pyne wants to merge 2 commits intoapache:masterfrom
suvodeep-pyne:spyne/data-707-batch-restart-validation-spi

Conversation

@suvodeep-pyne
Copy link
Copy Markdown
Contributor

Summary

  • Adds TableConfigValidator and InstanceConfigValidator SPI interfaces in pinot-spi with list-based registries, enabling pre-mutation validation of table and instance configs
  • Adds InstanceUtils.toInstance(InstanceConfig) reverse converter for the updateInstanceTags validation path
  • Wires validator call sites into PinotTableRestletResource, PinotHelixResourceManager, PinotInstanceRestletResource, and TableConfigsRestletResource
  • Introduces ConfigValidationException (extends RuntimeException) mapped to HTTP 400 at the REST boundary

Test plan

  • InstanceUtilsTest.testToInstance() — roundtrip fidelity for all 4 instance types + unknown type error
  • InstanceConfigValidatorRegistryTest — 5 tests: no-op when empty, rejection propagates, short-circuit, ordering, reset
  • TableConfigValidatorRegistryTest — 5 tests: same coverage as above
  • PinotHelixResourceManagerConfigValidationTest — 4 mocked tests: addInstance/updateInstance/updateInstanceTags rejection before persistence, passing validator allows persistence

…estart enforcement

Introduces validation SPI interfaces in pinot-spi for intercepting table and
instance config mutations before persistence. This enables StarTree to enforce
batch restart invariants (pool tags, replica groups) at mutation time rather
than only at restart time.

Changes:
- TableConfigValidator/InstanceConfigValidator interfaces in pinot-spi
- List-based registries (CopyOnWriteArrayList, first-rejection short-circuits)
- ConfigValidationException for rejection signaling (maps to HTTP 400)
- InstanceUtils.toInstance() reverse converter (InstanceConfig -> Instance)
- Validator call sites in PinotTableRestletResource, PinotHelixResourceManager,
  PinotInstanceRestletResource, TableConfigsRestletResource
- Unit tests for registries, toInstance() roundtrip, and resource manager
  validation wiring with mocked Helix dependencies
…onUtils

Consolidate SPI validation into TableConfigValidationUtils.validateTableConfig()
so every caller gets it automatically. This also adds the missing validation
to the copyTable endpoint, which previously had a TODO placeholder.
@xiangfu0 xiangfu0 added the enhancement Improvement to existing functionality label Apr 11, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds SPI hooks to validate table and instance configuration mutations before they’re persisted, enabling enforcement of batch restart / mutation rules and returning validation failures as client errors.

Changes:

  • Introduces TableConfigValidator / InstanceConfigValidator SPIs and in-memory registries in pinot-spi.
  • Adds ConfigValidationException for validator rejections and wires validator invocations into controller mutation paths.
  • Adds InstanceUtils.toInstance(InstanceConfig) to reconstruct Instance objects for validation (notably for updateInstanceTags) and adds unit tests for the new behavior.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/TableConfigValidator.java New SPI interface for table config mutation validation.
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/TableConfigValidatorRegistry.java Registry for table validators; invoked before persistence.
pinot-spi/src/main/java/org/apache/pinot/spi/config/instance/InstanceConfigValidator.java New SPI interface for instance mutation validation.
pinot-spi/src/main/java/org/apache/pinot/spi/config/instance/InstanceConfigValidatorRegistry.java Registry for instance validators; invoked before persistence.
pinot-spi/src/main/java/org/apache/pinot/spi/exception/ConfigValidationException.java Exception type used by validators to reject mutations.
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java Calls instance validators for add/update/updateTags prior to Helix persistence.
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/TableConfigValidationUtils.java Calls table validators as part of existing table config validation flow.
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/TableConfigsRestletResource.java Calls table validators when validating composite TableConfigs.
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotTableRestletResource.java Adds table config validation on the copy-table flow.
pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java Maps ConfigValidationException to HTTP 400 for instance mutation endpoints.
pinot-common/src/main/java/org/apache/pinot/common/utils/config/InstanceUtils.java Adds toInstance(InstanceConfig) reverse conversion for validation usage.
pinot-common/src/test/java/org/apache/pinot/common/utils/config/InstanceUtilsTest.java Tests round-trip fidelity and unknown-prefix handling for toInstance.
pinot-spi/src/test/java/org/apache/pinot/spi/config/table/TableConfigValidatorRegistryTest.java Tests validator registry semantics (ordering, short-circuit, reset).
pinot-spi/src/test/java/org/apache/pinot/spi/config/instance/InstanceConfigValidatorRegistryTest.java Tests validator registry semantics (ordering, short-circuit, reset).
pinot-controller/src/test/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManagerConfigValidationTest.java Verifies validators reject before persistence and allow persistence when passing.

Comment on lines +363 to +364
TableConfigValidationUtils.validateTableConfig(
realtimeTableConfig, schema, null, _pinotHelixResourceManager, _controllerConf, _pinotTaskManager);
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyTable() now calls TableConfigValidationUtils.validateTableConfig(...), which can throw ConfigValidationException for a user-supplied config. The method currently catches the exception under the generic catch (Exception) and returns HTTP 500, which contradicts the intended “mapped to 400 at the REST boundary” behavior and will look like a server error to clients. Handle ConfigValidationException (and/or wrap it) as a BAD_REQUEST response here, consistent with other table mutation endpoints.

Copilot uses AI. Check for mistakes.
Comment on lines +188 to +191
// Read host/port from ZNRecord simple fields (source of truth, updated by updateHelixInstanceConfig)
String host = instanceConfig.getHostName();
int port = Integer.parseInt(instanceConfig.getPort());

Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toInstance() reads host directly from instanceConfig.getHostName(). InstanceUtils.getServerAdminEndpoint() already documents/handles a legacy case where the hostname can be stored as Server_<hostname>. If such legacy configs exist, toInstance() will reconstruct an Instance with an incorrect host value, which can cause validators (and any downstream logic) to behave incorrectly. Consider applying the same backward-compatible stripping when type == SERVER (and potentially other types if applicable), or otherwise normalizing the host field before constructing the Instance.

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 86.15385% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.13%. Comparing base (397704e) to head (e6cf8b3).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...er/api/resources/PinotInstanceRestletResource.java 0.00% 6 Missing ⚠️
...pinot/spi/exception/ConfigValidationException.java 50.00% 2 Missing ⚠️
...oller/api/resources/PinotTableRestletResource.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18167   +/-   ##
=========================================
  Coverage     63.13%   63.13%           
  Complexity     1616     1616           
=========================================
  Files          3213     3216    +3     
  Lines        195730   195795   +65     
  Branches      30240    30246    +6     
=========================================
+ Hits         123569   123621   +52     
- Misses        62288    62298   +10     
- Partials       9873     9876    +3     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.10% <86.15%> (-0.01%) ⬇️
java-21 63.09% <86.15%> (+0.01%) ⬆️
temurin 63.13% <86.15%> (+<0.01%) ⬆️
unittests 63.13% <86.15%> (+<0.01%) ⬆️
unittests1 55.36% <96.15%> (+<0.01%) ⬆️
unittests2 34.78% <72.30%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-signal issues inline.

}
}
TaskConfigUtils.validateTaskConfigs(tableConfigs.getOffline(), schema, _pinotTaskManager, typesToSkip);
TableConfigValidatorRegistry.validate(offlineTableConfig, schema);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validator still runs inside validateConfig(), but addConfig()/updateConfig() call tuneConfig() only afterwards. That means the SPI is evaluating a different TableConfig than the one that is actually persisted, unlike the /tables endpoints where tuning happens before validation. A validator can therefore reject or allow the same mutation depending on which API the caller uses. If this SPI is meant to protect the stored config, it needs to run after tuning or in PinotHelixResourceManager.

}

// Read host/port from ZNRecord simple fields (source of truth, updated by updateHelixInstanceConfig)
String host = instanceConfig.getHostName();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinot still carries legacy server InstanceConfigs whose hostName is stored as Server_ (the codebase already has compatibility handling for that form). toInstance() returns that prefixed hostname verbatim, so validators reached from updateInstanceTags() will observe host=Server_localhost and any recomputed Helix id becomes Server_Server_localhost_. On upgraded clusters this breaks tag-update validation for older server instances unless the hostname is normalized here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants