
[voltdb] Switch to native VoltDB Python client #23667

Open
akhanzode wants to merge 17 commits into DataDog:master from akhanzode:voltdb-native-client

@akhanzode akhanzode commented May 11, 2026

What does this PR do?

Adds the native VoltDB Python client (binary protocol on the VoltDB client port, default 21212) as a new transport for the integration, alongside the existing HTTP/JSON transport that talks to the VoltDB Management Center (VMC).

The transport is selected by which option is set in voltdb.d/conf.yaml:

  • url: → HTTP/JSON via VMC (existing behavior, kept verbatim)
  • host: / hosts: → native binary client (new)

For multi-node clusters, the new hosts: option takes a list of hostname or hostname:port strings; the Agent connects to the first reachable entry and silently fails over to the others if a node becomes unavailable:

instances:
  - hosts:
      - voltdb-1.example:21212
      - voltdb-2.example:21212
      - voltdb-3.example:21212
    username: datadog-agent
    password: "<PASSWORD>"
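The selection rule and the `hostname` / `hostname:port` parsing described above can be sketched roughly as follows. This is a minimal illustration, not the integration's actual code: `parse_endpoint` and `select_transport` are hypothetical names, the behavior when both `url` and `host` are set is an assumption (the description does not say), and IPv6 literals are not handled.

```python
DEFAULT_NATIVE_PORT = 21212  # VoltDB client port default, per the PR description


def parse_endpoint(entry, default_port=DEFAULT_NATIVE_PORT):
    """Split 'hostname' or 'hostname:port' into a (host, port) tuple."""
    host, sep, port_text = entry.strip().rpartition(":")
    if not sep:
        # No colon at all: the whole entry is the hostname.
        return port_text, default_port
    if not host or not port_text.isdecimal() or not 0 < int(port_text) < 65536:
        raise ValueError(f"invalid endpoint: {entry!r}")
    return host, int(port_text)


def select_transport(instance):
    """Return ('http', url) or ('native', [(host, port), ...])."""
    # Assumption: `url` wins if it is set alongside `host`/`hosts`.
    if instance.get("url"):
        return "http", instance["url"]
    # `hosts` takes precedence over `host` when both are set.
    entries = instance.get("hosts") or ([instance["host"]] if instance.get("host") else [])
    if not entries:
        raise ValueError("set one of `url`, `host`, or `hosts`")
    return "native", [parse_endpoint(entry) for entry in entries]
```

With the example configuration above, `select_transport` would return the `native` mode with three `(host, 21212)` endpoints in the order listed.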

Statistics columns are now resolved by name against the VoltDB response metadata, so the check tolerates VoltDB releases that add or drop columns to @Statistics outputs (verified against 14.2 and 15.3 locally).
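A rough sketch of what resolving columns by name could look like, assuming a result table object that exposes `columns` (each with a `name`) and `tuples`. The helper names `rows_as_dicts` and `pick` are hypothetical, and the column names in the usage note are only illustrative.

```python
def rows_as_dicts(table):
    """Yield each tuple of a result table as a {column_name: value} dict."""
    names = [column.name for column in table.columns]
    for row in table.tuples:
        yield dict(zip(names, row))


def pick(row, wanted):
    """Keep only the wanted columns that this VoltDB release actually returned."""
    return {name: row[name] for name in wanted if name in row}
```

Because lookups go through the name, a release that inserts a column ahead of, say, `TUPLE_COUNT` does not shift the value the check reads, and a dropped column is simply absent from the `pick` result instead of raising an index error.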

Not a breaking change

Existing url: configurations continue to work without any modifications. This PR is additive:

  • Every option the previous releases supported (url, username, password, password_hashed, tls_cert, tls_ca_cert, tls_verify, tls_private_key, tls_ciphers, proxy, skip_proxy, headers, extra_headers, connect_timeout, read_timeout, timeout, request_size, etc.) is preserved via the same instances/http template the integration used before.
  • A user with an existing url: http://vmc.example:8080 config (or pointing at the database HTTP port) does not need to change anything. The check picks the HTTP transport and emits the same metrics it always has against the same endpoint.
  • The Datadog Agent's user account, role assignment, and password rotation policy on the VoltDB side are unchanged — both transports authenticate against the same <user> entries in deployment.xml.

The changelog entry is added (not changed/removed), so this is a minor version bump (6.4.1 → 6.5.0).

Motivation

Submitted on behalf of Volt Active Data. The native Python client is the supported integration surface for direct database-node connections and exposes the same @Statistics data over a stable binary protocol. Adding it as an option (without removing HTTP/VMC) lets operators choose the right transport for their topology: native for direct connections to nodes, HTTP via VMC when only the management interface is reachable.

Verified live (VoltDB 14.2 + 15.3, single-node)

| Mode | Setup | Result |
| --- | --- | --- |
| Native, single host | host: localhost | 44 metric families / 184 series, service check OK |
| Native, multi-host failover | hosts: [dead.example:21212, localhost:21212] | First entry skipped with a warning, localhost picks up, 44 families / 232 series |
| HTTP via VMC | url: http://localhost:8080 | 44 metric families / 208 series, service check OK |

Plus 35 unit tests covering: URL→HTTP-mode dispatch, host→native-mode dispatch, hosts-list expansion, hosts: precedence over host:, port-parse validation, failover behavior in the native client, end-to-end HTTP request shape (via requests.Session.get mock), defensive @Statistics column-by-name lookup, procedure_timeout default of 60s.

Notes for reviewers

  • The voltdb/tests/compose/docker-compose.yaml matrix file changes its published port from 8080 (HTTP) to 21212 (native) so the integration tests can exercise the new transport. The existing TLS/non-TLS variants in hatch.toml are kept.
  • voltdb/datadog_checks/voltdb/config_models/instance.py is autogenerated from assets/configuration/spec.yaml; run ddev validate config -s voltdb --sync and ddev validate models -s voltdb --sync to regenerate if you tweak the spec.
  • The password_hashed config-side affordance only takes effect in HTTP mode (the native voltdbclient library always hashes the cleartext password client-side, so a pre-hashed value can't be wired through that auth handshake). README and conf example call this out.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Migrate from the deprecated HTTP/JSON interface to the native voltdbclient
Python package, which speaks the binary protocol on the VoltDB client port.

Instance configuration now takes 'host' and 'port' (default 21212) instead of
'url'. TLS is configured via 'use_ssl' and 'ssl_config_file'. The legacy 'url'
option is still accepted with a deprecation warning so existing deployments
keep working unchanged: the host is parsed from the URL and the default
native client port is used.

Statistics columns are now resolved by name against the VoltDB response
metadata so the check tolerates VoltDB releases that add or drop columns
to @Statistics outputs.

@github-actions
Contributor

⚠️ Major version bump
The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6abbd0a571

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread voltdb/datadog_checks/voltdb/config.py Outdated
@akhanzode
Author

Confirming the major version bump is intentional. This PR swaps the wire protocol from HTTP/JSON to VoltDB's native binary protocol, which means:

  • Operators must open port 21212 to the Agent host (instead of 8080/8443).
  • The password_hashed option is no longer supported (the native client always hashes cleartext client-side; use the Agent secrets backend to keep credentials off disk).
  • TLS configuration moves from PEM-only options (tls_cert, tls_ca_cert, tls_verify, etc.) to a single ssl_config_file pointing at a VoltDB SSL properties file (supports JKS, PKCS12, and PEM).
  • HTTP-specific options (proxy, headers, auth_type, request_size, Kerberos/NTLM/AWS auth, etc.) are removed because they don't apply to the native protocol.

A changed classification (→ 7.0.0) is correct so operators read the upgrade notes before deploying. The legacy url: config option is still accepted with a deprecation warning so the most common deployment style (url: http://host:8080) keeps working transparently.

Anish Khanzode and others added 5 commits May 11, 2026 12:51
Addresses Codex review feedback: the previous code path passed
procedure_timeout=None to FastSerializer when the option was omitted,
which means 'wait indefinitely'. The old HTTP integration had a 10s
default timeout, so leaving the native client with no default is a
regression that could block a check run forever on a hung procedure.

Default to 60s (matching the example we ship in conf.yaml.example).
Setting procedure_timeout to 0 or any non-positive number restores the
'wait indefinitely' behavior for users who explicitly want it.

The CI 'Lint' step failed because ruff lint/isort is invoked from inside
the integration directory with --config ../pyproject.toml, where the
local package (datadog_checks.voltdb) is treated as first-party and
needs its own import block. Running ruff from the repo root (as I did
locally) didn't catch this. Apply the isort fix that ruff --fix produces
under that working directory.

Also regenerated LICENSE-3rdparty.csv via 'ddev validate licenses --sync'
to add voltdbclient's MIT license entry, which CI flagged as missing.
The other line changes in that file are ddev's current copyright-parser
output for existing entries (no upstream changes), kept to satisfy the
validator.

The previous commits replaced the HTTP/JSON transport with the native
binary client. As feedback noted, some operators connect to VoltDB
through the VoltDB Management Center (VMC) rather than directly to
database nodes — those deployments need the HTTP transport.

Make the transport choice config-driven instead of removing one of
them: setting 'url' selects the HTTP client (talks to VMC), and
setting 'host' selects the native binary client. Everything else
(auth, statistics components, custom queries, tags) is shared.

This restores full backwards compatibility for existing 'url'-based
configs along with all their HTTP-only options (password_hashed,
proxy, tls_cert / tls_ca_cert / tls_verify, headers, etc.) via the
instances/http template, and reclassifies the changelog entry from
'changed' to 'added' since nothing is being removed.

Common response shape: HttpClient wraps the JSON response so the
check code reads response.tables[i].columns[j].name / .tuples on
both paths, with no mode-specific branching in _execute_query_raw.
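One way such a wrapper might look, as a sketch only. It assumes a decoded JSON payload with a `results` list whose entries carry `schema` (a list of `{"name": ...}` objects) and `data` (a list of rows); that payload shape and the function name `wrap_json_response` are assumptions for illustration, not taken from the PR.

```python
from types import SimpleNamespace


def wrap_json_response(payload):
    """Adapt a decoded JSON payload to the .tables / .columns / .tuples shape."""
    tables = [
        SimpleNamespace(
            # Each column only needs a .name for the by-name lookup.
            columns=[SimpleNamespace(name=col["name"]) for col in table["schema"]],
            tuples=table["data"],
        )
        for table in payload.get("results", [])
    ]
    return SimpleNamespace(status=payload.get("status"), tables=tables)
```

The point of the adapter is that calling code can read `response.tables[i].columns[j].name` and `.tuples` without knowing which transport produced the response.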

New unit test 'test_http_mode_end_to_end' patches requests.Session.get
and walks the HTTP code path against a fixture; existing native
tests stay green. 27 unit tests pass; live native mode against
VoltDB 14.2 still emits 44 metric families / 184 series cleanly.

Lets the Agent connect to whichever VoltDB cluster member is reachable
instead of pinning to a single host. New `hosts` instance option takes
a list of `hostname` or `hostname:port` strings; the native Client
tries them in order on each (re)connect and surfaces the last error
only when every endpoint refuses.

Backwards compatible: single-`host:` configs keep working unchanged
(they expand to a one-entry endpoint list). `hosts` takes precedence
when both are set so users can opt into failover with a single add.
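The try-in-order behavior described above might be sketched like this. The `connect` argument is a hypothetical callable standing in for whatever opens a native connection; treating `OSError` as the "unreachable" signal is also an assumption.

```python
import logging

log = logging.getLogger(__name__)


def connect_first_reachable(endpoints, connect):
    """Try each (host, port) in order; return (connection, endpoint) on first success."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    last_error = None
    for host, port in endpoints:
        try:
            return connect(host, port), (host, port)
        except OSError as exc:
            # Skip this endpoint with a warning and move on to the next one.
            log.warning("endpoint %s:%d unreachable: %s", host, port, exc)
            last_error = exc
    # Every endpoint refused: surface only the last error.
    raise last_error
```

Returning the endpoint alongside the connection gives the caller something to record as the active endpoint after a failover.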

Tested live against a local VoltDB 15.3 cluster:
- `host: localhost` -> 44 metric families, 184 series (unchanged).
- `hosts: [dead.example:21212, localhost:21212]` -> dead endpoint is
  skipped with a warning, real cluster picks up, active_endpoint
  correctly reflects 'localhost:21212'.

Also tested live HTTP/VMC mode against the local VMC at port 8080:
44 metric families, 208 series, service check OK. Confirms the HTTP
client wraps VMC's JSON response into the same shape `_execute_query_raw`
expects and the unified code path works for both transports.

@akhanzode
Author

Updating my earlier confirmation: this is no longer a major version bump.

After feedback that the HTTP transport needed to stay available for users connecting through VoltDB Management Center (VMC), I restored the HTTP client as a parallel transport rather than replacing it. Both transports now coexist, with url: selecting HTTP and host:/hosts: selecting the native client. Every config option from previous releases is preserved.

Because nothing is removed and the wire-level behavior of existing url:-based configurations is unchanged, the changelog fragment is now voltdb/changelog.d/23667.added (not .changed). Next release will be a minor version bump (6.4.1 → 6.5.0), and operators on url:-based configs do not need to update anything.

Summary of what's preserved verbatim:

  • url, username, password, password_hashed
  • tls_cert, tls_ca_cert, tls_verify, tls_private_key, tls_ciphers, tls_protocols_allowed, tls_use_host_header, tls_ignore_warning
  • proxy, skip_proxy, headers, extra_headers
  • connect_timeout, read_timeout, timeout, request_size
  • log_requests, persist_connections, allow_redirects, auth_type, use_legacy_auth_encoding

Summary of what's added (new options, all optional):

  • host / hosts (native binary client, with multi-node failover via hosts:)
  • use_ssl, ssl_config_file (native-client TLS via VoltDB SSL properties file)
  • procedure_timeout (per-procedure timeout for the native client; default 60s)
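For procedure_timeout, the commit message earlier in this thread gives the semantics: 60s when the option is omitted, and "wait indefinitely" (passed to the native client as None) when it is 0 or negative. A minimal sketch of that rule, with the hypothetical helper name `resolve_procedure_timeout`:

```python
DEFAULT_PROCEDURE_TIMEOUT = 60.0  # seconds, per the PR description


def resolve_procedure_timeout(instance):
    """Return the timeout in seconds, or None to wait indefinitely."""
    raw = instance.get("procedure_timeout")
    if raw is None:
        # Option omitted: fall back to the default instead of blocking forever.
        return DEFAULT_PROCEDURE_TIMEOUT
    timeout = float(raw)
    # Zero or negative is the explicit opt-in to "wait indefinitely".
    return timeout if timeout > 0 else None
```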

Live-tested against a local VoltDB 15.3 cluster: 44 metric families with url: http://localhost:8080, 44 metric families with host: localhost, and the multi-host failover correctly skips an unreachable first endpoint and connects to the next one. The Major version bump warning the bot posted earlier no longer applies.

Anish Khanzode added 3 commits May 11, 2026 14:17
CI runs ruff 0.11.10 under --config ../pyproject.toml from inside the
integration directory. That older ruff is stricter about the isort
boundary between third-party and first-party imports than the 0.15+
I had installed locally, so what passed on my machine still failed
on the runner with 7 I001 errors across the tests/ tree.

Pinned my local ruff to 0.11.10 to match CI exactly, ran
`ruff check --fix --config ../pyproject.toml .`, and confirmed the
resulting layout is what CI expects. 35 unit tests still pass.

Also picks up the license-header fix on the new http_client.py
(2020-present -> 2026-present, what ddev validate license-headers
--fix expects for a newly added file) and the latest sync of
LICENSE-3rdparty.csv.

@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 95.47739% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.04%. Comparing base (89441e6) to head (67d70b7).

Anish Khanzode added 2 commits May 11, 2026 14:42
voltdbclient supports pointing ssl_config_file at a PEM truststore
without needing a Java keystore properties file. The compose fixtures
already ship ca.pem; just reuse it for both host-side integration
tests (tests/common.py:TLS_CONFIG_FILE) and the agent-container e2e
path (tests/conftest.py:dd_environment). Fixes the FileNotFoundError
for client_ssl.properties on the with-tls matrix variants.

Codecov flagged 53 lines without coverage on the patch. Most were in
client.py's call_procedure retry-once path, type inference, and
close() error swallowing, plus HttpClient's parameter encoding.
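As an illustration of a retry-once path like the one mentioned for call_procedure, here is a generic sketch. `call` and `reconnect` are hypothetical callables, and using `OSError` as the stale-connection signal is an assumption; the real client.py may differ.

```python
def call_with_single_retry(call, reconnect, retriable=(OSError,)):
    """Run call(); on a connection error, reconnect and try exactly once more."""
    try:
        return call()
    except retriable:
        # The connection may have gone stale between check runs.
        reconnect()
        # A second failure propagates to the caller unchanged.
        return call()
```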

New tests added (8):
- test_client_requires_at_least_one_endpoint
- test_client_call_procedure_returns_response
- test_client_call_procedure_retries_once_on_stale_connection
- test_client_raise_for_status (success + VoltDBError path)
- test_client_close_is_idempotent (no-conn + open-then-close)
- test_infer_volt_type_distinguishes_bool_int_float_string
- test_http_client_serializes_list_params_as_json
- test_http_client_raise_for_status

Local unit-test coverage:
  client.py      53% -> 91%  (+38pp)
  http_client.py 89% -> 94%  (+5pp)
  total          88% -> 94%  (+6pp)

All 43 unit tests pass; ruff check and ruff format --diff --check
both green under CI's pinned ruff 0.11.10.

@akhanzode
Author

> Confirming the major version bump is intentional. This PR swaps the wire protocol from HTTP/JSON to VoltDB's native binary protocol, which means:
>
>   • Operators must open port 21212 to the Agent host (instead of 8080/8443).
>   • The password_hashed option is no longer supported (the native client always hashes cleartext client-side; use the Agent secrets backend to keep credentials off disk).
>   • TLS configuration moves from PEM-only options (tls_cert, tls_ca_cert, tls_verify, etc.) to a single ssl_config_file pointing at a VoltDB SSL properties file (supports JKS, PKCS12, and PEM).
>   • HTTP-specific options (proxy, headers, auth_type, request_size, Kerberos/NTLM/AWS auth, etc.) are removed because they don't apply to the native protocol.
>
> A changed classification (→ 7.0.0) is correct so operators read the upgrade notes before deploying. The legacy url: config option is still accepted with a deprecation warning so the most common deployment style (url: http://host:8080) keeps working transparently.

To clarify: this is not a swap. The old protocol (HTTP/HTTPS) works as before; the PR additionally allows the native client to be used by those who disable VMC, or where VMC has been removed from the VoltDB servers in newer versions.

@akhanzode
Author

> ⚠️ Major version bump: The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

The changelog fragment now uses the added type.

@akhanzode
Author

Thanks for the approval, @brett0000FF! 🙏

Two remaining checks need someone with write access to the repo to finish — I tried both as the PR author from my fork and got 403 Forbidden:

  • dependency-wheel-promotion — needs ddev dep promote https://github.com/DataDog/integrations-core/pull/23667 to copy the new voltdbclient wheel from the dev to stable GCS prefix.
  • devflow/mergegate — Datadog-internal Atlas workflow; presumably resolves once dependency-wheel-promotion succeeds.
  • validate-assets — pending against the datadog-assets GitHub App.

When you have a minute, would you mind running the ddev dep promote step (or pointing me to who can)? Happy to drive any final tweaks if anything else surfaces.

Status snapshot:

  • 48 GH checks pass, 0 fail.
  • Codecov patch coverage: 95.48% (570 / 18 miss / 9 partial).
  • ci_passed: True per codecov API.
