DLPX-96312 Add InfluxDB/Telegraf infrastructure for Engine Performance Analytics #119

Open
dbshah12 wants to merge 4 commits into develop from dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb

dbshah12 commented Mar 31, 2026

Design Doc

Problem

Telegraf is already collecting engine performance metrics and writing them to local JSON files on the appliance. However, there is no local time-series database to store and serve these metrics, making it difficult for tools like DCT Smart Proxy to query historical performance data from the engine directly.

Solution

Add InfluxDB 2.x infrastructure to the appliance, mirroring the existing Telegraf setup pattern. This includes:

  • influxdb/influxdb.toml — InfluxDB daemon config: bound to 127.0.0.1:8086, with bolt/engine paths matching the installed package (/var/lib/influxdb/). Named .toml (not .conf) because InfluxDB uses the Viper config library, which determines the file format from the extension — .conf is not recognized and is silently ignored, causing influxd to fall back to defaults (~/.influxdbv2/).
  • influxdb/influxdb-init.conf — Tunable init config (org, bucket, retention period, readiness wait parameters) sourced by the init script. Change values here without touching the script.
  • influxdb/delphix-influxdb-init — One-time init script that:
    • Exits immediately if /etc/influxdb/influxdb_meta already exists (safe on upgrades and reboots).
    • Waits for InfluxDB to be ready via the /health endpoint.
    • Calls /api/v2/setup to create the org, bucket, and admin credentials (one-shot; uses curl directly, no influx CLI dependency).
    • Is crash-safe: persists a setup state file (including the generated admin password) immediately after /api/v2/setup so a re-run resumes without repeating the one-shot call and without mismatching the stored password.
    • Creates a write-only token for Telegraf and a read-only token for DCT Smart Proxy.
    • Writes the [[outputs.influxdb_v2]] stanza (with the write token, chmod 640) to /etc/telegraf/telegraf.outputs.influxdb and touches /etc/telegraf/INFLUXDB_ENABLED to enable it by default.
    • Atomically writes /etc/influxdb/influxdb_meta (chmod 600) containing: INFLUXDB_ORG, INFLUXDB_BUCKET, INFLUXDB_ADMIN_USER, INFLUXDB_ADMIN_PASSWORD, INFLUXDB_WRITE_TOKEN, INFLUXDB_READ_TOKEN.
  • influxdb/delphix-influxdb-service — Wrapper that starts influxd with INFLUXD_CONFIG_PATH=/etc/influxdb/influxdb.toml in the background, runs the init script, then waits on the daemon PID.
  • influxdb/delphix-influxdb.service — Systemd unit following the same structure as delphix-telegraf.service (PartOf=delphix.target, Restart=on-failure, runs as root).
  • influxdb/perf_influxdb — Toggle script (mirrors perf_playbook) to enable/disable InfluxDB metric output from Telegraf without stopping InfluxDB itself. Manages the /etc/telegraf/INFLUXDB_ENABLED flag and restarts Telegraf.
  • telegraf/delphix-telegraf-service — Updated to conditionally append telegraf.outputs.influxdb to the assembled telegraf.conf when the INFLUXDB_ENABLED flag exists (same pattern as PLAYBOOK_ENABLED).
  • telegraf/telegraf.base — Replaced the old commented-out v1 InfluxDB stanza with a comment pointing to perf_influxdb enable|disable.
  • debian/rules — Installs all influxdb files: scripts to /usr/bin/, systemd unit to /lib/systemd/system/, configs to /etc/influxdb/.
  • debian/control — Added influxdb2 and curl to Depends (see notes below).
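The crash-safe behaviour described for delphix-influxdb-init can be illustrated with a minimal Python sketch. The real script is a shell script that calls curl, so everything here is hypothetical: the names run_init, wait_until_ready and atomic_write, the JSON state-file layout, and the injected probe, setup and create_token callables (which stand in for the /health, /api/v2/setup and token-creation calls). Storing the admin token in the state file is an assumption made so that a resumed run can still create tokens.

```python
import json
import os
import secrets
import tempfile
import time
from pathlib import Path


def wait_until_ready(probe, attempts=30, delay=1.0):
    """Poll probe() until it returns True; give up after `attempts` tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False


def atomic_write(path, text, mode=0o600):
    """Write to a temp file in the same directory, then rename into place."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w") as f:
            os.fchmod(f.fileno(), mode)
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def run_init(meta, state, probe, setup, create_token):
    """Return 'skipped', 'not-ready', or 'initialized'."""
    if meta.exists():
        return "skipped"  # already initialized: safe on upgrades and reboots
    if not wait_until_ready(probe, attempts=3, delay=0):
        return "not-ready"
    if state.exists():
        # A previous run crashed after setup: reuse the stored password.
        saved = json.loads(state.read_text())
    else:
        password = secrets.token_urlsafe(24)
        admin_token = setup(password)  # one-shot call, never repeated
        saved = {"password": password, "admin_token": admin_token}
        atomic_write(state, json.dumps(saved))
    # Only a subset of the real influxdb_meta keys is written in this sketch.
    fields = {
        "INFLUXDB_ADMIN_PASSWORD": saved["password"],
        "INFLUXDB_WRITE_TOKEN": create_token(saved["admin_token"], "write"),
        "INFLUXDB_READ_TOKEN": create_token(saved["admin_token"], "read"),
    }
    atomic_write(meta, "".join(f"{k}={v}\n" for k, v in fields.items()))
    state.unlink()
    return "initialized"
```

If token creation fails after setup has succeeded, the state file remains and the meta file does not, so the next run skips setup and resumes with the same password.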

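All InfluxDB configuration files are installed to /etc/influxdb/, mirroring the /etc/telegraf/ convention used for Telegraf; the scripts and the systemd unit go to /usr/bin/ and /lib/systemd/system/ as listed above.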
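The flag-gated assembly described for telegraf/delphix-telegraf-service can be sketched as follows. This is an illustration in Python rather than the shell wrapper itself. The function name assemble_telegraf_conf is hypothetical, and the sketch ignores the PLAYBOOK_ENABLED handling and any other fragments the real wrapper concatenates.

```python
from pathlib import Path


def assemble_telegraf_conf(etc_dir):
    """Build telegraf.conf from telegraf.base, adding the InfluxDB output
    stanza only when the INFLUXDB_ENABLED flag file is present."""
    etc = Path(etc_dir)
    parts = [(etc / "telegraf.base").read_text()]
    stanza = etc / "telegraf.outputs.influxdb"
    # Assumption: a missing stanza file is treated the same as a missing flag.
    if (etc / "INFLUXDB_ENABLED").exists() and stanza.is_file():
        parts.append(stanza.read_text())
    conf = "\n".join(p.rstrip("\n") for p in parts) + "\n"
    (etc / "telegraf.conf").write_text(conf)
    return conf
```

Because the stanza is appended at assembly time, removing the flag and restarting Telegraf is enough to drop the output without touching InfluxDB.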

Notes to Reviewers

Runtime dependency decisions (debian/control)

When someone runs apt install performance-diagnostics, APT checks each package listed in Depends:

  • If already installed → skip (no reinstall, no harm).
  • If not installed → automatically download and install it.

So listing a package that is already present on the appliance is always safe — it is simply a no-op.

The init script (delphix-influxdb-init) relies on curl, openssl, and python3 at runtime. Here is why only curl is explicitly added to Depends:

| Dependency | Decision | Reason |
| --- | --- | --- |
| openssl | Not added | Already a runtime dependency of delphix-platform (debian/rules), so it is guaranteed on every Delphix appliance. |
| python3 | Not added | Already present via python3-minimal in our existing Depends. |
| curl | Added | Only in delphix-platform's Build-Depends (build-time only, not a guaranteed runtime dependency), so it is declared explicitly here to be safe. |
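A reviewer who wants to confirm that the three runtime tools are actually on PATH on a given appliance could use a small check like the one below. It is only a sketch: the helper name missing_tools is hypothetical and is not part of the package.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    # The init script is described as needing these three at runtime.
    print(missing_tools(["curl", "openssl", "python3"]))
```

An empty list means all three tools resolve on PATH.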

Why influxdb.toml instead of influxdb.conf

InfluxDB 2.x uses Viper for config parsing, which determines the file format from the extension. Only .json, .toml, .yaml, and .yml are recognized — .conf is silently ignored and influxd falls back to defaults (~/.influxdbv2/ for root). Verified on InfluxDB v2.8.0: INFLUXD_CONFIG_PATH=influxdb.conf → paths/settings ignored; INFLUXD_CONFIG_PATH=influxdb.toml → config fully respected.
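The extension rule can be modelled in a few lines of Python. This mirrors only the behaviour described above (four recognized extensions, anything else ignored). It is not Viper's actual code, and the name config_format is hypothetical.

```python
from pathlib import Path

# Extensions listed above as recognized, mapped to the format they select.
RECOGNIZED = {".json": "json", ".toml": "toml", ".yaml": "yaml", ".yml": "yaml"}


def config_format(path):
    """Return the config format implied by the file extension, or None
    when the extension is not recognized (the silent-fallback case)."""
    return RECOGNIZED.get(Path(path).suffix)
```

Under this model, config_format("/etc/influxdb/influxdb.conf") returns None, which corresponds to influxd ignoring the file and using its defaults.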

Testing Done

/etc/influxdb# ls -l
total 4
-rw-r--r-- 1 root root  86 Mar 31 09:56 config.toml
-rw-r--r-- 1 root root 357 Mar 31 09:19 influxdb-init.conf
-rw-r--r-- 1 root root 274 Mar 31 09:19 influxdb.toml
-rw------- 1 root root 347 Mar 31 12:24 influxdb_meta

/etc/influxdb# ls -l /var/lib/influxdb
total 23
drwxr-x--- 5 influxdb influxdb      5 Mar 31 12:46 engine
-rw------- 1 influxdb influxdb  65536 Mar 31 12:46 influxd.bolt
-rw-r----- 1 influxdb influxdb      4 Mar 31 12:22 influxd.pid
-rw-r----- 1 influxdb influxdb 122880 Mar 31 12:23 influxd.sqlite
  • InfluxDB setup also completed, and data is visible in the InfluxDB UI (screenshot taken 2026-03-31 at 6:27 PM).

perf_influxdb enable/disable testing

| Test | Result |
| --- | --- |
| INFLUXDB_ENABLED flag exists on fresh boot | |
| telegraf.outputs.influxdb exists with correct perms (-rw-r-----) | |
| Telegraf loaded influxdb_v2 output on boot | |
| perf_influxdb disable removes flag and telegraf.conf has no influxdb_v2 stanza | |
| After disable, Telegraf loaded file (4x) only (no influxdb output) | |
| perf_influxdb enable recreates flag and telegraf.conf has stanza with real token | |
| After enable, Telegraf loaded influxdb_v2 output | |
| Non-root user blocked with clear error (must be run as root) | |
| No errors in journalctl | |
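The toggle behaviour exercised by these tests can be sketched as below. The actual perf_influxdb is a shell script. The function name toggle, the injected restart callable (standing in for a Telegraf restart) and the euid override used for testing are all assumptions made for illustration.

```python
import os
from pathlib import Path


def toggle(action, etc_dir, restart, euid=None):
    """Enable or disable the InfluxDB output flag, then restart Telegraf.

    Returns True when the flag exists afterwards, False otherwise.
    """
    if (os.geteuid() if euid is None else euid) != 0:
        raise PermissionError("must be run as root")
    flag = Path(etc_dir) / "INFLUXDB_ENABLED"
    if action == "enable":
        flag.touch()
    elif action == "disable":
        flag.unlink(missing_ok=True)  # disabling twice is not an error
    else:
        raise ValueError("usage: perf_influxdb enable|disable")
    restart()  # Telegraf re-assembles telegraf.conf on start
    return flag.exists()
```

Only the flag file changes here. InfluxDB itself keeps running, which matches the intent of disabling metric output without stopping the database.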

dbshah12 force-pushed the dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb branch 2 times, most recently from bb1bd01 to 985a3ac, on March 31, 2026 08:49
dbshah12 requested a review from Copilot on March 31, 2026 08:53

dbshah12 force-pushed the dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb branch from 6286549 to 2a39e0c on March 31, 2026 09:16
dbshah12 marked this pull request as ready for review on March 31, 2026 10:57
dbshah12 marked this pull request as draft on March 31, 2026 11:01
dbshah12 force-pushed the dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb branch 2 times, most recently from bad0342 to df102c9, on March 31, 2026 13:03
dbshah12 marked this pull request as ready for review on March 31, 2026 13:07
dbshah12 requested a review from sebroy on March 31, 2026 13:07
dbshah12 self-assigned this on Mar 31, 2026
dbshah12 requested a review from Copilot on April 1, 2026 05:37

dbshah12 force-pushed the dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb branch 3 times, most recently from 34acba1 to 02cc5df, on April 1, 2026 06:28
dbshah12 requested a review from Copilot on April 1, 2026 06:29

delphix deleted a comment from Copilot AI on Apr 1, 2026
dbshah12 force-pushed the dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb branch from 02cc5df to 7095d33 on April 1, 2026 06:43

delphix deleted a comment from Copilot AI on Apr 2, 2026
delphix deleted a comment from Copilot AI on Apr 2, 2026
dbshah12 force-pushed the dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb branch from 0f08795 to 53b635b on April 7, 2026 12:00
dbshah12 force-pushed the dlpx/pr/dbshah12/5d79e679-49b6-4c0a-8241-1c919bfcaedb branch from 53b635b to e5e0587 on April 8, 2026 15:51