DLPX-96312 Add InfluxDB/Telegraf infrastructure for Engine Performance Analytics #119
Open
Design Doc
Problem
Telegraf is already collecting engine performance metrics and writing them to local JSON files on the appliance. However, there is no local time-series database to store and serve these metrics, making it difficult for tools like DCT Smart Proxy to query historical performance data from the engine directly.
Solution
Add InfluxDB 2.x infrastructure to the appliance, mirroring the existing Telegraf setup pattern. This includes:
- influxdb/influxdb.toml: InfluxDB daemon config, bound to 127.0.0.1:8086, with bolt/engine paths matching the installed package (/var/lib/influxdb/). Named .toml (not .conf) because InfluxDB uses the Viper config library, which determines the file format from the extension; .conf is not recognized and is silently ignored, causing influxd to fall back to defaults (~/.influxdbv2/).
- influxdb/influxdb-init.conf: Tunable init config (org, bucket, retention period, readiness wait parameters) sourced by the init script. Change values here without touching the script.
- influxdb/delphix-influxdb-init: One-time init script that:
  - Exits early if /etc/influxdb/influxdb_meta already exists (safe on upgrades and reboots).
  - Waits for InfluxDB to report ready on the /health endpoint.
  - Calls /api/v2/setup to create the org, bucket, and admin credentials (one-shot; uses curl directly, no influx CLI dependency).
  - [...] /api/v2/setup so a re-run resumes without repeating the one-shot call and without mismatching the stored password.
  - Writes a [[outputs.influxdb_v2]] stanza (with the write token, chmod 640) to /etc/telegraf/telegraf.outputs.influxdb and touches /etc/telegraf/INFLUXDB_ENABLED to enable it by default.
  - Writes /etc/influxdb/influxdb_meta (chmod 600) containing: INFLUXDB_ORG, INFLUXDB_BUCKET, INFLUXDB_ADMIN_USER, INFLUXDB_ADMIN_PASSWORD, INFLUXDB_WRITE_TOKEN, INFLUXDB_READ_TOKEN.
- influxdb/delphix-influxdb-service: Wrapper that starts influxd with INFLUXD_CONFIG_PATH=/etc/influxdb/influxdb.toml in the background, runs the init script, then waits on the daemon PID.
- influxdb/delphix-influxdb.service: Systemd unit following the same structure as delphix-telegraf.service (PartOf=delphix.target, Restart=on-failure, runs as root).
- influxdb/perf_influxdb: Toggle script (mirrors perf_playbook) to enable/disable InfluxDB metric output from Telegraf without stopping InfluxDB itself. Manages the /etc/telegraf/INFLUXDB_ENABLED flag and restarts Telegraf.
- telegraf/delphix-telegraf-service: Updated to conditionally append telegraf.outputs.influxdb to the assembled telegraf.conf when the INFLUXDB_ENABLED flag exists (same pattern as PLAYBOOK_ENABLED).
- telegraf/telegraf.base: Replaced the old commented-out v1 InfluxDB stanza with a comment pointing to perf_influxdb enable|disable.
- debian/rules: Installs all influxdb files: scripts to /usr/bin/, systemd unit to /lib/systemd/system/, configs to /etc/influxdb/.
- debian/control: Added influxdb2 and curl to Depends (see notes below).

All InfluxDB configuration files are installed to /etc/influxdb/, mirroring the /etc/telegraf/ convention used for Telegraf.

Notes to Reviewers
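For reviewers tracing the init flow listed under Solution, the idempotency guard and the readiness wait could look roughly like the sketch below. It is an illustration, not the script in this PR: wait_until_ready, needs_init, META_FILE, INFLUX_URL, and the retry values are hypothetical; only the meta file path, the bind address, and the /health endpoint come from the description.

```shell
# Hypothetical sketch; function and variable names are illustrative only.
META_FILE="${META_FILE:-/etc/influxdb/influxdb_meta}"
INFLUX_URL="${INFLUX_URL:-http://127.0.0.1:8086}"

# Run the given probe command until it succeeds, at most max_attempts times,
# sleeping delay seconds between attempts. Returns 1 if it never succeeds.
wait_until_ready() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Succeeds only when no meta file exists yet, i.e. init has not completed.
needs_init() {
    [ ! -e "$META_FILE" ]
}

# Example wiring (assumed values, not executed here):
#   needs_init || exit 0
#   wait_until_ready 30 2 curl -fsS -o /dev/null "$INFLUX_URL/health" || exit 1
```

Passing the probe as a command keeps the retry loop independent of curl, so the same helper can be exercised without a running daemon.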
Runtime dependency decisions (debian/control)
When someone runs apt install performance-diagnostics, APT checks each package listed in Depends: a package that is already installed is left alone, and a missing one is installed. So listing a package that is already present on the appliance is always safe; it is simply a no-op.
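Depends covers install time. Whether the tools a script needs are actually on PATH at runtime can be confirmed with a check along these lines. This is a hypothetical helper, not part of the PR, and missing_tools is an illustrative name.

```shell
# Print each named tool that is not found on PATH, one per line.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example (not executed here):
#   missing_tools curl openssl python3 || echo "tools listed above are missing" >&2
```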
The init script (delphix-influxdb-init) relies on curl, openssl, and python3 at runtime. Here is why only curl is explicitly added to Depends:
- openssl: delphix-platform (debian/rules); guaranteed on every Delphix appliance.
- python3: python3-minimal in our existing Depends.
- curl: delphix-platform's Build-Depends (build-time only, not a guaranteed runtime dep), so explicitly declared here to be safe.

Why influxdb.toml instead of influxdb.conf
InfluxDB 2.x uses Viper for config parsing, which determines the file format from the extension. Only .json, .toml, .yaml, and .yml are recognized; .conf is silently ignored and influxd falls back to defaults (~/.influxdbv2/ for root). Verified on InfluxDB v2.8.0:
- INFLUXD_CONFIG_PATH=influxdb.conf → paths/settings ignored
- INFLUXD_CONFIG_PATH=influxdb.toml → config fully respected

Testing Done
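A guard like the following could catch a config path whose extension would not be recognized before influxd starts. It is a sketch that simply encodes the four extensions listed in the note on influxdb.toml; config_ext_recognized is a hypothetical name, not part of the PR.

```shell
# Succeeds when the path ends in an extension listed as recognized
# (.json, .toml, .yaml, .yml); fails for anything else, including .conf.
config_ext_recognized() {
    case "$1" in
        *.json|*.toml|*.yaml|*.yml) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here):
#   config_ext_recognized "$INFLUXD_CONFIG_PATH" || echo "config would be ignored" >&2
```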
perf_influxdb enable/disable testing
- INFLUXDB_ENABLED flag exists on fresh boot
- telegraf.outputs.influxdb exists with correct perms (-rw-r-----)
- [...] influxdb_v2 output on boot
- perf_influxdb disable removes flag and telegraf.conf has no influxdb_v2 stanza
- [...] file (4x) only (no influxdb output)
- perf_influxdb enable recreates flag and telegraf.conf has stanza with real token
- [...] influxdb_v2 output
- [...] must be run as root)
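The enable/disable behavior exercised above comes down to flag-gated assembly of telegraf.conf. A minimal sketch of that pattern follows, assuming the assembled file is the base file plus the optional output stanza; the real delphix-telegraf-service assembles more pieces, and assemble_conf and has_influxdb_output are hypothetical names.

```shell
# Hypothetical sketch of flag-gated assembly; the real script may differ.
# Writes the base config to stdout and appends the InfluxDB output stanza
# only when the INFLUXDB_ENABLED flag file exists in the given directory.
assemble_conf() {
    local dir="$1"
    cat "$dir/telegraf.base"
    if [ -e "$dir/INFLUXDB_ENABLED" ]; then
        cat "$dir/telegraf.outputs.influxdb"
    fi
}

# Succeeds when the config read from stdin contains the influxdb_v2 stanza.
has_influxdb_output() {
    grep -qF '[[outputs.influxdb_v2]]'
}

# Example (not executed here):
#   assemble_conf /etc/telegraf | has_influxdb_output && echo "influxdb output enabled"
```

Because the flag is only consulted at assembly time, removing it takes effect after Telegraf is restarted and the config is reassembled, which matches the description of perf_influxdb restarting Telegraf.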