
CSPL-4006: Fix SSL replication port deleted on first indexer cluster join #905

Open
gabrielm-splunk wants to merge 1 commit into splunk:develop from gabrielm-splunk:fix/cspl-4006-ssl-replication-port

@gabrielm-splunk
Contributor

Problem

Customers who configure an SSL replication port via splunk.conf.server.content (e.g. replication_port-ssl://9887) find the stanza deleted after the indexer joins the cluster on first startup. The SSL configuration persists only after a second pod restart, which requires manual intervention.

Customer Configuration

splunk:
  conf:
    server:
      content:
        'replication_port-ssl://9887':
          serverCert: /mnt/peers-splunk-cert/tls.crt
          sslVersions: tls1.2
          disabled: false

Root Cause

The edit cluster-config -replication_port command always writes a [replication_port://PORT] (non-SSL) stanza to server.conf, overwriting the customer's [replication_port-ssl://PORT] stanza.

PR #903 attempted to fix this by omitting -replication_port for SSL deployments, but Splunk's REST API requires the replication_port parameter regardless of SSL mode. As a result, the command failed with parameter=replication_port not present and retried indefinitely, so the indexer never joined the cluster.

Fix

  1. Always pass -replication_port so edit cluster-config succeeds (REST API requirement)
  2. Re-apply the full server.conf stanza block immediately after cluster join to restore the [replication_port-ssl://PORT] entry
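
To make the ordering of the two steps concrete, here is a toy Python model of the behaviour described above. It is only a sketch: `model_edit_cluster_config` and `reapply_stanzas` are hypothetical names invented for illustration, server.conf is represented as a plain dict, and nothing here calls Splunk or reproduces the actual Ansible tasks in this PR.

```python
import copy
from typing import Dict

# server.conf modelled as {stanza name: {key: value}} (an assumption for illustration)
Stanzas = Dict[str, Dict[str, str]]


def model_edit_cluster_config(server_conf: Stanzas, port: int) -> Stanzas:
    """Toy model of the side effect described under Root Cause:
    the non-SSL stanza is written and the SSL stanza for the same port is lost."""
    result = copy.deepcopy(server_conf)
    result.pop(f"replication_port-ssl://{port}", None)
    result[f"replication_port://{port}"] = {}
    return result


def reapply_stanzas(server_conf: Stanzas, desired: Stanzas) -> Stanzas:
    """Merge the customer's desired stanzas back on top of the current contents."""
    result = copy.deepcopy(server_conf)
    for name, settings in desired.items():
        result.setdefault(name, {}).update(settings)
    return result


# Customer configuration from the example above, expressed as stanzas.
desired = {
    "replication_port-ssl://9887": {
        "serverCert": "/mnt/peers-splunk-cert/tls.crt",
        "sslVersions": "tls1.2",
        "disabled": "false",
    }
}

after_join = model_edit_cluster_config(desired, 9887)  # step 1: SSL stanza is gone
after_fix = reapply_stanzas(after_join, desired)       # step 2: SSL stanza restored
```

In this model `after_fix` contains both stanzas, which matches the state reported under Verification.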

SSL mode is detected via either:

  • Explicit splunk.idxc.replication_ssl: true flag, or
  • Auto-detection of replication_port-ssl:// keys in splunk.conf.server.content

This means customers with only the conf-based SSL config (no extra flag) are handled automatically.
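
The detection rule can be sketched in Python as follows. This is an illustrative equivalent, not the Jinja/Ansible expression used in the role: the function name `uses_ssl_replication` is hypothetical, and it assumes the `splunk` variables are available as a nested mapping shaped like the customer configuration shown earlier.

```python
from typing import Any, Mapping

SSL_STANZA_PREFIX = "replication_port-ssl://"


def uses_ssl_replication(splunk: Mapping[str, Any]) -> bool:
    """Return True if SSL replication is requested by flag or implied by conf keys."""
    # 1. Explicit opt-in: splunk.idxc.replication_ssl: true
    idxc = splunk.get("idxc") or {}
    if idxc.get("replication_ssl") is True:
        return True

    # 2. Auto-detection: any replication_port-ssl:// stanza under
    #    splunk.conf.server.content
    conf = splunk.get("conf") or {}
    server = conf.get("server") or {}
    content = server.get("content") or {}
    return any(str(key).startswith(SSL_STANZA_PREFIX) for key in content)
```

For example, a mapping that contains only the `'replication_port-ssl://9887'` stanza under `conf.server.content` evaluates to true without any extra flag, while an empty mapping evaluates to false.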

Files Changed

  • roles/splunk_indexer/tasks/indexer_clustering.yml — single-site cluster join
  • roles/splunk_indexer/tasks/setup_multisite.yml — multi-site cluster join

Verification

Tested on Splunk Enterprise 10.4.0 on EKS (k8s 1.34):

Before fix: the parameter=replication_port not present error occurs on every retry, and the indexer never joins the cluster.

After fix: both stanzas are present in server.conf on first startup, with no manual restart needed:

[replication_port://9887]        ← written by edit cluster-config (required)
[replication_port-ssl://9887]    ← restored by re-apply task
disabled = False
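
A check like the one above could be scripted roughly as follows. This is a minimal sketch using Python's standard `configparser`; the helper name `replication_stanzas` is hypothetical, and a real server.conf may contain syntax that `configparser` does not accept, so treat it as an illustration rather than a robust parser.

```python
import configparser


def replication_stanzas(server_conf_text: str, port: int) -> dict:
    """Report which replication stanzas exist for the given port."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, allow_no_value=True
    )
    parser.read_string(server_conf_text)

    plain = f"replication_port://{port}"
    ssl = f"replication_port-ssl://{port}"
    ssl_present = parser.has_section(ssl)
    return {
        "plain_present": parser.has_section(plain),
        "ssl_present": ssl_present,
        # An SSL stanza counts as enabled unless it sets disabled to a true value.
        "ssl_enabled": ssl_present
        and not parser.getboolean(ssl, "disabled", fallback=False),
    }


# The stanzas from the verification output, without the annotations.
sample = """\
[replication_port://9887]

[replication_port-ssl://9887]
disabled = False
"""

status = replication_stanzas(sample, 9887)
```

With the sample text, all three fields in `status` are true; a file that contains only the non-SSL stanza reports `ssl_present` as false, which corresponds to the failure this PR addresses.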

Backward Compatibility

  • Non-SSL deployments: unchanged (re-apply task is skipped when use_ssl_replication is false)
  • SSL deployments: fixed — SSL stanza now persists on first startup

Jira: https://splunk.atlassian.net/browse/CSPL-4006

…join

Splunk's `edit cluster-config` REST API requires `-replication_port` even
when SSL replication is configured. The previous fix (PR splunk#903) omitted the
flag for SSL deployments, causing the command to fail with
"parameter=replication_port not present" and loop indefinitely.

The correct approach:
- Always pass `-replication_port` so the CLI command succeeds
- Immediately re-apply the full server.conf stanza block after cluster join
  to restore the `[replication_port-ssl://PORT]` entry that the CLI overwrites

SSL detection uses both explicit `splunk.idxc.replication_ssl: true` flag
and auto-detection of `replication_port-ssl://` keys in splunk.conf.server.content,
so customers don't need to add an extra flag if they only configure SSL via conf.

Affects both single-site (indexer_clustering.yml) and multi-site
(setup_multisite.yml) cluster join paths.

Verified on Splunk Enterprise 10.4.0 on EKS: SSL stanza persists on first
startup without manual pod restart.

Jira: https://splunk.atlassian.net/browse/CSPL-4006

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gabrielm-splunk requested a review from a team as a code owner on May 28, 2026 00:26