
Security Hardening Guide

This guide covers security hardening for Countly Helm deployments in regulated environments (healthcare, financial services, government).

Network Isolation

Enable Network Policies

All five charts ship with networkPolicy.enabled: false by default. Enable them in production:

# In each chart's environment file
networkPolicy:
  enabled: true
  allowedNamespaces:
    - countly
    - kafka
    - clickhouse
    - mongodb
    - observability
  monitoring:
    namespace: observability

Network policies restrict pod-to-pod communication to only the namespaces that need it. Without them, any pod in the cluster can reach your databases.

Ingress

  • Use TLS in production (global.tls: letsencrypt or global.tls: provided)
  • The none (HTTP) profile should only be used for local development
  • For internal-only deployments, consider selfSigned with your own CA
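The profile is selected via global.tls; a minimal sketch of the corresponding values, assuming the same environment-file layout as the other examples in this guide:

# In your environment's global values
global:
  tls: letsencrypt   # or: provided, selfSigned, none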

Encryption

In Transit

| Path | Default | Hardened |
|------|---------|----------|
| Client to Ingress | Depends on global.tls | letsencrypt or provided |
| Ingress to Countly pods | HTTP (in-cluster) | Enable NGINX backend TLS if required |
| Countly to MongoDB | Plaintext | Set mongodb.tls.enabled: true in mongodb.yaml |
| Countly to ClickHouse | Plaintext | Configure ClickHouse TLS via operator settings |
| Countly to Kafka | Plaintext | Configure Strimzi TLS listeners |
| Observability collectors | HTTP | Configure mTLS on Alloy endpoints |
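For example, hardening the Countly-to-MongoDB path per the table above; this sketch shows only the flag named there, and any certificate or CA wiring the chart requires is not covered here:

# In mongodb.yaml
mongodb:
  tls:
    enabled: true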

At Rest

Storage encryption depends on your Kubernetes cluster's StorageClass:

  • AWS EKS: Use gp3 StorageClass with EBS encryption enabled (default in most configurations)
  • GKE: Uses Google-managed encryption by default; enable CMEK for customer-managed keys
  • Azure AKS: Uses Azure Disk Encryption by default; enable SSE with customer-managed keys
  • Self-managed: Configure your CSI driver to use LUKS or dm-crypt

Set global.storageClass to an encryption-enabled StorageClass:

global:
  storageClass: encrypted-gp3

Secret Management

For regulated environments, avoid storing secrets as plain values:

| Method | Compliance Level | Setup |
|--------|------------------|-------|
| values (default) | Development only | Secrets in gitignored YAML files |
| existingSecret | Acceptable | Pre-create K8s Secrets via your secrets pipeline |
| externalSecret | Recommended | External Secrets Operator + AWS Secrets Manager / Vault / GCP Secret Manager |

See SECRET-MANAGEMENT.md for setup instructions.
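As an illustration of the externalSecret method, a hedged sketch of an External Secrets Operator ExternalSecret that syncs a MongoDB password from AWS Secrets Manager; the store name, target Secret name, and remote key path are assumptions rather than values the charts prescribe:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: countly-mongodb-credentials      # hypothetical name
  namespace: mongodb
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager            # a SecretStore/ClusterSecretStore you operate
    kind: ClusterSecretStore
  target:
    name: countly-mongodb-credentials    # the Secret a chart could reference via existingSecret
  data:
    - secretKey: mongodb-password
      remoteRef:
        key: prod/countly/mongodb        # path in AWS Secrets Manager (assumed)
        property: password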

Secret Rotation

Change secrets.rotationId to a new value to trigger recreation of the Secret objects on the next deploy. This does not by itself change any passwords: the lookup-or-create pattern preserves existing values.

To rotate actual passwords:

  1. Update passwords in your secret source (Vault, AWS SM, etc.)
  2. Bump secrets.rotationId
  3. Run helmfile apply
  4. Restart affected pods
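A sketch of that sequence, assuming a countly namespace and leaving the deployment names as placeholders:

# Steps 1-2: update the password in your secret source, then bump secrets.rotationId
#            in your environment values
# Step 3: re-apply
helmfile -e my-deployment apply
# Step 4: restart the workloads that consume the rotated secret
kubectl rollout restart deployment/<affected-deployment> -n countly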

Pod Security

Security Contexts

The observability chart's alloy-otlp deployment runs with a restricted security context:

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: [ALL]
  seccompProfile:
    type: RuntimeDefault

The alloy DaemonSet (log collector) requires elevated privileges (SYS_PTRACE, root) to read container logs from host paths. This is expected. If log collection is not needed, disable it by setting global.observability: disabled or global.observability: external.
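Either setting removes the privileged log collector, for example:

global:
  observability: external   # or: disabled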

Pod Disruption Budgets

The production sizing profile enables PDBs for:

  • All Countly components (api, frontend, ingestor, aggregator)
  • ClickHouse server and keeper
  • MongoDB replica set

Verify PDBs are active: kubectl get pdb --all-namespaces

Anti-Affinity

The production profile uses preferred anti-affinity by default. For stricter guarantees (e.g., pods MUST be on separate nodes), override in your environment:

# environments/my-deployment/countly.yaml
api:
  scheduling:
    antiAffinity:
      type: required

Audit and Observability

Application Audit Trail

Countly maintains internal audit logs. Ensure the aggregator and API components have sufficient resources to avoid dropped events.
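A hedged sketch of raising those resources in the Countly environment file; the per-component resources block follows the standard Helm convention and is assumed to be exposed by the chart, and the numbers are illustrative rather than sizing guidance:

# environments/my-deployment/countly.yaml
api:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
aggregator:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi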

Infrastructure Observability

Use global.observability: full to deploy the complete monitoring stack. Key dashboards:

  • Overview: Cluster health, pod status, resource utilization
  • Platform: Node metrics, network I/O, disk pressure
  • Data: ClickHouse query performance, Kafka consumer lag
  • Countly: Application-level metrics, request rates, error rates

Log Retention

Configure retention periods based on your compliance requirements:

# In observability.yaml
prometheus:
  retention:
    time: "90d"      # Metrics retention
loki:
  retention: "90d"   # Log retention
tempo:
  retention: "336h"  # Trace retention (Go duration format, no 'd')

Backup and Disaster Recovery

What to Back Up

| Component | Data | Method |
|-----------|------|--------|
| MongoDB | Application data, user accounts | mongodump or volume snapshots |
| ClickHouse | Analytics/drill data | ClickHouse backup tool or volume snapshots |
| Kafka | Event stream (transient) | Usually not backed up; replay from source |
| Helm releases | Release state | helm get all or GitOps (helmfile in git) |
| Secrets | Credentials | External secret store (Vault, AWS SM) |
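As an example of the mongodump method, a sketch that streams a compressed dump out of the MongoDB pod; the pod name mirrors the PVC naming used in the snapshot example below, and the URI and any authentication flags are assumptions:

# Stream a gzipped archive from the MongoDB pod to a local file
kubectl exec -n mongodb countly-mongodb-0 -- \
  mongodump --uri="mongodb://localhost:27017" --archive --gzip \
  > countly-mongodb-$(date +%Y%m%d).archive.gz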

Volume Snapshots

If your StorageClass supports VolumeSnapshot:

# Create snapshot of MongoDB PVC
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongodb-backup-$(date +%Y%m%d)
  namespace: mongodb
spec:
  source:
    persistentVolumeClaimName: data-volume-countly-mongodb-0
EOF

Recovery Procedure

  1. Restore PVCs from snapshots or backups (see the sketch below)
  2. Deploy with helmfile apply (same environment config)
  3. Verify data integrity with ./scripts/smoke-test.sh
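A sketch of step 1, restoring a PVC from the VolumeSnapshot created above; the claim name must match what the MongoDB StatefulSet expects, and the StorageClass and size shown here are assumptions:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume-countly-mongodb-0   # must match the name the StatefulSet expects
  namespace: mongodb
spec:
  storageClassName: encrypted-gp3       # an encryption-enabled StorageClass (assumed)
  accessModes: [ReadWriteOnce]
  dataSource:
    name: mongodb-backup-20240101       # the VolumeSnapshot to restore from (name assumed)
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: 50Gi                     # at least the size of the original volume (assumed)
EOF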

Upgrade Safety

Pre-Upgrade Checklist

  1. Back up all databases (MongoDB, ClickHouse)
  2. Review CHANGELOG.md for breaking changes
  3. Test upgrade in a staging environment first
  4. Ensure PDBs are healthy: kubectl get pdb --all-namespaces
  5. Verify sufficient cluster capacity for rolling updates

Upgrade Command

helmfile -e my-deployment apply

Helmfile handles dependency ordering. Each chart waits for health checks before proceeding to the next.

Rollback

helm rollback <release-name> <revision> -n <namespace>

Or roll back all charts:

helmfile -e my-deployment apply  # with previous git commit checked out

Artifact Signing and Supply Chain Security

All Helm charts published to ghcr.io/countly are signed and attested at build time:

| Control | Implementation |
|---------|----------------|
| Artifact signing | Cosign keyless (Sigstore OIDC); identity bound to the GitHub Actions workflow |
| SBOM | CycloneDX JSON generated by Syft, attached to each OCI artifact |
| Provenance | SLSA provenance via the GitHub Artifact Attestation API |
| Transparency | All signatures logged in the Sigstore Rekor transparency log |

Consumers can verify chart authenticity before deployment. See VERIFICATION.md for step-by-step instructions including:

  • cosign verify for signature verification
  • cosign download sbom for SBOM inspection and vulnerability scanning
  • gh attestation verify for SLSA provenance auditing
  • Kyverno/Gatekeeper policy examples for admission-time enforcement
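For instance, a hedged pre-deployment signature check with Cosign; the chart reference and identity pattern are assumptions, so take the authoritative values from VERIFICATION.md:

# Verify the keyless signature on a chart OCI artifact (chart name and version are placeholders)
cosign verify \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --certificate-identity-regexp 'https://github.com/Countly/.*' \
  ghcr.io/countly/<chart>:<version>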

Compliance Checklist

| Requirement | How Addressed |
|-------------|---------------|
| Encryption in transit | TLS profiles (letsencrypt, provided) |
| Encryption at rest | StorageClass with encryption |
| Access control | NetworkPolicy, RBAC (operator-managed) |
| Secret management | External Secrets Operator integration |
| Audit logging | Application audit trail, observability stack |
| High availability | Production sizing profile (PDBs, anti-affinity, multi-replica) |
| Backup/recovery | Volume snapshots, database dump tools |
| Monitoring | Full observability stack (metrics, logs, traces, profiling) |
| Vulnerability scanning | CI/CD integration (add Trivy/Snyk to your pipeline) |
| Supply chain integrity | Cosign keyless signing, SLSA provenance, CycloneDX SBOM |
| Change management | GitOps via helmfile, CHANGELOG.md, release-gated OCI publishing |