Common issues encountered during deployment and their solutions.
Error: Listen [::]:9009 failed: Address family for hostname not supported
Fix: Add listen_host: "0.0.0.0" and interserver_listen_host: "0.0.0.0" to ClickHouse extraConfig. The default chart values already include this for GKE compatibility.
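A minimal sketch of that fragment; nest it under whatever key your ClickHouse values file actually uses for extraConfig:

```yaml
# Sketch only; the surrounding values path is assumed, not taken from the chart.
extraConfig:
  listen_host: "0.0.0.0"
  interserver_listen_host: "0.0.0.0"
```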
Error: Listen [0.0.0.0]:9363 failed: Address already in use
Cause: The ClickHouse Operator configures Prometheus independently. Do not add a prometheus block in extraConfig — it conflicts with the operator's own configuration.
Error: getaddrinfo ENOTFOUND countly-clickhouse.clickhouse.svc
Cause: The official ClickHouse Operator creates services with the pattern <cr-name>-clickhouse-headless, not <cr-name>. The correct service name is countly-clickhouse-clickhouse-headless.clickhouse.svc.
Fix: Already handled in the chart helpers. If you override secrets.clickhouse.host, use the full service name.
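If you do override it, the value should carry the full operator-generated service name, for example:

```yaml
secrets:
  clickhouse:
    host: countly-clickhouse-clickhouse-headless.clickhouse.svc
```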
Error: Exporter container CreateContainerConfigError — secret not found.
Cause: The MongoDB operator creates the connection string secret only after the replica set is ready, but the replica set cannot become ready while the exporter container is stuck in CreateContainerConfigError waiting for that secret: a circular dependency.
Fix: The chart uses optional: true on the exporter's secretKeyRef. The exporter container starts without the secret and stabilizes once the operator creates it.
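For reference, the pattern looks roughly like the following; the env var name and secret name here are illustrative placeholders, and the secret key follows the MongoDB Community Operator convention rather than anything chart-specific:

```yaml
# Illustrative exporter container env entry; names are placeholders.
env:
  - name: MONGODB_URI
    valueFrom:
      secretKeyRef:
        name: countly-mongodb-connection-string   # created later by the operator
        key: connectionString.standard
        optional: true   # lets the container start before the secret exists
```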
Symptom: Countly fails to create or access databases beyond countly and countly_drill.
Fix: The app user needs readWriteAnyDatabase on admin (default in the chart). If you've overridden the roles, ensure they match:
```yaml
users:
  app:
    roles:
      - { name: readWriteAnyDatabase, db: admin }
      - { name: dbAdmin, db: countly }
```
Error: Failed to create topic drill-events: Topic creation errors
Cause: Default COUNTLY_CONFIG__KAFKA_REPLICATIONFACTOR: "2" but the local profile deploys only 1 broker.
Fix: The local profile already sets config.kafka.COUNTLY_CONFIG__KAFKA_REPLICATIONFACTOR: "1". If you use a custom profile with fewer than 2 brokers, override this value.
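For a custom profile with a single broker, an override along these lines should work:

```yaml
config:
  kafka:
    COUNTLY_CONFIG__KAFKA_REPLICATIONFACTOR: "1"   # must not exceed the broker count
```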
Error: The 'None' policy does not allow 'max.poll.records' to be overridden
Cause: connector.client.config.override.policy was set to None. The ClickHouse sink connector needs to tune consumer settings for high-throughput batching.
Fix: Already set to All in chart defaults. If you've overridden kafkaConnect.workerConfig, ensure:
```yaml
kafkaConnect:
  workerConfig:
    connector.client.config.override.policy: All
```
Symptom: Countly health manager reports Kafka Connect unreachable on port 8083.
Cause: Strimzi auto-creates a NetworkPolicy on Kafka Connect pods that only allows traffic from other Connect pods and the cluster-operator.
Fix: The chart creates an additional NetworkPolicy allowing the countly namespace to reach port 8083. Ensure networkPolicy.allowedNamespaces includes your Countly namespace.
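A minimal values override, assuming Countly runs in the countly namespace:

```yaml
networkPolicy:
  allowedNamespaces:
    - countly
```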
Error: Startup probe failed: dial tcp <pod-ip>:3020: connect: connection refused
Cause: Missing HOST: "0.0.0.0" config for the component. Kubernetes probes connect via the pod IP, not localhost.
Fix: Already included in chart defaults. All components have COUNTLY_CONFIG__<COMPONENT>_HOST: "0.0.0.0" in their config sections.
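If you add a custom component config section, mirror the same pattern; the api component below is only an example and assumes the same config.<component> layout as the Kafka settings above:

```yaml
config:
  api:
    COUNTLY_CONFIG__API_HOST: "0.0.0.0"   # bind on all interfaces so probes via the pod IP succeed
```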
Error: Countly pods reference a secret that exists in a different namespace.
Cause: Kubernetes secrets are namespace-scoped. The MongoDB operator creates secrets in the mongodb namespace, but Countly runs in the countly namespace.
Fix: The chart computes the MongoDB connection string from service DNS and creates its own secret in the countly namespace. Do not set secrets.mongodb.existingSecret to a cross-namespace secret — instead provide secrets.mongodb.password or secrets.mongodb.connectionString.
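A sketch of the two supported overrides (values shown are placeholders):

```yaml
secrets:
  mongodb:
    password: "app-user-password"   # chart derives the rest from service DNS
    # or, alternatively, provide the full string yourself:
    # connectionString: "mongodb://..."
```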
Symptom: kubectl describe ingress countly -n countly shows AddedOrUpdatedWithError events.
Cause: F5 NIC validates annotations strictly and rejects invalid ones. Common mistakes:
- Using `"on"`/`"off"` instead of `"True"`/`"False"` for `nginx.org/proxy-buffering`
- Missing `s` suffix on timeouts (e.g., `"60"` instead of `"60s"`)
- Using old `nginx.ingress.kubernetes.io/*` annotations (community ingress-nginx)
Fix: Check the events section of kubectl describe ingress — F5 NIC logs the reason for rejection. Update annotations to use nginx.org/* format.
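For example, the following annotation values pass F5 NIC validation (nginx.org/proxy-read-timeout is shown as a representative timeout annotation, not something the chart sets):

```yaml
metadata:
  annotations:
    nginx.org/proxy-buffering: "True"       # not "on"/"off"
    nginx.org/proxy-read-timeout: "60s"     # note the "s" suffix
```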
Error: nginx reload failed: "proxy_http_version" directive is duplicate
Cause: F5 NIC auto-injects proxy_http_version 1.1 when nginx.org/keepalive > 0. If your location-snippets also set proxy_http_version 1.1, the directive appears twice and the reload fails.
Fix: Remove proxy_http_version 1.1 from nginx.org/location-snippets. The chart defaults already exclude it.
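A sketch of a safe combination; the keepalive value and snippet contents are illustrative:

```yaml
annotations:
  nginx.org/keepalive: "32"   # any value > 0 makes F5 NIC inject proxy_http_version 1.1 itself
  nginx.org/location-snippets: |
    # custom directives go here; omit "proxy_http_version 1.1;" to avoid the duplicate
    proxy_set_header X-Request-Id $request_id;
```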
Error: OTel export failure: DNS resolution failed for alloy-otlp.observability.svc.cluster.local:4317
Cause: The OTEL exporter endpoint is configured in f5-nginx-values.yaml but the Alloy-OTLP collector is not deployed.
Fix: Either deploy the observability stack (helm install countly-observability ...) or remove the otel-exporter-endpoint from f5-nginx-values.yaml. This error is benign and does not affect traffic.
Error: TLS secret countly-tls is invalid: secret doesn't exist or of an unsupported type
Cause: The ingress references a TLS secret that doesn't exist. By default, TLS is disabled (ingress.tls.mode: http). If you enabled TLS but the secret hasn't been created yet, this error appears.
Fix: Set ingress.tls.mode: letsencrypt for automatic certificate provisioning via cert-manager, ingress.tls.mode: existingSecret with a pre-created TLS secret, or ingress.tls.mode: http to disable TLS. See DEPLOYMENT-MODES.md.
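For example:

```yaml
ingress:
  tls:
    mode: letsencrypt   # or "existingSecret" (pre-created TLS secret) or "http" (no TLS, default)
```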
Cause: Backend service not ready or datasource URL incorrect.
Fix: Verify the backend pods are running:
```bash
kubectl get pods -n observability
```
Check Grafana datasource configuration:
```bash
kubectl get configmap -n observability -l app.kubernetes.io/component=grafana -o yaml | grep -A5 "url:"
```
Error: permission denied reading /var/log/pods
Cause: The Alloy DaemonSet runs as root to read host log files. Some hardened clusters block this.
Fix: Ensure the Alloy pods are running as root (default in the chart). Check pod security policies/standards:
```bash
kubectl get pods -n observability -l app.kubernetes.io/component=alloy -o yaml | grep -A3 securityContext
```
Cause: Both Alloy DaemonSet and Alloy-Metrics scraping the same targets.
Fix: By design, the chart enforces a clean split — Alloy DaemonSet handles logs only, Alloy-OTLP handles traces/profiles, and Alloy-Metrics handles ALL Prometheus scraping. If you see duplicates, check for custom scrape configs.
Error: Grafana shows empty dashboard folder.
Cause: Dashboard JSON files are loaded via .Files.Get in the chart. If the dashboards/ directory is missing from the chart package, ConfigMaps will be empty.
Fix: Ensure the chart was packaged with helm package (which includes the dashboards/ directory) or deployed from the local chart directory.
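A quick way to confirm the dashboards made it into the packaged chart; the chart directory name is assumed here:

```bash
helm package countly-observability/
tar -tzf countly-observability-*.tgz | grep dashboards/
```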
Cause: External endpoint URLs not configured or auth missing.
Fix: Verify external URLs are set:
```bash
helm get values countly-observability -n observability | grep -E "(remoteWriteUrl|pushUrl|otlpGrpcEndpoint|ingestUrl)"
```
Check Alloy logs for connection errors:
```bash
# -o name already returns "daemonset.apps/<name>", so pass it to kubectl logs directly
kubectl logs -n observability $(kubectl get ds -n observability -l app.kubernetes.io/component=alloy -o name) | grep -i error
```
Cause: This should not happen; the chart conditionally omits metrics_generator from the Tempo config when metrics.enabled=false.
Fix: If you see this error, verify your values:
```bash
helm get values countly-observability -n observability | grep -A2 metrics
```
Ensure metrics.enabled is explicitly set to false, not just that Prometheus is missing.
Error: no matches for kind "Certificate" in version "cert-manager.io/v1"
Fix: Install cert-manager first. See PREREQUISITES.md.
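If cert-manager is missing, a typical Helm install looks like this; the CRD flag name varies between cert-manager releases, and PREREQUISITES.md has the project's exact steps:

```bash
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true   # older cert-manager releases use --set installCRDs=true
```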
Charts must be installed in this order:
1. countly-mongodb + countly-clickhouse (no dependencies between them)
2. countly-kafka (depends on ClickHouse for the sink connector)
3. countly (depends on all three)
Helmfile's needs: configuration enforces this automatically.
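A minimal helmfile sketch of that ordering; the release names match the charts above, but the chart paths and namespaces are illustrative:

```yaml
releases:
  - name: countly-kafka
    namespace: kafka
    chart: ./charts/countly-kafka
    needs:
      - clickhouse/countly-clickhouse
  - name: countly
    namespace: countly
    chart: ./charts/countly
    needs:
      - mongodb/countly-mongodb
      - clickhouse/countly-clickhouse
      - kafka/countly-kafka
```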