fix(infra): align rabbitmq servicemonitor with tls-only listener #3580
Draft
manamana32321 wants to merge 1 commit into
After disableNonTLSListeners: true was merged in skkuding/codedang#3445, the RabbitMQ Operator removed the non-TLS prometheus port (15692, name=prometheus) from the Service and now exposes only the TLS port (15691, name=prometheus-tls). The ServiceMonitor, however, still tried to match port: prometheus, which led to 0 endpoints → 0 Prometheus targets → no rabbitmq_queue_messages_* metrics → the Grafana DatasourceNoData alert firing for 7 consecutive days.

Update the ServiceMonitor to use the prometheus-tls port + HTTPS + tlsConfig (verifying the cert-manager-issued CA) so that it is aligned with the TLS-only listener.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
Code Review
This pull request updates the RabbitMQ ServiceMonitor configuration to enable TLS for metrics scraping. It changes the endpoint port to prometheus-tls, sets the scheme to https, and adds a tlsConfig block referencing the rabbitmq-server-certs secret. I have no feedback to provide.
Description
Resolves the DatasourceNoData alerts (RabbitMQ message backlog / RabbitMQ unacknowledged message backlog) on grafana.stage.codedang.com, which had been firing every day for 7 consecutive days.
Root cause
- After skkuding/codedang#3445 (disableNonTLSListeners: true) was merged, the RabbitMQ Operator removed the non-TLS prometheus port (15692, name=prometheus) from the Service and now exposes only the TLS port (15691, name=prometheus-tls)
- The ServiceMonitor still tried to match port: prometheus → 0 endpoints → 0 Prometheus targets → no rabbitmq_queue_messages_* metrics → DatasourceNoData fired
Fix
Align the ServiceMonitor with the TLS listener:
- port: prometheus → port: prometheus-tls
- scheme: https + tlsConfig to verify the cert-manager-issued CA (rabbitmq-server-certs/ca.crt)
- serverName: rabbitmq.rabbitmq.svc (matches the cert SAN)
Additional context
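For reference, the three changes listed under Fix might look roughly like the manifest below. This is a minimal sketch, not the actual diff of this PR: the port name, scheme, secret name, CA key, and serverName come from the description, while the metadata.name, the namespace, and the selector labels are assumptions added only to make the example complete.

```yaml
# Sketch of a ServiceMonitor scraping the TLS-only RabbitMQ metrics port.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rabbitmq-monitor        # hypothetical name, not confirmed by the PR
  namespace: rabbitmq           # assumed from serverName rabbitmq.rabbitmq.svc
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: rabbitmq   # assumed label on the Operator-managed Service
  endpoints:
    - port: prometheus-tls      # named Service port (15691); replaces "prometheus" (15692)
      scheme: https             # the listener only speaks TLS
      tlsConfig:
        ca:
          secret:
            name: rabbitmq-server-certs  # cert-manager issued secret
            key: ca.crt                  # CA used to verify the server certificate
        serverName: rabbitmq.rabbitmq.svc  # must match a SAN on the server certificate
```

Because the endpoint refers to the port by name, a ServiceMonitor pointing at a port name that the Service no longer exposes yields no targets without raising an error, which is consistent with the symptom described in the root cause.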
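Live verification (stage)
- up{job="rabbitmq-monitor"} returns 0 results, and there is no rabbitmq_queue_messages_ready series
- rabbitmq_queue_messages_ready 2): confirmed that the break is only at the port-routing step of the Operator-managed Service
- up{job="rabbitmq-monitor"} == 1 and alert resolution to be re-verified
Related PRs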
Follow-up needed
- Tracking the release branch, so feat(infra): add TLS encryption for RabbitMQ AMQP communication #3445 / feat(infra): add rabbitmq monitoring with prometheus and grafana dashboard #3508 are not yet included. This fix has to go in with the release rebase for monitoring to recover on prod as well.
- The rabbitmq-stage ArgoCD app is OutOfSync and Degraded with a resourceVersion: 0 error when patching the RabbitmqCluster CRD. This is outside the scope of this PR and needs separate troubleshooting.
Before submitting the PR, please make sure you do the following
fixes #123).