Add queue observability for liaison internal pipelines #13775

@hanahmily

Problem

Liaison queue behavior is hard to observe end-to-end when chunked sync is under pressure or failing.

Current context

  • Queue-sub already exposes chunk-ordering/error metrics in banyand/queue/sub/server.go.
  • Liaison wires sub.NewServerWithPorts(..., "liaison-server", ...) in pkg/cmdsetup/liaison.go.
  • Existing dashboards aggregate some queue errors (banyandb_queue_sub_total_msg_sent_err) in docs/operation/observability.md and docs/operation/grafana-cluster.json.

Proposal

  • Add/extend liaison-focused queue metrics (depth, retries, per-topic throughput/latency, failed-part counters).
  • Add dashboard panels and alert suggestions for liaison queue health.
  • Document metric meanings and troubleshooting paths.

Acceptance criteria

  • New metrics are exported and documented.
  • Dashboards include liaison queue saturation/failure visibility.
  • Integration/e2e validation demonstrates metric changes under injected failures.
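The last criterion could be validated along the following lines: inject a failing sender, run the sync loop, and assert that the retry and failed-part counters move as expected. This is only a sketch of the shape of such a check. `counters`, `sendFunc`, `syncParts`, and `failFirstN` are hypothetical helpers and do not exist in BanyanDB.

```go
package main

import (
	"errors"
	"fmt"
)

// counters is a hypothetical stand-in for the exported liaison queue metrics.
type counters struct {
	sent, retries, failedParts int
}

// sendFunc delivers one part; a test can swap in a failing implementation.
type sendFunc func(part int) error

// syncParts sends each part with up to maxRetries retries and records the outcome.
func syncParts(parts []int, send sendFunc, maxRetries int, c *counters) {
	for _, p := range parts {
		var err error
		for attempt := 0; attempt <= maxRetries; attempt++ {
			if attempt > 0 {
				c.retries++
			}
			if err = send(p); err == nil {
				c.sent++
				break
			}
		}
		if err != nil {
			c.failedParts++
		}
	}
}

// failFirstN returns a sendFunc that fails the first n sends of the target part.
func failFirstN(target, n int) sendFunc {
	calls := 0
	return func(part int) error {
		if part == target && calls < n {
			calls++
			return errors.New("injected failure")
		}
		return nil
	}
}

func main() {
	var c counters
	// Part 2 fails three times; with maxRetries=2 it is attempted three times, then dropped.
	syncParts([]int{1, 2, 3}, failFirstN(2, 3), 2, &c)
	fmt.Printf("sent=%d retries=%d failedParts=%d\n", c.sent, c.retries, c.failedParts)
	// prints "sent=2 retries=2 failedParts=1"
}
```

An actual e2e test would scrape the exported metrics endpoint before and after the injected failure instead of reading in-process counters.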

Metadata

Labels: database BanyanDB - SkyWalking native database