Describe the bug
Hello, I use a KeyDB multimaster cluster behind an HAProxy load balancer.
This works most of the time, but in certain situations I get the following error:
2025/09/30 15:02:24.000 [E] write tcp 10.34.5.49:38184->10.34.5.81:6379: write: broken pipe
2025/09/30 15:02:24.001 [D] | 185.73.121.250| 503 | 2.115088ms| nomatch| GET /api/get-account
This mostly happens when a user hits the login page.
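As far as I can tell, this is the classic pattern of a pooled Redis connection that the load balancer has already closed being reused for the next write. I do not know which Redis client Casdoor's session provider uses internally, but the client-side guard I have in mind looks roughly like this (a sketch using gomodule/redigo; the address, password, and thresholds are my assumptions, not Casdoor's actual code):

package main

import (
    "fmt"
    "time"

    "github.com/gomodule/redigo/redis"
)

// newPool keeps the pool's idle timeout below HAProxy's client/server
// timeout (15m in my listen block) and pings a borrowed connection before
// reusing it, so a proxy-closed socket is discarded instead of surfacing
// as "write: broken pipe".
func newPool(addr, password string) *redis.Pool {
    return &redis.Pool{
        MaxIdle:     10,
        IdleTimeout: 10 * time.Minute, // must stay below the proxy's 15m
        Dial: func() (redis.Conn, error) {
            return redis.Dial("tcp", addr, redis.DialPassword(password))
        },
        TestOnBorrow: func(c redis.Conn, t time.Time) error {
            if time.Since(t) < time.Minute {
                return nil // used recently, skip the extra round trip
            }
            _, err := c.Do("PING")
            return err
        },
    }
}

func main() {
    pool := newPool("haproxy.cluster:6379", "PASSWORD") // placeholders
    conn := pool.Get()
    defer conn.Close()
    fmt.Println(redis.String(conn.Do("PING")))
}

If the provider does not expose such knobs, shortening the proxy timeouts alone will not help; the client has to test or age out idle connections faster than HAProxy closes them.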
My docker-compose looks like this:
services:
  casdoor:
    image: registry.integral-systems.ch/cache_docker/casbin/casdoor:v2.55.0
    environment:
      TZ: "Europe/Zurich"
      dbName: ${DB_NAME}
      driverName: postgres
      dataSourceName: "user=${DB_USER} password=${DB_PASSWORD} host=${DB_HOST} port=${DB_PORT} sslmode=${DB_SSL} dbname=${DB_NAME}"
      appname: ${APP_NAME}
      httpport: 8000
      runmode: prod
      redisEndpoint: ${REDIS_HOST}:6379,${REDIS_DB},${REDIS_PASSWORD}
      radiusServerPort: 1812
      radiusSecret: ${RADIUS_SECRET}
      origin: https://auth.sunnysideup.so
      originFrontend: https://auth.sunnysideup.so
    volumes:
      - type: cluster
        source: casdoor_data
        target: /files
    ports:
      - target: 8000
        published: 32009
    networks:
      - loadbalancer
    cap_drop:
      - ALL
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 5s
      retries: 15
      start_period: 30s
The HAProxy config looks like this:
global
    maxconn 50000
    log stdout format raw local0 info
    nbthread 4

defaults
    mode tcp
    log global
    # Timeout values should be configured for your specific use.
    # See: https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-timeout%20connect
    timeout connect 5s
    timeout client 5m
    timeout server 5m
    timeout tunnel 1h
    # TCP keep-alive on client side. Server already enables them.
    option clitcpka
    option srvtcpka
    retries 3          # Retry up to 3 times before marking a node as failed
    option redispatch  # Redispatch to another node if one fails during a session
    option log-health-checks

listen KeyDB
    bind *:6379
    maxconn 40000
    mode tcp
    timeout client 15m
    timeout server 15m
    hash-type consistent
    balance source
    option tcplog
    option tcp-check
    # Uncomment these lines if you have basic auth
    tcp-check send AUTH\ PASSWORD\r\n
    tcp-check expect string "+OK"
    tcp-check send "PING\r\n" comment "Ping phase"
    tcp-check expect string "+PONG"
    tcp-check send "info replication\r\n" comment "Role (active-replica) phase"
    tcp-check expect string "role:active-replica"
    tcp-check send "QUIT\r\n" comment "Disconnect phase"
    tcp-check expect string "+OK"
    default-server inter 2s fall 3 rise 2 slowstart 60s
    server KeyDB-01 kv-01.cluster:6379 maxconn 20000 check
    server KeyDB-02 kv-02.cluster:6379 maxconn 20000 check
    server KeyDB-03 kv-03.cluster:6379 maxconn 20000 check
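To sanity-check that health-check dialogue by hand, I replay the same commands with a small Go snippet (the node address and PASSWORD are placeholders; the inline commands map 1:1 to the tcp-check lines above):

package main

import (
    "bufio"
    "fmt"
    "log"
    "net"
    "strings"
)

func main() {
    conn, err := net.Dial("tcp", "kv-01.cluster:6379") // placeholder node
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    r := bufio.NewReader(conn)

    // AUTH, PING and QUIT return single-line replies.
    oneLine := func(cmd string) {
        fmt.Fprintf(conn, "%s\r\n", cmd)
        reply, err := r.ReadString('\n')
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%-18s -> %s", cmd, reply)
    }

    oneLine("AUTH PASSWORD") // expect +OK
    oneLine("PING")          // expect +PONG

    // INFO returns a bulk reply ($<len> header, then the payload);
    // scan the payload for the role line the health check matches on.
    fmt.Fprintf(conn, "INFO replication\r\n")
    for {
        line, err := r.ReadString('\n')
        if err != nil {
            log.Fatal(err)
        }
        if strings.HasPrefix(line, "role:") {
            fmt.Printf("%-18s -> %s", "INFO replication", line) // expect role:active-replica
            break
        }
    }

    oneLine("QUIT") // expect +OK
}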
To reproduce
Use HAProxy between a multimaster cluster and Casdoor.
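A minimal way to trigger the error without Casdoor is to let a single idle connection outlive the proxy's 15m client/server timeout and then write to it again (a Go sketch; the address is a placeholder, and lowering the HAProxy timeouts to a few seconds makes the test faster):

package main

import (
    "fmt"
    "log"
    "net"
    "time"
)

func main() {
    conn, err := net.Dial("tcp", "haproxy.cluster:6379") // placeholder LB address
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    buf := make([]byte, 64)
    conn.Write([]byte("PING\r\n"))
    conn.Read(buf) // "+PONG" (or an auth error, which doesn't matter here)

    time.Sleep(16 * time.Minute) // exceed the proxy's idle timeout

    // HAProxy has torn the session down by now. The first write may still
    // be accepted by the kernel; the next one fails with EPIPE.
    for i := 0; i < 2; i++ {
        if _, err := conn.Write([]byte("PING\r\n")); err != nil {
            fmt.Println("got:", err) // write: broken pipe
            return
        }
        time.Sleep(time.Second)
    }
}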
Expected behavior
If one of the nodes fails, HAProxy should handle this so that Casdoor (Beego) can continue working without an error. For reference, the KeyDB nodes are deployed like this:
services:
  keydb:
    image: registry.integral-systems.ch/cache_docker/eqalpha/keydb:alpine_x86_64_v6.3.4
    container_name: keydb
    labels:
      ch.integral-systems.group: "database"
      ch.integral-systems.deployment: "redis"
      ch.integral-systems.health_monitor: "true"
      ch.integral-systems.customer: "false"
      ch.integral-systems.infrastructure: "false"
      ch.integral-systems.services: "true"
    extra_hosts:
      - "KeyDB-01:172.16.65.1"
      - "KeyDB-02:172.16.65.2"
      - "KeyDB-03:172.16.65.3"
    env_file:
      - .env
    command: keydb-server /etc/redis.conf --requirepass $REDIS_PASSWORD --masterauth $REDIS_PASSWORD --port 6379 --replicaof KeyDB-02 6379 --replicaof KeyDB-03 6379
    volumes:
      - ./config.conf:/etc/redis.conf:Z
      - /data/key-db:/data:Z
    network_mode: "host"
    mem_limit: 2G
    mem_reservation: 2G
    restart: unless-stopped
Additional information
Original Issue
casdoor/casdoor#4218 (comment)