
DB connection is interrupted with multiple management servers #10469

@Luskan777

problem

Hello, I am running an environment with two management servers and MariaDB Galera as the database server.

A few minutes after starting the management server service, the connection to the database is dropped, and the management server logs show the following record:

ERROR [c.c.u.d.T.Transaction] (AsyncJobMgr-Heartbeat-1:[ctx-c66c8126]) (logid:cfb46db0) Unexpected exception: java.sql.SQLTransientConnectionException: cloud - Connection is not available, request timed out after 80000ms (total=1000, active=1000, idle=0, waiting=22)
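
To make the counters in that error easier to read, here is a minimal Python sketch (not part of CloudStack) that pulls the pool statistics out of the log line and checks whether the pool was fully checked out. The regular expression is written against this one line only, so treat the pattern and the helper names `parse_pool_stats` and `is_exhausted` as assumptions.

```python
import re

# Log line copied from the management server log above.
LOG_LINE = (
    "ERROR [c.c.u.d.T.Transaction] (AsyncJobMgr-Heartbeat-1:[ctx-c66c8126]) "
    "(logid:cfb46db0) Unexpected exception: java.sql.SQLTransientConnectionException: "
    "cloud - Connection is not available, request timed out after 80000ms "
    "(total=1000, active=1000, idle=0, waiting=22)"
)

# Pattern written against the single line above; other versions may format it differently.
POOL_STATS = re.compile(
    r"request timed out after (?P<timeout_ms>\d+)ms "
    r"\(total=(?P<total>\d+), active=(?P<active>\d+), "
    r"idle=(?P<idle>\d+), waiting=(?P<waiting>\d+)\)"
)


def parse_pool_stats(line):
    """Return the pool counters as a dict of ints, or None if the line does not match."""
    match = POOL_STATS.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def is_exhausted(stats):
    """True when every connection is in use and at least one caller is waiting."""
    return (
        stats["idle"] == 0
        and stats["active"] == stats["total"]
        and stats["waiting"] > 0
    )


stats = parse_pool_stats(LOG_LINE)
print(stats, "exhausted:", is_exhausted(stats))
```

For the line above, all 1000 connections are active, none are idle, and 22 threads are waiting, so the pool itself is exhausted before the 80000 ms timeout fires.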

My current MariaDB server configuration is as follows:

[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=1000
log-bin=mysql-bin
max_allowed_packet=1024M
net_read_timeout=120
net_write_timeout=120

The database connection settings on the management servers (db.properties) are:

# CloudStack database tuning parameters
db.cloud.connectionPoolLib=hikaricp
db.cloud.maxActive=1000
db.cloud.maxIdle=100
db.cloud.maxWait=900000
db.cloud.minIdleConnections=20
db.cloud.connectionTimeout=80000
db.cloud.keepAliveTime=800000
db.cloud.validationQuery=/* ping */ SELECT 1
db.cloud.testOnBorrow=true
db.cloud.testWhileIdle=true
db.cloud.timeBetweenEvictionRunsMillis=40000
db.cloud.minEvictableIdleTimeMillis=240000
db.cloud.poolPreparedStatements=false
db.cloud.url.params=prepStmtCacheSize=517&cachePrepStmts=true&sessionVariables=sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION'&serverTimezone=UTC

PS: I changed some parameters on both the database server (max_connections, max_allowed_packet and the timeout parameters) and the connection pool in db.properties (db.cloud.maxActive, db.cloud.connectionTimeout, db.cloud.minIdleConnections), but none of these changes solved the problem. The connection just takes a little longer to drop.
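
As a rough sanity check of the numbers in the two configurations, here is a small Python sketch of the connection budget. It assumes that both management servers end up connected to the same MariaDB node (for example if HAProxy sends all traffic to one Galera node, which I have not confirmed) and that db.cloud.maxActive is the per-server ceiling of the `cloud` pool. It ignores the `cloud_usage` pool and any other clients, and `reserved` is a hypothetical allowance, not a CloudStack or MariaDB setting.

```python
# Values copied from the configuration shown above.
MAX_CONNECTIONS = 1000     # max_connections in [mysqld]
POOL_MAX_ACTIVE = 1000     # db.cloud.maxActive in db.properties
MANAGEMENT_SERVERS = 2     # number of management servers in this environment


def worst_case_demand(servers, pool_max):
    """Upper bound on connections the 'cloud' pools alone could open."""
    return servers * pool_max


def per_server_cap(max_connections, servers, reserved=50):
    """Largest per-server pool size that fits under max_connections.

    `reserved` is a hypothetical number of slots kept free for other clients.
    """
    return (max_connections - reserved) // servers


demand = worst_case_demand(MANAGEMENT_SERVERS, POOL_MAX_ACTIVE)
print("worst-case demand:", demand, "server limit:", MAX_CONNECTIONS)
print("fits under limit:", demand <= MAX_CONNECTIONS)
print("per-server cap:", per_server_cap(MAX_CONNECTIONS, MANAGEMENT_SERVERS))
```

Under those assumptions the two pools could ask for up to 2000 connections against a limit of 1000. This only shows the configured ceiling; it does not explain why the pool fills up in the first place.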

When I check the database server logs, the only related entries I can find are these warnings:

Feb 26 10:19:14 acs-stg-mngt-01 mariadbd[20527]: 2025-02-26 10:19:14 3320 [Warning] Aborted connection 3320 to db: 'cloud_usage' user: 'cloud' host: 'IP_MNGT_SERVER' (Got an error reading communication packets)
Feb 26 10:18:34 acs-stg-mngt-01 mariadbd[20527]: 2025-02-26 10:18:34 2282 [Warning] Aborted connection 2282 to db: 'cloud' user: 'cloud' host: 'IP_MNGT_SERVER' (Got an error reading communication packets)
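
To see which schema the aborted connections belong to, the warnings can be tallied per database. This is a minimal Python sketch using only the two lines above as input; the pattern is an assumption based on those lines, and `count_aborted_by_db` is a hypothetical helper name.

```python
import re
from collections import Counter

# The two warnings copied from the database server log above.
LOG_LINES = [
    "Feb 26 10:19:14 acs-stg-mngt-01 mariadbd[20527]: 2025-02-26 10:19:14 3320 [Warning] "
    "Aborted connection 3320 to db: 'cloud_usage' user: 'cloud' host: 'IP_MNGT_SERVER' "
    "(Got an error reading communication packets)",
    "Feb 26 10:18:34 acs-stg-mngt-01 mariadbd[20527]: 2025-02-26 10:18:34 2282 [Warning] "
    "Aborted connection 2282 to db: 'cloud' user: 'cloud' host: 'IP_MNGT_SERVER' "
    "(Got an error reading communication packets)",
]

# Pattern written against the two lines above only.
ABORTED = re.compile(
    r"Aborted connection (?P<id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>[^)]*)\)"
)


def count_aborted_by_db(lines):
    """Count 'Aborted connection' warnings per database name."""
    counts = Counter()
    for line in lines:
        match = ABORTED.search(line)
        if match:
            counts[match.group("db")] += 1
    return counts


counts = count_aborted_by_db(LOG_LINES)
print(dict(counts))
```

Run against a longer log, a tally like this would show whether the aborted connections lean toward `cloud_usage` or `cloud`.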

PS: I noticed that this started happening after I installed cloudstack-usage. The same problem previously occurred in a standalone (non-HA) CloudStack installation, also right after installing cloudstack-usage. I am still debugging to find out whether it is really related to the current problem.

versions

CloudStack: 4.20
MariaDB: 10.6

The steps to reproduce the bug

  1. Configure CloudStack in HA mode (MariaDB, HAProxy)
  2. Add 3 KVM hosts
  3. Enable HA on the hosts

What to do about it?

No response
