
Add MySQL topology server implementation #1

Merged
morgo merged 4 commits into release-23.0 from mysql-topo-v23 on Mar 12, 2026

@morgo (Collaborator) commented Mar 12, 2026

What's this?

A MySQL-based topology backend (mysqltopo) for Vitess, so we can use MySQL (including RDS) as the topology store instead of etcd/zk/consul.

This reduces operational complexity in environments where MySQL is already available.

How it works

  • Implements the full topo.Factory interface backed by a MySQL database
  • Uses MySQL replication (binlog) for real-time change notifications instead of polling
  • Auto-detects RDS endpoints and configures TLS with embedded CA certs
  • Elections and locks use MySQL's GET_LOCK() with configurable TTLs
  • All cells share the global connection (single DB), controlled via HasGlobalReadOnlyCell

What's changed

New files (19):

  • go/vt/topo/mysqltopo/ — full implementation: server, elections, locking, watches, notifications, tests
  • Plugin registrations for vtctld, vtgate, vttablet, vtorc, topo2topo, vtctldclient
  • examples/common/scripts/mysql-up.sh — local dev setup script

Modified files (10):

  • go/vt/topo/server.go — reuse global connection for cells when factory supports it
  • go/cmd/vtctldclient/command/root.go — register mysqltopo import
  • go/flags/endtoend/*.txt — register --topo-mysql-election-ttl and --topo-mysql-lock-ttl flags
  • Example scripts and CI workflow — add mysql topo support

Base

This is one commit on top of Vitess v23.0.3. The release-23.0 branch points at the unmodified tag.


Ported from the mysql-topo-wip branch — squashed into a single commit for review.

Adds a MySQL-based topology backend (mysqltopo) for Vitess, allowing
MySQL to be used as the topology store instead of etcd/zk/consul.

This is useful in environments where MySQL is already available and
adding etcd/zk would increase operational complexity.

New files:
- go/vt/topo/mysqltopo/ - Full topo.Factory implementation using MySQL
  including server, elections, locking, watches, and notifications
- Plugin registrations for vtctld, vtgate, vttablet, vtorc, topo2topo,
  and vtctldclient
- Example scripts for local development with MySQL topo

Modified files:
- go/vt/topo/server.go - Reuse global connection for cells when the
  factory supports HasGlobalReadOnlyCell (shared DB backends)
- go/flags/endtoend/*.txt - Register --topo-mysql-election-ttl and
  --topo-mysql-lock-ttl flags
- .github/workflows/local_example.yml - Add mysql to CI matrix

Based on vitess v23.0.3.
@morgo morgo marked this pull request as draft March 12, 2026 02:17
morgo added 2 commits March 11, 2026 21:20
Fix ConnForCell to create cell-specific connections with correct root
Signed-off-by: Morgan Tocker <mtocker@squareup.com>
…count

Server.Close() unconditionally called releaseNotificationSystem, but the
notification system is only acquired when Watch/WatchRecursive is called.
This caused unbalanced refcounts: servers that never watched would
decrement without incrementing, and servers that watched multiple times
would increment multiple times but only decrement once.

Fix by tracking whether each Server has acquired a reference via a
hasNotificationSystem flag. getNotificationSystemForServer now only
increments the refcount on the first call, and Close() only releases
if a reference was actually acquired.

Also fix TestMySQLTopo to close the topo.Server, which owns additional
mysqltopo.Server instances for globalCell and local cell connections.

Signed-off-by: Morgan Tocker <mtocker@squareup.com>
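The refcount fix can be illustrated with a small, self-contained model. Only `Server`, `hasNotificationSystem`, `getNotificationSystemForServer`, `releaseNotificationSystem`, and `Close` are names from the commit message. The package-level singleton, the mutexes, the parameterless `releaseNotificationSystem`, and the `stopped` field are assumptions made for the sketch, which does not start a real binlog goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// notificationSystem stands in for the shared binlog reader; here it only
// tracks how many Servers hold a reference and whether it was stopped.
type notificationSystem struct {
	refs    int
	stopped bool
}

var (
	nsMu     sync.Mutex
	sharedNS *notificationSystem
)

// Server is a hypothetical, stripped-down mysqltopo.Server.
type Server struct {
	mu                    sync.Mutex
	hasNotificationSystem bool
}

// getNotificationSystemForServer increments the refcount only on the first
// call for a given Server; later Watch calls reuse that reference.
func (s *Server) getNotificationSystemForServer() *notificationSystem {
	s.mu.Lock()
	defer s.mu.Unlock()
	nsMu.Lock()
	defer nsMu.Unlock()
	if sharedNS == nil {
		sharedNS = &notificationSystem{}
	}
	if !s.hasNotificationSystem {
		sharedNS.refs++
		s.hasNotificationSystem = true
	}
	return sharedNS
}

// releaseNotificationSystem drops one reference and stops the shared system
// when the last reference goes away.
func releaseNotificationSystem() {
	nsMu.Lock()
	defer nsMu.Unlock()
	if sharedNS == nil {
		return
	}
	sharedNS.refs--
	if sharedNS.refs == 0 {
		sharedNS.stopped = true // a real system would stop its goroutine here
		sharedNS = nil
	}
}

// Close releases only if this Server actually acquired a reference.
func (s *Server) Close() {
	s.mu.Lock()
	acquired := s.hasNotificationSystem
	s.hasNotificationSystem = false
	s.mu.Unlock()
	if acquired {
		releaseNotificationSystem()
	}
}

func main() {
	watcher, idle := &Server{}, &Server{}
	ns := watcher.getNotificationSystemForServer()
	watcher.getNotificationSystemForServer() // second Watch: no extra ref
	fmt.Println("refs after two watches:", ns.refs)
	idle.Close() // never watched: must not decrement
	fmt.Println("refs after idle Close:", ns.refs)
	watcher.Close()
	fmt.Println("stopped:", ns.stopped)
}
```

The `main` function reproduces both failure modes from the commit message: a Server that watches twice holds a single reference, and a Server that never watched leaves the count untouched on Close().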
@morgo morgo marked this pull request as ready for review March 12, 2026 11:42
@morgo morgo requested a review from aparajon March 12, 2026 13:42
Comment thread examples/common/scripts/mysql-up.sh

@aparajon (Collaborator) left a comment

🤖 These comments are Claude-generated (prompted by @armand-block for observability review).

Focused on logging gaps that would help debug production issues in downstream consumers that rely heavily on Watch, WatchRecursive, and CRUD operations.

Comment thread go/vt/topo/mysqltopo/notification.go
Comment thread go/vt/topo/mysqltopo/notification.go
Comment thread go/vt/topo/mysqltopo/watch.go
Comment thread go/vt/topo/mysqltopo/notification.go
Comment thread go/vt/topo/mysqltopo/file.go Outdated
Comment thread go/vt/topo/mysqltopo/server.go
Comment thread go/vt/topo/mysqltopo/notification.go Outdated
Comment thread go/vt/topo/mysqltopo/notification.go
Improve observability for mysqltopo
Address review feedback from @aparajon focused on production
debuggability:

- Mark notification system as dead after exhausting retries and warn
  callers that watches will not receive updates
- Add 5s timeout for slow consumers in notifyChange/notifyDeletion to
  prevent a single blocked watcher from stalling all binlog processing
- Add lifecycle logging for watch/recursive watch registration and
  deregistration
- Log server Close() with root/schema context, and warn on db.Close()
  errors instead of discarding them
- Log notification system refcount transitions (increment/decrement/zero)
- Log rollback errors in file.go CRUD operations instead of discarding
  (filtering out expected sql.ErrTxDone)
- Log GTID extraction failures instead of silently swallowing them
- Log topo change and deletion detection in checkForTopoDataChanges
- Add Docker command example to mysql-up.sh error message
@morgo morgo merged commit a47ce93 into release-23.0 Mar 12, 2026
70 of 110 checks passed
@morgo morgo deleted the mysql-topo-v23 branch March 12, 2026 15:18
aparajon pushed a commit that referenced this pull request Apr 21, 2026
* Add MySQL topology server implementation

* Fix ConnForCell to create cell-specific connections with correct root

Signed-off-by: Morgan Tocker <mtocker@squareup.com>

* mysqltopo: fix goroutine leak from unbalanced notification system refcount

Signed-off-by: Morgan Tocker <mtocker@squareup.com>

* Improve observability for mysqltopo

---------

Signed-off-by: Morgan Tocker <mtocker@squareup.com>