A Spring Boot backend project that replicates data from MySQL to PostgreSQL with a full-load snapshot, MySQL binlog CDC, and checkpoint-based recovery.
This repository is a local database replication service built for backend portfolio and interview demos. It reads from a MySQL source database, performs an initial snapshot copy into PostgreSQL, and then keeps the target in sync by consuming row-based MySQL binlog events.
In 30 seconds, the repo shows:
- full load vs incremental replication
- binlog-based CDC
- ordered event processing
- checkpoint recovery after restart
- pragmatic modular Java service design
Real migration systems rarely stop at a one-time copy. They usually need to:
- copy the existing dataset safely
- continue replicating changes after the snapshot
- recover from restarts without reprocessing everything
This project exists to demonstrate those ideas clearly in a self-contained Java codebase that is easy to run on a laptop and easy to explain in interviews.
```
      +------------------------+
      |    MySQL Source DB     |
      |   schema + seed data   |
      |    row-based binlog    |
      +-----------+------------+
                  |
      JDBC snapshot / metadata
                  |
+-----------------v-----------------+
|       SchemaDiscoveryService      |
+-----------------+-----------------+
                  |
                  v
+-----------------+-----------------+
|          FullLoadService          |
|            TableCopier            |
+-----------------+-----------------+
                  |
      batched idempotent writes
                  |
                  v
      +------------------------+
      |  PostgreSQL Target DB  |
      |   auto-created tables  |
      +------------------------+
                  ^
                  |
+-----------------+-----------------+
|           TargetApplier           |
|      retry + SQL generation       |
+-----------------+-----------------+
                  ^
                  |
+-----------------+-----------------+
|          EventTransformer         |
+-----------------+-----------------+
                  ^
                  |
+-----------------+-----------------+
|          BinlogCdcReader          |
|    ordered binlog event stream    |
+-----------------+-----------------+
                  |
                  v
      +------------------------+
      |    checkpoint.json     |
      |    file + position     |
      +------------------------+
```
- Initial full-load snapshot from MySQL into PostgreSQL
- Row-based MySQL binlog CDC for `INSERT`, `UPDATE`, and `DELETE`
- Checkpoint persistence and restart recovery
- Ordered event application based on binlog sequence
- Automatic target table creation from source schema metadata
- Idempotent target writes with PostgreSQL upsert behavior
- Retry handling for transient target write failures
- Docker Compose environment for easy local demos
- Spring Boot CLI commands for `full-load`, `cdc`, and `run-all`
- Java 17
- Maven
- Spring Boot
- Spring JDBC
- MySQL JDBC driver
- PostgreSQL JDBC driver
- `mysql-binlog-connector-java`
- Jackson
- SLF4J + Logback
- Docker Compose
```
.
├── .env.example
├── .github/
├── checkpoint/
├── docker-compose.yml
├── pom.xml
├── README.md
├── scripts/
│   ├── reset_demo.sh
│   └── run_demo.sh
├── sql/
│   ├── mysql_init.sql
│   ├── postgres_init.sql
│   └── seed.sql
└── src/
    ├── main/
    │   ├── java/com/example/migrationservice/
    │   │   ├── cdc/
    │   │   ├── checkpoint/
    │   │   ├── cli/
    │   │   ├── config/
    │   │   ├── fullload/
    │   │   ├── replication/
    │   │   ├── schema/
    │   │   └── util/
    │   └── resources/
    └── test/java/com/example/migrationservice/
```
- Discover tables from the configured MySQL schema.
- Read column metadata and primary key metadata from `INFORMATION_SCHEMA`.
- Create matching PostgreSQL tables if they do not exist.
- Stream rows from MySQL through JDBC.
- Copy rows to PostgreSQL in batches.
- Use upsert semantics so reruns are safe for the demo.
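The batched, idempotent write step comes down to generating PostgreSQL upsert statements from the discovered column and primary-key metadata. A minimal sketch of that SQL generation, assuming illustrative names (the class and method here are not the project's actual API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: builds a PostgreSQL upsert statement from discovered
// column and primary-key metadata. Class and method names are hypothetical.
public class UpsertSqlSketch {

    static String buildUpsert(String table, List<String> columns, List<String> pkColumns) {
        String columnList = String.join(", ", columns);
        String placeholders = columns.stream().map(c -> "?").collect(Collectors.joining(", "));
        // Non-key columns are overwritten on conflict, so reruns are safe.
        String updates = columns.stream()
                .filter(c -> !pkColumns.contains(c))
                .map(c -> c + " = EXCLUDED." + c)
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")"
                + " ON CONFLICT (" + String.join(", ", pkColumns) + ") DO UPDATE SET " + updates;
    }

    public static void main(String[] args) {
        System.out.println(buildUpsert("public.customers",
                List.of("customer_id", "email", "status"),
                List.of("customer_id")));
    }
}
```

Because the same statement serves both the initial copy and later replays, rerunning the full load against an already-populated target converges instead of failing on duplicate keys.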
For run-all, the service captures the current MySQL binlog position before the snapshot begins. That snapshot boundary is used as the CDC starting point after the full load completes.
- Open a binlog stream against the MySQL source.
- Read ordered row events from the binlog.
- Convert raw binlog payloads into `ReplicationEvent` objects.
- Apply those events to PostgreSQL:
  - `INSERT` -> upsert
  - `UPDATE` -> primary-key update, with upsert fallback
  - `DELETE` -> primary-key delete
- Persist the next binlog position after a successful apply.
The pipeline is intentionally single-threaded so event ordering stays simple and easy to reason about.
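The convert step above, pairing the connector's positional row arrays with discovered column names, can be sketched roughly as follows (all type and field names here are hypothetical stand-ins for the project's actual classes):

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the transform step: the binlog connector delivers
// positional row arrays, and the transformer pairs them with the discovered
// column names to produce a self-describing event that also carries its
// binlog coordinates for ordering and checkpointing.
public class EventTransformSketch {

    enum Op { INSERT, UPDATE, DELETE }

    // Hypothetical stand-in for the project's ReplicationEvent type.
    record ReplicationEvent(Op op, String table, Map<String, Object> row,
                            String binlogFile, long binlogPosition) {}

    static ReplicationEvent toEvent(Op op, String table, List<String> columns,
                                    Serializable[] rawRow, String binlogFile, long position) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            row.put(columns.get(i), rawRow[i]); // pair positional value with column name
        }
        return new ReplicationEvent(op, table, row, binlogFile, position);
    }

    public static void main(String[] args) {
        ReplicationEvent e = toEvent(Op.INSERT, "customers",
                List.of("customer_id", "email"),
                new Serializable[]{101, "demo.user@example.com"},
                "binlog.000002", 4567L);
        System.out.println(e.row()); // {customer_id=101, email=demo.user@example.com}
    }
}
```

Carrying the binlog file and position on every event is what lets the applier advance the checkpoint only after that event's write succeeds.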
Checkpoint state is stored in:
`checkpoint/checkpoint.json`
The file records:
- current binlog filename
- current binlog position
- last update timestamp
Recovery behavior:
- `cdc` resumes from the checkpoint if it exists
- `run-all` resumes CDC from the checkpoint if it exists
- if no checkpoint exists, the service captures the current source binlog position and starts from there
- the checkpoint only moves forward after a target write succeeds
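A minimal sketch of that save-after-success checkpoint file (the project itself uses Jackson for `checkpoint.json`; this sketch hand-rolls the JSON so it stays dependency-free, and the type names are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of checkpoint persistence. The key property is that
// save() is only called *after* a target write succeeds, so a crash replays
// at most the in-flight event, which the idempotent upserts absorb.
public class CheckpointSketch {

    record Checkpoint(String binlogFile, long position) {}

    static void save(Path path, Checkpoint cp) throws IOException {
        String json = String.format(
                "{\"binlogFile\":\"%s\",\"position\":%d,\"updatedAt\":\"%s\"}",
                cp.binlogFile(), cp.position(), Instant.now());
        Files.writeString(path, json);
    }

    static Checkpoint load(Path path) throws IOException {
        if (!Files.exists(path)) {
            return null; // no checkpoint: caller captures the current source position instead
        }
        String json = Files.readString(path);
        Matcher m = Pattern.compile("\"binlogFile\":\"([^\"]+)\",\"position\":(\\d+)").matcher(json);
        if (!m.find()) throw new IOException("malformed checkpoint: " + json);
        return new Checkpoint(m.group(1), Long.parseLong(m.group(2)));
    }
}
```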
- Java 17
- Maven 3.9+
- Docker Desktop or Docker Engine with Compose support
To avoid clashing with local database installs, the demo uses:
- MySQL: `localhost:3307`
- PostgreSQL: `localhost:5433`
The Docker setup uses MySQL 8.0 because it works cleanly with the chosen binlog connector in this local demo.
Run the test suite:

```shell
mvn test
```

Reset the demo environment:

```shell
./scripts/reset_demo.sh
```

Full load only:

```shell
mvn spring-boot:run -Dspring-boot.run.arguments=full-load
```

CDC only:

```shell
mvn spring-boot:run -Dspring-boot.run.arguments=cdc
```

Full pipeline:

```shell
mvn spring-boot:run -Dspring-boot.run.arguments=run-all
```

Demo script:

```shell
./scripts/run_demo.sh run-all
```

Or run the steps manually:

```shell
./scripts/reset_demo.sh
mvn spring-boot:run -Dspring-boot.run.arguments=run-all
```

Expected result:
- target tables are created automatically
- existing MySQL rows are copied to PostgreSQL
- the service stays running and starts listening to the MySQL binlog
Open a second terminal and run the SQL commands in the next section.
Use the verification commands in the demo SQL section below.
```shell
docker exec -i mini-migration-mysql mysql -uroot -prootpassword source_db <<'SQL'
INSERT INTO customers (email, full_name, status, loyalty_points)
VALUES ('demo.user@example.com', 'Demo User', 'ACTIVE', 25);

UPDATE products
SET price = 44.99
WHERE product_id = 102;

DELETE FROM inventory
WHERE inventory_id = 5003;
SQL
```

Then verify the changes on the PostgreSQL side:

```shell
docker exec -i mini-migration-postgres psql -U migration_user -d target_db -c "SELECT customer_id, email, loyalty_points FROM public.customers ORDER BY customer_id;"
docker exec -i mini-migration-postgres psql -U migration_user -d target_db -c "SELECT product_id, price, active FROM public.products ORDER BY product_id;"
docker exec -i mini-migration-postgres psql -U migration_user -d target_db -c "SELECT inventory_id, product_id, warehouse_code FROM public.inventory ORDER BY inventory_id;"
```

You should see:
- the inserted `customers` row in PostgreSQL
- the updated `products.price`
- the deleted `inventory` row removed from PostgreSQL
- Stop the running Spring Boot process with `Ctrl+C`.
- Make another MySQL change.
- Restart the service with:

  ```shell
  mvn spring-boot:run -Dspring-boot.run.arguments=cdc
  ```

- The service resumes from `checkpoint/checkpoint.json` and applies the missed change.
Main configuration lives in:
- `src/main/resources/application.yml`
- environment variables from `.env.example`
Default local values:
- MySQL host: `localhost`
- MySQL port: `3307`
- MySQL database/schema: `source_db`
- PostgreSQL host: `localhost`
- PostgreSQL port: `5433`
- PostgreSQL database: `target_db`
- PostgreSQL schema: `public`
- checkpoint file: `checkpoint/checkpoint.json`
- Focused on a single MySQL source schema and a single PostgreSQL target schema.
- Assumes row-based MySQL binlog events with full row images.
- Supports the common column types used in the demo schema, not the full MySQL type system.
- Does not replicate DDL changes or schema drift.
- Does not preserve source indexes, foreign keys, or default expressions on the target.
- Primary key updates are not modeled as delete-plus-insert.
- Snapshot consistency is pragmatic rather than production-grade.
- CDC processing is intentionally single-threaded to keep correctness and ordering easy to explain.
- Full load vs incremental replication: This repo cleanly separates the snapshot problem from the continuous replication problem.
- Binlog event handling: CDC is driven by ordered MySQL row events, not periodic polling.
- Event ordering: Changes are applied in binlog order, which keeps correctness straightforward.
- Checkpoint recovery: The service stores binlog file and position locally and resumes from a known boundary.
- Idempotent writes: Inserts use upsert semantics and updates can fall back to upsert when needed.
- Modular backend design: Schema discovery, full load, CDC reading, event transformation, target apply, and checkpointing are separated into focused services.
- Failure handling: Transient target-side failures retry; unrecoverable errors fail loudly.
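The retry-then-fail-loudly policy can be sketched in a few lines. This is an illustrative sketch, not the project's actual `TargetApplier` code, and the attempt count and backoff values are hypothetical:

```java
import java.sql.SQLTransientException;
import java.util.concurrent.Callable;

// Illustrative sketch of the failure-handling policy: transient target-side
// failures are retried with a growing delay; anything else propagates
// immediately so unrecoverable errors fail loudly.
public class RetrySketch {

    static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (SQLTransientException e) {      // transient: retry with backoff
                if (attempt >= maxAttempts) throw e; // give up loudly after the last attempt
                Thread.sleep(backoffMillis * attempt);
            }
            // Any other exception is not caught here and propagates at once.
        }
    }
}
```

Because the checkpoint only advances after a successful apply, a write that exhausts its retries leaves the checkpoint pointing at the failed event, so a restart retries it rather than silently skipping it.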
- DDL change handling
- richer type mapping and compatibility validation
- target index and constraint replication
- metrics and health endpoints
- dead-letter handling for poison events
- configurable table filtering and richer selection rules
- stronger snapshot consistency for more demanding production scenarios