LiteCDC is a lightweight Change Data Capture (CDC) solution for PostgreSQL databases, with optional replication to MySQL. It captures and tracks change events in your PostgreSQL databases and publishes them to Kafka, where they can later be consumed by an external application or by one of the internal MySQL sinks.
This behavior is determined by a user-defined pipeline, the architecture of which is illustrated below:
For a more comprehensive overview of the system, please refer to the project report (`report.pdf`).
PostgreSQL:
- Version: 9.4+
- `wal_level`: `logical`
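To confirm that logical decoding is available on the source database, you can check (and, if necessary, change) `wal_level` with something like the following. The connection details are placeholders, and changing `wal_level` only takes effect after a server restart.

```sh
# Check the current WAL level (connection details are placeholders)
psql -h localhost -U postgres -c "SHOW wal_level;"

# Enable logical decoding if needed; requires a PostgreSQL restart to take effect
psql -h localhost -U postgres -c "ALTER SYSTEM SET wal_level = 'logical';"
```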
The (optional) sinks should work with most versions of MySQL, but the application has been tested with versions 8.0+.
The provided example `docker-compose.yml` file spins up the necessary components for local testing. Please note that this setup is intended for development and testing purposes only and should not be used in production environments.
- Build the Project: `mvn clean install`
- Start the Docker Containers: `docker-compose --profile local-postgres --profile local-mysql up --build`
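Once the containers are up, a quick sanity check might look like the following. The application port (8080) is an assumption; adjust it to whatever your configuration exposes.

```sh
# List the running containers started by the compose file
docker ps

# Confirm the LiteCDC API is reachable (the port is an assumption)
curl http://localhost:8080/pipeline/status
```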
Import the example Postman API collection (`postman_collection.json`) to view some sample requests and start up a demo pipeline.
The following requests are available:
- Run Pipeline: `POST /pipeline/run`
- Halt Pipeline: `POST /pipeline/halt`
- Get Pipeline Status: `GET /pipeline/status`
- Get Snapshot Metrics: `GET /pipeline/metrics/snapshot`
- Get (General) Metrics: `GET /pipeline/metrics`
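If you prefer the command line to Postman, the same endpoints can be exercised with curl. The host, port, and request body below are illustrative assumptions; the Postman collection shows the actual payloads the demo pipeline expects.

```sh
# Start a pipeline (the JSON body and its file name are placeholders)
curl -X POST http://localhost:8080/pipeline/run \
     -H "Content-Type: application/json" \
     -d @pipeline.json

# Check the pipeline status and metrics
curl http://localhost:8080/pipeline/status
curl http://localhost:8080/pipeline/metrics

# Halt the pipeline
curl -X POST http://localhost:8080/pipeline/halt
```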
When running the tests, ensure that the following environment variables are set in your run configuration:
- `PG_PASS` = `pg_password`
- `MYSQL_PASS` = `mysql_password`
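If you run the tests from a terminal instead of an IDE run configuration, exporting the same variables before invoking Maven should have the equivalent effect (a sketch, assuming the standard Maven test phase):

```sh
export PG_PASS=pg_password
export MYSQL_PASS=mysql_password
mvn test
```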
Known Issue: The end-to-end tests (`EndToEnd___Test` and `NoDestination___Test`) currently fail when run in the same run configuration. These tests should be run in isolation.
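Assuming the tests run through Maven Surefire, its `-Dtest` filter is one way to run them in isolation (substitute the actual test class name):

```sh
# Run a single end-to-end test class on its own
mvn -Dtest=EndToEnd___Test test
```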
Using a very crude local approach to performance testing, the following behaviour was observed:
Snapshotting: 80,000,000 rows across 4 tables (13.7 GB)
- Rows processed in 655s.
- Upper Limit Throughput: ~122,137 rows/s
Streaming: 20,000,000 rows from a single table (3.4 GB)
- The rows were generated by a SQL statement that committed transactions of 5,000 insert operations each until 20 million rows had been inserted (a sketch of this kind of generator follows below).
- The system processed the changes faster than they were being generated, maintaining a consistent source-to-producer time lag of less than 70 ms.
- Changes processed in 332s.
- Upper Limit Throughput: >60,240 rows/s
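The exact generation statement is not reproduced here, but a loop of roughly the following shape matches the description: each iteration commits one transaction of 5,000 inserts, and 4,000 iterations give 20 million rows. The database, table, and column names are placeholders.

```sh
# Hypothetical load generator: each `psql -c` call commits one transaction of 5,000 inserts.
# Database, table, and column names are placeholders; 4,000 iterations x 5,000 rows = 20,000,000 rows.
for i in $(seq 1 4000); do
  psql -h localhost -U postgres -d source_db -c \
    "INSERT INTO demo_table (payload) SELECT md5(random()::text) FROM generate_series(1, 5000);"
done
```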
MySQL Sinks: 10,000,000 rows across 4 tables (1.7 GB)
- Change data was generated with a similar method to the streaming test, using different mixes of event types to show the range in performance.
- Batching: 15,175 to 15,267 rows/s
- Transactional: 6,165 to 6,978 rows/s