Experience Building CDC with Debezium, Kafka, and Kafka Connect
Change Data Capture (or CDC) has been around for a while, and the most widely used way to achieve it is to read the source database's logs and replay the changes into a sink database. MySQL supports this through its binary log (binlog), and PostgreSQL through its Write-Ahead Log (WAL). However, it has not been possible to do this across database engines, say from MySQL to PostgreSQL or vice versa. To be precise, there are solutions, but they are commercial and may not use the log-reading approach. That has changed with the arrival of Debezium, an open-source project from Red Hat built as a set of Kafka Connect source connectors that read the logs of different database engines and produce change events in a universal JSON format, which another Kafka Connect connector can then consume to update the tables in the sink database.
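To make that pipeline a bit more concrete, here is a minimal sketch of a Debezium MySQL source connector configuration, the kind of JSON you would POST to the Kafka Connect REST API. The hostname, credentials, server name, and topic names below are placeholders, and some property names (for example `database.whitelist` vs. the later `database.include.list`) differ between Debezium versions:

```json
{
  "name": "inventory-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.com",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.server.id": "184054",
    "database.server.name": "inventory",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

On the other side, a sink connector (for example the JDBC sink connector) subscribes to the change-event topics Debezium produces, such as `inventory.inventory.customers`, and applies the rows to the target database; that is the setup described in the article I mention below.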
In this post, I will not talk about how to implement all of this (yet), both because it would be too long and because it has already been written up nicely in the article Streaming data to a downstream database by the Debezium team, so you can check that out. This post focuses on the problems that come up when deploying to production. Of course, I will show you the solutions I used, even though they may not be optimal, and there are some problems I have not solved yet. Still, I hope to share my knowledge with others, and I am glad to contribute something back to the community. So, no more introduction, let's get to the show!