Change Data Capture with Kafka and Debezium. pt.2

TL;DR

Debezium reads the WAL file. User code in Debezium puts the changed record’s keys into a user-defined Kafka topic, which scales horizontally and is read by other systems. No DDL required.

Part 1 outlined a simple CDC system using PostgreSQL’s LISTEN and NOTIFY pub/sub mechanism. It’s easy to create and use, but sometimes requirements are more stringent.

Part 2 - Why Debezium/Kafka?

We can send CDC messages using LISTEN/NOTIFY in PostgreSQL, but each message requires some SQL code, usually on a trigger or in a function. Sometimes, we don’t have the luxury of being able to modify our database, or even of having made concrete decisions on what data we want and when we want it. Sometimes, we don’t have the luxury of using PostgreSQL. This is where Debezium and Kafka come in.

Red Hat’s Debezium connector works on many RDBMSs. For PostgreSQL, it reads the WAL file (the write-ahead log, a.k.a. transaction log, redo log etc.) and doesn’t require any PostgreSQL DDL changes. The Debezium configuration JSON file contains a list of tables in which we have an interest:

{
  "name": "bip-connector",
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "plugin.name": "pgoutput",
  "topic.prefix": "bips",
    
    ...
  
  "table.include.list": "bips.t_bips"
}
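As a rough sketch of where those events land: by default, Debezium's PostgreSQL connector writes each captured table to a topic named <topic.prefix>.<schema>.<table>, so the configuration above would produce bips.bips.t_bips. The helper below is hypothetical (the class and method names are mine, not from the repository). It assumes the include-list entry is a literal schema.table name rather than a regular expression, and that no topic-routing transform has been configured.

```java
// Sketch: derive the Kafka topic that Debezium's PostgreSQL connector
// writes change events to for a captured table (default naming only).
final class DebeziumTopicNames {

    private DebeziumTopicNames() {}

    // Default naming is <topic.prefix>.<schema>.<table>.
    static String forTable(String topicPrefix, String schema, String table) {
        return String.join(".", topicPrefix, schema, table);
    }

    // Accepts an entry as written in table.include.list, e.g. "bips.t_bips".
    // Assumption: the entry is a literal name, not a regular expression.
    static String forIncludeListEntry(String topicPrefix, String entry) {
        String[] parts = entry.trim().split("\\.", 2);
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected schema.table but got: " + entry);
        }
        return forTable(topicPrefix, parts[0], parts[1]);
    }

    public static void main(String[] args) {
        // Prints: bips.bips.t_bips
        System.out.println(forIncludeListEntry("bips", "bips.t_bips"));
    }
}
```

This is the topic name a stream processor would subscribe to in order to see changes to bips.t_bips.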

All events in the bips.t_bips table are now watched by Debezium. We just have to create some code to decide what to do with them. This code pushes the updates into a different Kafka topic depending upon the contents. If the content contains any of the proscribed words, the record is pushed to a Kafka Filth topic; otherwise it goes to Clean.

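    // Imports, at the top of the file:
    import java.util.Arrays;
    import java.util.List;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Branched;
    import org.apache.kafka.streams.kstream.KStream;

    // Words that send a record to the Filth topic.
    private final static List<String> matchList = Arrays.asList("fun", "happy", "joy", "laugh");

    public static Topology makeStream(StreamsBuilder streamsBuilder) {

        // TOPIC, TOPICFILTH and TOPICCLEAN are assumed to be String constants defined elsewhere in the class.
        KStream<String, String> kStream = streamsBuilder.stream(TOPIC);

        kStream.split()
        // Null check: Debezium tombstone records (sent after a delete) carry a null value.
        .branch((key, value) -> value != null && matchList.stream().anyMatch(value::contains),
                Branched.withConsumer((ks) -> ks.to(TOPICFILTH)))
        // Everything else goes to the Clean topic.
        .defaultBranch(Branched.withConsumer((ks) -> ks.to(TOPICCLEAN)));
        return streamsBuilder.build();
    }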
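To see what the branch predicate does without starting Kafka, here is a minimal, dependency-free sketch of the same routing decision. The class name, the topic names and the "bip" field in the sample payloads are all hypothetical, and the null guard is my assumption for tombstone records. It also shows two properties of a plain contains() check that are worth knowing: it matches substrings ("funeral" contains "fun") and it is case-sensitive.

```java
import java.util.List;

// Sketch: the routing rule from makeStream, expressed as a plain function.
final class BipRouter {

    // Same word list as the post's matchList.
    static final List<String> MATCH_LIST = List.of("fun", "happy", "joy", "laugh");

    // Hypothetical topic names; the post does not show the real constant values.
    static final String TOPIC_FILTH = "bips-filth";
    static final String TOPIC_CLEAN = "bips-clean";

    // Returns the topic a record value would be routed to.
    static String route(String value) {
        // A null value (for example a tombstone) can never match, so it falls through to Clean.
        boolean matched = value != null && MATCH_LIST.stream().anyMatch(value::contains);
        return matched ? TOPIC_FILTH : TOPIC_CLEAN;
    }

    public static void main(String[] args) {
        System.out.println(route("{\"bip\":\"what a happy day\"}")); // bips-filth
        System.out.println(route("{\"bip\":\"grey and raining\"}")); // bips-clean
        System.out.println(route("{\"bip\":\"a funeral notice\"}")); // bips-filth: substring match on "fun"
        System.out.println(route("{\"bip\":\"Happy days\"}"));       // bips-clean: match is case-sensitive
        System.out.println(route(null));                             // bips-clean
    }
}
```

Because the value is the whole serialized change event, a word appearing in a field name or in metadata would match too. If that matters, the predicate would need to parse the event and inspect only the column of interest.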

Once they’re in Kafka, the topic subscribers pick up the messages and do whatever business they need to; perhaps querying a replica DB for the full record and associated data.

We’ve avoided DDL here and can make our decisions about the tables that interest us at a later date.

Example code.

The snippets in this post are taken from the KafkaPlayground repository.

https://github.com/WebTargetLtd/KafkaPlayground

The code is complete with a source database, and I welcome you to get a copy and run docker-compose up.

What else should I know?

The cloud is slow.*

Yazz and the Plastic Population would have you believe that the Only Way is Up (baby), but their upbeat, lightweight dance pop is no match for the crushing power of the tech billionaires and their favourite Yazz remix: The Only Way is Sideways (Kerching mix). Horizontal scaling is where the cloud is. Vertical scaling options are the Elysian Fields where on-prem people play, laughing, frolicking and guzzling ambrosia.

Ingesting (seeding) or updating large amounts of data in PostgreSQL can be troublesome. If you’ve been sold a dream by a multi-national consulting organisation (MNCO), only for them to hand over the work to fresh offshore graduates (who did a whole module on databases at college), then the pain you will encounter will make you wince and have you talking to your legal team in a desperate search for loopholes. Don’t bother; MNCO have been here before, time and time again. It’s watertight.

Kafka suits the cloud because it clusters so well, giving horizontal, cloud-compatible scaling. Used as part of a well-architected PostgreSQL solution, it takes load off the DB where that load is cumbersome and deleterious to live performance, which helps keep DB statistics up to date and shortens transaction locks.

* If you’re a tech billionaire with a well-funded legal department, then you probably misread that. Nothing to see here.