Change Data Capture in PostgreSQL. pt.1

February 7, 2025

TL;DR

PostgreSQL offers an inbuilt pub/sub functionality that can be used for Change Data Capture. We use it for cache update notifications.

Part 1 - Quick and Easy

If you need to know when data changes in your database, then you’ll be wanting to look at Change Data Capture (CDC). This article outlines a simple way of detecting changes and notifying the application server of said changes.

Our requirement was simple, in an application we cache some complex database records in Redis/Valkey. When these records are required by the users we pull them from the cache or, if not in the cache, from the database and then we push them into the cache.

Because of the record’s complexity, there’s a performance implication when the database constructs them. Obviously all queries are tuned and query plans optimised to within a processor cycle of wonder, but it’s still a hit. This being so, we cache the retrieved record into Valkey so subsequent retrievals are virtually instantaneous.

When the user alters the record, this is sent to the database and the cache.

Q. What happens if an outside process or the database itself has changed the record?

Our cache is out of date and the application doesn’t know it. If a user wanted the latest record, the one stored in the cache would be out of date and the cache doesn’t know it.

A. Notify and Listen are your friends.

NOTIFY (or it’s helper function pg_notify() ) sends (with a small payload of up to 8kb) a message out into the ether. If your application server is LISTENing then it can decide what to do with this new found knowledge.

Here’s a notification event being sent from an Insert trigger on the task table:

        
    _rec t_msg; -- _rec is of type t_task; the task table's name.
    ..
    ..
    ..
    _rec.msg_id = NEW.msg_id;
    _rec.msg_historyid = NEW.msg_historyid;        
    
    perform pg_notify('msg_new', select row_to_json(_rec))::text;    
    -- e.g. {"msg_id":101,"msg_historyid":"12"}

For our new record, we put the primary keys of the new record into a JSON payload and broadcast this fact to anyone who’s listening.

Why don’t we broadcast the entire record?

Remember we said notifications are limited to 8K? In our instance, the records are often more and the broadcast will fail.

Also, it’s polite. Why broadcast large amounts of data if no one’s listening? You may have systems that are listening to the notifications who aren’t privvy to the record’s contents. If you broadcast the entire record, you’re showing your hand to all and sundry when you have no need to.

B. How to LISTEN

In the SQL code snippet, we saw how to broadcast a notification. Here, we’ll see how to receive them.

All we need to do is run a DB query. In our NodeJS systems, we start a separate service for listening to DB notifications, away from the main application’s thread.

await dbClient.query(`LISTEN msg_new`)

Parse the notification and decide with your business logic what needs to be done.

In this example, you’ll see the notification being intercepted and logged to the server’s console. The server code iterates over each listener looking for a match.

alt text

As the LISTEN event is not specific to any client side language, any language with a PostgreSQL driver can implement this.

Summary

LISTEN and NOTIFY are simple and configurable database functions that broadcast user-defined notifications. They’re useful as a lightweight cross-platform CDC mechanism, amongst other things.

In Part 2 (coming soon), you’ll see another PostgreSQL CDC mechanism; Debezium and Kafka