This is the third article about using PostgreSQL as a queue. The previous article can be read here.
The Background
Although the tone of the previous article was somewhat self-congratulatory (implementing our waterfall method increased performance by 4x), there’s always more to give. Trying various timing and record-volume parameters showed us that we’d hit peak loading performance with what we’d got. But what if HTTP itself was getting in our way? After all, it’s just a protocol wrapped around TCP.
Avoiding Inefficiencies
Running the waterfall method showed that the processor was running flat out on all cores. Some of that processing would have been consumed unwrapping the messages from HTTP - how much remained to be seen.
Here’s a Rust snippet where each incoming line is deserialized into the Bip struct and pushed onto a buffer; once the buffer holds max_rec records, the process_records function turns it into a bulk INSERT.
tokio::spawn(async move {
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    // Read newline-delimited JSON straight off the TCP stream.
    while reader.read_line(&mut line).await.unwrap() > 0 {
        if let Ok(bip) = serde_json::from_str::<Bip>(&line) {
            let mut recs = records.lock().await;
            recs.push(bip);
            // Once the buffer is full, flush it as a single bulk INSERT.
            if recs.len() >= max_rec {
                println!("Max records reached, processing {} records", recs.len());
                process_records(&recs, &rec_target).await;
                recs.clear();
            }
        }
        line.clear();
    }
});
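process_records itself isn’t shown above. As a rough sketch of the idea, here’s one way a batched multi-row INSERT could look with tokio-postgres; the two-field Bip and column names are assumptions for illustration, not the real schema.

use tokio_postgres::types::ToSql;

// Cut-down Bip for the example; the real struct has more fields.
struct Bip {
    device_id: String,
    journey_id: String,
}

// A batched, single-statement INSERT: one round trip per max_rec records.
async fn process_records(recs: &[Bip], client: &tokio_postgres::Client) {
    let mut placeholders = Vec::with_capacity(recs.len());
    let mut params: Vec<&(dyn ToSql + Sync)> = Vec::with_capacity(recs.len() * 2);
    for (i, bip) in recs.iter().enumerate() {
        // Builds VALUES ($1,$2),($3,$4),... with one pair of parameters per record.
        placeholders.push(format!("(${},${})", i * 2 + 1, i * 2 + 2));
        params.push(&bip.device_id);
        params.push(&bip.journey_id);
    }
    let sql = format!(
        "INSERT INTO bips.t_bip (device_id, journey_id) VALUES {}",
        placeholders.join(",")
    );
    // PostgreSQL caps bind parameters at 65,535, so keep
    // max_rec * column_count below that limit.
    client.execute(sql.as_str(), &params).await.expect("bulk insert failed");
}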
Results: Have a free 21% improvement.
The target machine differed from the previous one: it’s a 4-core i5 from 2016-ish. Remember that this single machine has to read the files, decompress them, stream them over TCP, read the lines back, deserialize them, build the inserts and store the records in a locally running PostgreSQL (a vanilla Docker container version).
lscpu tells us:
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz
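The sending half of that pipeline is little more than decompress-and-forward. A minimal sketch, assuming flate2 for decompression and plain std networking (neither is necessarily what the real loader uses), looks something like this:

use std::fs::File;
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

use flate2::read::GzDecoder;

// Stream one gzipped file of newline-delimited JSON to the loader.
// The path and address are placeholders for the example.
fn stream_file(path: &str, addr: &str) -> std::io::Result<()> {
    let reader = BufReader::new(GzDecoder::new(File::open(path)?));
    let mut conn = TcpStream::connect(addr)?;
    for line in reader.lines() {
        // One JSON record per line; the loader splits on '\n'.
        conn.write_all(line?.as_bytes())?;
        conn.write_all(b"\n")?;
    }
    Ok(())
}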
To make sure that we’re comparing apples to apples, we re-ran the original Rust loaders (normal and waterfall) on this machine before the TCP run.
That’s a 21% boost in loading performance over the HTTP waterfall method, and 59% over the basic receive/insert method. That’s over 75,000 records per second!
But there’s another upside too. Look at the load:

This crappy old machine isn’t stressed.
And there’s more: Another 31%!
Our primary key is a UUID v4. PostgreSQL doesn’t like them: the random values scatter inserts across the index, which is pretty inefficient. PostgreSQL v18 will address this by introducing UUID v7, which is time-ordered. As we’re using PostgreSQL as a stack, we just want records to land, and we’ll pull them off the stack in chunks when we have capacity.
Something like this:
DELETE FROM bips.t_bip WHERE ctid IN (
SELECT ctid FROM bips.t_bip
LIMIT 1000
)
RETURNING *;
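On the consumer side, that statement slots into a simple polling loop. Here’s a hedged sketch with tokio-postgres; the query and batch size are from above, the surrounding function is illustrative:

use tokio_postgres::{Client, Error, Row};

// Pop up to 1,000 records off the "stack" in one statement;
// DELETE ... RETURNING hands the rows back as it removes them.
async fn pop_chunk(client: &Client) -> Result<Vec<Row>, Error> {
    client
        .query(
            "DELETE FROM bips.t_bip WHERE ctid IN (
                 SELECT ctid FROM bips.t_bip LIMIT 1000
             )
             RETURNING *",
            &[],
        )
        .await
}

An empty result simply means the stack is drained, so a single consumer can back off and poll again when it has capacity.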
Removing the primary key entirely (or adding a BIGSERIAL id instead) has no effect on the functionality we need, but it does reduce the load on the database.
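And if a key is still wanted before v18 lands, a time-ordered UUID can be generated client-side instead; a small sketch with the uuid crate’s v7 feature (an illustration, not part of the benchmarked loader):

// Requires the uuid crate with the "v7" feature enabled.
use uuid::Uuid;

fn new_bip_id() -> Uuid {
    // UUIDv7 puts a millisecond timestamp in the high bits, so
    // consecutive ids land near each other in a btree index instead
    // of scattering the way random UUIDv4 values do.
    Uuid::now_v7()
}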
That’s nearly 100,000 records processed per second on cheap, limited hardware. A quick re-run of the NestJS version (where we started) showed 1,430 records per second. We’re now 69x faster than our original run.
So what? I’ll just get more machines.
When the day comes that your favourite cloud provider charges by the kilowatt hour, you’re going to need a sit down.
In the next and final installment, we’ll run over the costs involved in running such a system.
Addendum
20 October 2025
At the time of writing the original post, the most flavoursome version of PostgreSQL was v17. Since then, v18 has been released. For completeness, we’ve re-run the TCP tests with the new version. The configuration is the same as in the previous tests - a vanilla official Docker container.
The io_uring feature that we wrote about here won’t come into play in this test, as in v18 it’s used for SELECTs only.
For the first time, we’ve breached 100,000 records per second. That’s an increase of about 1.45%. It’s not a huge gain, and probably within the realm of run-to-run variability, but a psychological victory nonetheless.