Introduction: Understanding Read-Write Splitting and Database Architecture for Scalability

As web applications grow, scaling database performance becomes crucial. One effective strategy is read-write splitting: write operations (INSERT, UPDATE, DELETE) are directed to the primary database, while read operations (SELECT queries) are routed to replicas. This approach spreads query load across more machines, enabling better load balancing and high availability.

In this guide, we will explore how to set up PostgreSQL read-write splitting using Docker to create a mock Aurora PostgreSQL cluster, HAProxy for load balancing, and SQLAlchemy in Python to manage read-write query separation.

Creating a Mock Aurora PostgreSQL Cluster with Docker Compose

To simulate an Aurora PostgreSQL cluster locally, we will use Docker Compose to spin up one primary and two replica PostgreSQL instances.

Here’s a simple docker-compose.yml to set up the cluster:

version: '3'

services:
  primary-db:
    image: postgres:13
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin_pass
      POSTGRES_DB: test_db
    ports:
      - "5432:5432"
    volumes:
      - pgdata-primary:/var/lib/postgresql/data

  replica-db1:
    image: postgres:13
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin_pass
      POSTGRES_DB: test_db
    depends_on:
      - primary-db
    command: postgres -c 'hot_standby=on'
    ports:
      - "5433:5432"
    volumes:
      - pgdata-replica1:/var/lib/postgresql/data

  replica-db2:
    image: postgres:13
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin_pass
      POSTGRES_DB: test_db
    depends_on:
      - primary-db
    command: postgres -c 'hot_standby=on'
    ports:
      - "5434:5432"
    volumes:
      - pgdata-replica2:/var/lib/postgresql/data

volumes:
  pgdata-primary:
  pgdata-replica1:
  pgdata-replica2:

This setup mimics a primary/replica environment, where primary-db handles all write operations and replica-db1 and replica-db2 serve as read-only replicas. Each service gets its own data volume, since PostgreSQL instances cannot share a data directory. Note that these are three independent instances rather than true streaming replicas; the cluster is a mock for exercising routing logic locally, not real replication.
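Once the cluster is up (docker compose up -d), the three instances are reachable from the host on ports 5432, 5433, and 5434. As a small convenience, the connection URLs can be built from the credentials in the compose file; this helper is a sketch, so adjust it if you change the port mappings or credentials:

```python
def dsn(port: int, user: str = "admin", password: str = "admin_pass",
        db: str = "test_db", host: str = "localhost") -> str:
    """Build a PostgreSQL connection URL for one of the compose services."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

PRIMARY_DSN = dsn(5432)                 # primary-db: all writes go here
REPLICA_DSNS = [dsn(5433), dsn(5434)]   # replica-db1 / replica-db2: reads
```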

Configuring HAProxy for Load Balancing Read Queries

To balance read queries between the replicas, we'll use HAProxy. One caveat up front: HAProxy runs here in TCP mode, so it forwards raw connections and cannot inspect SQL to tell a SELECT from an INSERT. We therefore expose two frontends on separate ports: the application sends writes to port 5000, which forwards to the primary, and reads to port 5001, which round-robins across the replicas. Create the following haproxy.cfg configuration file:

frontend postgres_write
    bind *:5000
    mode tcp
    default_backend primary

frontend postgres_read
    bind *:5001
    mode tcp
    default_backend replicas

backend primary
    mode tcp
    server primary-db primary-db:5432 check

backend replicas
    mode tcp
    balance roundrobin
    server replica-db1 replica-db1:5432 check
    server replica-db2 replica-db2:5432 check

To run HAProxy in Docker, attach it to the Compose network so it can resolve the service names (the network is usually named after your project directory; check the output of docker network ls):

docker run -d --name haproxy \
  --network <compose-network> \
  -v $(pwd)/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
  -p 5000:5000 -p 5001:5001 haproxy:latest

Connections to port 5000 go to primary-db, and connections to port 5001 are load-balanced between replica-db1 and replica-db2.
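The balance roundrobin directive simply alternates new connections across the listed servers. Purely as an illustration of that behavior (HAProxy does this internally; the snippet below is not part of the setup):

```python
from itertools import cycle

# The replica addresses from the "replicas" backend in haproxy.cfg.
replicas = ["replica-db1:5432", "replica-db2:5432"]
rotation = cycle(replicas)

def next_replica() -> str:
    """Return the server the next read connection would be dispatched to."""
    return next(rotation)
```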

Implementing Read-Write Query Splitting in SQLAlchemy with Python

We will now configure SQLAlchemy in Python with two separate sessions: one bound to the primary for write operations and one bound to a replica for read operations.

Here is an example of how to configure SQLAlchemy for read-write splitting:

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

# Engine for writes: the primary database
write_engine = create_engine('postgresql://admin:admin_pass@localhost:5432/test_db')

# Engine for reads: replica-db1 here; point this at an HAProxy frontend
# that balances the replicas if you want reads spread across both
read_engine = create_engine('postgresql://admin:admin_pass@localhost:5433/test_db')

# Session factory for write sessions
write_session = scoped_session(sessionmaker(bind=write_engine))

# Session factory for read sessions
read_session = scoped_session(sessionmaker(bind=read_engine))

# Example: write to the primary database
def create_data():
    with write_session() as session:
        session.execute(text("INSERT INTO users (name) VALUES ('John Doe')"))
        session.commit()

# Example: read from a replica
def read_data():
    with read_session() as session:
        result = session.execute(text("SELECT * FROM users"))
        for row in result:
            print(row)

In this setup, the create_data() function writes through the primary engine, while read_data() reads through the replica engine.
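If you would rather not choose a session by hand at every call site, a small dispatcher can route by statement type. This is a sketch, not part of SQLAlchemy's API: is_read_query is a deliberately naive classifier (a WITH clause can hide a data-modifying CTE, for example), so treat it as a starting point rather than a complete solution.

```python
READ_PREFIXES = ("select", "show", "explain")

def is_read_query(sql: str) -> bool:
    """Naive check: statements starting with SELECT/SHOW/EXPLAIN count as reads."""
    return sql.lstrip().lower().startswith(READ_PREFIXES)

def pick_session(sql: str, read_session, write_session):
    """Route a statement to the read session if it looks like a read,
    otherwise to the write session."""
    return read_session if is_read_query(sql) else write_session
```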

Testing the Implementation: Creating and Reading Data with Separate Sessions

Now that we have configured read-write splitting, let’s test it by creating and reading data:

  1. Inserting Data: Use the create_data() function to insert a new user record into the database. The primary instance will handle this.
  2. Reading Data: To retrieve the data, call the read_data() function. The read session sends the query to a replica.

Addressing Limitations and Considering Production-Ready Solutions

While this setup works for local development and testing, it comes with several limitations for production use:

  • Single point of failure: HAProxy itself becomes a single point of failure, so in production it needs its own high-availability setup (for example, a redundant pair behind a virtual IP).
  • Data consistency: Replication lag can cause stale reads from the replicas. Mitigations include routing freshness-sensitive reads to the primary and monitoring lag so overly stale replicas can be pulled from rotation.
  • Connection pooling: In production, tune each engine’s connection pool and scope sessions carefully to avoid overloading the databases.
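One pragmatic guard against replication lag is read-after-write stickiness: for a short window after each write, send that client’s reads to the primary so the application never misses its own writes. The sketch below assumes a single-process application; both the class and its 2-second default window are illustrative, not a library API.

```python
import time

class LagGuard:
    """Route reads to the primary for a short window after each write."""

    def __init__(self, window_seconds: float = 2.0, clock=time.monotonic):
        # window_seconds is an assumption; tune it to the lag you observe.
        self.window = window_seconds
        self.clock = clock
        self._last_write = float("-inf")

    def record_write(self) -> None:
        """Call after every write so subsequent reads stick to the primary."""
        self._last_write = self.clock()

    def use_primary_for_read(self) -> bool:
        """True while the last write is still within the stickiness window."""
        return self.clock() - self._last_write < self.window
```

An application would call record_write() in create_data() and consult use_primary_for_read() before picking a session in read_data().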

Conclusion and Next Steps: Exploring Further Optimizations and Enhancements

You have now set up a mock Aurora PostgreSQL cluster with read-write splitting using Docker, HAProxy, and SQLAlchemy. While this setup is great for testing, production deployments should focus on optimizing HAProxy, replication management, and implementing fallback strategies for consistency issues.

Further optimizations include integrating connection pooling and monitoring replication health to ensure the availability of the read replicas.
