pgraft Performance Tuning
Resource Sizing
CPU
Assign at least 4 CPU cores per node. pgraft leverages PostgreSQL background workers plus the Go Raft process, so reserve dedicated cores for Raft RPC handling under sustained write loads.
Memory
Allocate shared_buffers at 25% of RAM with a minimum of 1 GB. Additional memory keeps snapshots and replication buffers hot and reduces disk churn during catch-up.
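The sizing rule above is simple arithmetic. A minimal sketch follows, assuming RAM is measured in megabytes; the helper name `recommended_shared_buffers_mb` is hypothetical and not part of pgraft or PostgreSQL.

```python
def recommended_shared_buffers_mb(total_ram_mb: int) -> int:
    """Return 25% of RAM in MB, but never less than 1 GB (1024 MB).

    Hypothetical helper that only encodes the rule of thumb stated above.
    """
    return max(total_ram_mb // 4, 1024)


if __name__ == "__main__":
    for ram_gb in (2, 16, 64):
        mb = recommended_shared_buffers_mb(ram_gb * 1024)
        # Emit a line that can be pasted into postgresql.conf
        print(f"# {ram_gb} GB RAM")
        print(f"shared_buffers = '{mb}MB'")
```

On a 2 GB node the 1 GB floor applies, so shared_buffers takes half of the RAM; nodes that small leave little headroom for snapshots and replication buffers.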
Storage
Prefer NVMe SSDs for WAL and Raft logs. Configure wal_keep_size large enough to withstand follower outages (≥ 4 GB recommended).
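One way to choose a wal_keep_size above the 4 GB floor is to multiply the observed WAL generation rate by the longest follower outage you want to absorb. The sketch below assumes you have measured the WAL rate yourself; the function name and parameters are illustrative only.

```python
import math


def wal_keep_size_gb(wal_rate_mb_per_s: float,
                     outage_seconds: float,
                     floor_gb: int = 4) -> int:
    """Estimate wal_keep_size in whole GB.

    Assumption: WAL is produced at a steady rate during the outage.
    The result never drops below the 4 GB recommended above.
    """
    needed_gb = wal_rate_mb_per_s * outage_seconds / 1024
    return max(floor_gb, math.ceil(needed_gb))


if __name__ == "__main__":
    # 5 MB/s of WAL, follower down for 30 minutes
    print(f"wal_keep_size = '{wal_keep_size_gb(5, 30 * 60)}GB'")
```

Bursty write workloads will exceed a steady-rate estimate, so treat the result as a lower bound rather than a guarantee.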
Consensus Timing Profiles
Select heartbeat and election timeouts that balance failure detection with leader stability.
Low latency (LAN)
pgraft.election_timeout = 400
pgraft.append_batch_size = 256
pgraft.replay_parallelism = 4
Balanced (Default)
pgraft.heartbeat_interval = 100
pgraft.election_timeout = 1000
pgraft.append_batch_size = 512
pgraft.replay_parallelism = 6
Geo-distributed
pgraft.heartbeat_interval = 180
pgraft.election_timeout = 2200
pgraft.append_batch_size = 1024
pgraft.replay_parallelism = 8
Set these values in postgresql.conf, or persist them with SELECT pgraft_set_config(...) followed by pgraft_save_config().
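A quick sanity check on any timing profile is how many heartbeat intervals fit into one election timeout: the larger the ratio, the more missed heartbeats a follower tolerates before it starts an election. The sketch below covers only the two profiles that list both values, and it assumes both settings use the same time unit. The `PROFILES` dictionary and helper names are illustrative, not a pgraft API.

```python
# Values copied from the Balanced and Geo-distributed profiles.
PROFILES = {
    "balanced": {
        "pgraft.heartbeat_interval": 100,
        "pgraft.election_timeout": 1000,
        "pgraft.append_batch_size": 512,
        "pgraft.replay_parallelism": 6,
    },
    "geo_distributed": {
        "pgraft.heartbeat_interval": 180,
        "pgraft.election_timeout": 2200,
        "pgraft.append_batch_size": 1024,
        "pgraft.replay_parallelism": 8,
    },
}


def heartbeats_per_election_timeout(profile: dict) -> float:
    """Ratio of election timeout to heartbeat interval (same unit assumed)."""
    return profile["pgraft.election_timeout"] / profile["pgraft.heartbeat_interval"]


def render_conf(profile: dict) -> str:
    """Render a profile as postgresql.conf lines."""
    return "\n".join(f"{key} = {value}" for key, value in profile.items())


if __name__ == "__main__":
    for name, profile in PROFILES.items():
        ratio = heartbeats_per_election_timeout(profile)
        print(f"# {name}: {ratio:.1f} heartbeats per election timeout")
        print(render_conf(profile))
```

If you derive a custom profile, keeping this ratio in the same range as the shipped profiles is a reasonable starting point; a ratio close to 1 makes spurious elections likely.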
Batching & Log Throughput
Adjust batching parameters to match transaction volume. Larger batches increase throughput at the expense of latency.
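To make the throughput and latency trade-off concrete, the sketch below computes two rough quantities: the maximum number of entries that can be in flight to one follower, and how long a batch takes to fill at a given write rate. It assumes in-flight capacity is simply batch size times pipelined batches, and that a sender waiting for a full batch is the worst case; pgraft may flush partial batches sooner. Function names are hypothetical.

```python
def max_entries_in_flight(append_batch_size: int, max_inflight_batches: int) -> int:
    """Upper bound on unacknowledged entries per follower (assumed model)."""
    return append_batch_size * max_inflight_batches


def batch_fill_time_ms(append_batch_size: int, entries_per_second: float) -> float:
    """Worst-case wait for a batch to fill, assuming a steady write rate."""
    return append_batch_size / entries_per_second * 1000


if __name__ == "__main__":
    for batch in (256, 512, 1024):
        in_flight = max_entries_in_flight(batch, 4)
        fill_ms = batch_fill_time_ms(batch, 10_000)
        print(f"batch={batch}: up to {in_flight} entries in flight, "
              f"fills in about {fill_ms:.1f} ms at 10k entries/s")
```

Doubling the batch size doubles both numbers, which is why larger batches suit high-volume or high-RTT links and smaller ones suit latency-sensitive workloads.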
Recommended batching settings
# Control the size of each AppendEntries RPC (entries)
pgraft.append_batch_size = 512
# Allow pipelining multiple AppendEntries in flight
pgraft.max_inflight_batches = 4
# Commit when a majority acknowledges (default) -- keep enabled
pgraft.strict_quorum_commit = on
Monitor batching efficiency
SELECT avg_batch_size,
avg_append_latency_ms,
pending_batches
FROM pgraft_log_get_stats();
Disk & WAL Optimization
Ensure WAL and Raft logs are flushed efficiently:
- Enable wal_compression = on to reduce network bandwidth for AppendEntries.
- Consider wal_recycle = on to reuse WAL files and mitigate filesystem fragmentation.
- Use dedicated WAL storage or wal_keep_size to buffer follower downtime without forcing snapshot installs.
- Monitor pg_stat_bgwriter for checkpoints that could stall Raft application.
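A common way to read the checkpoint counters is to compare requested checkpoints against timed ones: a high share of requested checkpoints usually means WAL is filling max_wal_size before checkpoint_timeout expires. The sketch below assumes you have already fetched the two counters (checkpoints_timed and checkpoints_req in pg_stat_bgwriter through PostgreSQL 16; PostgreSQL 17 moved them to pg_stat_checkpointer as num_timed and num_requested). The helper name and the 10% warning level are illustrative assumptions.

```python
def requested_checkpoint_ratio(checkpoints_timed: int, checkpoints_req: int) -> float:
    """Fraction of checkpoints that were forced rather than scheduled."""
    total = checkpoints_timed + checkpoints_req
    return checkpoints_req / total if total else 0.0


if __name__ == "__main__":
    ratio = requested_checkpoint_ratio(checkpoints_timed=180, checkpoints_req=60)
    print(f"requested checkpoints: {ratio:.0%}")
    # Illustrative threshold, not a pgraft recommendation
    if ratio > 0.10:
        print("consider raising max_wal_size")
```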
Checkpoint tuning
# Write smaller checkpoints more frequently to avoid bursts
checkpoint_timeout = '5min'
max_wal_size = '8GB'
min_wal_size = '2GB'
Read Scaling & Consistency
pgraft allows follower reads when configured appropriately. Adjust staleness tolerances to satisfy query requirements.
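The routing decision behind bounded-staleness reads can be pictured as a small predicate. This is a sketch of the general idea only: it assumes the leader can always serve reads, that a follower in bounded-staleness mode may serve a read when its measured staleness is within the configured limit, and that only the modes 'leader' and 'bounded_staleness' exist. How pgraft actually measures staleness is not covered here, and the function is hypothetical.

```python
VALID_MODES = {"leader", "bounded_staleness"}


def can_serve_read(read_consistency: str,
                   is_leader: bool,
                   staleness_ms: float,
                   max_staleness_ms: float = 500) -> bool:
    """Decide whether a node may answer a read under the assumed semantics."""
    if read_consistency not in VALID_MODES:
        raise ValueError(f"unknown read_consistency: {read_consistency!r}")
    if is_leader:
        return True          # the leader is always current
    if read_consistency == "leader":
        return False         # followers never serve leader-only reads
    return staleness_ms <= max_staleness_ms


if __name__ == "__main__":
    print(can_serve_read("bounded_staleness", False, staleness_ms=120))
    print(can_serve_read("bounded_staleness", False, staleness_ms=800))
    print(can_serve_read("leader", False, staleness_ms=0))
```

A tighter staleness limit sends more reads back to the leader, so the limit is effectively a dial between read scalability and freshness.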
Follower read configuration
# Permit follower reads with bounded staleness
pgraft.read_consistency = 'bounded_staleness'
pgraft.read_staleness_max_ms = 500
# Optional: strongly consistent reads (leader only)
# pgraft.read_consistency = 'leader'
Check read routing
SELECT node_id,
read_role,
last_apply_lsn
FROM pgraft_get_nodes();
Benchmarking & Observability
Use built-in metrics to validate tuning changes and detect regressions.
Key metrics queries
-- Throughput (entries committed per minute, 10 most recent minutes)
SELECT date_trunc('minute', event_time) AS minute,
SUM(committed_entries) AS entries_committed
FROM pgraft_metrics_rolling
GROUP BY 1
ORDER BY 1 DESC
LIMIT 10;
-- Latency distribution for AppendEntries RPCs
SELECT percentile_bucket,
avg_latency_ms,
count
FROM pgraft_rpc_latency_histogram;
Recommended alert thresholds
-- Lag warning
SELECT node_id, replication_lag_bytes
FROM pgraft_get_nodes()
WHERE replication_lag_bytes > 67108864; -- 64 MB
-- Leadership churn
SELECT COUNT(*)
FROM pgraft_get_events()
WHERE event_type = 'election'
AND event_timestamp > now() - interval '10 minutes';
Troubleshooting Performance
High replication lag
- Verify network RTT; consider increasing pgraft.append_batch_size.
- Ensure followers have sufficient I/O bandwidth. Watch pg_stat_io counters.
- Check for slow checkpoints or autovacuum activity on followers.
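When judging whether a lagging follower will recover on its own, it helps to estimate catch-up time from the lag and the difference between the follower's apply rate and the leader's write rate. The sketch below assumes both rates are steady and measured in bytes per second; the function is hypothetical and the rates must come from your own monitoring.

```python
from typing import Optional


def catch_up_seconds(lag_bytes: int,
                     apply_bytes_per_s: float,
                     incoming_bytes_per_s: float) -> Optional[float]:
    """Seconds until the lag reaches zero, or None if it never will.

    Assumption: steady rates. The follower only gains ground when it
    applies data faster than the leader produces it.
    """
    net_rate = apply_bytes_per_s - incoming_bytes_per_s
    if net_rate <= 0:
        return None
    return lag_bytes / net_rate


if __name__ == "__main__":
    MB = 1024 * 1024
    # 64 MB behind, applying 20 MB/s while the leader writes 12 MB/s
    print(catch_up_seconds(64 * MB, 20 * MB, 12 * MB))
    # Follower slower than the leader: lag grows without bound
    print(catch_up_seconds(64 * MB, 10 * MB, 12 * MB))
```

A result of None points at follower I/O or network capacity rather than batching settings, since no amount of batching helps a follower that cannot keep up with the steady-state write rate.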
Frequent elections
- Increase pgraft.election_timeout to account for busy leader workloads.
- Inspect pgraft_log_get_stats() for RPC failures indicating network issues.
- Confirm CPU saturation is not preventing timely heartbeat processing.
Slow snapshot installs
- Upgrade follower disk throughput or reduce snapshot size via pgraft.snapshot_threshold.
- For extremely large datasets, seed followers from a manual base backup taken with pg_basebackup.
Write latency spikes
- Inspect avg_append_latency_ms via pgraft_log_get_stats().
- Verify synchronous replication is not waiting on a failed follower (consider temporarily demoting it).