pgraft Documentation

pgraft Performance Tuning

Resource Sizing

CPU

Assign at least 4 CPU cores per node. pgraft runs PostgreSQL background workers alongside the Go Raft process, so reserve dedicated cores for Raft RPC handling under sustained write load.

Memory

Allocate shared_buffers at 25% of RAM with a minimum of 1 GB. Additional memory keeps snapshots and replication buffers hot and reduces disk churn during catch-up.
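As a concrete sketch (assuming a 32 GB node; scale to your RAM — these are standard PostgreSQL settings, not pgraft-specific):

```
shared_buffers = '8GB'          # 25% of 32 GB RAM
maintenance_work_mem = '1GB'    # speeds index and vacuum work during catch-up
```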

Storage

Prefer NVMe SSDs for WAL and Raft logs. Configure wal_keep_size large enough to withstand follower outages (≥ 4 GB recommended).
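In postgresql.conf, the retention guidance above corresponds to (a standard PostgreSQL setting, not pgraft-specific):

```
wal_keep_size = '4GB'   # retain enough WAL for a briefly offline follower to catch up
```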

Consensus Timing Profiles

Select heartbeat and election timeouts that balance failure detection with leader stability.
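As a rule of thumb (a heuristic, not a pgraft default): set the heartbeat interval to roughly 5–10x the measured round-trip time, with a floor of about 50 ms, and the election timeout to roughly 10x the heartbeat so followers tolerate several missed heartbeats before calling an election. For a 40 ms cross-region RTT:

```sql
-- Heuristic only: heartbeat ~ 5 x RTT (floor 50 ms), election ~ 10 x heartbeat
SELECT GREATEST(50, 5 * 40)      AS heartbeat_interval_ms,  -- 200
       GREATEST(50, 5 * 40) * 10 AS election_timeout_ms;    -- 2000
```

The profiles below bracket this heuristic for typical LAN, mixed, and geo-distributed deployments.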

Low latency (LAN)

pgraft.election_timeout = 400
pgraft.append_batch_size = 256
pgraft.replay_parallelism = 4

Balanced (Default)

pgraft.heartbeat_interval = 100
pgraft.election_timeout = 1000
pgraft.append_batch_size = 512
pgraft.replay_parallelism = 6

Geo-distributed

pgraft.heartbeat_interval = 180
pgraft.election_timeout = 2200
pgraft.append_batch_size = 1024
pgraft.replay_parallelism = 8

Set these values in postgresql.conf or persist them using SELECT pgraft_set_config(...) followed by pgraft_save_config().
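For example, to change the batch size at runtime and keep it across restarts (the argument signature — setting name and value as text — is assumed here from the call shown above):

```sql
-- Apply at runtime, then persist the running configuration
SELECT pgraft_set_config('pgraft.append_batch_size', '512');
SELECT pgraft_save_config();
```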

Batching & Log Throughput

Adjust batching parameters to match transaction volume. Larger batches increase throughput at the expense of latency.

Recommended batching settings

# Control the size of each AppendEntries RPC (entries)
pgraft.append_batch_size = 512

# Allow pipelining multiple AppendEntries in flight
pgraft.max_inflight_batches = 4

# Commit when a majority acknowledges (default) -- keep enabled
pgraft.strict_quorum_commit = on

Monitor batching efficiency

SELECT avg_batch_size,
       avg_append_latency_ms,
       pending_batches
  FROM pgraft_log_get_stats();

Disk & WAL Optimization

Ensure WAL and Raft logs are flushed efficiently:

  • Enable wal_compression = on to reduce network bandwidth for AppendEntries.
  • Consider wal_recycle = on to reuse WAL files and mitigate filesystem fragmentation.
  • Use dedicated WAL storage or wal_keep_size to buffer follower downtime without forcing snapshot installs.
  • Monitor pg_stat_bgwriter for checkpoints that could stall Raft application.
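A minimal query for spotting checkpoint pressure (column names as of PostgreSQL 16 and earlier; PostgreSQL 17 moves the checkpoint counters to pg_stat_checkpointer):

```sql
SELECT checkpoints_timed,    -- scheduled checkpoints
       checkpoints_req,      -- requested checkpoints, often from WAL pressure
       buffers_checkpoint    -- buffers written at checkpoint time
  FROM pg_stat_bgwriter;
```

A rising checkpoints_req count suggests max_wal_size is too small for the write rate.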

Checkpoint tuning

# Write smaller checkpoints more frequently to avoid bursts
checkpoint_timeout = '5min'
max_wal_size = '8GB'
min_wal_size = '2GB'

Read Scaling & Consistency

pgraft allows follower reads when configured appropriately. Adjust staleness tolerances to satisfy query requirements.

Follower read configuration

# Permit follower reads with bounded staleness
pgraft.read_consistency = 'bounded_staleness'
pgraft.read_staleness_max_ms = 500

# Optional: strongly consistent reads (leader only)
# pgraft.read_consistency = 'leader'

Check read routing

SELECT node_id,
       read_role,
       last_apply_lsn
  FROM pgraft_get_nodes();

Benchmarking & Observability

Use built-in metrics to validate tuning changes and detect regressions.

Key metrics queries

-- Throughput (log entries committed per minute)
SELECT date_trunc('minute', event_time) AS minute,
       SUM(committed_entries) AS entries_committed
  FROM pgraft_metrics_rolling
 GROUP BY 1
 ORDER BY 1 DESC
 LIMIT 10;

-- Latency distribution for AppendEntries RPCs
SELECT percentile_bucket,
       avg_latency_ms,
       count
  FROM pgraft_rpc_latency_histogram;

Recommended alert thresholds

-- Lag warning
SELECT node_id, replication_lag_bytes
  FROM pgraft_get_nodes()
 WHERE replication_lag_bytes > 67108864;  -- 64 MB

-- Leadership churn
SELECT COUNT(*)
  FROM pgraft_get_events()
 WHERE event_type = 'election'
   AND event_timestamp > now() - interval '10 minutes';

Troubleshooting Performance

High replication lag

  • Verify network RTT; consider increasing pgraft.append_batch_size.
  • Ensure followers have sufficient I/O bandwidth. Watch pg_stat_io counters (PostgreSQL 16+).
  • Check for slow checkpoints or autovacuum activity on followers.

Frequent elections

  • Increase pgraft.election_timeout to account for busy leader workloads.
  • Inspect pgraft_log_get_stats() for RPC failures indicating network issues.
  • Confirm CPU saturation is not preventing timely heartbeat processing.

Slow snapshot installs

  • Upgrade follower disk throughput or reduce snapshot size via pgraft.snapshot_threshold.
  • Seed extremely large followers from a pg_basebackup base backup instead of relying on Raft snapshot transfer.

Write latency spikes

  • Inspect avg_append_latency_ms via pgraft_log_get_stats().
  • Verify synchronous replication is not waiting on a failed follower (consider temporarily demoting it).
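To confirm whether a synchronous standby is the culprit, the standard PostgreSQL replication view suffices (not pgraft-specific):

```sql
-- A sync standby stuck in 'catchup', or missing entirely, will stall commits
SELECT application_name, state, sync_state, replay_lag
  FROM pg_stat_replication;
```

Compare the result against synchronous_standby_names (SHOW synchronous_standby_names;) to see which listed standbys are actually connected.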