pgraft Documentation

pgraft Raft Protocol

Consensus Overview

Raft keeps pgraft clusters converged on the same WAL-derived state, even through failures. One leader accepts client writes, followers replicate entries, and terms track leadership epochs. When the leader changes, Raft ensures only a node with the freshest log can win, so committed history is never lost or duplicated.
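Cluster status snapshot

As a quick sanity check, the same pgraft_get_cluster_status() function used in the election metrics further down summarizes the current term and leadership state (a sketch; the exact column set may vary by version):

SELECT *
  FROM pgraft_get_cluster_status();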

Leader election

Deterministic timeouts pick a single leader per term. Heartbeats advertise leadership and reset follower timers.

Log replication

Write-ahead log summaries propagate via AppendEntries RPCs. Followers persist and acknowledge entries sequentially.

Consensus safety

Quorum commits, term tracking, and log matching prevent divergent history while keeping read replicas available.
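Quorum size

A commit requires acknowledgement from a majority, i.e. floor(N/2) + 1 nodes, so a 5-node cluster tolerates 2 failures. A sketch that derives the quorum size from the node list, assuming pgraft_get_nodes() returns one row per cluster member:

SELECT count(*) / 2 + 1 AS quorum_size
  FROM pgraft_get_nodes();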

Node Roles

Leader

Accepts SQL writes, appends them to the Raft log, and streams AppendEntries to followers until a quorum confirms.

Leadership probe

SELECT pgraft_is_leader();

Follower

Receives AppendEntries, persists uncommitted entries, and applies changes once commit_index advances.

Follower heartbeat lag

SELECT node_id,
       last_heartbeat_ms
  FROM pgraft_get_nodes()
 ORDER BY last_heartbeat_ms DESC;

Candidate

Transitional role triggered when heartbeats stop. Requests votes and upgrades to leader after majority approval.

Election metrics

SELECT current_term,
       elections_triggered
  FROM pgraft_get_cluster_status()
 LIMIT 1;

Leader Election Flow

Each node runs an independent timer between heartbeats. Randomized offsets avoid simultaneous elections, while vote rules enforce log freshness.

1. Election timeout

A follower stops receiving heartbeats within the configured election timeout (default 1000 ms).

2. Candidate declaration

The follower increments its term, becomes a candidate, and votes for itself.

3. Vote solicitation

The candidate issues RequestVote RPCs to the rest of the cluster; peers grant their vote only if the candidate's log is at least as up to date as their own.

4. Leader promotion

The candidate becomes leader after receiving a majority of votes and immediately begins sending heartbeats.

Manual leadership transfer

-- Redirect leadership before maintenance
SELECT pgraft_transfer_leadership(2);
SELECT pgraft_get_leader();

Log Replication Lifecycle

Leaders batch WAL summaries into Raft entries, replicate them concurrently, and advance commit indexes once a quorum acknowledges.

Replication vitals

SELECT node_id,
       match_index,
       next_index,
       commit_index,
       lag_entries,
       state
  FROM pgraft_log_get_replication_status()
 ORDER BY lag_entries DESC;

Commit index

Highest log entry known to be replicated on a majority of nodes. Entries at or below this index are durable.

Last applied

Highest log entry applied to the PostgreSQL state machine. Followers eventually converge to the leader’s commit index.
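Apply lag per node

The gap between commit_index and the applied position shows how far a replica trails durably committed history. A sketch against the replication status function shown above; the last_applied column is an assumption here and may be named differently in your build:

SELECT node_id,
       commit_index,
       last_applied,                              -- assumed column name
       commit_index - last_applied AS apply_lag
  FROM pgraft_log_get_replication_status()
 ORDER BY apply_lag DESC;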

Consensus Safety

Raft’s invariants guarantee that clients never observe divergent history, even if the cluster re-elects multiple leaders during failover or partition events.

Election safety

At most one leader can be elected in a given term, preventing split-brain.

Leader append-only

A leader never overwrites or deletes entries in its own log; it only appends new ones.

Log matching

If two logs contain an entry with the same index and term, the logs are identical in all entries up through that index.

Leader completeness

An entry committed in a given term is present in the log of every leader of a higher-numbered term.

Tuning Consensus Timing

Adjust timeouts and batching to match latency budgets. Keep election_timeout roughly 10× the heartbeat interval to avoid premature elections under load.

postgresql.conf

# postgresql.conf (excerpt)
pgraft.heartbeat_interval = '75ms'
pgraft.election_timeout   = '900ms'
pgraft.append_batch_size  = 512
pgraft.snapshot_threshold = 8000
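Apply changes without a restart

After editing postgresql.conf, reload the server configuration with the standard PostgreSQL function below; note that some pgraft parameters may require a full restart rather than a reload, depending on how the extension registers them:

SELECT pg_reload_conf();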

Verify effective values

SELECT name,
       setting,
       source
  FROM pgraft_get_config()
 WHERE name IN ('heartbeat_interval', 'election_timeout', 'append_batch_size', 'snapshot_threshold');