pgraft Documentation

pgraft Cluster Management

Bootstrap a New Cluster

Run these commands after installing pgraft and configuring the leader node. They initialize metadata, elect the first leader, and confirm that the cluster is healthy.

Initialize Raft metadata

-- Run on the leader after CREATE EXTENSION
SELECT pgraft_init();

-- Optional: set a human-friendly cluster label
SELECT pgraft_set_config('cluster_name', 'production-cluster');

-- Verify leader election and quorum
SELECT pgraft_is_leader() AS is_leader,
       pgraft_get_term() AS current_term,
       pgraft_quorum_met() AS quorum_ready;
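Quorum in Raft means a strict majority of voting members. The arithmetic behind a check like pgraft_quorum_met() can be sketched in plain Python (the helper name and shape are illustrative, not part of pgraft's API):

```python
def quorum_met(voting_members: int, reachable: int) -> bool:
    """Raft quorum: a strict majority of voting members must be reachable."""
    return reachable >= voting_members // 2 + 1

# A 3-node cluster tolerates one failure; a 5-node cluster tolerates two.
```

This is why clusters are usually sized with an odd number of nodes: going from 3 to 4 members raises the majority from 2 to 3 without tolerating any additional failures.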

Review current members

-- Shows the local node (leader) after initialization
SELECT * FROM pgraft_get_nodes();

-- Detailed cluster status including commit indexes
SELECT * FROM pgraft_get_cluster_status();

Add and Remove Nodes

Prepare each follower with the same pgraft identity settings in postgresql.conf, then register it from the leader. Remove nodes from the leader as well, so the membership change is committed through the Raft log and quorum is preserved.

Add follower nodes

-- Execute on the elected leader once the follower database is running
SELECT pgraft_add_node(2, '10.0.0.12', 7002);
SELECT pgraft_add_node(3, '10.0.0.13', 7003);

-- Monitor replication catch-up
SELECT node_id,
       state,
       match_index,
       commit_index
  FROM pgraft_get_nodes();
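A follower has caught up once its match_index reaches the leader's commit_index. A minimal sketch of that comparison over rows shaped like the query above (the column names come from the query; the helper itself is hypothetical):

```python
def lagging_followers(rows, leader_commit_index):
    """Return node_ids of followers whose replicated log (match_index)
    still trails the leader's commit_index."""
    return [r["node_id"] for r in rows
            if r["state"] == "follower" and r["match_index"] < leader_commit_index]

nodes = [
    {"node_id": 2, "state": "follower", "match_index": 120},
    {"node_id": 3, "state": "follower", "match_index": 95},
]
# With the leader's commit index at 120, only node 3 is still catching up.
```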

Remove a node gracefully

-- Triggered from the leader to revoke membership
SELECT pgraft_remove_node(3);

-- Confirm removal and quorum health
SELECT pgraft_quorum_met() AS quorum_ok;
SELECT * FROM pgraft_get_nodes();
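Because quorum after a removal is a majority of the remaining membership, it is worth checking that a removal is safe before issuing it. A sketch of that reasoning (the helper is illustrative, not a pgraft function):

```python
def removal_keeps_quorum(total_nodes: int, healthy_nodes: int,
                         removing_healthy: bool) -> bool:
    """After removing one node, a majority of the *remaining* membership
    must still be healthy for the cluster to keep committing entries."""
    remaining_total = total_nodes - 1
    remaining_healthy = healthy_nodes - (1 if removing_healthy else 0)
    return remaining_healthy >= remaining_total // 2 + 1

# In a 3-node cluster with one node already down, removing the dead node
# is safe, but removing a healthy node would lose quorum.
```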

Operational Monitoring

pgraft exposes diagnostic functions for Raft internals. Use them to track leadership, log replication, and worker health in dashboards or alerts.

Health overview

-- Leader identity, Raft term, and election metrics
SELECT * FROM pgraft_get_cluster_status();

-- Per-node connectivity and Raft lag
SELECT node_id,
       state,
       last_heartbeat_ms,
       replication_lag_bytes
  FROM pgraft_get_nodes();
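For alerting, a common pattern is to flag any node whose last heartbeat is older than a threshold. A sketch over rows shaped like the query above (the 5-second threshold is an assumption to tune for your deployment, and the helper name is hypothetical):

```python
def stale_nodes(rows, max_heartbeat_age_ms=5000):
    """Return node_ids whose last heartbeat is older than the threshold.
    The threshold default is an assumption, not a pgraft setting."""
    return [r["node_id"] for r in rows
            if r["last_heartbeat_ms"] > max_heartbeat_age_ms]

status = [
    {"node_id": 1, "last_heartbeat_ms": 40},
    {"node_id": 2, "last_heartbeat_ms": 12000},
]
```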

Log and snapshot telemetry

-- Append/commit counts, snapshot cadence, and RPC statistics
SELECT * FROM pgraft_log_get_stats();

-- Inspect last five leadership transitions
SELECT *
  FROM pgraft_get_events()
 ORDER BY event_timestamp DESC
 LIMIT 5;

Failover & Leadership Control

Automatic elections occur when the leader misses heartbeat deadlines. Use the following procedures to simulate failover, promote a new leader, or pause elections during maintenance.

Manual leadership transfer

-- Ask the current leader to step down and trigger an election
SELECT pgraft_transfer_leadership(2);

-- Pause elections when taking the leader offline (e.g., maintenance)
SELECT pgraft_set_config('failover_enabled', 'false');
-- Resume elections after maintenance concludes
SELECT pgraft_set_config('failover_enabled', 'true');

Failover drill checklist

# 1. Confirm cluster is healthy
psql -c "SELECT pgraft_quorum_met();"

# 2. Trigger leadership transfer
psql -c "SELECT pgraft_transfer_leadership(2);"

# 3. Validate new leader
psql -c "SELECT pgraft_is_leader(), pgraft_get_leader();"

# 4. Re-enable automatic failover if disabled
psql -c "SELECT pgraft_set_config('failover_enabled', 'true');"
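The checklist above can be scripted so drills are repeatable. A sketch assuming a run_sql callable that executes one statement and returns the first column of the first row (the callable is injected here so the flow can be exercised without a live cluster):

```python
def failover_drill(run_sql, new_leader_id):
    """Walk the four drill steps; abort early if quorum is not met."""
    # Step 1: confirm cluster health before touching leadership.
    if not run_sql("SELECT pgraft_quorum_met();"):
        return False
    # Step 2: hand leadership to the chosen node.
    run_sql(f"SELECT pgraft_transfer_leadership({new_leader_id});")
    # Step 3: validate the new leader took over.
    leader = run_sql("SELECT pgraft_get_leader();")
    # Step 4: re-enable automatic failover in case it was disabled.
    run_sql("SELECT pgraft_set_config('failover_enabled', 'true');")
    return leader == new_leader_id
```

In production the injected callable would wrap psql or a driver connection; the drill then reduces to a single scripted call per rehearsal.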

Rolling Maintenance Workflow

Keep quorum while patching or restarting individual members. Always drain workload and verify replication catch-up before shutting down a node.

  1. Drain client traffic. Redirect application connections away from the target node or remove it from connection poolers.
  2. Ensure follower status. If the node is the leader, run pgraft_transfer_leadership() to promote another server.
  3. Wait for log sync. Use SELECT replication_lag_bytes FROM pgraft_get_nodes() to confirm lag is zero.
  4. Stop PostgreSQL. Apply OS patches or package upgrades and restart the instance.
  5. Rejoin the cluster. After startup, pgraft automatically reconnects and catches up; monitor until state = 'follower'.
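Step 3 is the one worth automating: a bounded poll that waits for replication lag to reach zero before the node is stopped. A sketch assuming a caller-supplied get_lag_bytes function (for example, one wrapping the SELECT in step 3); the interval and timeout defaults are assumptions:

```python
import time

def wait_for_sync(get_lag_bytes, poll_interval_s=1.0, timeout_s=300.0):
    """Poll replication lag until it reaches zero or the timeout expires.
    Returns True when lag hit zero, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_lag_bytes() == 0:
            return True
        time.sleep(poll_interval_s)
    return False
```

Gating the shutdown on this check keeps the restarted node's catch-up window short and avoids stressing the leader with a long backlog replay.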