# pgraft Cluster Management

## Bootstrap a New Cluster
Run these commands after installing pgraft and configuring the leader node. They initialize metadata, elect the first leader, and confirm that the cluster is healthy.
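Leader election after `pgraft_init()` is asynchronous, so automation should poll for quorum rather than assume it immediately. A minimal sketch in Python; `query_scalar` is a hypothetical caller-supplied helper (e.g. wrapping your Postgres driver) that runs one statement and returns the first column of the first row:

```python
import time

def wait_for_quorum(query_scalar, timeout_s=30.0, interval_s=0.5):
    """Poll pgraft_quorum_met() until it returns true or the deadline passes.

    query_scalar is a hypothetical helper the caller supplies; it executes
    one SQL statement and returns the first column of the first row.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if query_scalar("SELECT pgraft_quorum_met();"):
            return True
        time.sleep(interval_s)
    return False

if __name__ == "__main__":
    # Stand-in for a real driver call: quorum forms on the third poll.
    answers = iter([False, False, True])
    print(wait_for_quorum(lambda sql: next(answers), timeout_s=5.0, interval_s=0.0))
```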
### Initialize Raft metadata
```sql
-- Run on the leader after CREATE EXTENSION
SELECT pgraft_init();

-- Optional: set a human-friendly cluster label
SELECT pgraft_set_config('cluster_name', 'production-cluster');

-- Verify leader election and quorum
SELECT pgraft_is_leader() AS is_leader,
       pgraft_get_term() AS current_term,
       pgraft_quorum_met() AS quorum_ready;
```

### Review current members
```sql
-- Shows the local node (leader) after initialization
SELECT * FROM pgraft_get_nodes();

-- Detailed cluster status including commit indexes
SELECT * FROM pgraft_get_cluster_status();
```

## Add and Remove Nodes
Give each follower its own pgraft identity (node ID, listen address, and port) in postgresql.conf, then register it from the leader. Remove nodes from the leader as well, so membership changes are committed through Raft and quorum is preserved.
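Membership changes shift the quorum requirement: a Raft cluster of N voting members needs a majority, floor(N/2) + 1, to commit anything. A quick illustration in plain Python (not part of pgraft):

```python
def quorum_size(voters: int) -> int:
    """Majority needed for a Raft cluster with `voters` voting members."""
    return voters // 2 + 1

def tolerated_failures(voters: int) -> int:
    """How many members can fail while the cluster keeps committing."""
    return voters - quorum_size(voters)

# A 3-node cluster commits with 2 votes and survives 1 failure;
# a 5-node cluster needs 3 votes but survives 2 failures.
for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that growing from 3 to 4 nodes raises quorum to 3 without tolerating any additional failures, which is why odd-sized clusters are generally preferred.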
### Add follower nodes
```sql
-- Execute on the elected leader once the follower database is running
SELECT pgraft_add_node(2, '10.0.0.12', 7002);
SELECT pgraft_add_node(3, '10.0.0.13', 7003);

-- Monitor replication catch-up
SELECT node_id,
       state,
       match_index,
       commit_index
FROM pgraft_get_nodes();
```

### Remove a node gracefully
```sql
-- Triggered from the leader to revoke membership
SELECT pgraft_remove_node(3);

-- Confirm removal and quorum health
SELECT pgraft_quorum_met() AS quorum_ok;
SELECT * FROM pgraft_get_nodes();
```

## Operational Monitoring
pgraft exposes diagnostic functions for Raft internals. Use them to track leadership, log replication, and worker health in dashboards or alerts.
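When wiring these queries into alerts, the thresholds live in your tooling, not in pgraft. A sketch of one evaluation step in Python, assuming rows from `pgraft_get_nodes()` have already been fetched as dictionaries (column names as shown below; the threshold values are illustrative):

```python
def node_alerts(row, max_heartbeat_ms=1000, max_lag_bytes=1_000_000):
    """Return alert strings for one row of pgraft_get_nodes().

    The thresholds are illustrative defaults, not pgraft settings.
    """
    alerts = []
    if row["last_heartbeat_ms"] > max_heartbeat_ms:
        alerts.append(f"node {row['node_id']}: heartbeat stale "
                      f"({row['last_heartbeat_ms']} ms)")
    if row["replication_lag_bytes"] > max_lag_bytes:
        alerts.append(f"node {row['node_id']}: replication lag "
                      f"{row['replication_lag_bytes']} bytes")
    return alerts

# Example rows shaped like a driver might return them.
rows = [
    {"node_id": 1, "state": "leader",   "last_heartbeat_ms": 12,   "replication_lag_bytes": 0},
    {"node_id": 2, "state": "follower", "last_heartbeat_ms": 5400, "replication_lag_bytes": 2_500_000},
]
for row in rows:
    for alert in node_alerts(row):
        print(alert)
```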
### Health overview
```sql
-- Leader identity, Raft term, and election metrics
SELECT * FROM pgraft_get_cluster_status();

-- Per-node connectivity and Raft lag
SELECT node_id,
       state,
       last_heartbeat_ms,
       replication_lag_bytes
FROM pgraft_get_nodes();
```

### Log and snapshot telemetry
```sql
-- Append/commit counts, snapshot cadence, and RPC statistics
SELECT * FROM pgraft_log_get_stats();

-- Inspect the last five leadership transitions
SELECT *
FROM pgraft_get_events()
ORDER BY event_timestamp DESC
LIMIT 5;
```

## Failover & Leadership Control
Automatic elections occur when the leader misses heartbeat deadlines. Use the following procedures to simulate failover, promote a new leader, or pause elections during maintenance.
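The trigger for an automatic election is simple deadline math: a follower that has not heard a heartbeat within its randomized election timeout becomes a candidate. A toy model of that decision in Python; the timing values are illustrative of Raft in general, not pgraft's actual defaults:

```python
import random

def should_start_election(now_ms, last_heartbeat_ms, election_timeout_ms):
    """A follower starts an election once the leader has been silent
    longer than its election timeout."""
    return now_ms - last_heartbeat_ms > election_timeout_ms

def pick_election_timeout(rng, base_ms=150, spread_ms=150):
    """Raft randomizes each follower's timeout (here 150-300 ms) so that
    followers rarely become candidates at the same instant."""
    return base_ms + rng.uniform(0, spread_ms)

rng = random.Random(42)
timeout = pick_election_timeout(rng)
print(should_start_election(now_ms=1000, last_heartbeat_ms=900, election_timeout_ms=timeout))  # healthy leader
print(should_start_election(now_ms=1000, last_heartbeat_ms=500, election_timeout_ms=timeout))  # silent leader
```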
### Manual leadership transfer
```sql
-- Ask the current leader to step down and trigger an election
SELECT pgraft_transfer_leadership(2);

-- Pause elections when taking the leader offline (e.g., maintenance)
SELECT pgraft_set_config('failover_enabled', 'false');

-- Resume elections after maintenance concludes
SELECT pgraft_set_config('failover_enabled', 'true');
```

### Failover drill checklist
```shell
# 1. Confirm the cluster is healthy
psql -c "SELECT pgraft_quorum_met();"

# 2. Trigger leadership transfer
psql -c "SELECT pgraft_transfer_leadership(2);"

# 3. Validate the new leader
psql -c "SELECT pgraft_is_leader(), pgraft_get_leader();"

# 4. Re-enable automatic failover if it was disabled
psql -c "SELECT pgraft_set_config('failover_enabled', 'true');"
```

## Rolling Maintenance Workflow
Keep quorum while patching or restarting individual members. Always drain workload and verify replication catch-up before shutting down a node.
1. Drain client traffic. Redirect application connections away from the target node or remove it from connection poolers.
2. Ensure follower status. If the node is the leader, run `pgraft_transfer_leadership()` to promote another server.
3. Wait for log sync. Use `SELECT replication_lag_bytes FROM pgraft_get_nodes()` to confirm lag is zero.
4. Stop PostgreSQL. Apply OS patches or package upgrades and restart the instance.
5. Rejoin the cluster. After startup, pgraft automatically reconnects and catches up; monitor until the node reports `state = 'follower'`.
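Steps 2 and 3 of this workflow can be sketched as an orchestration helper. `run_sql` is a hypothetical single-statement helper you would back with your Postgres driver; the pgraft calls are the ones documented on this page:

```python
import time

def prepare_node_for_stop(run_sql, target, successor, poll_s=0.0):
    """Steps 2-3 of the rolling-maintenance workflow: demote the target
    node if it currently leads, then wait until its replication lag is zero.

    run_sql(statement) is a hypothetical, caller-supplied helper that
    executes one SQL statement and returns the first column of row one.
    """
    # Step 2: if the target currently leads, promote the successor first.
    if run_sql("SELECT pgraft_get_leader();") == target:
        run_sql(f"SELECT pgraft_transfer_leadership({successor});")
    # Step 3: block until the target's log has fully caught up.
    while run_sql(
        f"SELECT replication_lag_bytes FROM pgraft_get_nodes() WHERE node_id = {target};"
    ) > 0:
        time.sleep(poll_s)
    return True

if __name__ == "__main__":
    # Demo with a scripted fake in place of a real database connection.
    lag = iter([500, 0])
    def fake_run_sql(sql):
        if "pgraft_get_leader" in sql:
            return 1
        if "replication_lag_bytes" in sql:
            return next(lag)
    print(prepare_node_for_stop(fake_run_sql, target=1, successor=2))
```

After `prepare_node_for_stop` returns, it is safe to proceed with step 4 (stopping PostgreSQL on the target node).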