PostgreSQL deadlock detection and prevention
A PostgreSQL deadlock occurs when two or more transactions are each waiting for a lock held by the other, creating a cycle that can never resolve on its own. PostgreSQL detects the cycle automatically and terminates one transaction with error code 40P01. Prevention comes down to consistent lock ordering across all transactions that touch the same rows or tables.
What the error looks like
When PostgreSQL terminates a deadlocked transaction, the application receives an error similar to this:
ERROR: deadlock detected
DETAIL: Process 18420 waits for ShareLock on transaction 7391; blocked by process 18431.
Process 18431 waits for ShareLock on transaction 7390; blocked by process 18420.
HINT: See server log for query details.
The DETAIL line describes the cycle: each process is blocked by the other. The HINT points to the server log, where you will find the full query text for each participant if log_lock_waits is enabled.
Find recent deadlocks
Two sources give you visibility into active and recent lock contention.
Quickest method (Postgres 9.6+): pg_blocking_pids() covers all lock types — relation, page, tuple, and transaction-ID — in one line:
SELECT pid, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;Detailed blocking chains via pg_locks joined to pg_stat_activity (useful when you need the query text for each side of the block):
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_locks AS bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks AS kl ON kl.transactionid = bl.transactionid AND kl.granted
JOIN pg_stat_activity AS blocking ON blocking.pid = kl.pid
WHERE blocked.wait_event_type = 'Lock';Server-log settings to enable before you need them:
-- postgresql.conf
log_lock_waits = on
deadlock_timeout = 1sWith log_lock_waits = on, any lock wait that exceeds deadlock_timeout is written to the server log before PostgreSQL checks for a cycle. This gives you the full query text of both participants, which the pg_stat_activity view alone will not preserve after the fact.
Root causes
1. Inconsistent lock ordering
The most common cause. Transaction A acquires a lock on row X, then tries to lock row Y. Transaction B acquires a lock on row Y, then tries to lock row X. Neither can proceed.
T1: LOCK orders row 42 → wait for invoices row 99
T2: LOCK invoices row 99 → wait for orders row 42
↑ cycle
2. Long transactions holding locks
A transaction that stays open for seconds or minutes increases the window during which another transaction can arrive and form a cycle. Long transactions also inflate lock queue depth, which multiplies collision probability.
3. Application retry storms
When a 40P01 error triggers an immediate retry without a rollback and backoff, the retried transaction re-acquires the same locks in the same order at the same moment. This recreates the deadlock repeatedly until one transaction happens to win the timing race.
Fix: enforce consistent lock ordering
The most reliable fix is to acquire locks in the same order everywhere in the application. If multiple code paths update the same set of rows, they must visit those rows in a canonical order (for example, ascending by primary key).
-- Instead of updating rows in arbitrary order:
UPDATE accounts SET balance = balance - 100 WHERE id = 7;
UPDATE accounts SET balance = balance + 100 WHERE id = 3;
-- Always sort by id ascending:
UPDATE accounts SET balance = balance - 100 WHERE id = 3;
UPDATE accounts SET balance = balance + 100 WHERE id = 7;In application code, sort the IDs before issuing the updates:
# Python example
row_ids = sorted([row_a_id, row_b_id])
for rid in row_ids:
cursor.execute("UPDATE accounts SET balance = ... WHERE id = %s", (rid,))Fix: use SELECT FOR UPDATE carefully
SELECT FOR UPDATE locks every row in the result set at read time. On large result sets this creates broad lock surfaces that increase collision probability.
Prefer locking only the specific rows you will modify:
-- Broad lock, avoid on large sets:
SELECT * FROM orders WHERE status = 'pending' FOR UPDATE;
-- Narrow lock, prefer this:
SELECT * FROM orders WHERE id = 42 FOR UPDATE;For coordination patterns that do not require row-level data, advisory locks are lighter:
SELECT pg_advisory_xact_lock(hashtext('job-processor-' || queue_name::text));Advisory locks are released automatically at transaction end and do not appear in row-level lock queues.
Fix: shorten transactions
Every additional statement inside a transaction extends the time during which held locks can collide with competing transactions. Move work that does not need transactional guarantees outside the transaction boundary.
-- Before: non-transactional work inside transaction
BEGIN;
PERFORM send_notification(user_id); -- external side effect, does not need ACID
UPDATE subscriptions SET status = 'active' WHERE id = $1;
COMMIT;
-- After: keep only the write inside the transaction
PERFORM send_notification(user_id);
BEGIN;
UPDATE subscriptions SET status = 'active' WHERE id = $1;
COMMIT;Fetching data for display, calling external services, and computing values that do not depend on locked rows should all happen before BEGIN.
Fix: handle retries correctly
When your application catches error code 40P01, the transaction must be fully rolled back before retrying. Issuing new statements on a connection in an error state will fail. After rollback, apply a brief randomized delay before retrying to break timing symmetry between competing transactions.
import psycopg2
import time
import random
MAX_RETRIES = 3
for attempt in range(MAX_RETRIES):
try:
with conn: # context manager calls ROLLBACK on exception
cur.execute("UPDATE ...")
cur.execute("UPDATE ...")
break
except psycopg2.errors.DeadlockDetected:
if attempt == MAX_RETRIES - 1:
raise
time.sleep(0.05 + random.uniform(0, 0.1)) # jitterIn Java (JDBC), check SQLState for "40P01". In Go (lib/pq), check pq.Error.Code. In every case, call tx.Rollback() before constructing a new transaction.
Prevention checklist
Use this checklist when reviewing code that touches shared rows.
-- 1. Enable lock-wait logging (do this in postgresql.conf, not per-session)
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';
SELECT pg_reload_conf();
-- 2. Check for current lock contention before deploying changes
SELECT relation::regclass, mode, granted, pid
FROM pg_locks
WHERE relation IS NOT NULL
ORDER BY relation, granted;
-- 3. Identify long-running transactions that are holding locks
SELECT pid, now() - xact_start AS duration, state, LEFT(query, 80)
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND state != 'idle'
ORDER BY duration DESC;Operational checklist:
- Acquire locks in a consistent, canonical order (ascending primary key is the simplest convention).
- Keep transactions short. Fetch data before
BEGIN, send notifications afterCOMMIT. - Use
SELECT FOR UPDATEonly on specific rows, not broad result sets. - Roll back fully before retrying on 40P01. Add jitter to retry delays.
- Enable
log_lock_waitsand review server logs after any deadlock incident. - During load testing, use pg_stat_activity and
pg_locksto surface contention before it reaches production.
Automated root-cause analysis tools, including NOFire AI, can correlate deadlock errors in application logs with the corresponding PostgreSQL lock-wait entries, reducing the time from alert to fix. See the AI SRE Benchmark to go deeper on how AI-assisted diagnosis compares to manual triage across database error classes.
Related debugging guides
Database lock issues often cascade into the following. See also:
Frequently asked questions
- Does PostgreSQL resolve deadlocks automatically?
- Yes. PostgreSQL detects deadlock cycles and terminates one transaction (the victim). The application receives a 40P01 error and must handle the retry.
- What is the difference between a deadlock and a lock wait?
- A lock wait is one transaction waiting for a lock held by another. A deadlock is a cycle. Lock waits resolve when the holding transaction commits; deadlocks require PostgreSQL to intervene.
- What is a good value for deadlock_timeout?
- 1 second is the PostgreSQL default and a reasonable production value. Lower values (200ms) help in development to surface deadlocks faster.
Go deeper: the AI SRE Benchmark
Book a demo