Managing a high-volume PostgreSQL database is a demanding responsibility. As data grows into hundreds of gigabytes or even terabytes, performance tuning, monitoring, and optimization become critical. While PostgreSQL offers advanced features, the effectiveness of a database administrator largely depends on their ability to use the right diagnostic queries at the right time.
This blog covers the top fifteen essential PostgreSQL queries every DBA should know. These queries help identify performance issues, track bottlenecks, analyze growth patterns, monitor activities, understand index usage, and keep large systems stable and efficient.
Each section explains what the query does, why a DBA needs it, and when it is useful.
1. Identify Unused Indexes (idx_scan = 0)
Unused indexes increase storage, slow down write performance, and create unnecessary maintenance overhead.
Purpose
To find indexes that have never been scanned by PostgreSQL.
Query
SELECT
s.schemaname,
s.relname AS table_name,
s.indexrelname AS index_name,
pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
s.idx_scan
FROM
pg_stat_user_indexes s
JOIN
pg_index i ON i.indexrelid = s.indexrelid
WHERE
s.idx_scan = 0
AND i.indisunique = false
ORDER BY
pg_relation_size(s.indexrelid) DESC;
Why it matters
Indexes with zero scans are candidates for removal because they:
- Slow down INSERT, UPDATE and DELETE operations
- Increase autovacuum load
- Consume disk space
You can also explore the related PostgreSQL system views for index usage statistics:
pg_stat_user_indexes - PostgreSQL system view that shows index usage statistics only for indexes on user-defined tables. It excludes system catalogs and internal tables.
pg_stat_all_indexes - A system view that provides index usage statistics for both user-defined tables and system catalog tables. This includes internal indexes used by PostgreSQL itself.
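Once an index is confirmed unused, it can be dropped without blocking writes. A minimal sketch with a hypothetical index name; note that these counters are tracked per instance, so check every replica before dropping anything:
-- Hypothetical index name; verify idx_scan = 0 on all replicas first
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status;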
2. Fetch Long-Running Queries Using pg_stat_statements
Long-running queries are often the root cause of slow applications and database latency.
Before running this query, you need to install the extension named pg_stat_statements. The module must also be listed in shared_preload_libraries, which requires a server restart. (The timing columns used below, total_exec_time and mean_exec_time, apply to PostgreSQL 13+; older releases call them total_time and mean_time.)
CREATE EXTENSION pg_stat_statements;
You can verify the installation like this (installed_version is populated once the extension is created):
SELECT * FROM pg_available_extensions WHERE name = 'pg_stat_statements';
Purpose
To find queries with the highest execution time, most calls, or slowest performance.
Query
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
rows
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT 10;
Why it matters
This helps a DBA quickly detect:
- Slow queries
- Missing indexes
- Inefficient query patterns
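After addressing the worst offenders, the counters can be reset so the next measurement window starts clean:
SELECT pg_stat_statements_reset();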
3. Detect Table Bloat
Table bloat leads to performance degradation and wasted storage.
Before using this query, you need to install the extension named pgstattuple. Note that pgstattuple physically scans each table it inspects, so this query can be expensive on very large databases.
CREATE EXTENSION pgstattuple;
Purpose
To estimate bloat based on the free (reusable) space percentage that pgstattuple reports for each table.
Query
SELECT
t.schemaname,
t.tablename,
pg_size_pretty(pg_total_relation_size(quote_ident(t.schemaname)||'.'||quote_ident(t.tablename))) AS total_size,
round(s.free_percent::numeric, 2) AS bloat_percent
FROM pg_tables t,
LATERAL pgstattuple(quote_ident(t.schemaname)||'.'||quote_ident(t.tablename)) s
WHERE t.schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY s.free_percent DESC;
The LATERAL join evaluates pgstattuple only once per table, and quote_ident protects mixed-case identifiers.
Why it matters
High bloat requires immediate action such as:
- VACUUM (or VACUUM FULL for severe cases, which rewrites the table but takes an exclusive lock)
- REINDEX
- CLUSTER
Note that ANALYZE refreshes planner statistics but does not reclaim bloated space on its own.
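For example, a routine cleanup of a bloated table might look like this (hypothetical table name); plain VACUUM makes dead space reusable, while VACUUM FULL returns it to the operating system at the cost of an exclusive lock:
-- Hypothetical table; reclaims dead space and refreshes statistics
VACUUM (VERBOSE, ANALYZE) public.orders;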
4. Check Autovacuum Activity
Autovacuum ensures table health by cleaning dead tuples.
Purpose
To see what autovacuum jobs are currently running.
Query
SELECT
pv.pid,
pv.datname,
c.relname AS table_name,
pv.phase,
pv.heap_blks_total,
pv.heap_blks_scanned,
pv.heap_blks_vacuumed,
round(100.0 * pv.heap_blks_scanned / NULLIF(pv.heap_blks_total, 0), 2) AS scan_percent,
now() - pa.xact_start AS running_time,
pa.wait_event_type,
pa.wait_event
FROM pg_stat_progress_vacuum pv
LEFT JOIN pg_class c ON pv.relid = c.oid
LEFT JOIN pg_stat_activity pa ON pv.pid = pa.pid
ORDER BY pa.xact_start;
Why it matters
This helps identify:
- Stuck vacuums
- Long-running vacuum operations
- Tables requiring attention
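For tables that repeatedly show up here, per-table autovacuum settings can make cleanup kick in earlier; a sketch with a hypothetical table name and deliberately aggressive values:
-- Hypothetical table; trigger autovacuum after ~2% of rows change
ALTER TABLE public.orders SET (autovacuum_vacuum_scale_factor = 0.02, autovacuum_vacuum_threshold = 1000);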
5. Find the Largest Tables in the Database
Large tables are expensive to maintain and require careful planning.
Query
SELECT
relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM
pg_catalog.pg_statio_user_tables
ORDER BY
pg_total_relation_size(relid) DESC
LIMIT 10;
Why it matters
Useful to identify:
- Candidates for partitioning
- Storage-heavy tables
- Tables requiring optimization
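To see how much of a table's footprint is heap versus indexes, the same view can be extended with pg_table_size() and pg_indexes_size():
SELECT
relname AS table_name,
pg_size_pretty(pg_table_size(relid)) AS heap_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;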
6. Find the Largest Indexes
Indexes can sometimes grow larger than the underlying tables.
Query
SELECT
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM
pg_stat_user_indexes
ORDER BY
pg_relation_size(indexrelid) DESC
LIMIT 10;
Why it matters
- Large indexes slow down query execution because PostgreSQL must scan bigger index structures.
- They consume more shared memory and I/O, increasing system load.
- Autovacuum and REINDEX operations take longer and become more expensive.
- Oversized indexes often indicate bloat or poor indexing strategy.
- They increase disk usage and can negatively affect write performance on the table.
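If a large index turns out to be bloated, it can be rebuilt online on PostgreSQL 12+; a sketch with a hypothetical index name:
-- Hypothetical index; rebuilds without blocking writes (PG 12+)
REINDEX INDEX CONCURRENTLY idx_orders_created_at;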
7. Identify Missing Indexes Using Sequential Scan Counts
Sequential scans on large tables typically indicate a lack of proper indexing.
Query
SELECT
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan
FROM
pg_stat_all_tables
ORDER BY
seq_scan DESC
LIMIT 10;
Why it matters
- High sequential scan counts on large tables usually indicate missing or ineffective indexes.
- Sequential scans increase I/O load and slow down query response time.
- They can cause unnecessary CPU usage because PostgreSQL must read more rows than needed.
- Identifying these tables early helps prioritize index creation or optimization.
- Reducing sequential scans leads to faster lookups and more efficient query plans.
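Before creating an index, confirm the plan with EXPLAIN, then build it without blocking writes; a sketch with hypothetical table and column names:
-- Hypothetical table/column; verify the plan, then build online
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);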
8. Estimate Table Row Count Quickly
Counting rows in large tables using COUNT(*) is expensive.
Query
SELECT
relname AS table_name,
n_live_tup AS estimated_rows
FROM
pg_stat_all_tables
ORDER BY
n_live_tup DESC
LIMIT 10;
Why it matters
- It avoids costly full-table scans that can slow down large production systems.
- Provides quick row-count estimates using PostgreSQL’s internal statistics.
- Helps DBAs evaluate table growth trends without impacting performance.
- Useful for capacity planning, partitioning decisions, and storage forecasting.
- Enables faster analysis during migrations, optimizations, and troubleshooting.
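A similar estimate is available from the planner statistics in pg_class; with either source, accuracy depends on how recently ANALYZE or autovacuum last ran:
SELECT
relname AS table_name,
reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relkind = 'r'
ORDER BY reltuples DESC
LIMIT 10;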
9. Monitor Active Connections
Connection overload can degrade performance and, once max_connections is exhausted, cause new connection attempts to be refused.
Query
SELECT
datname,
count(*) AS total_connections
FROM
pg_stat_activity
GROUP BY 1;
Why it matters
- Too many open connections can exhaust resources and degrade database performance.
- Helps detect connection leaks caused by application bugs or improper session handling.
- Identifies sudden traffic spikes that may require scaling or rate-limiting.
- Reveals whether connection pooling is configured correctly or overused.
- Prevents connection saturation, which can cause timeouts or database outages.
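To put the count in context, compare it against the configured limit:
SELECT
count(*) AS total_connections,
current_setting('max_connections')::int AS max_connections,
round(100.0 * count(*) / current_setting('max_connections')::int, 2) AS pct_used
FROM pg_stat_activity;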
10. Check Row Locks and Blocking Sessions
Lock contention slows down queries and causes deadlocks.
Query
SELECT
locktype,
relation::regclass,
mode,
granted,
pid,
now() - query_start AS query_runtime,
query
FROM pg_locks
JOIN pg_stat_activity USING (pid)
ORDER BY query_runtime DESC;
Why it matters
- Detects blocking sessions that are slowing down or halting critical queries.
- Helps identify long-running transactions that are holding locks unnecessarily.
- Reveals lock types and lock modes, aiding in understanding concurrency issues.
- Prevents deadlocks by showing queries waiting for each other in a chain.
- Enables quick resolution of performance bottlenecks during high-load periods.
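To see who is blocking whom directly, pg_blocking_pids() (available since PostgreSQL 9.6) pairs each waiting session with its blockers:
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));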
11. List All Active Queries
Useful when diagnosing slow systems or high CPU usage.
Query
SELECT
pid,
usename,
datname,
now() - query_start AS runtime,
query
FROM pg_stat_activity
WHERE state = 'active'
AND pid <> pg_backend_pid()
ORDER BY runtime DESC;
Why it matters
- Provides real-time visibility into what each session is currently executing.
- Helps diagnose slow performance by identifying long-running or expensive queries.
- Useful for detecting unoptimized queries during peak system load.
- Allows DBAs to pinpoint application components generating heavy workloads.
- Essential for immediate troubleshooting when CPU, I/O, or memory usage suddenly spikes.
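When a runaway query is identified, it can be cancelled, or its session terminated as a last resort; a sketch with a hypothetical backend pid:
-- Hypothetical pid; ask the backend to cancel its current query
SELECT pg_cancel_backend(12345);
-- Forcefully end the session only if the cancel is ignored
SELECT pg_terminate_backend(12345);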
12. Monitor All Progress Operations
PostgreSQL provides internal progress views for operations such as ANALYZE, COPY, CLUSTER, CREATE INDEX, VACUUM, and base backups. Availability depends on the server version: pg_stat_progress_vacuum appeared in 9.6, the cluster and create_index views in 12, analyze and basebackup in 13, and copy in 14.
Purpose
To inspect the progress of internal operations.
Query
SELECT * FROM pg_stat_progress_analyze;
SELECT * FROM pg_stat_progress_basebackup;
SELECT * FROM pg_stat_progress_cluster;
SELECT * FROM pg_stat_progress_copy;
SELECT * FROM pg_stat_progress_create_index;
SELECT * FROM pg_stat_progress_vacuum;
Why it matters
- Allows DBAs to track exactly how far maintenance tasks such as ANALYZE, VACUUM, and CREATE INDEX have progressed.
- Helps identify operations that are stuck or running slower than expected.
- Provides insight into long-running tasks that may be impacting system performance.
- Useful for planning maintenance windows by estimating remaining task duration.
- Enables quick diagnosis of bottlenecks during heavy maintenance operations.
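For example, the create-index view exposes block counts that can be turned into a completion percentage:
SELECT
pid,
phase,
round(100.0 * blocks_done / NULLIF(blocks_total, 0), 2) AS pct_done
FROM pg_stat_progress_create_index;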
13. Inspect Table Statistics Using pg_stats
pg_stats shows column-level statistics used by the query planner.
Query
SELECT *
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY tablename, attname;
Why it matters
Helps understand:
- Column distinctness
- Histogram data
- Most common values
Critical for diagnosing planner misestimation.
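When the planner misestimates a skewed column, a common fix is to raise its statistics target and re-analyze; a sketch with hypothetical table and column names:
-- Hypothetical table/column; collect finer-grained statistics
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;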
14. Identify Tables with Highest Dead Tuples
Dead tuples contribute to bloat and slow performance.
Query
SELECT
schemaname,
relname,
n_dead_tup
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
Why it matters
- High dead-tuple counts indicate tables that are accumulating bloat and degrading performance.
- Excessive dead tuples slow down sequential scans, index scans, and query execution.
- They increase table and index size, consuming unnecessary disk space.
- Identifying these tables early helps prioritize VACUUM or REINDEX operations.
- Reducing dead tuples improves autovacuum efficiency and overall database stability.
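A useful extension of this query adds the dead-tuple ratio and the last autovacuum time, which makes neglected tables stand out:
SELECT
schemaname,
relname,
n_dead_tup,
n_live_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 10;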
15. Monitor Database-Wide Cache Hit Ratio
A high cache hit ratio (often 99% or above for OLTP workloads) indicates that most reads are served from shared buffers rather than disk.
Query
SELECT
round(blks_hit * 100.0 / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
Why it matters
- The cache hit ratio reflects how efficiently PostgreSQL serves data from memory instead of disk.
- A low ratio indicates excessive disk reads, leading to slower query performance.
- Helps identify memory pressure, poor caching behavior, or insufficient shared buffers.
- Reveals indexing issues when many queries cannot be satisfied from cached index pages.
- Monitoring this metric ensures the database is optimized for high-performance workloads.
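To localize a low ratio, the same calculation works per table via pg_statio_user_tables:
SELECT
relname AS table_name,
heap_blks_hit,
heap_blks_read,
round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS table_hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;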
Conclusion
A skilled PostgreSQL DBA must be able to diagnose problems quickly, identify bottlenecks, and keep large databases running efficiently. The fifteen queries in this guide form a core toolkit for managing real-world production systems. They provide insights into unused indexes, slow queries, bloat accumulation, lock contention, maintenance activity, and overall database health.
Using these queries regularly helps ensure better performance, more efficient indexing strategies, faster maintenance operations, and higher system stability. Whether you manage an ERP system, eCommerce platform, SaaS application, or analytics environment, these foundational queries will help you maintain a reliable and well-optimized PostgreSQL database.