When you manage a busy PostgreSQL server, the built-in pg_stat_* views are extremely useful—but they only show what is happening right now. If your Prometheus/Grafana pipeline misses data for even a short window, you lose visibility into what happened during that period.
StatsMgr solves this problem. It is a PostgreSQL extension that uses a background worker and shared memory to periodically snapshot PostgreSQL’s internal statistics—including WAL, I/O, SLRU, checkpoint activity, archiver performance, and more. These snapshots are stored as a rolling history, giving you time-series visibility directly inside PostgreSQL.
In this blog, I will cover
- Installing StatsMgr from source
- Starting the StatsMgr background worker and understanding why its PID does not appear in pg_stat_activity
- Running every statsmgr_* function with real outputs from my environment
- How each set of statistics is useful for a DBA in real-world scenarios
All examples and outputs shown here are from PostgreSQL 17.7, which is the minimum version supported by StatsMgr.
1. What StatsMgr Actually Does
At a high level, StatsMgr:
- Snapshots PostgreSQL’s cumulative statistics at fixed intervals
- Stores a configurable number of historical snapshots in shared memory
- Exposes this history via SQL functions like statsmgr_wal(), statsmgr_io(), statsmgr_slru(), etc.
You can think of it as a tiny time-series layer sitting inside PostgreSQL, focused only on internal stats.
This is very useful when:
- Your external monitoring is down, but you still want to know what happened
- You need to debug a spike that already passed
- You want to compute per-minute deltas of WAL, SLRU, I/O, etc., without constantly pulling from pg_stat_* yourself
2. Installing StatsMgr from Source
I built StatsMgr from the official repository using PostgreSQL 17’s PGXS:
git clone https://codeberg.org/Data-Bene/StatsMgr.git
cd StatsMgr
make USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config
sudo make install USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config
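If make fails because pg_config or the server headers cannot be found, the build dependencies are missing. On Debian/Ubuntu with the PGDG packages they can usually be installed like this (the package names are an assumption for that platform; adjust for your distribution):
sudo apt install build-essential git postgresql-server-dev-17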
You can verify the extension is available with:
SELECT *
FROM pg_available_extensions
WHERE name = 'statsmgr';
You get a result like this:
name | default_version | installed_version | comment
---------+-----------------+-------------------+--------------------------------------
statsmgr| 0.1-alpha | 0.1-alpha | Statistics management for PostgreSQL
3. Two Ways to Install: Temporary vs Permanent
You can “install” StatsMgr at two levels:
3.1 Temporary (per-database) installation
This is the quickest way to get started:
CREATE EXTENSION statsmgr;
This:
- Registers the SQL functions like statsmgr_wal(), statsmgr_io(), etc. in that database
- Lets you manually start the background worker using statsmgr_start_main_worker()
- Does not require a server restart
However, because this uses a dynamic background worker (registered at runtime), it will not automatically start when PostgreSQL restarts. After a restart, you must again run statsmgr_start_main_worker().
This is perfect for experiments and development environments.
3.2 Permanent installation via shared_preload_libraries
StatsMgr is designed to integrate with background workers and shared memory. To have it available cluster-wide and to allow it to register workers during postmaster startup, you should add it to shared_preload_libraries in postgresql.conf:
You can get the path of the config file from inside psql like this:
postgres=# show config_file ;
config_file
-----------------------------------------
/etc/postgresql/17/main/postgresql.conf
Open this file, find the parameter named shared_preload_libraries, and set its value like this:
shared_preload_libraries = 'statsmgr'
Then restart PostgreSQL.
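On the Debian/Ubuntu cluster layout used in this walkthrough, the restart typically looks like this (the service name is an assumption; adjust it to match your installation):
sudo systemctl restart postgresql@17-main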
Why this matters:
- Extensions that allocate shared memory or register background workers at startup (like pg_stat_statements, pg_cron, etc.) must be in shared_preload_libraries.
- With this configuration, StatsMgr can be ready immediately when the server is up, and you can choose to have its background worker started early in the lifecycle.
4. Starting the StatsMgr Background Worker
You can start the main worker like this:
SELECT * FROM statsmgr_start_main_worker();
Output:
pid
---------
1010574
On the OS side, run this command with the returned process ID:
ps -ef | grep 1010574
You can now see a process with this PID running in the background:
postgres 1010574 1064 0 18:20 ? 00:00:00 postgres: 17/main: statsmgr main
So the background worker is clearly running as a PostgreSQL child process.
4.1 Why the PID is not in pg_stat_activity
SELECT *
FROM pg_stat_activity
WHERE pid = '1010574';
You get no rows:
(0 rows)
At first this looks strange, because pg_stat_activity is described as “one row per server process”.
The key detail is how background workers are defined in PostgreSQL:
A background worker can either:
- Attach to shared memory only, or
- Attach to shared memory and also open a normal database connection
Only workers that behave like a database backend (with a DB connection) appear in pg_stat_activity. StatsMgr’s main worker is a lightweight process that attaches to shared memory and collects statistics; it does not open a SQL-level connection to any database.
Result:
- You see the PID in the operating system process list
- But it does not appear in pg_stat_activity, because from PostgreSQL’s point of view it is not a “backend” with a current query
This is normal and safe. If you want to monitor that the worker is alive, check:
- The OS process list by name (statsmgr main)
- Logs from the extension
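A quick liveness check by process name could look like this, matching the worker name seen earlier:
ps -ef | grep "statsmgr main" | grep -v grep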
5. Understanding the StatsMgr Views with Real Output
After starting the worker, you can explore all the functions related to this statsmgr extension.
Internally, each function returns multiple rows representing snapshots over time. The tick column identifies each snapshot, and tick_tz is the timestamp at which the snapshot was taken. As the worker keeps running, more ticks are added.
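For example, to see which snapshots are currently held in memory, you can list the tick identifiers from any of the functions. Here is a minimal sketch using statsmgr_wal():
-- List the snapshots currently held in shared memory, oldest first
SELECT tick, tick_tz
FROM statsmgr_wal()
ORDER BY tick;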
5.1 statsmgr_archiver(): WAL archiving statistics
SELECT * FROM statsmgr_archiver();
You get a result like this:
archived_count | last_archived_wal | last_archived_timestamp ...
----------------+-------------------+---------------------------
0 | | 2000-01-01 05:30:00+05:30
0 | | 2000-01-01 05:30:00+05:30
0 | | 2000-01-01 05:30:00+05:30
0 | | 2000-01-01 05:30:00+05:30
Fields are analogous to pg_stat_archiver: number of successfully archived WAL segments, last archived WAL file, last failure, and the time the stats were reset.
How this helps a DBA:
- Detect if archiving is stuck (no increment in archived_count across ticks)
- Investigate frequent failures using failed_count and last_failed_timestamp
- Cross-check with statsmgr_wal() to see how much WAL was generated vs how much was archived
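A simple version of that last cross-check might look like this; it is a sketch that assumes both functions expose the same tick values and the archiver columns mentioned above:
-- Compare WAL generated vs WAL archived at each snapshot
SELECT w.tick,
       w.wal_bytes,
       a.archived_count,
       a.failed_count
FROM statsmgr_wal() AS w
JOIN statsmgr_archiver() AS a USING (tick)
ORDER BY w.tick;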
5.2 statsmgr_bgwriter(): background writer behaviour
SELECT * FROM statsmgr_bgwriter();
Output snippet:
buf_written_clean | maxwritten_clean | buf_alloc | stat_reset_timestamp | tick
-------------------+------------------+-----------+----------------------------------+------
5408 | 0 | 44090 | 2025-11-16 08:41:00.668938+05:30 | 3
...
This mirrors counters from pg_stat_bgwriter.
Key columns:
- buf_written_clean: buffers written by the background writer's cleaning scan
- maxwritten_clean: times the bgwriter stopped early because it hit its write limit
- buf_alloc: how many buffers were allocated
Why is it useful?
- If buf_alloc keeps growing rapidly, backends are allocating new buffers at a high rate, which usually points to a working set larger than shared_buffers or heavy cache churn.
- If maxwritten_clean is high, the bgwriter is often hitting its limit and may not keep up with dirty buffers.
- By comparing buf_written_clean across ticks, you can compute per-interval rates to see if some workload spikes create more background writes
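A minimal sketch of such a per-interval calculation, using a window function over the snapshot history (column names as shown in the output above):
-- Buffers written and allocated between consecutive snapshots
SELECT tick,
       buf_written_clean - lag(buf_written_clean) OVER (ORDER BY tick) AS written_clean_delta,
       buf_alloc - lag(buf_alloc) OVER (ORDER BY tick) AS alloc_delta
FROM statsmgr_bgwriter()
ORDER BY tick;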
5.3 statsmgr_checkpointer(): checkpoint cost
SELECT * FROM statsmgr_checkpointer() LIMIT 1;
Output:
num_timed | 1635
num_requested | 23
write_time | 650.289
sync_time | 5.215
buffers_written | 7769
slru_written |
...
This maps onto the checkpoint statistics available via pg_stat_checkpointer.
Important columns:
- num_timed: checkpoints triggered by timeout
- num_requested: checkpoints requested manually or by WAL pressure
- write_time and sync_time: total time spent writing and syncing data during checkpoints
- buffers_written: total number of buffers written during checkpoints
As a DBA, you use these to:
- Tune max_wal_size and checkpoint_timeout
- Check whether checkpoints are triggered too often by WAL pressure (num_requested growing much faster than num_timed)
- See how expensive checkpoint I/O is, especially sync_time
With StatsMgr, instead of just seeing totals, you get a time-series view: you can check how checkpoint behaviour changed during a specific high-load period.
5.4 statsmgr_io(): detailed I/O view
SELECT * FROM statsmgr_io() LIMIT 1;
Output:
backend_type | client backend
object | relation
context | bulkread
reads | 294728
read_time | 853.539
hits | 4416
reuses | 293255
...
tick | 4
tick_tz | 2025-11-25 18:27:43.87459+05:30
This corresponds to the pg_stat_io view introduced in PostgreSQL 16.
Here:
- backend_type tells you which kind of process generated the I/O (client backend, autovacuum worker, checkpointer, etc.)
- object tells you what is being accessed (relations, temp relations, SLRU, etc.)
- context splits I/O by context (normal read, bulk read, bulk write, etc.)
- reads, read_time, hits show how many blocks were read and their timing
How this helps:
- Find which backend types cause heavy I/O (for example, autovacuum vs user queries)
- Separate sequential bulk reads from small random reads
- If read_time is high but reads are moderate, storage latency may be an issue
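For example, to see which backend types dominate reads in the latest snapshot, a query like this works as a starting point (a sketch based on the columns shown above):
-- Reads and read timing per backend type and context, latest snapshot only
SELECT backend_type, object, context, reads, read_time, hits
FROM statsmgr_io()
WHERE tick = (SELECT max(tick) FROM statsmgr_io())
ORDER BY reads DESC;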
5.5 statsmgr_slru(): SLRU cache health
SELECT * FROM statsmgr_slru();
Sample output:
name | blocks_zeroed | blocks_hit | blocks_read | blocks_written | flush | truncate | tick
------------------+---------------+------------+-------------+----------------+-------+----------+-----
commit_timestamp | 0 | 0 | 0 | 0 | 0 | 0 | 4
multixact_member | 32 | 1720 | 0 | 34 | 57 | 56 | 4
multixact_offset | 1 | 103262 | 23 | 38 | 57 | 0 | 4
notify | 0 | 0 | 0 | 0 | 0 | 0 | 4
...
transaction | 0 | 81 | 16 | 18 | 73 | 0 | 4
This is a time-series wrapper around pg_stat_slru, which tracks activity in Simple LRU (SLRU) areas used for things like multixacts, subtransactions, and commit timestamps.
Why DBAs care:
- multixact_* SLRUs are critical for row-level locks and can cause wraparound issues
- High blocks_read and blocks_written for multixact_* areas might signal heavy locking or very frequent updates
- Truncations and flushes give hints about wraparound maintenance
Because StatsMgr keeps several ticks, you can see trends such as gradually increasing writes in multixact_member before a wraparound maintenance event.
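A sketch of such a trend query for the multixact areas, again using a window function over the ticks:
-- Per-snapshot growth of multixact SLRU reads and writes
SELECT name,
       tick,
       blocks_written - lag(blocks_written) OVER (PARTITION BY name ORDER BY tick) AS written_delta,
       blocks_read - lag(blocks_read) OVER (PARTITION BY name ORDER BY tick) AS read_delta
FROM statsmgr_slru()
WHERE name LIKE 'multixact%'
ORDER BY name, tick;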
5.6 statsmgr_snapshot(): forcing a snapshot
SELECT * FROM statsmgr_snapshot();
Result:
statsmgr_snapshot
-------------------
(1 row)
This function forces a snapshot outside of the normal periodic scheduling. In this build, it simply triggers the snapshot mechanism and returns an empty row, which is why you see a single blank result.
Usage idea:
- You can call statsmgr_snapshot() right before and after a specific benchmark or migration step, then compare snapshot ticks to see exact increments for that period.
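A sketch of that workflow, using statsmgr_wal() as the example counter set (the tick value 42 below is a placeholder for the one you note before the run):
SELECT * FROM statsmgr_snapshot();                     -- snapshot before the run
SELECT max(tick) AS tick_before FROM statsmgr_wal();   -- note this value
-- ... run the benchmark or migration step here ...
SELECT * FROM statsmgr_snapshot();                     -- snapshot after the run
-- WAL generated during the window; counters are cumulative, so max - min works
SELECT max(wal_bytes) - min(wal_bytes) AS wal_bytes_during_run
FROM statsmgr_wal()
WHERE tick >= 42;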
5.7 statsmgr_wal(): WAL generation profile
SELECT * FROM statsmgr_wal();
Output:
wal_records | wal_fpi | wal_bytes | wal_buffers_full | wal_write | wal_sync | wal_write_time | wal_sync_time | tick
-------------+---------+------------+------------------+-----------+----------+----------------+---------------+------
2289011 | 41002 | 1751475695 | 156200 | 189118 | 32912 | 1973.238 | 139297.966 | 6
...
This reflects the counters from pg_stat_wal.
Important fields:
- wal_records: number of WAL records generated
- wal_fpi: the number of full page images written into WAL; a full copy of a data page is logged the first time the page is modified after a checkpoint, to guarantee crash-safe recovery.
- wal_bytes: total volume of WAL written
- wal_buffers_full: times the WAL buffer was full and forced a flush
- wal_write, wal_sync, wal_write_time, wal_sync_time: how often and how long WAL writes and fsyncs took
Practical DBA usage:
- Estimate WAL generation rate for sizing replication bandwidth and storage
- Monitor wal_fpi to see if many full page images are being produced (which may indicate many pages being touched for the first time after each checkpoint, or checkpoints that are too frequent)
- Look at wal_buffers_full: high values can mean wal_buffers is too small
- Compare wal_write_time and wal_sync_time across ticks to detect slow storage or spikes during certain workloads
StatsMgr adds the historical dimension: instead of a single cumulative value, you can see how these counters evolved for each snapshot tick.
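As a sketch, a per-snapshot WAL profile can be derived with a window function over the ticks (the real interval between ticks depends on your snapshot configuration):
-- WAL records, full page images and bytes generated between consecutive snapshots
SELECT tick,
       wal_records - lag(wal_records) OVER (ORDER BY tick) AS records_delta,
       wal_fpi     - lag(wal_fpi)     OVER (ORDER BY tick) AS fpi_delta,
       wal_bytes   - lag(wal_bytes)   OVER (ORDER BY tick) AS bytes_delta
FROM statsmgr_wal()
ORDER BY tick;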
5.8 statsmgr_reset() and statsmgr_stop_main_worker()
SELECT * FROM statsmgr_reset();
Result:
statsmgr_reset
----------------
(1 row)
This resets StatsMgr’s internal history and counters. It is similar in spirit to calling pg_stat_reset() on built-in statistics.
Use cases:
- Before running a benchmark
- Before starting a new diagnostic session
- After changing configuration and wanting a clean baseline
Finally, stop the worker:
SELECT * FROM statsmgr_stop_main_worker();
First call:
pid
---------
1010574
Second call:
pid
-----
-1
Interpretation:
- The first call returns the PID of the worker that has just been asked to stop
- The second call returns -1, meaning there is no main worker currently running
This is useful in automation: you can safely call statsmgr_stop_main_worker() without having to check first whether the worker is alive.
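For example, a maintenance script can simply restart the worker unconditionally, since a missing worker just yields -1:
-- Restart the StatsMgr worker; safe even if it is not currently running
SELECT * FROM statsmgr_stop_main_worker();
SELECT * FROM statsmgr_start_main_worker();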
6. How a DBA Can Use StatsMgr in Practice
1. Capacity planning and performance baselines
- Use statsmgr_wal() across ticks to calculate WAL generated per minute and derive throughput needs for storage and replication.
- Use statsmgr_io() to understand which backend types and contexts dominate I/O.
2. Investigating “mystery spikes”
When you hear “yesterday around 3 PM the database felt slow”, but your Prometheus/agent missed data, StatsMgr’s snapshots can still show you:
- Did WAL volume jump?
- Did checkpoints or bgwriter activity spike?
- Did SLRU areas suddenly get hammered?
3. Tuning checkpoints and bgwriter
- From statsmgr_checkpointer() and statsmgr_bgwriter() you can track how changes in checkpoint_timeout, max_wal_size, bgwriter_lru_maxpages, etc., affect write and sync timings over time.
4. Lock and multixact troubleshooting
- statsmgr_slru() gives a historical view of multixact and transaction SLRU activity. Heavy multixact writes might correlate with high contention or frequent updates/deletes.
5. Backup/archiving reliability
- statsmgr_archiver() makes it easy to prove that archiving has been running consistently over the last N snapshots, and to detect periods where no WAL was archived even though statsmgr_wal() shows WAL was generated.
Conclusion
StatsMgr brings a powerful new way to observe what is happening inside PostgreSQL. Instead of relying only on the current values shown in the standard pg_stat_* views, StatsMgr continuously captures snapshots of key internal statistics—covering WAL activity, I/O behaviour, checkpoints, SLRU usage, and the archiver process—and stores them as a rolling history.
This transforms PostgreSQL into a lightweight time-series monitoring system. Throughout this walkthrough, we built the extension from source, enabled it inside the database, started its background worker, and explored every StatsMgr function in action. By examining the historical snapshots it collects, and by combining them with SQL techniques like window functions, you gain the ability to look back in time, correlate different types of internal activity, and understand exactly what the database was doing during any moment of interest.
For a DBA, this level of visibility is invaluable—especially during troubleshooting, capacity planning, and performance analysis—and makes StatsMgr an excellent addition to any PostgreSQL 17+ environment.