When you manage a busy PostgreSQL server, the built-in pg_stat_* views are extremely useful—but they only show what is happening right now. If your Prometheus/Grafana pipeline misses data for even a short window, you lose visibility into what happened during that period.
StatsMgr solves this problem. It is a PostgreSQL extension that uses a background worker and shared memory to periodically snapshot PostgreSQL’s internal statistics—including WAL, I/O, SLRU, checkpoint activity, archiver performance, and more. These snapshots are stored as a rolling history, giving you time-series visibility directly inside PostgreSQL.
In this blog, I will cover
- Installing StatsMgr from source
- Starting the StatsMgr background worker and understanding why its PID does not appear in pg_stat_activity
- Running every statsmgr_* function with real outputs from my environment
- How each set of statistics is useful for a DBA in real-world scenarios
All examples and outputs shown here are from PostgreSQL 17.7, which is the minimum version supported by StatsMgr.
1. What StatsMgr Actually Does
At a high level, StatsMgr:
- Snapshots PostgreSQL’s cumulative statistics at fixed intervals
- Stores a configurable number of historical snapshots in shared memory
- Exposes this history via SQL functions like statsmgr_wal(), statsmgr_io(), statsmgr_slru(), etc.
You can think of it as a tiny time-series layer sitting inside PostgreSQL, focused only on internal stats.
This is very useful when:
- Your external monitoring is down, but you still want to know what happened
- You need to debug a spike that already passed
- You want to compute per-minute deltas of WAL, SLRU, I/O, etc., without constantly pulling from pg_stat_* yourself
2. Installing StatsMgr from Source
I built StatsMgr from the official repository using PostgreSQL 17’s PGXS:
git clone https://codeberg.org/Data-Bene/StatsMgr.git
cd StatsMgr
make USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config
sudo make install USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config
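If make fails because pg_config or the server headers cannot be found, the build dependencies are missing. On Debian/Ubuntu with the PGDG packages they can usually be installed like this (the package names are an assumption for that platform; adjust for your distribution):
sudo apt install build-essential git postgresql-server-dev-17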
You can verify the extension is available with:
SELECT *
FROM pg_available_extensions
WHERE name = 'statsmgr';
You get a result like this:
name | default_version | installed_version | comment
---------+-----------------+-------------------+--------------------------------------
statsmgr| 0.1-alpha | 0.1-alpha | Statistics management for PostgreSQL
3. Two Ways to Install: Temporary vs Permanent
You can “install” StatsMgr at two levels:
3.1 Temporary (per-database) installation
This is the quickest way to get started:
CREATE EXTENSION statsmgr;
This:
- Registers the SQL functions like statsmgr_wal(), statsmgr_io(), etc. in that database
- Lets you manually start the background worker using statsmgr_start_main_worker()
- Does not require a server restart
However, because this uses a dynamic background worker (registered at runtime), it will not automatically start when PostgreSQL restarts. After a restart, you must again run statsmgr_start_main_worker().
This is perfect for experiments and development environments.
3.2 Permanent installation via shared_preload_libraries
StatsMgr is designed to integrate with background workers and shared memory. To have it available cluster-wide and to allow it to register workers during postmaster startup, you should add it to shared_preload_libraries in postgresql.conf:
You can get the path of the config file from inside psql like this:
postgres=# show config_file ;
config_file
-----------------------------------------
/etc/postgresql/17/main/postgresql.conf
Open this file, find the parameter named shared_preload_libraries, and set its value like this:
shared_preload_libraries = 'statsmgr'
Then restart PostgreSQL.
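On the Debian/Ubuntu cluster layout used in this walkthrough, the restart typically looks like this (the service name is an assumption; adjust it to match your installation):
sudo systemctl restart postgresql@17-main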
Why this matters:
- Extensions that allocate shared memory or register background workers at startup (like pg_stat_statements, pg_cron, etc.) must be in shared_preload_libraries.
- With this configuration, StatsMgr can be ready immediately when the server is up, and you can choose to have its background worker started early in the lifecycle.
4. Starting the StatsMgr Background Worker
You can start the main worker like this:
SELECT * FROM statsmgr_start_main_worker();
Output:
pid
---------
1010574
On the OS side, run this command with the returned process ID:
ps -ef | grep 1010574
You can now see a process with this PID running in the background:
postgres 1010574 1064 0 18:20 ? 00:00:00 postgres: 17/main: statsmgr main
So the background worker is clearly running as a PostgreSQL child process.
4.1 Why the PID is not in pg_stat_activity
SELECT *
FROM pg_stat_activity
WHERE pid = '1010574';
You get no rows:
(0 rows)
At first this looks strange, because pg_stat_activity is described as “one row per server process”.
The key detail is how background workers are defined in PostgreSQL:
A background worker can either:
- Attach to shared memory only, or
- Attach to shared memory and also open a normal database connection
Only workers that behave like a database backend (with a DB connection) appear in pg_stat_activity. StatsMgr’s main worker is a lightweight process that attaches to shared memory and collects statistics; it does not open a SQL-level connection to any database.
Result:
- You see the PID in the operating system process list
- But it does not appear in pg_stat_activity, because from PostgreSQL’s point of view it is not a “backend” with a current query
This is normal and safe. If you want to monitor that the worker is alive, check:
- The OS process list by name (statsmgr main)
- Logs from the extension
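A quick liveness check by process name could look like this, matching the worker name seen earlier:
ps -ef | grep "statsmgr main" | grep -v grep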
5. Understanding the StatsMgr Views with Real Output
After starting the worker, you can explore all the functions related to this statsmgr extension.
Internally, each function returns multiple rows representing snapshots over time. The tick column identifies each snapshot, and tick_tz is the timestamp at which the snapshot was taken. As the worker keeps running, more ticks are added.
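For example, to see which snapshots are currently held in memory, you can list the tick identifiers from any of the functions. Here is a minimal sketch using statsmgr_wal():
-- List the snapshots currently held in shared memory, oldest first
SELECT tick, tick_tz
FROM statsmgr_wal()
ORDER BY tick;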
5.1 statsmgr_archiver(): WAL archiving statistics
SELECT * FROM statsmgr_archiver();
You get a result like this:
archived_count | last_archived_wal | last_archived_timestamp ...
----------------+-------------------+---------------------------
0 | | 2000-01-01 05:30:00+05:30
0 | | 2000-01-01 05:30:00+05:30
0 | | 2000-01-01 05:30:00+05:30
0 | | 2000-01-01 05:30:00+05:30
Fields are analogous to pg_stat_archiver: number of successfully archived WAL segments, last archived WAL file, last failure, and the time the stats were reset.
How this helps a DBA:
- Detect if archiving is stuck (no increment in archived_count across ticks)
- Investigate frequent failures using failed_count and last_failed_timestamp
- Cross-check with statsmgr_wal() to see how much WAL was generated vs how much was archived
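A simple version of that last cross-check might look like this; it is a sketch that assumes both functions expose the same tick values and the archiver columns mentioned above:
-- Compare WAL generated vs WAL archived at each snapshot
SELECT w.tick,
       w.wal_bytes,
       a.archived_count,
       a.failed_count
FROM statsmgr_wal() AS w
JOIN statsmgr_archiver() AS a USING (tick)
ORDER BY w.tick;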
5.2 statsmgr_bgwriter(): background writer behaviour
SELECT * FROM statsmgr_bgwriter();
Output snippet:
buf_written_clean | maxwritten_clean | buf_alloc | stat_reset_timestamp | tick
-------------------+------------------+-----------+----------------------------------+------
5408 | 0 | 44090 | 2025-11-16 08:41:00.668938+05:30 | 3
...
This mirrors counters from pg_stat_bgwriter.
Key columns:
- buf_written_clean: buffers written by the background writer's cleaning scan
- maxwritten_clean: times the bgwriter stopped early because it hit its write limit
- buf_alloc: how many buffers were allocated
Why is it useful?
- If buf_alloc keeps growing rapidly, backends are allocating new buffers at a high rate, which usually points to a working set larger than shared_buffers or heavy cache churn.
- If maxwritten_clean is high, the bgwriter is often hitting its limit and may not keep up with dirty buffers.
- By comparing buf_written_clean across ticks, you can compute per-interval rates to see if some workload spikes create more background writes
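A minimal sketch of such a per-interval calculation, using a window function over the snapshot history (column names as shown in the output above):
-- Buffers written and allocated between consecutive snapshots
SELECT tick,
       buf_written_clean - lag(buf_written_clean) OVER (ORDER BY tick) AS written_clean_delta,
       buf_alloc - lag(buf_alloc) OVER (ORDER BY tick) AS alloc_delta
FROM statsmgr_bgwriter()
ORDER BY tick;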
5.3 statsmgr_checkpointer(): checkpoint cost
SELECT * FROM statsmgr_checkpointer() LIMIT 1;
Output:
num_timed | 1635
num_requested | 23
write_time | 650.289
sync_time | 5.215
buffers_written | 7769
slru_written |
...
This maps onto the checkpoint statistics available via pg_stat_checkpointer.
Important columns:
- num_timed: checkpoints triggered by timeout
- num_requested: checkpoints requested manually or by WAL pressure
- write_time and sync_time: total time spent writing and syncing data during checkpoints
- buffers_written: total number of buffers written during checkpoints
As a DBA, you use these to:
- Tune max_wal_size and checkpoint_timeout
- Check whether checkpoints are triggered too often by WAL pressure (num_requested growing much faster than num_timed)
- See how expensive checkpoint I/O is, especially sync_time
With StatsMgr, instead of just seeing totals, you get a time-series view: you can check how checkpoint behaviour changed during a specific high-load period.
5.4 statsmgr_io(): detailed I/O view
SELECT * FROM statsmgr_io() LIMIT 1;
Output:
backend_type | client backend
object | relation
context | bulkread
reads | 294728
read_time | 853.539
hits | 4416
reuses | 293255
...
tick | 4
tick_tz | 2025-11-25 18:27:43.87459+05:30
This corresponds to the pg_stat_io view introduced in PostgreSQL 16.
Here:
- backend_type tells you which kind of process generated the I/O (client backend, autovacuum worker, checkpointer, etc.)
- object tells you what is being accessed (relations, temp relations, SLRU, etc.)
- context splits I/O by context (normal read, bulk read, bulk write, etc.)
- reads, read_time, hits show how many blocks were read and their timing
How this helps:
- Find which backend types cause heavy I/O (for example, autovacuum vs user queries)
- Separate sequential bulk reads from small random reads
- If read_time is high but reads are moderate, storage latency may be an issue
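For example, to see which backend types dominate reads in the latest snapshot, a query like this works as a starting point (a sketch based on the columns shown above):
-- Reads and read timing per backend type and context, latest snapshot only
SELECT backend_type, object, context, reads, read_time, hits
FROM statsmgr_io()
WHERE tick = (SELECT max(tick) FROM statsmgr_io())
ORDER BY reads DESC;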
5.5 statsmgr_slru(): SLRU cache health
SELECT * FROM statsmgr_slru();
Sample output:
name | blocks_zeroed | blocks_hit | blocks_read | blocks_written | flush | truncate | tick
------------------+---------------+------------+-------------+----------------+-------+----------+-----
commit_timestamp | 0 | 0 | 0 | 0 | 0 | 0 | 4
multixact_member | 32 | 1720 | 0 | 34 | 57 | 56 | 4
multixact_offset | 1 | 103262 | 23 | 38 | 57 | 0 | 4
notify | 0 | 0 | 0 | 0 | 0 | 0 | 4
...
transaction | 0 | 81 | 16 | 18 | 73 | 0 | 4
This is a time-series wrapper around pg_stat_slru, which tracks activity in Simple LRU (SLRU) areas used for things like multixacts, subtransactions, and commit timestamps.
Why DBAs care:
- multixact_* SLRUs are critical for row-level locks and can cause wraparound issues
- High blocks_read and blocks_written for multixact_* areas might signal heavy locking or very frequent updates
- Truncations and flushes give hints about wraparound maintenance
Because StatsMgr keeps several ticks, you can see trends such as gradually increasing writes in multixact_member before a wraparound maintenance event.
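A sketch of such a trend query for the multixact areas, again using a window function over the ticks:
-- Per-snapshot growth of multixact SLRU reads and writes
SELECT name,
       tick,
       blocks_written - lag(blocks_written) OVER (PARTITION BY name ORDER BY tick) AS written_delta,
       blocks_read - lag(blocks_read) OVER (PARTITION BY name ORDER BY tick) AS read_delta
FROM statsmgr_slru()
WHERE name LIKE 'multixact%'
ORDER BY name, tick;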
5.6 statsmgr_snapshot(): forcing a snapshot
SELECT * FROM statsmgr_snapshot();
Result:
statsmgr_snapshot
-------------------
(1 row)
This function forces a snapshot outside of the normal periodic scheduling. In this build, it simply triggers the snapshot mechanism and returns an empty row, which is why you see a single blank result.
Usage idea:
- You can call statsmgr_snapshot() right before and after a specific benchmark or migration step, then compare snapshot ticks to see exact increments for that period.
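A sketch of that workflow, using statsmgr_wal() as the example counter set (the tick value 42 below is a placeholder for the one you note before the run):
SELECT * FROM statsmgr_snapshot();                     -- snapshot before the run
SELECT max(tick) AS tick_before FROM statsmgr_wal();   -- note this value
-- ... run the benchmark or migration step here ...
SELECT * FROM statsmgr_snapshot();                     -- snapshot after the run
-- WAL generated during the window; counters are cumulative, so max - min works
SELECT max(wal_bytes) - min(wal_bytes) AS wal_bytes_during_run
FROM statsmgr_wal()
WHERE tick >= 42;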
5.7 statsmgr_wal(): WAL generation profile
SELECT * FROM statsmgr_wal();
Output:
wal_records | wal_fpi | wal_bytes | wal_buffers_full | wal_write | wal_sync | wal_write_time | wal_sync_time | tick
-------------+---------+------------+------------------+-----------+----------+----------------+---------------+------
2289011 | 41002 | 1751475695 | 156200 | 189118 | 32912 | 1973.238 | 139297.966 | 6
...
This reflects the counters from pg_stat_wal.
Important fields:
- wal_records: number of WAL records generated
- wal_fpi: the number of full page images written into WAL; a full copy of a data page is logged the first time the page is modified after a checkpoint, to guarantee crash-safe recovery.
- wal_bytes: total volume of WAL written
- wal_buffers_full: times the WAL buffer was full and forced a flush
- wal_write, wal_sync, wal_write_time, wal_sync_time: how often and how long WAL writes and fsyncs took
Practical DBA usage:
- Estimate WAL generation rate for sizing replication bandwidth and storage
- Monitor wal_fpi to see if many full page images are being produced (which may indicate many pages being touched for the first time after each checkpoint, or checkpoints that are too frequent)
- Look at wal_buffers_full: high values can mean wal_buffers is too small
- Compare wal_write_time and wal_sync_time across ticks to detect slow storage or spikes during certain workloads
StatsMgr adds the historical dimension: instead of a single cumulative value, you can see how these counters evolved for each snapshot tick.
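As a sketch, a per-snapshot WAL profile can be derived with a window function over the ticks (the real interval between ticks depends on your snapshot configuration):
-- WAL records, full page images and bytes generated between consecutive snapshots
SELECT tick,
       wal_records - lag(wal_records) OVER (ORDER BY tick) AS records_delta,
       wal_fpi     - lag(wal_fpi)     OVER (ORDER BY tick) AS fpi_delta,
       wal_bytes   - lag(wal_bytes)   OVER (ORDER BY tick) AS bytes_delta
FROM statsmgr_wal()
ORDER BY tick;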
5.8 statsmgr_reset() and statsmgr_stop_main_worker()
SELECT * FROM statsmgr_reset();
Result:
statsmgr_reset
----------------
(1 row)
This resets StatsMgr’s internal history and counters. It is similar in spirit to calling pg_stat_reset() on built-in statistics.
Use cases:
- Before running a benchmark
- Before starting a new diagnostic session
- After changing configuration and wanting a clean baseline
Finally, stop the worker:
SELECT * FROM statsmgr_stop_main_worker();
First call:
pid
---------
1010574
Second call:
pid
-----
-1
Interpretation:
- The first call returns the PID of the worker that has just been asked to stop
- The second call returns -1, meaning there is no main worker currently running
This is useful in automation: you can safely call statsmgr_stop_main_worker() without having to check first whether the worker is alive.
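For example, a maintenance script can simply restart the worker unconditionally, since a missing worker just yields -1:
-- Restart the StatsMgr worker; safe even if it is not currently running
SELECT * FROM statsmgr_stop_main_worker();
SELECT * FROM statsmgr_start_main_worker();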
6. How a DBA Can Use StatsMgr in Practice
1. Capacity planning and performance baselines
- Use statsmgr_wal() across ticks to calculate WAL generated per minute and derive throughput needs for storage and replication.
- Use statsmgr_io() to understand which backend types and contexts dominate I/O.
2. Investigating “mystery spikes”
When you hear “yesterday around 3 PM the database felt slow”, but your Prometheus/agent missed data, StatsMgr’s snapshots can still show you:
- Did WAL volume jump?
- Did checkpoints or bgwriter activity spike?
- Did SLRU areas suddenly get hammered?
3. Tuning checkpoints and bgwriter
- From statsmgr_checkpointer() and statsmgr_bgwriter() you can track how changes in checkpoint_timeout, max_wal_size, bgwriter_lru_maxpages, etc., affect write and sync timings over time.
4. Lock and multixact troubleshooting
- statsmgr_slru() gives a historical view of multixact and transaction SLRU activity. Heavy multixact writes might correlate with high contention or frequent updates/deletes.
5. Backup/archiving reliability
- statsmgr_archiver() makes it easy to prove that archiving has been running consistently over the last N snapshots, and to detect periods where no WAL was archived even though statsmgr_wal() shows WAL was generated.
Conclusion
StatsMgr brings a powerful new way to observe what is happening inside PostgreSQL. Instead of relying only on the current values shown in the standard pg_stat_* views, StatsMgr continuously captures snapshots of key internal statistics—covering WAL activity, I/O behaviour, checkpoints, SLRU usage, and the archiver process—and stores them as a rolling history.
This transforms PostgreSQL into a lightweight time-series monitoring system. Throughout this walkthrough, we built the extension from source, enabled it inside the database, started its background worker, and explored every StatsMgr function in action. By examining the historical snapshots it collects, and by combining them with SQL techniques like window functions, you gain the ability to look back in time, correlate different types of internal activity, and understand exactly what the database was doing during any moment of interest.
For a DBA, this level of visibility is invaluable—especially during troubleshooting, capacity planning, and performance analysis—and makes StatsMgr an excellent addition to any PostgreSQL 17+ environment.