Speeding Up and Monitoring Index Creation in PostgreSQL

Creating indexes on large datasets in PostgreSQL can feel like watching paint dry—especially when you’re dealing with millions of rows. But with the right system settings and live monitoring, you can speed things up and see what’s happening under the hood. Here’s how.

Datum Boy Working On Desktop PC

🛠️ Problem: Index Creation Is Slow

When working with a large table that has 100M+ rows (in our case it’s all_contributions just for the sake of example), a simple command like:

CREATE INDEX idx_recipient_state ON all_contributions (recipient_state);

can take several minutes—or worse, timeout if PostgreSQL’s default settings aren’t tuned.

Step 1: Tune PostgreSQL for Faster Indexing

Before creating indexes, tweak some Postgres configuration variables for better performance.

a. Increase `maintenance_work_mem`

SET maintenance_work_mem = '8GB';

This gives PostgreSQL more RAM to use during index creation. By default, this is often set to a few MBs, which is not enough for large tables.

b. Boost Parallel Workers

SET max_parallel_maintenance_workers = 8;

This tells PostgreSQL to use up to 8 CPU cores for index creation. On machines with multiple cores, this can significantly cut down build time.

💡 Note: These are session-level settings, so you’ll need to run them each time or include them in your script.

Step 2: Monitor Index Progress in Real Time

PostgreSQL provides a system view called pg_stat_progress_create_index that shows you what’s happening behind the scenes during index creation.

Use this query to monitor index creation:

SELECT
  age(clock_timestamp(), a.query_start) AS duration,
  a.query,
  p.phase,
  round(p.blocks_done / NULLIF(p.blocks_total::numeric, 0) * 100, 2) AS "% done",
  p.blocks_total,
  p.blocks_done,
  round(p.tuples_done / NULLIF(p.tuples_total::numeric, 0) * 100, 2) AS "% tuples done",
  p.tuples_total,
  p.tuples_done
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid
LEFT JOIN pg_stat_all_indexes ai
  ON ai.relid = p.relid AND ai.indexrelid = p.index_relid;

This gives you a live view of progress, including which phase it’s in (scanning, sorting, building), and how far along it is.

Wrap-Up

By combining smart configuration settings with real-time insight, you can:

Speed up the creation of large indexes
Avoid unnecessary timeouts
Track progress live and make better decisions

Indexing doesn’t have to be a black box anymore.

Need help debugging slow queries or managing schema changes at scale? Reach out to the team at Datum Brain.