Understanding PostgreSQL EXPLAIN: A Guide to Query Optimization

PostgreSQL's EXPLAIN command is a powerful tool that provides insights into how a query will be executed. It helps developers and database administrators identify potential performance issues in their queries. In this blog, we will explore what EXPLAIN is, when to use it, how to interpret its output, and techniques for optimizing queries based on the insights it provides.

What is PostgreSQL EXPLAIN command?

EXPLAIN is a PostgreSQL command that displays the execution plan for a given SQL query. The execution plan shows how the query planner intends to retrieve the data, including the order of operations, use of indexes, and estimated costs.

EXPLAIN Command: Query Execution Flow

Variants of PostgreSQL EXPLAIN Command

The PostgreSQL EXPLAIN command has two key variants, each serving a distinct purpose in analysing query performance:

  1. EXPLAIN
  2. EXPLAIN ANALYZE

PostgreSQL EXPLAIN vs EXPLAIN ANALYZE

  • Query Execution: EXPLAIN does not execute the query, while EXPLAIN ANALYZE executes it.
  • Purpose: EXPLAIN displays the planned execution strategy, including estimated costs and operations; EXPLAIN ANALYZE provides the execution plan along with actual runtime statistics, such as time taken and rows processed.
  • Use Case: EXPLAIN is ideal for understanding query structure and planning optimization; EXPLAIN ANALYZE is best for identifying discrepancies between the query plan and actual performance.
  • Performance Impact: EXPLAIN has no impact on performance because the query is not executed; EXPLAIN ANALYZE can impact performance because the query is executed.

When to use the EXPLAIN command in PostgreSQL

Use the EXPLAIN command in PostgreSQL to:

(i). Understand Query Performance:

  • Diagnose slow queries.
  • Identify inefficient operations like sequential scans on large tables.

(ii). Plan Index Usage:

  • Check if queries are utilizing indexes effectively.

(iii). Optimize Complex Queries:

  • Analyse nested queries, joins, and subqueries.

(iv). Benchmark Query Changes:

  • Compare execution plans before and after optimization.

How to use the EXPLAIN command in PostgreSQL

To use the EXPLAIN command, run EXPLAIN followed by the SQL query to get its execution plan:

EXPLAIN SELECT * FROM orders WHERE order_date > '2023-01-01';

This won't run the query, but it shows the steps PostgreSQL plans to take to get the data, such as whether it will use an index or scan the entire table. It also reports estimated costs (in arbitrary planner units) and the number of rows it expects to process; because the query is not executed, these are estimates rather than measured timings.
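
EXPLAIN also accepts options that change what is reported. As a minimal sketch (the exact options available depend on your PostgreSQL version, and ANALYZE will actually execute the query):

-- ANALYZE runs the query and adds real timings and row counts,
-- BUFFERS adds buffer usage statistics, and FORMAT JSON emits the plan as JSON for tooling.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE order_date > '2023-01-01';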

Now that we have seen how to use the EXPLAIN command, it's important to understand what the output means about query execution. In the following section, let us discuss how to interpret this output for better query optimization.

Understanding EXPLAIN Output

The EXPLAIN output is hierarchical and includes:

  • Node Types: Describes the operation being performed (e.g., Seq Scan, Index Scan, Nested Loop).
  • Cost: The PostgreSQL EXPLAIN cost metrics provide an estimate of the resources required for the query, expressed in arbitrary planner units. There are two main cost values:

      (i). Startup Cost represents the estimated cost incurred before the first row can be returned.

      (ii). Total Cost reflects the overall estimated cost to process all rows in the query.

  • Rows: The Rows value shows the estimated number of rows PostgreSQL expects to process for each operation. This estimate is based on table statistics and the query structure.
  • Width: Width refers to the average size (in bytes) of each row in the operation. Larger rows (e.g., those with text fields) will have a higher width, affecting memory usage.

With a clear understanding of the EXPLAIN output, let us examine a simple query example to see how PostgreSQL applies these concepts in practice.

PostgreSQL EXPLAIN example: Simple Query

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

Output:

Seq Scan on orders  (cost=0.00..35.50 rows=10 width=100)
  Filter: (customer_id = 42)

Output Breakdown

Without an index on the customer_id column, PostgreSQL performs a sequential scan: it reads every single row in the orders table to find rows where customer_id = 42. Here cost=0.00..35.50 means a startup cost of 0.00 and an estimated total cost of 35.50 (in arbitrary planner units), with an estimated 10 matching rows of about 100 bytes each. Even though the query filters on customer_id = 42, without an index PostgreSQL has no quick way to jump to the relevant rows.

Optimization Opportunity: Add an index on customer_id to avoid a sequential scan:

CREATE INDEX idx_customer_id ON orders(customer_id);

After optimization:

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

Output:

Index Scan using idx_customer_id on orders  (cost=0.14..8.15 rows=10 width=100)
  Index Cond: (customer_id = 42)

Index Scan using idx_customer_id shows that PostgreSQL is now using the index on the customer_id column. Adding the index improves performance by avoiding a sequential scan; the estimated total cost drops from 35.50 to 8.15.
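
To confirm the improvement with measured timings rather than estimates, the same query can be re-run under EXPLAIN ANALYZE (this actually executes the query):

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;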

PostgreSQL EXPLAIN ANALYZE example: Joins and Nested Loops

EXPLAIN ANALYZE
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

Output:

Nested Loop  (cost=0.43..12.34 rows=100 width=72) (actual time=0.054..0.115 rows=5 loops=1)
  -> Seq Scan on customers c  (cost=0.00..1.02 rows=2 width=36) (actual time=0.012..0.016 rows=2 loops=1)
  -> Index Scan using orders_customer_id_idx on orders o  (cost=0.43..5.15 rows=5 width=36) (actual time=0.030..0.045 rows=5 loops=2)

  • A Nested Loop is used to join customers and orders.
  • Seq Scan on customers followed by an Index Scan on orders.
  • Actual time shows it completed quickly, but on larger datasets, the nested loop could become a bottleneck.

Optimization Opportunity: Check if a hash join or a merge join would be faster:

SET enable_nestloop = off; -- Discourage the planner from choosing nested loop joins.
EXPLAIN ANALYZE
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

The planner may choose a Hash Join or Merge Join, which are typically faster for large datasets.
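
Since enable_nestloop is a session-level planner setting, restore the default once you have finished comparing plans, for example:

RESET enable_nestloop; -- Restore the default planner behaviour for this session.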

Common Optimization Techniques

(i). Add Indexes: Identify columns used in WHERE, JOIN, and ORDER BY clauses.

Example:

CREATE INDEX idx_order_date ON orders(order_date);

(ii). Use Covering Indexes: Include frequently queried columns in the index to avoid accessing the table.

Example:

CREATE INDEX idx_customer_id_order_date ON orders(customer_id, order_date);
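
On PostgreSQL 11 and later, a covering index can also be built with the INCLUDE clause, which stores non-key columns in the index purely so that index-only scans can return them without visiting the table. A minimal sketch (the index name is illustrative):

-- customer_id is the search key; order_date is carried as a non-key payload column.
CREATE INDEX idx_customer_id_incl_order_date ON orders (customer_id) INCLUDE (order_date);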

(iii). Analyse and Vacuum Tables: Keep statistics updated for accurate query planning.

Example:

ANALYZE orders;
VACUUM orders;
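
The two commands can also be combined, which reclaims dead row versions and refreshes planner statistics in a single pass:

VACUUM ANALYZE orders;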

(iv). Partitioning: Split large tables into smaller, manageable chunks.

Example:

CREATE TABLE orders_2023 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
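
Note that this only works if orders was created as a partitioned table. A minimal sketch of the parent definition, using the column names from the earlier examples (the exact schema is illustrative):

CREATE TABLE orders (
    order_id    bigint,
    customer_id integer,
    order_date  date NOT NULL
) PARTITION BY RANGE (order_date);

With range partitioning, the FROM bound is inclusive and the TO bound is exclusive, so the orders_2023 partition above covers every order_date in 2023.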

(v). Rewrite Queries: Simplify subqueries, remove unnecessary joins, and use aggregate functions efficiently.

Replace:

SELECT * FROM orders WHERE order_date > '2023-01-01' AND order_date < '2023-12-31';

With:

SELECT * FROM orders WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31';

Keep in mind that BETWEEN is inclusive of both endpoints while the original predicate excludes them, so make sure the boundary semantics are what you actually want before rewriting.

(vi). Use EXPLAIN ANALYZE for Real Data: Test on production-like data to get accurate insights into execution time and row counts.

Database Monitoring with Atatus

Atatus Database Monitoring simplifies real-time tracking of your database performance. It provides valuable insights into query performance, resource usage, and transaction throughput across both SQL and NoSQL databases.

Atatus supports a wide range of databases, including SQL databases such as MySQL, PostgreSQL, MariaDB, and SQLite. It also supports NoSQL databases like MongoDB, Redis, Memcached, and Aerospike.

Here are the key features of Atatus Database Monitoring:

  • Query Performance Insights: Gain detailed insights into SQL and NoSQL query performance, including execution times and latency.
  • Slow Query Identification: Detect slow or resource-intensive queries and optimize them to improve performance.
  • Bottleneck Detection: Identify and resolve database bottlenecks, including table locks and blocking queries.
  • Comprehensive Dashboards: Combine metrics, logs, and traces into a single dashboard for easy monitoring and troubleshooting.
  • Customizable Alerts: Set up alerts for performance issues and critical events to address problems proactively.

Start your 14-day free trial with Atatus today!

Pavithra Parthiban

A technical content writer specializing in monitoring and observability tools, adept at making complex concepts easy to understand.
Chennai