Understanding PostgreSQL EXPLAIN: A Guide to Query Optimization
PostgreSQL's EXPLAIN
command is a powerful tool that provides insights into how a query will be executed. It helps developers and database administrators identify potential performance issues in their queries. In this blog, we will explore what EXPLAIN
is, when to use it, how to interpret its output, and techniques for optimizing queries based on the insights it provides.
In this blog post:
- What is PostgreSQL EXPLAIN command?
- When to use EXPLAIN command in PostgreSQL
- How to use EXPLAIN command in PostgreSQL
- Understanding EXPLAIN Output
- Common Optimization Techniques
- Database Monitoring with Atatus
What is PostgreSQL EXPLAIN command?
EXPLAIN
is a PostgreSQL command that displays the execution plan for a given SQL query. The execution plan shows how the query planner intends to retrieve the data, including the order of operations, use of indexes, and estimated costs.
Variants of PostgreSQL EXPLAIN Command
The PostgreSQL EXPLAIN command has two key variants, each serving a distinct purpose in analysing query performance:
- EXPLAIN
- EXPLAIN ANALYZE
PostgreSQL EXPLAIN vs EXPLAIN ANALYZE
Feature | EXPLAIN | EXPLAIN ANALYZE |
---|---|---|
Query Execution | Does not execute the query. | Executes the query. |
Purpose | Displays the planned execution strategy, including estimated costs and operations. | Provides the execution plan along with actual runtime statistics, such as time taken and rows processed. |
Use Case | Ideal for understanding query structure and planning optimization. | Best for identifying discrepancies between the query plan and actual performance. |
Performance Impact | No impact on performance as the query isn’t executed. | Can impact performance as the query is executed. |
When to use EXPLAIN command in PostgreSQL
Use EXPLAIN
command in PostgreSQL to:
(i). Understand Query Performance:
- Diagnose slow queries.
- Identify inefficient operations like sequential scans on large tables.
(ii). Plan Index Usage:
- Check if queries are utilizing indexes effectively.
(iii). Optimize Complex Queries:
- Analyse nested queries, joins, and subqueries.
(iv). Benchmark Query Changes:
- Compare execution plans before and after optimization.
How to use EXPLAIN command in PostgreSQL
To use the EXPLAIN command, simply run EXPLAIN
followed by the SQL query to get the query execution plan:
EXPLAIN SELECT * FROM orders WHERE order_date > '2023-01-01';
This won't run the query, but it shows the steps PostgreSQL plans to take to get the data, like whether it will use an index or scan the entire table. It also provides an estimate of how much time it will take and how many rows it expects to process.
Now that we have seen how to use the EXPLAIN
command, it's important to understand what the output means about query execution. In the following section, let us discuss how to interpret this output for better query optimization.
Understanding EXPLAIN Output
The EXPLAIN
output is hierarchical and includes:
- Node Types: Describes the operation being performed (e.g.,
Seq Scan
,Index Scan
,Nested Loop
). - Cost: PostgreSQL
EXPLAIN
cost metric provide an estimate of the resources required for the query. There are two main cost values:
(i). Startup Cost represents the estimated cost to begin processing the first row.
(ii). Total Cost reflects the overall estimated cost to process all rows in the query.
- Rows: The Rows value shows the estimated number of rows PostgreSQL expects to process for each operation. This estimate is based on table statistics and the query structure.
- Width: Width refers to the average size (in bytes) of each row in the operation. Larger rows (e.g., those with text fields) will have a higher width, affecting memory usage.
With a clear understanding of the EXPLAIN
output, let us examine a simple query example to see how PostgreSQL applies these concepts in practice.
PostgreSQL EXPLAIN example: Simple Query
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
Output:
Seq Scan on orders (cost=0.00..35.50 rows=10 width=100)
Filter: (customer_id = 42)
Without an index on the customer_id
column, PostgreSQL needs to perform a sequential scan. This means it scans every single row in the orders
table to find rows where customer_id = 42
. Even though you have the customer_id = 42
condition in the query, without an index, PostgreSQL doesn’t have a quick way to jump to the relevant rows.
Optimization Opportunity: Add an index on customer_id
to avoid a sequential scan:
CREATE INDEX idx_customer_id ON orders(customer_id);
After optimization:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
Output:
Index Scan using idx_customer_id on orders (cost=0.14..8.15 rows=10 width=100)
Index Cond: (customer_id = 42)
Index Scan using idx_customer_id
shows that PostgreSQL is now using the index on the customer_id
column. By adding the index on customer_id
improves performance by avoiding the need for a sequential scan.
PostgreSQL EXPLAIN ANALYZE example: Joins and Nested Loops
EXPLAIN ANALYZE
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
Output:
Nested Loop (cost=0.43..12.34 rows=100 width=72) (actual time=0.054..0.115 rows=5 loops=1)
-> Seq Scan on customers c (cost=0.00..1.02 rows=2 width=36) (actual time=0.012..0.016 rows=2 loops=1)
-> Index Scan using orders_customer_id_idx on orders o (cost=0.43..5.15 rows=5 width=36) (actual time=0.030..0.045 rows=5 loops=2)
- A Nested Loop is used to join
customers
andorders
. Seq Scan
oncustomers
followed by anIndex Scan
onorders
.- Actual time shows it completed quickly, but on larger datasets, the nested loop could become a bottleneck.
Optimization Opportunity: Check if a hash join or a merge join would be faster:
SET enable_nestloop = off; -- Force the planner to avoid nested loops.
EXPLAIN ANALYZE
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
The planner may choose a Hash Join or Merge Join, which are typically faster for large datasets.
Common Optimization Techniques
(i). Add Indexes: Identify columns used in WHERE
, JOIN
, and ORDER BY
clauses.
Example:
CREATE INDEX idx_order_date ON orders(order_date);
(ii). Use Covering Indexes: Include frequently queried columns in the index to avoid accessing the table.
Example:
CREATE INDEX idx_customer_id_order_date ON orders(customer_id, order_date);
(iii). Analyse and Vacuum Tables: Keep statistics updated for accurate query planning.
Example:
ANALYZE orders;
VACUUM orders;
(iv). Partitioning: Split large tables into smaller, manageable chunks.
Example:
CREATE TABLE orders_2023 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
(v). Rewrite Queries: Simplify subqueries, remove unnecessary joins, and use aggregate functions efficiently.
Replace:
SELECT * FROM orders WHERE order_date > '2023-01-01' AND order_date < '2023-12-31';
With:
SELECT * FROM orders WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31';
(vi). Use EXPLAIN ANALYZE
for Real Data: Test on production-like data to get accurate insights into execution time and row counts.
Database Monitoring with Atatus
Atatus Database Monitoring simplifies real-time tracking of your database performance. It provides valuable insights into query performance, resource usage, and transaction throughput across both SQL and NoSQL databases.
Atatus supports a wide range of databases, including SQL databases such as MySQL, PostgreSQL, MariaDB, and SQLite. It also supports NoSQL databases like MongoDB, Redis, Memcached, and Aerospike.
Here are the key features of Atatus Database Monitoring:
- Query Performance Insights: Gain detailed insights into SQL and NoSQL query performance, including execution times and latency.
- Slow Query Identification: Detect slow or resource-intensive queries and optimize them to improve performance.
- Bottleneck Detection: Identify and resolve database bottlenecks, including table locks and blocking queries.
- Comprehensive Dashboards: Combine metrics, logs, and traces into a single dashboard for easy monitoring and troubleshooting.
- Customizable Alerts: Set up alerts for performance issues and critical events to address problems proactively.
Start your 14-day free trial with Atatus today!
#1 Solution for Logs, Traces & Metrics
APM
Kubernetes
Logs
Synthetics
RUM
Serverless
Security
More