Subqueries and Common Table Expressions (CTEs) enable you to break complex problems into manageable pieces, creating readable queries that calculate intermediate results for final analysis. From simple filtering to hierarchical traversals, these techniques transform unmaintainable nested monsters into elegant, step-by-step solutions perfect for reporting, analytics, and data engineering.
What Is a Subquery and When to Use It
A subquery is a SELECT statement nested inside another query, executing first to provide results for the outer query. Use subqueries when:
- Filtering based on dynamic thresholds (employees above average salary)
- Creating derived values for comparisons
- Replacing hard-coded values with calculated ones
-- Classic example: Employees earning above average salary
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
| employee_id | first_name | salary |
|---|---|---|
| 101 | Rahul | ₹85,000 |
| 103 | Priya | ₹92,000 |
Types of Subqueries: Scalar, Row, Table
Scalar Subquery: Single Value
SELECT name, price
FROM products
WHERE category_id = (SELECT category_id FROM categories WHERE name = 'Electronics');
Row Subquery: Multiple Columns
-- Find product matching specific vendor and model
SELECT * FROM products p
WHERE (p.vendor_id, p.model) = (
SELECT vendor_id, model FROM inventory WHERE stock_low = TRUE
LIMIT 1
);
Table Subquery: Multiple Rows
-- Products more expensive than any laptop
SELECT name, price FROM products p1
WHERE price > ALL (
SELECT price FROM products p2 WHERE p2.category = 'Laptop'
);
Correlated vs Non-correlated Subqueries
Non-correlated: Independent Execution
Inner query runs once, outer uses single result:
✅ Fast & Efficient
SELECT name FROM products
WHERE category_id IN (
SELECT category_id FROM categories
WHERE is_active = TRUE
);
⚠️ Executes Per Row
SELECT e.name, e.salary
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e.department
);
Warning: Correlated subqueries execute once per outer row, making them slow on large tables. Use JOINs or CTEs for better performance.
Introduction to CTEs with WITH Clause
Common Table Expressions create named temporary result sets, dramatically improving readability for complex logic.
WITH high_earners AS (
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > 70000
),
dept_averages AS (
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
)
SELECT he.first_name, he.salary, da.avg_salary,
he.salary - da.avg_salary as above_avg
FROM high_earners he
JOIN dept_averages da ON he.department = da.department
ORDER BY above_avg DESC;
Multi-CTE queries enable step-by-step calculations like daily totals → weekly totals → monthly trends.
Daily to Monthly Sales Rollup
WITH daily_sales AS (
SELECT DATE(order_date) as sale_date, SUM(total) as daily_total
FROM orders
GROUP BY DATE(order_date)
),
weekly_sales AS (
SELECT DATE(sale_date - INTERVAL WEEKDAY(sale_date) DAY) as week_start,
SUM(daily_total) as weekly_total
FROM daily_sales
GROUP BY week_start
)
SELECT
DATE_FORMAT(week_start, '%Y-%m') as month,
SUM(weekly_total) as monthly_revenue,
COUNT(*) as weeks_in_month
FROM weekly_sales
GROUP BY DATE_FORMAT(week_start, '%Y-%m')
ORDER BY month;
Recursive CTEs for Hierarchical Data
Recursive CTEs solve tree-like problems: organizational charts, category hierarchies, bill-of-materials.
-- Employee hierarchy (self-referencing manager_id)
WITH RECURSIVE employee_tree AS (
-- Anchor: Top-level executives
SELECT employee_id, first_name, manager_id, 0 as level
FROM employees WHERE manager_id IS NULL
UNION ALL
-- Recursive: Direct reports
SELECT e.employee_id, e.first_name, e.manager_id, et.level + 1
FROM employees e
INNER JOIN employee_tree et ON e.manager_id = et.employee_id
)
SELECT * FROM employee_tree ORDER BY level, first_name;
| employee_id | first_name | manager_id | level |
|---|---|---|---|
| 1 | CEO | NULL | 0 |
| 2 | CTO | 1 | 1 |
| 3 | Dev Lead | 2 | 2 |
Choosing Between Subqueries, CTEs, and JOINs
| Technique | Use When | Pros | Cons |
|---|---|---|---|
| Simple Subquery | Single dynamic value | Concise | Nested readability issues |
| Correlated Subquery | Row-by-row comparison | Flexible | Poor performance |
| CTE | Multi-step logic, readability | Modular, reusable | Slight overhead |
| JOIN | Combining tables | Fastest, optimized | Complex for derived logic |
Refactoring Decision Tree
- Simple table combination? → JOIN
- Step-by-step calculations? → CTE
- Single dynamic filter value? → Scalar subquery
- Hierarchical/tree data? → Recursive CTE
- Deep nesting → unreadable? → Refactor to CTEs
Production Tips
- CTEs materialize only when referenced (no storage overhead)
- Index columns used in correlated subquery WHERE clauses
- Use CTEs for readability even when JOINs perform similarly
- Recursive CTEs need termination condition (prevents infinite loops)
- Modern ORMs generate subqueries/CTEs automatically
Subqueries and CTEs elevate SQL from simple lookups to sophisticated analytical engines. Start with subqueries for simple cases, graduate to CTEs for maintainability, and reserve recursion for true hierarchical challenges. Your queries will become more readable, performant, and professional-grade.
Comments
Post a Comment