Subqueries and CTEs in SQL: Master Complex Queries - Complete Guide

Subqueries and Common Table Expressions (CTEs) enable you to break complex problems into manageable pieces, creating readable queries that calculate intermediate results for final analysis. From simple filtering to hierarchical traversals, these techniques transform unmaintainable nested monsters into elegant, step-by-step solutions perfect for reporting, analytics, and data engineering. 

What Is a Subquery and When to Use It

A subquery is a SELECT statement nested inside another query, executing first to provide results for the outer query. Use subqueries when:

  • Filtering based on dynamic thresholds (employees above average salary)
  • Creating derived values for comparisons
  • Replacing hard-coded values with calculated ones
-- Classic example: Employees earning above average salary
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
employee_id first_name salary
101Rahul₹85,000
103Priya₹92,000

Types of Subqueries: Scalar, Row, Table

Scalar Subquery: Single Value

SELECT name, price
FROM products
WHERE category_id = (SELECT category_id FROM categories WHERE name = 'Electronics');

Row Subquery: Multiple Columns

-- Find product matching specific vendor and model
SELECT * FROM products p
WHERE (p.vendor_id, p.model) = (
  SELECT vendor_id, model FROM inventory WHERE stock_low = TRUE
  LIMIT 1
);

Table Subquery: Multiple Rows

-- Products more expensive than any laptop
SELECT name, price FROM products p1
WHERE price > ALL (
  SELECT price FROM products p2 WHERE p2.category = 'Laptop'
);

Correlated vs Non-correlated Subqueries

Non-correlated: Independent Execution

Inner query runs once, outer uses single result:

✅ Fast & Efficient

SELECT name FROM products
WHERE category_id IN (
  SELECT category_id FROM categories 
  WHERE is_active = TRUE
);

⚠️ Executes Per Row

SELECT e.name, e.salary
FROM employees e
WHERE salary > (
  SELECT AVG(salary) 
  FROM employees e2 
  WHERE e2.department = e.department
);

Warning: Correlated subqueries execute once per outer row, making them slow on large tables. Use JOINs or CTEs for better performance.

Introduction to CTEs with WITH Clause

Common Table Expressions create named temporary result sets, dramatically improving readability for complex logic.

WITH high_earners AS (
  SELECT employee_id, first_name, salary
  FROM employees
  WHERE salary > 70000
),
dept_averages AS (
  SELECT department, AVG(salary) as avg_salary
  FROM employees
  GROUP BY department
)
SELECT he.first_name, he.salary, da.avg_salary, 
       he.salary - da.avg_salary as above_avg
FROM high_earners he
JOIN dept_averages da ON he.department = da.department
ORDER BY above_avg DESC;

Multi-CTE queries enable step-by-step calculations like daily totals → weekly totals → monthly trends.

Daily to Monthly Sales Rollup

WITH daily_sales AS (
  SELECT DATE(order_date) as sale_date, SUM(total) as daily_total
  FROM orders
  GROUP BY DATE(order_date)
),
weekly_sales AS (
  SELECT DATE(sale_date - INTERVAL WEEKDAY(sale_date) DAY) as week_start,
         SUM(daily_total) as weekly_total
  FROM daily_sales
  GROUP BY week_start
)
SELECT 
  DATE_FORMAT(week_start, '%Y-%m') as month,
  SUM(weekly_total) as monthly_revenue,
  COUNT(*) as weeks_in_month
FROM weekly_sales
GROUP BY DATE_FORMAT(week_start, '%Y-%m')
ORDER BY month;

Recursive CTEs for Hierarchical Data

Recursive CTEs solve tree-like problems: organizational charts, category hierarchies, bill-of-materials.

-- Employee hierarchy (self-referencing manager_id)
WITH RECURSIVE employee_tree AS (
  -- Anchor: Top-level executives
  SELECT employee_id, first_name, manager_id, 0 as level
  FROM employees WHERE manager_id IS NULL
  
  UNION ALL
  
  -- Recursive: Direct reports
  SELECT e.employee_id, e.first_name, e.manager_id, et.level + 1
  FROM employees e
  INNER JOIN employee_tree et ON e.manager_id = et.employee_id
)
SELECT * FROM employee_tree ORDER BY level, first_name;
employee_id first_name manager_id level
1CEONULL0
2CTO11
3Dev Lead22

Choosing Between Subqueries, CTEs, and JOINs

Technique Use When Pros Cons
Simple Subquery Single dynamic value Concise Nested readability issues
Correlated Subquery Row-by-row comparison Flexible Poor performance
CTE Multi-step logic, readability Modular, reusable Slight overhead
JOIN Combining tables Fastest, optimized Complex for derived logic

Refactoring Decision Tree

  1. Simple table combination? → JOIN
  2. Step-by-step calculations? → CTE
  3. Single dynamic filter value? → Scalar subquery
  4. Hierarchical/tree data? → Recursive CTE
  5. Deep nesting → unreadable? → Refactor to CTEs

Production Tips

  • CTEs materialize only when referenced (no storage overhead)
  • Index columns used in correlated subquery WHERE clauses
  • Use CTEs for readability even when JOINs perform similarly
  • Recursive CTEs need termination condition (prevents infinite loops)
  • Modern ORMs generate subqueries/CTEs automatically

Subqueries and CTEs elevate SQL from simple lookups to sophisticated analytical engines. Start with subqueries for simple cases, graduate to CTEs for maintainability, and reserve recursion for true hierarchical challenges. Your queries will become more readable, performant, and professional-grade.

Comments