Indexes are specialized data structures that speed up data retrieval by turning full table scans into index seeks; on large tables that can be the difference between minutes and milliseconds. Like a book's index pointing to page numbers, a SQL index points to matching rows. This guide covers index fundamentals, types, design patterns, and optimization techniques used by production database engineers.
What Is an Index in SQL?
An index is a sorted copy of selected columns plus pointers to the actual data rows, which lets the database locate matching records without scanning every row.
- Benefit: WHERE, JOIN, ORDER BY, and GROUP BY can use the sorted structure instead of reading and sorting every row
- Cost: Extra storage + slower INSERT/UPDATE/DELETE (each write must update the index too)
- Analogy: Library catalog vs reading every book
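To make the "sorted copy plus pointers" idea concrete, here is a minimal Python sketch. It is not how a real engine stores an index (those use B-trees on disk); the `rows` list, `email_index`, and both function names are made up for illustration.

```python
import bisect

# Hypothetical "table": rows stored in arbitrary (insertion) order.
rows = [
    (1, "mia@example.com"),
    (2, "bob@example.com"),
    (3, "zoe@example.com"),
    (4, "ann@example.com"),
]

# The "index": sorted (key, row_position) pairs, i.e. key plus pointer.
email_index = sorted((email, pos) for pos, (_id, email) in enumerate(rows))


def full_scan(email):
    """No index: check every row until one matches."""
    for row in rows:
        if row[1] == email:
            return row
    return None


def index_seek(email):
    """With index: binary-search the sorted keys, then follow the pointer."""
    i = bisect.bisect_left(email_index, (email,))
    if i < len(email_index) and email_index[i][0] == email:
        return rows[email_index[i][1]]
    return None


print(full_scan("ann@example.com"))
print(index_seek("ann@example.com"))
```

Both calls return the same row; the difference is that the scan touches rows one by one while the seek does a binary search over the sorted keys.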
Clustered vs Non-clustered Indexes
| Aspect | Clustered Index | Non-clustered Index |
|---|---|---|
| Structure | Physical row order = index order | Separate structure pointing to data |
| Per Table | At most 1 (usually the PRIMARY KEY) | Multiple allowed |
| Storage | No extra copy (data = index) | Extra storage for index + pointers |
| Example | PRIMARY KEY (id) | INDEX idx_email (email) |
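The two access paths can be observed with a small experiment. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in (an assumption; the article's examples target MySQL, whose EXPLAIN output looks different). In SQLite, an INTEGER PRIMARY KEY is the key the table itself is stored by, which plays the role of the clustered index here; the `plan` helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
# Secondary (non-clustered) index: a separate structure pointing back to rows.
conn.execute("CREATE INDEX idx_email ON users (email)")


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Lookup by primary key goes straight to the table's own ordering.
by_id = plan("SELECT * FROM users WHERE id = 42")
# Lookup by email goes through the separate index first.
by_email = plan("SELECT * FROM users WHERE email = 'a@example.com'")

print(by_id)
print(by_email)
```

The first plan should mention the primary key and the second should mention idx_email; exact wording varies between SQLite versions.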
When and Where to Create Indexes
Index columns frequently used in:
- WHERE clauses: WHERE status = 'active'
- JOIN conditions: ON c.customer_id = o.customer_id
- ORDER BY / GROUP BY: ORDER BY created_at DESC
-- High-impact indexes for e-commerce
CREATE INDEX idx_orders_customer_status
ON orders (customer_id, status, order_date);
CREATE INDEX idx_products_category_price
ON products (category_id, price);
CREATE INDEX idx_users_email_active
ON users (email, is_active);
No index: table scan. With index: index seek.
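The GROUP BY case is easy to overlook, so here is a hedged sketch of it using the idx_products_category_price index from above. It runs on SQLite via Python's sqlite3 module (an assumption made so the example is self-contained; MySQL reports plans differently). The table columns and the `plan` helper are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER, "
    "price REAL, name TEXT)"
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT category_id, MIN(price) FROM products GROUP BY category_id"

# Without an index the engine has to build a temporary structure to group rows.
before = plan(query)

conn.execute(
    "CREATE INDEX idx_products_category_price ON products (category_id, price)"
)

# With the index, rows already arrive grouped by category_id.
after = plan(query)

print("before:", before)
print("after: ", after)
```

Before the index, the plan includes a temporary B-tree step for the grouping; afterwards it walks the index in order and that step disappears.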
Covering Indexes and Composite Indexes
Composite Indexes: Multi-column Power
An index on multiple columns, in the right order, can serve any query that filters or sorts on a leading (leftmost) subset of those columns.
-- Perfect for: customer_id + status + order_date
CREATE INDEX idx_orders_csod
ON orders (customer_id, status, order_date DESC);
-- Can serve any leftmost prefix of the columns:
-- customer_id=123
-- customer_id=123 AND status='shipped'
-- customer_id=123 AND status='shipped' ORDER BY order_date
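Column Order Rule: Equality → Sort → Range. A range condition on a column stops the index from supplying sorted order on the columns after it, so the sort column goes before any range column. Among equality (=) columns, put the most selective first.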
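The leftmost-prefix behavior can be checked directly. This is a sketch on SQLite through Python's sqlite3 module, chosen as an assumption to keep it runnable without a server; the `total` column and the `plan` helper are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "status TEXT, order_date TEXT, total REAL)"
)
conn.execute(
    "CREATE INDEX idx_orders_csod ON orders (customer_id, status, order_date DESC)"
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Leading column only: the index applies.
prefix_one = plan("SELECT * FROM orders WHERE customer_id = 123")

# Both equality columns plus the sort column: index applies, no separate sort.
prefix_all = plan(
    "SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped' "
    "ORDER BY order_date DESC"
)

# Skips the leading column: not a prefix, so the index is not used here.
no_prefix = plan("SELECT * FROM orders WHERE status = 'shipped'")

print(prefix_one)
print(prefix_all)
print(no_prefix)
```

The last query filters on status alone, which is not a leftmost prefix of (customer_id, status, order_date), so in this setup it falls back to scanning the table.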
Covering Indexes: Index-Only Reads
When every column a query touches (SELECT, WHERE, ORDER BY) is in the index, the database can answer from the index alone, with no table access.
-- Covering index covers entire query
CREATE INDEX idx_users_covering
ON users (status, created_at, email);
SELECT status, created_at, email
FROM users
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 10;
-- Reads ONLY index! Lightning fast 🚀
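Whether a query is actually covered can be confirmed from the plan. The sketch below does that on SQLite via Python's sqlite3 module (an assumption; in MySQL the equivalent signal is "Using index" in the Extra column). The `name` column and the `plan` helper are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT, "
    "created_at TEXT, email TEXT, name TEXT)"
)
conn.execute(
    "CREATE INDEX idx_users_covering ON users (status, created_at, email)"
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Every referenced column is in the index: index-only read.
covered = plan(
    "SELECT status, created_at, email FROM users "
    "WHERE status = 'active' ORDER BY created_at DESC LIMIT 10"
)

# Adding one column outside the index forces a lookup into the table.
not_covered = plan(
    "SELECT status, created_at, email, name FROM users "
    "WHERE status = 'active' ORDER BY created_at DESC LIMIT 10"
)

print(covered)
print(not_covered)
```

The first plan reports a covering index; the second still uses the index to find rows but has to visit the table for `name`.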
Trade-offs: Write Overhead and Storage
| Operation | No Index | With Index |
|---|---|---|
| SELECT (1% rows) | Table scan: Slow | Index seek: Fast |
| INSERT | Fast | Slower (update indexes) |
| UPDATE (indexed column) | Fast | Slower (rebuild index entries) |
| Storage | Table only | Table + index copies (~30-50% extra) |
Over-indexing kills writes: every INSERT, UPDATE, and DELETE has to maintain every affected index, so a table with 20+ indexes can see writes slow to a crawl.
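The storage side of the trade-off can be measured rather than guessed. Here is a rough sketch that loads the same rows into two in-memory SQLite databases, one with indexes and one without, and compares page counts. SQLite, the row data, the `pages_used` helper, and the idx_orders_date index are all assumptions for illustration, and the ratio you see will not match MySQL or your own schema.

```python
import sqlite3


def pages_used(with_indexes):
    """Load identical rows and return how many database pages are in use."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "status TEXT, order_date TEXT)"
    )
    if with_indexes:
        conn.execute(
            "CREATE INDEX idx_orders_customer_status "
            "ON orders (customer_id, status, order_date)"
        )
        # Hypothetical second index, just to add more index storage.
        conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")
    rows = [
        (i % 500, "shipped" if i % 3 else "pending", f"2024-01-{i % 28 + 1:02d}")
        for i in range(5000)
    ]
    conn.executemany(
        "INSERT INTO orders (customer_id, status, order_date) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    count = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return count


table_only = pages_used(with_indexes=False)
with_indexes = pages_used(with_indexes=True)
print("pages without indexes:", table_only)
print("pages with indexes:   ", with_indexes)
```

The indexed database needs more pages for the same rows, and every insert had to write to those extra pages as well.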
Diagnosing and Tuning Slow Queries
1. EXPLAIN Your Queries
-- MySQL/PostgreSQL
EXPLAIN SELECT * FROM orders
WHERE customer_id = 123
AND status = 'shipped'
ORDER BY order_date DESC;
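-- Look for:
-- ✅ "Using index" (MySQL Extra column: covering index)
-- ✅ "Index Scan" / "Index Only Scan" (PostgreSQL plan node)
-- ❌ "Using filesort" (MySQL: extra sort step)
-- ❌ "Using temporary" (MySQL: temporary table)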
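Reading plans by eye gets tedious, so the check can be scripted. The sketch below is an analogous checker for SQLite (an assumption; it does not parse MySQL or PostgreSQL output), where a plain SCAN is the full-scan signal and a temporary B-tree is the counterpart of an extra sort step. The `plan_warnings` function is a hypothetical helper, not part of any library.

```python
import sqlite3


def plan_warnings(conn, sql):
    """Flag full table scans and extra sort steps in SQLite's query plan."""
    warnings = []
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        detail = row[3]
        if detail.startswith("SCAN") and "INDEX" not in detail:
            warnings.append("full table scan: " + detail)
        if "TEMP B-TREE" in detail:
            warnings.append("extra sort step: " + detail)
    return warnings


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "status TEXT, order_date TEXT, total REAL)"
)

query = (
    "SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped' "
    "ORDER BY order_date DESC"
)

before = plan_warnings(conn, query)
conn.execute(
    "CREATE INDEX idx_orders_csod ON orders (customer_id, status, order_date DESC)"
)
after = plan_warnings(conn, query)

print("before:", before)
print("after: ", after)
```

A helper like this could run against a list of known hot queries in a test suite, so a dropped or mis-ordered index shows up before it reaches production.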
2. Before/After Performance Test
❌ No Index
mysql> EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | orders | NULL | ALL | NULL | NULL | NULL | NULL | 1M | 0.01 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
1M rows scanned! 😱
✅ With Index
mysql> CREATE INDEX idx_orders_cust_status ON orders(customer_id, status);
mysql> EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
+----+-------------+-------+------------+-------+---------------+-------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | orders| NULL | ref | idx_orders... | idx_orders_cust.. | 5 | const | 15 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+-------------------+---------+-------+------+----------+-------+
15 rows via index! 🚀
3. Index Maintenance Commands
-- List indexes with no recorded use since the server started
SELECT * FROM sys.schema_unused_indexes; -- sys schema, MySQL 5.7+
-- Analyze table statistics
ANALYZE TABLE orders;
-- Show index definitions
SHOW INDEX FROM orders;
-- Drop unused indexes
DROP INDEX idx_old_unused ON orders;
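For readers without a MySQL server at hand, the same list, analyze, drop cycle can be tried on SQLite from Python. This is a sketch under that assumption: SQLite has no built-in unused-index view, so only the listing, ANALYZE, and DROP steps are shown, and its DROP INDEX takes no ON clause. The `index_names` helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.execute("CREATE INDEX idx_orders_cust_status ON orders (customer_id, status)")
conn.execute("CREATE INDEX idx_old_unused ON orders (status)")


def index_names(table):
    """List index names on a table (rough counterpart of SHOW INDEX)."""
    # Table name is interpolated directly: only pass trusted identifiers.
    rows = conn.execute(f"PRAGMA index_list('{table}')").fetchall()
    return sorted(row[1] for row in rows)


before = index_names("orders")

conn.execute("ANALYZE")                    # refresh planner statistics
conn.execute("DROP INDEX idx_old_unused")  # SQLite syntax: no ON <table>

after = index_names("orders")

print(before)
print(after)
```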
Production Index Strategy
Index Candidates Checklist
- Primary/foreign keys → Always index
- WHERE clause columns (especially =, IN)
- JOIN columns (especially foreign keys)
- ORDER BY / GROUP BY columns
- High-cardinality columns (many distinct values relative to the row count)
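Cardinality is easy to measure before committing to an index. A minimal sketch, again on SQLite through Python's sqlite3 module with made-up data: the `selectivity` helper (a hypothetical name) divides distinct values by total rows, so values near 1.0 suggest a strong index candidate and values near 0 suggest a weak one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (email, is_active) VALUES (?, ?)",
    [(f"user{i}@example.com", i % 2) for i in range(1000)],
)


def selectivity(table, column):
    """Distinct values divided by row count for one column."""
    # Identifiers are interpolated directly: only pass trusted names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


email_sel = selectivity("users", "email")       # every value is unique
active_sel = selectivity("users", "is_active")  # only two distinct values

print(email_sel)   # 1.0
print(active_sel)  # 0.002
```

A low-selectivity column such as is_active can still earn a place as a trailing column in a composite index, but it rarely justifies an index of its own.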
The 80/20 Rule
- 3-5 well-chosen indexes per table typically deliver about 80% of the performance gains
- Monitor slow query logs → index slow WHERE/JOIN columns
- Quarterly index audit: drop unused indexes
- Test write performance after indexing
Indexes are your database's turbocharger: strategic placement yields massive SELECT gains with manageable write overhead. EXPLAIN religiously, index smartly, monitor usage, and watch query times plummet while user satisfaction soars.