Choosing correct SQL data types isn't just about "fitting the data"—it's about storage efficiency, query performance, data integrity, and future-proofing your schema. Wrong types cause bloat (VARCHAR(255) everywhere), slow queries (storing dates as strings), and validation bugs (INT for prices). This guide covers data type selection, normalization fundamentals, and production schema patterns.
Why Choosing the Right Data Type Matters
Data types impact:
- Storage: TINYINT (1 byte) vs VARCHAR(50) (50+ bytes)
- Performance: Indexable types vs TEXT/BLOB scans
- Correctness: DECIMAL(10,2) vs FLOAT rounding errors
- Queries: DATE functions vs string parsing
Numeric, String, Date/Time, and Boolean Types
Numeric Types: Precision vs Range
| Type | Storage | Range/Precision | Use Case |
|---|---|---|---|
| TINYINT | 1 byte | 0-255 | Status codes, small counters |
| SMALLINT | 2 bytes | -32K to 32K | Product IDs, small quantities |
| INT / INTEGER | 4 bytes | -2B to 2B | User IDs, most counters (default choice) |
| BIGINT | 8 bytes | -9E18 to 9E18 | Timestamps, analytics metrics |
| DECIMAL(10,2) | ~5 bytes | Exact decimals | Money, percentages (no rounding errors!) |
| FLOAT/DOUBLE | 4/8 bytes | Approx decimals | Scientific data (avoid for money!) |
-- E-commerce price storage (CORRECT)
CREATE TABLE products (
product_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
price DECIMAL(10,2) NOT NULL, -- ₹99999.99 max
discount_percent DECIMAL(5,2) DEFAULT 0.00 CHECK (discount_percent >= 0),
stock_quantity SMALLINT UNSIGNED DEFAULT 0
);
-- NEVER store money as FLOAT!
-- 0.1 + 0.2 = 0.3000000004 😱
String Types: Fixed vs Variable Length
| Type | Storage | Use Case | Example |
|---|---|---|---|
| CHAR(10) | 10 bytes fixed | Fixed-length codes | Country codes 'IN', 'US' |
| VARCHAR(100) | len + 1-2 bytes | Names, emails, URLs | 'Rahul Patel', 'user@domain.com' |
| TEXT | Dynamic | Long descriptions | Product descriptions, comments |
-- Smart string sizing
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(30) NOT NULL UNIQUE,
email VARCHAR(254) NOT NULL UNIQUE, -- RFC 5321 max
first_name VARCHAR(50),
bio TEXT -- Blog posts, descriptions
);
Date/Time Types: Never Use Strings!
CREATE TABLE events (
event_id INT PRIMARY KEY,
event_name VARCHAR(100),
event_date DATE, -- 2025-01-15
start_time TIME, -- 14:30:00
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP NULL, -- Supports NULL
timezone_offset INT DEFAULT 330 -- IST +5:30 hours
);
Never: event_date VARCHAR(10) → No date arithmetic, sorting issues, timezone hell.
-- Powerful date queries
SELECT * FROM events
WHERE event_date BETWEEN '2025-01-01' AND '2025-01-31'
AND HOUR(start_time) BETWEEN 9 AND 18;
Boolean: TINYINT(1) vs BOOLEAN
-- All equivalent, use BOOLEAN for clarity
is_active BOOLEAN DEFAULT TRUE,
is_deleted BOOLEAN DEFAULT FALSE,
priority_level TINYINT(1) CHECK (priority_level IN (0,1,2));
Dealing with Large Data: BLOBs and TEXT
| Type | Max Size | Use Case |
|---|---|---|
| TINYTEXT | 255 bytes | Short notes |
| TEXT | 64KB | Articles, descriptions |
| MEDIUMTEXT | 16MB | Long documents |
| LONGTEXT | 4GB | Books, logs |
| BLOB family | Same | Binary: images, files |
Rule: Store file metadata + URLs in DB, files in S3/Cloud Storage. BLOBs slow down backups/exports.
Normalization Basics: 1NF, 2NF, 3NF
1NF: Atomic Values Only
Bad: skills VARCHAR(500) = 'Python,JavaScript,SQL'
Good: Separate user_skills junction table
2NF: Eliminate Partial Dependencies
Non-key attributes depend on entire PRIMARY KEY, not part of it.
3NF: Eliminate Transitive Dependencies
Non-key attributes don't depend on other non-key attributes.
-- 3NF E-commerce Schema
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
city VARCHAR(50)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE order_items (
order_item_id INT PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT,
FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
Denormalization for Analytics Use Cases
Read-heavy workloads (reports, dashboards) benefit from denormalization:
- Summary tables: Daily revenue rollups
- Materialized views: Frequently JOINed data
- JSON columns: Flexible denormalized payloads
-- Analytics denormalized table
CREATE TABLE daily_sales_summary (
report_date DATE PRIMARY KEY,
total_orders INT,
total_revenue DECIMAL(12,2),
new_customers INT,
top_product JSON -- {"name": "Laptop", "revenue": 75000}
);
Practical Tips for Designing Tables
✅ Schema Design Checklist
- Naming: snake_case, plural tables (
users), singular foreign keys (user_id) - Primary Key: Always INT/BIGINT AUTO_INCREMENT (avoid VARCHAR)
- Foreign Keys: Explicitly declare + index
- NOT NULL: Default for non-optional columns
- Data Types: Smallest that works (TINYINT before INT)
- Defaults: TIMESTAMP, 0, '', sensible fallbacks
- Indexes: Keys + frequent WHERE/JOIN columns
- Audit Columns:
created_at,updated_at,created_by
-- Production-ready user table
CREATE TABLE users (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(254) NOT NULL UNIQUE,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
phone VARCHAR(15),
is_active BOOLEAN DEFAULT TRUE,
role ENUM('user', 'admin', 'moderator') DEFAULT 'user',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX idx_email_active (email, is_active),
INDEX idx_created (created_at)
) ENGINE=InnoDB;
Master data types and normalization to build scalable, performant schemas that grow gracefully with your application. Right types = right performance. Normalization = maintainable design. Checklist = production readiness.
Comments
Post a Comment