SQL Data Types & Schema Design: Build Bulletproof Databases

Choosing correct SQL data types isn't just about "fitting the data"—it's about storage efficiency, query performance, data integrity, and future-proofing your schema. Wrong types cause bloat (VARCHAR(255) everywhere), slow queries (storing dates as strings), and validation bugs (INT for prices). This guide covers data type selection, normalization fundamentals, and production schema patterns.

Modular 16:9 banner for SQL Data Types blog with precise building block schema visuals and exact "SQL DATA TYPES" title text.

Why Choosing the Right Data Type Matters

Data types impact:

Storage: TINYINT (1 byte) vs VARCHAR(50) (50+ bytes)
Performance: Indexable types vs TEXT/BLOB scans
Correctness: DECIMAL(10,2) vs FLOAT rounding errors
Queries: DATE functions vs string parsing

Numeric, String, Date/Time, and Boolean Types

Numeric Types: Precision vs Range

Type	Storage	Range/Precision	Use Case
TINYINT	1 byte	0-255	Status codes, small counters
SMALLINT	2 bytes	-32K to 32K	Product IDs, small quantities
INT / INTEGER	4 bytes	-2B to 2B	User IDs, most counters (default choice)
BIGINT	8 bytes	-9E18 to 9E18	Timestamps, analytics metrics
DECIMAL(10,2)	~5 bytes	Exact decimals	Money, percentages (no rounding errors!)
FLOAT/DOUBLE	4/8 bytes	Approx decimals	Scientific data (avoid for money!)

-- E-commerce price storage (CORRECT)
CREATE TABLE products (
  product_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  price DECIMAL(10,2) NOT NULL,  -- ₹99999.99 max
  discount_percent DECIMAL(5,2) DEFAULT 0.00 CHECK (discount_percent >= 0),
  stock_quantity SMALLINT UNSIGNED DEFAULT 0
);

-- NEVER store money as FLOAT!
-- 0.1 + 0.2 = 0.3000000004 😱

String Types: Fixed vs Variable Length

Type	Storage	Use Case	Example
CHAR(10)	10 bytes fixed	Fixed-length codes	Country codes 'IN', 'US'
VARCHAR(100)	len + 1-2 bytes	Names, emails, URLs	'Rahul Patel', 'user@domain.com'
TEXT	Dynamic	Long descriptions	Product descriptions, comments

-- Smart string sizing
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(30) NOT NULL UNIQUE,
  email VARCHAR(254) NOT NULL UNIQUE,  -- RFC 5321 max
  first_name VARCHAR(50),
  bio TEXT  -- Blog posts, descriptions
);

Date/Time Types: Never Use Strings!

CREATE TABLE events (
  event_id INT PRIMARY KEY,
  event_name VARCHAR(100),
  event_date DATE,           -- 2025-01-15
  start_time TIME,           -- 14:30:00  
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NULL, -- Supports NULL
  timezone_offset INT DEFAULT 330  -- IST +5:30 hours
);

Never: event_date VARCHAR(10) → No date arithmetic, sorting issues, timezone hell.

-- Powerful date queries
SELECT * FROM events 
WHERE event_date BETWEEN '2025-01-01' AND '2025-01-31'
  AND HOUR(start_time) BETWEEN 9 AND 18;

Boolean: TINYINT(1) vs BOOLEAN

-- All equivalent, use BOOLEAN for clarity
is_active BOOLEAN DEFAULT TRUE,
is_deleted BOOLEAN DEFAULT FALSE,
priority_level TINYINT(1) CHECK (priority_level IN (0,1,2));

Dealing with Large Data: BLOBs and TEXT

Type	Max Size	Use Case
TINYTEXT	255 bytes	Short notes
TEXT	64KB	Articles, descriptions
MEDIUMTEXT	16MB	Long documents
LONGTEXT	4GB	Books, logs
BLOB family	Same	Binary: images, files

Rule: Store file metadata + URLs in DB, files in S3/Cloud Storage. BLOBs slow down backups/exports.

Normalization Basics: 1NF, 2NF, 3NF

1NF: Atomic Values Only

Bad: skills VARCHAR(500) = 'Python,JavaScript,SQL'

Good: Separate user_skills junction table

2NF: Eliminate Partial Dependencies

Non-key attributes depend on entire PRIMARY KEY, not part of it.

3NF: Eliminate Transitive Dependencies

Non-key attributes don't depend on other non-key attributes.

-- 3NF E-commerce Schema
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(100),
  city VARCHAR(50)
);

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  order_date DATE,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE order_items (
  order_item_id INT PRIMARY KEY,
  order_id INT,
  product_id INT,
  quantity INT,
  FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

Denormalization for Analytics Use Cases

Read-heavy workloads (reports, dashboards) benefit from denormalization:

Summary tables: Daily revenue rollups
Materialized views: Frequently JOINed data
JSON columns: Flexible denormalized payloads

-- Analytics denormalized table
CREATE TABLE daily_sales_summary (
  report_date DATE PRIMARY KEY,
  total_orders INT,
  total_revenue DECIMAL(12,2),
  new_customers INT,
  top_product JSON  -- {"name": "Laptop", "revenue": 75000}
);

Practical Tips for Designing Tables

✅ Schema Design Checklist

Naming: snake_case, plural tables (users), singular foreign keys (user_id)
Primary Key: Always INT/BIGINT AUTO_INCREMENT (avoid VARCHAR)
Foreign Keys: Explicitly declare + index
NOT NULL: Default for non-optional columns
Data Types: Smallest that works (TINYINT before INT)
Defaults: TIMESTAMP, 0, '', sensible fallbacks
Indexes: Keys + frequent WHERE/JOIN columns
Audit Columns: created_at, updated_at, created_by

-- Production-ready user table
CREATE TABLE users (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(254) NOT NULL UNIQUE,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  phone VARCHAR(15),
  is_active BOOLEAN DEFAULT TRUE,
  role ENUM('user', 'admin', 'moderator') DEFAULT 'user',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  INDEX idx_email_active (email, is_active),
  INDEX idx_created (created_at)
) ENGINE=InnoDB;

Master data types and normalization to build scalable, performant schemas that grow gracefully with your application. Right types = right performance. Normalization = maintainable design. Checklist = production readiness.

SHEET INSIGHTS