SQL Data Types & Schema Design: Build Bulletproof Databases

Choosing correct SQL data types isn't just about "fitting the data"—it's about storage efficiency, query performance, data integrity, and future-proofing your schema. Wrong types cause bloat (VARCHAR(255) everywhere), slow queries (storing dates as strings), and validation bugs (INT for prices). This guide covers data type selection, normalization fundamentals, and production schema patterns. 

Modular 16:9 banner for SQL Data Types blog with precise building block schema visuals and exact "SQL DATA TYPES" title text. 

Why Choosing the Right Data Type Matters

Data types impact:

  • Storage: TINYINT (1 byte) vs VARCHAR(50) (50+ bytes)
  • Performance: Indexable types vs TEXT/BLOB scans
  • Correctness: DECIMAL(10,2) vs FLOAT rounding errors
  • Queries: DATE functions vs string parsing

Numeric, String, Date/Time, and Boolean Types

Numeric Types: Precision vs Range

Type Storage Range/Precision Use Case
TINYINT 1 byte 0-255 Status codes, small counters
SMALLINT 2 bytes -32K to 32K Product IDs, small quantities
INT / INTEGER 4 bytes -2B to 2B User IDs, most counters (default choice)
BIGINT 8 bytes -9E18 to 9E18 Timestamps, analytics metrics
DECIMAL(10,2) ~5 bytes Exact decimals Money, percentages (no rounding errors!)
FLOAT/DOUBLE 4/8 bytes Approx decimals Scientific data (avoid for money!)
-- E-commerce price storage (CORRECT)
CREATE TABLE products (
  product_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  price DECIMAL(10,2) NOT NULL,  -- ₹99999.99 max
  discount_percent DECIMAL(5,2) DEFAULT 0.00 CHECK (discount_percent >= 0),
  stock_quantity SMALLINT UNSIGNED DEFAULT 0
);

-- NEVER store money as FLOAT!
-- 0.1 + 0.2 = 0.3000000004 😱

String Types: Fixed vs Variable Length

Type Storage Use Case Example
CHAR(10) 10 bytes fixed Fixed-length codes Country codes 'IN', 'US'
VARCHAR(100) len + 1-2 bytes Names, emails, URLs 'Rahul Patel', 'user@domain.com'
TEXT Dynamic Long descriptions Product descriptions, comments
-- Smart string sizing
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(30) NOT NULL UNIQUE,
  email VARCHAR(254) NOT NULL UNIQUE,  -- RFC 5321 max
  first_name VARCHAR(50),
  bio TEXT  -- Blog posts, descriptions
);

Date/Time Types: Never Use Strings!

CREATE TABLE events (
  event_id INT PRIMARY KEY,
  event_name VARCHAR(100),
  event_date DATE,           -- 2025-01-15
  start_time TIME,           -- 14:30:00  
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  expires_at TIMESTAMP NULL, -- Supports NULL
  timezone_offset INT DEFAULT 330  -- IST +5:30 hours
);

Never: event_date VARCHAR(10) → No date arithmetic, sorting issues, timezone hell.

-- Powerful date queries
SELECT * FROM events 
WHERE event_date BETWEEN '2025-01-01' AND '2025-01-31'
  AND HOUR(start_time) BETWEEN 9 AND 18;

Boolean: TINYINT(1) vs BOOLEAN

-- All equivalent, use BOOLEAN for clarity
is_active BOOLEAN DEFAULT TRUE,
is_deleted BOOLEAN DEFAULT FALSE,
priority_level TINYINT(1) CHECK (priority_level IN (0,1,2));

Dealing with Large Data: BLOBs and TEXT

Type Max Size Use Case
TINYTEXT 255 bytes Short notes
TEXT 64KB Articles, descriptions
MEDIUMTEXT 16MB Long documents
LONGTEXT 4GB Books, logs
BLOB family Same Binary: images, files

Rule: Store file metadata + URLs in DB, files in S3/Cloud Storage. BLOBs slow down backups/exports.

Normalization Basics: 1NF, 2NF, 3NF

1NF: Atomic Values Only

Bad: skills VARCHAR(500) = 'Python,JavaScript,SQL'

Good: Separate user_skills junction table

2NF: Eliminate Partial Dependencies

Non-key attributes depend on entire PRIMARY KEY, not part of it.

3NF: Eliminate Transitive Dependencies

Non-key attributes don't depend on other non-key attributes.

-- 3NF E-commerce Schema
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(100),
  city VARCHAR(50)
);

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  order_date DATE,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE order_items (
  order_item_id INT PRIMARY KEY,
  order_id INT,
  product_id INT,
  quantity INT,
  FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

Denormalization for Analytics Use Cases

Read-heavy workloads (reports, dashboards) benefit from denormalization:

  • Summary tables: Daily revenue rollups
  • Materialized views: Frequently JOINed data
  • JSON columns: Flexible denormalized payloads
-- Analytics denormalized table
CREATE TABLE daily_sales_summary (
  report_date DATE PRIMARY KEY,
  total_orders INT,
  total_revenue DECIMAL(12,2),
  new_customers INT,
  top_product JSON  -- {"name": "Laptop", "revenue": 75000}
);

Practical Tips for Designing Tables

✅ Schema Design Checklist

  1. Naming: snake_case, plural tables (users), singular foreign keys (user_id)
  2. Primary Key: Always INT/BIGINT AUTO_INCREMENT (avoid VARCHAR)
  3. Foreign Keys: Explicitly declare + index
  4. NOT NULL: Default for non-optional columns
  5. Data Types: Smallest that works (TINYINT before INT)
  6. Defaults: TIMESTAMP, 0, '', sensible fallbacks
  7. Indexes: Keys + frequent WHERE/JOIN columns
  8. Audit Columns: created_at, updated_at, created_by
-- Production-ready user table
CREATE TABLE users (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(254) NOT NULL UNIQUE,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL,
  phone VARCHAR(15),
  is_active BOOLEAN DEFAULT TRUE,
  role ENUM('user', 'admin', 'moderator') DEFAULT 'user',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  INDEX idx_email_active (email, is_active),
  INDEX idx_created (created_at)
) ENGINE=InnoDB;

Master data types and normalization to build scalable, performant schemas that grow gracefully with your application. Right types = right performance. Normalization = maintainable design. Checklist = production readiness.

Comments