Real-World Data Relationships

In real applications, data doesn’t exist in isolation. Consider these common examples:

  • Blog posts are written by users.
  • Orders are placed by customers.
  • Comments belong to blog posts.
  • Products are organised into categories.

Without relationships, you’d have to store redundant information in every record. For example, you’d store the author’s name and email in every single blog post they write. With relationships, you store just a reference number that points to the user who wrote it, eliminating data duplication and ensuring consistency when user information changes.

The Problem with Disconnected Data

Imagine if our blog stored posts like this:

CREATE TABLE posts (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(200),
  content TEXT,
  author_name VARCHAR(100),
  author_email VARCHAR(255)
);

In this approach, we store the author’s name and email directly in each post. This means every time a user writes a post, we store their name and email again and again. If a user writes 10 posts, we’d store their name and email 10 times.

Problems with this approach

  • Data duplication: User information is copied in every post.
  • Update complexity: If a user changes their email, you’d need to update every post.
  • Storage waste: Repeated information takes up unnecessary space.
  • Data inconsistency risk: Easy to forget updating all copies, leading to mismatched information.

The Solution: Data Relationships

Instead, we store a reference to the user:

CREATE TABLE posts (
  id INT PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(200),
  content TEXT,
  user_id INT
);

In this improved approach, the user_id field is a reference to the user who wrote this post. Instead of storing duplicate user information, we store just the user’s ID number.

This creates a one-to-many relationship: one user can write many posts, but each post belongs to exactly one user.


Tags: