Chapter 9: “Semi-Structured Data and NoSQL”

Relational databases are powerful, but not always flexible. In real-world scenarios, especially those involving: Varying schemas, Unpredictable fields, Nested or hierarchical data. A rigid table model can become limiting. That’s where semi-structured data and NoSQL systems excel.

What Is Semi-Structured Data?

Semi-structured data has some organization but doesn’t fit neatly into tables. Common formats include: JSON and XML. A reason that it doesn’t neatly fit into tables is that there is nested data which would not neatly fit into one table. We can see from the example below that “preferences” has 2 fields of nested data within it.

Example (JSON):

{
  "user_id": 102,
  "name": "Sasha",
  "preferences": {
    "notifications": true,
    "themes": ["dark", "contrast"]
  }
}
    

This structure is great for user settings, metadata, IoT readings, and logs.

SQL Meets Semi-Structured: JSON in PostgreSQL

PostgreSQL offers native support for JSON and JSONB (binary JSON) fields:

CREATE TABLE users (
  user_id SERIAL PRIMARY KEY,
  profile JSONB
);
    

You can query deeply nested fields:

SELECT profile->>'name', profile->'preferences'->'notifications'
FROM users;
    

You can also index JSON fields:

CREATE INDEX idx_profile_theme ON users USING GIN (profile jsonb_path_ops);
    

Introducing NoSQL

NoSQL = "Not Only SQL" — a set of database technologies designed for flexibility, scalability, and speed. Types include: MongoDB, Hbase, and Cassandra. We will link to the documentation, but we will not be going in depth with most of these. We will briefly touch up on MongoDB as it is one of the most popular NoSQL system.

MongoDB is the most popular NoSQL system. It stores data as BSON documents such as:

{
  "_id": ObjectId("..."),
  "name": "Lia",
  "orders": [
    {"item": "tent", "qty": 1},
    {"item": "water filter", "qty": 2}
  ]
}
    

Key Concepts of MongoDB

  • Collections = Tables
  • Documents = Rows
  • Fields = Columns
  • Schema is flexible and dynamic

Querying in MongoDB

MongoDB uses a JavaScript-like query language:

Find all users with 'dark' in preferences

db.users.find({
  "preferences.themes": "dark"
});
    

Aggregate functions resemble SQL pipelines:

db.orders.aggregate([
  { $unwind: "$orders" },
  { $group: {
      _id: "$name",
      total: { $sum: "$orders.qty" }
  }}
]);
    

Pros and Cons of NoSQL

Advantages:

  • Schema flexibility
  • Fast writes and reads at scale
  • Suited for real-time apps, mobile, IoT

Disadvantages:

  • Weaker consistency guarantees (eventual consistency)
  • Less mature tooling for analytics
  • Joins and constraints are limited or absent

When to Use SQL vs NoSQL

Requirement Best Fit
Rigid schema, normalized data SQL
Flexible schema, nested data NoSQL
Complex joins and constraints SQL
High write throughput NoSQL
Strong transactional integrity SQL
Rapid prototyping NoSQL

Hybrid Approaches

Many teams now use both: SQL for operational records and structured reporting and NoSQL for logging, caching, personalization, or event data. PostgreSQL and MySQL now support JSON; MongoDB supports SQL like interfaces and connectors.