Chapter 9: “Semi-Structured Data and NoSQL”
Relational databases are powerful, but not always flexible. In real-world scenarios, especially those involving: Varying schemas, Unpredictable fields, Nested or hierarchical data. A rigid table model can become limiting. That’s where semi-structured data and NoSQL systems excel.
What Is Semi-Structured Data?
Semi-structured data has some organization but doesn’t fit neatly into tables. Common formats include: JSON and XML. A reason that it doesn’t neatly fit into tables is that there is nested data which would not neatly fit into one table. We can see from the example below that “preferences” has 2 fields of nested data within it.
Example (JSON):
{ "user_id": 102, "name": "Sasha", "preferences": { "notifications": true, "themes": ["dark", "contrast"] } }
This structure is great for user settings, metadata, IoT readings, and logs.
SQL Meets Semi-Structured: JSON in PostgreSQL
PostgreSQL offers native support for JSON and JSONB (binary JSON) fields:
CREATE TABLE users ( user_id SERIAL PRIMARY KEY, profile JSONB );
You can query deeply nested fields:
SELECT profile->>'name', profile->'preferences'->'notifications' FROM users;
You can also index JSON fields:
CREATE INDEX idx_profile_theme ON users USING GIN (profile jsonb_path_ops);
Introducing NoSQL
NoSQL = "Not Only SQL" — a set of database technologies designed for flexibility, scalability, and speed. Types include: MongoDB, Hbase, and Cassandra. We will link to the documentation, but we will not be going in depth with most of these. We will briefly touch up on MongoDB as it is one of the most popular NoSQL system.
MongoDB is the most popular NoSQL system. It stores data as BSON documents such as:
{ "_id": ObjectId("..."), "name": "Lia", "orders": [ {"item": "tent", "qty": 1}, {"item": "water filter", "qty": 2} ] }
Key Concepts of MongoDB
- Collections = Tables
- Documents = Rows
- Fields = Columns
- Schema is flexible and dynamic
Querying in MongoDB
MongoDB uses a JavaScript-like query language:
Find all users with 'dark' in preferences
db.users.find({ "preferences.themes": "dark" });
Aggregate functions resemble SQL pipelines:
db.orders.aggregate([ { $unwind: "$orders" }, { $group: { _id: "$name", total: { $sum: "$orders.qty" } }} ]);
Pros and Cons of NoSQL
Advantages:
- Schema flexibility
- Fast writes and reads at scale
- Suited for real-time apps, mobile, IoT
Disadvantages:
- Weaker consistency guarantees (eventual consistency)
- Less mature tooling for analytics
- Joins and constraints are limited or absent
When to Use SQL vs NoSQL
Requirement | Best Fit |
---|---|
Rigid schema, normalized data | SQL |
Flexible schema, nested data | NoSQL |
Complex joins and constraints | SQL |
High write throughput | NoSQL |
Strong transactional integrity | SQL |
Rapid prototyping | NoSQL |
Hybrid Approaches
Many teams now use both: SQL for operational records and structured reporting and NoSQL for logging, caching, personalization, or event data. PostgreSQL and MySQL now support JSON; MongoDB supports SQL like interfaces and connectors.