SQL for Data Science – Overview

Julio Castillo

This SQL track takes beginners from fundamental concepts to the advanced techniques used in data science, data engineering, and analytics. You'll learn the structure and logic of relational databases, write optimized queries, and explore real-world use cases, including data pipelines, semi-structured data, and scalable parallel systems. We use PostgreSQL throughout, and also mention other tools and database management systems (DBMSs) you're likely to encounter in industry.

Chapter 1: What is SQL and Why Databases Matter — Real-world applications, relational models, and differences from NoSQL.
Chapter 2: Basic SQL Syntax — SELECT, WHERE, GROUP BY, HAVING, ORDER BY, etc.
Chapter 3: Intermediate Queries and Joins — Subqueries, JOIN types, EXISTS, IN, and more.
Chapter 4: Views, CTEs, and Materialized Views — Structure, reuse, and optimization tradeoffs.
Chapter 5: Window Functions — RANK, ROW_NUMBER, LAG/LEAD, PARTITION BY, and more.
Chapter 6: Query Optimization and Performance — EXPLAIN ANALYZE, indexing, normalization.
Chapter 7: Data Modeling and Preparation — ER diagrams, constraints, normalization, cleaning.
Chapter 8: SQL in Pipelines and Parallel Systems — ETL, dbt, MPPs, scaling, parallel execution.
Chapter 9: Semi-Structured Data and NoSQL — Working with JSON/XML, MongoDB basics.
Chapter 10: SQL in the Real World — Data science, engineering, and analytics use cases.
