Purpose: Senior-level SQL puzzles asked in nearly every backend, full-stack, and data-engineering interview. Each question includes multiple solution approaches, trade-offs, and explanation of edge cases.
- Ranking and Top-N Problems
- Duplicate Detection and Removal
- Cumulative and Running Calculations
- Anti-Join Patterns
- Gaps and Islands
- Pivoting and Unpivoting
- Self-Joins and Hierarchies
- Statistics: Median, Mode, Percentiles
- Date Math and Trends
- Window Function Patterns
- LeetCode-Style Hard Patterns
Schema: employees(id, name, salary)
Method 1 — Window function (preferred):
SELECT DISTINCT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk = N;DENSE_RANK() handles ties correctly: if two people share the highest salary, the next one is rank 2, not 3.
Method 2 — Correlated subquery:
SELECT DISTINCT salary
FROM employees e1
WHERE N - 1 = (
SELECT COUNT(DISTINCT salary)
FROM employees e2
WHERE e2.salary > e1.salary
);Reads: "find the salary where exactly N-1 distinct salaries are higher."
Method 3 — LIMIT / OFFSET:
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET N - 1;Simple and fast on small data, but OFFSET ignores ties (returns one row even if duplicates exist at that rank).
Trade-offs:
| Method | Handles ties | Performance | Portability |
|---|---|---|---|
| DENSE_RANK | Yes | O(n log n) | Postgres, MySQL 8+, SQL Server, Oracle |
| Correlated | Yes | O(n²) — bad on large tables | All dialects |
| LIMIT/OFFSET | No | Fast | Postgres, MySQL; SQL Server uses OFFSET FETCH |
-- Returns NULL if only one distinct salary exists
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);The classic LeetCode answer. The outer MAX returns NULL when no row qualifies — a common requirement.
Schema: employees(id, name, salary, department_id)
WITH ranked AS (
SELECT
department_id,
name,
salary,
DENSE_RANK() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS rnk
FROM employees
)
SELECT department_id, name, salary
FROM ranked
WHERE rnk = N;PARTITION BY resets the rank for each department.
SELECT department_id, name, salary
FROM (
SELECT
department_id, name, salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk <= 3;Use RANK() if you want strict positional ranking (skipping after ties). Use ROW_NUMBER() if you must return exactly 3 even when ties exist.
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
ORDER BY avg_salary DESC
LIMIT 1;For ties, use a window:
WITH dept AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT department_id, avg_salary
FROM dept
WHERE avg_salary = (SELECT MAX(avg_salary) FROM dept);SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;To list every duplicate row (not just the email):
SELECT *
FROM users
WHERE email IN (
SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1
);Postgres / SQL Server (CTE + ROW_NUMBER):
WITH dups AS (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
)
DELETE FROM users
WHERE id IN (SELECT id FROM dups WHERE rn > 1);MySQL (self-join):
DELETE u1 FROM users u1
INNER JOIN users u2
ON u1.email = u2.email AND u1.id > u2.id;SELECT product_id
FROM orders
GROUP BY product_id
HAVING COUNT(*) = 3;SELECT first_name, last_name, dob, COUNT(*)
FROM people
GROUP BY first_name, last_name, dob
HAVING COUNT(*) > 1;Schema: orders(id, user_id, order_date, amount)
SELECT
order_date,
SUM(amount) AS daily_total,
SUM(SUM(amount)) OVER (ORDER BY order_date) AS running_total
FROM orders
GROUP BY order_date
ORDER BY order_date;Why nested SUM works: SUM(amount) aggregates per day; the outer SUM is a window function over those daily totals.
SELECT
user_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS user_running_total
FROM orders;The explicit ROWS BETWEEN ... frame avoids ambiguity. Without it, ORDER BY defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which can give surprising results with duplicate dates.
SELECT
order_date,
AVG(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_orders;Note: this uses row-based window. If gaps in dates exist, use RANGE BETWEEN INTERVAL '6 day' PRECEDING AND CURRENT ROW (Postgres) for date-based windows.
SELECT
product_id,
revenue,
100.0 * SUM(revenue) OVER (ORDER BY revenue DESC)
/ SUM(revenue) OVER () AS cumulative_pct
FROM products
ORDER BY revenue DESC;Useful for Pareto / 80-20 analysis.
Schema: customers(id), orders(id, customer_id)
Method 1 — LEFT JOIN with NULL filter:
SELECT c.*
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;Method 2 — NOT EXISTS (handles NULLs safely):
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);Method 3 — NOT IN (avoid if column may be NULL):
SELECT *
FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders WHERE customer_id IS NOT NULL);NOT IN returns zero rows when a NULL is in the list — always guard against it.
SELECT p.*
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM order_items oi
WHERE oi.product_id = p.id
AND oi.created_at > NOW() - INTERVAL '30 day'
);SELECT DISTINCT user_id
FROM activity
WHERE DATE_TRUNC('month', activity_date) = '2025-03-01'
AND user_id NOT IN (
SELECT user_id FROM activity
WHERE DATE_TRUNC('month', activity_date) = '2025-04-01'
);Schema: activity(user_id, activity_date), one row per user per day.
WITH grouped AS (
SELECT
user_id,
activity_date,
activity_date - INTERVAL '1 day' *
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) AS grp
FROM activity
)
SELECT user_id, MIN(activity_date) AS streak_start, COUNT(*) AS streak_len
FROM grouped
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;Trick explained: subtracting ROW_NUMBER() from the date produces the same value for all rows in a consecutive run. Group by that value to get islands.
SELECT id + 1 AS gap_start,
next_id - 1 AS gap_end
FROM (
SELECT id, LEAD(id) OVER (ORDER BY id) AS next_id
FROM tickets
) t
WHERE next_id - id > 1;WITH grp AS (
SELECT user_id, login_date,
login_date - INTERVAL '1 day' *
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS island
FROM logins
),
streaks AS (
SELECT user_id, COUNT(*) AS streak
FROM grp
GROUP BY user_id, island
)
SELECT user_id, MAX(streak) AS longest_streak
FROM streaks
GROUP BY user_id;Schema: sales(year, quarter, amount)
Standard SQL using CASE + SUM:
SELECT
year,
SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2,
SUM(CASE WHEN quarter = 'Q3' THEN amount ELSE 0 END) AS q3,
SUM(CASE WHEN quarter = 'Q4' THEN amount ELSE 0 END) AS q4
FROM sales
GROUP BY year;Postgres tablefunc:
SELECT * FROM crosstab(
'SELECT year, quarter, amount FROM sales ORDER BY 1, 2',
'VALUES (''Q1''), (''Q2''), (''Q3''), (''Q4'')'
) AS ct(year INT, q1 NUMERIC, q2 NUMERIC, q3 NUMERIC, q4 NUMERIC);SQL Server / Oracle:
SELECT * FROM sales
PIVOT (SUM(amount) FOR quarter IN ([Q1], [Q2], [Q3], [Q4])) p;SELECT year, 'Q1' AS quarter, q1 AS amount FROM sales_wide
UNION ALL
SELECT year, 'Q2', q2 FROM sales_wide
UNION ALL
SELECT year, 'Q3', q3 FROM sales_wide
UNION ALL
SELECT year, 'Q4', q4 FROM sales_wide;Postgres also supports unnest() with arrays, and SQL Server has UNPIVOT.
Schema: employees(id, name, manager_id)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;LEFT JOIN ensures the CEO (no manager) still appears.
SELECT e.name
FROM employees e
JOIN employees m ON m.id = e.manager_id
WHERE e.salary > m.salary;WITH RECURSIVE subordinates AS (
SELECT id, name, manager_id
FROM employees
WHERE id = 100 -- root manager
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;The recursive part keeps adding employees whose manager is already in the result.
WITH RECURSIVE org AS (
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, o.depth + 1
FROM employees e
JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org ORDER BY depth, id;Schema: friendships(user_a, user_b) — assume no self-pairs.
SELECT LEAST(user_a, user_b) AS u1, GREATEST(user_a, user_b) AS u2
FROM friendships
GROUP BY 1, 2;This canonicalises every pair so (1, 2) and (2, 1) collapse into one.
Method 1 — PERCENTILE_CONT (Postgres, Oracle, SQL Server):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;Method 2 — Window function with average of middle rows:
WITH ranked AS (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary) AS rn,
COUNT(*) OVER () AS cnt
FROM employees
)
SELECT AVG(salary) AS median
FROM ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2);Handles even and odd row counts uniformly.
Method 3 — MySQL (no PERCENTILE_CONT before 8.0):
SELECT AVG(salary) FROM (
SELECT salary FROM employees
ORDER BY salary
LIMIT 2 - (SELECT COUNT(*) FROM employees) % 2
OFFSET (SELECT (COUNT(*) - 1) / 2 FROM employees)
) t;SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
ORDER BY cnt DESC
LIMIT 1;If ties matter:
WITH counts AS (
SELECT category, COUNT(*) AS cnt FROM products GROUP BY category
)
SELECT category FROM counts
WHERE cnt = (SELECT MAX(cnt) FROM counts);SELECT
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS q1,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY salary) AS median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS q3
FROM employees;WITH monthly AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY 1
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0) AS mom_growth_pct
FROM monthly
ORDER BY month;NULLIF prevents divide-by-zero when the previous month is 0.
SELECT
EXTRACT(WEEK FROM order_date) AS wk,
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2024 THEN amount END) AS rev_2024,
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2025 THEN amount END) AS rev_2025
FROM orders
GROUP BY 1
ORDER BY wk;SELECT id, value, prev_value
FROM (
SELECT
id,
value,
LAG(value) OVER (ORDER BY id) AS prev_value
FROM measurements
) t
WHERE value > prev_value;SELECT
user_id,
order_date,
order_date - LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS days_since_prev
FROM orders;SELECT customer_id,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders
GROUP BY customer_id;To get the full row, use FIRST_VALUE / LAST_VALUE window functions or a join back.
| Function | Salary 100 | Salary 100 | Salary 90 | Salary 80 |
|---|---|---|---|---|
| ROW_NUMBER | 1 | 2 | 3 | 4 |
| RANK | 1 | 1 | 3 | 4 |
| DENSE_RANK | 1 | 1 | 2 | 3 |
- ROW_NUMBER: unique sequential ids, breaks ties arbitrarily.
- RANK: ties share rank, leaves gaps.
- DENSE_RANK: ties share rank, no gaps.
SELECT
reading_time,
reading,
MAX(reading) OVER (ORDER BY reading_time
ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS rolling_max_10
FROM sensor;SELECT
user_id,
order_date,
amount,
FIRST_VALUE(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS first_amt,
LAST_VALUE(amount) OVER (
PARTITION BY user_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_amt
FROM orders;LAST_VALUE requires the explicit unbounded frame — without it, "last" means "current row" in default frame.
SELECT
user_id,
total_spend,
NTILE(4) OVER (ORDER BY total_spend) AS quartile
FROM customer_spend;Useful for cohorting customers into spending tiers.
Schema: trips(id, client_id, driver_id, status, request_at), users(users_id, banned)
SELECT
request_at AS day,
ROUND(
SUM(CASE WHEN status <> 'completed' THEN 1 ELSE 0 END)::numeric
/ COUNT(*), 2
) AS cancellation_rate
FROM trips t
JOIN users c ON c.users_id = t.client_id AND c.banned = 'No'
JOIN users d ON d.users_id = t.driver_id AND d.banned = 'No'
WHERE request_at BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY request_at;WITH ranked AS (
SELECT
department_id, name, salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT department_id, name, salary
FROM ranked
WHERE rnk = 1;Schema: stadium(id, visit_date, people)
WITH qualifying AS (
SELECT id, visit_date, people,
id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM stadium
WHERE people >= 100
)
SELECT id, visit_date, people
FROM qualifying
WHERE grp IN (
SELECT grp FROM qualifying GROUP BY grp HAVING COUNT(*) >= 3
)
ORDER BY visit_date;Schema: customer(customer_id, product_key), product(product_key)
SELECT customer_id
FROM customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM product);Classic relational division pattern.
Schema: tree(id, p_id)
SELECT id,
CASE
WHEN p_id IS NULL THEN 'Root'
WHEN id IN (SELECT DISTINCT p_id FROM tree WHERE p_id IS NOT NULL) THEN 'Inner'
ELSE 'Leaf'
END AS type
FROM tree;Schema: seat(id, student)
SELECT
CASE
WHEN id % 2 = 1 AND id = (SELECT MAX(id) FROM seat) THEN id
WHEN id % 2 = 1 THEN id + 1
ELSE id - 1
END AS id,
student
FROM seat
ORDER BY id;Schema: queue(person_id, person_name, weight, turn), weight limit 1000.
WITH cum AS (
SELECT person_name, weight, turn,
SUM(weight) OVER (ORDER BY turn) AS running
FROM queue
)
SELECT person_name
FROM cum
WHERE running <= 1000
ORDER BY running DESC
LIMIT 1;WITH grp AS (
SELECT user_id, login_date,
login_date - INTERVAL '1 day' *
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS island
FROM logins
)
SELECT DISTINCT user_id
FROM grp
GROUP BY user_id, island
HAVING COUNT(*) >= 5;SELECT customer_id
FROM orders
WHERE order_date IN (CURRENT_DATE, CURRENT_DATE - 1)
GROUP BY customer_id
HAVING COUNT(DISTINCT order_date) = 2;SELECT REVERSE('hello'); -- 'olleh'In dialects without REVERSE, use a recursive CTE iterating characters.
SELECT user_id, order_id, order_date
FROM (
SELECT user_id, order_id, order_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) AS rn
FROM orders
) t
WHERE rn = 2;SELECT user_id, event_time, status,
SUM(CASE WHEN status <> prev_status OR prev_status IS NULL THEN 1 ELSE 0 END)
OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM (
SELECT user_id, event_time, status,
LAG(status) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_status
FROM events
) t;Each new session starts when status differs from the previous row.
SELECT id, name, hire_date
FROM employees
WHERE hire_date >= CURRENT_DATE - INTERVAL '30 days';Index hire_date for fast filtering. Avoid wrapping the column in a function (WHERE DATE(hire_date) >= ...) — that defeats the index.
SELECT
COUNT(*) FILTER (WHERE status = 'active') AS active,
COUNT(*) FILTER (WHERE status = 'pending') AS pending,
COUNT(*) FILTER (WHERE status = 'closed') AS closed
FROM tickets;FILTER is Postgres syntax. In MySQL/SQL Server use SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END).
SELECT DISTINCT value
FROM numbers
ORDER BY value ASC
LIMIT 1 OFFSET K - 1;Equivalent to Nth highest, but ascending. Use DENSE_RANK for tie-aware behaviour.
-- UNION removes duplicates (extra sort)
SELECT id FROM table_a
UNION
SELECT id FROM table_b;
-- UNION ALL is faster — keeps duplicates
SELECT id FROM table_a
UNION ALL
SELECT id FROM table_b;If you know there are no duplicates, always prefer UNION ALL — it avoids a sort/dedup step.
-- IN: simple, but problematic with NULLs
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);
-- EXISTS: typically faster with correlated subqueries
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- NOT IN can return zero rows if subquery contains NULL
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM orders);
-- Safer with NOT EXISTS
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);NOT IN returns zero rows if any value in the subquery is NULL. NOT EXISTS handles NULLs correctly.
- Prefer window functions over correlated subqueries — O(n log n) versus O(n²).
- Indexes on PARTITION BY + ORDER BY columns speed up windowed queries.
- Avoid SELECT * when ranking — wide rows balloon temp storage.
EXISTSoverINfor subqueries with NULLable columns.UNION ALLoverUNIONif duplicates aren't an issue — saves a sort.- Beware row multiplication when joining one-to-many before aggregation; aggregate first or use window functions.
- Use
EXPLAIN ANALYZEto verify the planner picks expected indexes.