cheatsheet
> data structure
→ data ~ tables ~ spreadsheets
- row = records
- columns = fields
- each col has a data type (integer, varchar (text), date, …)
> basic SQL commands
SELECTretrieve data from one or more tables → list the col to displayFROMwhich table, can be aliases? ?? ❓ASrename final column outputWITH table AS queryINSERTadd new records (rows) to a tableUPDATEmodify existing records (rows) in a tableDELETEremove records (rows) from a tableCREATE,ALTER,DROPmanage db objects like tables and schemas
> filtering and sorting
WHEREclause filters results based on certain conditionsORDER BYsorts data by specified columnsGROUP BYgroups rows sharing values in specified columns for agg functions (SUM, AVG, …)
> relationships and joins
JOIN ... ONINNER JOINLEFT JOINRIGHT JOINFULL JOINUSING
> advanced features
COUNTSUM,MAX,MIN,AVG
> logic, condition, …
OR,AND,NOT,BETWEEN,LIKEIS NULLvs.IS NOT NULLINvs.NOT IN
> others?
LIMITCASE
CASE
WHEN ... THEN ...
WHEN ... THEN ...
ELSE ...
END AS varname- concat texts
var1 || ' ' || var2 - convert
var::text - search substring
var::text LIKE '%SUBSTRING' - substr the first 6 char
LEFT(var, 6)
/*... */or--= comments
SQL Arithmetic Operators
| Operator | Description | Example |
|---|---|---|
| + | Add | |
| - | Subtract | |
| * | Multiply | |
| / | Divide | |
| % | Modulo (remainder of division) |
SQL Bitwise Operators
| Operator | Description |
|---|---|
| & | Bitwise AND |
| | | Bitwise OR |
| ^ | Bitwise exclusive OR |
SQL Comparison Operators
| Operator | Description | Example |
|---|---|---|
| = | Equal to | |
| > | Greater than | |
| < | Less than | |
| >= | Greater than or equal to | |
| ⇐ | Less than or equal to | |
| <> | Not equal to |
SQL Logical Operators
| Operator | Description | Example |
|---|---|---|
| ALL | TRUE if all of the subquery values meet the condition | |
| AND | TRUE if all the conditions separated by AND is TRUE | |
| ANY | TRUE if any of the subquery values meet the condition | |
| BETWEEN | TRUE if the operand is within the range of comparisons | |
| EXISTS | TRUE if the subquery returns one or more records | |
| IN | TRUE if the operand is equal to one of a list of expressions | |
| LIKE | TRUE if the operand matches a pattern | |
| NOT | Displays a record if the condition(s) is NOT TRUE | |
| OR | TRUE if any of the conditions separated by OR is TRUE | |
| SOME | TRUE if any of the subquery values meet the condition | |
| XOR | TRUE if only 1 is true |
SQL Functions
source: https://sqlzoo.net/wiki/FUNCTIONS
| Function | Description | Example |
|---|---|---|
| % MODULO | Remainder of division (MOD()) vs. DIV() division operation | 15 % 4 → 3 |
| + (string) | String concatenation (in some DBs) … or use CONCAT() | 'abc' + 'def' → 'abcdef' |
| + INTERVAL | Add time interval to date | date + INTERVAL 7 DAY |
| + (dates) | Add days to date | date + 7 → date+7 days |
| ABS | Absolute value | ABS(-5) → 5 |
| AVG | Average of values | AVG(price) → 24.99 |
| CASE | Conditional logic ❗ don’t forget ‘END’ | CASE WHEN x>0 THEN 'pos' ELSE 'neg' END |
| CAST | Convert data type | CAST('123' AS INT) → 123 |
| CEIL | Round up to nearest integer vs. FLOOR (down) | CEIL(4.2) → 5 |
| COALESCE | Return first non-NULL value or REPLACE a NULL with a specific value | COALESCE(NULL, 5, 10) → 5COALESCE(leader, 'UNDEFINED!') |
| CONCAT | Concatenate strings | CONCAT('a', 'b') → 'ab' |
| COS | Cosine of angle | COS(0) → 1 |
| COUNT | Count rows | COUNT(*) → 42 |
| CURRENT_DATE | Today’s date | CURRENT_DATE → 2026-02-17 |
| CURRENT_TIMESTAMP | Current date and time | CURRENT_TIMESTAMP → 2026-02-17 14:30:00 |
| DATEPART | Extract date part | DATEPART(year, date) → 2026 |
| DAY | Day of month | DAY('2026-02-17') → 17 |
| DIV | Integer division vs. % = remainder | 15 DIV 4 → 3 |
| EXTRACT | Extract date part | EXTRACT(YEAR FROM date) → 2026 |
| FLOOR | Round down to nearest int vs. CEIL (up) | FLOOR(4.7) → 4 |
| HOUR | Hour from time | HOUR('14:30:00') → 14 |
| IFNULL | Return alternative if NULL | IFNULL(col, 0) → col or 0 |
| INSTR | Find substring position | INSTR('hello', 'll') → 3 |
| LEFT | Leftmost characters (substring) | LEFT('hello', 2) → 'he' |
| LEN | String length (SQL Server) | LEN('hello') → 5 |
| LENGTH | String length | LENGTH('hello') → 5 |
| LOWER | Lower case | LOWER('France') → 'france' |
| MAX | Maximum value | MAX(salary) → 150000 |
| MIN | Minimum value | MIN(salary) → 35000 |
| MINUTE | Minute from time | MINUTE('14:30:45') → 30 |
| MOD | Remainder (modulo) … same as % | MOD(15, 4) → 3 |
| MONTH | Month number | MONTH('2026-02-17') → 2 |
| NULLIF | NULL if values equal | NULLIF(5, 5) → NULL |
| NVL | Replace NULL (Oracle) | NVL(col, 0) → col or 0 |
| PATINDEX | Pattern position | PATINDEX('%ll%', 'hello') → 3 |
| POSITION | Substring position | POSITION('ll' IN 'hello') → 3 |
| QUARTER | Quarter of year | QUARTER('2026-02-17') → 1 |
| RANK | Rank with gaps | RANK() OVER(ORDER BY score) → 1,1,3 |
| REPLACE | Replace substring | REPLACE('hello', 'l', 'x') → 'hexxo' |
| RIGHT | Rightmost characters (substring) | RIGHT('hello', 2) → 'lo' |
| ROUND | Round number (param can be negative) | ROUND(4.25, 1) → 4.3 |
| SECOND | Seconds from time | SECOND('14:30:45') → 45 |
| SIN | Sine of angle | SIN(0) → 0 |
| SUBSTR | Substring (nope with sqlserver) | SUBSTR('hello', 2, 3) → 'ell' |
| SUBSTRING | Extract substring (compatible w all) | SUBSTRING('hello', 2, 3) → 'ell' |
| SUBSTRING(ansi) | ANSI standard substring | SUBSTRING('hello' FROM 2 FOR 3) → 'ell' |
| SUM | Sum of values | SUM(amount) → 1500.50 |
| TAN | Tangent of angle | TAN(0) → 0 |
| TO_CHAR(dates) | Format date as string | TO_CHAR(date, 'YYYY-MM-DD') → '2026-02-17' |
| TRIM | Remove spaces | TRIM(' hello ') → 'hello' |
| UPPER | Upper case | UPPER('france') → 'FRANCE' |
| YEAR | Year from date | YEAR('2026-02-17') → 2026 |
sqlbolt tutorial
1-5) SELECT queries 101
WHERE and FROM to get started + operators
WHEREclause for constraints
SELECT column, another_column, …
FROM mytable
WHERE condition
AND/OR another_condition
AND/OR …;| Operator | Condition | SQL Example |
|---|---|---|
| =, !=, <, ⇐, >, >= | Standard numerical operators | col_name != 4 |
| BETWEEN … AND … | Number is within range of two values (inclusive) | col_name BETWEEN 1.5 AND 10.5 |
| NOT BETWEEN … AND … | Number is not within range of two values (inclusive) | col_name NOT BETWEEN 1 AND 10 |
| IN (…) | Number exists in a list | col_name IN (2, 4, 6) |
| NOT IN (…) | Number does not exist in a list | col_name NOT IN (1, 3, 5) |
| Operator | Condition | Example |
|---|---|---|
| = | Case sensitive exact string comparison (notice the single equals) | col_name = “abc” |
| != or <> | Case sensitive exact string inequality comparison | col_name != “abcd” |
| LIKE | Case insensitive exact string comparison | col_name LIKE “ABC” |
| NOT LIKE | Case insensitive exact string inequality comparison | col_name NOT LIKE “ABCD” |
| % | Used anywhere in a string to match a sequence of zero or more characters (only with LIKE or NOT LIKE) | col_name LIKE “%AT%” (matches “AT”, “ATTIC”, “CAT” or even “BATS”) |
| _ | Used anywhere in a string to match a single character (only with LIKE or NOT LIKE) | col_name LIKE “AN_” (matches “AND”, but not “AN”) |
| IN (…) | String exists in a list | col_name IN (“A”, “B”, “C”) |
| NOT IN (…) | String does not exist in a list | col_name NOT IN (“D”, “E”, “F”) |
ORDER BY and LIMIT and OFFSET for filtering
- Filtering and sorting Query results
SELECT DISTINCT column, another_column, …
FROM mytable
WHERE condition(s)
ORDER BY column ASC/DESC
LIMIT num_limit OFFSET num_offset;IS NULL
WHERE column IS/IS NOT NULL6-7) Multi-table queries with JOINs
JOIN and ON for default (INNER) JOIN
Tables that share information about a single entity need to have a primary key that identifies that entity uniquely across the database. One common primary key type is an auto-incrementing integer (because they are space efficient), but it can also be a string, hashed value, so long as it is unique.
SELECT column, another_table_column, …
FROM mytable
INNER JOIN another_table
ON mytable.id = another_table.id
WHERE condition(s)
ORDER BY column, … ASC/DESC
LIMIT num_limit OFFSET num_offset;The
INNER JOINis a process that matches rows from the first table and the second table which have the same key (as defined by theONconstraint) to create a result row with the combined columns from both tables. After the tables are joined, the other clauses we learned previously are then applied.
INNER JOIN=JOINcause it’s the default one!
LEFT/RIGHT/FULL for OUTER JOIN
- vs.
OUTER JOIN=LEFT/RIGHT/FULLJOINs
If the two tables have asymmetric data, which can easily happen when data is entered in different stages, then we would have to use a
LEFT JOIN,RIGHT JOINorFULL JOINinstead to ensure that the data you need is not left out of the results.
SELECT column, another_table_column, …
FROM mytable
INNER/LEFT/RIGHT/FULL JOIN another_table
ON mytable.id = another_table.matching_id
WHERE condition(s)
ORDER BY column, … ASC/DESC
LIMIT num_limit OFFSET num_offset;Like the
INNER JOINthese three new joins have to specify which column to join the data on.
LEFT JOIN: When joining table A to table B, aLEFT JOINsimply includes rows from A regardless of whether a matching row is found in B.RIGHT JOIN: … the same, but reversed, keeping rows in B regardless of whether a match is found in A.FULL JOIN: rows from both tables are kept, regardless of whether a matching row exists in the other table.
9-11) Queries with expressions, aggregates
AS for aliases and expressions
SELECT col_expression AS expr_description, …
FROM mytable;
SELECT particle_speed / 2.0 AS half_particle_speed
FROM physics_data
WHERE ABS(particle_position) * 10.0 > 500;
SELECT column AS better_column_name, …
FROM a_long_widgets_table_name AS mywidgets
INNER JOIN widget_sales
ON mywidgets.id = widget_sales.widget_id;AGG_FUNC and GROUP BY for aggregates
SELECT AGG_FUNC(column_or_expression) AS aggregate_description, …
FROM mytable
WHERE constraint_expression
GROUP BY column;- The
GROUP BYclause works by grouping rows that have the same value in the column specified.- If you don’t add this, the result can be much different cause it assumes operation on all the table!
| Function | Description |
|---|---|
| COUNT(*), COUNT(column) | A common function used to counts the number of rows in the group if no column name is specified. Otherwise, count the number of rows in the group with non-NULL values in the specified column. |
| MIN(column) | Finds the smallest numerical value in the specified column for all rows in the group. |
| MAX(column) | Finds the largest numerical value in the specified column for all rows in the group. |
| AVG(column) | Finds the average numerical value in the specified column for all rows in the group. |
| SUM(column) | Finds the sum of all numerical values in the specified column for the rows in the group. |
GROUP BY and HAVING for additional conditions
Luckily, SQL allows us to do this by adding an additional
HAVINGclause which is used specifically with theGROUP BYclause to allow us to filter grouped rows from the result set.
The
HAVINGclause constraints are written the same way as theWHEREclause constraints, and are applied to the grouped rows. With our examples, this might not seem like a particularly useful construct, but if you imagine data with millions of rows with different properties, being able to apply additional constraints is often necessary to quickly make sense of the data.
SELECT group_by_column,
AGG_FUNC(column_expression) AS aggregate_result_alias, …
FROM mytable
WHERE condition
GROUP BY column
HAVING group_condition;If you aren’t using the
GROUP BYclause, a simpleWHEREclause will suffice.
12) Order of execution of a Query
SELECT DISTINCT column, AGG_FUNC(column_or_expression), …
FROM mytable
JOIN another_table
ON mytable.column = another_table.column
WHERE constraint_expression
GROUP BY column
HAVING constraint_expression
ORDER BY column ASC/DESC
LIMIT count OFFSET COUNT;… Because each part of the query is executed sequentially, it’s important to understand the order of execution so that you know what results are accessible where.
13-15) Row operations
We previously described a table in a database as a
- two-dimensional set of rows and columns,
- with the columns being the properties and
- the rows being instances
- of the entity in the table.
In SQL, the database schema is what describes the structure of each table, and the datatypes that each column of the table can contain. (e.g. Integer, Strong, …)
This fixed structure is what allows a database to be efficient, and consistent despite storing millions or even billions of rows.
INSERT to add rows
When inserting data into a database, we need to use an
INSERTstatement, which declares which table to write into, the columns of data that we are filling, and one or more rows of data to insert. In general, each row of data you insert should contain values for every corresponding column in the table. You can insert multiple rows at a time by just listing them sequentially.
INSERT INTO mytable
VALUES (value_or_expr, another_value_or_expr, …),
(value_or_expr_2, another_value_or_expr_2, …),
…;In some cases, if you have incomplete data and the table contains columns that support default values, you can insert rows with only the columns of data you have by specifying them explicitly.
INSERT INTO mytable
(column, another_column, …)
VALUES (value_or_expr, another_value_or_expr, …),
(value_or_expr_2, another_value_or_expr_2, …),
…;UPDATE and SET to update rows
In addition to adding new data, a common task is to update existing data, which can be done using an
UPDATEstatement. Similar to theINSERTstatement, you have to specify exactly which table, columns, and rows to update. In addition, the data you are updating has to match the data type of the columns in the table schema.
UPDATE mytable
SET column = value_or_expr,
other_column = another_value_or_expr,
…
WHERE condition;Most people working with SQL will make mistakes updating data at one point or another. Whether it’s updating the wrong set of rows in a production database, or accidentally leaving out the
WHEREclause (which causes the update to apply to all rows), you need to be extra careful when constructingUPDATEstatements.
One helpful tip is to always write the constraint first and test it in a
SELECTquery to make sure you are updating the right rows, and only then writing the column/value pairs to update.
DELETE FROM to delete rows
DELETE FROM mytable
WHERE condition;16-18) Table operations
CREATE TABLE IF NOT EXISTS to create tables
CREATE TABLE IF NOT EXISTS mytable (
column DataType TableConstraint DEFAULT default_value,
another_column DataType TableConstraint DEFAULT default_value,
…
);Here’s an example
CREATE TABLE movies (
id INTEGER PRIMARY KEY,
title TEXT,
director TEXT,
year INTEGER,
length_minutes INTEGER
);Table data types
| Data type | Description |
|---|---|
INTEGER, BOOLEAN | The integer datatypes can store whole integer values like the count of a number or an age. In some implementations, the boolean value is just represented as an integer value of just 0 or 1. |
FLOAT, DOUBLE, REAL | The floating point datatypes can store more precise numerical data like measurements or fractional values. Different types can be used depending on the floating point precision required for that value. |
CHARACTER(num_chars), VARCHAR(num_chars), TEXT | The text based datatypes can store strings and text in all sorts of locales. The distinction between the various types generally amount to underlaying efficiency of the database when working with these columns. Both the CHARACTER and VARCHAR (variable character) types are specified with the max number of characters that they can store (longer values may be truncated), so can be more efficient to store and query with big tables. |
DATE, DATETIME | SQL can also store date and time stamps to keep track of time series and event data. They can be tricky to work with especially when manipulating data across timezones. |
BLOB | Finally, SQL can store binary data in blobs right in the database. These values are often opaque to the database, so you usually have to store them with the right metadata to requery them. |
Table constraints
| Constraint | Description |
|---|---|
PRIMARY KEY | This means that the values in this column are unique, and each value can be used to identify a single row in this table. |
AUTOINCREMENT | For integer values, this means that the value is automatically filled in and incremented with each row insertion. Not supported in all databases. |
UNIQUE | This means that the values in this column have to be unique, so you can’t insert another row with the same value in this column as another row in the table. Differs from the PRIMARY KEY in that it doesn’t have to be a key for a row in the table. |
NOT NULL | This means that the inserted value can not be NULL. |
CHECK (expression) | This allows you to run a more complex expression to test whether the values inserted are valid. For example, you can check that values are positive, or greater than a specific size, or start with a certain prefix, etc. |
FOREIGN KEY | This is a consistency check which ensures that each value in this column corresponds to another value in a column in another table. For example, if there are two tables, one listing all Employees by ID, and another listing their payroll information, the FOREIGN KEY can ensure that every row in the payroll table corresponds to a valid employee in the master Employee list. |
ALTER to alter tables
As your data changes over time, SQL provides a way for you to update your corresponding tables and database schemas by using the
ALTER TABLEstatement to add, remove, or modify columns and table constraints.
ALTER TABLE mytable
ADD column DataType OptionalTableConstraint
DEFAULT default_value;ADD to add column
ALTER TABLE mytable
ADD column DataType OptionalTableConstraint
DEFAULT default_value;DROP to remove column
ALTER TABLE mytable
DROP column_to_be_deleted;RENAME to rename table
ALTER TABLE mytable
RENAME TO new_table_name;DROP TABLE IF EXISTS to drop tables
DROP TABLE IF EXISTS mytable;19) Subqueries
Use
WHEREclause to subquery things
- They are basically nested queries, they can be referenced inside:
- FROM, JOIN, WHERE or HAVING
Here’s an example
- FROM, JOIN, WHERE or HAVING
”… find out which of your Associates are costing the company more than the average revenue brought per Associate”
SELECT *
FROM sales_associates
WHERE salary >
(SELECT AVG(revenue_generated)
FROM sales_associates);Correlated subqueries
a.k.a. “synchronised sub-query”
A more powerful type of subquery is the correlated subquery in which the inner query references, and is dependent on, a column or alias from the outer query. Unlike the subqueries above, each of these inner queries need to be run for each of the rows in the outer query, since the inner query is dependent on the current outer query row.
Here’s an example
“Instead of the list of just Sales Associates above, imagine if you have a general list of Employees, their departments (engineering, sales, etc.), revenue, and salary. This time, you are now looking across the company to find the employees who perform worse than average in their department.
For each employee, you would need to calculate their cost relative to the average revenue generated by all people in their department. To take the average for the department, the subquery will need to know what department each employee is in:“
SELECT *
FROM employees
WHERE salary >
(SELECT AVG(revenue_generated)
FROM employees AS dept_employees
WHERE dept_employees.department = employees.department);WHERE and IN… existence tests
SELECT *, …
FROM mytable
WHERE column
IN/NOT IN (SELECT another_column
FROM another_table);
20) Unions, intersections and exceptions
When working with multiple tables, the
UNIONandUNION ALLoperator allows you to append the results of one query to another assuming that they have the same column count, order and data type. If you use theUNIONwithout theALL, duplicate rows between the tables will be removed from the result.
SELECT column, another_column
FROM mytable
UNION / UNION ALL / INTERSECT / EXCEPT
SELECT other_column, yet_another_column
FROM another_table
ORDER BY column DESC
LIMIT n;Similar to the
UNION, theINTERSECToperator will ensure that only rows that are identical in both result sets are returned, and theEXCEPToperator will ensure that only rows in the first result set that aren’t in the second are returned. This means that theEXCEPToperator is query order-sensitive, like theLEFT JOINandRIGHT JOIN.
Both
INTERSECTandEXCEPTalso discard duplicate rows after their respective operations, though some databases also supportINTERSECT ALLandEXCEPT ALLto allow duplicates to be retained and returned.
select star sql tutorial
Chapter 1
table: executions
columns:
- first_name, last_name, ex_number, ex_age, ex_date, county, last_statement,
Even though we’ve capitalized
SELECT,FROMandLIMIT, SQL commands are not case-sensitive. You can see that the code editor recognizes them and formats them as a command no matter the capitalization. Nevertheless, I recommend capitalizing them to differentiate them from column names, table names and variables.
Column names, table names and variables are also not case-sensitive in this version of SQL, though they are case-sensitive in many other versions. To be on the safe side, I recommend always assuming they are case-sensitive.
SQL can evaluate the truth-value of almost anything →
WHERE 1is valid and will return everything
This is because SQL is doing integer division. To do decimal division, at least one of the operands must be a decimal, for instance
51.0 / 2. A common trick is to multiply one number by1.0to convert it into a decimal. This will come in useful in the later chapters.
Chapter 2
CASE
WHEN <clause> THEN <result>
WHEN <clause> THEN <result>
...
ELSE <result>
ENDHere’s a weird example:
SELECT
COUNT(CASE WHEN county='Harris' THEN 1
ELSE NULL END),
COUNT(CASE WHEN county='Bexar' THEN 1
ELSE NULL END)
FROM executionsDialects
Although we’ve called this a book about SQL, if we want to be pedantic it really is a book about SQLite. This is because SQL is an imaginary concept, a platonic ideal. In reality, there are only dialects that try to conform to the SQL specifications.
SQL is also under-specified, meaning that some functionality is not specified by the standards. For instance, the standards don’t say whether the string length-finding function should be called
LEN(SQL Server) orLENGTH(SQLite); or how identifiers like table or column names should be quoted (`in MySQL,"in SQLite).
Here’s how to count the proportion of something:
SELECT COUNT(CASE WHEN last_statement LIKE '%innocent%' THEN 1
ELSE NULL END) * 100.0 / COUNT(*)
FROM executionsChapter 3
SELECT
last_statement IS NOT NULL AS has_last_statement,
county,
COUNT(*)
FROM executions
GROUP BY has_last_statement, county- you can GROUP BY multiple columns!
and now adding some HAVING extra condition on the GROUP BY
SELECT county, COUNT(*)
FROM executions
WHERE ex_age >= 50
GROUP BY county
HAVING COUNT(*) > 2;Some examples of nested queries
SELECT first_name, last_name
FROM executions
WHERE LENGTH(last_statement) =
(SELECT MAX(LENGTH(last_statement))
FROM executions)SELECT
county,
100.0 * COUNT(*) / (SELECT COUNT(*) FROM executions)
AS percentage
FROM executions
GROUP BY county
ORDER BY percentage DESC→ necessary whenever the things you want to retrieve cannot be processed with the main / outer query
Such a function would need to aggregate both within the groups (to get the numerator) and throughout the dataset (to get the denominator).
But each query either has aGROUP BYblock or doesn’t. So what we really need are two separate queries, one which aggregates with aGROUP BYand another that aggregates without. We can then combine them using a technique called “nesting”.
To reiterate, nesting is necessary here because in the
WHEREclause, as the computer is inspecting a row to decide if its last statement is the right length, it can’t look outside to figure out the maximum length across the entire dataset. We have to find the maximum length separately and feed it into the clause. Now let’s apply the same concept to find the percentage of executions from each county.
Chapter 4
<table1> JOIN <table2> ON <clause>
When joining tables that are derived from the same table, this is called a “self join” and is a powerful technique for allowing rows to get information from other parts of the same table.
SELECT
last_ex_date AS start,
ex_date AS end,
JULIANDAY(ex_date) - JULIANDAY(last_ex_date)
AS day_difference
FROM executions
JOIN (SELECT ex_number + 1 AS ex_number,
ex_date AS last_ex_date
FROM executions
WHERE ex_number < 553) previous
ON executions.ex_number = previous.ex_number
ORDER BY day_difference DESC
LIMIT 10which is equivalent to
SELECT
previous.ex_date AS start,
executions.ex_date AS end,
JULIANDAY(executions.ex_date) - JULIANDAY(previous.ex_date)
AS day_difference
FROM executions
JOIN executions previous
ON executions.ex_number = previous.ex_number + 1
ORDER BY day_difference DESC
LIMIT 10The big idea behind
JOINs has been to create an augmented table because the original didn’t contain the information we needed. This is a powerful concept because it frees us from the limitations of a single table and allows us to combine multiple tables in potentially complex ways. We’ve also seen that with this extra complexity, meticulous bookkeeping becomes important. Aliasing tables, renaming columns and defining goodJOIN ONclauses are all techniques that help us maintain order.
Chapter 5
Some of the major ones were:
- Comparing the shape of available tables with the desired result to decide what aggregation to do.
- Examining where the information we require resides. Is it in an adjacent row? Or a group? Or the entire dataset?
- Interpreting queries in the right logical structure. For instance, viewing clauses as essentially a true or false; viewing
<table1> JOIN <table2> ON ...as a great big table.
❗ The last exercices are less straightforward.
sqlzoo tutorial
-
✅ SELECT basics
- ✅ quiz
-
✅ SELECT from world
- ✅ quiz
-
✅ SELECT from nobel
- ✅ quiz
-
✅ SELECT in SELECT
- ✅ quiz
-
✅ SUM and COUNT
- ✅ quiz
-
✅ JOIN
- ✅ quiz
-
✅ More JOIN
- ✅ quiz
-
✅ Using NULL
- ✅ quiz
-
✅ Self JOIN
- ✅ quiz
-
ALL/ANY/INcomparisons: make sure there isn’t any NULL value otherwise, the result = NULL!
SELECT name
FROM world
WHERE gdp > ALL(SELECT gdp
FROM world
WHERE continent = 'Europe'
AND gdp IS NOT NULL)- correlated sub-query =
tables in inner and outer SELECT are the same,
values from outer are used in inner SELECT
SELECT continent, name, area
FROM world x
WHERE area = (SELECT MAX(area) FROM world y
WHERE y.continent=x.continent)- To review…
SELECT DISTINCT continent
FROM world w1
WHERE NOT EXISTS (
SELECT 1
FROM world w2
WHERE w2.continent = w1.continent
AND w2.population > 25000000
)SELECT name, continent
FROM world x
WHERE NOT EXISTS (
SELECT 1
FROM world y
WHERE x.continent = y.continent
AND x.name <> y.name
AND x.population <= 3 * y.population
)GROUP BYyou NEED to include all non-aggregate SELECT columns in GROUP BY
SELECT game.id, game.mdate, COUNT(goal.matchid) AS goals_scored
FROM game
JOIN goal ON goal.matchid = game.id
WHERE game.team1 = 'POL' OR game.team2 = 'POL'
GROUP BY game.id, game.mdate- Those self joins … what a nightmare
thoughtspot tutorial
Basic SQL
✔ 15 lessons
Intermediate SQL
20 lessons
SQL Analytics Training
8 lessons
Advanced SQL
9 lessons
resources
Learn
- https://sqlbolt.com/
- https://selectstarsql.com/
- https://sqlzoo.net/wiki/SQL_Tutorial
- https://www.thoughtspot.com/sql-tutorial
Documentation
Practice