Getting data out of databases is a common activity of many in data roles. It’s also a great data skill to be familiar with for promotion-hunting in many industries. (Dare you to name an industry that doesn’t use a database/data repository!?!).
Our community resources are stock full of SQL goodies, for the just-tell-me-the-basics to just-give-me-a-go-to-repo since searching the interweb machine, e.g., Google can have you watching lots of *unhelpful* videos, inundated with advertisements and looking at course-style exercises with no solutions. Check out these videos, articles, tutorials, and such to dip your toe in SQL waters.
Excel to SQL: Why Make the Shift
This data resource shares why SQL is needed when you’re an Excel user. I know, I know, SQL is yucky to many. Some find it boring and frustrating. Well, indulge me as I share my tech-happy place. Many people give the side-eye to SQL who are familiar with MS Excel. Excel can do A LOT, especially with the slate of plug-ins. Check out this short MS Excel vs SQL article. There are three main parts of SQL: building databases, retrieving rows from database tables, and maintaining databases
Select/From/Where Clauses
We’re walking through SELECT statements. SELECT statements can have upto 6 parts: select, from, where, group by, having, and order by. Each of them are keywords in database management systems like MySQL and Oracle.
Think of SQL SELECT statements as a way to shuffle data and present it to us in a format we can understand it. This week it’s about the first three main parts of an SQL query: the select, from, and where clauses.
- SELECT clause lists columns and/or expressions that are to be returned from the SQL query
- FROM clause specifies which database table(s) the data exists
- WHERE clause tells the computer which conditions need to hold for a database row should be returned in the SQL query result
So what does this all mean?
We can write statements that return database rows according to the conditions you want. Consider the customer table with columns labeled: CustomerID, CustomerName, ContactName, Address, City, PostalCode, and Country. Ok, so here’s the first brief example:
SELECT CustomerName, City
FROM Customers;
When this SQL query is run, we’ll see a number of rows returned that contain the customer name and their corresponding city. This query isn’t as useful in the real world — it simply parrots back data that’s in the database table.
Here’s a bit more meaty example that you’ll see frequently:
SELECT CustomerName, City
FROM Customers
WHERE Country=’Mexico’;
Now, the SQL query results will display only the customers (their names and city) who live in Mexico. SQL statements using select, from, and where clauses can do a lot to filter the data in databases, but these are the basics.
GroupBy/Having & Order By Clauses
It’s all about the GROUP BY, HAVING, and ORDER BY clauses. Ok, these three clauses extend the power of retrieving matching rows from a database table. First up, let’s talk GROUP BY. You’ll want to use a GROUP BY clause to group rows that have the same values, especially when summarizing, like so:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
This query counts the number of customers per country. That’s really convenient to obtain some general information about what data exists in the database table. So any country that has 1 or more customers will be returned. For a large database table, you’d retrieve many query results with the count being 1. And that’s not very useful.
What if you’d want to only know the countries with more than, say 100, customers? The SQL clause HAVING, always coupled with GROUP BY, gives us a way to retrieve the answer:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;
Note that COUNT(CustomerID) appears in both the SELECT and HAVING clauses. This is necessary when writing HAVING clauses. Said a different way: If you have a condition on a SQL statement with a GROUP BY clause, you must use the HAVING clause, and the column that appears in the HAVING clause must also be in the SELECT clause. To help cement the use of HAVING clauses, here’s what doesn’t work:
SELECT COUNT(CustomerID), Country
FROM Customers
HAVING COUNT(CustomerID) > 5;
The GROUP BY clause is missing so this query is considered malformed. You’ll see an error if you try to run the above query. Also, the following SQL SELECT statement will throw at least 2 SQL errors:
SELECT COUNT(CustomerID), Country
FROM Customers
WHERE COUNT(CustomerID) > 5
HAVING COUNT(CustomerID) > 5;
The first SQL error happens in the WHERE clause — which doesn’t accept summary functions, like COUNT( ), as part of its querying format and rules. The second SQL error occurs at the HAVING clause since it appears without its partner, the GROUP BY clause. It’s good to know what doesn’t work and hopefully, it helps clarify what does work 🙂
Last up, let’s discuss the ORDER BY clause. It allows us to sort SQL query results in either ascending or descending order, like so:
SELECT CustomerName FROM Customers
ORDER BY Country;
This SQL query returns all customers, by name, in alphabetical order. Ascending order is the default for the ORDER BY clause. If you want descending order, you’ll have to be explicit, like so:
SELECT CustomerName FROM Customers
ORDER BY Country DESC;
Well, that covers a brief introduction to GROUP BY, HAVING and ORDER BY clauses. More advanced queries can be created that use multiple database tables in the FROM clause, include multiple filtering conditions in the WHERE clause, grouping by multiple columns in the GROUP BY and HAVING clauses, and issue more than one sort in the ORDER BY clause. SQL query is similar to coding in that way. But take one step at a time :- )
SQL Joins
The deep work of SQL happens in joining database tables. The intent of the JOIN clause is to combine rows from two or more tables, based on a related column between them.
Here are the different types of the JOINs in SQL (go to https://www.w3schools.com/sql/sql_join.asp for more details):
- (INNER) JOIN: Returns records that have matching values in both tables
- LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table
- RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table
- FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table
The figure above gives you the SQL statement templates for writing SQL joins. It’s best to learn by doing so check out the w3schools website I’ve added above.
More Complex SQL Querying
You are encouraged to sign up for a Khan Academy account (it’s free!). Your progress can be tracked to earn challenge patches and badges. This instruction module shows you how to perform more advanced SQL queries using AND/OR, IN, LIKE, HAVING, and more.
- SQL basics: They”ll show you the basics of creating tables and selecting data in various different ways
- More advanced SQL queries: Learn how to perform more advanced SQL queries using AND/OR, IN, LIKE, HAVING, and more.
- Relational queries in SQL: Learn how to store related data in multiple tables and use joins to bring them together (inner joins, outer joins, and self joins).
- Modifying databases with SQL: Learn how to update data, delete data, and change table schemas with SQL commands UPDATE, DELETE, ALTER, and DROP.
- Further learning in SQL: Get ideas for how you can continue learning more about SQL and databases.