{
"cells": [
{
"cell_type": "markdown",
"id": "76cc981d",
"metadata": {},
"source": [
"# SQL Wizard\n",
"\n",
"This notebook is designed to teach you the language of SQL. SQL stands for Structured Query Language. It is a language designed to query databases. These queries allow you to pull data from the database, update rows in the database, and even delete data permanently. There are many different SQL databases, each of which can all have different plugins to enhance their capabilities even further. Remember that apartment searching tool I made for you? That's all using SQL :) It just also used a plugin meant for operating on geographic locations.\n",
"\n",
"The most common SQL database is SQLite. It is a lightweight library which any program can embed within itself to give it access to a SQL database. Most other databases are separate servers which must be run as an entirely separate software hosted on a sever somewhere. Because of this SQLite is very easy to get started with.\n",
"\n",
"Run the following cell to install SQL capabilities into Jupyter:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ae40bab8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mDeploy AI and data apps for free on Ploomber Cloud! Learn more: https://docs.cloud.ploomber.io/en/latest/quickstart/signup.html\u001b[0m\n"
]
},
{
"data": {
"text/html": [
"Tip: You may define configurations in /home/octotep/projects/jenn-learning/pyproject.toml or /home/octotep/.jupysql/config. "
],
"text/plain": [
"Tip: You may define configurations in /home/octotep/projects/jenn-learning/pyproject.toml or /home/octotep/.jupysql/config. "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Please review our configuration guideline."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"'poetry' is an invalid configuration. Please review our configuration guideline."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "markdown",
"id": "18ac848a",
"metadata": {},
"source": [
"### Getting started with SQL\n",
"\n",
"Before we can get started with SQLite, let's first open a database! This is something you only need to do once in Jupyter since it will keep the database open for you. Run the following cell to open the database called \"properties.db\", which as you'll find out contains information about every Railey property at Deep Creek Lake.\n",
"\n",
"SQL databases all generally follow this model: Every database file can contain multiple \"tables\", and each table will contain rows and columns. If you know Excel this should be very familiar. Each Excel file can contain multiple sheet tabs at the bottom which allow you to store multiple spreadsheets in the same file. Those tabs at the bottom are equilavent to SQL tables."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "45e2c86e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"Connecting to 'sqlite:///properties.db'"
],
"text/plain": [
"Connecting to 'sqlite:///properties.db'"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%sql sqlite:///properties.db"
]
},
{
"cell_type": "markdown",
"id": "62ce9513-d9f9-4a1a-9d8d-8001ff98d285",
"metadata": {},
"source": [
"### Your first query\n",
"\n",
"SQL queries are made of individual statements. These statements start with a word which specifies what you'd like to do, such as `SELECT` to retrieve data, `UPDATE` to update existing rows, and `DELETE` to delete records from the database. Let's practice a SELECT statement.\n",
"\n",
"The syntax of `SELECT` is as follows: `SELECT FROM `. SQL also has a shorthand for selecting all columns: `SELECT * FROM `\n",
"\n",
"One last thing before we dive in! To make a Jupyter notebook cell run SQL, you must start the cell with `%%sql`. Without this, you'll likely get some errors since Jupyter will try to run it as SQL code. I will include this for you to make your life easier but you should know why it's there :)\n",
"\n",
"Additionally some queries will return a lot of data, so you may want to click on the area to the left of the returned data to shrink it "
]
},
{
"cell_type": "markdown",
"id": "0fda3705-2abe-497d-9fd4-389d7f44b428",
"metadata": {},
"source": [
"### Exercise 1: Getting started with SELECT\n",
"\n",
"As a first exercise, try selecting all columns from the table `all_railey_properties`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf28b6f1-c215-405c-afbf-c65cac1c2b7c",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%%sql\n",
"# Add your select statement here"
]
},
{
"cell_type": "markdown",
"id": "6f2fc505-8403-401e-b409-c9dacfc40911",
"metadata": {},
"source": [
"You'll notice that for your display convenience only ten results are displayed to you. Still, this query does select every row, but it would flood the notebook with results if it return all 500+ so the software doesn't allow that by default."
]
},
{
"cell_type": "markdown",
"id": "b36ab734-88f9-4db5-8a92-2d6d990362c9",
"metadata": {},
"source": [
"## Basic SQL functions\n",
"\n",
"SQL comes with some functions which allow you to calculate various measures. The three we will look at are `COUNT`, `SUM`, and `AVG`. For these explainations, let's imagine we'll be working with a hypothetical dataset of taxi cab rides. They will be stored in a table named `cab_rides` and have columns like `fare`, `date`, `time`, `trip_miles`, etc.\n",
"\n",
"`COUNT` will return the number of rows which are selected by the `SELECT` statement. Usage looks like this: `SELECT COUNT(*) FROM cab_rides`. The count function can go around any column name, but for convenience usually people will just write it with the all column shorthand so you don't have to remember the name of a column to use it.\n",
"\n",
"`SUM` will add up all values from the specified column. For example, with our hypothetical dataset, you could then write a query like this to get the total amount of miles driven across all taxi cab trips: `SELECT SUM(trip_miles) FROM cab_rides`. Note that for this function, you'll want to supply it with an actual column name because SUM(*) would likely give you an error because not all the columns can be summed (some contain strings or other values). Additionally, even if possible the question \"What is the total amount of all miles driven plus the total fare and tip amount across all rides?\" is not a particularly useful answer to acquire.\n",
"\n",
"`AVG` acts like `SUM` but instead calculates the mean of a dataset. All the rows will be `SUM`med together and then divided by the `COUNT` to get the average. In our above hypothetical dataset, you could instead write `SELECT AVG(trip_miles) FROM cab_rides` to get the average number of miles in a taxi cab ride. "
]
},
{
"cell_type": "markdown",
"id": "113318ff-35de-40c9-8320-3a9085da2313",
"metadata": {},
"source": [
"### Example 2: COUNT your chickens before they hatch\n",
"\n",
"Try a simple SELECT statement using COUNT to see just how many properties are in the `all_railey_properties` table:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "503fbe8b-0ffd-4f7b-98da-60840e02b4ac",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"# Your SQL goes here"
]
},
{
"cell_type": "markdown",
"id": "e8ca4747-ed04-4d90-91aa-084effd61e0e",
"metadata": {},
"source": [
"### Example 3: SUM it up for me\n",
"\n",
"Try a SELECT statement using SUM to figure out how many total reviews there are across all properties in the `all_railey_properties` table.\n",
"\n",
"Hint: The column name which contains the reviews is `reviews_count`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa652541-088a-4178-8462-1de4e0e4eb2f",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"# Your SQL goes here"
]
},
{
"cell_type": "markdown",
"id": "812de74c-ca9d-4d96-90ee-fca2391f4eae",
"metadata": {},
"source": [
"### Example 4: Law of AVGs\n",
"\n",
"Try one last SELECT statement using AVG to determine what the average review rating is across all properties in the `all_railey_properties` table. Hint: you have to figure out the correct column name for this yourself"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25738ff2-ee49-496d-802f-40cc6367d10a",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"# Your SQL goes here"
]
},
{
"cell_type": "markdown",
"id": "e9ad82b7",
"metadata": {},
"source": [
"## Filtering and sorting\n",
"\n",
"Great job learning about SELECT and the three basic SQL functions! We'll be using them a lot more below so I'm glad your becoming a little sql pro :)\n",
"\n",
"Next, we're going to look at different ways to filter and sort the results that are returned to you by SELECT which will let you make deeper insights into the data you work with.\n",
"\n",
"The first tool we will look at is WHERE. WHERE is a clause which comes at the end of a SELECT statement. Using our hypothetical taxi cab ride example, you might want to ask the database for all trips that were longer than 5 miles to do further analysis on. You could ask it for those trips with a query like: `SELECT * FROM cab_rides WHERE trip_miles > 5`. The WHERE clause takes a boolean expression involving a column name, and evaluates it on every row to determine whether it should be returned or not. SQL can use all the expected boolean operators, like `=` for equality, `>`, `<`, `<=`, `>=`, `!=`, etc"
]
},
{
"cell_type": "markdown",
"id": "1d2b0010-f049-4ff9-925e-8b84eba0aaa1",
"metadata": {},
"source": [
"### Exercise 5: WHERE is my hairbrush?\n",
"\n",
"Let's practice WHERE clauses by SELECTing all rows which have a minimum renting age which is greater than 24."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cc8f541-cc08-490e-abd2-e46d72062c1d",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"# Your SQL goes here"
]
},
{
"cell_type": "markdown",
"id": "51787413-b452-4c8f-97ce-fd8983f2300b",
"metadata": {},
"source": [
"### I LIKE you <3\n",
"\n",
"SQL also has some other boolean operators besides the classics. The most important of these which we'll cover is LIKE. LIKE allows you to search for strings which match a particular pattern. As an example, take our hypothetical taxi example from before. Imagine we have a column called `destination_address`. We could use a LIKE statement to search for all destinations which are on a particular street. I'll show you what it looks like and then we'll go from there:\n",
"\n",
"`SELECT * FROM cab_rides WHERE destination_address LIKE '%Main St%'`\n",
"\n",
"This SQL query will match entries like \"122 Main Street\", \"57 Main St.\", and \"3232 Main Street\". The `%` sign means match any characters. So here we tell the database to search for address which start with any characters, then are followed by \"Main St\", and finally has any number of characters at the end. This can be useful to have some basic search engine type functionality."
]
},
{
"cell_type": "markdown",
"id": "c4a3a89c-448c-41cc-82e7-0b911def9fc6",
"metadata": {},
"source": [
"### Exercise 6: WHERE is the person I LIKE?\n",
"\n",
"Now use your knowledge of LIKE to write a SQL query to pull out all deep creek homes which have \"Lake\" in the house's `name` column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "659ee93a-e99a-4168-8c51-53e385b092d3",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"# Your SQL goes here"
]
},
{
"cell_type": "markdown",
"id": "b61ec4d0-b45a-4ec5-b47e-3732e4df086b",
"metadata": {},
"source": [
"### Sorting with ORDER BY\n",
"\n",
"Right now the SQL results returned to you are unsorted, but you can absolutely choose the sorting method that the database can use before it returns the rows to you. To accomplish this with SQL one should use the ORDER BY directive at the end of the SELECT statement. Imagine I would like to sort all hypothetical taxi rides by total_miles_traveled when they are returned. I can write a query which looks like this: `SELECT * FROM cab_rides ORDER BY total_miles_traveled DESC`. After you specify the column name you'd like to use to sort, you can also specify ASC or DESC to sort in ascending or decending order. By defualt, SQL databases with use an ascending sort but you can change that.\n",
"\n",
"You can also specify multiple ORDER BY clauses to sort by multiple columns at the same time, just like in excel. For example you could hit the SQL database with one of these to first sort by last name and then sort by first name when pulling records out: `SELECT * FROM cab_rides ORDER BY last_name ORDER BY first_name`. "
]
},
{
"cell_type": "markdown",
"id": "5dbcfa6c-cb23-4d88-b208-386c3b20c81e",
"metadata": {},
"source": [
"### Exercise 7: Where are the beds?\n",
"\n",
"Let's try another exercise, this time to sort the results by the amount of `beds` they have. Make sure you sort it so that we see the properties with the largest number of beds."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5677985f-7a94-4df6-adc9-c5c0ebad48f0",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"# Your SQL goes here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": ".venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
},
"toc-autonumbering": true,
"toc-showmarkdowntxt": false
},
"nbformat": 4,
"nbformat_minor": 5
}