Placeholder Image

Subtitles section Play video

  • In skilled hands, the SELECT statement seems like magic.

  • With just the right query, you can reach into a table... hunt around... and voila!

  • Find the exact data you desire.

  • This is the power of SQL.

  • So hang on to your hat and / or glasses, because we are going to take a wild ride

  • with SELECT queries...

  • A “SELECTquery is used to retrieve data from one or more tables.

  • The first part of a SELECT query is to list the columns you want data for.

  • If you do not want to specify the columns, you can use an asterisk to retrieve all columns.

  • Next, write a FROM clause to list the tables you want data from.

  • You can provide a list of restrictions on the data to retrieve with a WHERE clause.

  • Here, you specify the conditions that must be met by the data in order to be included

  • in the results.

  • If you have more than one condition, you usually separate them withAND” ... orOR”.

  • And finally, you can SORT the data returned by your query.

  • For example, to order the data by the values in a specific column, writeORDER BY

  • followed by the column name.

  • By default, it will sort in ascending order.

  • But if you would prefer to be explicit, you can enterASCto sort in ascending order

  • orDESCto sort in descending order.

  • These are the primary clauses in a “SELECTquery: SELECT, FROM, WHERE, and ORDER BY.

  • The table we will use today is theearthquaketable.

  • The SQL database I will be using is Postgres with the pgAdmin GUI.

  • You can download the data used in this video from Github.

  • If you select all the data from the earthquake table, you can see the columns available.

  • There are 10 columns describing each earthquake.

  • Note that the first column is NOT data from the table.

  • Much like a spreadsheet, this column numbers the rows.

  • Now, what we just did was a bit risky.

  • What if this table had several million rows?

  • Our SELECT query asked for all data so we could look at the columns.

  • But when you think about it using your brain, we only needed to select 1 row to see the columns.

  • So let us start over and SELECT all columns from the earthquake data, but LIMIT our results

  • to 1 row with the LIMIT statement.

  • Much better.

  • Let us now use SELECT queries to explore the earthquake table.

  • To begin, how many rows are in this table?

  • This can be answered by selecting a count from the earthquake table.

  • This table contains information on 23,119 earthquakes.

  • COUNT is a function.

  • Some other common and useful SQL functions are MIN, MAX, AVERAGE, and SUM.

  • Let us use the MIN and MAX functions to find out the timespan covered by this table.

  • Theoccured_oncolumn is a timestamp marking when the earthquake happened.

  • In a single query, we can select the minimum and maximum values in theoccurred_oncolumn.

  • Execute

  • We see that the earthquake table includes quakes from 1969 through 2018.

  • A half century of rumbles and tumbles...

  • What about the magnitude?

  • Does the table include ALL earthquakes?

  • Again, we can use functions to answer this question.

  • Let us select the minimum and maximum magnitudes.

  • Execute

  • The smallest earthquake in this table measured 5.5 on the Richter scale.

  • And sadly, the largest event was a terrifying 9.1 earthquake.

  • This provides us with a quick and accurate summary of this table.

  • The earthquake table contains 10 columns

  • Information on 23,119 quakes of magnitude 5.5 or greater covering the 50 year period

  • from 1969 through 2018.

  • Next, let us investigate thecausecolumn.

  • One might be tempted to select thecausedata from the table.

  • Execute

  • Houston, we have a problem.

  • As we scroll through, we see the query returned 23,119 results with a LOT of duplicates.

  • We can fix this with the SELECT DISTINCT statement.

  • TheDISTINCTkeyword makes sure we will not see duplicate rows.

  • Execute

  • Well isn’t that a kick in the pants.

  • Nuclear tests can cause measurable earthquakes.

  • Let us count how many quakes were the result of each cause.

  • We first count the earthquakes that were caused by a natural earthquake.

  • We only want the countand we want to restrict the count to those events where the cause

  • isearthquake.”

  • Execute

  • 22,942…

  • Well, we can be thankful that most of the quakes were due to good old-fashioned tectonic

  • activity.

  • Next, count the number of earthquakes due to nuclear explosions.

  • Execute

  • 173…

  • Yowsers

  • Lastly, how many events were caused by anexplosion.”

  • Execute

  • Only 4 results.

  • Curious...

  • Since there are so few, let us look at the 4 rows.

  • After some searching, I learned that all four of these earthquakes were also the result

  • of a nuclear test.

  • Two by the Soviet Union, and two by France.

  • This example illustrates a common problem with databases.

  • The data may not be perfect.

  • These four rows should have a cause ofnuclear explosionand not justexplosion.”

  • Let us now construct a query to find the details about the most recent earthquake caused by

  • a nuclear explosion.

  • We will select the place

  • magnitudeand occurred_on columns.

  • The cause will benuclear explosion’.

  • To find the most recent event, order by theoccurred_oncolumn in descending order.

  • We are only interested in the most recent event, so limit our query to 1.

  • Execute

  • And there it is

  • Let us answer a different question using SQL.

  • What were the 10 largest earthquakes in this time period?

  • Again, we select the place, magnitude, and occurred_on columns.

  • FROM the earthquake table

  • To find the largest quakes, ORDER the data by magnitude in descending order

  • And LIMIT our query to 10 rows.

  • Execute

  • This is a powerful and destructive list.

  • The earthquake off the coast of Indonesia was one of the most destructive earthquakes

  • in history causing the catastrophic boxing day tsunami.

  • The 9.1 quake off the coast of Japan also triggered a devastating tsunami.

  • And as typically happens with quakes of these magnitude, it spawned numerous aftershocks.

  • How can we count the number of aftershocks?

  • One solution is to search for all earthquakes that contain bothHonshuandJapan

  • in the PLACE column... that occurred within a week of the March 11 quake.

  • To do this, select a countfrom the earthquake table

  • WHERE the place is LIKE ‘%Honshu%Japan%’

  • Here is a new operator.

  • In quotes, we specify a string pattern.

  • The % symbol matches zero or more characters.

  • So this pattern will match any place containing the locationHonshubefore the word

  • JapanNext, restrict the rows to earthquakes that

  • occurred one week from the primary earthquake.

  • To do this, we can use another operator calledBETWEEN”.

  • This will only select rows with dates between March 11 and March 18, 2011.

  • Run

  • There were 166 earthquakes of magnitude 5.5 or greater within a week of the 9.1 quake...

  • Today we have learned many new ways to build SELECT queries.

  • These queries described real events impacting millions of people.

  • Today, scientists and engineers are taking this data and building early warning systems.

  • A few seconds warning before an earthquake can give people time to seek shelter.

  • And a few MINUTES warning of an incoming tsunami can give people time to retreat to safer ground.

  • SQL is a powerful tool for both exploring the past, and building a safer and brighter

  • tomorrow.

In skilled hands, the SELECT statement seems like magic.

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it