Hands-On SQL for Data Science — Lesson 3

We are continuing our journey to become SQL experts. Believe me it is not difficult to master as long as you invest your time.

The research shows that those who persist wins. The fact that your into the 3rd lesson shows that you are in the right track!

So far, we have explored the following:

We continue to use the same database from Lesson 1 and sqlite to run SQL queries.

From Lesson 1:

Download the database company.db that we are going to use for this lesson from here.

Make sure you have sqlite installed in your computer. Load the db into command prompt:

>sqlite3 comapny.db

You should see the prompt:

sqlite3>

We will be using the same employee table as before.

I encourage you to go through the first two lessons before you proceed with this lesson.

In the next lesson, we will dive into adding multiple conditions to the WHERE clause, and using LIMIT and ORDER BY clauses.

AND, OR and NOT Operators

We use these operators to have multiple conditions in our WHERE clause.

Find the female employees in department 1.

We have two conditions here: sex = ‘F’ and department_id = 1. We need to use AND operator as both conditions should be satisfied.

SELECT name 
FROM employee
WHERE sex = 'F' AND department_id = 1

Output:

Find the average salary of employees who are females or working for department 2.

We have an aggregate function to execute here: AVG

First, we need to find the conditions: It is an OR operator as our expression should be true if any of the two conditions (sex = ‘F’, department_id = 2) is true: sex = ‘F’ OR department_id = 2

Query:

SELECT AVG(salary)
FROM employee
WHERE sex = 'F' AND department_id = 2

Output:

How many employees living outside of zipcodes 45873 and 91111?

This requires a NOT operator.

First, let’s find the condition that identifies those living in 45873 or 91111. We need to use IN operator here.

zipcode IN (45873, 91111)

Since we want those who are not living there, we simply need to negate the above condition:

NOT zipcode IN (45873, 91111)

Since we need the count, we need to use the COUNT aggregate function here.

Query:

SELECT COUNT(*)
FROM employee
WHERE NOT zipcode IN (45873, 91111);

Output:

Note: Another way to run the above query:

SELECT COUNT(*)
FROM employee
WHERE NOT (zipcode = 45873 OR zipcode = 91111)

Yet another way:

SELECT COUNT(*)
FROM employee
WHERE NOT zipcode = 45873 AND NOT zipcode = 91111

(The above is from set theory: NOT (A OR B) = (NOT A) AND (NOT B))

You can have more than two conditions in an expression.

Find those employees who are in department 1 and hold the position of Software Engineer or Data Analyst.

Query:

SELECT name
FROM employee
WHERE department_id = 1 AND (position = 'Software Engineer' OR position = 'Data Analyst');

Output:

ORDER BY

This allows us to order our results in ascending or descending order.

Syntax:

SELECT column1, column2
FROM table1
ORDER BY column1, column2 ASC | DESC;

The default order is ascending (ASC). Let’s understand this clause through examples.

Show the salaries in ascending order:

Query:

SELECT salary
FROM employee
ORDER BY salary;

Output:

Now let’s do the same in the descending order:

Query:

SELECT salary
FROM employee
ORDER BY salary DESC;

Output:

You can combine WHERE and ORDER BY clauses together. You need to have WHERE before ORDER BY.

Show the names and salaries of female employees in the descending order of salaries:

SELECT name, salary
FROM employee
WHERE sex = 'F'
ORDER BY salary DESC;

Output:

You can order by multiple columns. Let’s say we want to order by name after ordering by salary.

SELECT name, salary
FROM employee
WHERE sex = 'F'
ORDER BY salary DESC, name;

Output:

Notice that first salaries are ordered in the descending order and then by the ascending order of names.

LIMIT

LIMIT clause allows you to limit the number of rows returned by a query. Let’s say we want to return the top three earners name and their salary.

SELECT name, salary
FROM employee
ORDER BY salary DESC
LIMIT 3

Output:

Practice Questions:

  1. Find the male employees who are older than 35 years.

2. Find the average salary of males who are older than 35 years.

3. Count the number of employees who are not living zipcodes 95123 or 95128.

4. Find the employees whose position has the keyword Engineer or who works for department 1.

5. Find the name and salary of employees in department 1 and display them in the descending order of age.

6. Find the name and age of the 5 oldest employees.

Don’t look at the answers until you try them. It is important for you to try them if you truly want to learn!

.

.

.

.

.

Answers:

1. SELECT name
FROM employee
WHERE sex = 'M' AND age > 35;
2. SELECT AVG(salary)
FROM employee
WHERE sex = 'M' AND age > 35;
3. SELECT COUNT(*)
FROM employee
WHERE NOT zipcode IN (95123, 95128);
4. SELECT name
FROM employee
WHERE position LIKE '%Engineer%' OR department_id = 1;
5. SELECT name, salary
FROM employee
WHERE department_id = 1
ORDER BY age DESC;
6. SELECT name, age
FROM employee
ORDER BY age DESC
LIMIT 5;

Hope you enjoyed the lesson as much as I did putting everything together.

Stay tuned for lesson 4 where I explore Null values and GROUP BY operator.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Digest

Digest

59 Followers

One Digest At a Time. I value your time! #datascience #dataanalyst #datascientist #probability #statistics #ML #AI #savetime #digest