LEAD and LAG SQL queries are not difficult to understand

AI/Data Science Digest
3 min readNov 13, 2022

--

When you want to find day over day, month over month, quarter over quarter, or year over year progress, we use LEAD and LAG SQL queries.

I use MySQL as an illustration. You should be able to run similar queries on other relational databases as well.

Instead of going into technical details, let me show you how to write LEAD and LAG queries using examples

LAG

LAG accesses the value in the row above and makes it available for you to use.

Say we have the following sales table:

Yearly sales in sales table

Our goal is to find the YoY sales progress.

With a LAG query, we can create a new column (say previous_amount) where we bring the sale amount of the previous year to the current year.

SELECT year, amount,
LAG(amount) OVER(ORDER BY year) AS previous_amount,
FROM sales;

LAG(amount) is actually LAG(amount, 1) where the second parameter is the OFFSET. The default OFFSET value is 1. This simply says that take the value of the immediate row above.

OVER(ORDER BY year) says that we first should order the sale amount by year and then do the LAG operation.

There is another keyword that goes with OVER, which is PARTITION BY: this is used to group the results by another column. We will skip this to keep the example simple.

AS previous_amount simply assigns the column name to the LAG computed field.

The output is:

Using LAG to bring the amount of the previous year to the current year. This creates a new column called previous_amount. Notice that the first row of previous_amount is NULL as there is no previous amount available.

We can leverage the previous_amount to find the YoY sales progress:

SELECT year, amount,
LAG(amount) OVER(ORDER BY year) AS previous_amount,
amount - LAG(amount) OVER(ORDER BY year) as difference
FROM sales;

The output is:

difference column shows the YoY difference in the sales amount. For example, in 2016, we had 1500 more sales whereas, in 2020, we had 8000 fewer sales (possibly due to the pandemic)

LEAD

Similar to LAG, LEAD brings another row’s value to the current row. The difference is, instead of getting the previous row’s value to the current row, LEAD brings the following row’s value to the current row.

Let’s say we want to bring the following year’s sale amount to the current year:

SELECT year, amount,
LEAD(amount) OVER(ORDER BY year) AS next_amount,
FROM sales;
Bring the next year’s sale amount to current year

The above are the most frequent use of LEAD and LAG keywords.

In a follow-up post, I will go into additional operations used with these keywords: OFFSET and PARTITION BY.

Hope this example-based approach helped you.

Feel free to provide your feedback and share it with those who may benefit from it.

Appreciate your likes to share my writings with a wider audience.

Happy SQLing!

--

--

AI/Data Science Digest

One Digest At a Time. I value your time! #datascience #AI #GenAI #LLMs #dataanalyst #datascientist #probability #statistics #ML #savetime #digest