Customer attrition, otherwise known as customer churn is the loss of customers or clients. A churn analysis is key to measuring customer retention efforts.  In addition this analysis will provide an insight into the growth and current health of an e-commerce, software as a service, or subscription based business.

This post will focus on the technical details of performing said churn analysis using SQL (postgres 9.6). You can follow along using the sample dataset which is located on GitHub.The following example will study a sample data set from an e-commerce business. First some definitions we’ll use throughout the example to characterise the customer’s status.

  • new customer – a customer is new for the first 30 days
  • active customer – an active customer is someone who has made a purchase within 90 days
  • churn event – a customer who has not made a purchase for 90 days
  • return event – a purchase made after a churn event, the customer will be considered an active following the return event

Figure 1: User status definition

By the end of this post you will be able to:

  • break out revenue by new, active, and returning customers at any point past or present
  • Track churn events

In order to track churn events we will need to build a historic time series for each customer and their statuses.  It will start at each user’s first purchase to the current date incrementing by one day each time step. We will define each user’s status throughout time. Consider the following sample orders table below.  Notice that there are a variety of different purchase frequencies, some customers have sporadic purchases and others make purchases more frequently. From the orders table it’s not immediately obvious which customers have churned since the churn even happens between orders. 

SELECT * FROM orders ORDER BY customer, ts;

The first step for our churn analysis is to categorize customer status at the time of a purchase.  In order to correctly identify if a purchase was made by a new, active or returning customer we will need the date of the first purchase, the next purchase and the previous purchase in each row.  To do so we will use the first_value, lead, and lag window functions highlighted in the query below. If you’re unfamiliar with window functions it’s a type of aggregation, the difference is that the rows are not aggregated and the calculation is only performed on a subset of rows (which the user specifies).  We create the customer_facts  view using the query below:

CREATE VIEW customer_facts
AS
SELECT *,
FIRST_VALUE(ts) OVER (PARTITION BY customer order by ts) first_purchase,
LEAD(ts) OVER (PARTITION BY customer order by ts) next_purchase,
LAG(ts) OVER (PARTITION BY customer order by ts) previous_purchase
FROM orders;

Now we’ve created a convenient way to display customer facts by using select * from customer_facts. The results are shown in the table below.

Let’s take this a step further, using the customer_facts view we just built we can add a case statement to define the user status at each purchase.  Using the query below we create the customer_econ_states view.

CREATE VIEW customer_econ_states
AS
SELECT *,
CASE
WHEN ts = first_purchase THEN ‘NEW’
WHEN (ts – first_purchase) < 30 THEN 'NEW'
WHEN (ts – previous_purchase) <= 90 THEN 'ACTIVE'
WHEN (ts – previous_purchase) > 90 THEN ‘RETURNED’
END AS status
FROM customer_facts
ORDER BY customer,
ts;

With the customer_econ_states we can visualize the user statuses from Figure 1 for each purchase the customer made.

With the customer_facts view we can now split out the revenue between new, returning, and active customers by aggregating on user status.  These metrics can be a great asset for evaluating how to spend advertisting, and promotional dollars. For example questions like “Is it worth it to try to reactivate churned customers? Or is it better to send promotions to active customers to keep them making purchases”.

SELECT SUM(total),
status
FROM customer_econ_states
GROUP BY 2;

The customer_facts view is no doubt useful but the churn events are notably missing from the status column.  Since churn events happen on days when purchases are not made we need to fill in all the days inbetween.  In order to fill in the gaps we’ll need a utility table which we’ll call everday.  It will include each date between the first known purchase and the current date.  We can create this table dynamically with the help of the generate_series function (note: this is native to postgres).

INSERT INTO everyday
SELECT (DATE ‘2018-10-01’ +(GENERATE_SERIES(0,245)))::DATE;

Now we’re going to put it all together by combining customer_econ_states view with the everyday table.  If we left join the customer_econ_states on the dates greater than or equal to the current purchase month and less than the next purchase date this will successfully fill in orders between the first purchase and the next purchase dates.  However to cover the days that happen after the user’s last purchase we must join on the date in the everyday table coalesced with the next_purchase and the current_date  using the query below.

WITH timeseries AS
(
SELECT

     mes.customer,
     ed.ts as day,
     mes.ts as current_purchase,
     first_purchase,
     next_purchase,
     previous_purchase,
     status,
     (ed.ts – previous_purchase) AS days_since_last_purchase
FROM customer_econ_states AS mes
  LEFT JOIN everyday AS ed
ON (ed.ts >= mes.ts
AND ed.ts <COALESCE (mes.next_purchase,CURRENT_DATE))
ORDER BY 1,
2
)
SELECT customer, day, current_purchase, first_purchase, previous_purchase,
FROM timeseries;

The last step is to apply the user status logic from Figure 1 into the case statement highlighted below.

WITH timeseries AS
(
SELECT

     mes.customer,
     ed.ts as day,
     mes.ts as current_purchase,
     first_purchase,
     next_purchase,
     previous_purchase,
     status,
     (ed.ts – previous_purchase) AS days_since_last_purchase
FROM customer_econ_states AS mes
LEFT JOIN everyday AS ed
ON (ed.ts >= mes.ts
AND ed.ts <COALESCE (mes.next_purchase,CURRENT_DATE))

ORDER BY 1,
2
)
SELECT customer, day, current_purchase, first_purchase, previous_purchase,
     CASE
         WHEN day = first_purchase THEN ‘NEW’
         WHEN (day – first_purchase) < 30 THEN ‘NEW’
         WHEN (current_purchase – previous_purchase) > 90 AND current_purchase = day THEN ‘RETURNED’
         WHEN (day – current_purchase) < 90 THEN ‘ACTIVE’
         WHEN (day- current_purchase) = 90 THEN ‘CHURN’
         WHEN (day- current_purchase) > 90 THEN NULL
         ELSE status
     END as status_through_time
FROM timeseries;

This query creates a timeseries for each customer from their first purchase to the current date with their status.  Below is a sample of Cameron’s purchases, you can see his first purchase was made on 10/8/2018. He remains an active user until the 90th day when he churns.  He then makes another purchase on 5/5/18 marking a return event. Cameron will remain an active user for at least 89 more days after. The full timeseries can be found here

customer| day        | current_pur| 1st_purchas| previous_purchase | status
Cameron | 2019-01-05 | 2018-10-08 | 2018-10-08 |                   | ACTIVE
Cameron | 2019-01-06 | 2018-10-08 | 2018-10-08 |                   | CHURN
Cameron | 2019-01-07 | 2018-10-08 | 2018-10-08 |                   |
.
.
.
Cameron | 2019-05-03 | 2018-10-08 | 2018-10-08 |                   |
Cameron | 2019-05-04 | 2018-10-08 | 2018-10-08 |                   |
Cameron | 2019-05-05 | 2019-05-05 | 2018-10-08 | 2018-10-08        | RETURNED
Cameron | 2019-05-06 | 2019-05-05 | 2018-10-08 | 2018-10-08        | ACTIVE
Cameron | 2019-05-07 | 2019-05-05 | 2018-10-08 | 2018-10-08        | ACTIVE