3-fold increase in conversions due to targeting of mailings based on a predictive model

Today’s consumers are constantly inundated with messages from various brands. Many brands send multiple messages through multiple channels, which makes it difficult to attract and hold the consumer’s attention for long. At the same time, consumers easily grow tired of the communication and pay less and less attention to it. It is therefore more important than ever to choose the right content, send each consumer the most tailored message, and limit messages that are of no interest and only increase the risk that the consumer becomes desensitized to communication.

Predictive modeling helps solve this problem. Systems based on machine learning can predict, with a high degree of accuracy, how interested a consumer will be in a particular type of message or offer. This article uses a concrete, recent example (from May 2023) to show how these tools can be applied in practice. For confidentiality reasons, the figures presented are scaled or shown as indexes. They nevertheless faithfully reflect the observed differences and effects.

The problem with the traditional approach to email targeting and the need for change

The organization in this example, like many others, had for years pursued so-called “revenue maximization” from its communication base through broad and frequent mailings. In practice, information about an offer was sent to all consumers who had consented to communication through a given channel. In a few cases, the target base was narrowed somewhat using expert criteria. These were simple criteria, however, such as: has bought the promoted product before, has not bought product X in the last 6 months, is a woman over 55, etc. The results were very good for a long time, and no one saw a need to change the process. At some point, however, a slow decline in the email open rate began to be observed, and the downward trend became increasingly pronounced. Combined with a declining number of newly acquired consumers, this led the organization to ask whether it could work better with its existing base. What could be done to reverse the declining interest in the communications being sent?

The decision was made to test the integration of machine learning and predictive analytics into the process of selecting consumers for mailing campaigns. We prepared a predictive modeling system that generates “tailor-made” scoring models for each campaign. The general architecture of the system is shown in the diagram below.

Diagram: architecture of the predictive modeling system

Use of predictive modeling in mail targeting

More than 100 variables from the areas listed in the diagram were used as input data for training the model. The model is built on advanced algorithms capable of handling this many attributes and extracting from them as much information as possible about each consumer’s actual profile. The final output is, for each consumer, an estimated probability of interest in a given communication. This estimate is then used to make the final selection of consumers for the campaign.
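The article does not specify the rule used to turn the model’s probabilities into a campaign list. A minimal sketch, assuming the campaign contacts a fixed share of the base with the highest predicted probability, optionally with a probability floor (`select_for_campaign`, `top_share` and `min_probability` are hypothetical names, not part of the described system):

```python
from typing import Dict, List


def select_for_campaign(scores: Dict[str, float], top_share: float,
                        min_probability: float = 0.0) -> List[str]:
    """Return the consumers with the highest predicted probability of interest.

    scores: consumer id -> probability estimated by the scoring model
    top_share: hypothetical parameter, fraction of the base to contact (0-1)
    min_probability: hypothetical floor; consumers below it are never contacted
    """
    if not 0.0 <= top_share <= 1.0:
        raise ValueError("top_share must be between 0 and 1")
    # Rank consumers from most to least likely to be interested.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Keep the top fraction of the base (rounded down) ...
    limit = int(len(ranked) * top_share)
    # ... and drop anyone whose probability is below the floor.
    return [cid for cid, prob in ranked[:limit] if prob >= min_probability]


scores = {"c1": 0.02, "c2": 0.41, "c3": 0.10, "c4": 0.65, "c5": 0.05}
print(select_for_campaign(scores, top_share=0.4))  # ['c4', 'c2']
```

Ranking by probability rather than using a fixed threshold keeps the campaign size under control; the floor prevents contacting consumers who are unlikely to respond even when the base is small.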

The results of changing the communication targeting process met (and in some respects even exceeded) expectations. To demonstrate the model’s usefulness, we conducted experiments. Half of the base was selected the old way, while the other half was selected using the model’s predictions. It should be noted that both groups received exactly the same emails: the same subject line and exactly the same creative. The timing of the mailing was also identical. None of these factors could therefore have affected the results of the experiment. The only difference between the groups was the way consumers were selected.
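The article does not say how the base was divided into halves. A minimal sketch, assuming a random but reproducible 50/50 assignment (the function name and seed are illustrative), which is the usual way to make the selection method the only systematic difference between the groups:

```python
import random
from typing import List, Sequence, Tuple


def split_base(consumer_ids: Sequence[str], seed: int = 2023) -> Tuple[List[str], List[str]]:
    """Randomly split the base into two halves: old criteria vs. model-based selection."""
    ids = list(consumer_ids)
    rng = random.Random(seed)  # fixed seed so the same split can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


base = [f"c{i}" for i in range(10)]
control, test = split_base(base)
# Expert criteria are then applied within `control` and model-based selection
# within `test`; email content and send time stay identical for both.
print(len(control), len(test))  # 5 5
```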

Effects of using predictive modeling

In the group targeted with the model, the size of the contacted group was reduced nearly 14-fold: for every 100 consumers contacted under the traditional criteria, only about 7 were contacted in the model-based process.

At the same time, this much smaller group generated similar sales (only about 2% lower).

This was possible thanks to a significantly higher (3 times) conversion rate in the group selected in the new way, as well as a much higher (4 times) average receipt value in that group.

Narrowing the contacted group made it possible to limit it to consumers genuinely interested in the offer. This is evidenced by a much higher open rate (3.2x higher) and click-to-open rate (almost 2x higher). The click-to-open rate is calculated here as CTOR = LC/LO, where LC is the number of consumers who clicked a link in the email and LO is the number of consumers who opened the email. While the open rate depends heavily on the email’s subject line, a higher CTOR indicates genuine interest in the content and the offer contained in the email.
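The two engagement metrics can be computed per campaign group as below. CTOR follows the definition given above; using delivered emails as the open-rate denominator is an assumption, since the article does not define it. The counts are made up.

```python
def open_rate(opened: int, delivered: int) -> float:
    """Share of delivered emails that were opened (denominator is an assumption)."""
    return opened / delivered if delivered else 0.0


def click_to_open_rate(clicked: int, opened: int) -> float:
    """CTOR = LC / LO: consumers who clicked a link / consumers who opened the email."""
    return clicked / opened if opened else 0.0


# Hypothetical counts for one campaign group.
delivered, opened, clicked = 1000, 200, 30
print(open_rate(opened, delivered))          # 0.2
print(click_to_open_rate(clicked, opened))   # 0.15
```

Comparing the model-selected group with the traditionally selected one is then a matter of dividing the corresponding rates of the two groups.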

Targeting mailings based on a predictive model – summary

By using an advanced data science tool in the form of a predictive model, it was possible to achieve:

  • better matching of communications to consumer interests and needs
  • a significant reduction in the number of communications sent in a given campaign, with minimal impact on the sales result (a decrease of just over 2%)
  • less communication “overload”: in the new process, consumers receive messages less often, but the messages are better tailored to them

The exact impact of the model and the new targeting process on the trend in open and click-through rates can only be assessed over a longer period and requires at least several months of observation. However, the first results look promising and give reason to expect a reversal of the clear negative trend observed in the months before the scoring model was introduced.

Finally, it is worth noting that one advantage of the system is that its architecture is open to new data sources. If new variables become available, they are automatically incorporated into model training and used for prediction. Another important feature of the solution is the model’s ability to update itself as new data arrives, including data on executed campaigns and their effectiveness. As a result, the model automatically adapts to changing consumer needs and behavior and to consumers’ reactions to the communications sent. This helps keep the system useful over the long term as well.

Attribution modeling – the key to understanding the effectiveness of marketing activities

With the number of communication channels and brand touchpoints constantly growing, correctly identifying the importance and impact of each channel matters more and more. Knowing to what extent a given message and channel contributed to achieving a goal is crucial for optimizing activities and maximizing the return on the marketing budget. The problem is as important as it is difficult, but attribution modeling and data science methods can help solve it.

What is attribution modeling?

Attribution modeling is the process of building a model that assigns value to each touchpoint along a customer’s conversion path. It aims to understand which marketing channels and activities contribute to achieving business goals, such as making a sale, acquiring a new customer, activating dormant customers, recruiting new loyalty program participants or increasing brand awareness. The term “attribution model” covers many different constructs, from very (too) simple to very complex. In general, models can be divided into single-point models, rule-based multi-point models and algorithmic multi-point models.

Single-point modeling

Single-point models allocate the entire value of a conversion (or, more broadly, of goal achievement) to just one touchpoint. Typical approaches are first-click and last-click. These are simplistic models: they do not take into account the customer’s entire conversion path, nor the interactions between touchpoints and the context in which they occur. Their advantage is simplicity and ease of application. In today’s complex marketing world, however, they are too simple to reflect reality reliably.
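First-click and last-click attribution reduce to picking one end of the path. A minimal sketch; the channel names and the `single_touch_attribution` function are illustrative:

```python
from typing import Dict, List


def single_touch_attribution(path: List[str], value: float, rule: str = "last") -> Dict[str, float]:
    """Give the whole conversion value to a single touchpoint of the path."""
    if not path:
        return {}
    if rule == "first":
        channel = path[0]
    elif rule == "last":
        channel = path[-1]
    else:
        raise ValueError("rule must be 'first' or 'last'")
    return {channel: value}


path = ["display", "email", "search"]  # hypothetical conversion path
print(single_touch_attribution(path, 100.0, "first"))  # {'display': 100.0}
print(single_touch_attribution(path, 100.0, "last"))   # {'search': 100.0}
```

The intermediate "email" contact receives nothing under either rule, which is exactly the blind spot described above.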

Rule-based multi-point models

Multi-point models distribute value among the different touchpoints along the customer path. They are further divided into rule-based and algorithmic models. The former assign value to individual contacts according to predefined rules, for example:

  • linear model – assigns equal value to each contact point encountered by the consumer on his path to conversion;
  • U-shaped model – assigns the greatest value to the first and last touchpoints; intermediate touchpoints are less important (though their weight is non-zero);
  • model based on time to conversion (time decay) – the closer a touchpoint is to the moment of conversion, the greater the value assigned to it, so the greatest weight goes to the last contact immediately preceding the conversion.
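The three rules listed above can be sketched as weighting functions. The specific parameters are assumptions, since the article does not fix them: 40% to each end in the U-shaped model (a common convention) and a 7-day half-life for time decay. Function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def linear_weights(n: int) -> List[float]:
    # Every touchpoint gets the same share.
    return [1.0 / n] * n if n else []


def u_shaped_weights(n: int, end_share: float = 0.4) -> List[float]:
    # Assumption: 40% to the first and 40% to the last touchpoint,
    # the remaining 20% split evenly over the intermediate ones.
    if n <= 2:
        return [1.0 / n] * n if n else []
    middle = (1.0 - 2 * end_share) / (n - 2)
    return [end_share] + [middle] * (n - 2) + [end_share]


def time_decay_weights(days_before_conversion: Sequence[float],
                       half_life_days: float = 7.0) -> List[float]:
    # Assumption: a touchpoint's weight halves for every `half_life_days`
    # between it and the conversion; weights are normalized to sum to 1.
    raw = [0.5 ** (days / half_life_days) for days in days_before_conversion]
    total = sum(raw)
    return [w / total for w in raw]


def attribute(path: Sequence[str], weights: Sequence[float], value: float) -> Dict[str, float]:
    # Sum the credit per channel (a channel may appear several times in a path).
    credit: Dict[str, float] = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight * value
    return dict(credit)


path = ["display", "email", "email", "search"]  # hypothetical path
days = [14, 7, 3, 0]                             # days before conversion
print(attribute(path, linear_weights(len(path)), 100.0))
# {'display': 25.0, 'email': 50.0, 'search': 25.0}
print(attribute(path, u_shaped_weights(len(path)), 100.0))
print(attribute(path, time_decay_weights(days), 100.0))
```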

The advantages of rule-based models are their clarity, their relative simplicity and the fact that they do not omit any touchpoints on the path to conversion. However, their weights are set according to arbitrary rules. Each rule can be justified, but it is impossible to say which one is best. Like single-point models, they also ignore interactions between touchpoints and do not take context into account.

Algorithmic multipoint models

Algorithmic models, like rule-based models, assign a weight to each touchpoint along the customer path. Instead of arbitrary rules, however, they use sophisticated statistical methods to determine these weights: rather than adopting predefined rules, these models “learn” the rules from real data using machine learning methods. Such models take into account the order of touchpoints and the interactions between them. For example, the impact of an email on conversion may be greater when it was preceded by a banner display. They also take context into account, e.g. time of year, weather, competitors’ media activity and pricing. They can operate at a very detailed level, e.g. distinguishing the impact of individual creative variants or of where and when they are displayed.
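One commonly used data-driven technique is Markov chain attribution with the “removal effect”; the article does not say which algorithm it has in mind, so this is only an illustration. Paths are turned into a first-order Markov chain (capturing the order of touchpoints), and each channel is credited in proportion to how much the probability of conversion drops when that channel is removed, which is also a simple “what if we dropped this channel?” simulation. Context variables such as season, weather or price are not modeled here. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

START, CONVERSION, NULL = "(start)", "(conversion)", "(null)"
Path = Tuple[List[str], bool]  # (touchpoints in order, converted?)


def transition_probabilities(paths: Sequence[Path]) -> Dict[str, Dict[str, float]]:
    """Estimate a first-order Markov chain from observed customer paths."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for touchpoints, converted in paths:
        states = [START, *touchpoints, CONVERSION if converted else NULL]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {state: {nxt: n / sum(out.values()) for nxt, n in out.items()}
            for state, out in counts.items()}


def conversion_probability(trans: Dict[str, Dict[str, float]],
                           removed: Optional[str] = None,
                           iterations: int = 1000) -> float:
    """Probability of reaching CONVERSION from START; a removed channel is a dead end."""
    prob: Dict[str, float] = defaultdict(float, {CONVERSION: 1.0})
    for _ in range(iterations):  # fixed-point iteration over the absorbing chain
        for state, out in trans.items():
            prob[state] = 0.0 if state == removed else sum(
                p * prob[nxt] for nxt, p in out.items())
    return prob[START]


def removal_effect_shares(paths: Sequence[Path]) -> Dict[str, float]:
    """Share of credit per channel, proportional to its removal effect."""
    trans = transition_probabilities(paths)
    base = conversion_probability(trans)
    channels = sorted({c for touchpoints, _ in paths for c in touchpoints})
    if base == 0.0:
        return {c: 0.0 for c in channels}
    # Relative loss of conversion probability when the channel is dropped.
    effects = {c: 1.0 - conversion_probability(trans, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: (e / total if total else 0.0) for c, e in effects.items()}


paths = [(["email"], True), (["banner", "email"], True), (["banner"], False)]
shares = removal_effect_shares(paths)
print({c: round(s, 3) for c, s in shares.items()})  # {'banner': 0.333, 'email': 0.667}
```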

It is hard to disagree that these types of models are today’s “gold standard”. Only they make it possible to capture the full complexity of consumer-brand contact paths. However, the accuracy and benefits of algorithmic models come with challenges, particularly regarding the quantity, quality and scope of the data and the analytical competence needed to build them. Another drawback is limited transparency, because the rules the model identifies in the data can be very complex. Algorithmic models do, however, allow advanced “what if?” simulations of various scenarios, e.g. what if we dropped channel A altogether? What if we reduced the budget for channel B? What if we switched the order of messages in the sequence? This in turn makes it possible to optimize the budget and activities, so the investment in this type of model can more than pay for itself.

Summary

Marketing attribution models have undergone a long evolution, from simple single-point models to multi-point models based on complex machine learning algorithms and statistical methods (including deep artificial neural networks). Attribution nevertheless remains an area of intensive research and experimentation, both in the scientific community and among practitioners. Despite the complexity and challenges of building and applying these models, they are becoming increasingly accessible thanks to the falling costs of data collection and processing. We are thus entering an era in which the question is not “whether” they are worth using, but “how” to build and use them effectively.

Predicting sales in unpredictable times. Why is it important to forecast not only sales but also demand?

Unfortunately, the unstable economic situation is making it increasingly difficult to run a profitable retail business. To stay profitable, retailers must be able to predict the future with reasonable accuracy. Sales forecasting and demand forecasting are therefore becoming two key aspects of business planning.

Sales prediction or demand prediction – which to choose?

The terms sales prediction and demand prediction are sometimes used interchangeably. However, there is a fundamental difference between them. What does this difference refer to and which prediction should we particularly focus on? That’s what we’ll discuss in today’s article.

To begin with, it is worth briefly recalling the relationship between three key terms: demand, sales and supply. Demand is the quantity of products or services that customers would like to purchase in a given period. Sales, on the other hand, is the quantity of products or services actually sold during that period. For sales to occur, there must be a supply of products or services capable of meeting demand. Supply, in turn, is the quantity of products and services available during a given period. There are therefore no sales when there is no demand, but sales also fall short when there is demand and not enough supply. In general, three situations can arise:

  1. Demand = supply
    Ideal situation: customers are satisfied with the ability to meet their needs, and the company is satisfied because it sells all available inventory.
  2. Demand > supply
    Not all customers are able to satisfy their needs, while the company bears the cost of lost potential sales. Such a situation arises, for example, when there is a shortage of a particular commodity in the warehouse or on the store shelf at the time when the consumer would like to purchase it. In a competitive market, the customer can then buy a substitute product/service from a competitor.
  3. Demand < supply
    An unfavorable situation for the company: money is frozen in merchandise lingering on the shelves, store space and logistics resources cannot be used for products that are in demand, and there is a risk of losing the product’s value altogether (e.g., once the expiration date is exceeded).
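The three situations follow from one rule: sales are the smaller of demand and supply. A minimal sketch (names are illustrative) that also quantifies the cost drivers of situations 2 and 3:

```python
from typing import NamedTuple


class PeriodOutcome(NamedTuple):
    sales: float
    lost_sales: float    # unmet demand (situation 2)
    excess_stock: float  # unsold supply (situation 3)


def period_outcome(demand: float, supply: float) -> PeriodOutcome:
    # Sales can never exceed what customers want or what is available.
    return PeriodOutcome(sales=min(demand, supply),
                         lost_sales=max(0.0, demand - supply),
                         excess_stock=max(0.0, supply - demand))


print(period_outcome(demand=120.0, supply=100.0))
# PeriodOutcome(sales=100.0, lost_sales=20.0, excess_stock=0.0)
```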

Accurate demand prediction helps avoid situations 2 and 3, or at least reduces their scale and the associated costs. More broadly, demand prediction brings benefits in five areas.

Benefits of demand forecasting

  • Optimization of production and inventory
    With accurate demand prediction, a company can better estimate how much product it will need in a given period. This allows it to optimize production processes and control inventory levels.
  • Increase sales
    Ensuring the right amount of products in stock allows the company to increase its sales and customer satisfaction.
  • Better planning of marketing campaigns
    With an accurate prediction of demand, a company can better plan which products (or product categories) are worth promoting and when.
  • Optimization of prices
    With demand prediction and knowledge of inventory, a company can optimize the price of a product to balance demand with supply and maximize profit.
  • Cost reduction
    With accurate demand prediction, a company can avoid the costs of excess inventory and unnecessary logistics costs.

Sales prediction vs. demand prediction – differences

But what if we prepare a sales prediction instead of a demand prediction? In that case, we risk underestimating demand. As already noted, sales occur when demand meets supply. When supply is insufficient (goods are out of stock), demand is not met and sales are lower than they could be. In the extreme case of a complete lack of goods on the shelf, sales are 0. A sales prediction model may correctly predict zero sales in such a case. However, using such a model to decide on product inventory will lead to underestimation and lost sales. To make matters worse, the accuracy metrics of such a model can be very high, because we may be dealing with a self-fulfilling prophecy:

No goods → zero sales → model predicts no sales in the next period →
decision to not supply the product (since no sales are assumed) → no goods.

And the circle closes.
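The loop can be reproduced in a few lines. A minimal simulation sketch under simplifying assumptions (hypothetical names; each period’s stock equals the previous period’s forecast and unsold stock is not carried over): a naive sales model reorders last period’s sales, while an idealized demand model reorders last period’s true demand.

```python
from typing import List


def simulate_replenishment(demand: List[float], initial_stock: float,
                           forecast: str) -> List[float]:
    """Return sales per period when each period's stock equals last period's forecast.

    forecast="sales":  next stock = last observed sales (a sales model)
    forecast="demand": next stock = last period's true demand (idealized demand model)
    Simplifying assumption: unsold stock is not carried over between periods.
    """
    stock = initial_stock
    sales: List[float] = []
    for period_demand in demand:
        sold = min(period_demand, stock)  # sales cannot exceed stock
        sales.append(sold)
        if forecast == "sales":
            stock = sold
        elif forecast == "demand":
            stock = period_demand
        else:
            raise ValueError("forecast must be 'sales' or 'demand'")
    return sales


demand = [10.0, 10.0, 10.0, 10.0]
print(simulate_replenishment(demand, initial_stock=0.0, forecast="sales"))   # [0.0, 0.0, 0.0, 0.0]
print(simulate_replenishment(demand, initial_stock=0.0, forecast="demand"))  # [0.0, 10.0, 10.0, 10.0]
```

The sales-based forecast is "perfect" in this simulation: it predicts zero and observes zero in every period, which is why accuracy metrics alone can hide the problem.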

This is a potentially costly mistake at the model design stage, and a trap companies sometimes fall into. Machine learning methods, meanwhile, make it possible to build and train predictive models capable of predicting demand (and not just sales). Such models take into account many factors influencing demand (including seasonality, price, weather and promotions) and can operate at any level of aggregation (product group/single product, region/store group/single store, etc.).
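One of the simplest ways to keep stock-outs from biasing a model toward sales rather than demand is to exclude (or separately flag) periods in which the product was unavailable, because sales on those days are only a lower bound on demand. A minimal sketch of this idea, estimating a weekday seasonal profile; the data layout and function name are assumptions, and a real model would add price, weather and promotions as explanatory variables.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (weekday 0-6, units sold, product in stock for the whole day?)
Observation = Tuple[int, float, bool]


def weekday_demand(history: List[Observation]) -> Dict[int, float]:
    """Average demand per weekday, estimated only from fully in-stock days."""
    by_day: Dict[int, List[float]] = defaultdict(list)
    for weekday, units_sold, in_stock in history:
        # On stock-out days sales are censored, so they are skipped.
        if in_stock:
            by_day[weekday].append(units_sold)
    return {day: mean(units) for day, units in sorted(by_day.items())}


history = [(0, 10.0, True), (0, 0.0, False), (0, 12.0, True),
           (5, 30.0, True), (5, 18.0, False)]
print(weekday_demand(history))  # {0: 11.0, 5: 30.0}
```

Averaging all Monday observations including the stock-out day would give about 7.3 instead of 11, reproducing the underestimation described above.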

Summary

Accurate demand prediction is key to success. It allows you to reduce costs, increase sales and improve customer satisfaction. However, these benefits can only be achieved by choosing data science methods suited to this kind of problem.

Where should I look for customers?

A customer base is an important asset for any business. The data collected about customers allows communications to be better targeted and offers to be better tailored. However, a healthy business also needs a steady stream of new customers, and there is usually little or no data about them. Where should we look for new customers, and can data science help reach them?

The question posed above is best answered with an example. Some time ago, one company wanted to significantly expand the customer base buying its flagship product. Experience suggested that this product appealed to a completely different group of consumers than the company’s typical customer. An advertising campaign using billboards and flyers was planned. With a limited budget, however, the company did not want to “flood” the entire city and surrounding areas where it operates with materials. Instead, it intended to focus its efforts and budget on the locations most likely to deliver a high return on investment.

The first idea for using data to solve this problem was to look at where current customers purchasing the product came from. Their address data was in the database thanks to the loyalty program in place. The demographic and behavioral profile of customers buying the flagship item (the subject of the campaign) was analyzed. Compared to typical customers, this group showed an overrepresentation of the 30-35 age group by more than 10 percentage points, a higher proportion of men and higher income. The assumption was that areas with an above-average share of residents with these characteristics would be particularly attractive for the planned campaign. Areas (neighborhoods, districts, municipalities) were therefore selected on the basis of several data sources, including data made publicly available by the Central Statistical Office and data offered commercially by various private providers.
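The selection rule above (an above-average share of residents matching the customer profile) can be expressed as an affinity index, where 100 means "exactly the average share". A minimal sketch with made-up district figures; weighting the average by population and the exact definition of the target profile are assumptions, as the article gives neither.

```python
from typing import Dict, List, Tuple


def affinity_index(areas: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """areas: name -> (residents, residents matching the target profile).

    Index 100 = the area's share of target-profile residents equals the overall share.
    """
    total_residents = sum(residents for residents, _ in areas.values())
    total_target = sum(target for _, target in areas.values())
    overall_share = total_target / total_residents
    return {name: 100.0 * (target / residents) / overall_share
            for name, (residents, target) in areas.items()}


def above_average(areas: Dict[str, Tuple[int, int]]) -> List[str]:
    # Areas over-represented in the target profile, strongest first.
    index = affinity_index(areas)
    return sorted((name for name, value in index.items() if value > 100.0),
                  key=index.get, reverse=True)


areas = {"district_A": (1000, 300), "district_B": (2000, 200), "district_C": (1000, 100)}
print(above_average(areas))  # ['district_A']
```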

Because simply identifying locations of higher interest might not be enough, a more precise estimate of the sales potential of individual locations was needed. In short, the question was: how many sales can we count on? For this purpose, a predictive model was built that could indicate, for each area, the expected future sales in any defined period. The model used variables such as the age and gender structure of each district, household income, travel time to the service point and the purchasing behavior of existing customers in the area, among others. The model’s average prediction error stayed within ±6%. To illustrate the level of detail with which the model could pinpoint locations, the following table contains the definition of the top two recommendations of the predictive model.
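The article does not define how the ±6% error was measured. One common way to read such a figure is a signed percentage error per area together with its mean absolute value (MAPE); a minimal sketch under that assumption, with made-up numbers and illustrative area names:

```python
from typing import Dict


def percentage_errors(predicted: Dict[str, float], actual: Dict[str, float]) -> Dict[str, float]:
    # Signed error as % of actual sales: positive = the model over-predicted.
    return {area: 100.0 * (predicted[area] - actual[area]) / actual[area] for area in actual}


def mean_absolute_percentage_error(predicted: Dict[str, float],
                                   actual: Dict[str, float]) -> float:
    errors = percentage_errors(predicted, actual)
    return sum(abs(e) for e in errors.values()) / len(errors)


actual = {"area_1": 100.0, "area_2": 200.0}     # hypothetical observed sales
predicted = {"area_1": 105.0, "area_2": 190.0}  # hypothetical model output
print(percentage_errors(predicted, actual))               # {'area_1': 5.0, 'area_2': -5.0}
print(mean_absolute_percentage_error(predicted, actual))  # 5.0
```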

The areas with the greatest potential identified by the model were also visualized on maps (example of one below).

To assess the relevance of the model’s recommendations, the results of the activities carried out in the top 10 locations identified by the model were compared with those in the 10 locations ranked 11-20. The return on investment in the group recommended by the model was more than 21% higher than in the comparison group.
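The comparison above boils down to computing ROI for each group of locations and the relative difference between them. A minimal sketch, assuming the usual ROI definition and reading "21% higher" as a relative difference (the article states neither); the revenue and cost figures are made up.

```python
def roi(revenue: float, cost: float) -> float:
    # Assumption: ROI = (revenue attributable to the campaign - campaign cost) / campaign cost.
    return (revenue - cost) / cost


def relative_uplift(roi_test: float, roi_control: float) -> float:
    # Relative difference: 0.21 would mean "21% higher" than the comparison group.
    # Only meaningful when the comparison ROI is positive.
    return (roi_test - roi_control) / roi_control


# Hypothetical totals for the two groups of 10 locations.
roi_top10 = roi(revenue=330.0, cost=150.0)      # locations ranked 1-10
roi_rank11_20 = roi(revenue=300.0, cost=150.0)  # locations ranked 11-20
print(round(relative_uplift(roi_top10, roi_rank11_20), 3))  # 0.2
```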

Applied in the right way, data science that combines internal and external data sources with different levels of detail (individual customer data with aggregated data describing entire areas) can help solve a variety of business problems, and thus increase return on investment.