Big Data in Retail
Hierarchical Tree of Needs as a Method of Assortment Formation in a Trade Network
Have you ever wondered why the shelves of any store, from electronics hypermarkets to grocery stores, carry exactly these products in exactly this order?

Category managers (and data scientists, by the way) think about it all the time. Today we will describe how an electronics hypermarket used artificial intelligence to fill its shelves and studied customer needs as part of assortment management.
Traditionally, category planning has been based on historical sales, expert assessment of market dynamics, and indirect data sources, combined with a forecast of future sales.
However, customers do not just buy goods; they cover certain needs with their purchases. Taking this into account can improve efficiency.
Of course, you want the shelf to hold the products that sell best in units and bring the most profit. So for each product you forecast unit sales and multiply the forecast by the margin per unit to get the expected profit.
Next, you sort all products by expected profit and put those with the highest values on the shelf. Shelf space is limited (in a physical store), and so is warehouse space (in an online store), so you have to choose.
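The ranking logic above can be illustrated with a minimal Python sketch. The field names (`forecast_units`, `unit_margin`), the function names, and all figures are hypothetical and chosen only for illustration; they are not taken from the retailer's system.

```python
def expected_profit(product):
    """Expected profit = forecast unit sales * margin per unit."""
    return product["forecast_units"] * product["unit_margin"]


def fill_shelf(products, shelf_slots):
    """Rank products by expected profit and keep as many as fit on the shelf."""
    ranked = sorted(products, key=expected_profit, reverse=True)
    return ranked[:shelf_slots]


# Hypothetical figures, for illustration only.
catalog = [
    {"name": "TV 1", "forecast_units": 100, "unit_margin": 50.0},
    {"name": "TV 2", "forecast_units": 90, "unit_margin": 55.0},
    {"name": "TV 3", "forecast_units": 60, "unit_margin": 70.0},
]

shelf = fill_shelf(catalog, shelf_slots=2)
print([p["name"] for p in shelf])
```

With these made-up numbers the two best-ranked models take both shelf slots and the third model is left out, which is exactly the behavior questioned in the rest of the article.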

What are the disadvantages of the above approach?
The first and foremost is that goods are not independent from the customer's point of view. What does this mean? Let's imagine that historically we have two TV models that sell well. They have the same diagonal, the same screen quality, and a comparable set of options. Both TVs fit on the store shelf. Note that we are talking about the shelf, that is, the place where the customer can see the product. We do not consider how many TVs are in stock.
Let's also imagine a third TV model that performs slightly worse than the first two. It does not necessarily sell fewer units; its margin may simply be lower. By the logic described above, this model will not get onto the shelf if the shelf can only accommodate two TVs.
However, one smart category manager really likes this third TV. He wants to put it on the shelf at all costs and asks: what if the first two TVs, although they sell well, are bought by the same customer group for the same purposes? In other words, they cover just one popular need: a customer who bought TV 1 would just as readily have bought TV 2, and vice versa. Random factors, like a coin toss, cause one customer to buy TV 1 and another to buy TV 2 (we are oversimplifying, of course). If this is the case, then only one of these two TVs needs to stay on the shelf (the one with the higher margin, for example, or the one that sells more units), and the third one can be placed next to it.
The economic effect from the first two TVs will not decrease, provided we keep the remaining model in stock, because it absorbs the demand for both. At the same time, the total result from all TV sales will increase, since to those sales we add sales of the third model, which come from new customers who have seen and bought it.
Understanding that needs exist is one thing; learning to identify them and use them in daily work is another. How can we check that products cover the same need? How can we automate the identification of such needs? How can we integrate this approach into a category manager's workflow?
The main tool for identifying customer needs is building a tree of customer preferences
To begin with, what matters to us is not so much the needs themselves as the assignment of our products to them. In other words, the result we are aiming for is a table that lists, for each product, the need it fills.
Mathematically speaking, what exactly is the problem we are solving? There are two options: a classification task or a clustering task. Classification means that we have a list of needs formed in some way and a subset of products already labeled with their needs, and we have to assign the remaining products to those needs.
Here the question arises: how do we form the initial list of needs and assign products to them if we do not know in advance which needs exist? The products themselves and their sales are our way to find out.
Accordingly, clustering suggests itself: all products must be split into a set of clusters, which will serve as the needs. Interpreting the needs is a secondary matter. What matters is whether a pair of products belongs to the same need or to different ones; what exactly those needs are is less important.
The first question is which product parameter to cluster on so that the result reflects needs. A product does not have, and cannot have, a parameter that reflects the need directly; otherwise the problem would already be solved. So we had to introduce such a parameter ourselves. Since products that cover the same need must be close to each other, we introduced a proximity measure for each pair of products that reflects whether they belong to the same need.

To cluster by needs, we need to define a proximity measure for products (the degree to which two products cover the same need).
With a proximity measure in hand, we can compute the matrix of distances between each pair of products and cluster on that matrix. It remains to figure out how to calculate the distances.
We have considered three approaches:
    • Based on purchases. If a customer buys different products from the same category over a certain period, then these products are probably equivalent to him. If we see similar behavior in a large number of customers, then such products cover a single need;

    • Based on the properties of the goods themselves. TVs with the same diagonal, the same matrix type, and the same price segment probably cover the same need. Add a few more properties, for example body color and stand type, and we seem to have a complete description of the product. All properties can be encoded as numbers and assembled into vectors (coordinates in the property space), and the distance between products suggests itself: the Euclidean distance between property vectors;

    • Based on shared product views on the retailer's website. Didn't expect that? Why not? Here we proceed from the hypothesis that if a customer views several products from the same category within one session, then these products are probably similar to him. If many customers do the same, then apparently these products have something in common.
      Let's take a look at each approach in turn.
      Proximity measure based on purchases. This is probably the most accurate and fair measure, because if a person buys different products from the same category over a certain period, then with high probability these products replace each other for him.
      Unfortunately, this approach is difficult to apply at an electronics retailer, because customers do not shop every day; in some categories they buy less often than once a month or even once a year. We do not buy a new refrigerator or a new TV every month. Even phones are usually replaced once a year at most. For this reason, a purchase-based proximity measure would work well for a grocery retailer, but not so well for an electronics retailer.
      Proximity measure based on product properties. This is a fairly obvious measure of proximity. Let's take kettles, for example. Kettles are made of glass, metal, or plastic; there are ordinary kettles, and there are thermopots. Kettles also come in premium versions (priced very differently from the rest), in various colors, and with various functions such as keeping the water warm or heating it without boiling.
      The following conclusion then suggests itself: if two kettles are similar in properties, they cover the same need. It seems very logical and simple: if we have two kettles that are both metal, the same size, with the same functionality and approximately the same price, and one has a higher margin than the other, then we can put one on the shelf instead of two and lose nothing. Reality, however, shows that this is wrong.
      There is a popular belief that the more different products you put on the shelf (if the shelf allows), the higher the sales. People buy with their eyes. Kettles, even ones very similar in properties, may differ in appearance. Accordingly, a customer may like one kettle and dislike another.
      In addition, some product categories get strong advertising support from manufacturers, so products that are very similar in properties can differ like heaven and earth in the minds of buyers.
      It is also important that product properties are entered into the systems by people, who still tend to make mistakes. In other words, a similarity measure based on product properties is not ideal either. The question, however, is how bad it is. Probably its main drawback is that we cannot measure this exactly, but we can estimate it approximately. We will describe how below.
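The property-based distance discussed above could be sketched as follows. This is only an illustration under stated assumptions: the property names (`volume_l`, `price`, `material`), the value ranges, and the encoding (min-max scaling for numeric properties, one-hot for categorical ones) are hypothetical choices, not the encoding used in the project.

```python
import math

# Hypothetical property schema for kettles.
NUMERIC = ["volume_l", "price"]
CATEGORICAL = {"material": ["glass", "metal", "plastic"]}


def to_vector(product, ranges):
    """Encode a product as a numeric vector in the property space."""
    vec = []
    for name in NUMERIC:
        low, high = ranges[name]
        vec.append((product[name] - low) / (high - low))  # scale to [0, 1]
    for name, levels in CATEGORICAL.items():
        # One-hot encoding: 1.0 for the matching level, 0.0 otherwise.
        vec.extend(1.0 if product[name] == level else 0.0 for level in levels)
    return vec


def property_distance(a, b, ranges):
    """Euclidean distance between the property vectors of two products."""
    return math.dist(to_vector(a, ranges), to_vector(b, ranges))


# Hypothetical products and value ranges, for illustration only.
ranges = {"volume_l": (1.0, 2.0), "price": (20.0, 120.0)}
k1 = {"material": "metal", "volume_l": 1.7, "price": 40.0}
k2 = {"material": "metal", "volume_l": 1.7, "price": 42.0}
k3 = {"material": "glass", "volume_l": 1.0, "price": 120.0}

print(property_distance(k1, k2, ranges) < property_distance(k1, k3, ranges))
```

Such a distance only compares recorded attributes. It knows nothing about appearance, advertising, or data-entry errors, which are the weaknesses listed above.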
      Proximity measure based on website views. It is similar to the purchase-based measure, but built on views instead of purchases. The advantage of this measure is that it clearly correlates with customer behavior, unlike a property-based measure. With properties, we only assume that products with similar properties fulfill similar needs for the customer. Here we observe it objectively.
      What are the disadvantages of this measure? The most obvious one is that page views correlate with a product's position in the site's listings. What does that position depend on? On the product's popularity (by sales) and on promotions. All this, of course, introduces a bias. However, promotions affect customer preferences in any case.
      To estimate proximity from site views, we need a metric between pairs of products that captures the logic "the more joint views, the smaller the distance between products." That is, in this case we are working with sets. The Jaccard and Hulet distance metrics come to mind. We tried both and settled on Jaccard because the metric is simple and widely used. In addition, its results are easier to interpret in business terms.
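A minimal sketch of a view-based Jaccard distance is given below. It assumes that each product is represented by the set of sessions in which it was viewed; this representation, the function names, and the sample sessions are assumptions made for illustration, since the article does not specify how the sets were built.

```python
from itertools import combinations


def jaccard_distance(sessions_a, sessions_b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of session ids."""
    union = sessions_a | sessions_b
    if not union:
        return 1.0  # no views at all: treat the products as unrelated
    return 1.0 - len(sessions_a & sessions_b) / len(union)


def view_distance_matrix(sessions):
    """Build pairwise distances from {session_id: set of viewed product ids}."""
    viewers = {}
    for session_id, products in sessions.items():
        for product in products:
            viewers.setdefault(product, set()).add(session_id)
    return {
        frozenset((a, b)): jaccard_distance(viewers[a], viewers[b])
        for a, b in combinations(sorted(viewers), 2)
    }


# Hypothetical browsing sessions, for illustration only.
sessions = {
    "s1": {"tv1", "tv2"},
    "s2": {"tv1", "tv2"},
    "s3": {"tv1", "tv3"},
    "s4": {"tv3"},
}

distances = view_distance_matrix(sessions)
for pair, d in distances.items():
    print(sorted(pair), round(d, 3))
```

Products that are often viewed in the same sessions end up close to each other, while products that are never viewed together get the maximum distance of 1.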
      We promised to describe how to evaluate how bad a property-based metric is. We understand that the view-based metric is not ideal, but it nevertheless correlates with customer needs. Thus, if the property-based metric had any merit, it would correlate at least partially with the view-based measure. Let's take a look at this correlation.

      As you can see, the correlation is at the level of white noise. From this we conclude that the property-based measure is definitely not suitable for us, although the properties themselves will still be useful (we will return to this later).
      So, we have defined a proximity measure for each pair of products. It allows us to build a distance matrix with products on both axes and, at each intersection, the view-based distance between the two products. It remains to do the clustering.
      An example of a distance matrix:
      Note that the distance matrix is quite sparse, as there are only a small number of "substitutes" for any particular product.
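The comparison between the property-based and the view-based measures could be sketched as follows. The article does not say which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the distance values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def metric_correlation(property_dist, view_dist):
    """Correlate two distance measures over the product pairs present in both."""
    pairs = [pair for pair in property_dist if pair in view_dist]
    return pearson(
        [property_dist[p] for p in pairs],
        [view_dist[p] for p in pairs],
    )


# Hypothetical distances for three product pairs, for illustration only.
property_dist = {("k1", "k2"): 0.1, ("k1", "k3"): 0.5, ("k2", "k3"): 0.9}
view_dist = {("k1", "k2"): 0.8, ("k1", "k3"): 0.3, ("k2", "k3"): 0.7}

r = metric_correlation(property_dist, view_dist)
print(round(r, 3))
```

A coefficient close to zero would mean that knowing how similar two products are by properties tells us almost nothing about how often they are viewed together.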
      Having determined the proximity measure, it's time to cluster by needs. That's what our tree is for.
      How will we do the clustering? Classical clustering will not suit us here: first, we do not know in advance how many clusters there will be, and second, there is no guarantee that all products are neatly grouped around a few centers.
      It is more likely that some products are similar to others, those are similar to yet others, and so on. That is, similarity is built through "intermediaries" rather than directly. Here, hierarchical clustering is more appropriate. We need to build a tree of related products and call the large branches of this tree needs. This is how we arrive at the tree of customer preferences (needs).
      Let's start by placing each product in its own cluster. Next, we merge products into pairs, minimizing the metric chosen above (Jaccard distance). After the first iteration we have two products in each cluster. How do we now merge these pairs into higher-level clusters?
      There are many ways (minimum distance, average distance, total distance, the Ward method, and others). However, given that the metric itself is not Euclidean and that the hierarchy is meant to have each product in a cluster represent a certain customer need, we settled on the "average" method.
      Two clusters are merged when the average distance between all their products is the smallest among all possible cluster pairs. The algorithm keeps running until we are left with one large cluster containing all products, building along the way a tree that partitions products by customer preferences (mini-clusters).
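The merging procedure described above can be written as a naive, pure-Python sketch of average-linkage agglomerative clustering. It is meant to show the idea on a handful of products, not to be an efficient implementation; the function name, the input format, and the distances are hypothetical.

```python
from itertools import combinations


def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage.

    labels: product ids.
    dist: {frozenset({a, b}): distance} for every pair of products.
    Returns the merges as (cluster_a, cluster_b, average_distance),
    in the order in which they happen.
    """
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average distance over all cross-cluster product pairs.
            total = sum(dist[frozenset((x, y))] for x in a for y in b)
            avg = total / (len(a) * len(b))
            if best is None or avg < best[2]:
                best = (a, b, avg)
        a, b, avg = best
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
        merges.append((a, b, avg))
    return merges


# Hypothetical view-based distances between four products.
labels = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.2,
    frozenset(("C", "D")): 0.3,
    frozenset(("A", "C")): 0.9,
    frozenset(("A", "D")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("B", "D")): 1.0,
}

for a, b, avg in average_linkage(labels, dist):
    print(sorted(a), sorted(b), round(avg, 3))
```

The list of merges is the tree: each merge is a node whose children are the two merged clusters, and the merge distance shows how loosely the branch is held together. For real catalogs a library implementation of hierarchical clustering would be the more practical choice.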
      So does the clustering simply work out of the box? What about tuning the algorithm's parameters? What about interpreting (labeling) the result?
      So, what is needed to build a complete tree? As with any clustering, we have to iterate over the parameters to obtain the optimal tree. Let's take a look at several trees for the same category and see how they differ.
      This is the tree we got, for example, after the first calculation for the Refrigerators subcategory:
      With the initial parameters, we see that several branches are too "smeared" horizontally (most likely they include SKUs with few views and low sales indicators), so we continue to iterate over the parameters to collapse them into one cluster. Let's run the calculation once more.
      The tree has become noticeably better, but there are still some very scattered branches.
      Let's do it one more time.
      It is important not only to perform the clustering, but also to give the result a competent business interpretation. No one does this better than the category manager (CM), so we had to create a dedicated visual tool for this labeling.
      Let's see what labeling a customer preference tree looks like from the category manager's point of view, using the refrigerator tree we obtained as an example.
      When labeling a tree, the CM moves from left to right, opening clusters one after another. From here on, all figures and descriptions correspond to our example "Embedded technology".
      The user navigates the tree. In the table on the right, you can see the products in the selected cluster. In the figure, the selected cluster (the one the arrows start from) contains different product groups, so both child clusters must be examined.
      On inspection, it turned out that the top child cluster contains only Built-In Washing Machines (table on the right).
      Moving further along the tree, we found a cluster of four products with similar characteristics in the same price category.
      This cluster is suitable for creating a need. After the need is created, the tree looks like this:
      So, we have obtained labeled customer needs for the "Refrigerators" group and will continue to use this tree to optimize the assortment in both physical and online stores. Read about how we do it in the following publications. For now, let's briefly dwell on the new role of the category manager in light of the changes in approach outlined above.
      The new role of the category manager
      Building a CDT (Customer Decision Tree) greatly simplifies the CM's work: based on the existing labeling and clustering, they can identify individual groups of products and form an assortment around the actual needs of customers.
      Analytics driven by real data helps to make decisions faster, avoid misinterpreting indirect customer data, and get rid of irregular and distorted survey metrics. For example, after implementing the system we were immediately convinced that product properties may show no correlation with customer choice at all. That is, people do not necessarily compare kettles of the same size and brand. Buyers proceed from starting points of their own, and the CDT helps to find them.
      Of course, the assortment is managed within a certain category (kitchen products, digital products, and so on). But building it requires a product hierarchy, both expert and even technical. What we are describing today is category managers receiving already structured information "on the fly".
      For example, the CDT may indicate that a particular customer need is fading. A month ago users wanted to buy backlit computer mice, but today they no longer do. As a result, the manager can deprioritize this need when building the assortment. We are currently working on automating such decisions in the system and reducing the amount of logic the CM has to keep in mind, thereby freeing up an invaluable human resource for the more complex and creative tasks that are so relevant in modern retail.
      This article was prepared by Data Studio together with M.Video-Eldorado, the company where the team implemented the assortment matrix project.