Proximity measure based on purchases. This is probably the most accurate and fair measure: if a person buys different products from the same category over a certain period, then with high probability these products are substitutes for that person.
Unfortunately, this approach is difficult to apply at an electronics retailer, because customers don't shop every day; in some categories they don't buy every month, or even every year. We don't buy a new refrigerator or a new TV every month. Even phones are usually replaced once a year at most. For this reason, a purchase-based proximity measure would work well for a grocery retailer, but not so well for an electronics retailer.
Proximity measure based on product properties. This is a fairly obvious measure of proximity. Take kettles, for example. They are made of glass, metal or plastic; there are ordinary kettles, and there are thermopots. Kettles can also be premium (priced very differently from the rest), come in various colors, and offer different functions such as keeping the water at a set temperature, heating without boiling, and so on.
The following conclusion then suggests itself: if two kettles are similar in properties, they cover the same need. It looks logical and simple: if I have two kettles that are both metal, the same size, with the same functionality and approximately the same price, and one has a higher margin than the other, then we can put one on the shelf instead of two and not lose anything. Reality, however, shows that we are wrong.
There is a popular belief that the more different products you put on the shelf (if the shelf allows), the higher the sales will be. People buy with their eyes. Kettles, even ones very similar in properties, may differ in appearance. Accordingly, a customer may like one kettle and not the other.
In addition, in some product categories manufacturers provide strong advertising support, so products that are very similar in properties can be worlds apart in the minds of buyers.
It is also important that product properties in the systems are filled in by people, who, as ever, tend to make mistakes. In other words, a similarity measure based on product properties is not ideal either. The question, however, is how bad it is. Probably its main drawback is that we cannot measure this exactly; we can only estimate it approximately. How we do that is described below.
Proximity measure based on website views. It is similar to the purchase-based measure, but built on views instead. The advantage of this measure is that it clearly correlates with customer behavior, unlike a property-based measure. With properties, we only assume that products with similar properties fulfill similar needs for the customer. Here, we observe it directly.
What are the disadvantages of this measure? The most obvious one is that page views correlate with a product's position in the site's listings. And what does that position depend on? On the popularity of the product (by sales) and on promo. All this, of course, introduces a bias. However, promo affects customer preferences in any case.
To estimate proximity based on site views, we need to build a metric between pairs of products that captures the logic "the more joint views, the smaller the distance between products." That is, in this case we are working with sets. The distance metrics of Jaccard and Hulet come to mind; we tried both and settled on Jaccard because it is simple and widely used. Its results are also easier to interpret for the business.
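As an illustration of the "more joint views, smaller distance" logic, here is a minimal sketch of the Jaccard distance over sets of views. It is not our production code: the product names, the session ids, the choice of a session as the set element, and the convention that two products without any views are at distance 1.0 are all assumptions made for the example.

```python
from itertools import combinations


def jaccard_distance(views_a, views_b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = views_a | views_b
    if not union:
        # Assumption: products with no views at all are treated as maximally distant.
        return 1.0
    return 1.0 - len(views_a & views_b) / len(union)


# Hypothetical data: product id -> set of session ids in which the product page was viewed.
views = {
    "kettle_a": {"s1", "s2", "s3", "s4"},
    "kettle_b": {"s2", "s3", "s4", "s5"},
    "tv_x": {"s5", "s6"},
}

# Distance for every unordered pair of products.
distances = {
    (p, q): jaccard_distance(views[p], views[q])
    for p, q in combinations(sorted(views), 2)
}

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 2))
```

In this toy data the two kettles share three of five sessions, so they end up much closer to each other than either is to the TV, which is the behavior we want from the metric.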