Making money on big data
Where is the profit in big data processing? And why doesn't the mathematics work by itself?
In business, innovations and technologies serve one pragmatic goal: to bring benefit, possibly through a long chain of effects. For example, a well-designed digital loyalty program brings customers back, and sales grow. Or an unusual feature in a mobile service sets the company apart from competitors, increasing the flow of customers and, again, sales.

The same is true for such an established term as big data. Is it worth assembling a team and investing in collecting, organizing, and using data if it doesn't pay off in the end? Certainly not. But not every output can be turned into a benefit.

Let's look at a few cases where automated processing of big data affected business performance, along with the nuances that should not be forgotten.
Defect prediction
This case concerns product defects at a large steel plant. Left unmanaged, defects cause significant losses and drive up costs.

Steel production is a multi-stage process. Before raw materials become finished products (rails, rods, and so on), the metal semi-finished product is heated and cooled in furnaces many times. The catch is that a defect, even one introduced at an early stage, only becomes visible at the very end, by which point a great deal of electricity has already been spent on every heating and cooling cycle.
So the plant decided to try predicting defects at an earlier stage using machine learning.
To control production, special equipment samples the quality of the metal at each stage, and a decision is made whether to continue production or reject the batch.

The big data team set up the process. Data began flowing into an Oracle database. For each measurement, a vector of about 50 parameters characterizing it was accumulated. The plant also had partial records of past defects. The team analyzed which parameters lead to which deviations in the final product and how they jointly affect the result.
A model was trained on all of this data and began to predict the probability of a defect. It remained to choose a probability threshold above which a product should be rejected.
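As a sketch of what such a pipeline can look like, here is a minimal example assuming scikit-learn and purely synthetic data in place of the plant's real 50 measurement parameters (which are not public):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real measurements: 50 parameters per sample,
# with a rare positive class playing the role of "defect".
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 50))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000) > 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# The model outputs a defect probability; the business picks the threshold.
proba = model.predict_proba(X_test)[:, 1]
threshold = 0.5
reject = proba > threshold
print(f"{reject.mean():.1%} of the batch flagged for rejection")
```

The interesting part, as the text notes, is not the classifier itself but where to set `threshold`, since that single number encodes the business trade-off.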

Two standard machine learning problems arise here. The first: defects need to be caught as fully and as early as possible in order to save energy. The second: as few good products as possible should be rejected, so that high-quality batches are not run through extra cycles for nothing.
This is where the team spent most of its effort: over about two months it tuned the model and improved forecast accuracy.

As a result, the prediction runs from the first stage of production, but it becomes genuinely useful from the second or third. The system receives the parameters of each stage immediately and, if something looks suspicious, warns the operator: "this batch may have picked up a defect".
Final result: 30% of defects are caught at a 5% false-positive rate.
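The "catch defects vs. avoid false alarms" tension is the familiar recall vs. false-positive-rate trade-off, and it is controlled entirely by the probability threshold. A toy illustration with made-up scores:

```python
import numpy as np

def recall_and_fpr(y_true, proba, threshold):
    """Share of real defects caught, and share of good items falsely flagged."""
    pred = proba >= threshold
    recall = pred[y_true == 1].mean()  # caught defects / all defects
    fpr = pred[y_true == 0].mean()     # false alarms / all good items
    return recall, fpr

# Toy scores: defects tend to receive higher probabilities than good items.
rng = np.random.default_rng(1)
y = np.array([1] * 100 + [0] * 900)
proba = np.concatenate([rng.beta(4, 2, 100), rng.beta(2, 4, 900)])

for t in (0.3, 0.5, 0.7):
    r, f = recall_and_fpr(y, proba, t)
    print(f"threshold={t}: recall={r:.0%}, false-positive rate={f:.0%}")
```

Lowering the threshold raises both numbers at once, which is exactly why tuning it took the team most of those two months.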

Of course, the closer to the finished product, the easier it is to give an accurate forecast, but the less money is saved. A compromise is needed: a middle ground with good coverage and high accuracy that is still not too close to the end of the production cycle, because by then there is little left to save and no point in extra intervention.
So far these processes have been implemented at the plant on a single production line. Keep in mind that such a project is expensive in itself, since not all lines have the necessary sensors, so the business case has to be weighed.
Predicting the propensity to buy
This case is a good illustration of optimization in retail sales.
Artificial intelligence solves a prediction problem based on historical data. First, customers who bought a given product are taken, and a model is built over pre-selected parameters that indicate a propensity to buy. Then the model is fed a list of customers who have not yet purchased that product, and it points to those most likely to buy it if it is offered to them.
The drawback of this approach is that a separate model has to be built for each product. For an online store with thousands of products, that is very costly. In addition, the model greatly narrows the target sample: we focus only on customers who potentially need the product.

This is where recommender systems come into play. Instead of hundreds of models, a customers-by-products matrix is built: each cell records whether a given customer bought a given product. Based on similar purchase patterns, new offers are made where the cells are empty. This is how online cinemas work, for example.
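A minimal sketch of the customers-by-products idea, using item-based collaborative filtering over a tiny made-up purchase matrix (production systems typically use matrix factorization at scale):

```python
import numpy as np

# Rows = customers, columns = products; 1 means "bought".
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Cosine similarity between product columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Score every product for every customer by similarity to what they already
# bought, then mask out products they have purchased.
scores = R @ sim
scores[R == 1] = -np.inf

for customer in range(R.shape[0]):
    print(f"customer {customer}: recommend product {scores[customer].argmax()}")
```

One matrix serves the whole catalog at once, which is precisely what removes the "one model per product" cost.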

The main advantage of a recommender system is not a higher conversion of prospects into buyers: both the per-product model and the recommender give roughly 10-15%. The advantage is reach, which grows by about 40%. Customers who need the product convert at 10-15%; those less inclined to buy convert at only 1-2%. But that 1-2% applies across the entire client base, so a single marketing offer reaches far more people.
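The reach argument is easy to check with back-of-the-envelope arithmetic; all the figures below are illustrative, not taken from a real campaign:

```python
# Illustrative client base of 100,000 people.
base = 100_000

# Per-product model: target only the ~10% who look inclined to buy,
# converting at ~12%.
targeted = int(base * 0.10)
buyers_model = targeted * 0.12

# Recommender: the inclined segment still converts at ~12%,
# while the rest of the base adds ~1.5% each.
buyers_recsys = targeted * 0.12 + (base - targeted) * 0.015

print(round(buyers_model), round(buyers_recsys))
```

Even at a tiny conversion rate, the long tail of the base contributes more buyers than the narrow targeted segment alone.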

An important nuance: the math cannot be considered in isolation from the business.

If I am an online retailer, all I need to do is attach a recommender system to the site so that it starts suggesting products. The customer clicks on them and, if interested, buys.

If I am an offline retailer, a bank, an insurance company, or a telecom operator, selling requires outgoing communication: calls, SMS, e-mail. And here we must admit that although the model itself gives a 10-15% lift, conversion is strongly influenced by the sales channel. If I advertise something and the customer has to make an effort to buy it (go to a store, attend an event, and so on), that creates a barrier and sharply lowers conversion. A customer may be very inclined to buy jeans; if the jeans are right at hand, that is one situation, but if they have to go somewhere to get them, the desire fades.

Modeling does not help here; what matters is the delivery process. With one, the model shows one level of efficiency; without it, a completely different one.
With remote sales by phone, I have never seen conversion above 2-3%. If the customer can be connected to something remotely (some kind of tariff, for example), total conversion from a call can reach 11-12%. If there is a delivery process for goods (say, a bank offers cards and delivers them), total conversion can reach 5%. In other words, this part depends even more on the sales business process than on the AI modeling.
Default prediction
This is one of the most in-demand tasks at banks and microfinance organizations that lend to individuals. The more accurately artificial intelligence predicts how likely a potential borrower is to fail to repay, the less the bank loses on defaults and the more it earns on conscientious clients.
Previously, banks used only personal data plus information from credit history bureaus (BKI). The latter was very simple: whether a person repaid past loans, how many loans they hold now and for what amounts, and what the monthly payments are. Now analytics has been layered on top of all this.

First, the bureaus themselves, having data on clients and default rates, began building their own models. Instead of dumping a barrage of raw information on the bank, they supply the output of their model as a single feature, which improves prediction accuracy on the bank's side. The bureaus now offer this as a service.
Then came the telecom operators, who began building scoring models on their side. An important point: the operator's scoring works anonymously, without disclosing personal data. It is pure machine learning: there are no hard-coded rules like "if ARPU is higher/lower, then…". The operator's system computes a score from the phone number and passes it to the bank.
Companies have also emerged that take social media data, turn it into features, and build models that likewise predict default.
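How a bank might fold these external scores into its own default model can be sketched roughly as follows (scikit-learn, with entirely made-up features and labels; real banks use far richer data and validation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000

# Hypothetical inputs: the bank's own data plus three external scores.
income = rng.normal(50, 15, n)
bureau_score = rng.uniform(0, 1, n)   # credit bureau's model output
telecom_score = rng.uniform(0, 1, n)  # operator's anonymous score by phone number
social_score = rng.uniform(0, 1, n)   # features derived from social media

# Synthetic default labels loosely tied to the external scores.
logit = -2 + 3 * (1 - bureau_score) + 1.5 * (1 - telecom_score)
default = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([income, bureau_score, telecom_score, social_score])
model = LogisticRegression(max_iter=1000).fit(X, default)

# Probability of default for a new applicant.
applicant = np.array([[45.0, 0.7, 0.6, 0.5]])
print(f"P(default) = {model.predict_proba(applicant)[0, 1]:.2f}")
```

The design point is that each external provider ships one aggregated number, so the bank's model stays small while still benefiting from data it could never collect itself.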

So, unlike a few years ago, the bank now receives aggregated scores from several sources. Together this raises the accuracy of default prediction by 5-7 percentage points, sometimes up to 10. Depending on the volume of business, this translates into millions. Note that these are percentage points, not percent: a 5% relative gain on 0.6 gives 0.63, while 5 percentage points on 0.6 gives 0.65, a noticeably larger improvement.

Big data has already passed its initial test run in business. Don't expect miracles from the technology, but with the right approach and a smart team it can increase profits and reduce losses: not severalfold, but by a tangible percentage.