How to determine the value and efficacy of machine learning
Machine learning (ML) is a major trend. Countless blog posts cover how to create various models, tune them, and apply them in IT applications. However, these articles rarely address the prerequisites for making those models production-worthy, or even possible. What I want to touch upon is everything that revolves around the models. Similar to microservices, the concept is attractive but only works when the surrounding capabilities are mature enough (e.g., being able to provision required resources quickly).
Companies venturing into the ML world often misjudge the scope and breadth of what must already be in place for an ML model to perform successfully in production. Playing with sandbox data in isolated environments can yield insights, but it won't deliver value on its own. ML models need to be rolled out into production and used to provide value for teams and customers. And after rolling them out, you have to verify that you are delivering that value by monitoring the performance of your models over time.
In this article, I'll explore the capabilities that should be in place to leverage ML models in an automated fashion and how to use them in production environments. I will also present prerequisites for successful production deployment and operations.
Identify prerequisites for machine learning
Many companies experiment with ML to figure out whether it can bring competitive advantages or add value for their clients. They tend to start ML initiatives, create models, and mine some insights. When the point comes to incorporate those models into products or processes, they realize it's impossible because the prerequisites are not in place. The time and money already spent on the models are never recovered.
Gartner predicts that through 2022, 85 percent of AI projects will deliver erroneous outcomes due to bias in data, algorithms, or the teams responsible for managing them.
This prediction is why we should be thinking about the plumbing as soon as we decide to embark on this endeavor. We need to make sure all the prerequisites are mapped out and that we understand the full scope of the journey.
The importance of a CI/CD pipeline
When discussing prerequisites for developing software products, most engineers agree that a functioning/automated CI/CD pipeline is a must. Creating the software itself is not enough; it needs to be tested, verified, and repeatably deployed to production environments.
The same applies to ML. After all, the model IS the code. However, there are additional requirements beyond code: the model requires data as an input while it is being created, which means a traditional CI/CD pipeline has to be extended for ML. Let's take a closer look at the ML landscape and identify the items "serving" ML models.
The ML model itself is just a fraction of all the items that service it. In most cases, it is the part that gets the most attention, even though the other parts are larger in scope and more difficult to integrate properly into the operational environment.
As an enhancement, take a traditional code CI/CD pipeline and add data elements to support the ML model build and monitoring.
Building a fully functional CI/CD pipeline for software alone is a challenge in itself, and a lot of companies are still struggling to get it working. Adding the data part of ML complicates matters further. It would be easier if data were clean, well structured, and static for years. In the real world, however, data is messy, comes in various structures from various sources, arrives out of band, and changes over time. Another complication is that it is not just the structure (schema) that changes but also the bias of the data (its distribution). In other words, ML CI/CD must account for all of these changes, and it becomes coupled with the process of how the company ingests, cleanses, transforms, and stores this data.
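As a minimal illustration of one such extra stage, a schema check could gate the pipeline before (re)training. The column names, types, and file path below are purely hypothetical; the point is that data assumptions become explicit, testable pipeline steps.

```python
# A minimal sketch of a schema check that could run as an extra CI/CD stage
# before (re)training. Column names, types, and the input path are illustrative.
import pandas as pd

EXPECTED_SCHEMA = {
    "customer_id": "int64",
    "purchase_amount": "float64",
    "country": "object",
}

def validate_schema(df: pd.DataFrame, expected: dict = EXPECTED_SCHEMA) -> list:
    """Return a list of human-readable schema problems (an empty list means OK)."""
    problems = []
    for column, dtype in expected.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            problems.append(f"{column}: expected {dtype}, got {df[column].dtype}")
    return problems

if __name__ == "__main__":
    batch = pd.read_parquet("latest_batch.parquet")  # hypothetical input artifact
    issues = validate_schema(batch)
    if issues:
        raise SystemExit("Schema validation failed: " + "; ".join(issues))
```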
A trained model's efficiency normally starts degrading as soon as a couple of weeks after deployment (depending on the type of data used). If there are no measures in place to monitor this degradation (or drift), you will not know when you have to retrain your models, which leads to incorrect predictions and, at the same time, loses your customers' trust. Model performance monitoring usually comes as an afterthought, after the model is already live.
Even once you have detected that your model is no longer correct, retraining it is not straightforward. You need to retrain the model with the new or changed data, which forces you to have a means of building new training data sets continuously and automatically.
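As a rough sketch of how such drift monitoring might look, the distribution of live model scores can be compared against the distribution recorded at training time. Here the population stability index (PSI) is used; the file names and the 0.2 alert threshold are illustrative assumptions, not part of any specific product.

```python
# A rough sketch of drift detection using the population stability index (PSI).
# The baseline comes from training time; 0.2 is a common rule-of-thumb alert level.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Compare two samples of a numeric feature or model score."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero / log(0) with a small floor value.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

training_scores = np.load("training_scores.npy")  # hypothetical training-time artifact
live_scores = np.load("last_week_scores.npy")     # hypothetical scores from production

if psi(training_scores, live_scores) > 0.2:
    print("Significant drift detected; schedule retraining")
```

The same comparison can be run per input feature to catch changes in the data before they show up in the model's output.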
To effectively apply ML in production, companies need a clear process for how data gets in and gets cleaned, how ML models are built, and how code is deployed and monitored throughout its lifetime. Basically, ML + CI/CD.
Determining must-have capabilities
A clear process for how data flows is essential to successfully productize ML. To build those flows, you need certain capabilities in place. Those capabilities can be in various states of automation (maturity), with some items fully automated and others fully manual. The level of automation essentially determines how quickly you can rerun your full process to fix possible issues.
There are core items you should consider before productizing ML. One aspect to keep in mind when evaluating capabilities is that each should serve ML and benefit the whole organization (i.e., accessible data sources, data correctness, working automation pipelines, automated testing, and monitoring all benefit product delivery as a whole).
The following four elements are critical capabilities for effective machine learning automation.
1. Data abundance
Data is a core resource required to create an ML model. In other words, there needs to be an easy way to access and consume data, independently of where it comes from and what shape it is in. Larger organizations usually have a problem with data silos, where data is spread around different departments in different shapes, and access to it is limited to the individual department's systems.
Organizations tend to resort to data warehousing to consolidate the data in a single place but then get stuck in a phase of trying to define a canonical data model. Even if this phase succeeds, the solution for bringing all the data together becomes a maze of ETL processes that carries huge maintenance costs and demands continuous attention. Because of this complexity, new data sources are added very slowly, which in turn spawns new ETL jobs to consolidate them, adding further complexity.
To enable proper data abundance while at the same time making sure the data is secure and handed out only to appropriate applications, the data has to be properly split out and served by its sole owner (think microservices, where data is split per business domain and aggregates are built from the same single source). This cannot be just an ML application's responsibility; it is a strategic decision for the whole organization on how to handle data and create this capability (sometimes called a data-driven organization).
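To make that idea concrete, here is a small, hypothetical sketch of consuming data through a domain-owned service API instead of reading another team's tables directly. The endpoint, response shape, and auth scheme are assumptions for illustration only.

```python
# A minimal sketch of pulling data from a domain-owned service rather than
# reaching into another team's database. Endpoint and response shape are hypothetical.
import requests

def fetch_orders(customer_id, token):
    """Ask the owning 'orders' domain service for data instead of copying its tables."""
    response = requests.get(
        "https://orders.internal.example.com/api/v1/orders",  # hypothetical endpoint
        params={"customer_id": customer_id},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["orders"]  # assumed response field
```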
2. Data correctness
Data scientists know that correct data is of utmost importance for training proper ML models. Sometimes even a small amount of dirty data can lead to a completely wrong model, and tracking down such problems is not easy and consumes significant resources. Usually, ML projects start as a sandbox POC where data scientists import data and then work on cleansing and preparing it for model training. In most cases, you would also want that same cleansing to be applied to all incoming data, because in production you would expect the model to run on data post-processed to the same level of correctness as in the POC for the results to be correct.
It is therefore very important to have a means of adding those cleaning steps to the whole data ingestion pipeline (before the data reaches data scientists). The worst thing that can happen when decisions are made on incorrect (dirty) data is that people lose trust in those decisions. Even after fixing the data issues, it becomes very difficult to convince people that from now on the data will be correct.
The figure below gives an idea of how a dirty or poorly selected sample set impacts modeling results.
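One way to keep the POC and the ingestion pipeline aligned is to share the same cleansing code between them. Below is a minimal sketch assuming a tabular pandas pipeline; the specific rules (dropping duplicates, requiring an ID, clipping negative amounts) are only examples of what "cleaning" might mean for a given dataset.

```python
# Example cleansing rules shared by the ingestion pipeline and the POC notebook.
# Field names and rules are illustrative assumptions.
import pandas as pd

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleansing in production ingestion as in the POC."""
    df = df.drop_duplicates()
    df = df.dropna(subset=["customer_id"])                         # hypothetical required field
    df["purchase_amount"] = df["purchase_amount"].clip(lower=0)    # no negative purchases
    return df
```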
3. Data flexibility
When running systems in dynamic environments, the ingested data will change and evolve: both the schema (fields appearing, changing, disappearing) and the bias of the values in a field change over time. Problems usually arise when different systems are tightly coupled via a canonical data model with multiple replications. This results in a very rigid data pipeline, where it becomes very difficult to change the schema or add fields, since every change has to be propagated to downstream systems. That rigidity prevents new data from being used in models to improve decision making or to harvest additional insights.
The data ingestion pipeline needs to be flexible enough to ingest additional data or accept schema changes quickly, and there should be an easy way to branch off part of the data for experimentation without impacting systems downstream. Creating efficient and flexible data ingestion requires commitment from the whole organization. While a difficult undertaking initially, it benefits more than just the ML people: the effort reshapes the whole infrastructure and data ingestion workflows and opens up capabilities such as adding new data sources, new uses of the data, reusable cleansing, and more.
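A small sketch of what schema tolerance can look like at the record level: unknown fields are kept aside for experimentation instead of breaking the pipeline or being silently dropped. The field names and defaults are illustrative assumptions.

```python
# A rough sketch of schema-tolerant ingestion: unknown fields are preserved in an
# "extras" bag so they can be branched off for experiments later.
KNOWN_FIELDS = {"customer_id": None, "purchase_amount": 0.0, "country": "unknown"}

def normalize(record: dict):
    """Split a raw record into the known part and an extras bag of new fields."""
    known = {field: record.get(field, default) for field, default in KNOWN_FIELDS.items()}
    extras = {key: value for key, value in record.items() if key not in KNOWN_FIELDS}
    return known, extras

known, extras = normalize({"customer_id": 42, "purchase_amount": 9.99, "loyalty_tier": "gold"})
# "loyalty_tier" is a new field: it ends up in extras rather than failing ingestion.
```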
4. Monitoring
In addition to the standard monitoring you would do for regular software, the ML system should have the capability to record metrics produced by your model. The efficiency values measured during training should be monitored in the production environment, and the system should fire alerts when thresholds are crossed. Neglecting to capture this information risks undetected system degradation (like false predictions).
The decisions the model makes are typically not intuitive and involve complex calculations, so quick verification by humans is nearly impossible. Most of the time, the data being ingested also changes over time. Even if the change is not structural, the value bias can shift (e.g., people might prefer product 1 now, but after a few weeks their preference morphs into product 2).
It is important to know quickly how the model is behaving. You should have a model performance monitoring capability that tells you whether the decisions being made are still as effective as calculated in training. You should also monitor whether the incoming data has properties similar to what the model was trained on (i.e., whether a new value for a field has appeared that is unaccounted for in the model).
Alongside monitoring and alerting, you need data recording and sampling, and the software consuming the model's predictions should be able to record the outcomes easily (e.g., through manual annotation of results).
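As one possible sketch, assuming a Prometheus-style metrics stack, the serving code could expose the score distribution and a counter of inputs carrying values the model never saw in training; alert rules can then fire on those metrics. The metric names, known-value set, and sklearn-style model interface are illustrative assumptions.

```python
# A sketch of model-serving metrics using the prometheus_client library.
# Metric names, the known-country set, and the model interface are assumptions.
from prometheus_client import Counter, Histogram, start_http_server

PREDICTION_SCORE = Histogram("model_prediction_score", "Distribution of model scores")
UNSEEN_CATEGORY = Counter("model_unseen_category_total",
                          "Inputs with values the model never saw in training")

KNOWN_COUNTRIES = {"DE", "FR", "LT"}  # captured from the training set

def predict_and_record(model, feature_vector, country):
    """Record the score and flag inputs the model never saw in training."""
    if country not in KNOWN_COUNTRIES:
        UNSEEN_CATEGORY.inc()
    score = float(model.predict_proba([feature_vector])[0][1])  # sklearn-style model assumed
    PREDICTION_SCORE.observe(score)
    return score

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for scraping
```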
Summing up capabilities
In general, a proper ML pipeline requires the ML model creation capability plus well-functioning software delivery practices and advanced data processing capabilities. Before you dive into productizing ML models, identify the needs and your existing capabilities. Keep in mind that most of the required capabilities not only serve the ML piece but also benefit the whole organization.
Building user trust
An important factor to keep in mind is that ML model results are typically consumed by people, so the human element comes into play and is critical. Because model predictions might not always be intuitive and verifying those decisions is not a trivial task, it is very easy to lose trust in your system (even when it performs by the numbers). Once trust is lost, people tend to ignore the decisions (good or bad), which can render your system useless.
To prevent this erosion of trust, there should be a clear understanding of how decisions are made, supported by:
Educating users
Providing core metrics to clarify decisions
Shadow mode (a new model runs in parallel with the old system so its decisions can be compared and verified; a sketch follows below)
Canary releases (where a new model rolls out to a subset of people to help verify decisions)
Note: User trust and validation require additional product delivery capabilities.
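A minimal sketch of shadow mode, assuming both models expose the same predict interface: the old model's decision is what users see, while the new model's decision is only logged for offline comparison. The logging target is an assumption.

```python
# A minimal shadow-mode sketch: serve the old model, log the new one for comparison.
import logging

logger = logging.getLogger("shadow")

def predict_with_shadow(old_model, new_model, features):
    """Return the old model's decision; record the new model's for offline review."""
    decision = old_model.predict([features])[0]
    try:
        shadow_decision = new_model.predict([features])[0]
        logger.info("shadow comparison: old=%s new=%s features=%s",
                    decision, shadow_decision, features)
    except Exception:
        logger.exception("shadow model failed; users are unaffected")
    return decision
```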
Summing up
ML, on its own, does not solve the problem; multiple capabilities working in tandem do. A lot of companies still struggle to establish a proper product delivery life cycle, and ML adds even more requirements on top. The good news is that most of the capabilities ML needs benefit your organization whether or not teams use ML. In addition to CI/CD, be sure to have proper capabilities in:
Data acquisition/abundance
Data correctness
Data flexibility
Monitoring
User trust
Having this arsenal of capabilities at your disposal allows you to iterate quickly on model creation and run your models efficiently in production.