Notes on Machine Learning Systems Design Course (CS 329s) Lecture 2

In this post, I would like to share some takeaways I had while I was studying the Lecture 2 of CS329S course . The topic of this lecture is to review the ML and Data Systems fundementals.

ML System as an Iterative Process

Deploying ML systems in production is an iterative process. I think this is quite obvious since in today’s world, most of the underlying business concepts and data is constantly evolving. As a result, the deployed ML system should be able cope with these changes by means of establishing robust data pipelines, using most relevant features, training a model in parallel with the selected business metric and monitoring the model in production for identifying any drift between training and production environments.

Choosing the right framing

It is very important to find a correct framing to the problem at hand. In some cases, a problem that looks like a classification task can be converted into a regression problem which can simplify the overall system development (with less training of course!).

For example, if we were to predict the next mobile app that will be downloaded by a user given the features, we can easily frame it as a classification problem where each available app will act as a class to be predicted. However, classification task could be costly to maintain because the model needs to be retrained whenever an app is introduced or removed. However, if the problem can be framed as a regression problem, then you can get a scalar output from the model that will measure how likely the user will download te app given the features. In this case, you won’t need to retrain your model for every change in app marketplace.

ML Objectives vs Business Objectives

It might not be obvious from start, but the objectives that you set for your ML solution might contradict with the business objectives set by the management. These contradictions can occur in different aspects such as you might aim for a fast inference time with SOTA deep learning based solution (requires powerful GPU clusters) but this can contradict with the cost reduction goals of the business. Another aspect can be the success of the ML solution might have adverse effects on other business metrics. As an example, a ML system that helps the users to find relevant products faster on the website can lead to decreased advertisement income due to users spending less time on the website. I think this fact is closely related to the first fact that you should continuously monitor your ML system in production and relevant business metrics in order to optimize your solution iteratively and if needed pivot the optimization objectives.

Rethinking multi objective optimization

In some cases, you might want to do multi objective optimization such as ranking the user posts by quality and engagement. While developing your ML solution, you can choose to optimize your ML model based on the combined loss of these objectives. However, if you would want to change the weights of the individual loss components, you will need to retrain your model. Alternatively, you can train two seperate models for each objective and adjust the weights of the model outputs according to your needs. Frankly, I am not very convinced with the alternative solution which is training/maintaining multiple models for each objective because it can be equally costly to maintain, monitor and evaluate multiple models once the objectives start to increase. I think it is still an open question to how to approach multi objective optimization problems in an effective manner.