Notes on Machine Learning Systems Design Course (CS 329s) Lecture 1

Recently, I have been following the ML Systems design course from Stanford University. I highly recommend following the course if you are interested in working with production level ML systems. A big thanks to instructor Chip Huyen for preparing this course.

In this post, I would like to share some takeaways I had while studying the Lecture 1 of the course . The topic of this lecture is understanding ML production.

Enterprise vs Consumer Applications

Enterprise ML applications seem to have more impact for companies compared to consumer applications. A really good example for this is given in the lecture notes is that if you improve a speech recognition system’s accuracy from 95% to 95.5%, the difference will be almost unnoticable for the users. However, if you can improve an efficiency of an enterprise process by just 0.1% it can make a huge difference for a company in terms of cost savings.

Model is just a small part of the system

While designing ML systems for production, it is important to know that the actual ML model is just a cog in the machine. There are various other parts that need to be in sync with the model so that the whole system can generate business value. Some of these parts are data pipelines, APIs, model monitoring etc. In a typical ML system development, 90% of the effort is spent on these parts other than the model itself.

Data is the key for current ML solutions

With the current ML algorithms we have at hand, the data used for training makes the biggest difference. That’s why it is important to leverage relevant data for the problem that you want to solve. In the future, we can come up with smarter algorithms/models that will decrease the dependency to the data ( e.g. causal models that can learn the underlying cause/effect relationships between events).

Fairness

It is important to emphasize that current ML models in production are not optimized for any type of fairness metrics. Thus, many people including ourselves could be affected by these biased algorithms that do credit risk scoring, resume analysis, visa application etc. Unfortunately, most of the companies do not tend to mitigate these kinds of bias and discrimination due to cost concerns.

ML Systems vs Traditional Software

ML system development cycle is considerably different than traditional Software Development cycle. In the traditional Software Development, the focus is on the code itself. However, in ML systems, both the code, model and the data should be tracked simultaneously. Currently, code versioning has reached to some level of maturity but model and data versioning is still in its early steps. In terms of data versioning, there are many things to consider such as how to assign a value to each data point, how to deal with data poisoning or how to decide adding/removing samples etc.