Engineering Lab #1 — TEAM 3: “When PyTorch meets MLFlow” for Review Classification
Last week, Artyom Yushkovskyi of Neu.ro’s MLOps engineering team joined us at the regular MLOps coffee sessions with MLOps.Community. His team recently implemented a successful sentiment analysis solution for a large public dataset of restaurant reviews. Using an NLP approach, they were able to automate the classification of all such reviews as either positive or negative. Here are the key takeaways from this session (original article by the MLOps community here).
Team 3 participants:
- Artem Yushkovsky (@artemlops): MLOps Engineer @ Neu.ro
- Paulo Maia (@paulomaia20): DS @ NILG.AI
- Dimitrios Mangonakis (@dmangonakis): MLE @ Big 4
- Laszlo Sragner (@xLaszlo): Founder @ Hypergolic
Project: Summarising What We Did
The initial task definition was quite open: each team was required to develop an ML solution using PyTorch for model training and MLflow for model tracking. The team members all had varying depths of knowledge across different areas of Machine Learning, from Data Science and the underlying math to infrastructure and ML tooling, including DS project management and enterprise system architecture. So, the most difficult problem for us was to choose a dataset 😜. In the end, we chose the Yelp Review dataset to train an NLP model that classifies the provided texts as either positive or negative reviews. The data included reviews of restaurants, museums, hospitals, etc., and the number of stars associated with each review (0–5). We modelled this task as a binary classification problem: determining whether a review was positive (>=3 stars) or negative (otherwise).
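As a rough illustration of that labelling step, here is a minimal sketch; the column names (`text`, `stars`) and the use of pandas are assumptions for illustration, not the team’s actual preprocessing code:

```python
import pandas as pd

def label_reviews(df: pd.DataFrame, threshold: int = 3) -> pd.DataFrame:
    """Map star ratings to binary sentiment labels (1 = positive, 0 = negative)."""
    out = df[["text", "stars"]].copy()
    # A review counts as positive if it has at least `threshold` stars.
    out["label"] = (out["stars"] >= threshold).astype(int)
    return out

# Tiny illustrative example (not real Yelp data).
reviews = pd.DataFrame(
    {"text": ["Great food!", "Terrible service."], "stars": [5, 1]}
)
print(label_reviews(reviews))
```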
Figure 2. Metrics of Our Model in the MLflow Experiment Tracking UI
😎 From an MLOps perspective, there were several stages in the project’s evolution. First, we came up with a way of deploying the MLflow server on GCP and exposing it publicly. We also developed a nice Web UI where the user can write a review text, specify whether they consider the review to be positive or not, and then get the model’s response along with statistics over all past requests. Having a Web UI talking to the model via a REST API allowed us to decouple the front end and back end and parallelise development. In addition, to decouple the logic of collecting model inference statistics in a database from the inference itself, we decided to implement a Model Proxy service with database access and a Model Server exposing the model via a REST API. Thus, the Model Server could be seamlessly upgraded and replicated, if necessary. For automatic model upgrades, we implemented another service called Model Operator, which constantly polls the state of the model registry in MLflow and, if the model version tagged for release has changed, automatically re-deploys the Model Server.
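To make the Model Operator idea concrete, here is a minimal sketch of such a polling loop; the registered model name, the poll interval, and the `redeploy_model_server` helper are hypothetical placeholders, not the team’s actual implementation:

```python
import time
from typing import Optional

from mlflow.tracking import MlflowClient

MODEL_NAME = "yelp-sentiment"   # assumed registered model name
POLL_INTERVAL_SECONDS = 30

def get_production_version(client: MlflowClient) -> Optional[str]:
    # Return the model version currently in the "Production" stage, if any.
    versions = client.get_latest_versions(MODEL_NAME, stages=["Production"])
    return versions[0].version if versions else None

def redeploy_model_server(version: str) -> None:
    # Placeholder: the real operator would patch the Model Server deployment
    # in Kubernetes so it picks up the new model version.
    print(f"Re-deploying Model Server with model version {version}")

def main() -> None:
    client = MlflowClient()     # reads MLFLOW_TRACKING_URI from the environment
    deployed: Optional[str] = None
    while True:
        current = get_production_version(client)
        if current is not None and current != deployed:
            redeploy_model_server(current)
            deployed = current
        time.sleep(POLL_INTERVAL_SECONDS)

if __name__ == "__main__":
    main()
```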
😊 So, in the end we managed to build a pipeline with the following properties:
- partial reproducibility: a manually triggered model training pipeline running in a remote environment,
- model tracking: all model training metadata and artifacts are stored in an MLflow model registry deployed in GCP and exposed to the outside world,
- model serving: a horizontally scalable REST API microservice for model inference, fronted by a REST API proxy microservice that stores and serves some inference metadata (a minimal serving sketch follows this list),
- automatic model deployment: the model server gets automatically re-deployed once the user changes a model’s tag in the MLflow model registry.
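To illustrate what the Model Server side of this could look like, here is a minimal sketch; FastAPI, the registered model name, and the input/output format are assumptions for illustration rather than the team’s actual code:

```python
# Minimal Model Server sketch. FastAPI and the model name below are assumptions;
# the team's actual serving code may differ.
import mlflow.pyfunc
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the Production-stage model from the MLflow registry at startup.
model = mlflow.pyfunc.load_model("models:/yelp-sentiment/Production")

class ReviewRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: ReviewRequest) -> dict:
    # We assume the logged model accepts a list of strings and returns 0/1 labels.
    prediction = model.predict([request.text])[0]
    return {"label": "positive" if int(prediction) == 1 else "negative"}
```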
😢 Unfortunately, we didn’t have time to completely finish the model development cycle. Namely, we didn’t implement:
- immutable training environment: the training Docker image is built once and reused everywhere,
- code versioning: we used a snapshot of the code, without involving a VCS,
- data versioning: we used a dataset snapshot,
- model lineage: this can only be implemented on top of code and data versioning,
- GitOps: automatically re-training the model once any input (code, data, or parameters) has changed,
- model testing before deployment,
- model monitoring and alerts (no hardware metrics, health checks, or data drift detection),
- fancy ML tools (hyperparameter tuning, model explainability tools, etc.),
- business logic features required for production (HTTPS, authentication & authorization, etc.).
Technical takeaways
Though PyTorch (and PyTorch Lightning) is great and has tons of tutorials and examples, pickling Deep Learning models is still a pain. You need to dance around it for a while to save and load the model. We hope that the world will eventually converge on a standardised solution with an easy UX for this process.
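As one illustration of that dance, the common workaround is to avoid pickling the whole module and instead save only the `state_dict`, re-instantiating the model class on load; the `ReviewClassifier` below is a made-up placeholder, not the team’s actual architecture:

```python
import torch
import torch.nn as nn

class ReviewClassifier(nn.Module):
    # Placeholder architecture for illustration only.
    def __init__(self, vocab_size: int = 10_000, embed_dim: int = 64):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, 2)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.embedding(token_ids, offsets))

model = ReviewClassifier()

# torch.save(model, ...) pickles the whole object and breaks as soon as the class
# moves or its code changes; saving only the state_dict is the safer pattern.
torch.save(model.state_dict(), "review_classifier.pt")

# Loading requires re-creating the class with the same hyperparameters.
restored = ReviewClassifier()
restored.load_state_dict(torch.load("review_classifier.pt"))
restored.eval()
```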
MLflow is an awesome tool for tracking your model development progress and storing model artifacts (a minimal tracking sketch follows the list below).
- It can be easily deployed in Kubernetes and has a nice minimalistic and intuitive interface.
- However, we couldn’t find any good solution for authentication and role-based access control, so this fell out of the project scope.
- We also found MLflow Model Serving too difficult to get running in a few hours, mostly because of the lack of clear documentation.
- In addition, we were surprised that we couldn’t find a solution for automatically deploying the model that gets the “Production” tag in the MLflow UI. Deploying models directly from the MLflow Server dashboard could be a viable pattern and a good addition to MLflow’s core functionality.
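For reference, here is roughly how little code the tracking side takes; the tracking URI, experiment name, and logged values below are illustrative assumptions, not our real results:

```python
import mlflow

# Point the client at the (publicly exposed) MLflow server; the URL is illustrative.
mlflow.set_tracking_uri("http://mlflow.example.com")
mlflow.set_experiment("yelp-review-classification")

with mlflow.start_run():
    # Parameters and metrics show up in the MLflow Experiment Tracking UI.
    mlflow.log_param("lr", 1e-3)
    mlflow.log_param("epochs", 5)
    mlflow.log_metric("val_accuracy", 0.91)
    # A PyTorch model can be logged (and optionally registered) as an artifact:
    # mlflow.pytorch.log_model(model, "model", registered_model_name="yelp-sentiment")
```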
Kubernetes is amazing! It’s terrifying at first, but terrific after a while. It enables you to deploy, scale, connect and persist your apps easily in a very clear and transparent way. However, we found it difficult to parametrize bare Kubernetes resource definitions (without using Helm charts). We needed to pass one or a few parameters to the YAML definition before applying it, and here are the ways we found to tackle this problem:
- Pack the set of k8s configuration files into a Helm chart (or use alternatives of Helm like kubegen). This is a jedi way to manage complex deployments as it gives you full flexibility, but it takes time to implement.
- Use the k8s ConfigMap resource to configure other resources. This approach is very easy to implement (just add one more resource definition), but it is not flexible enough (for example, you can’t parametrize container images). Still, we used it to parametrize the Model Server configuration.
- Another way, the “dirtiest” one, is to use the envsubst utility: you process your configuration YAML with a tool that syntactically replaces all instances of specified environment variables with their actual values (see the example for the Model Operator); a minimal sketch of the same idea follows below. Any other sed-like tool would work here as well.
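Since any sed-like substitution works, the same idea can be sketched in a few lines of Python using `string.Template`; the manifest fragment and the `MODEL_SERVER_IMAGE` variable are illustrative, not our actual configuration:

```python
import os
from string import Template

# Illustrative manifest fragment with a parametrized container image.
MANIFEST_TEMPLATE = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  template:
    spec:
      containers:
        - name: model-server
          image: ${MODEL_SERVER_IMAGE}
"""

# Substitute environment variables into the manifest, envsubst-style; the rendered
# text can then be piped to `kubectl apply -f -`.
os.environ.setdefault("MODEL_SERVER_IMAGE", "gcr.io/example/model-server:latest")
print(Template(MANIFEST_TEMPLATE).substitute(os.environ))
```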
Self-management takeaways
Looking back, we can say that our team suffered from a lack of communication: we started discussing system design without having a single call to meet each other and understand each other’s feedback and wishes, and we didn’t define a clear MVP or reach a common understanding of what the final goal was. Nevertheless, we learned many important truths about collaboration and project planning, namely:
- Do not try to over-plan the project from the beginning (each step in the project plan at the beginning should cover a large piece of responsibility, rather than being too specific),
- Use an iterative approach (define a clear MVP and the steps to achieve it, and then distribute tasks among the team members),
- Respect project timing (avoid situations where you have to write code on the last night before the deadline). This is especially hard in teams working in their free time, after work!