How to Train Your Model

Introduction #

In this tutorial, we describe the recommended way to train a simple machine learning model on Neuro Platform. Since our ML engineers prefer PyTorch over other ML frameworks, we demonstrate training and evaluating one of the basic PyTorch examples.

We assume that you have already signed up to the platform, installed Neuro CLI, and logged into the platform (see Getting started).

We base our example on the Classifying Names with a Character-Level RNN tutorial.

Initializing a new project #

To simplify working with Neuro Platform and to help establish best practices in the ML environment, we provide a project template. It consists of the recommended directories and files and is designed to operate smoothly with our base environment.

Let’s initialize a new project from this template:

neuro project init

This command asks several questions about your project:

full_name [Your name]: Mariya Davydova
email [Your email address (e.g. you@example.com)]: mariya.davydova@neuromation.io
project_name [Name of the project]: Neuro Tutorial
project_slug [neuro-tutorial]:
project_short_description [A short description of the project]: A simple tutorial.
code_directory [modules]: rnn
Select license:
1 - BSD 2-Clause License
2 - BSD 3-Clause License
3 - MIT license
4 - ISC license
5 - Apache Software License 2.0
6 - no
Choose from 1, 2, 3, 4, 5, 6 (1, 2, 3, 4, 5, 6) [1]: 3

Project structure #

After you execute the command mentioned above, you get the following structure:

neuro-tutorial
├── data             <- local copy of the data; not kept under source control
├── notebooks        <- Jupyter notebooks
├── rnn              <- the models' code
├── .gitignore       <- the most popular git ignores for Python projects
├── LICENSE          <- the chosen license
├── Makefile         <- useful targets linking the local project, the platform storage, and the training environment
├── README.md        <- information about this project structure as well as development instructions
├── apt.txt          <- system packages to be installed in the training environment
├── requirements.txt <- pip packages to be installed in the training environment
├── setup.cfg        <- lint settings
└── setup.py         <- information about this Python project

The directories are mapped as follows:

| Mount point | Description | Storage URI |
| --- | --- | --- |
| /project/data/ | Data | storage:neuro-tutorial/data/ |
| /project/rnn/ | Python modules | storage:neuro-tutorial/rnn/ |
| /project/notebooks/ | Jupyter notebooks | storage:neuro-tutorial/notebooks/ |
| /project/results/ | Logs and results | storage:neuro-tutorial/results/ |

If you are not satisfied with these directory names, you can always rename the local folders and update the corresponding variables at the top of the Makefile.
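For example, the generated Makefile keeps these locations in variables near its top. The variable names below are illustrative, not necessarily the ones in your copy; check your generated Makefile for the actual names:

```makefile
# Illustrative only -- open the top of your generated Makefile
# to see the actual variable names used by the template.
DATA_DIR?=data
CODE_DIR?=rnn
NOTEBOOKS_DIR?=notebooks
RESULTS_DIR?=results
```

The `?=` form assigns a value only if the variable is not already set, so you can also override these per-invocation, e.g. `make upload CODE_DIR=mymodels`.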

Filling in the gaps #

Now we need to fill this template with content:

  • Copy the model source into your rnn folder.
  • Replace requirements.txt in your project root folder with this file.
  • Download the data from here, extract the ZIP's contents, and put them in your data folder.
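The script you just copied encodes each name as a sequence of one-hot letter vectors over a 57-character vocabulary (the ASCII letters plus a few punctuation marks), which is where the width-57 tensors in the training output below come from. Here is a minimal, framework-free sketch of that encoding; the function name is ours, not the tutorial's:

```python
import string

# 52 ASCII letters plus " .,;'" -- 57 symbols, matching the
# vocabulary used by the upstream PyTorch char-RNN tutorial.
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)  # 57

def line_to_one_hot(line):
    """Encode a name as a list of one-hot vectors, one per letter."""
    encoded = []
    for ch in line:
        vec = [0.0] * n_letters
        vec[all_letters.index(ch)] = 1.0
        encoded.append(vec)
    return encoded

encoded = line_to_one_hot("Abate")
# 5 letters -> 5 one-hot vectors of width 57
```

The real script builds a PyTorch tensor of shape (line_length, 1, n_letters); the nested lists above show the same layout without the framework.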

Training and evaluating the model #

When you start working with a project on Neuro Platform, the basic flow looks as follows: you set up the remote environment, upload data and code to your storage, run training, and evaluate the results.

To set up the remote environment, run

make setup

To upload data and code to your storage, run

make upload

To run training, as you may expect, you need to run make training, but before that you need to update the training target in Makefile:

  • open Makefile in an editor,
  • find the following line:
TRAINING_COMMAND?='echo "Replace this placeholder with a training script execution"'
  • and replace it with the following line:
TRAINING_COMMAND?="bash -c 'cd $(PROJECT_PATH_ENV) && python -u $(CODE_PATH)/char_rnn_classification_tutorial.py'"

Now, you can run

make training

and observe the output. You will see some checks run at the beginning of the script, and then the model is trained and evaluated:

['data/names/German.txt', 'data/names/Polish.txt', 'data/names/Irish.txt', 'data/names/Vietnamese.txt', 'data/names/French.txt', 'data/names/Japanese.txt', 'data/names/Spanish.txt', 'data/names/Chinese.txt', 'data/names/Korean.txt', 'data/names/Czech.txt', 'data/names/Arabic.txt', 'data/names/Portuguese.txt', 'data/names/English.txt', 'data/names/Italian.txt', 'data/names/Russian.txt', 'data/names/Dutch.txt', 'data/names/Scottish.txt', 'data/names/Greek.txt']
Slusarski
['Abandonato', 'Abatangelo', 'Abatantuono', 'Abate', 'Abategiovanni']
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
torch.Size([5, 1, 57])
tensor([[-2.8248, -2.9118, -2.8999, -2.9170, -2.8916, -2.9699, -2.8785, -2.9273, -2.8397, -2.8539, -2.8764, -2.9278, -2.8638, -2.9310, -2.9546, -2.9008, -2.8295, -2.8441]], grad_fn=<LogSoftmaxBackward>)
('German', 0)
category = Vietnamese / line = Vu
category = Chinese / line = Che
category = Scottish / line = Fraser
category = Arabic / line = Abadi
category = Russian / line = Adabash
category = Vietnamese / line = Cao
category = Greek / line = Horiatis
category = Portuguese / line = Pinho
category = Vietnamese / line = To
category = Scottish / line = Mcintosh
5000 5% (0m 19s) 2.7360 Ho / Portuguese ✗ (Vietnamese)
10000 10% (0m 38s) 2.0606 Anderson / Russian ✗ (Scottish)
15000 15% (0m 58s) 3.5110 Marqueringh / Russian ✗ (Dutch)
20000 20% (1m 17s) 3.6223 Talambum / Arabic ✗ (Russian)
25000 25% (1m 35s) 2.9651 Jollenbeck / Dutch ✗ (German)
30000 30% (1m 54s) 0.9014 Finnegan / Irish ✓
35000 35% (2m 13s) 0.8603 Taverna / Italian ✓
40000 40% (2m 32s) 0.1065 Vysokosov / Russian ✓
45000 45% (2m 52s) 3.6136 Blanxart / French ✗ (Spanish)
50000 50% (3m 11s) 0.0969 Bellincioni / Italian ✓
55000 55% (3m 30s) 3.1383 Roosa / Spanish ✗ (Dutch)
60000 60% (3m 49s) 0.6585 O'Kane / Irish ✓
65000 65% (4m 8s) 4.7300 Satorie / French ✗ (Czech)
70000 70% (4m 27s) 0.9765 Mueller / German ✓
75000 75% (4m 46s) 0.7882 Attia / Arabic ✓
80000 80% (5m 5s) 2.1131 Till / Irish ✗ (Czech)
85000 85% (5m 25s) 0.5304 Wei / Chinese ✓
90000 90% (5m 44s) 1.6258 Newman / Polish ✗ (English)
95000 95% (6m 2s) 3.2015 Eberhardt / Irish ✗ (German)
100000 100% (6m 21s) 0.2639 Vamvakidis / Greek ✓
> Dovesky
(-0.77) Czech
(-1.11) Russian
(-2.03) English
> Jackson
(-0.92) English
(-1.65) Czech
(-1.85) Scottish
> Satoshi
(-1.32) Italian
(-1.81) Arabic
(-2.14) Japanese
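The `>` lines at the end show the three most likely languages for each test name, ranked by log-probability (closer to zero means more likely). A small sketch of how such a top-3 ranking is read off a log-softmax output; the scores and helper name below are invented for illustration:

```python
# Hypothetical log-softmax scores for one input name; in the real
# script these come from the RNN's final output layer.
categories = ["Czech", "Russian", "English", "Scottish"]
log_probs = [-0.77, -1.11, -2.03, -3.40]

def top_k(categories, log_probs, k=3):
    """Pair categories with their scores and keep the k highest log-probs."""
    ranked = sorted(zip(log_probs, categories), reverse=True)
    return ranked[:k]

predictions = top_k(categories, log_probs)
# -> [(-0.77, 'Czech'), (-1.11, 'Russian'), (-2.03, 'English')]
```

The PyTorch script uses `Tensor.topk` for the same purpose; the sorting above is just the framework-free equivalent.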