About MLflow
MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles four primary functions:
Tracking experiments to record and compare parameters and results (MLflow Tracking).
Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production (MLflow Projects).
Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).
Providing a central model store to collaboratively manage the full lifecycle of an MLflow Model, including model versioning, stage transitions, and annotations (MLflow Model Registry).
MLflow is library-agnostic. You can use it with any machine learning library, and in any programming language, since all functions are accessible through a REST API and CLI. For convenience, the project also includes a Python API, R API, and Java API.
(more details... https://mlflow.org/docs/latest/index.html)
...
Make Python3 default
Python2 and Python3 are already installed on Debian 10, but the default version is Python2. To check it...
Code Block | ||
---|---|---|
|
...
python --version |
# That will return Python 2.x.x
Check for Python3
Code Block | ||||
---|---|---|---|---|
| ||||
python3 --version |
# That will return Python 3.x.x
We have to make Python3 the default version for the distro.
First we have to find the paths of the executables for each version...
Code Block | ||||
---|---|---|---|---|
| ||||
ls /usr/bin/python* -la |
In this example it returns a list of paths...
...
Now we must update/create the "Python alternatives list" using the above paths.
Code Block | ||
---|---|---|
| ||
sudo update-alternatives --install /usr/bin/python python /usr/bin/python2.7 1 |
...
Code Block | ||
---|---|---|
| ||
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.7 2 |
Finally we can switch between Python versions. Running the following command brings up a menu in which we choose the Python version that we want as default.
Code Block |
---|
sudo update-alternatives --config python |
There are 2 choices for the alternative python (providing /usr/bin/python).
...
Choosing selection 2 makes Python3 the default version.
We can check that Python3 has been applied as the default by running...
Code Block | ||
---|---|---|
|
...
python --version |
# In my case it returns Python 3.7.3
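The same check can also be made from inside the interpreter, which is handy in a setup script. This is only a minimal sketch; the function name is_python3 is illustrative and not part of any tool used in this guide.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3."""
    return version_info[0] == 3


if __name__ == "__main__":
    # Print the running interpreter's version and whether it is Python 3.
    print(".".join(str(part) for part in sys.version_info[:3]), is_python3())
```

Running it with "python" after the switch should report a 3.x.x version.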
...
Download and install Miniforge3
Miniforge is the community-driven (conda-forge) minimal conda installer. Packages installed afterwards therefore come from the conda-forge channel. The project started because Miniconda did not support aarch64.
First check whether there is a Download directory on your distro. If there is none, create one...
Code Block | ||
---|---|---|
| ||
mkdir ~/Download |
...
cd ~/Download |
...
sudo wget https://github.com/conda-forge/miniforge/releases/download/4.8.3-4/Miniforge3-Linux-aarch64.sh |
...
bash Miniforge3-Linux-aarch64.sh |
* Miniforge3 will be installed at ~/miniforge3, with its executables in ~/miniforge3/bin/. This directory has to be added to your PATH variable.
To check the location of the "conda" executable, type the following...
Code Block | ||
---|---|---|
| ||
sudo find / -type f -name "conda" |
In my case it returned "~/miniforge3/bin/", so the PATH variable must be changed as follows...
Code Block | ||
---|---|---|
|
...
export PATH=$PATH:~/miniforge3/bin |
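Whether the export took effect can be verified with a few lines of Python. This is a sketch under assumptions: the helper name dir_on_path is made up for illustration, and the directory ~/miniforge3/bin is the one reported in my case, so adjust it to your own install location.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return wanted in entries


if __name__ == "__main__":
    # Assumed install location; change it if "find" reported another path.
    print(dir_on_path("~/miniforge3/bin"))
    # Full path of the conda executable, or None if it is not on PATH.
    print(shutil.which("conda"))
```

Note that an export typed in the terminal only lasts for the current shell session.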
...
Install MLFlow using Conda
...
Code Block | ||
---|---|---|
| ||
conda install -c https://conda.anaconda.org/paulscherrerinstitute mflow |
...
conda update conda |
...
conda --version |
# In my case the version is 4.9.2
...
Create and activate a conda virtual environment and install MLFlow
A conda environment is a directory that contains a specific collection of conda packages that you have installed. For example, you may have one environment with NumPy 1.7 and its dependencies. In this guide the environment is named env_mlflow.
Use your Python version to create the environment...
Code Block | ||
---|---|---|
| ||
conda create --name env_mlflow python=3.7.3 |
...
conda activate env_mlflow |
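To confirm from Python that the environment is really active, one can look at the CONDA_DEFAULT_ENV variable, which "conda activate" sets in the shell. A minimal sketch follows; the function name active_conda_env is only illustrative.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active."""
    # "conda activate <name>" exports CONDA_DEFAULT_ENV=<name> in the shell.
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    # Expected to print "env_mlflow" inside the environment created above.
    print(active_conda_env())
```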
Reboot the machine and log in again. If you are using a remote machine, reboot the remote instance and use ssh to log in again.
After the reboot we can install MLflow using pip3.
Code Block | ||
---|---|---|
| ||
pip3 install mlflow |
...
Install Sklearn using pip3 and other dependencies
We need GCC and G++ compilers as dependencies to build Sklearn.
Code Block | ||
---|---|---|
|
...
sudo apt-get install gcc |
...
sudo apt-get install g++ |
(optional) Install tmux.
It takes a lot of time to build Sklearn, and there is a high possibility that the connection will drop in the meantime. tmux keeps the session alive on the machine, so when we reconnect we can continue from the point where the process was interrupted and not from the very beginning.
Code Block |
---|
sudo apt-get install tmux |
To open the tmux environment just type...
$ tmux
Install Sklearn using pip3 and the dependencies.
Code Block | ||
---|---|---|
| ||
pip3 install scikit-learn
pip3 install Cython
pip3 install --upgrade setuptools |
...
Install Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
Code Block | ||
---|---|---|
| ||
conda install -c conda-forge matplotlib |
In order to check the installation...
Code Block | ||
---|---|---|
|
...
pip3 install matplotlib |
# The return must be "Requirement already satisfied"
...
Install PostgreSQL
PostgreSQL, also known as Postgres, is a free and open-source relational database management system emphasizing extensibility and SQL compliance.
Code Block | ||
---|---|---|
| ||
sudo apt-get install postgresql |
...
sudo apt-get install postgresql-contrib |
...
sudo apt-get install postgresql-server-dev-all |
...
Install Psycopg
Psycopg is the most popular PostgreSQL database adapter for the Python programming language. Its main features are the complete implementation of the Python DB API 2.0 specification and the thread safety (several threads can share the same connection).
Code Block | ||
---|---|---|
| ||
pip3 install psycopg2 |
...
Download and run MLFlow examples
Check if Git is already installed...
Code Block | ||
---|---|---|
|
...
git --version |
If Git is not installed...
Code Block | ||
---|---|---|
| ||
sudo apt update
sudo apt install git |
Clone MLFlow example code
Code Block | ||
---|---|---|
|
...
cd ~ |
...
git clone https://github.com/mlflow/mlflow |
...
cd ~/mlflow/examples/sklearn_elasticnet_diabetes/linux |
Run the example "train_diabetes"
Code Block | ||
---|---|---|
|
...
python train_diabetes.py 0.1 0.9 |
...
python train_diabetes.py 0.5 0.5 |
...
python train_diabetes.py 0.9 0.1 |
# For each example above the output should look something like this...
RMSE: 71.98302888908191
MAE: 60.5647520017933
R2: 0.2165516143465459
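For readers unfamiliar with the three numbers, the sketch below computes them in plain Python from their standard definitions. It assumes that the example reports the usual root mean squared error, mean absolute error and coefficient of determination; the function name regression_metrics is made up for illustration and is not taken from the example script.

```python
import math


def regression_metrics(actual, predicted):
    """Return (RMSE, MAE, R2) for two equally long sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    # RMSE: square root of the mean squared error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n
    # R2: 1 - (residual sum of squares / total sum of squares).
    # Undefined (division by zero) when all actual values are identical.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    # Toy data only, unrelated to the diabetes dataset.
    print(regression_metrics([1, 2, 3], [2, 2, 2]))
```

Lower RMSE and MAE mean smaller prediction errors, while an R2 closer to 1 means the model explains more of the variance.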
MLflow runs can be recorded either locally in files or remotely to a tracking server. By default, the MLflow Python API logs runs to files in an mlruns directory wherever you ran your program. You can then find the results in the "mlruns" directory, which is created in the directory of the code during the execution of "python train_diabetes.py x.x x.x".
~/mlflow/examples/sklearn_elasticnet_diabetes/linux/mlruns
Code Block | ||
---|---|---|
| ||
ls ~/mlflow/examples/sklearn_elasticnet_diabetes/linux/ |
You will see the directory "mlruns" among the others.
Inside this directory a numbered directory (0, 1, 2, etc.) has been created, which holds the results of every execution. In the example we had 3 executions (python train_diabetes.py x.x x.x), so the numbered directory contains 3 directories with hexadecimal names, one for every execution. There is also a 4th entry, a file called "meta.yaml".
Code Block | ||
---|---|---|
| ||
ls ~/mlflow/examples/sklearn_elasticnet_diabetes/linux/mlruns |
It returns a number like this... 0
Code Block | ||
---|---|---|
| ||
ls ~/mlflow/examples/sklearn_elasticnet_diabetes/linux/mlruns/0 |
The output should look something like this...
1f8d0c25da6f4634a31f445a3e2fe987
9514eac210f447e2b2e0c1af7374dcbc
f54784481e294434918dfccd63a24d4d
meta.yaml
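The same listing can be produced from Python, which is useful when the runs have to be processed by a script. A minimal sketch: list_run_ids is a hypothetical helper, and the demo builds a throwaway directory that only imitates the layout shown above.

```python
import tempfile
from pathlib import Path


def list_run_ids(experiment_dir):
    """Return the sorted names of the run directories inside an experiment directory."""
    # Only sub-directories are runs; plain files such as meta.yaml are skipped.
    return sorted(p.name for p in Path(experiment_dir).iterdir() if p.is_dir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        experiment = Path(tmp) / "0"
        (experiment / "1f8d0c25da6f4634a31f445a3e2fe987").mkdir(parents=True)
        (experiment / "meta.yaml").write_text("")
        print(list_run_ids(experiment))
```

For the real data, pass the path of the numbered directory, for example the "mlruns/0" directory listed above.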
Change working directory
To change the default working directory (mlruns) to a custom one of our choice, we have to add one more line to the source code. Go to the source file and edit it using a text editor.
For our example the file lives at...
$ nano ~/mlflow/examples/sklearn_elasticnet_diabetes/linux/train_diabetes.py
I use "nano" to modify the file, but you can use any editor you want (vi is the default terminal editor).
Find the following lines...
import mlflow
import mlflow.sklearn
On the next line add this...
Code Block | ||
---|---|---|
| ||
mlflow.set_tracking_uri('file:myDirectory') |
The directory "myDirectory" will be created in the directory of the source file (~/mlflow/examples/sklearn_elasticnet_diabetes/linux/myDirectory) when the source code is executed.
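A relative URI such as file:myDirectory is, as far as I can tell, resolved against the directory from which the script is started, so the result matches the description above only when the script is run from its own directory. The sketch below builds an absolute file URI instead; tracking_uri_for is a hypothetical helper, and the commented lines show how it might be used in train_diabetes.py.

```python
from pathlib import Path


def tracking_uri_for(base_dir, name="myDirectory"):
    """Build an absolute file URI for a tracking directory located under base_dir."""
    return (Path(base_dir).resolve() / name).as_uri()


# Possible use inside train_diabetes.py (requires mlflow to be installed):
#   import mlflow
#   mlflow.set_tracking_uri(tracking_uri_for(Path(__file__).parent))

if __name__ == "__main__":
    # Prints something like file:///home/debian/.../myDirectory
    print(tracking_uri_for(Path.cwd()))
```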
Code Block | ||
---|---|---|
| ||
ls ~/mlflow/examples/sklearn_elasticnet_diabetes/linux/myDirectory/0 |
# The output should look something like this (as with the default working directory)...
1f8d0c25da6f4634a31f445a3e2fe987
9514eac210f447e2b2e0c1af7374dcbc
f54784481e294434918dfccd63a24d4d
meta.yaml
Get the results on the Web Browser
In the directory of the source file, where the "mlruns" directory has also been created, we can run "mlflow ui" to see the logged runs.
Code Block | ||
---|---|---|
| ||
cd ~/mlflow/examples/sklearn_elasticnet_diabetes/linux/
mlflow ui |
The output should look something like this...
[2020-12-03 14:17:36 +0000] [1267] [INFO] Starting gunicorn 20.0.4
[2020-12-03 14:17:36 +0000] [1267] [INFO] Listening at: http://127.0.0.1:5000 (1267)
[2020-12-03 14:17:36 +0000] [1267] [INFO] Using worker: sync
[2020-12-03 14:17:36 +0000] [1270] [INFO] Booting worker with pid: 1270
NOTICE - When executing "mlflow ui" it looks for the directory "mlruns", if it does not exist then it creates an empty one (only with the file "meta.yaml").
For the custom working directory we have to run...
Code Block | ||
---|---|---|
| ||
mlflow ui --backend-store-uri file:myDirectory \
--default-artifact-root file:myDirectory \
--host 127.0.0.1 \
--port 5000 |
At the local Web browser URL bar type... 127.0.0.1:5000
NOTE - The command "mlflow ui" must be running in order to have access to 127.0.0.1:5000
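Whether something is actually listening on that address can be checked with a short script before opening the browser. This is a generic TCP check, not an MLflow feature; the function name is_port_open is illustrative, and the defaults are the host and port used in this guide.

```python
import socket


def is_port_open(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # True while "mlflow ui" (or any other server) listens on 127.0.0.1:5000.
    print(is_port_open())
```

A result of True only shows that some server answers on the port; it does not prove that the server is "mlflow ui".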
SSH Tunnel Forwarding
When running "mlflow ui", the second line of the output shows the localhost IP and the port (127.0.0.1:5000) that serve the working directory ("mlruns" by default, or any custom directory we have declared). We can use this IP and port to get the results.
On a remote machine (OpenStack), the instances do not have a Desktop Environment (GNOME, XFCE, etc.) in which to see the results. But we can check whether the "index.html" file is served at http://127.0.0.1:5000...
$ cd ~/Download
$ wget 127.0.0.1:5000
$ cat index.html
Using SSH Tunnel Forward, however, we can see the results from the OpenStack instance's localhost in the local Web browser.
Syntax: ssh -L port:localhost:port instanceUserName@FloatingIP
Open a new window of the local terminal and type...
Example: ssh -L 5000:127.0.0.1:5000 debian@214.152.131.68
NOTE - In this case the user name of the OpenStack instance is "debian" and the Floating IP is 214.152.131.68
(We can replace the localhost with
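If the tunnel is opened often, the command can be assembled by a small script from the syntax given above. A sketch only: build_tunnel_command is a hypothetical helper, and the user name and Floating IP in the demo are the example values from this page.

```python
import shlex


def build_tunnel_command(user, floating_ip, port=5000, remote_host="127.0.0.1"):
    """Return the argument list for: ssh -L port:remote_host:port user@floating_ip"""
    return ["ssh", "-L", f"{port}:{remote_host}:{port}", f"{user}@{floating_ip}"]


if __name__ == "__main__":
    command = build_tunnel_command("debian", "214.152.131.68")
    # Show the command line; it is not executed here.
    print(" ".join(shlex.quote(arg) for arg in command))
    # To really open the tunnel one could pass the list to subprocess.run(command).
```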
Mozilla Firefox modification for the SSH Tunnel Forward
The local Web browser has to be modified. The following description applies to Mozilla Firefox.
Preferences -> General -> scroll down to "Network Settings" and click the "Settings" button -> under "Configure Proxy Access to the Internet" select "No proxy" -> OK
At the local Web browser URL bar type... 127.0.0.1:5000
NOTE - The command "mlflow ui" must be running in order to have access to 127.0.0.1:5000
Understanding the files and folders of MLFlow directory by matching the elements of the GUI environment
There are cases where no GUI is available for accessing the produced data (when "mlflow ui" cannot be executed). In such cases we need to know the folders and files from which the graphical environment extracts the data it presents on the screen, because we have to read the data manually from those files.
The following image shows the correspondence between files/folders and GUI's elements.
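As a rough illustration of reading the data without the GUI, the sketch below assumes the usual layout of a local run directory: a "params" folder with one file per parameter holding its value, and a "metrics" folder with one file per metric whose lines are "timestamp value step". Treat this layout as an assumption and check it against your own directories; both helper names are made up, and the demo data is invented.

```python
import tempfile
from pathlib import Path


def read_params(run_dir):
    """Return {parameter name: value as text} from the run's params folder."""
    params_dir = Path(run_dir) / "params"
    if not params_dir.is_dir():
        return {}
    return {
        p.name: p.read_text().strip()
        for p in sorted(params_dir.iterdir())
        if p.is_file()
    }


def read_latest_metrics(run_dir):
    """Return {metric name: last logged value} from the run's metrics folder."""
    metrics_dir = Path(run_dir) / "metrics"
    latest = {}
    if not metrics_dir.is_dir():
        return latest
    for p in sorted(metrics_dir.iterdir()):
        if not p.is_file():
            continue
        lines = [line for line in p.read_text().splitlines() if line.strip()]
        if lines:
            # Assumed line format: "<timestamp> <value> <step>"; keep the value.
            latest[p.name] = float(lines[-1].split()[1])
    return latest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run = Path(tmp) / "run"
        (run / "params").mkdir(parents=True)
        (run / "metrics").mkdir()
        (run / "params" / "alpha").write_text("0.1")
        # Invented timestamp and step, for demonstration only.
        (run / "metrics" / "rmse").write_text("1607005056000 71.98 0\n")
        print(read_params(run), read_latest_metrics(run))
```

For real data, pass the path of one of the run directories, for example one of the hexadecimal-named folders under "mlruns/0".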