
# Creating an MLFlow server for logging experiments and artifacts

This tutorial describes how to deploy an [MLFlow tracking server](https://mlflow.org/docs/latest/tracking.html) for logging experiments and artifacts on a separate [Yandex Compute Cloud](../index.md) VM. We will run the experiments in a JupyterLab notebook in DataSphere. A [Yandex Managed Service for PostgreSQL](../../managed-postgresql/index.md) database will store MLFlow's internal objects, such as experiments, runs, parameters, and metrics, and a [Yandex Object Storage](../../storage/index.md) bucket will store artifacts.

To create an MLFlow server for logging JupyterLab Notebook experiments and artifacts:

1. [Set up your infrastructure](#infra).
1. [Create a static access key](#create-static-key).
1. [Create an SSH key pair](#create-ssh-keys).
1. [Create a VM](#create-vm).
1. [Create a managed DB](#create-db).
1. [Create a bucket](#create-bucket).
1. [Install the MLFlow tracking server and configure it to start automatically](#setup-mlflow).
1. [Create secrets](#create-secrets).
1. [Train your model](#train-model).

If you no longer need the resources you created, [delete them](#clear-out).

## Getting started {#before-you-begin}

Before getting started, sign up for Yandex Cloud, set up a [community](../../datasphere/concepts/community.md), and link your [billing account](../../billing/concepts/billing-account.md) to it:
1. [On the DataSphere home page](https://datasphere.yandex.cloud), click **Try for free** and select an account to log in with: your Yandex ID or your corporate account via identity federation (SSO).
1. Select the [Yandex Identity Hub organization](../../organization/index.md) you are going to use in Yandex Cloud.
1. [Create a community](../../datasphere/operations/community/create.md).
1. [Link your billing account](../../datasphere/operations/community/link-ba.md) to the DataSphere community you are going to work in. Make sure you have a linked billing account and its [status](../../billing/concepts/billing-account-statuses.md) is `ACTIVE` or `TRIAL_ACTIVE`. If you do not have a billing account yet, create one in the DataSphere interface.


### Required paid resources {#paid-resources}

* Managed Service for PostgreSQL cluster: computing resources allocated to hosts, storage and backup size (see [Managed Service for PostgreSQL pricing](../../managed-postgresql/pricing.md)).
* VM instance: use of computing resources, storage, public IP address, and OS (see [Compute Cloud pricing](../pricing.md)).
* Object Storage bucket: use of storage, data operations (see [Object Storage pricing](../../storage/pricing.md)).
* DataSphere project: use of computing resources and storage (see [DataSphere pricing](../../datasphere/pricing.md)).


## Set up your infrastructure {#infra}

Log in to the Yandex Cloud [management console](https://console.yandex.cloud) and select the organization you use to access DataSphere. On the [**Yandex Cloud Billing**](https://center.yandex.cloud/billing/accounts) page, make sure you have a billing account linked.

If you have an active billing account, you can go to the [cloud page](https://console.yandex.cloud/cloud) to create or select a folder to run your infrastructure.

{% note info %}

If you are using an [identity federation](../../organization/concepts/add-federation.md) to work with Yandex Cloud, you might not have access to billing details. In this case, contact your Yandex Cloud organization administrator.

{% endnote %}

### Create a folder {#create-folder}

{% list tabs group=instructions %}

- Management console {#console}

   1. In the [management console](https://console.yandex.cloud), select a cloud and click ![create](../../_assets/console-icons/plus.svg) **Create folder**.
   1. Name your folder, e.g., `data-folder`.
   1. Click **Create**.

{% endlist %}

### Create a service account for Object Storage {#create-sa}

To access a bucket in Object Storage, you will need a [service account](../../iam/concepts/users/service-accounts.md) with the `storage.viewer` and `storage.uploader` roles.

{% list tabs group=instructions %}

- Management console {#console}

   1. In the [management console](https://console.yandex.cloud), navigate to `data-folder`.
   1. [Go](../../console/operations/select-service.md#select-service) to **Identity and Access Management**.
   1. Click **Create service account**.
   1. Name your service account, e.g., `datasphere-sa`.
   1. Click **Add role** and assign the `storage.viewer` and `storage.uploader` roles to the service account.
   1. Click **Create**.

{% endlist %}

## Create a static access key {#create-static-key}

To access Object Storage from the MLFlow server and from DataSphere, you will need a static key.

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), navigate to the folder the service account belongs to.
  1. [Go](../../console/operations/select-service.md#select-service) to **Identity and Access Management**.
  1. In the left-hand panel, select ![FaceRobot](../../_assets/console-icons/face-robot.svg) **Service accounts**.
  1. In the list that opens, select `datasphere-sa`.
  1. In the top panel, click ![](../../_assets/console-icons/plus.svg) **Create new key**.
  1. Select **Create static access key**.
  1. Specify the key description and click **Create**.
  1. Save the ID and secret key. After you close this dialog, the key value will no longer be available.

- CLI {#cli}

  1. Create an access key for the `datasphere-sa` service account:

     ```bash
     yc iam access-key create --service-account-name datasphere-sa
     ```

     Result:

     ```text
     access_key:
       id: aje6t3vsbj8l********
       service_account_id: ajepg0mjt06s********
       created_at: "2022-07-18T14:37:51Z"
       key_id: 0n8X6WY6S24N7Oj*****
     secret: JyTRFdqw8t1kh2-OJNz4JX5ZTz9Dj1rI9hx*****
     ```

  1. Save the `key_id` and `secret` values. You will not be able to get the secret key again.

- API {#api}

  To create an access key, use the [create](../../iam/awscompatibility/api-ref/AccessKey/create.md) method for the [AccessKey](../../iam/awscompatibility/api-ref/AccessKey/index.md) resource.

{% endlist %}
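
If you create the key with the CLI, you can also request JSON output (the global `--format json` flag of `yc`) and convert it into the `~/.aws/credentials` file that `boto3` reads later in this tutorial. This is a minimal sketch: it assumes the JSON contains the same fields as the YAML result above (`access_key.key_id` and `secret`), and the function name `credentials_ini_from_yc_json` is hypothetical. The output contains the secret key, so avoid printing it in shared logs.

```python
import configparser
import io
import json


def credentials_ini_from_yc_json(yc_json: str, profile: str = "default") -> str:
    """Convert JSON from `yc iam access-key create ... --format json`
    into the text of an AWS-style credentials file (~/.aws/credentials)."""
    data = json.loads(yc_json)
    # Assumed layout, mirroring the YAML result above:
    # {"access_key": {"key_id": ...}, "secret": ...}
    key_id = data["access_key"]["key_id"]
    secret = data["secret"]

    # interpolation=None so that a '%' in a value is written literally
    config = configparser.ConfigParser(interpolation=None)
    config[profile] = {
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret,
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the credentials file used later on
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

For example, save the output of `yc iam access-key create --service-account-name datasphere-sa --format json` to a file, pass its contents to the function, and write the returned text to `~/.aws/credentials` on the VM.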

## Create an SSH key pair {#create-ssh-keys}

To connect to a [VM](../concepts/vm.md) over SSH, you will need a key pair: the public key resides on the VM, and the private one is kept by the user. This method is more secure than login and password authentication.

{% note info %}

SSH connections using a login and password are disabled by default on public Linux images that are provided by Yandex Cloud.

{% endnote %}

To create a key pair:

{% list tabs group=operating_system %}

- Linux/macOS {#linux-macos}

  1. Open the terminal.
  1. Use the `ssh-keygen` command to create a new key:
  
      ```bash
      ssh-keygen -t ed25519 -C "<optional_comment>"
      ```
  
      To create a key without a comment, pass an empty string to `-C`. If you omit `-C` altogether, a default comment will be added.

      After you run this command, you will be prompted for the name and path of the key files and for a passphrase for the private key. If you specify only a name, the key pair will be created in the current directory. The public key will be saved to a file with the `.pub` extension, and the private key to a file without an extension.

      By default, the command offers to save the key as `id_ed25519` in the `/home/<username>/.ssh` directory. If this directory already contains a key named `id_ed25519`, you may accidentally overwrite it and lose access to the resources that use it, so consider giving each SSH key a unique name.

- Windows 10/11 {#windows}

  If you do not have [OpenSSH](https://en.wikipedia.org/wiki/OpenSSH) installed yet, follow this [guide](https://learn.microsoft.com/en-us/windows-server/administration/openssh/openssh_install_firstuse?tabs=gui) to install it.
  
  1. Run `cmd.exe` or `powershell.exe` (make sure to update PowerShell before doing so).
  1. Use the `ssh-keygen` command to create a new key:
  
      ```shell
      ssh-keygen -t ed25519 -C "<optional_comment>"
      ```
  
      To create a key without a comment, pass an empty string to `-C`. If you omit `-C` altogether, a default comment will be added.

      After you run this command, you will be prompted for the name and path of the key files and for a passphrase for the private key. If you specify only a name, the key pair will be created in the current directory. The public key will be saved to a file with the `.pub` extension, and the private key to a file without an extension.

      By default, the command offers to save the key as `id_ed25519` in the `C:\Users\<username>\.ssh` folder. If this folder already contains a key named `id_ed25519`, you may accidentally overwrite it and lose access to the resources that use it, so consider giving each SSH key a unique name.

- Windows 7/8 {#windows7-8}

  Create keys using the PuTTY app:
  
  1. [Download](https://www.putty.org) and install PuTTY.
  1. Add the folder with PuTTY to the `PATH` variable:
  
      1. Click **Start** and type **Change system environment variables** in the Windows search bar.
      1. Click **Environment Variables...** at the bottom right.
      1. In the window that opens, find the `PATH` parameter and click **Edit**.
      1. Add your folder path to the list.
      1. Click **OK**.
  
  1. Launch the PuTTYgen app.
  1. Select **EdDSA** as the pair type to generate. Click **Generate** and move the cursor in the field above it until key creation is complete.
  
      ![ssh_generate_key](../../_assets/compute/ssh-putty/ssh_generate_key.png)
  
  1. In **Key passphrase**, enter a strong password. Enter it again in the field below.
  1. Click **Save private key** and save the private key. Do not share the private key or its passphrase with anyone.
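  1. Click **Save public key** and save the public key to a file named `<key_name>.pub`. The VM expects the key in the one-line OpenSSH format, which PuTTYgen displays in the **Public key for pasting into OpenSSH authorized_keys file** field.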

{% endlist %}
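
Before pasting a public key into the management console, you can check that it is in the one-line OpenSSH format (`<type> <base64 data> [comment]`). Note that PuTTYgen's **Save public key** button writes the multi-line RFC 4716 format (starting with `---- BEGIN SSH2 PUBLIC KEY ----`) rather than the OpenSSH one. The sketch below is a hypothetical helper, not part of any Yandex Cloud tool: it only checks the structure of the key (that the base64 data decodes and embeds the same key type as the prefix), not whether the key is cryptographically valid.

```python
import base64
import binascii
import struct


def check_openssh_public_key(line: str) -> str:
    """Return the key type of a one-line OpenSSH public key or raise ValueError.

    Expected layout: "<type> <base64 blob> [comment]".
    """
    line = line.strip()
    if line.startswith("---- BEGIN SSH2 PUBLIC KEY ----"):
        # RFC 4716 format, e.g., as saved by PuTTYgen's "Save public key" button
        raise ValueError("RFC 4716 (SSH2) format; use the one-line OpenSSH format instead")
    parts = line.split(maxsplit=2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64 blob> [comment]'")
    key_type, b64_blob = parts[0], parts[1]
    try:
        blob = base64.b64decode(b64_blob, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"key body is not valid base64: {exc}") from None
    # The blob starts with a big-endian uint32 length followed by the key type string
    if len(blob) < 4:
        raise ValueError("key blob is too short")
    (type_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + type_len].decode("ascii", errors="replace")
    if embedded_type != key_type:
        raise ValueError(f"prefix {key_type!r} does not match blob type {embedded_type!r}")
    return key_type
```

For example, `check_openssh_public_key(Path("~/.ssh/id_ed25519.pub").expanduser().read_text())`, with `Path` imported from `pathlib`, should return `'ssh-ed25519'` for a key created with `ssh-keygen -t ed25519`.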

## Create a VM {#create-vm}

{% list tabs group=instructions %}

- Management console {#console}

  1. On the [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) dashboard in the [management console](https://console.yandex.cloud), click **Create resource** and select `Virtual machine instance`.
  1. Under **Boot disk image**, in the **Product search** field, enter `Ubuntu 22.04` and select a public [Ubuntu 22.04](https://yandex.cloud/en/marketplace/products/yc/ubuntu-22-04-lts) image.
  1. Under **Location**, select the `ru-central1-a` [availability zone](../../overview/concepts/geo-scope.md).
  1. Under **Disks and file storages**, select the `SSD` [disk type](../concepts/disk.md#disks_types) and specify its size: `20 GB`.
  1. Under **Computing resources**, navigate to the `Custom` tab and specify the [platform](../concepts/vm-platforms.md), number of vCPUs, and amount of RAM:

      * **Platform**: `Intel Ice Lake`
      * **vCPU**: `2`
      * **Guaranteed vCPU performance**: `100%`
      * **RAM**: `4 GB`

  1. Under **Network settings**:

      * In the **Subnet** field, select the subnet specified in the DataSphere [project settings](../../datasphere/operations/projects/update.md). Make sure to [set up a NAT gateway](../../vpc/operations/create-nat-gateway.md) for the subnet.
      * In the **Public IP address** field, keep `Auto` to assign the VM a random external IP address from the Yandex Cloud pool or select a static address from the list if you reserved one.

  1. Under **Access**, select **SSH key** and specify the VM access credentials:

      * In the **Login** field, enter the username. Do not use `root` or other OS-reserved usernames. To perform operations requiring root privileges, use the `sudo` command.
      * In the **SSH key** field, select the SSH key saved in your [organization user](../../organization/concepts/membership.md) profile.
        
        If there are no SSH keys in your profile or you want to add a new key:
        
        1. Click **Add key**.
        1. Enter a name for the SSH key.
        1. Select one of the following:
        
            * `Enter manually`: Paste the contents of the public SSH key. You need to [create](../operations/vm-connect/ssh.md#creating-ssh-keys) an SSH key pair on your own.
            * `Load from file`: Upload the public part of the SSH key. You need to create an SSH key pair on your own.
            * `Generate key`: Automatically create an SSH key pair.
            
              When you add a new SSH key, an archive containing the key pair will be created and downloaded. In Linux or macOS, unpack the archive to the `/home/<user_name>/.ssh` directory. In Windows, unpack it to the `C:\Users\<user_name>\.ssh` directory. You do not need to enter the public key in the management console separately.
        
        1. Click **Add**.
        
        The system will add the SSH key to your organization user profile. If the organization has [disabled](../../organization/operations/os-login-access.md) the ability for users to add SSH keys to their profiles, the added public SSH key will only be saved in the user profile inside the newly created resource.

  1. Under **General information**, specify the VM name: `mlflow-vm`.
  1. Under **Additional**, select the `datasphere-sa` [service account](../../iam/concepts/users/service-accounts.md).
  1. Click **Create VM**.

{% endlist %}

## Create a managed DB {#create-db}

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the folder where you want to create your database cluster.
  1. [Go](../../console/operations/select-service.md#select-service) to **Managed Service for&nbsp;PostgreSQL**.
  1. Click **Create cluster**.
  1. Name the cluster, e.g., `mlflow-db`.
  1. Under **Host class**, select the `s3-c2-m8` configuration.
  1. Under **Storage size**, select `250 GB`.
  1. Under **Database**, specify `db1` as the database name (the commands below use this name) and enter the username and password. You will need them to connect to the database.
  1. Under **Hosts**, select the `ru-central1-a` availability zone.
  1. Click **Create cluster**.
  1. Open the cluster you created and click **Connect**.
  1. Save the host address from the `host` field: you will need it to connect to the database.

{% endlist %}
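
The `mlflow server` command later in this tutorial connects to this cluster through a PostgreSQL URI built from the host, username, and password you saved, port `6432`, and the `db1` database. If the password contains characters such as `@`, `:`, or `/`, it must be percent-encoded; otherwise, the URI is parsed incorrectly. A minimal sketch, with a hypothetical helper name:

```python
from urllib.parse import quote


def build_backend_store_uri(user: str, password: str, host: str,
                            database: str = "db1", port: int = 6432) -> str:
    """Assemble the PostgreSQL URI for `mlflow server --backend-store-uri`."""
    # Percent-encode credentials: '@', ':' or '/' in a password would otherwise
    # be parsed as URI delimiters.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return (f"postgresql://{user_enc}:{password_enc}@{host}:{port}/{database}"
            "?sslmode=verify-full")
```

With `sslmode=verify-full`, libpq verifies the server certificate and by default looks for the CA certificate in `~/.postgresql/root.crt`, so the cluster's CA certificate (see the Managed Service for PostgreSQL connection instructions) needs to be in that location for the user running MLFlow. If you put the URI into the `systemd` unit file later in this tutorial, double every `%` (`%%`), because `systemd` treats `%` as the start of a specifier.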

## Create a bucket {#create-bucket}

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the folder where you want to create a bucket.
  1. [Go](../../console/operations/select-service.md#select-service) to **Object Storage**.
  1. At the top right, click **Create bucket**.
  1. In the **Name** field, enter a name for the bucket, e.g., `mlflow-bucket`.
  1. In the **Read objects**, **Read object list**, and **Read settings** fields, select **With authorization**.
  1. Click **Create bucket**.
  1. To create a folder for MLflow artifacts, open the bucket you created and click **Create folder**.
  1. Enter a name for the folder, e.g., `artifacts`.

{% endlist %}
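
Once the static key is configured (for example, in `~/.aws/credentials` on the VM), you can check from Python that the bucket and its `artifacts` folder are reachable. In Object Storage, folders are key prefixes, so the sketch checks whether at least one object key starts with `artifacts/`. It assumes `boto3` is installed and the bucket is named `mlflow-bucket`; the helper names and the `ru-central1` region setting are assumptions. `boto3` is imported inside `make_object_storage_client` so that the check itself can be tried without it.

```python
def artifacts_prefix_exists(s3_client, bucket="mlflow-bucket", prefix="artifacts/"):
    """Return True if at least one object key in `bucket` starts with `prefix`."""
    # MaxKeys=1 is enough: we only need to know whether any key matches
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


def make_object_storage_client():
    """Create a boto3 S3 client pointed at Yandex Object Storage.

    boto3 looks up the static key in ~/.aws/credentials or in the
    AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
    """
    import boto3  # imported here so artifacts_prefix_exists() works without boto3

    session = boto3.session.Session()
    return session.client(
        service_name="s3",
        endpoint_url="https://storage.yandexcloud.net",
        region_name="ru-central1",  # assumed region name used for request signing
    )
```

For example, `artifacts_prefix_exists(make_object_storage_client())` returns `True` as soon as at least one object key in the bucket starts with `artifacts/`.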

## Install the MLFlow tracking server and configure it to start automatically {#setup-mlflow}

1. [Connect](../operations/vm-connect/ssh.md#vm-connect) to the VM over SSH.
1. Download the `Anaconda` distribution:

   ```bash
   curl --remote-name https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh
   ```

1. Run the installer:

   ```bash
   bash Anaconda3-2023.07-1-Linux-x86_64.sh
   ```

   Wait for the installation to complete and restart the shell.

1. Create an environment:

   ```bash
   conda create -n mlflow
   ```

1. Activate the environment:

   ```bash
   conda activate mlflow
   ```

1. Install the required packages by running these commands one by one:

   ```bash
   conda install -c conda-forge mlflow
   conda install -c anaconda boto3
   pip install psycopg2-binary
   pip install pandas
   ```

1. Set the environment variables that MLFlow uses:

   * Open the file with the variables:

      ```bash
      sudo nano /etc/environment
      ```
   
   * Add these lines to the file, replacing the placeholder with your VM's internal IP address:

     ```bash
     MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net/
     MLFLOW_TRACKING_URI=http://<VM_internal_IP_address>:8000
     ```

1. Provide the credentials that the `boto3` library will use to access Object Storage:

   * Create the `.aws` folder:

     ```bash
     mkdir ~/.aws
     ```

   * Create the `credentials` file:

     ```bash
     nano ~/.aws/credentials
     ```

   * Add these lines to the file, replacing the placeholders with the static key ID and secret key:

     ```ini
     [default]
     aws_access_key_id=<static_key_ID>
     aws_secret_access_key=<secret_key>
     ```

1. Run the MLFlow tracking server, replacing the placeholders with your cluster's connection details:

   ```bash
   mlflow server --backend-store-uri postgresql://<username>:<password>@<host>:6432/db1?sslmode=verify-full --default-artifact-root s3://mlflow-bucket/artifacts -h 0.0.0.0 -p 8000
   ```

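   To make sure the server is running, open `http://<VM_public_IP_address>:8000` in your browser. Then stop the server with **Ctrl** + **C**: in the next section, you will run it as a service on the same port.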
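
Besides opening the web UI, you can check the server from a script. The MLFlow tracking server exposes a `/health` endpoint that responds with `OK`; the sketch below relies on that and uses only the Python standard library. The function name and the `opener` parameter (which lets you substitute the HTTP call, e.g., in tests) are hypothetical.

```python
import urllib.request


def tracking_server_is_healthy(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/health answers with HTTP 200 and the body "OK"."""
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            return response.status == 200 and body == "OK"
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses
        return False
```

For example, run `tracking_server_is_healthy("http://localhost:8000")` on the VM, or use the VM's internal IP address from a DataSphere notebook.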

### Enable MLFlow autorun {#autorun}

To have MLFlow start automatically whenever the VM restarts, run it as a `systemd` service.

1. Create directories for storing logs and error details:

   ```bash
   mkdir ~/mlflow_logs/
   mkdir ~/mlflow_errors/
   ```

1. Create the `mlflow-tracking.service` file:

   ```bash
   sudo nano /etc/systemd/system/mlflow-tracking.service
   ```

1. Add these lines to the file, replacing the placeholders with your data:

   ```ini
   [Unit]
   Description=MLflow Tracking Server
   After=network.target

   [Service]
   Environment=MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net/
   Restart=on-failure
   RestartSec=30
   StandardOutput=file:/home/<VM_username>/mlflow_logs/stdout.log
   StandardError=file:/home/<VM_username>/mlflow_errors/stderr.log
   User=<VM_username>
   ExecStart=/bin/bash -c 'PATH=/home/<VM_username>/anaconda3/envs/mlflow/bin/:$PATH exec mlflow server --backend-store-uri postgresql://<DB_username>:<password>@<host>:6432/db1?sslmode=verify-full --default-artifact-root s3://mlflow-bucket/artifacts -h 0.0.0.0 -p 8000'

   [Install]
   WantedBy=multi-user.target
   ```

   Where:

   * `<VM_username>`: Username of the VM account.
   * `<DB_username>`: Username specified when creating the database cluster.
   * `<password>`: Password of the database user.
   * `<host>`: Host address you saved after creating the cluster.

1. Reload the `systemd` configuration, enable the service to start at system startup, start it, and check its status:

   ```bash
   sudo systemctl daemon-reload
   sudo systemctl enable mlflow-tracking
   sudo systemctl start mlflow-tracking
   sudo systemctl status mlflow-tracking
   ```
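
Editing the unit file by hand means repeating the same placeholders several times, and the `anaconda3/envs/<name>` path must match the environment created with `conda create -n mlflow`. As an alternative, a small script can render the file. This is a sketch with a hypothetical function name that reproduces the unit above; it also doubles every `%` in the backend store URI, because `systemd` interprets `%` as the start of a specifier, which matters if the password was percent-encoded.

```python
def render_mlflow_unit(vm_user: str, backend_store_uri: str,
                       artifact_root: str = "s3://mlflow-bucket/artifacts",
                       conda_env: str = "mlflow", port: int = 8000) -> str:
    """Render the text of mlflow-tracking.service for the given values."""
    home = f"/home/{vm_user}"
    # systemd expands specifiers such as %h, so a literal '%' must be written as '%%'
    uri = backend_store_uri.replace("%", "%%")
    env_bin = f"{home}/anaconda3/envs/{conda_env}/bin/"
    return "\n".join([
        "[Unit]",
        "Description=MLflow Tracking Server",
        "After=network.target",
        "",
        "[Service]",
        "Environment=MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net/",
        "Restart=on-failure",
        "RestartSec=30",
        f"StandardOutput=file:{home}/mlflow_logs/stdout.log",
        f"StandardError=file:{home}/mlflow_errors/stderr.log",
        f"User={vm_user}",
        f"ExecStart=/bin/bash -c 'PATH={env_bin}:$PATH exec mlflow server "
        f"--backend-store-uri {uri} --default-artifact-root {artifact_root} "
        f"-h 0.0.0.0 -p {port}'",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```

For example, write the returned text to `mlflow-tracking.service`, copy it to `/etc/systemd/system/` with `sudo`, and then run the `systemctl` commands above.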

## Create secrets {#create-secrets}

1. Select the project in your community or on the DataSphere [home page](https://datasphere.yandex.cloud) in the **Recent projects** tab.
1. Under **Project resources**, click ![secret](../../_assets/console-icons/shield-check.svg) **Secret**.
1. Click **Create**.
1. In the **Name** field, enter the name for the secret: `MLFLOW_S3_ENDPOINT_URL`.
1. In the **Value** field, paste the URL: `https://storage.yandexcloud.net/`.
1. Click **Create**.
1. Create three more secrets:
   * `MLFLOW_TRACKING_URI` with the `http://<VM_internal_IP_address>:8000` value.
   * `AWS_ACCESS_KEY_ID` with the static key ID.
   * `AWS_SECRET_ACCESS_KEY` with the secret key of the static key.
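
The secret names above match the environment variables that MLFlow (`MLFLOW_TRACKING_URI`, `MLFLOW_S3_ENDPOINT_URL`) and `boto3` (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) read, which assumes DataSphere exposes project secrets to notebook code as environment variables. Before training, you can check that all four are set without printing their values. A minimal sketch; the helper name is hypothetical:

```python
import os

REQUIRED_SECRETS = (
    "MLFLOW_S3_ENDPOINT_URL",
    "MLFLOW_TRACKING_URI",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_secrets(environ=os.environ, names=REQUIRED_SECRETS):
    """Return the names of required variables that are absent or empty.

    Only the names are returned, so secret values never end up in the notebook output.
    """
    return [name for name in names if not environ.get(name)]
```

In a notebook cell, `missing_secrets()` should return an empty list.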

## Train your model {#train-model}

This tutorial uses a dataset for predicting wine quality from quantitative properties such as acidity, pH, and residual sugar. To train your model, copy and paste the code into the notebook cells.

1. Open the DataSphere project:
   
   1. Select the project in your community or on the DataSphere [home page](https://datasphere.yandex.cloud) in the **Recent projects** tab.
   1. Click **Open project in JupyterLab** and wait for the loading to complete.
   1. Open the notebook tab.

1. Install the required modules:

    ```python
    %pip install mlflow boto3
    ```

1. Import the required libraries:

    ```python
    import os
    import warnings
    import sys

    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNet
    from urllib.parse import urlparse
    import mlflow
    import mlflow.sklearn
    from mlflow.models import infer_signature
    import logging
    ```

1. Create an experiment in MLFlow:

    ```python
    mlflow.set_experiment("my_first_experiment")
    ```

1. Define a function that computes prediction quality metrics:

    ```python
    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2
    ```

1. Get your data ready, train your model, and register it in MLflow:

    ```python
    logging.basicConfig(level=logging.WARN)
    logger = logging.getLogger(__name__)

    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Downloading the dataset to assess wine quality
    csv_url = (
        "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-red.csv"
    )
    try:
        data = pd.read_csv(csv_url, sep=";")
    except Exception as e:
        logger.exception(
            "Unable to download training & test CSV, check your internet connection. Error: %s", e
        )
        # The rest of the cell needs the dataset, so stop here instead of failing later with a NameError
        raise

    # Splitting the dataset into a training sample and a test sample
    train, test = train_test_split(data)

    # Allocating a target variable and variables used for prediction
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = 0.5
    l1_ratio = 0.5

    # Creating an `mlflow` run
    with mlflow.start_run():

        # Creating and training the ElasticNet model
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Making quality predictions against the test sample
        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha={:f}, l1_ratio={:f}):".format(alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Logging data on hyperparameters and quality metrics in MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        predictions = lr.predict(train_x)
        signature = infer_signature(train_x, predictions)

        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

        # Registering the model in MLflow
        if tracking_url_type_store != "file":
            mlflow.sklearn.log_model(
                lr, "model", registered_model_name="ElasticnetWineModel", signature=signature
            )
        else:
            mlflow.sklearn.log_model(lr, "model", signature=signature)
    ```

    To view the run, its metrics, and the registered `ElasticnetWineModel` model, open `http://<VM_public_IP_address>:8000`.
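
To interpret the values that `eval_metrics` logs, it can help to see the three metrics written out. The sketch below recomputes them in plain Python; for a single target, they follow the same formulas as the `scikit-learn` functions used above. The function names are hypothetical, chosen so that they do not overwrite the `rmse`, `mae`, and `r2` variables from the training cell.

```python
import math


def rmse_by_hand(actual, pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mae_by_hand(actual, pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def r2_by_hand(actual, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

Lower RMSE and MAE values are better; an R² of 1 means a perfect fit, and 0 means the model does no better than always predicting the mean quality.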

## How to delete the resources you created {#clear-out}

To stop paying for the resources you created:
* [Delete the VM](../operations/vm-control/vm-delete.md).
* [Delete the database cluster](../../managed-postgresql/operations/cluster-delete.md).
* [Delete the objects](../../storage/operations/objects/delete-all.md) from the bucket.
* [Delete the bucket](../../storage/operations/buckets/delete.md).
* [Delete the project](../../datasphere/operations/projects/delete.md).