# Updating a subcluster

You can perform the following actions for any subcluster:

* [Change the number of hosts](#change-host-number).
* [Change the host class](#change-resource-preset).
* [Change the autoscaling rule for data processing subclusters](#change-autoscaling-rule).
* [Expand the storage](#change-disk-size).
* [Edit security groups](#change-sg-set).

You can also move subclusters to a different availability zone. They are migrated together with the Yandex Data Processing cluster, and the procedure depends on the cluster type:

* [Migrating a lightweight cluster to a different availability zone](migration-to-an-availability-zone.md).
* [Migrating an HDFS cluster to a different availability zone](../tutorials/hdfs-cluster-migration.md).

## Changing the number of hosts {#change-host-number}

You can change the number of hosts in data storage and processing subclusters:

{% list tabs group=instructions %}

- Management console {#console}

    1. Open the [folder dashboard](https://console.yandex.cloud).
    1. Navigate to **Yandex Data Processing**.
    1. Click the name of your cluster and select the **Subclusters** tab.
    1. Click ![image](../../_assets/console-icons/ellipsis.svg) for the subcluster you need and select **Edit**.
    1. Enter or select the required number of hosts in the **Hosts** field.
    1. Optionally, specify the [decommissioning](../concepts/decommission.md) timeout.
    1. Click **Save changes**.

    Yandex Data Processing will start changing the number of hosts.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    The folder used by default is the one specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.

    To change the number of hosts for a subcluster:

    1. View the description of the CLI command for updating a subcluster:

        ```bash
        yc dataproc subcluster update --help
        ```
    
    1. Run this command to update the subcluster, specifying the new number of hosts:

        ```bash
        yc dataproc subcluster update <subcluster_name_or_ID> \
           --cluster-name=<cluster_name> \
           --hosts-count=<number_of_hosts>
        ```

        You can get the subcluster ID and name with the [list of subclusters in the cluster](subclusters.md#list-subclusters), and the cluster name, with the [list of clusters in the folder](cluster-list.md#list).
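
    As a hedged sketch, the update call can be wrapped in a small helper that rejects an invalid host count before anything is sent to the cloud. The function name `build_hosts_update` and the sample names in the usage line are hypothetical; only the `--cluster-name` and `--hosts-count` flags come from the command above. The helper prints the command for review instead of running it.

    ```bash
    # Hypothetical helper: validates the host count, then prints the update command.
    # Nothing is executed against the cloud; review the output before running it.
    build_hosts_update() {
      local subcluster="$1" cluster="$2" hosts="$3"
      # Accept only a positive integer without leading zeros.
      case "$hosts" in
        ''|*[!0-9]*|0*)
          echo "hosts count must be a positive integer" >&2
          return 1
          ;;
      esac
      printf 'yc dataproc subcluster update %s --cluster-name=%s --hosts-count=%s\n' \
        "$subcluster" "$cluster" "$hosts"
    }
    ```

    For example, `build_hosts_update compute-1 my-cluster 5` prints the command for a subcluster with the made-up name `compute-1`.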

- Terraform {#tf}

    1. Open the current Terraform configuration file with the infrastructure plan.

        To learn how to create this file, refer to [Creating a cluster](cluster-create.md).

    1. In the description of the Yandex Data Processing cluster, edit the `hosts_count` value under `subcluster_spec` for your data storage or data processing subcluster:

        ```hcl
        resource "yandex_dataproc_cluster" "data_cluster" {
          ...
          cluster_config {
            ...
            subcluster_spec {
              name        = "<subcluster_name>"
              ...
              hosts_count = <number_of_hosts_in_subcluster>
            }
          }
        }
        ```

{% endlist %}

## Changing the host class {#change-resource-preset}

{% note warning %}

Changing host properties through the Yandex Compute Cloud interfaces may result in host failure. To change the cluster host settings, use the Yandex Data Processing interfaces, such as the management console, CLI, Terraform, or API.

{% endnote %}

You can change the compute capacity of hosts in a specific subcluster. The capacity you need depends on the driver deploy mode:

* In `deployMode=cluster` mode, when the driver runs on one of the cluster's `compute` hosts, a subcluster with the `master` host requires 4–8 CPU cores and 16 GB of RAM.
* In `deployMode=client` mode, when the driver runs on the cluster's master host, the compute capacity depends on the job logic and the number of concurrent jobs.

For more information on driver deploy modes and computing resource usage, see [Resource allocation](../concepts/spark-sql.md#resource-management).

{% list tabs group=instructions %}

- Management console {#console}

    To change the [host class](../concepts/instance-types.md) for a subcluster:

    1. In the [management console](https://console.yandex.cloud), select the folder with the cluster whose subcluster you want to change.
    1. Go to **Yandex Data Processing** and select the cluster.
    1. Navigate to **Subclusters**.
    1. Click ![image](../../_assets/console-icons/ellipsis.svg) for the subcluster you need and select **Edit**.
    1. Select the platform and configuration under **Host class**.
    1. Optionally, specify the [decommissioning](../concepts/decommission.md) timeout.
    1. Click **Save changes**.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    The folder used by default is the one specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.

    To change the [host class](../concepts/instance-types.md) for a subcluster:

    1. View the description of the CLI command for updating a subcluster:

        ```bash
        yc dataproc subcluster update --help
        ```

    1. Get the list of available host classes (the `ZONE IDS` column lists the availability zones you can select each class in):

        ```bash
        yc dataproc resource-preset list
        ```

        Result:

        ```text
        +-----------+--------------------------------+-------+----------+
        |    ID     |            ZONE IDS            | CORES |  MEMORY  |
        +-----------+--------------------------------+-------+----------+
        | b3-c1-m4  | ru-central1-a, ru-central1-b,  |     2 | 4.0 GB   |
        |           | ru-central1-c                  |       |          |
        | ...                                                           |
        +-----------+--------------------------------+-------+----------+
        ```

    1. Run this command to update the subcluster, specifying the new host class:

        ```bash
        yc dataproc subcluster update <subcluster_name_or_ID> \
           --cluster-name=<cluster_name> \
           --resource-preset=<host_class>
        ```

        You can request the subcluster name or ID with the [list of cluster subclusters](subclusters.md#list-subclusters), and the cluster name, with the [list of folder clusters](cluster-list.md#list).
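
    Before running the update, you may want to confirm that the host class you picked appears in the list. The sketch below assumes the table layout shown in the sample result above. `preset_listed` is a hypothetical helper name, and it reads the table from standard input.

    ```bash
    # Hypothetical helper: succeeds if the given host class ID appears in the
    # first column of the table produced by `yc dataproc resource-preset list`.
    preset_listed() {
      local preset="$1"
      # An empty ID would match the continuation rows, so reject it up front.
      [ -n "$preset" ] || return 1
      # Split rows on "|", strip spaces from the ID column, compare exactly.
      awk -F'|' -v id="$preset" '
        { gsub(/ /, "", $2); if ($2 == id) found = 1 }
        END { exit !found }
      '
    }
    ```

    A possible use is `yc dataproc resource-preset list | preset_listed b3-c1-m4 && echo "host class found"`.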

- Terraform {#tf}

    1. Open the current Terraform configuration file with the infrastructure plan.

        To learn how to create this file, refer to [Creating a cluster](cluster-create.md).

    1. In the description of the Yandex Data Processing cluster, edit the `resource_preset_id` value under `subcluster_spec.resources` for your subcluster:

        ```hcl
        resource "yandex_dataproc_cluster" "data_cluster" {
          ...
          cluster_config {
            ...
            subcluster_spec {
              name = "<subcluster_name>"
              ...
              resources {
                resource_preset_id = "<subcluster_host_class>"
                ...
              }
            }
          }
        }
        ```

    1. Make sure the settings are correct.

        1. In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
        1. Run this command:
        
           ```bash
           terraform validate
           ```
        
           Terraform will show any errors found in your configuration files.

    1. Confirm resource changes.

        1. Run this command to view the planned changes:
        
           ```bash
           terraform plan
           ```
        
           If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
        
        1. If everything looks correct, apply the changes:
           1. Run this command:
        
              ```bash
              terraform apply
              ```
        
           1. Confirm updating the resources.
           1. Wait for the operation to complete.

    For more information about the resources you can create with Terraform, see [this provider guide](../../terraform/resources/dataproc_cluster.md).
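
    The validate, plan, and apply steps above can be chained in one script. This is a minimal sketch rather than a required workflow: the function names and the `tfplan` file name are hypothetical, and `DRY_RUN` is a variable introduced here for illustration. Note that applying a saved plan file does not ask for interactive confirmation, so review the `terraform plan` output first.

    ```bash
    # Hypothetical wrapper: prints each command when DRY_RUN=1 (the default
    # here) and executes it otherwise.
    run_step() {
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
      else
        "$@"
      fi
    }

    # Runs the steps in order and stops at the first one that fails.
    apply_subcluster_changes() {
      run_step terraform validate &&
        run_step terraform plan -out=tfplan &&
        run_step terraform apply tfplan
    }
    ```

    Call `apply_subcluster_changes` to preview the commands, or `DRY_RUN=0 apply_subcluster_changes` from the directory with the configuration files to run them.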

{% endlist %}

Yandex Data Processing will start updating the subcluster. Note that this will restart all hosts in the subcluster being updated.

## Changing the autoscaling rule for data processing subclusters {#change-autoscaling-rule}

You can configure the [autoscaling](../concepts/autoscaling.md) rule in data processing subclusters.

Make sure your cloud quota is sufficient to scale up the VMs. Open the [Quotas](https://console.yandex.cloud/cloud?section=quotas) page for your cloud and make sure the following **Compute Cloud** quotas are not fully used:

* **Total HDD capacity**
* **Total SSD capacity**
* **Number of disks**
* **Number of vCPUs for instances**
* **Number of instances**

To enable autoscaling, [assign](../../iam/operations/sa/assign-role-for-sa.md) the following roles to the Yandex Data Processing cluster service account:

* [dataproc.agent](../security/index.md#dataproc-agent): To enable the service account to get info on cluster host states, [jobs](../concepts/jobs.md), and [log groups](../../logging/concepts/log-group.md).
* [dataproc.provisioner](../security/index.md#dataproc-provisioner): To enable the service account to work with an autoscaling instance group. This will enable [subcluster autoscaling](../concepts/autoscaling.md).
* [resource-manager.auditor](../../resource-manager/security/index.md#resource-manager-auditor) or higher for the folder where you want to create a Yandex Data Processing cluster: For connection to the cluster using [OS Login](../../organization/concepts/os-login.md).

{% note tip %}

To restrict the permissions of a cluster's service account (its IAM token is available when running jobs):

1. Specify a separate service account for cluster autoscaling when [creating](cluster-create.md) or [updating](cluster-update.md) the cluster via the Yandex Cloud CLI, Terraform, or API.
1. Assign the `dataproc.provisioner` role to this account only.

{% endnote %}

{% list tabs group=instructions %}

- Management console {#console}

    To configure autoscaling for subclusters:

    1. Open the [folder dashboard](https://console.yandex.cloud).
    1. Navigate to **Yandex Data Processing**.
    1. Select the cluster and open the **Subclusters** tab.
    1. Click ![horizontal-ellipsis](../../_assets/console-icons/ellipsis.svg) for the subcluster you need and select **Edit**.
    1. Under **Scaling**, enable **Autoscaling** if it is disabled.
    1. Configure the autoscaling settings.
    1. The default metric used for autoscaling is `yarn.cluster.containersPending`. To enable scaling based on CPU utilization, disable the **Default scaling** setting and specify the target CPU utilization level.
    1. Click **Save changes**.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    The folder used by default is the one specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.

    To configure autoscaling for subclusters:

    1. View the description of the CLI command for updating a subcluster:

        ```bash
        yc dataproc subcluster update --help
        ```

    1. Run this command to update the subcluster, specifying the autoscaling settings:

        ```bash
        yc dataproc subcluster update <subcluster_name_or_ID> \
           --cluster-name=<cluster_name> \
           --hosts-count=<minimum_number_of_hosts> \
           --max-hosts-count=<maximum_number_of_hosts> \
           --enable-preemptible=<using_preemptible_VMs> \
           --warmup-duration=<VM_warmup_period> \
           --stabilization-duration=<stabilization_period> \
           --measurement-duration=<load_measurement_interval> \
           --cpu-utilization-target=<target_CPU_utilization> \
           --autoscaling-decommission-timeout=<decommissioning_timeout>
        ```

        Where:

        * `--hosts-count`: Minimum number of hosts (VMs) in a subcluster. The minimum value is `1`, and the maximum value is `32`.
        * `--max-hosts-count`: Maximum number of hosts (VMs) in a subcluster. The minimum value is `1`, and the maximum value is `100`.
        * `--enable-preemptible`: Specifies if [preemptible VMs](../../compute/concepts/preemptible-vm.md) are used. It can be either `true` or `false`.
        * `--warmup-duration`: Time required to warm up a VM, in `<value>s` format. The minimum value is `0s`, and the maximum value is `600s`.
        * `--stabilization-duration`: Period, in seconds, during which the required number of VMs cannot be decreased, in `<value>s` format. The minimum value is `60s` and the maximum value is `1800s`.
        * `--measurement-duration`: Period, in seconds, for which average utilization is calculated for each VM, in `<value>s` format. The minimum value is `60s` (1 minute), and the maximum value is `600s` (10 minutes).
        * `--cpu-utilization-target`: Target CPU utilization, in %. Use this setting to enable [scaling](../concepts/autoscaling.md) based on CPU utilization. Otherwise, `yarn.cluster.containersPending` will be used for scaling based on the number of pending resources. The minimum value is `10`, and the maximum value is `100`.
        * `--autoscaling-decommission-timeout`: [Decommissioning timeout](../concepts/decommission.md), in seconds. The minimum value is `0`, and the maximum value is `86400` (24 hours).

        You can get the subcluster ID and name with the [list of subclusters in the cluster](subclusters.md#list-subclusters), and the cluster name, with the [list of clusters in the folder](cluster-list.md#list).
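
    The limits listed above can be checked locally before calling the CLI. The following is a hedged sketch: `in_range`, `check_flag`, and `check_autoscaling` are hypothetical helper names, and the values are passed as plain integers, so append `s` to the durations when you build the actual command.

    ```bash
    # in_range VALUE MIN MAX: succeeds if VALUE is an integer within [MIN, MAX].
    in_range() {
      case "$1" in ''|*[!0-9]*) return 1 ;; esac
      [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
    }

    # check_flag NAME VALUE MIN MAX: reports which flag is out of range.
    check_flag() {
      if in_range "$2" "$3" "$4"; then return 0; fi
      echo "--$1 must be between $3 and $4, got '$2'" >&2
      return 1
    }

    # Arguments, in order: minimum hosts, maximum hosts, warmup, stabilization,
    # and measurement durations (seconds), target CPU utilization (%), and
    # decommissioning timeout (seconds). Limits are the ones listed above.
    check_autoscaling() {
      check_flag hosts-count "$1" 1 32 &&
        check_flag max-hosts-count "$2" 1 100 &&
        check_flag warmup-duration "$3" 0 600 &&
        check_flag stabilization-duration "$4" 60 1800 &&
        check_flag measurement-duration "$5" 60 600 &&
        check_flag cpu-utilization-target "$6" 10 100 &&
        check_flag autoscaling-decommission-timeout "$7" 0 86400
    }
    ```

    For instance, `check_autoscaling 1 10 60 120 60 70 120` succeeds, while a warmup value of `700` fails with a message naming `--warmup-duration`.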

- Terraform {#tf}

    To configure autoscaling for subclusters:

    1. Open the current Terraform configuration file with the infrastructure plan.

        To learn how to create this file, refer to [Creating a cluster](cluster-create.md).

    1. In the description of the Yandex Data Processing cluster, add a section named `subcluster_spec.autoscaling_config` with the required autoscaling settings for your subcluster:

        ```hcl
        resource "yandex_dataproc_cluster" "data_cluster" {
          ...
          cluster_config {
            ...
            subcluster_spec {
              name = "<subcluster_name>"
              role = "COMPUTENODE"
              ...
              autoscaling_config {
                max_hosts_count        = <maximum_number_of_VMs_in_group>
                measurement_duration   = <load_measurement_interval>
                warmup_duration        = <warmup_period>
                stabilization_duration = <stabilization_period>
                preemptible            = <use_of_preemptible_VMs>
                cpu_utilization_target = <target_CPU_utilization>
                decommission_timeout   = <decommissioning_timeout>
              }
            }
          }
        }
        ```

       Where:

       * `max_hosts_count`: Maximum number of hosts (VMs) in a subcluster. The minimum value is `1`, and the maximum value is `100`.
       * `measurement_duration`: Period, in seconds, for which average utilization is calculated for each VM, in `<value>s` format. The minimum value is `60s` (1 minute), and the maximum value is `600s` (10 minutes).
       * `warmup_duration`: Time required to warm up a VM, in `<value>s` format. The minimum value is `0s`, and the maximum value is `600s`.
       * `stabilization_duration`: Period, in seconds, during which the required number of VMs cannot be decreased, in `<value>s` format. The minimum value is `60s` and the maximum value is `1800s`.
       * `preemptible`: Indicates if [preemptible VMs](../../compute/concepts/preemptible-vm.md) are used. It can be either `true` or `false`.
       * `cpu_utilization_target`: Target CPU utilization, in %. Use this setting to enable [scaling](../concepts/autoscaling.md) based on CPU utilization. Otherwise, `yarn.cluster.containersPending` will be used for scaling based on the number of pending resources. The minimum value is `10`, and the maximum value is `100`.
       * `decommission_timeout`: [Decommissioning timeout](../concepts/decommission.md), in seconds. The minimum value is `0`, and the maximum value is `86400` (24 hours).

    1. Make sure the settings are correct.

        1. In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
        1. Run this command:
        
           ```bash
           terraform validate
           ```
        
           Terraform will show any errors found in your configuration files.

    1. Confirm resource changes.

        1. Run this command to view the planned changes:
        
           ```bash
           terraform plan
           ```
        
           If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
        
        1. If everything looks correct, apply the changes:
           1. Run this command:
        
              ```bash
              terraform apply
              ```
        
           1. Confirm updating the resources.
           1. Wait for the operation to complete.

    For more information about the resources you can create with Terraform, see [this provider guide](../../terraform/resources/dataproc_cluster.md).

{% endlist %}

## Expanding storage {#change-disk-size}

{% note warning %}

Changing host properties through the Yandex Compute Cloud interfaces may result in host failure. To change the cluster host settings, use the Yandex Data Processing interfaces, such as the management console, CLI, Terraform, or API.

{% endnote %}

You can expand the storage allocated to each host in a specific subcluster.

{% note info %}

Currently, you cannot reduce the storage size. To do so, you must re-create the Yandex Data Processing subcluster.

{% endnote %}

Make sure the cloud quota is sufficient to increase the VM resources. Open the [Quotas](https://console.yandex.cloud/cloud?section=quotas) page for your cloud and make sure the following **Compute Cloud** quotas are not fully used:

* **Total HDD capacity**.
* **Total SSD capacity**.
* **Number of disks**.

{% list tabs group=instructions %}

- Management console {#console}

    To change the storage size for a subcluster:

    1. In the [management console](https://console.yandex.cloud), select the folder with the cluster whose subcluster you want to change.
    1. Navigate to **Yandex Data Processing** and select the cluster.
    1. Navigate to **Subclusters**.
    1. Click ![image](../../_assets/console-icons/ellipsis.svg) for the subcluster you need and select **Edit**.
    1. Enter or select the storage size you need under **Storage size**.
    1. Click **Save changes**.

    Yandex Data Processing will start updating the subcluster.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    The folder used by default is the one specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.

    To change the storage size for a subcluster:

    1. View the description of the CLI command for updating a subcluster:

        ```bash
        yc dataproc subcluster update --help
        ```

    1. Run this command to update the subcluster, specifying the storage size you need:

        ```bash
        yc dataproc subcluster update <subcluster_name_or_ID> \
           --cluster-name=<cluster_name> \
           --disk-size=<storage_size_in_GB>
        ```

        You can request the subcluster name or ID with the [list of cluster subclusters](subclusters.md#list-subclusters), and the cluster name, with the [list of folder clusters](cluster-list.md#list).

    Yandex Data Processing will start expanding the storage.
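
    Because storage can only be expanded, a local guard can catch a size that is not larger than the current one. This is a sketch under stated assumptions: `is_uint` and `build_disk_update` are hypothetical helper names, you supply the current size yourself, and the helper prints the command for review instead of running it.

    ```bash
    # is_uint VALUE: succeeds if VALUE consists of digits only.
    is_uint() {
      case "$1" in ''|*[!0-9]*) return 1 ;; esac
    }

    # Hypothetical helper: prints the storage update command only if the new
    # size is larger than the current one. Sizes are whole numbers of GB.
    build_disk_update() {
      local subcluster="$1" cluster="$2" current_gb="$3" new_gb="$4"
      if ! is_uint "$current_gb" || ! is_uint "$new_gb"; then
        echo "sizes must be whole numbers of GB" >&2
        return 1
      fi
      # Storage cannot be reduced, so refuse an equal or smaller size.
      if [ "$new_gb" -le "$current_gb" ]; then
        echo "new size must exceed the current ${current_gb} GB" >&2
        return 1
      fi
      printf 'yc dataproc subcluster update %s --cluster-name=%s --disk-size=%s\n' \
        "$subcluster" "$cluster" "$new_gb"
    }
    ```

    For example, `build_disk_update data-1 my-cluster 128 256` prints the update command, while swapping the two sizes returns an error.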

- Terraform {#tf}

    To expand the subcluster storage:

    1. Open the current Terraform configuration file with the infrastructure plan.

        To learn how to create this file, refer to [Creating a cluster](cluster-create.md).

    1. In the description of the Yandex Data Processing cluster, edit the `disk_size` value under `subcluster_spec.resources` for your subcluster:

        ```hcl
        resource "yandex_dataproc_cluster" "data_cluster" {
          ...
          cluster_config {
            ...
            subcluster_spec {
              name = "<subcluster_name>"
              ...
              resources {
                disk_size = <storage_size_in_GB>
                ...
              }
            }
          }
        }
        ```

    1. Make sure the settings are correct.

        1. In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
        1. Run this command:
        
           ```bash
           terraform validate
           ```
        
           Terraform will show any errors found in your configuration files.

    1. Confirm resource changes.

        1. Run this command to view the planned changes:
        
           ```bash
           terraform plan
           ```
        
           If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
        
        1. If everything looks correct, apply the changes:
           1. Run this command:
        
              ```bash
              terraform apply
              ```
        
           1. Confirm updating the resources.
           1. Wait for the operation to complete.

    For more information about the resources you can create with Terraform, see [this provider guide](../../terraform/resources/dataproc_cluster.md).

{% endlist %}

## Updating security groups {#change-sg-set}

{% list tabs group=instructions %}

- Management console {#console}

    1. Open the [folder dashboard](https://console.yandex.cloud).
    1. Navigate to **Yandex Data Processing**.
    1. Click the name of your cluster and select the **Hosts** tab.
    1. Click the host name.
    1. Under **Network**, click ![image](../../_assets/console-icons/ellipsis.svg) and select **Edit**.
    1. Select the security groups.
    1. Click **Save**.

- Terraform {#tf}

    1. Open the current Terraform configuration file with the infrastructure plan.

        To learn how to create this file, refer to [Creating a cluster](cluster-create.md).

    1. Edit the `security_group_ids` value in the cluster description:

        ```hcl
        resource "yandex_dataproc_cluster" "data_cluster" {
          ...
          security_group_ids = [ "<list_of_cluster_security_group_IDs>" ]
        }
        ```

    1. Make sure the settings are correct.

        1. In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
        1. Run this command:
        
           ```bash
           terraform validate
           ```
        
           Terraform will show any errors found in your configuration files.

    1. Confirm resource changes.

        1. Run this command to view the planned changes:
        
           ```bash
           terraform plan
           ```
        
           If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
        
        1. If everything looks correct, apply the changes:
           1. Run this command:
        
              ```bash
              terraform apply
              ```
        
           1. Confirm updating the resources.
           1. Wait for the operation to complete.

    For more information about the resources you can create with Terraform, see [this provider guide](../../terraform/resources/dataproc_cluster.md).

{% endlist %}

{% note warning %}

You may need to additionally [configure security groups](security-groups.md) to enable access to your cluster.

{% endnote %}