# Expanding a Yandex MPP Analytics for PostgreSQL cluster

You can [expand](../operations/cluster-expand.md) a cluster by adding segment hosts to it.

The expansion procedure consists of a [preparation stage](#preparation) and a [data redistribution](#redistribution) stage. The data redistribution stage can be completed either right after the preparation stage or later on in the [background](#setting-delay-redistribution).

Each of these stages [may take a long time](#duration). You cannot influence the duration of the preparation stage, but you can influence that of the data redistribution stage, thus controlling the overall duration of the cluster expansion procedure.

## Preparation stage {#preparation}

At this stage, the following processes take place:

1. New segment hosts are added to the cluster.
1. The [gpexpand](https://techdocs.broadcom.com/us/en/vmware-tanzu/data-solutions/tanzu-greenplum/7/greenplum-database/utility_guide-ref-gpexpand.html) utility prepares for table redistribution:

    1. Creates the `gpexpand` service data schema in the `postgres` database.

    1. Generates a table redistribution queue.

        Tables from all cluster databases will be redistributed. Each table is assigned its own priority, which determines its position in the queue.

        [You can manage table priorities](../operations/cluster-expand.md#table-priority) as long as data redistribution for the table in question [has not started yet](../operations/cluster-expand.md#redistribute-monitoring) and the cluster is not [blocked from load](#setting-close-cluster).

    1. Prepares [partitioned tables](https://techdocs.broadcom.com/us/en/vmware-tanzu/data-solutions/tanzu-greenplum/7/greenplum-database/admin_guide-ddl-ddl-partition.html#about-the-table-partitioning-methods) for data redistribution.

The approximate duration of this stage is several hours; there is no way to influence it. For more information on how long the stages take, see [below](#duration).

{% note warning %}

Technically, the new segment hosts are added to the cluster at this stage, but the expansion is considered complete only after the data redistribution stage is over.

{% endnote %}

## Data redistribution stage {#redistribution}

At this stage, the following processes take place:

1. The cluster's table data is redistributed using the [gpexpand](https://techdocs.broadcom.com/us/en/vmware-tanzu/data-solutions/tanzu-greenplum/7/greenplum-database/utility_guide-ref-gpexpand.html) utility for even distribution across all segment hosts.

1. The `gpexpand` service data schema is deleted.

The approximate duration of this stage is several days. You can influence it using [settings](#settings). For more information on stage durations, see [below](#duration).

## Stage duration and duration control {#duration}

The approximate durations of the stages:

* Several hours for the [preparation stage](#preparation).
* Several days for the [data redistribution stage](#redistribution).

The actual duration of each stage depends not only on the size of the cluster databases and the total number of tables but also on the level and nature of the cluster load.

This is because the [gpexpand](https://techdocs.broadcom.com/us/en/vmware-tanzu/data-solutions/tanzu-greenplum/7/greenplum-database/utility_guide-ref-gpexpand.html) utility, which operates at every stage of cluster expansion, acquires exclusive [locks](https://techdocs.broadcom.com/us/en/vmware-tanzu/data-solutions/tanzu-greenplum/7/greenplum-database/ref_guide-sql_commands-LOCK.html) at the individual table level. User requests may also acquire locks while they run. This can considerably slow down both `gpexpand` and user request processing, depending on which process acquires the lock first and which one has to wait for it to be released. Both of these processes can also increase the load on the cluster.

You cannot shorten the preparation stage, but you can influence the duration of the data redistribution stage. To do this, before you run the procedure, [configure](../operations/cluster-expand.md) the [settings](#settings) that control the cluster's behavior at this stage. By combining settings, you can find the right balance between the speed of data redistribution and the speed of processing user requests.

Because the data redistribution stage can take a long time, Yandex MPP Analytics for PostgreSQL provides tools to [monitor](../operations/cluster-expand.md#redistribute-monitoring) the data redistribution process. Use these tools while cluster expansion is in progress to track it more accurately and estimate its completion time.
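
As a rough illustration of estimating the completion time from monitoring data, the sketch below extrapolates linearly from the amount of data already redistributed. The function name and its inputs are hypothetical and do not correspond to any specific monitoring field. Because the actual duration depends on cluster load and lock contention, treat the result as an approximation only.

```python
def estimate_remaining_seconds(bytes_done: int, bytes_total: int, elapsed_s: float) -> float:
    """Linearly extrapolate the time left for data redistribution.

    All three inputs are hypothetical values that you would read from
    your own monitoring; they are not names of real monitoring fields.
    """
    if bytes_done <= 0 or elapsed_s <= 0:
        raise ValueError("need some observed progress to extrapolate from")
    if bytes_done > bytes_total:
        raise ValueError("bytes_done cannot exceed bytes_total")
    # Assumes the remaining data moves at the average rate observed so far.
    return elapsed_s * (bytes_total - bytes_done) / bytes_done


# Example: a quarter of the data was redistributed in the first hour.
remaining = estimate_remaining_seconds(bytes_done=250, bytes_total=1000, elapsed_s=3600)
print(f"Estimated time left: {remaining / 3600:.1f} h")
```

The estimate becomes more reliable as more tables are processed, since early progress may be dominated by a few large or heavily locked tables.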

## Settings affecting the data redistribution process {#settings}

The following settings are available:

* **Block cluster from load**{#setting-close-cluster} <code><b><small>Management console</small></b></code> <code><b><small>CLI</small></b></code> <code><b><small>API</small></b></code>

    If this setting is enabled (`true`), you cannot connect to the cluster, and it does not accept new user requests. As a result, cluster expansion runs faster because `gpexpand` does not have to wait for the release of locks that incoming user requests would otherwise acquire.

    {% note warning %}
    
    If you block the cluster from load and disable [background data redistribution](expand.md#setting-delay-redistribution), you will lose access to the cluster until its expansion is complete.
    
    The expansion process can be [time-consuming](expand.md#duration).
    
    {% endnote %}

* **Background data redistribution**{#setting-delay-redistribution} <code><b><small>Management console</small></b></code> <code><b><small>CLI</small></b></code> <code><b><small>API</small></b></code>

    This setting affects the data redistribution strategy:

    * If the setting is disabled (`false`), data redistribution will start as soon as the cluster expansion preparation stage is over.

        The cluster will remain in the `Updating` status until all cluster expansion stages are completed.

        The data redistribution process will be run once and will continue either until all the cluster's tables are redistributed or [until the timeout expires](#setting-duration).

        If data redistribution terminates upon reaching the timeout with some tables remaining unprocessed, you will need to [redistribute those tables manually](../operations/cluster-expand.md#start-redistribute).

    * If the setting is enabled (`true`), data redistribution will be delayed.

        The cluster will remain in the `Updating` status only during preparation for cluster expansion.

        The data redistribution process will be run on a schedule during [routine maintenance operations](maintenance.md#regular-ops) until all tables are processed.

        When background data redistribution is enabled, routine maintenance operations are performed according to the following algorithm:

        1. [Custom table vacuuming](maintenance.md#custom-table-vacuum) (`VACUUM`).

        1. Data redistribution (`REDISTRIBUTE`):

            1. If all tables were processed before the [timeout expired](#setting-duration), the data redistribution process will be removed from the routine maintenance schedule and will not be started again.
            1. If only some of the tables were processed before the timeout expired, the process will be restarted during the next routine maintenance, and table processing will continue.

        1. [Collecting statistics](maintenance.md#get-statistics) (`ANALYZE`).

* **Redistribution timeout**{#setting-duration} <code><b><small>Management console</small></b></code> <code><b><small>CLI</small></b></code> <code><b><small>API</small></b></code>

    Timeout (in seconds) after which the data redistribution process will be interrupted.

    Reaching the timeout does not stop the process immediately: data redistribution will still be completed for the tables currently being processed (`IN PROGRESS` status). You can request the [status of the tables](../operations/cluster-expand.md#redistribute-monitoring) if the cluster is not blocked from load.
    
    The minimum value is `0`. With this value, the timeout is calculated automatically depending on the cluster configuration and data size.
    
    The maximum value depends on whether background data redistribution is enabled:
    
    * If enabled, the maximum value is `28800` (eight hours).
    * If disabled, the maximum value is not limited.

* **Number of redistribution threads**{#setting-parallel} <code><b><small>Management console</small></b></code> <code><b><small>CLI</small></b></code> <code><b><small>API</small></b></code>

    Number of threads that will be started during the data redistribution process.

    Using more threads will speed up data redistribution but it will also increase the cluster load. 
    
    The minimum and default value is `0`, in which case the number of threads is calculated automatically depending on the cluster configuration and data size. The maximum value is `25`.
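
The limits and timeout behavior described above can be sketched in Python. This is a simplified model under stated assumptions, not the service's implementation. The `ExpansionSettings` class, its field names, and both helper functions are hypothetical and do not match CLI or API parameter names. The model processes tables one at a time and ignores parallel threads, locks, and cluster load.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits quoted from the setting descriptions above.
MAX_BACKGROUND_TIMEOUT_S = 28800  # eight hours
MAX_THREADS = 25


@dataclass
class ExpansionSettings:
    """Hypothetical container; field names are illustrative only."""
    close_cluster: bool = False
    background_redistribution: bool = False
    timeout_s: int = 0  # 0 means "calculated automatically"
    threads: int = 0    # 0 means "calculated automatically"

    def problems(self) -> list[str]:
        """Return violations of the documented limits, if any."""
        found = []
        if self.timeout_s < 0:
            found.append("timeout must be 0 or greater")
        if self.background_redistribution and self.timeout_s > MAX_BACKGROUND_TIMEOUT_S:
            found.append("timeout must not exceed 28800 s in background mode")
        if not 0 <= self.threads <= MAX_THREADS:
            found.append("threads must be between 0 and 25")
        return found

    def locks_out_users_until_done(self) -> bool:
        """True if the cluster stays inaccessible for the whole expansion."""
        return self.close_cluster and not self.background_redistribution


def run_once(table_seconds: list[int], timeout_s: int) -> int:
    """Count the tables finished in one run of the simplified model.

    A table already in progress when the timeout is reached is still
    completed; no new table is started after that point.
    """
    elapsed = 0
    done = 0
    for seconds in table_seconds:
        if elapsed >= timeout_s:
            break
        elapsed += seconds
        done += 1
    return done


def maintenance_runs_needed(table_seconds: list[int], timeout_s: int) -> int:
    """Count the scheduled runs needed to process the whole queue."""
    if timeout_s <= 0:
        raise ValueError("this model needs an explicit positive timeout")
    runs = 0
    remaining = list(table_seconds)
    while remaining:
        remaining = remaining[run_once(remaining, timeout_s):]
        runs += 1
    return runs


settings = ExpansionSettings(background_redistribution=True, timeout_s=30000, threads=26)
print(settings.problems())
print(maintenance_runs_needed([100, 200, 300], timeout_s=250))
```

In the last call, the second table starts before the 250-second timeout and is allowed to finish, so the first run processes two tables and the third is left for the next run.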