# Transferring data to a ClickHouse® target endpoint

Yandex Data Transfer enables you to migrate data to a ClickHouse® database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

1. [Explore possible data transfer scenarios](#scenarios).
1. [Configure one of the supported data sources](#supported-sources).
1. [Prepare the ClickHouse®](#prepare) database for the transfer.
1. [Configure the target endpoint](#endpoint-settings) in Yandex Data Transfer.
1. [Create](../../transfer.md#create) a transfer and [start](../../transfer.md#activate) it.
1. Perform required operations with the database and [control the transfer](../../monitoring.md).
1. In case of any issues, [use ready-made solutions](#troubleshooting) to resolve them.

## Scenarios for transferring data to ClickHouse® {#scenarios}

1. Migration: Moving data from one storage to another. Migration often means migrating a database from obsolete local databases to managed cloud ones.
    * [Migrating a ClickHouse® cluster](../../../tutorials/managed-clickhouse.md).
    * [Copying data from Managed Service for OpenSearch to Managed Service for ClickHouse® using Yandex Data Transfer](../../../tutorials/opensearch-to-clickhouse.md).

1. Data delivery is a process of delivering arbitrary data to target storage. It includes data retrieval from a queue and its deserialization with subsequent transformation to target storage format.
    * [Delivering data from Apache Kafka® to ClickHouse®](../../../tutorials/mkf-to-mch.md).
    * [Delivering data from YDS to ClickHouse®](../../../tutorials/yds-to-clickhouse.md).

1. Uploading data to data marts is a process of transferring prepared data to storage for subsequent visualization.
    * [Loading Greenplum® data to ClickHouse®](../../../tutorials/greenplum-to-clickhouse.md).
    * [Loading MySQL® data to ClickHouse®](../../../tutorials/mysql-to-clickhouse.md).
    * [Loading Yandex Metrica data to ClickHouse®](../../../tutorials/metrika-to-clickhouse.md).
    * [Loading Yandex Direct data to ClickHouse®](../../../tutorials/direct-to-mch.md).

    * [Loading PostgreSQL data to ClickHouse®](../../../tutorials/rdbms-to-clickhouse.md).
    * [Loading data from Object Storage to ClickHouse®](../../../tutorials/object-storage-to-clickhouse.md).
    * [Loading data from YDB to the ClickHouse® data mart](../../../tutorials/ydb-to-clickhouse.md).

For a detailed description of possible Yandex Data Transfer scenarios, see [Tutorials](../../../tutorials/index.md).

## Configuring the data source {#supported-sources}

Configure one of the supported data sources:

* [PostgreSQL](../source/postgresql.md)
* [MySQL®](../source/mysql.md)
* [ClickHouse®](../source/clickhouse.md)
* [Greenplum®](../source/greenplum.md)
* [Apache Kafka®](../source/kafka.md)
* [Airbyte®](../../../transfer-matrix.md#airbyte)
* [Yandex Metrica](../source/metrika.md)
* [YDS](../source/data-streams.md)
* [Yandex Object Storage](../source/object-storage.md)
* [YTsaurus](../source/yt.md)
* [Oracle](../source/oracle.md)
* [OpenSearch](../source/opensearch.md)

For a complete list of supported sources and targets in Yandex Data Transfer, see [Available transfers](../../../transfer-matrix.md).

{% note info %}

ClickHouse® has date range restrictions. If the source database contains unsupported dates, this may result in an [error](#date-range) and stop the transfer.

{% endnote %}

## Preparing the target database {#prepare}

{% list tabs %}

- Managed Service for ClickHouse®

    1. [Create a target database](../../../../managed-clickhouse/operations/databases.md#add-db).

       If you need to transfer multiple databases, create a separate transfer for each one of them.

    1. [Create a user](../../../../managed-clickhouse/operations/cluster-users.md#adduser) with access to the target database.

       Once started, the transfer will connect to the target on behalf of this user.

    1. If [user management via SQL](../../../../managed-clickhouse/operations/cluster-users.md#sql-user-management) is enabled in the cluster, grant the new user the following permissions:

        ```sql
        GRANT CLUSTER ON *.* TO <username>
        GRANT SELECT, INSERT, ALTER DELETE, CREATE TABLE, CREATE VIEW, DROP TABLE, TRUNCATE, dictGet ON <DB_name>.* TO <username>
        GRANT SELECT(macro, substitution) ON system.macros TO <username>
        ``` 

       If user management via SQL is disabled, [permissions are assigned](../../../../managed-clickhouse/operations/cluster-users.md) via the management console and CLI.

    1. [Create a security group](../../../../vpc/operations/security-group-create.md) and [configure it](../../../../managed-clickhouse/operations/connect/index.md#configuring-security-groups).

    1. Assign the created security group to the Managed Service for ClickHouse® cluster.

- ClickHouse®

    1. If not planning to use [Cloud Interconnect](../../../../interconnect/concepts/index.md) or [VPN](https://en.wikipedia.org/wiki/Virtual_private_network) for connections to an external cluster, make such cluster accessible from the Internet from [IP addresses used by Data Transfer](../../../../overview/concepts/public-ips.md#virtual-private-cloud).
       
       For details on linking your network up with external resources, see [this concept](../../../concepts/network.md#source-external).

    1. Create a target database. Its name must be the same as the source database name. If you need to transfer multiple databases, create a separate transfer for each one of them.

    1. Create a user with access to the target database.

       Once started, the transfer will connect to the target on behalf of this user.

    1. Grant the new user the following permissions:

        ```sql
        GRANT CLUSTER ON *.* TO <username>
        GRANT SELECT, INSERT, ALTER DELETE, CREATE TABLE, CREATE VIEW, DROP TABLE, TRUNCATE, dictGet ON <DB_name>.* TO <username>
        GRANT SELECT(macro, substitution) ON system.macros TO <username>
        ```

{% endlist %}

## Configuring the ClickHouse® target endpoint {#endpoint-settings}

When [creating](../index.md#create) or [updating](../index.md#update) an endpoint, you can define:

* [Yandex Managed Service for ClickHouse® cluster](#managed-service) connection or [custom installation](#on-premise) settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
* [Additional parameters](#additional-settings).

See also the [endpoint setup recommendations](#recommended-settings-queue) if [ClickHouse® gets data from queues](#scenarios).

### Managed Service for ClickHouse® cluster {#managed-service}


{% note warning %}

To create or edit an endpoint of a managed database, you will need the [`managed-clickhouse.viewer`](../../../../managed-clickhouse/security.md#managed-clickhouse-viewer) role or the primitive [`viewer`](../../../../iam/roles-reference.md#viewer) role for the folder the cluster of this managed database resides in.

{% endnote %}


Connection to the database with the cluster specified in Yandex Cloud.

{% list tabs group=instructions %}

- Management console {#console}

    * **Connection type**: Select a database connection option:
    
      * **Self-managed**: Allows you to specify connection settings manually.
    
        Select **Managed Service for ClickHouse cluster** as the installation type and configure these settings:
    
        * **Managed cluster**: Select the cluster to connect to.
    
        * **Shard group**: Specify the shard group to transfer the data to. If this value is not set, data is transferred to all shards.
    
        * **Database**: Specify the name of the database in the selected cluster.
    
        * **User**: Specify the username Data Transfer will use to connect to the database.
    
        * **Password**: Enter the user password for access to the database.
    
      * **Connection Manager**: Allows connecting to the cluster via [Yandex Connection Manager](../../../../metadata-hub/quickstart/connection-manager.md):
    
        * Select the folder with the Managed Service for ClickHouse® cluster.
        * Select **Managed DB cluster** as the installation type and configure these settings:
    
          * **Cluster for Managed DB**: Select the cluster to connect to.
          * **Connection**: Select or create a connection in Connection Manager.
    
          * **Database**: Specify the name of the database in the selected cluster.
    
          * **Shard group**: Specify the shard group to transfer the data to. If this value is not set, the data will go to all shards.
    
        {% note warning %}
        
        To use a connection from Connection Manager, the user must have [access permissions](../../../../metadata-hub/operations/connection-access.md) for this connection of `connection-manager.user` or higher.
        
        {% endnote %}
    
    * **Security groups**: Select the cloud network to host the endpoint and security groups for network traffic. This will allow you to apply the specified security group rules to the VMs and clusters in the selected network without changing their settings. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).
    
      Make sure the selected security groups are [configured](../../../../managed-clickhouse/operations/connect/index.md#configuring-security-groups).

- CLI {#cli}

    * Endpoint type: `clickhouse-target`.

    * `--cluster-id`: ID of the cluster you need to connect to.
    * `--cluster-name`: Shard group to transfer the data to. If this parameter is not set, data will go to all shards.
    * `--database`: Database name.
    * `--user`: Username that Data Transfer will use to connect to the database.
    * `--security-group`: Network traffic security groups whose rules apply to VMs and clusters without altering their configurations. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).
    
        Make sure the specified security groups are [configured](../../../../managed-clickhouse/operations/connect/index.md#configuring-security-groups).
    
    
    * To set a user password to access the DB, use one of the following parameters:
    
        * `--raw-password`: Password as text.
        * `--password-file`: The path to the password file.

- Terraform {#tf}

    * Endpoint type: `clickhouse_target`.

    * `connection.connection_options.mdb_cluster_id`: ID of cluster to connect to.
    * `clickhouse_cluster_name`: Shard group to transfer the data to. If this parameter is not set, data will go to all shards.
    * `subnet_id`: ID of the [subnet](../../../../vpc/concepts/network.md#subnet) the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.
      
      If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
    * `security_groups`: [Security groups](../../../../vpc/concepts/security-groups.md) for network traffic.
      
      Security group rules apply to a transfer. They allow opening network access from the transfer VM to the cluster. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).
      
      Security groups and the `subnet_id` subnet, if the latter is specified, must belong to the same network as the cluster.
      
      {% note info %}
      
      In Terraform, it is not required to specify a network for security groups.
      
      {% endnote %}
    
       Make sure the specified security groups are [configured](../../../../managed-clickhouse/operations/connect/index.md#configuring-security-groups).
    
    * `connection.connection_options.database`: Database name.
    * `connection.connection_options.user`: Username that Data Transfer will use to connect to the database.
    * `connection.connection_options.password.raw`: Password in text form.

    Here is an example of the configuration file structure:

    
    ```hcl
    resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
      name = "<endpoint_name>"
      settings {
        clickhouse_target {
          clickhouse_cluster_name="<shard_group>"
          security_groups = ["<list_of_security_group_IDs>"]
          subnet_id       = "<subnet_ID>"
          connection {
            connection_options {
              mdb_cluster_id = "<cluster_ID>"
              database       = "<name_of_database_to_migrate>"
              user           = "<username_for_connection>"
              password {
                raw = "<user_password>"
              }
            }
          }
          <additional_endpoint_settings>
        }
      }
    }
    ```


    For more information, see [this Terraform provider guide](../../../../terraform/resources/datatransfer_endpoint.md).

- API {#api}

    * `securityGroups`: Network traffic security groups whose rules apply to VMs and clusters without altering their configurations. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).
    
       Make sure the specified security groups are [configured](../../../../managed-clickhouse/operations/connect/index.md#configuring-security-groups).
    
    * `mdbClusterId`: ID of the cluster you need to connect to.
    * `clickhouseClusterName`: Shard group to transfer the data to. If this parameter is not set, the data will go to all shards.
    * `database`: Database name.
    * `user`: Username that Data Transfer will use to connect to the database.
    * `password.raw`: Database user password (in text form).

{% endlist %}

### Custom installation {#on-premise}

Connection to the database with explicitly specified network addresses and ports.

{% list tabs group=instructions %}

- Management console {#console}

    * **Connection type**: Select a database connection option:
    
      * **Self-managed**: Allows you to specify connection settings manually.
    
        Select **Custom installation** as the installation type and configure these settings:
    
        * **Shards**
          
          * **Shard**: Specify a row that will allow the service to distinguish shards from each other. If sharding is disabled in your custom installation, specify any name.
          * **Hosts**: Specify FQDNs or IP addresses of the hosts in the shard.
        * **Cluster**: Specify the name of the cluster to transfer the data from. If this parameter is not set, the default cluster's data will be transferred (the `{cluster}` macro).
        * **HTTP port**: Set the number of the port that Data Transfer will use for the connection.
          
          When connecting via the HTTP port:
          
          * For optional fields, default values are used (if any).
          * Recording complex types is supported (such as `array` and `tuple`).
        * **Native port**: Set the number of the native port that Data Transfer will use for the connection.
        * **SSL**: Enable if the cluster supports only encrypted connections.
        * **CA certificate**: If transmitted data has to be be encrypted, e.g., to meet the PCI DSS, upload the [certificate](../../../../managed-clickhouse/operations/connect/index.md#get-ssl-cert) file or add its contents as text.
          
          {% note warning %}
          
          If no certificate is added, the transfer may [fail with an error](../../../troubleshooting/index.md#failed-to-connect).
          
          {% endnote %}
    
        * **Database**: Specify the name of the database in the selected cluster.
    
        * **Subnet ID**: Select or [create](../../../../vpc/operations/subnet-create.md) a subnet in the required [availability zone](../../../../overview/concepts/geo-scope.md). The transfer will use this subnet to access the cluster.
    
          If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
    
        * **User**: Specify the username Data Transfer will use to connect to the database.
        * **Password**: Enter the user password for access to the database.
    
      * **Connection Manager**: Allows connecting to the database using [Yandex Connection Manager](../../../../metadata-hub/quickstart/connection-manager.md):
    
        * Select the folder where the Connection Manager connection was created.
        * Select **Custom installation** as the installation type and configure these settings:
    
          * **Connection**: Select or create a connection in Connection Manager.
    
          * **Database**: Specify the name of the database in the selected cluster.
    
          * **Subnet ID**: Select or [create](../../../../vpc/operations/subnet-create.md) a subnet in the required [availability zone](../../../../overview/concepts/geo-scope.md). The transfer will use this subnet to access the cluster.
    
            If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
    
          * **Shard group**: Specify the shard group to transfer the data to. If this value is not set, the data will go to all shards.
    
        {% note warning %}
        
        To use a connection from Connection Manager, the user must have [access permissions](../../../../metadata-hub/operations/connection-access.md) for this connection of `connection-manager.user` or higher.
        
        {% endnote %}
    
    * **Security groups**: Select the cloud network to host the endpoint and security groups for network traffic.
    
      Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).

- CLI {#cli}

    * Endpoint type: `clickhouse-target`.

    * `--cluster-name`: Name of the cluster to transfer the data to.
    * `--host`: List of IP addresses or FQDNs of hosts to connect to, in `{shard_name}:{host_IP_address_or_FQDN}` format. If sharding is disabled in your custom installation, specify any shard name.
    * `http-port`: Port number Data Transfer will use for HTTP connections.
    * `native-port`: Port number Data Transfer will use for connections to the ClickHouse® native interface.
    * `--ca-certificate`: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements.
      
      {% note warning %}
      
      If no certificate is added, the transfer may [fail with an error](../../../troubleshooting/index.md#failed-to-connect).
      
      {% endnote %}
    * `--subnet-id`: ID of the subnet the host is in. The transfer will use that subnet to access the host.
    * `--database`: Database name.
    * `--user`: Username that Data Transfer will use to connect to the database.
    * `--security-group`: Network traffic security groups whose rules apply to VMs and clusters without altering their configurations. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).
    
    * To set a user password to access the DB, use one of the following parameters:
    
        * `--raw-password`: Password as text.
        * `--password-file`: The path to the password file.

- Terraform {#tf}

    * Endpoint type: `clickhouse_target`.

    * Shard settings:
      
      * `connection.connection_options.on_premise.shards.name`: Shard name that the service will use to distinguish shards from each other. If sharding is disabled in your custom installation, specify any name.
      * `connection.connection_options.on_premise.shards.hosts`: specify the FQDNs or IP addresses of the hosts in the shard.
    * `connection.connection_options.on_premise.http_port`: Port number that Data Transfer will use for HTTP connections.
    * `connection.connection_options.on_premise.native_port`: Port number that Data Transfer will use for connections to the ClickHouse® native interface.
    * `connection.connection_options.on_premise.tls_mode.enabled.ca_certificate`: CA certificate if the data to transfer must be encrypted, e.g., to comply with the PCI DSS requirements.
      
      {% note warning %}
      
      If no certificate is added, the transfer may [fail with an error](../../../troubleshooting/index.md#failed-to-connect).
      
      {% endnote %}
    * `clickhouse_cluster_name`: Name of the cluster to transfer the data to.
    * `subnet_id`: ID of the [subnet](../../../../vpc/concepts/network.md#subnet) the cluster is in. The transfer will use this subnet to access the cluster. If the ID is not specified, the cluster must be accessible from the internet.
      
      If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
    * `security_groups`: [Security groups](../../../../vpc/concepts/security-groups.md) for network traffic.
      
      Security group rules apply to a transfer. They allow opening network access from the transfer VM to the database VM. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).
      
      Security groups must belong to the same network as the `subnet_id` subnet, if the latter is specified.
      
      {% note info %}
      
      In Terraform, it is not required to specify a network for security groups.
      
      {% endnote %}
    * `connection.connection_options.database`: Database name.
    * `connection.connection_options.user`: Username that Data Transfer will use to connect to the database.
    * `connection.connection_options.password.raw`: Password in text form.

    Here is an example of the configuration file structure:

    
    ```hcl
    resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
      name = "<endpoint_name>"
      settings {
        clickhouse_target {
          clickhouse_cluster_name="<cluster_name>"
          security_groups = ["<list_of_security_group_IDs>"]
          subnet_id       = "<subnet_ID>"
          connection {
            connection_options {
              on_premise {
                http_port   = "<HTTP_connection_port>"
                native_port = "<port_for_native_interface_connection>"
                shards {
                  name  = "<shard_name>"
                  hosts = [ "list of IP addresses and FQDNs of shard hosts" ]
                }
                tls_mode {
                  enabled {
                    ca_certificate = "<certificate_in_PEM_format>"
                  }
                }
              }
              database = "<name_of_database_to_migrate>"
              user     = "<username_for_connection>"
              password {
                raw = "<user_password>"
              }
            }
          }
          <additional_endpoint_settings>
        }
      }
    }
    ```


    For more information, see [this Terraform provider guide](../../../../terraform/resources/datatransfer_endpoint.md).

- API {#api}

    * `onPremise`: Database connection parameters:
        * `shards`: Shard settings:
          
          * `name`: Shard name the service will use to distinguish shards one from another. If sharding is disabled in your custom installation, specify any name.
          * `hosts`: Specify FQDNs or IP addresses of the hosts in the shard.
        * `httpPort`: Port number Data Transfer will use for HTTP connections.
        * `nativePort`: Port number Data Transfer will use for connections to the ClickHouse® native interface.
        * `tlsMode`: Parameters for encrypting the data to transfer, if required, e.g., for compliance with the PCI DSS requirements.
            * `disabled`: Disabled.
            * `enabled`: Enabled.
                * `caCertificate`: CA certificate.
          
                  {% note warning %}
                  
                  If no certificate is added, the transfer may [fail with an error](../../../troubleshooting/index.md#failed-to-connect).
                  
                  {% endnote %}
        * `subnetId`: ID of the subnet the host is in. The transfer will use that subnet to access the host.
    * `clickhouseClusterName`: Name of the cluster to transfer the data to.
    * `securityGroups`: Network traffic security groups whose rules apply to VMs and clusters without altering their configurations. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).
    * `database`: Database name.
    * `user`: Username that Data Transfer will use to connect to the database.
    * `password.raw`: Database user password (in text form).

{% endlist %}

### Additional settings {#additional-settings}

{% list tabs group=instructions %}

- Management console {#console}

    * **Cleanup policy**: Select a way to clean up data in the target database before the transfer:
      
      * `Don't cleanup`: Select this option only for replication without data copying.
      
      * `Drop`: Completely delete the tables included in the transfer (default).
      
         Use this option to always transfer the latest version of the table schema to the target database from the source whenever the transfer is activated.
      
      * `Truncate`: Delete only the data from the tables included in the transfer but keep the schema.
      
         Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.

    * **Sharding settings**: Specify the settings for [sharding](../../../../managed-clickhouse/concepts/sharding.md):
      
      * **No sharding**: No sharding is used.
      
      * **Sharding by column value**: Name of the table column to shard the data by. Uniform distribution between shards will depend on the hash of this column value. Specify the name of the column to be sharded in the appropriate field.
      
          For sharding by specific column values, specify them in the **Mapping** field. This field defines the mapping between the column and shard index values (the sequential number of the shard in the name-sorted list of shards), to enable sharding by specific data values.
      
      * **Sharding by transfer ID**: Data will be distributed between shards based on the transfer ID value. The transfer will ignore the **Mapping** setting and will only shard the data based on the transfer ID.
      
      * **Uniform random sharding**: Data will be randomly distributed between shards. Each shard will contain approximately the same amount of data.
      
        {% note warning %}
      
        For transfers from ClickHouse® to ClickHouse®, sharding is not supported.
      
        The possible workaround is to create a distributed table in the target cluster and perform a transfer to this table by selecting the `Truncate` or `Don't cleanup` cleanup policy.
      
        {% endnote %}

    * **Advanced settings**: Configure advanced settings:
      
      * **Rename tables**: Specify the settings for renaming tables during a transfer, if required.
      
      * **Flush interval**: Specify the delay with which the data should arrive at the target cluster. Increase the value in this field if ClickHouse® fails to merge data parts.
      
      * **Interpret failed updates as series of upserts**: Select this option for transfers from Managed Service for YDB with the `UPDATES` [changefeed](https://ydb.tech/docs/en/concepts/cdc) mode and for `Increment Only` transfers that move events among the documents that are not in the ClickHouse® target. Use it only if you have to because inserting `UPSERT` sequentially can reduce performance.
      
      * **Disable schema migration**: Select to prevent changes to the target data schema when the source schema is modified. By default, when the source schema is modified, the transfer will update the target schema accordingly: create new tables, add new columns, add new enumerated values and enumerated types. By default, changes like deleting tables and columns are not applied.

- CLI {#cli}

    * `--alt-name`: Rules for renaming the source database tables when transferring them to the target database. The values are specified in `<source_table_name>:<target_table_name>` format.

    * Data [sharding](../../../../managed-clickhouse/concepts/sharding.md) settings:

        * `--shard-by-column-hash`: Name of the table column to shard the data by. Uniform distribution between shards will depend on the hash of this column value.

        * `--custom-sharding-column-name`: Name of the table column to shard the data by. Data sharding is based on the column values specified by the `--custom-sharding-mapping-string` setting.

        * `--custom-sharding-mapping-string`: Mapping of the values from the column specified in the `--custom-sharding-column-name` setting and shards. The setting values are specified in `<colimn_value>:<shard_name>` format.

        * `--shard-by-transfer-id`: Data will be distributed between shards based on the transfer ID value. The parameter contains no value.

        You can only specify one of the sharding options:

        * `--shard-by-column-hash`
        * `--custom-sharding-column-name` and `--custom-sharding-mapping-string`
        * `--shard-by-transfer-id`

- Terraform {#tf}

    * `cleanup_policy`: Way to clean up data in the target database before the transfer:
      
      * `CLICKHOUSE_CLEANUP_POLICY_DISABLED`: Do not clean up (default).
      
         Select this option only for replication without data copying.
      
      * `CLICKHOUSE_CLEANUP_POLICY_DROP`: Completely delete the tables included in the transfer.
      
         Use this option to always transfer the latest version of the table schema to the target database from the source whenever the transfer is activated.
      
      * `CLICKHOUSE_CLEANUP_POLICY_TRUNCATE`: Delete only the data from the tables included in the transfer but keep the schema.
      
         Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.

    * `alt_names`: Rules for renaming the source database tables when transferring them to the target database.
      
      * `from_name`: Source table name.
      * `to_name`: Target table name.

    * Data [sharding](../../../../managed-clickhouse/concepts/sharding.md) settings:

        * `sharding.column_value_hash.column_name`: Name of the table column to shard the data by. Uniform distribution between shards will depend on the hash of this column value.

        * `sharding.transfer_id`: Data is distributed between shards based on the transfer ID value. The `transfer_id` section contains no parameters.

        * `sharding.custom_mapping`: Sharding by column value:
          
          * `column_name`: Name of the table column to shard the data by.
          
          * `mapping`: Mapping of column values and shards:
          
             * `column_value.string_value`: Column value.
             * `shard_name`: Shard name.

        * `sharding.round_robin`: Data will be randomly distributed between shards. Each shard will contain approximately the same amount of data. The `round_robin` section contains no parameters.

        You can only specify one of the sharding options: `sharding.column_value_hash.column_name`, `sharding.transfer_id`, `sharding.custom_mapping`, or `sharding.round_robin`. If no sharding option is specified, all data will be transferred to a single shard.

    * `is_schema_migration_disabled`: Set to `true` to prevent changes to the target data schema when the source schema is modified. By default, when the source schema is modified, the transfer will update the target schema accordingly: create new tables, add new columns, add new enumerated values and enumerated types. By default, changes like deleting tables and columns are not applied.

- API {#api}

    * `altNames`: Rules for renaming the source database tables when transferring them to the target database.
      
      * `fromName`: Source table name.
      * `toName`: Target table name.

    * `cleanupPolicy`: Way to clean up data in the target database before the transfer:
      
      * `CLICKHOUSE_CLEANUP_POLICY_DISABLED`: Do not clean up (default).
      
         Select this option only for replication without data copying.
      
      * `CLICKHOUSE_CLEANUP_POLICY_DROP`: Completely delete the tables included in the transfer.
      
         Use this option to always transfer the latest version of the table schema to the target database from the source whenever the transfer is activated.
      
      * `CLICKHOUSE_CLEANUP_POLICY_TRUNCATE`: Delete only the data from the tables included in the transfer but keep the schema.
      
         Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.

    * `sharding`: Settings for data [sharding](../../../../managed-clickhouse/concepts/sharding.md). You can only specify one of the sharding options:
      
      * `columnValueHash.columnName`: Name of the table column to shard the data by. Uniform distribution between shards will depend on the hash of this column value.
      
      * `customMapping`: Sharding by column value:
      
         * `columnName`: Name of the table column to shard the data by.
      
         * `mapping`: Mapping of column values and shards:
      
            * `columnValue.stringValue`: Column value.
            * `shardName`: Shard name.
      
      * `transferId`: Data will be distributed between shards based on the transfer ID value. The parameter contains no value.
      
      * `roundRobin`: Data will be randomly distributed between shards. Each shard will contain approximately the same amount of data. The parameter contains no value.
      
      If no sharding option is specified, all data will be transferred to a single shard.

{% endlist %}

After configuring the data source and target, [create and start the transfer](../../transfer.md#create).

## Endpoint setup recommendations {#recommended-settings-queue}

To accelerate the delivery of large volumes of data to ClickHouse® from queues associated withData Streams or Managed Service for Apache Kafka®, configure endpoints as follows:

{% list tabs group=instructions %}

- Management console {#console}

    * If the target ClickHouse® cluster has sharding enabled and the data is migrated into a sharded table, write the data into an [ underlying](../../../../managed-clickhouse/tutorials/sharding.md) table based on the `ReplicatedMergeTree` engine, not a distributed table (`Distributed` engine). In the target, select the migrated data from the distributed table. To redefine the write table, specify it in the target settings: **Rename tables** → **Target table name**.
    * If in the source you selected JSON in **Conversion rules** → **Data format**, then you should specify `UTF-8` instead of `STRING` for string types in the data schema.
    * If you select **Add a column for missing keys**, your data transfers may slow down.
    * If you need to migrate multiple topics, in the **Rename tables** target setting, specify the same ClickHouse® table name for all topic names of the source.

- CLI {#cli}

    * If the target ClickHouse® cluster has sharding enabled and the data is migrated into a sharded table, write the data into an [underlying](../../../../managed-clickhouse/tutorials/sharding.md) table based on the `ReplicatedMergeTree` engine, not a distributed table (`Distributed` engine). In the target, select the migrated data from the distributed table. To redefine the write table, specify it in the `--alt-name` setting for the target.
    * If you need to migrate multiple topics, in the `--alt-name` attribute of the target endpoint, specify the same target ClickHouse® table name for all topics of the source.

- Terraform {#tf}

    * If the target ClickHouse® cluster has sharding enabled and the data is migrated into a sharded table, write the data into an [ underlying](../../../../managed-clickhouse/tutorials/sharding.md) table based on the `ReplicatedMergeTree` engine, not a distributed table (`Distributed` engine). In the target, select the migrated data from the distributed table. To redefine the write table, specify it in the `alt_names.to_name` setting for the target.
    * If in the source you selected JSON in `parser.json_parser`:
      * You should specify `UTF-8` instead of `STRING` for string types in the `parser.json_parser.data_schema` data schema.
      * The `parser.json_parser.add_rest_column=true` attribute may slow down your transfer.
    * If you need to migrate multiple topics, in the `alt_names` attribute of the target endpoint, specify the same ClickHouse® table name in `alt_names.to_name` for all topics in `alt_names.from_name`.

- API {#api}

    * If the target ClickHouse® cluster has sharding enabled and the data is migrated into a sharded table, write the data into an [ underlying](../../../../managed-clickhouse/tutorials/sharding.md) table based on the `ReplicatedMergeTree` engine, not a distributed table (`Distributed` engine). In the target, select the migrated data from the distributed table. To redefine the write table, specify it in the `altNames.toName` setting for the target.
    * If in the source you selected JSON in `parser.jsonParser`:
      * You should specify `UTF-8` instead of `STRING` for string types in the `parser.jsonParser.dataSchema` data schema.
      * The `parser.jsonParser.addRestColumn=true` parameter may slow down your transfer.
    * If you need to migrate multiple topics, in the `altNames` parameter of the target endpoint, specify the same ClickHouse® table name in `altNames.toName` for all topics in `altNames.fromName`.

{% endlist %}

## Troubleshooting data transfer issues {#troubleshooting}

* [New tables cannot be added](#no-new-tables).
* [Data is not transferred](#no-transfer).
* [Unsupported date range](#date-range).
* [Lack of resources or increasing data latency](#pod-restarted).
* [Data blocks limit exceeded](#partition-blocks).

For more troubleshooting tips, see [Troubleshooting](../../../troubleshooting/index.md).

### New tables cannot be added {#no-new-tables}

​No new tables are added to _**Snapshot and increment**_ transfers.

**Solution**:

1. Create tables in the target database manually. For the transfer to work, do the following when creating a table:

    1. Add the transfer service fields to it:

        ```sql
        __data_transfer_commit_time timestamp,
        __data_transfer_delete_time timestamp
        ```

    1. Use `ReplacingMergeTree`:
        
        ```sql
        ENGINE = ReplacingMergeTree
        ```

1. [Create](../../transfer.md#create) a separate transfer of the _**Snapshot and increment**_ type and add only new tables to the list of objects to transfer. Deactivating the original _**Snapshot and increment**_ transfer is not required. [Activate](../../transfer.md#activate) the new transfer, and once it switches to the **Replicating** status, [deactivate](../../transfer.md#deactivate) it.

   To add other tables, put them into the list of objects to transfer in the created separate transfer (replacing other objects in that list), reactivate it, and, once it switches to the **Replicating** status, deactivate it.

   {% note info %}

   Since two transfers were simultaneously migrating data, you will have duplicate records in the new tables on the target. Run the `SELECT * from TABLE <table_name> FINAL` query to hide duplicate records or `OPTIMIZE TABLE <table_name>` to delete them.

   {% endnote %}

### Unsupported date range {#date-range}

If the migrated data contains dates outside the supported ranges, ClickHouse® returns the following error:

```text
TYPE_ERROR [target]: failed to run (abstract1 source): failed to push items from 0 to 1 in batch:
Push failed: failed to push 1 rows to ClickHouse shard 0:
ClickHouse Push failed: Unable to exec changeItem: clickhouse:
dateTime <field_name> must be between 1900-01-01 00:00:00 and 2262-04-11 23:47:16
```

Supported date ranges in ClickHouse®:

* For the `DateTime64` type fields: 1900-01-01 to 2299-12-31. For more information, see [this ClickHouse® guide](https://clickhouse.com/docs/enen/sql-reference/data-types/datetime64).
* For the `DateTime` type fields: 1970-01-01 to 2106-02-07. For more information, see [this ClickHouse® guide](https://clickhouse.com/docs/enen/sql-reference/data-types/datetime).

**Solution**: use one of the following options:

* Convert all dates in the source DB to a range supported by ClickHouse®.
* In the [source endpoint parameters](../index.md#update), exclude the table with incorrect dates from the transfer.
* In the [transfer parameters](../../transfer.md#update), specify the [Convert values to string](../../../concepts/data-transformation.md#convert-to-string) transformer. This will change the field type during the transfer.

### Lack of resources or increasing data latency {#pod-restarted}

You may encounter the following problems when migrating data to a ClickHouse® target:

1. Transfer fails with an error. Error message:

    ```text
    pod instance restarted
    ```

1. [Transfer state monitoring](../../monitoring.md) indicates an increasing data latency (a time difference between when the records appear in the target and when they appear in the source).

Possible cause:

The write interval specified in the target endpoint settings is too large, which leads to the lack of RAM (OOM) on the transfer VM.

**Solution**:

In the management console, set the value of the **Flush interval** target endpoint setting to 10 seconds or less.

In addition, if your transfer type is **Snapshot**, [reactivate](../../transfer.md#activate) it. Transfers of the other types will restart automatically.

### Data blocks limit exceeded {#partition-blocks}

When migrating data to a ClickHouse® target, the transfer is interrupted due to an error. Error message:

```text
ERROR Unable to Activate ... 
unable to upload tables: unable to upload data objects: unable upload part <table name> (): 
unable to start \*clickhouse.HTTPSource event source: failed to push events to destination: 
unable to push http batch: <table name>: failed: INSERT INTO ...
```

Additionally, you can also get this error:

```text
pod instance restarted
```

Errors occur when you try to insert more data blocks than allowed in the `max_partitions_per_insert_block` setting to the ClickHouse® target.

**Solution**: Increase the `max_partitions_per_insert_block` parameter for the account the transfer uses to connect to the target. For the Managed Service for ClickHouse® target, you can change this parameter in [user settings](../../../../managed-clickhouse/concepts/settings-list.md#setting-partitions-per-insert-block). For a ClickHouse® custom installation, you can create a settings profile and assign it to the account:

```sql
CREATE SETTINGS PROFILE max_partitions
SETTINGS max_partitions_per_insert_block = <setting_value>

ALTER USER <username> PROFILE 'max_partitions'
```

_ClickHouse® is a registered trademark of [ClickHouse, Inc](https://clickhouse.com)._