# Transferring data from an OpenSearch source endpoint

Yandex Data Transfer enables you to migrate search and analytics data from an OpenSearch database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:

1. [Explore possible data transfer scenarios](#scenarios).
1. [Prepare the OpenSearch database](#prepare) for the transfer.
1. [Set up a source endpoint](#endpoint-settings) in Yandex Data Transfer.
1. [Set up one of the supported data targets](#supported-targets).
1. [Create](../../transfer.md#create) a transfer and [start](../../transfer.md#activate) it.
1. Perform required operations with the database and [control the transfer](../../monitoring.md).
1. In case of any issues, [use ready-made solutions](#troubleshooting) to resolve them.

## Scenarios for transferring data from OpenSearch {#scenarios}

Migration: Moving data from one storage to another. This usually means moving a database from an obsolete on-premises installation to a managed cloud one.

* [Migrating an OpenSearch cluster](../../../tutorials/os-to-mos.md).
* [Loading data from OpenSearch to Object Storage](../../../tutorials/opensearch-to-object-storage.md).
* [Migration with change of storage from OpenSearch to YDB](../../../tutorials/opensearch-to-ydb.md).
* [Migration with change of storage from OpenSearch to Greenplum®](../../../tutorials/opensearch-to-greenplum.md).
* [Copying data from Managed Service for OpenSearch to Managed Service for ClickHouse® using Yandex Data Transfer](../../../tutorials/opensearch-to-clickhouse.md).

For a detailed description of possible Yandex Data Transfer scenarios, see [Tutorials](../../../tutorials/index.md).

## Preparing the source database {#prepare}

{% list tabs %}

- OpenSearch

  
  If you do not plan to use [Cloud Interconnect](../../../../interconnect/concepts/index.md) or [VPN](https://en.wikipedia.org/wiki/Virtual_private_network) to connect to an external cluster, make the cluster accessible from the internet from the [IP addresses used by Data Transfer](../../../../overview/concepts/public-ips.md#virtual-private-cloud).
  
  For details on connecting your network to external resources, see [Networking in Yandex Data Transfer](../../../concepts/network.md#source-external).


{% endlist %}

## Configuring the OpenSearch source endpoint {#endpoint-settings}

When [creating](../index.md#create) or [updating](../index.md#update) an endpoint, you can define:

* [Yandex Managed Service for OpenSearch cluster](#managed-service) connection or [custom installation](#on-premise) settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
* [Additional parameters](#additional-settings).


### Managed Service for OpenSearch cluster {#managed-service}


{% note warning %}

To create or edit an endpoint of a managed database, you will need the [`managed-opensearch.viewer`](../../../../managed-opensearch/security/index.md#mos-viewer) role or the primitive [`viewer`](../../../../iam/roles-reference.md#viewer) role for the folder the cluster of this managed database resides in.

{% endnote %}


Connection with the cluster specified in Yandex Cloud.

{% list tabs group=instructions %}

- Management console {#console}

    * **Connection type**: Select a cluster connection option:
    
      * **Self-managed**: Allows you to specify connection settings manually.
    
        Select **Managed Service for OpenSearch cluster** as the installation type and configure these settings:
    
        * **Managed Service for OpenSearch cluster**: Select the cluster to connect to.
        * **User**: Specify the username Data Transfer will use to connect to the cluster.
        * **Password**: Enter the user password to the cluster.
    
      * **Connection Manager**: Allows connecting to the cluster via [Yandex Connection Manager](../../../../metadata-hub/quickstart/connection-manager.md):
    
        * Select the folder with the Managed Service for OpenSearch cluster.
        * Select **Managed DB cluster** as the installation type and configure these settings:
    
          * **Cluster for Managed DB**: Select the cluster to connect to.
          * **Connection**: Select or create a connection in Connection Manager.
    
        {% note warning %}
        
        To use a connection from Connection Manager, the user must have [access permissions](../../../../metadata-hub/operations/connection-access.md) of `connection-manager.user` or higher for this connection.
        
        {% endnote %}
    
    * **Security groups**: Select the cloud network to host the endpoint and security groups for network traffic.
    
      This will allow you to apply the specified security group rules to the VMs and clusters in the selected network without reconfiguring these VMs and clusters. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).

{% endlist %}

### Custom installation {#on-premise}

Connection to nodes with explicitly specified network addresses and ports.

{% list tabs group=instructions %}

- Management console {#console}

    * **Connection type**: Select a database connection option:
    
        * **Self-managed**: Allows you to specify connection settings manually.
    
            Select **Custom installation** as the installation type and configure these settings:
    
            * **Data nodes**: Click ![image](../../../../_assets/console-icons/plus.svg) to add a new data node. For each node, specify:
    
              * **Host**: IP address or FQDN of the host with the `DATA` role you need to connect to.
              * **Port**: Port number Data Transfer will use to connect to the host with the `DATA` role.
    
            * **SSL**: Select this option if a secure SSL connection is used.
    
            * **CA certificate**: Upload the [certificate](../../../../managed-opensearch/operations/connect/index.md#ssl-certificate) file or add its contents as text if you need to encrypt the data to transfer, e.g., for compliance with the PCI DSS requirements.
              
              
              {% note warning %}
              
              If no certificate is added, the transfer may [fail with an error](../../../troubleshooting/index.md#failed-to-connect).
              
              {% endnote %}
    
            * **Subnet ID**: Select or [create](../../../../vpc/operations/subnet-create.md) a subnet in the required [availability zone](../../../../overview/concepts/geo-scope.md). The transfer will use this subnet to access the database.
    
              If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
    
              If you do not specify a subnet, you may get an [error](../../../../managed-opensearch/qa/index.md#data-transfer-error) when activating the transfer.
    
            * **User**: Specify the username Data Transfer will use to connect to the database.
    
            * **Password**: Enter the user password for access to the database.
    
        * **Connection Manager**: Allows connecting to the database using [Yandex Connection Manager](../../../../metadata-hub/quickstart/connection-manager.md):
    
            * Select the folder where the Connection Manager connection was created.
            * Select **Custom installation** as the installation type and configure these settings:
    
              * **Connection**: Select or create a connection in Connection Manager.
              * **Subnet ID**: Select or [create](../../../../vpc/operations/subnet-create.md) a subnet in the required [availability zone](../../../../overview/concepts/geo-scope.md). The transfer will use this subnet to access the database.
    
                If this field has a value specified for both endpoints, both subnets must be hosted in the same availability zone.
    
                If you do not specify a subnet, you may get an [error](../../../../managed-opensearch/qa/index.md#data-transfer-error) when activating the transfer.
    
          {% note warning %}
          
          To use a connection from Connection Manager, the user must have [access permissions](../../../../metadata-hub/operations/connection-access.md) of `connection-manager.user` or higher for this connection.
          
          {% endnote %}
    
    * **Security groups**: Select the cloud network to host the endpoint and security groups for network traffic.
    
      This will allow you to apply the specified security group rules to VMs and DBs in the selected network without reconfiguring these VMs and DBs. For more information, see [Networking in Yandex Data Transfer](../../../concepts/network.md).

{% endlist %}

### Additional settings {#additional-settings}

{% list tabs group=instructions %}

- Management console {#console}

    * **Dump an index with type mapping**: Select this option to move data types from a source to a target before a transfer is started. If the option is disabled and no index schema is set on the target, data types on the target will be identified automatically during a transfer.

    {% note warning %}
    
    If a source index includes data types that are not supported on the target, enabling this option may cause a transfer run error. In this case, disable the option and create an index schema on the target manually.
    
    {% endnote %}

{% endlist %}


## Configuring the data target {#supported-targets}

Configure the target endpoint:

* [OpenSearch](../target/opensearch.md)
* [ClickHouse®](../target/clickhouse.md)
* [Greenplum®](../target/greenplum.md)
* [Yandex Managed Service for YDB](../target/yandex-database.md)
* [Yandex Object Storage](../target/object-storage.md)
* [Apache Kafka®](../target/kafka.md)
* [YDS](../target/data-streams.md)

For a complete list of supported sources and targets in Yandex Data Transfer, see [Available transfers](../../../transfer-matrix.md).

After configuring the data source and target, [create and start the transfer](../../transfer.md#create).

## Troubleshooting data transfer issues {#troubleshooting}

* [Transfer interrupted with an error message](#ambiguous-resolution-os)
* [Document duplication on the target](#duplication)
* [Exceeding the limit on the maximum number of fields](#exceeding-fields-limit)
* [Transfer failure with the `mapper_parsing_exception` error](#data-types)
* [`SSL is required` error](#ssl-required)
* [No tables found](#no-tables)

For more troubleshooting tips, see [Troubleshooting](../../../troubleshooting/index.md).

### Transfer failure {#ambiguous-resolution-os}

Error messages:

```text
object field starting or ending with a [.] makes object resolution ambiguous <field_description>

Index -1 out of bounds for length 0
```

The transfer is aborted because the keys in the documents being transferred are not valid for the OpenSearch target. Invalid keys are empty keys and keys that:

* Consist of spaces.
* Consist of periods.
* Have a period at the beginning or end.
* Have two or more periods in a row.
* Include periods separated by spaces.
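
The rules above can be expressed as a small pre-flight check for document keys. The Go sketch below is illustrative only: `IsValidKey` is a hypothetical helper, not part of Data Transfer or OpenSearch, and it reads "periods separated by spaces" as a space-only segment between two periods, which is an assumption. The functions are meant to be called from your own code.

```go
package main

import "strings"

// IsValidKey reports whether a document key passes the rules listed above.
// Hypothetical helper: the name and the exact reading of the last rule
// are assumptions made for illustration.
func IsValidKey(key string) bool {
	switch {
	case strings.Trim(key, " ") == "":
		// Empty key or a key that consists of spaces only.
		return false
	case strings.Trim(key, ".") == "":
		// Key that consists of periods only.
		return false
	case strings.HasPrefix(key, ".") || strings.HasSuffix(key, "."):
		// Period at the beginning or end.
		return false
	case strings.Contains(key, ".."):
		// Two or more periods in a row.
		return false
	}
	// Periods separated by spaces, e.g. "a. .b": an inner segment
	// between two periods that contains nothing but spaces.
	parts := strings.Split(key, ".")
	for i := 1; i < len(parts)-1; i++ {
		if strings.Trim(parts[i], " ") == "" {
			return false
		}
	}
	return true
}
```

For example, `IsValidKey("user.name")` returns `true`, while `IsValidKey("a. .b")` returns `false`.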

**Solution:**

In the [target endpoint additional settings](../target/opensearch.md#additional-settings), enable **Sanitize documents keys** and [reactivate](../../transfer.md#activate) the transfer.

### Document duplication on the target {#duplication}

When repeatedly transferring data, documents get duplicated on the target.

All documents transferred from the same source table end up under the same index named `<schemaName.tableName>` on the target. In this case, the target automatically generates document IDs (`_id`) by default. As a result, identical documents get different IDs and are duplicated.

There is no duplication if the primary keys are specified in the source table or endpoint conversion rules. Document IDs are then generated at the transfer stage using the primary key values.

Generation is performed as follows:

1. If the key value contains a period (`.`), it is escaped with `\`: `some.key` --> `some\.key`.
1. All the primary key values are converted into a string: `<some_key1>.<some_key2>.<...>`.
1. The resulting string is converted by the [url.QueryEscape](https://pkg.go.dev/net/url#QueryEscape) function.
1. If the resulting string does not exceed 512 characters in length, it is used as the `_id`. If longer than 512 characters, it is hashed with [SHA-1](https://datatracker.ietf.org/doc/html/rfc3174), and the resulting hash is used as the `_id`.

As a result, documents with the same primary keys will receive the same ID when the data is transferred again, and the document transferred last will overwrite the existing one.

**Solution:**

1. Set the primary key for one or more columns in the source table or in the endpoint conversion rules.
1. [Run](../../transfer.md#activate) the transfer.

### Exceeding the maximum number of fields limit {#exceeding-fields-limit}

Error message:

```text
Limit of total fields [<limit_value>] has been exceeded
```

The transfer will be interrupted if the number of columns in the source database exceeds the maximum number of fields allowed in the target OpenSearch indexes.

**Solution:** [Increase](../target/opensearch.md#prepare) the maximum field number in the target database using the `index.mapping.total_fields.limit` parameter.

### Transfer failure with the `mapper_parsing_exception` error {#data-types}

Error message:

```text
mapper_parsing_exception failed to parse field [details.tags] of type [text]
```

The transfer is aborted due to incompatible data types at source and target.

**Solution:** Move the data to a new OpenSearch index with the `details` field type changed to `flat_object`.

1. Deactivate the transfer.

1. Create a new index in OpenSearch:

    ```bash
    curl \
    --user <OpenSearch_username>:<password> \
    --header 'Content-Type: application/json' \
    --request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/<new_index_name>' \
    --data '{"settings": {"index.mapping.total_fields.limit": 2000}}'
    ```

1. Change the `details` field type:

    ```bash
    curl \
    --user <OpenSearch_username>:<password> \
    --header 'Content-Type: application/json' \
    --request PUT 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/<new_index_name>/_mapping' \
    --data '
        {
            "properties": {
                "details": {
                    "type": "flat_object"
                }
            }
        }'
    ```

1. Move the data from the source index to the new one:

    ```bash
    curl \
    --user <OpenSearch_username>:<password> \
    --header 'Content-Type: application/json' \
    --request POST 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_reindex' \
    --data '
        {
        "source":{
            "index":"<source_index_name>"
        },
        "dest":{
            "index":"<new_index_name>"
        }
        }'
    ```

1. Delete the source index:

    ```bash
    curl \
    --user <OpenSearch_username>:<password> \
    --header 'Content-Type: application/json' \
    --request DELETE 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/<source_index_name>'
    ```

1. Assign an alias to the new index:

    ```bash
    curl \
    --user <OpenSearch_username>:<password> \
    --header 'Content-Type: application/json' \
    --request POST 'https://<address_of_OpenSearch_host_with_DATA_role>:9200/_aliases' \
    --data '
        {
        "actions": [
            {
            "add": {
                "index": "<new_index_name>",
                "alias": "<source_index_name>"
            }
            }
        ]
        }'
    ```

### `SSL is required` error {#ssl-required}

This error occurs when connecting to a Managed Service for OpenSearch cluster as a custom installation via an OpenSearch host's [FQDN](../../../../managed-opensearch/concepts/network.md#hostname) if **SSL** is not enabled in the endpoint settings. By default, Managed Service for OpenSearch clusters require SSL encryption for connections via host FQDNs.

This error may also occur if you are connecting to a custom OpenSearch installation that requires SSL.

**Solution**:

Enable **SSL** in the endpoint settings.

For managed database clusters and other sources that use certificates issued by public CAs, you do not usually need to upload a CA certificate.

If your source uses a self-signed certificate, upload your CA certificate to the relevant field in the endpoint settings.

### No tables found {#no-tables}

Error message:

```text
Unable to find any tables
```

This error may occur if the source has no available indexes or the specified user has no permissions for the indexes.

**Solution**:

* Check if there is an index. Make sure that the index name was specified correctly and that the source really has the index you want to transfer.

* Make sure the user has the required permissions to use the index.