# Transfer state monitoring

Transfer status details are available in the management console:

* Detailed diagnostic information is presented as charts. You can view them in the **Monitoring** tab of the transfer management page or in [Yandex Monitoring](../../monitoring/concepts/index.md).

You can [configure alerts](#monitoring-integration) in Yandex Monitoring to receive notifications about transfer failures. In Yandex Monitoring, there are two trigger thresholds: `Warning` and `Alarm`. If the specified threshold is exceeded, you will get alerts via the configured [notification channels](../../monitoring/concepts/alerting.md#notification-channel).

You can also use the Yandex Cloud [mobile app](https://yandex.cloud/en/mobile-app) to monitor transfer statuses and get their logs.


## Errors displayed on the transfer timeline {#errors-timeline}

Some errors you may see on the selected transfer timeline:

  * The transfer does not write all the data it reads. There are fewer events written to the target than read from the source. This may indicate insufficient throughput of the target.
  * Transfer time has increased or is too long. If a transfer takes longer to process data, the data stream from the source may have grown or the target’s throughput may be insufficient.
  * Replication lag is increasing. The lag may increase if the source starts sending more events or due to data target issues, insufficient resources, or operational errors.
  * Replication restarts frequently. Frequent replication restarts can signify an issue in the data source or target, as well as a memory shortage.

[Learn more about errors displayed on the timeline](../troubleshooting/index.md#timeline).
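
The first symptom in the list can be checked numerically by comparing the source and target event counts over the same interval. The snippet below is a minimal sketch and not part of Data Transfer: the function name `write_shortfall` and the sample values are hypothetical. Both counters may include housekeeping operations, so a small non-zero shortfall does not necessarily indicate a problem.

```python
def write_shortfall(source_events: int, target_events: int) -> float:
    """Return the share of source events that were not written to the target.

    Both arguments are event counts taken over the same time interval
    (for example, from publisher.data.changeitems and
    sinker.pusher.data.changeitems).
    """
    if source_events <= 0:
        # Nothing was read, so there is nothing to fall behind on.
        return 0.0
    missing = max(source_events - target_events, 0)
    return missing / source_events


# Hypothetical sample: 1000 events read, 750 written.
print(write_shortfall(1000, 750))  # 0.25
```

A shortfall that keeps growing from one interval to the next is more telling than a single non-zero value.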

## Transfer state monitoring {#monitoring}

{% list tabs group=instructions %}

- Management console {#console}

  1. Open the [folder dashboard](https://console.yandex.cloud).
  1. Navigate to **Data Transfer**.
  1. In the left-hand panel, select ![image](../../_assets/console-icons/arrow-right-arrow-left.svg) **Transfers**.
  1. Click the transfer name and open the ![image](../../_assets/console-icons/display-pulse.svg) **Monitoring** tab.
  1. To get started with Yandex Monitoring metrics, dashboards, or alerts, click **Open in Monium** in the top panel.

{% endlist %}

The following charts open on the page:

### Number of source events {#publisher.data.changeitems}
`publisher.data.changeitems`

Number of source events generated for a transfer. Apart from the data to transfer, these events may include housekeeping operations.

### Number of target events {#sinker.pusher.data.changeitems}
`sinker.pusher.data.changeitems`

Number of events written to the target. Apart from the data to transfer, these events may include housekeeping operations.

### Maximum data transfer delay {#sinker.pusher.time.row_max_lag_sec}
`sinker.pusher.time.row_max_lag_sec`

Maximum data lag (in seconds).

### Reads {#publisher.data.bytes}
`publisher.data.bytes`

The amount of data read from the source (in bytes).

### Data transfer delay {#sinker.pusher.time.row_lag_sec}
`sinker.pusher.time.row_lag_sec`

Time difference between when records appear in the source and when they appear in the target (in seconds). The histogram is divided into bins. For example, suppose that at a given point in time the histogram shows two bins, `45` and `60`, each with a value of 50%. This means that half of the records being transferred at that time had a delay between 30 and 45 seconds, and the other half had a delay between 45 and 60 seconds.
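
The bin arithmetic from this example can be sketched in a few lines of Python. This is only an illustration: the function name `delay_ranges` and the full list of bin bounds (`15`, `30`, `45`, `60`) are assumptions, since the chart description mentions only the `45` and `60` bins.

```python
def delay_ranges(bounds, shares):
    """Turn histogram bins into (lower, upper, share) delay ranges.

    bounds: upper bounds of all bins in ascending order, in seconds.
    shares: mapping of a bin's upper bound to the percentage of records in it.
    Bins with a zero share are skipped.
    """
    ranges = []
    lower = 0
    for upper in bounds:
        share = shares.get(upper, 0)
        if share > 0:
            ranges.append((lower, upper, share))
        # The next bin starts where this one ends.
        lower = upper
    return ranges


# Assumed bin bounds; only the 45 and 60 bins hold records.
print(delay_ranges([15, 30, 45, 60], {45: 50, 60: 50}))
# [(30, 45, 50), (45, 60, 50)]
```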

### Source buffer size {#publisher.consumer.log_usage_bytes}
`publisher.consumer.log_usage_bytes`

The size, in bytes, of the buffer or write ahead log (when supported) in the source.

### Rows written to target, by table {#sinker.table.rows}
`sinker.table.rows`

The 50 tables with the largest number of rows written to the target.

### Target response time {#sinker.pusher.time.batch_push_distribution_sec}
`sinker.pusher.time.batch_push_distribution_sec`

Full time it takes to write a batch to the target, including data preprocessing (in seconds).

### Rows awaiting transfer, by table {#task.snapshot.remainder.table}
`task.snapshot.remainder.table`

Number of rows awaiting transfer.

### Operation status {#task.status}
`task.status`

Status of the operation in progress: a value of `1` means the task is active.


## Setting up alerts in Yandex Monitoring {#monitoring-integration}

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the folder with the transfer you want to set up alerts for.
  1. Go to ![image](../../_assets/console-icons/display-pulse.svg) **Monitoring**.
  1. Under **Service dashboards**, select **Data Transfer**.
  1. In the chart you need, click ![options](../../_assets/console-icons/ellipsis.svg) and select **Create alert**.
  1. If the chart displays multiple metrics, select the data query for the relevant metric and click **Continue**. Learn more about the query language in [this Yandex Monitoring guide](../../monitoring/concepts/querying.md).
  1. Set the `Alarm` and `Warning` alert thresholds.
  1. Click **Create alert**.

{% endlist %}

## Recommended alerts

### Number of source events {#source-change-items}

The alert triggers when the source database generated no replicated Data Transfer events (individual data items) during the evaluation window.

Possible causes include:

* The source database is not available to Data Transfer over the network, e.g., due to revoked access or a source database failure.
* The source database has no data to replicate.

Alert parameters:

* Metrics:

    ![image](../../_assets/console-icons/chart-line.svg) `<cloud_name> > <folder_name>` `service = data-transfer` `name = publisher.data.changeitems`

    ![image](../../_assets/console-icons/function.svg) `derivative()` (in the **Transformation** section)

* Alert settings:

    * Condition: `Less than or equals`.
    * Alarm: `0`.
    * Warning: `-`.

    You can also set a `Warning` threshold for cases where the number of replicated operations falls below the expected value.

    Additional settings:

    * **Aggregation function**: `Maximum`.
    * **Evaluation window**: `5 minutes`. If data in the source database changes less often than once every five minutes, increase the evaluation window to the maximum allowable interval between two DML operations on the source data.
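
The way these parameters combine can be illustrated with a simplified model. The sketch below is not how Yandex Monitoring is implemented: it assumes that `derivative()` yields the change per second between adjacent points, and the function names and status labels (`ALARM`, `WARNING`, `OK`, `NO_DATA`) are hypothetical.

```python
def derivative(points):
    """Per-second rate of change between adjacent (timestamp_sec, value) points."""
    return [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in zip(points, points[1:])
        if t2 > t1
    ]


def alert_status(points, alarm=0.0, warning=None):
    """Evaluate a 'Less than or equals' condition with the Maximum aggregation.

    points: counter samples that fall into the evaluation window.
    """
    rates = derivative(points)
    if not rates:
        return "NO_DATA"
    peak = max(rates)  # Aggregation function: Maximum
    if peak <= alarm:
        return "ALARM"
    if warning is not None and peak <= warning:
        return "WARNING"
    return "OK"


# The counter did not grow during the window: no events were generated.
print(alert_status([(0, 100), (60, 100), (120, 100)]))  # ALARM
# The counter grew at least once during the window.
print(alert_status([(0, 100), (60, 160), (120, 160)]))  # OK
```

With the `Maximum` aggregation, a single interval with activity anywhere in the window is enough to keep the alert from firing.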

### Number of target events {#target-change-items}

The alert triggers when the target database recorded no replicated Data Transfer events during the evaluation window.

Possible causes include:

* The source or target database is not available to Data Transfer over the network, e.g., due to revoked access or a source or target database failure.
* The source database has no data to replicate.
* The data from the source database cannot be replicated to the target database, e.g., due to data type limitations in the target database.

Alert parameters:

* Metrics:

    ![image](../../_assets/console-icons/chart-line.svg) `<cloud_name> > <folder_name>` `service = data-transfer` `name = sinker.pusher.data.changeitems`

    ![image](../../_assets/console-icons/function.svg) `derivative()` (in the **Transformation** section)

* Alert settings:

    * Condition: `Less than or equals`.
    * Alarm: `0`.
    * Warning: `-`.

    You can also set a `Warning` threshold for cases where the number of replicated operations falls below the expected value.

    Additional settings:

    * **Aggregation function**: `Maximum`.
    * **Evaluation window**: `5 minutes`. If data in the source database changes less often than once every five minutes, increase the evaluation window to the maximum allowable interval between two DML operations on the source data.

### Maximum data transfer delay {#row-max-lag}

The alert triggers when the time between an operation on rows in the source and the same operation in the target has exceeded the specified threshold during the evaluation window.

Possible causes include:

* The target database is not available to Data Transfer over the network, e.g., due to revoked access or a target database failure.
* Not enough resources for replication. For example, the load on the source database exceeds the capacity of the VM instance the Data Transfer replication is running on.
* The data from the source database cannot be replicated to the target database, e.g., due to data type limitations in the target database.

Alert parameters:

* Metrics:

    ![image](../../_assets/console-icons/chart-line.svg) `<cloud_name> > <folder_name>` `service = data-transfer` `name = sinker.pusher.time.row_max_lag_sec`

* Alert settings:

    * Condition: `Greater than or equals`.
    * Alarm: `15`. If the target database is slow, or large blocks of data are being replicated at a time, set the maximum possible value.
    * Warning: `-`.

    Additional settings:

    * **Aggregation function**: `Minimum`.
    * **Evaluation window**: `1 minute`.
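
The effect of combining the `Minimum` aggregation with a `Greater than or equals` condition can be shown with a small model. This is a sketch under simplifying assumptions, not the Yandex Monitoring implementation; the function name `lag_alarm` and the sample values are hypothetical.

```python
def lag_alarm(lag_samples, threshold=15.0):
    """Return True if the lag stayed at or above the threshold for the whole window.

    lag_samples: values of sinker.pusher.time.row_max_lag_sec within the window.
    """
    if not lag_samples:
        return False
    # Aggregation function: Minimum; condition: Greater than or equals.
    return min(lag_samples) >= threshold


# The lag never dropped below 15 seconds during the window.
print(lag_alarm([20, 18, 40]))  # True
# The lag recovered to 3 seconds at one point, so the alert does not fire.
print(lag_alarm([20, 3, 40]))   # False
```

Because the smallest value in the window is compared with the threshold, a short spike in lag does not trigger the alert; only a lag that persists throughout the window does.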

### Reading {#reading}

The alert triggers when no data was read from the source during the evaluation window.

Possible causes include:

* The source database is not available to Data Transfer over the network, e.g., due to revoked access or a source database failure.
* The source database has no data to replicate.

Alert parameters:

* Metrics:

    ![image](../../_assets/console-icons/chart-line.svg) `<cloud_name> > <folder_name>` `service = data-transfer` `name = publisher.data.bytes`

    ![image](../../_assets/console-icons/function.svg) `derivative()` (in the **Transformation** section)

* Alert settings:

    * Condition: `Equals to`.
    * Alarm: `0`.
    * Warning: `-`.

    Additional settings:

    * **Aggregation function**: `Maximum`.
    * **Evaluation window**: `15 minutes`. If data in the source database changes less often than once every 15 minutes, increase the evaluation window to the maximum allowable interval between two DML operations on the source data.
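
One way to apply the evaluation window guidance is to derive the window from the observed gaps between changes in the source. The helper below is a hypothetical sketch: the name `suggested_window_sec` and the sample timestamps are assumptions, and the gaps you consider allowable for your workload may differ from the ones observed.

```python
def suggested_window_sec(change_times, default_sec=900):
    """Suggest an evaluation window that covers the longest gap between changes.

    change_times: ascending timestamps (in seconds) of DML operations in the source.
    default_sec: the default window, 15 minutes for this alert.
    """
    gaps = [later - earlier for earlier, later in zip(change_times, change_times[1:])]
    # Never go below the default window.
    return max([default_sec] + gaps)


# The longest gap between changes is 30 minutes, so the window is extended.
print(suggested_window_sec([0, 600, 2400]))  # 1800
# Changes arrive every minute, so the default window is sufficient.
print(suggested_window_sec([0, 60, 120]))    # 900
```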

## Working with alerts {#alert-specifics}

* To determine the cause of a transfer failure, check all available alerts. Knowing which alerts fired and which did not helps you pinpoint the cause more accurately. For example, if the [Number of target events](#target-change-items) alert has fired but the [Number of source events](#source-change-items) alert has not, the source is still generating events, so the problem is most likely not on the source side.