# Replicating logs to Yandex Object Storage using Fluent Bit


Data aggregators enable you to transmit data, e.g., logs, from [VMs](../../compute/concepts/vm.md) to log monitoring and data storage services.

In this tutorial, you will learn how to replicate VM logs automatically to an Object Storage bucket using [Fluent Bit](https://fluentbit.io).

The solution works as follows:
1. Fluent Bit runs on an active VM as a [systemd](https://ru.wikipedia.org/wiki/Systemd) service.
1. Fluent Bit collects logs as per the configuration settings and sends them to a [stream](../../data-streams/concepts/glossary.md#stream-concepts) in Data Streams over the [Amazon Kinesis Data Streams](https://aws.amazon.com/ru/kinesis/data-streams/) protocol.
1. In your working folder, you set up a [transfer](../concepts/index.md#transfer) in Data Transfer that fetches data from the stream and saves it to an Object Storage [bucket](../../storage/concepts/bucket.md).

To set up log replication:

1. [Get your cloud ready](#before-you-begin).
1. [Set up your environment](#setup).
1. [Create an Object Storage bucket for storing your logs](#create-bucket).
1. [Create a stream in Data Streams](#create-stream).
1. [Create a transfer](#create-transfer).
1. [Install Fluent Bit](#install-fluent-bit).
1. [Connect Fluent Bit to your data stream](#connect).
1. [Test sending and receiving data](#check-ingestion).

If you no longer want to store logs, [delete the resources allocated to them](#clear-out).

## Get your cloud ready {#before-you-begin}

Sign up for Yandex Cloud and create a [billing account](../../billing/concepts/billing-account.md):
1. Navigate to the [management console](https://console.yandex.cloud) and log in to Yandex Cloud or create a new account.
1. On the **[Yandex Cloud Billing](https://center.yandex.cloud/billing/accounts)** page, make sure you have a billing account linked and it has the `ACTIVE` or `TRIAL_ACTIVE` [status](../../billing/concepts/billing-account-statuses.md). If you do not have a billing account, [create one](../../billing/quickstart/index.md) and [link](../../billing/operations/pin-cloud.md) a cloud to it.

If you have an active billing account, you can create or select a [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) for your infrastructure on the [cloud page](https://console.yandex.cloud/cloud).

[Learn more about clouds and folders here](../../resource-manager/concepts/resources-hierarchy.md).


### Required paid resources {#paid-resources}

* Data Streams (see [Data Streams pricing](../../data-streams/pricing.md)). The cost depends on the pricing model:

    * [Based on allocated resources](../../data-streams/pricing.md#rules): You pay a fixed hourly rate for the established throughput limit and message retention period, and additionally for the number of units of actually written data.
    * [On-demand](../../data-streams/pricing.md#on-demand): You pay for the performed read/write operations, the amount of read or written data, and the actual storage used for messages that are still within their retention period.

* Managed Service for YDB database in serverless mode: data operations and the amount of stored data and backups (see [Managed Service for YDB pricing](../../ydb/pricing/index.md)).
* Object Storage bucket: use of storage, data operations (see [Object Storage pricing](../../storage/pricing.md)).


## Set up your environment {#setup}

1. [Create a service account](../../iam/operations/sa/create.md), e.g., `logs-sa`, with the `editor` role for the folder.
1. [Create a static access key](../../iam/operations/authentication/manage-access-keys.md#create-access-key) for the service account. Save the key ID and secret key. You will need them to configure the AWS CLI.
1. [Create a VM](../../compute/operations/vm-create/create-linux-vm.md) from a public [Ubuntu 20.04](https://yandex.cloud/en/marketplace/products/yc/ubuntu-20-04-lts) image. Under **Access**, specify the service account you created earlier.
1. [Connect to the VM](../../compute/operations/vm-connect/ssh.md#vm-connect) over SSH.
1. Install the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) on your VM.
1. Run this command:

    ```bash
    aws configure
    ```
1. Enter the following, one by one:

    * `AWS Access Key ID [None]:`: Service account [key ID](../../iam/concepts/authorization/access-key.md).
    * `AWS Secret Access Key [None]:`: Service account [secret key](../../iam/concepts/authorization/access-key.md).
    * `Default region name [None]:`: `ru-central1`.
    * `Default output format [None]:`: Leave empty and press **Enter**.
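
`aws configure` saves the values you enter to two plain-text files in the current user's home directory: `~/.aws/credentials` (key ID and secret key) and `~/.aws/config` (region). Fluent Bit is pointed at these files later in this tutorial, so it may be worth checking them now. The sketch below is one possible check, assuming the default profile layout that `aws configure` produces; the `check_aws_profile` function name is made up for this example.

```bash
# Hypothetical helper: verify that the files written by `aws configure`
# exist and contain the expected keys. Takes the directory to inspect
# (defaults to ~/.aws). Returns 0 if everything is in place, 1 otherwise.
check_aws_profile() {
    aws_dir="${1:-$HOME/.aws}"
    rc=0

    # The static access key is stored in the credentials file.
    for key in aws_access_key_id aws_secret_access_key; do
        if ! grep -q "^${key}[[:space:]]*=" "${aws_dir}/credentials" 2>/dev/null; then
            echo "missing ${key} in ${aws_dir}/credentials"
            rc=1
        fi
    done

    # The default region is stored in the config file.
    if ! grep -q "^region[[:space:]]*=[[:space:]]*ru-central1[[:space:]]*$" "${aws_dir}/config" 2>/dev/null; then
        echo "region ru-central1 not found in ${aws_dir}/config"
        rc=1
    fi

    return "$rc"
}
```

For example, `check_aws_profile && echo "AWS CLI profile looks complete"` prints the confirmation only if both files contain the expected entries.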

## Create a bucket {#create-bucket}

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) where you want to create a [bucket](../../storage/concepts/bucket.md).
  1. Navigate to **Object Storage**.
  1. Click **Create bucket**.
  1. Enter a name for the bucket.
  1. In the **Storage class** field, select `Cold`.
  1. Click **Create bucket**.

{% endlist %}

## Create a data stream {#create-stream}

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) where you want to create a [data stream](../../data-streams/concepts/glossary.md#stream-concepts).
  1. Navigate to **Data Streams**.
  1. Click **Create stream**.
  1. Specify an existing [serverless](../../ydb/concepts/serverless-and-dedicated.md#serverless) database in YDB or [create](../../ydb/quickstart.md#serverless) a new one. If you have created a new database, click ![refresh-button](../../_assets/data-streams/refresh-button.svg) to update the database list.
  1. Name the data stream, e.g., `logs-stream`.
  1. Click **Create**.

  Wait for the stream to start. Once the stream is ready for use, its status will change from `Creating` to `Active`.

{% endlist %}

## Create a transfer {#create-transfer}

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) where you want to create a [transfer](../concepts/index.md#transfer).
  1. Navigate to **Data Transfer**.
  1. Create a source [endpoint](../concepts/index.md#endpoint):
     1. In the ![endpoint](../../_assets/console-icons/aperture.svg) **Endpoints** tab, click **Create endpoint**.
     1. In the **Direction** field, select `Source`.
     1. Enter the endpoint name, e.g., `logs-source`.
     1. From the **Database type** list, select `Yandex Data Streams`.
     1. Select the database you specified in the settings of the [stream](../../data-streams/concepts/glossary.md#stream-concepts) you created earlier.
     1. Specify the data stream name: `logs-stream`.
     1. Select the `logs-sa` [service account](../../iam/concepts/users/service-accounts.md) you created earlier.
     1. Under **Advanced settings**, set the data conversion rules to `CloudLogging parser`.
     1. Click **Create**.
  1. Create a target endpoint:
     1. In the ![endpoint](../../_assets/console-icons/aperture.svg) **Endpoints** tab, click **Create endpoint**.
     1. In the **Direction** field, select `Target`.
     1. Enter the endpoint name, e.g., `logs-receiver`.
     1. From the **Database type** list, select `Object Storage`.
     1. Enter the name of the previously created [bucket](../../storage/concepts/bucket.md).
     1. Select the `logs-sa` service account you created earlier.
     1. In the **Serialization format** field, select `JSON`.
     1. Click **Create**.
  1. Create a transfer:
     1. In the ![image](../../_assets/console-icons/arrow-right-arrow-left.svg) **Transfers** tab, click **Create transfer**.
     1. Enter the transfer name, e.g., `logs-transfer`.
     1. Select the `logs-source` source endpoint you created earlier.
     1. Select the `logs-receiver` target endpoint you created earlier.
     1. Click **Create**.
  1. Click ![ellipsis](../../_assets/console-icons/ellipsis.svg) next to the new transfer and select **Activate**.

  Wait until the transfer gets activated. Once the transfer is ready for use, its [status](../concepts/transfer-lifecycle.md#statuses) will change from `Creating` to `Replicating`.

{% endlist %}

## Install Fluent Bit {#install-fluent-bit}

{% note info %}

This tutorial uses Fluent Bit version 1.9.

{% endnote %}

1. To install Fluent Bit on your VM, run this command:
    ```bash
    curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
    ```
    For more information on how to install Fluent Bit, see [this Fluent Bit guide](https://docs.fluentbit.io/manual/installation/linux/ubuntu).

1. Start `fluent-bit`:
    ```bash
    sudo systemctl start fluent-bit
    ```
1. Make sure the `fluent-bit` status is active:
    ```bash
    sudo systemctl status fluent-bit
    ```

    The result should include the `active (running)` status and output from the built-in `cpu` input plugin, which Fluent Bit collects by default right after installation:
    ```bash
    ● fluent-bit.service - Fluent Bit
     Loaded: loaded (/lib/systemd/system/fluent-bit.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2022-09-08 10:23:03 UTC; 10s ago
       Docs: https://docs.fluentbit.io/manual/
   Main PID: 1328 (fluent-bit)
      Tasks: 4 (limit: 2310)
     Memory: 2.8M
     CGroup: /system.slice/fluent-bit.service
             └─1328 /opt/fluent-bit/bin/fluent-bit -c //etc/fluent-bit/fluent-bit.conf

     Sep 08 10:23:03 ycl-20 fluent-bit[1328]: [2022/09/08 10:23:03] [ info] [output:stdout:stdout.0] worker #0 started
     Sep 08 10:23:05 ycl-20 fluent-bit[1328]: [0] cpu.local: [1662632584.114661597, {"cpu_p"=>1.000000, "user_p"=>0.000000, >
     Sep 08 10:23:06 ycl-20 fluent-bit[1328]: [0] cpu.local: [1662632585.114797726, {"cpu_p"=>0.000000, "user_p"=>0.000000, >
     ...
     
    ``` 
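
If you script the VM setup, you can replace the manual status check above with systemd's machine-readable queries. The following is a minimal sketch, assuming the unit is named `fluent-bit` as in this tutorial; the `fluent_bit_state` function name is hypothetical. Note that the sample output above shows the unit as `disabled`, which means it is not started automatically at boot.

```bash
# Hypothetical helper: report whether the fluent-bit unit is running and
# whether it is enabled at boot. Returns 0 only if the unit is active.
fluent_bit_state() {
    unit="${1:-fluent-bit}"

    # `is-enabled` prints a state such as "enabled" or "disabled".
    enabled="$(systemctl is-enabled "$unit" 2>/dev/null)"
    echo "boot state: ${enabled:-unknown}"

    # `is-active --quiet` prints nothing and exits 0 only for an active unit.
    if systemctl is-active --quiet "$unit"; then
        echo "runtime state: active"
        return 0
    fi
    echo "runtime state: not active"
    return 1
}
```

For example, `fluent_bit_state || sudo systemctl start fluent-bit` starts the service only if it is not already running. To have Fluent Bit start at boot, the usual systemd command is `sudo systemctl enable fluent-bit`.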

## Connect Fluent Bit to your data stream {#connect}

{% note info %}

If you are running a Fluent Bit version earlier than 1.9, which ships as the `td-agent-bit` package, edit the `/etc/td-agent-bit/td-agent-bit.conf` and `/lib/systemd/system/td-agent-bit.service` files instead and restart `td-agent-bit`.

{% endnote %}


1. Open `/etc/fluent-bit/fluent-bit.conf`: 

   ```bash
   sudo vim /etc/fluent-bit/fluent-bit.conf
   ```
1. Add the `OUTPUT` section with the `kinesis_streams` plugin settings:

    ```bash
    [OUTPUT]
        Name  kinesis_streams
        Match *
        region ru-central-1
        stream /<region>/<folder_ID>/<database_ID>/<data_stream_name>
        endpoint https://yds.serverless.yandexcloud.net
    ```
    Where:

    * `stream`: ID of the data stream in Data Streams. 
        >For example, your stream ID will appear as `/ru-central1/aoeu1kuk2dht********/cc8029jgtuab********/logs-stream` if:
        >* `logs-stream`: Stream name.
        >* `ru-central1`: Region.
        >* `aoeu1kuk2dht********`: Folder ID.
        >* `cc8029jgtuab********`: YDB database ID.

    For more information on how to configure Fluent Bit, see [this Fluent Bit guide](https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/configuration-file).

1. Open `/lib/systemd/system/fluent-bit.service`: 
   ```bash
   sudo vim /lib/systemd/system/fluent-bit.service
   ```
1. Add the environment variables with the paths to the access key files to the `[Service]` section:
   ```bash
   Environment=AWS_CONFIG_FILE=/home/<username>/.aws/config
   Environment=AWS_SHARED_CREDENTIALS_FILE=/home/<username>/.aws/credentials
   ```

   Where `<username>` is the username you specified in the VM settings. 

1. Restart `fluent-bit`:
   ```bash
   sudo systemctl daemon-reload
   sudo systemctl restart fluent-bit
   ```
1. Check the `fluent-bit` status. It must not contain any error messages:
    ```bash
    sudo systemctl status fluent-bit
    ```

    Result:
    ```bash
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: Fluent Bit v1.9.8
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: * Copyright (C) 2015-2022 The Fluent Bit Authors
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: * https://fluentbit.io
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: [2022/09/08 16:51:19] [ info] [fluent bit] version=1.9.8, commit=, pid=3450
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: [2022/09/08 16:51:19] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: [2022/09/08 16:51:19] [ info] [cmetrics] version=0.3.6
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: [2022/09/08 16:51:19] [ info] [sp] stream processor started
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: [2022/09/08 16:51:19] [ info] [output:kinesis_streams:kinesis_streams.1] worker #0 started
    Sep 08 16:51:19 ycl-20 fluent-bit[3450]: [2022/09/08 16:51:19] [ info] [output:stdout:stdout.0] worker #0 started
    ```
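
The two file edits in this section follow a fixed pattern, so you can generate them rather than type them by hand. The sketch below prints the `[OUTPUT]` section with the stream ID assembled from its four components, followed by the two `Environment` lines for the systemd unit. The function names are invented for this example, the IDs are the masked placeholders used above, and the current user is assumed to be the one who ran `aws configure`. The script only prints text and does not modify any files.

```bash
# Hypothetical helper: assemble the stream ID from its components.
stream_id() {
    region="$1"; folder_id="$2"; database_id="$3"; stream_name="$4"
    printf '/%s/%s/%s/%s\n' "$region" "$folder_id" "$database_id" "$stream_name"
}

# Hypothetical helper: print the [OUTPUT] section for fluent-bit.conf.
# The region and endpoint values are copied from the example above.
render_output_section() {
    cat <<EOF
[OUTPUT]
    Name  kinesis_streams
    Match *
    region ru-central-1
    stream $1
    endpoint https://yds.serverless.yandexcloud.net
EOF
}

# Hypothetical helper: print the Environment lines for the [Service] section.
render_service_env() {
    cat <<EOF
Environment=AWS_CONFIG_FILE=/home/$1/.aws/config
Environment=AWS_SHARED_CREDENTIALS_FILE=/home/$1/.aws/credentials
EOF
}

# Sample run with the placeholder IDs from this tutorial; replace them
# with your own folder and database IDs.
render_output_section "$(stream_id ru-central1 'aoeu1kuk2dht********' 'cc8029jgtuab********' logs-stream)"
render_service_env "$(id -un)"
```

Because package updates may overwrite unit files under `/lib/systemd/system`, you might prefer to keep the `Environment` lines in a drop-in file created with `sudo systemctl edit fluent-bit`; this is an alternative to the step above, not a requirement of the tutorial.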

## Test sending and receiving data {#check-ingestion}

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), navigate to the [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) with the new [data stream](../../data-streams/concepts/glossary.md#stream-concepts), [transfer](../concepts/index.md#transfer), and [bucket](../../storage/concepts/bucket.md).
  1. Navigate to **Data Streams**.
  1. Select the data stream named `logs-stream`.
  1. Go to the **Monitoring** tab and check the stream activity charts.
  1. Navigate to **Data Transfer**.
  1. Select the `logs-transfer` transfer.
  1. Go to the **Monitoring** tab and check the transfer activity charts.
  1. Navigate to **Object Storage**.
  1. Select the previously created bucket.
  1. Make sure that you have objects in the bucket. Download and review the log files you got.

{% endlist %}
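
You can also check the bucket from the VM with the AWS CLI you configured earlier, since Object Storage is compatible with the S3 API. The following is a minimal sketch, assuming the `https://storage.yandexcloud.net` endpoint and that the `logs-sa` static key has access to the bucket; `list_log_objects` is a hypothetical helper name.

```bash
# Hypothetical helper: list every object in the bucket through the
# S3-compatible API of Object Storage. Requires the AWS CLI.
list_log_objects() {
    bucket="$1"
    if [ -z "$bucket" ]; then
        echo "usage: list_log_objects <bucket_name>" >&2
        return 2
    fi
    aws s3 ls "s3://${bucket}" --recursive \
        --endpoint-url "https://storage.yandexcloud.net"
}
```

Call it as `list_log_objects <bucket_name>`. An empty listing shortly after activation does not necessarily indicate a problem, as the transfer may not have written its first objects yet.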

## Delete the resources you created {#clear-out}

To stop paying for resources you no longer need, delete them:

1. [Delete the transfer](../operations/transfer.md#delete).
1. [Delete the endpoints](../operations/endpoint/index.md#delete).
1. [Delete the data stream](../../data-streams/operations/manage-streams.md#delete-data-stream).
1. [Delete the objects from the bucket](../../storage/operations/objects/delete.md).
1. [Delete the bucket](../../storage/operations/buckets/delete.md).
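
The last two steps can also be performed with the AWS CLI from the VM. The sketch below is one way to do it, assuming the `https://storage.yandexcloud.net` endpoint and a bucket without versioning enabled; the `remove_log_bucket` function name is hypothetical. The operation permanently deletes all objects in the bucket, so double-check the bucket name before running it.

```bash
# Hypothetical helper: delete all objects from the bucket, then the bucket
# itself. Stops if the objects cannot be deleted.
remove_log_bucket() {
    bucket="$1"
    endpoint="https://storage.yandexcloud.net"
    if [ -z "$bucket" ]; then
        echo "usage: remove_log_bucket <bucket_name>" >&2
        return 2
    fi

    # A bucket must be empty before it can be removed.
    aws s3 rm "s3://${bucket}" --recursive --endpoint-url "$endpoint" || return 1
    aws s3 rb "s3://${bucket}" --endpoint-url "$endpoint"
}
```

Call it as `remove_log_bucket <bucket_name>` after you have deleted the transfer, endpoints, and data stream.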