[Yandex Cloud documentation](../../index.md) > [Yandex Query](../index.md) > [Getting started](index.md) > Unified analysis of streaming and analytical data

# Unified streaming and batch data analysis

In this example, we will query [analytical](../concepts/batch-processing.md) and [streaming](../concepts/stream-processing.md) data to calculate taxi fares in specific locations.

We will use the same SQL query for both data types, with the only difference in bucket and stream [connections](../concepts/glossary.md#connection) and [data bindings](../concepts/glossary.md#binding).

The data for batch processing has been pre-loaded in the [Yandex Object Storage](../../storage/index.md) bucket in [Parquet](https://parquet.apache.org/docs/file-format/) files. Streaming data will be written to a dedicated [Yandex Data Streams](../../data-streams/index.md) stream by a generator.

In both cases, we will use a reference table stored in Object Storage to filter query data.

To run this example:

1. [Make the necessary preparations](#before-you-begin).
1. [Analyze the Object Storage data](#batch).
1. [Analyze the streaming data from Data Streams](#stream).

{% note info %}

Yandex Cloud provides the **New York City taxi trips** dataset as is. Yandex Cloud makes no express or implied representations, warranties, or conditions pertaining to your use of the specified dataset. To the extent permitted by your local law, Yandex Cloud shall not be liable for any loss or damage, including direct, indirect, consequential, special, incidental, or punitive, resulting from your use of the dataset.

NYC Taxi and Limousine Commission (TLC):

The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The taxi trip data was not created by the TLC, and the TLC makes no representations whatsoever about the accuracy of this data.

Please review the dataset’s original [source](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) and its [terms of use](https://www1.nyc.gov/home/terms-of-use.page).

{% endnote %}

## Make the necessary preparations {#before-you-begin}

1. Log in to the [management console](https://console.yandex.cloud) or sign up if you have not already. If you have not signed up yet, navigate to the management console and follow the instructions.
1. On the [**Yandex Cloud Billing**](https://center.yandex.cloud/billing/accounts) page, make sure you have an `ACTIVE` or `TRIAL_ACTIVE` [billing account](../../billing/concepts/billing-account.md). If you do not have a billing account yet, [create one](../../billing/quickstart/index.md#create_billing_account).
1. If you do not have a folder yet, [create one](../../resource-manager/operations/folder/create.md).
1. We will connect to the data stream using a [service account](../../iam/concepts/users/service-accounts.md). [Create](../../iam/operations/sa/create.md#create-sa) a service account named `datastream-connection-account` and assign it the `ydb.editor` role.
1. Data streams use Yandex Managed Service for YDB. You will need to [create](../../ydb/quickstart.md#serverless) a serverless database.

## Analyze the data Object Storage {#batch}

### Connect to the analytical data source {#batch-create-binding}

1. In the [management console](https://console.yandex.cloud), select the folder where you want to create a connection.
1. Navigate to **Yandex Query**.
1. In the left-hand panel, select ![study](../../_assets/console-icons/graduation-cap.svg) **Tutorial**.
1. Under **Create infrastructure for tutorial**, click **Create connection**.

    This will open the [create connection](../concepts/glossary.md#connection) page. Check the default settings; do not change them.
1. Click **Create**.

    This will open the [create data binding](../concepts/glossary.md#binding) page. Check the default settings; do not change them.
1. Click **Create**.

### Run the query {#batch-run-query}

1. In the query editor within the Query interface, click **New analytics query**.
1. Enter the query text in the text field:

   ```sql
   $data =
   SELECT
       *
   FROM
       `tutorial-analytics`;

   $locations =
   SELECT
       PULocationID
   FROM
       `tutorial-analytics`.`nyc_taxi_sample/example_locations.csv`
   WITH
   (
       format=csv_with_names,
       SCHEMA
       (
           PULocationID String
       )
   );

   $time =
   SELECT
       HOP_END() AS time,
       rides.PULocationID AS PULocationID,
       SUM(total_amount) AS total_amount
   FROM $data AS rides
   INNER JOIN $locations AS locations
       ON rides.PULocationID=locations.PULocationID
   GROUP BY
       HOP(CAST(tpep_pickup_datetime AS Timestamp?), "PT1M", "PT1M", "PT1M"),
       rides.PULocationID;

   SELECT
       *
   FROM
       $time;
   ```

1. Click **Run**.

### Check the result {#batch-check-result}

Once executed, the analytical query will return the distribution of taxi trip fares in specific locations.

| #  | time | PULocationID | total_amount |
| --- | --- | --- | --- |
| 1  | 2017-12-31T22:24:00.000000Z | 120 | 7.54  |
| 2  | 2018-01-01T00:13:00.000000Z | 120 | 48.8  |
| 3  | 2018-01-01T03:25:00.000000Z | 120 | 30.8  |
| 4  | 2018-01-01T11:29:00.000000Z | 120 | 32.88 |
| 5  | 2018-01-01T15:13:00.000000Z | 120 | 9.8   |
| 6  | 2018-01-01T22:03:00.000000Z | 120 | 14.8  |
| 7  | 2018-01-02T19:28:00.000000Z | 120 | 7.3   |
| 8  | 2018-01-03T10:17:00.000000Z | 120 | 81.3  |

## Analyze the streaming data from Data Streams {#stream}

### Create a data stream {#stream-create-datastream}

1. In the [management console](https://console.yandex.cloud), select the [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) where you need to create a [data stream](../../data-streams/concepts/glossary.md).
  1. Navigate to **Data Streams**.
  1. Click **Create stream**.
  1. Specify the Yandex Managed Service for YDB database created earlier.
  1. Name the data stream: `yellow-taxi`.
  1. Click **Create**.

### Set up data generation {#stream-configure-generation}

1. Create a connection:

    1. In the [management console](https://console.yandex.cloud), select the folder where you want to create a connection.
    1. Navigate to **Yandex Query**.
    1. In the left-hand panel, select ![study](../../_assets/console-icons/graduation-cap.svg) **Tutorial**.
    1. Navigate to **Streaming**.
    1. Under **Create infrastructure for tutorial**, click **Create connection**.
    1. In the window that opens, under **Connection type parameters**, select the database and service account you created earlier.
    1. Click **Create**.

1. Create a [data binding](../concepts/glossary.md):

    1. This will open the `create data binding` page.
    1. Under **Binding parameters**, select the `yellow-taxi` stream you created earlier.
    1. Click **Create**.

The generator will start writing data to the `yellow-taxi` stream. You can control the generator using the **Stop** and **Start** buttons.

### Run the query {#stream-run-query}

1. In the query editor within the Query interface, click **New streaming query**.
1. Enter the query text in the text field:

   ```sql
   $data =
   SELECT
       *
   FROM bindings.`tutorial-streaming`;

   $locations =
   SELECT
       PULocationID
   FROM
       `tutorial-analytics`.`nyc_taxi_sample/example_locations.csv`
   WITH
   (
       format=csv_with_names,
       SCHEMA
       (
           PULocationID String
       )
   );

   $time =
   SELECT
       HOP_END() AS time,
       rides.PULocationID AS PULocationID,
       SUM(total_amount) AS total_amount
   FROM $data AS rides
   INNER JOIN $locations AS locations
       ON rides.PULocationID=locations.PULocationID
   GROUP BY
       HOP(cast(tpep_pickup_datetime AS Timestamp?), "PT1M", "PT1M", "PT1M"),
       rides.PULocationID;

   SELECT
       *
   FROM
       $time;
   ```

1. Click **Run**.

### Check the result {#stream-check-result}

Once launched, the query returns the total fare (`total_amount`) of the taxi rides taken in specific locations (`PULocationID`) after processing started.

| #  | PULocationID | time | total_amount |
| --- | --- | --- | --- |
| 1  | 125 | 2022-02-15T12:03:00.000000Z | 1275.4084 |
| 2  | 129 | 2022-02-15T12:03:00.000000Z | 1073.0449 |
| 3  | 126 | 2022-02-15T12:03:00.000000Z | 202.85883 |
| 4  | 121 | 2022-02-15T12:03:00.000000Z | 636.8784  |
| 5  | 124 | 2022-02-15T12:03:00.000000Z | 923.87805 |
| 6  | 127 | 2022-02-15T12:04:00.000000Z | 2105.3125 |
| ... |

## See also {#see-also}

* [HOP operator and window parameters in streaming data processing](../concepts/stream-processing-windows.md)
* [Aggregate functions in YQL](https://ydb.tech/docs/en//yql/reference/builtins/aggregation)
* [SQL syntax](../sources-and-sinks/data-streams-binding.md#model-dannyh)
* [Batch processing](../concepts/batch-processing.md)
* [Streaming data analysis](../concepts/stream-processing.md)