
# Developing a Telegram bot for text recognition in images, audio synthesis and recognition


In this tutorial, you will create a Telegram bot that can:

* [Synthesize speech](https://aistudio.yandex.ru/docs/en//speechkit/tts/index) from a message text and [recognize speech](https://aistudio.yandex.ru/docs/en//speechkit/stt/index) in voice messages using the Yandex SpeechKit [Python SDK](https://pypi.org/project/yandex-speechkit/).
* [Recognize text](https://aistudio.yandex.ru/docs/en/vision/concepts/ocr/index) in images using Yandex Vision OCR.

The bot authenticates in Yandex Cloud services under a service account using an [IAM token](../../iam/concepts/authorization/iam-token.md). The IAM token is taken from the handler context of the [function](../../functions/operations/function-sa.md) that manages the user's conversation with the bot, so no keys need to be stored in the function code.
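
To run parts of the function code locally, you can use a stand-in for the handler context that exposes the two attributes the tutorial's handler reads: `context.token["access_token"]` and `context.function_version`. The sketch below assumes these are the only context fields you need. `FakeContext` and `auth_headers` are hypothetical names, and the real runtime context has more fields.

```python
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in for the Cloud Functions handler context, for local tests only."""
    function_version: str
    # The tutorial's handler reads the IAM token from this dict as token["access_token"]
    token: dict = field(default_factory=dict)


def auth_headers(context):
    """Build the Authorization header the function code sends to Yandex Cloud APIs."""
    return {"Authorization": f"Bearer {context.token['access_token']}"}


ctx = FakeContext(function_version="example-version-id",
                  token={"access_token": "example-iam-token"})
headers = auth_headers(ctx)
```

In the deployed function, the runtime supplies the real context, so no token needs to be hardcoded.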

A Yandex API Gateway [API gateway](../concepts/index.md) will receive the webhook requests Telegram sends for your bot and forward them to a [function](../../functions/concepts/function.md) in Yandex Cloud Functions for processing.

To create a bot:

1. [Get your cloud ready](#before-you-begin).
1. [Create resources](#prepare).
1. [Register your Telegram bot](#bot-register).
1. [Create a function](#create-function).
1. [Create an API gateway](#create-api-gateway).
1. [Link the function with the bot](#link-bot).
1. [Test the bot](#test).

If you no longer need the resources you created, [delete them](#clear-out).

## Getting started {#before-you-begin}

Sign up for Yandex Cloud and create a [billing account](../../billing/concepts/billing-account.md):
1. Navigate to the [management console](https://console.yandex.cloud) and log in to Yandex Cloud or create a new account.
1. On the **[Yandex Cloud Billing](https://center.yandex.cloud/billing/accounts)** page, make sure you have a billing account linked and it has the `ACTIVE` or `TRIAL_ACTIVE` [status](../../billing/concepts/billing-account-statuses.md). If you do not have a billing account, [create one](../../billing/quickstart/index.md) and [link](../../billing/operations/pin-cloud.md) a cloud to it.

If you have an active billing account, you can create or select a [folder](../../resource-manager/concepts/resources-hierarchy.md#folder) for your infrastructure on the [cloud page](https://console.yandex.cloud/cloud).

[Learn more about clouds and folders here](../../resource-manager/concepts/resources-hierarchy.md).

### Required paid resources {#paid-resources}

The cost of running the Telegram bot includes:

* Fee for using SpeechKit (see [SpeechKit pricing](https://aistudio.yandex.ru/docs/en/speechkit/pricing)).
* Fee for using Vision OCR (see [Vision OCR pricing](https://aistudio.yandex.ru/docs/en/vision/pricing)).
* Fee for function invocation count, computing resources allocated to run the function, and outbound traffic (see [Cloud Functions pricing](../../functions/pricing.md)).
* Fee for the number of requests to the API gateway and outbound traffic (see [API Gateway pricing](../pricing.md)).

## Create resources {#prepare}

1. [Create a service account](../../iam/operations/sa/create.md) named `recognizer-bot-sa` and assign it the `ai.editor` and `functions.editor` [roles](../../iam/operations/sa/assign-role-for-sa.md) for your folder.
1. [Download](https://github.com/BtbN/FFmpeg-Builds/releases/download/autobuild-2024-09-30-15-36/ffmpeg-N-117275-g04182b5549-linux64-gpl.tar.xz) the archive with the FFmpeg build. The SpeechKit Python SDK needs FFmpeg to process audio in the [function runtime environment](../../functions/concepts/runtime/index.md).
1. Extract the `ffmpeg` and `ffprobe` binary files from the archive and run these commands to make them executable:

    ```bash
    chmod +x ffmpeg
    chmod +x ffprobe
    ```

1. Create a ZIP archive with the function code:

   1. Create a file named `index.py` and paste the code below into it.

      {% cut "index.py" %}

      ```py
      import logging
      import requests
      import telebot
      import json
      import os
      import base64
      from speechkit import model_repository, configure_credentials, creds
      from speechkit.stt import AudioProcessingType


      # Filled in on every invocation from the handler context
      folder_id = ''
      iam_token = ''

      # Telegram bot token and the Vision OCR text recognition endpoint

      API_TOKEN = os.environ['TELEGRAM_TOKEN']
      vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'

      # Adding the folder with ffmpeg and ffprobe (the root of the function code) to the system PATH

      path = os.environ.get('PATH', '')
      os.environ['PATH'] = path + ':/function/code'

      logger = telebot.logger
      telebot.logger.setLevel(logging.INFO)
      bot = telebot.TeleBot(API_TOKEN, threaded=False)
      
      # Getting the folder ID: function version -> function -> folder

      def get_folder_id(iam_token, version_id):
          headers = {'Authorization': f'Bearer {iam_token}'}
          # Find the function that the current version belongs to
          function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}',
                                         headers=headers)
          function_id_req.raise_for_status()
          function_id = function_id_req.json()['functionId']
          # Find the folder that contains the function
          folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}',
                                       headers=headers)
          folder_id_req.raise_for_status()
          folder_id = folder_id_req.json()['folderId']
          return folder_id

      def process_event(event):
          request_body_dict = json.loads(event['body'])
          update = telebot.types.Update.de_json(request_body_dict)

          bot.process_new_updates([update])

      def handler(event, context):
          global iam_token, folder_id
          iam_token = context.token["access_token"]
          version_id = context.function_version
          folder_id = get_folder_id(iam_token, version_id)

          # Authenticating in SpeechKit with an IAM token
          configure_credentials(
              yandex_credentials=creds.YandexCredentials(
                  iam_token=iam_token
              )
          )

          process_event(event)
          return {
              'statusCode': 200
          }

      # Command and message handlers

      @bot.message_handler(commands=['help', 'start'])
      def send_welcome(message):
          bot.reply_to(message,
                       "The bot can do the following:\n* Recognize text in images.\n* Generate voice messages from text.\n* Convert voice messages to text.")

      @bot.message_handler(func=lambda message: True, content_types=['text'])
      def echo_message(message):
          export_path = '/tmp/audio.ogg'
          synthesize(message.text, export_path)
          with open(export_path, 'rb') as voice:
              bot.send_voice(message.chat.id, voice)

      @bot.message_handler(func=lambda message: True, content_types=['voice'])
      def echo_audio(message):
          file_id = message.voice.file_id
          file_info = bot.get_file(file_id)
          downloaded_file = bot.download_file(file_info.file_path)
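          response_text = audio_analyze(downloaded_file)
          # Telegram rejects empty messages, so send a fallback if nothing was recognized
          bot.reply_to(message, response_text or 'No speech recognized.')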

      @bot.message_handler(func=lambda message: True, content_types=['photo'])
      def echo_photo(message):
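          # The last item in message.photo is the largest available size
          file_id = message.photo[-1].file_id
          file_info = bot.get_file(file_id)
          downloaded_file = bot.download_file(file_info.file_path)
          image_data = base64.b64encode(downloaded_file).decode('utf-8')
          response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
          # Telegram rejects empty messages, so send a fallback if no text was found
          bot.reply_to(message, response_text or 'No text recognized.')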
      
      # Image recognition

      def image_analyze(vision_url, iam_token, folder_id, image_data):
          response = requests.post(vision_url, headers={'Authorization': 'Bearer ' + iam_token, 'x-folder-id': folder_id}, json={
              "mimeType": "image",
              "languageCodes": ["en", "ru"],
              "model": "page",
              "content": image_data
          })
          response.raise_for_status()
          blocks = response.json()['result']['textAnnotation'].get('blocks', [])
          text = ''
          for block in blocks:
              for line in block['lines']:
                  for word in line['words']:
                      text += word['text'] + ' '
                  text += '\n'
          return text
      
      # Speech recognition

      def audio_analyze(audio_data):
          model = model_repository.recognition_model()

          # Recognition settings: general model, Russian speech
          model.model = 'general'
          model.language = 'ru-RU'
          model.audio_processing_type = AudioProcessingType.Full

          result = model.transcribe(audio_data)
          speech_text = [res.normalized_text for res in result]
          return ' '.join(speech_text)
      
      # Speech synthesis

      def synthesize(text, export_path):
          model = model_repository.synthesis_model()

          # Synthesis settings: `kirill` is a Russian voice
          model.voice = 'kirill'

          result = model.synthesize(text, raw_format=False)
          result.export(export_path, 'ogg')
      ```

      {% endcut %}

   1. Create a file named `requirements.txt` and list the Telegram bot library and the SpeechKit Python SDK in it (the `requests` library used in `index.py` is installed as a dependency of `pyTelegramBotAPI`):

      ```text
      pyTelegramBotAPI==4.27
      yandex-speechkit==1.5.0
      ```

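   1. Add the `index.py`, `requirements.txt`, `ffmpeg`, and `ffprobe` files to the root of the `index.zip` archive. The function code adds the code root directory to `PATH`, so the FFmpeg binaries must not be placed in a subfolder.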

1. Create an Object Storage [bucket](../../storage/operations/buckets/create.md) and [upload the ZIP archive to it](../../storage/operations/objects/upload.md).
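
The packaging step above can also be scripted. The sketch below uses Python's standard `zipfile` module and assumes `index.py`, `requirements.txt`, `ffmpeg`, and `ffprobe` are in one local directory on Linux or macOS. It places every file at the archive root. `ZipFile.write` stores each file's Unix mode bits, so the executable flag set with `chmod +x` is kept in the archive. `build_package` and `is_executable_in_zip` are hypothetical helper names.

```python
import os
import zipfile

# Files the function needs; all of them go to the archive root
PACKAGE_FILES = ["index.py", "requirements.txt", "ffmpeg", "ffprobe"]


def build_package(archive_path="index.zip", files=PACKAGE_FILES, src_dir="."):
    """Create a ZIP archive with the function code and the FFmpeg binaries."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                raise FileNotFoundError(f"Missing package file: {path}")
            # arcname=name puts the file at the archive root; write() also
            # records the file's mode bits, so chmod +x survives packaging
            zf.write(path, arcname=name)
    return archive_path


def is_executable_in_zip(archive_path, name):
    """Check the owner-execute bit stored for `name` in the archive."""
    with zipfile.ZipFile(archive_path) as zf:
        mode = zf.getinfo(name).external_attr >> 16
    return bool(mode & 0o100)
```

For example, run `build_package()` from the directory with the four files, check `is_executable_in_zip("index.zip", "ffmpeg")`, and then upload `index.zip` to the bucket.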

## Register your Telegram bot {#bot-register}

Register your bot in Telegram and get a token.

1. Start [BotFather](https://t.me/BotFather) and send it the following command:

   ```text
   /newbot
   ```

1. When BotFather asks for a name, send a name for the new bot. This is the name users will see when chatting with the bot.
1. When BotFather asks for a `username`, send a username for the new bot. Users can find the bot in Telegram by this username. The username must end with `...Bot` or `..._bot`.

   BotFather will reply with a token. Save it, as you will need it later.
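
Before setting up the function, you can check that the token works by calling the Bot API `getMe` method, which returns basic information about the bot. The sketch below uses only the Python standard library; `get_me_url`, `parse_get_me`, and `check_token` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def get_me_url(token):
    """Build the Bot API URL for the getMe method."""
    return f"{TELEGRAM_API}/bot{token}/getMe"


def parse_get_me(payload):
    """Return the bot username from a getMe response or raise if the call failed."""
    data = json.loads(payload)
    if not data.get("ok"):
        raise ValueError(f"getMe failed: {data.get('description', 'unknown error')}")
    return data["result"]["username"]


def check_token(token, timeout=10):
    """Call getMe over the network and return the bot username."""
    try:
        with urllib.request.urlopen(get_me_url(token), timeout=timeout) as resp:
            payload = resp.read()
    except urllib.error.HTTPError as err:
        # An invalid token yields an HTTP error with a JSON description in the body
        payload = err.read()
    return parse_get_me(payload)
```

For example, `check_token("<bot_token>")` should return the username you sent to BotFather.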

## Create a function {#create-function}

Create a function to process user actions in the chat.

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the folder where you want to create a function.
  1. [Navigate](../../console/operations/select-service.md#select-service) to **Cloud Functions**.
  1. Create a function:

     1. Click **Create function**.
     1. Enter the function name: `for-recognizer-bot`.
     1. Click **Create**.

  1. Create a function version:

     1. Select `Python` as the runtime environment, disable **Add files with code examples**, and click **Continue**.
     1. Specify `Object Storage` for the upload method and select the bucket you [created earlier](#prepare). In the **Object** field, specify the file name: `index.zip`.
     1. Specify the entry point: `index.handler`.
     1. Under **Parameters**, specify:

        * **Timeout**: `30`.
        * **Memory**: `256 MB`.
        * **Service account**: `recognizer-bot-sa`.
        * **Environment variables**:

          * `TELEGRAM_TOKEN`: Your Telegram bot token.

     1. Click **Save changes**.

- CLI {#cli}

  If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

  The folder used by default is the one specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.

  1. Create a function named `for-recognizer-bot`:

     ```bash
     yc serverless function create --name=for-recognizer-bot
     ```

     Result:

     ```text
     id: b09bhaokchn9********
     folder_id: aoek49ghmknn********
     created_at: "2023-03-21T10:03:37.475Z"
     name: for-recognizer-bot
     log_group_id: eolm8aoq9vcp********
     http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
     status: ACTIVE
     ```

  1. Create a version of the `for-recognizer-bot` function:

     ```bash
     yc serverless function version create \
       --function-name for-recognizer-bot \
       --memory=256m \
       --execution-timeout=30s \
       --runtime=python312 \
       --entrypoint=index.handler \
       --service-account-id=<service_account_ID> \
       --environment TELEGRAM_TOKEN=<bot_token> \
       --package-bucket-name=<bucket_name> \
       --package-object-name=index.zip
     ```

     Where:

     * `--function-name`: Name of the function whose version you are creating.
     * `--memory`: Amount of RAM.
     * `--execution-timeout`: Maximum function running time before timeout.
     * `--runtime`: Runtime environment.
     * `--entrypoint`: Entry point.
     * `--service-account-id`: `recognizer-bot-sa` service account ID.
     * `--environment`: Environment variables.
     * `--package-bucket-name`: Name of the bucket with the archive.
     * `--package-object-name`: Key of the `index.zip` object in the bucket.

     Result:

     ```text
     done (1s)
     id: d4e6qqlh53nu********
     function_id: d4emc80mnp5n********
     created_at: "2025-03-22T16:49:41.800Z"
     runtime: python312
     entrypoint: index.handler
     resources:
       memory: "268435456"
     execution_timeout: 30s
     service_account_id: aje20nhregkc********
     image_size: "4096"
     status: ACTIVE
     tags:
       - $latest
     log_group_id: ckgmc3l93cl0********
     environment:
       TELEGRAM_TOKEN: <bot_token>
     log_options:
       folder_id: b1g86q4m5vej********
     ```

- Terraform {#tf}

  
  With [Terraform](https://www.terraform.io/), you can quickly create a cloud infrastructure in Yandex Cloud and manage it using configuration files. These files store the infrastructure description written in HashiCorp Configuration Language (HCL). If you change the configuration files, Terraform automatically detects which part of your configuration is already deployed, and what should be added or removed.
  
  Terraform is distributed under the [Business Source License](https://github.com/hashicorp/terraform/blob/main/LICENSE). The [Yandex Cloud provider for Terraform](https://github.com/yandex-cloud/terraform-provider-yandex) is distributed under the [MPL-2.0](https://www.mozilla.org/en-US/MPL/2.0/) license.
  
  For more information about the provider resources, see the relevant documentation on the [Terraform](https://www.terraform.io/docs/providers/yandex/index.html) website or [its mirror](../../terraform/index.md).

  If you do not have Terraform yet, [install it and configure the Yandex Cloud provider](../../tutorials/infrastructure-management/terraform-quickstart.md#install-terraform).
  
  
  To manage infrastructure using Terraform under a service account or user accounts (a Yandex account, a federated account, or a local user), [authenticate](../../terraform/authentication.md) using the appropriate method.


  1. In the configuration file, define the function properties:

     ```hcl
     resource "yandex_function" "for-recognizer-bot-function" {
       name               = "for-recognizer-bot"
       user_hash          = "first function"
       runtime            = "python312"
       entrypoint         = "index.handler"
       memory             = "256"
       execution_timeout  = "30"
       service_account_id = "<service_account_ID>"
       environment = {
         TELEGRAM_TOKEN = "<bot_token>"
       }
       package {
         bucket_name = "<bucket_name>"
         object_name = "index.zip"
       }
     }
     ```

     Where:

     * `name`: Function name.
     * `user_hash`: Any string to identify the function version.
     * `runtime`: Function [runtime environment](../../functions/concepts/runtime/index.md).
     * `entrypoint`: Entry point.
     * `memory`: Amount of memory allocated for the function, in MB.
     * `execution_timeout`: Function execution timeout, in seconds.
     * `service_account_id`: `recognizer-bot-sa` service account ID.
     * `environment`: Environment variables.
     * `package`: Location of the archive with the function source code: `bucket_name` is the bucket name, and `object_name` is the `index.zip` object key.

     For more information about `yandex_function` properties, see [this provider guide](../../terraform/resources/function.md).

  1. Make sure the configuration files are correct.

     1. In the command line, navigate to the directory where you created the configuration file.
     1. Run a check using this command:

        ```bash
        terraform plan
        ```

     If the configuration description is correct, the terminal will display a list of the resources being created and their settings. Terraform will show any errors in the configuration.

  1. Deploy the cloud resources.

     1. If the configuration does not contain any errors, run this command:

        ```bash
        terraform apply
        ```

     1. Confirm creating the function by typing `yes` in the terminal and pressing **Enter**.

- API {#api}

  To create a function, use the [create](../../functions/functions/api-ref/Function/create.md) REST API method for the [Function](../../functions/functions/api-ref/Function/index.md) resource or the [FunctionService/Create](../../functions/functions/api-ref/grpc/Function/create.md) gRPC API call.

  To create a function version, use the [createVersion](../../functions/functions/api-ref/Function/createVersion.md) REST API method for the [Function](../../functions/functions/api-ref/Function/index.md) resource or the [FunctionService/CreateVersion](../../functions/functions/api-ref/grpc/Function/createVersion.md) gRPC API call.

{% endlist %}

## Create an API gateway {#create-api-gateway}

Telegram will notify your bot about new messages by sending requests to a [webhook](https://core.telegram.org/bots/api#setwebhook). The API gateway will receive these requests and forward them to the `for-recognizer-bot` function for processing.
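
Each forwarded request reaches the function as an event whose `body` field holds the Telegram Update JSON; the tutorial's `process_event` parses it with `json.loads(event['body'])`. The sketch below is a slightly more defensive variant that also decodes the body if the event is flagged as Base64-encoded. The `isBase64Encoded` flag is an assumption about the event format, and `parse_update` and `make_test_event` are hypothetical helpers, the latter for local tests only.

```python
import base64
import json


def parse_update(event):
    """Return the Telegram Update from an HTTP event as a dict."""
    body = event.get("body") or ""
    # Assumption: a Base64-encoded body is flagged with isBase64Encoded
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def make_test_event(update, encode=False):
    """Hypothetical helper that wraps an Update dict into a minimal event for local tests."""
    body = json.dumps(update)
    if encode:
        body = base64.b64encode(body.encode("utf-8")).decode("ascii")
    return {"httpMethod": "POST", "body": body, "isBase64Encoded": encode}


update = {"update_id": 1, "message": {"message_id": 7, "chat": {"id": 42}, "text": "/start"}}
print(parse_update(make_test_event(update, encode=True))["message"]["text"])  # prints /start
```

The returned dict can be passed to `telebot.types.Update.de_json()` the same way `process_event` passes `request_body_dict`.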

{% list tabs group=instructions %}

- Management console {#console}

  1. In the [management console](https://console.yandex.cloud), select the folder where you want to create an API gateway.
  1. Navigate to **API Gateway**.
  1. Click **Create API gateway**.
  1. In the **Name** field, enter `recognizer-bot-api-gw`.
  1. Under **Specification**, paste the following specification:

     ```yaml
     openapi: 3.0.0
     info:
       title: Sample API
       version: 1.0.0
     paths:
       /for-recognizer-bot-function:
         post:
           x-yc-apigateway-integration:
             type: cloud_functions
             function_id: <function_ID>
             service_account_id: <service_account_ID>
           operationId: for-recognizer-bot-function
     ```

     Where:

     * `function_id`: `for-recognizer-bot` function ID.
     * `service_account_id`: `recognizer-bot-sa` service account ID.

  1. Click **Create**.
  1. Select the created API gateway. Save the **Default domain** field value. You will need it later.

- CLI {#cli}

  1. Save the following specification to `spec.yaml`:

     ```yaml
     openapi: 3.0.0
     info:
       title: Sample API
       version: 1.0.0
     paths:
       /for-recognizer-bot-function:
         post:
           x-yc-apigateway-integration:
             type: cloud_functions
             function_id: <function_ID>
             service_account_id: <service_account_ID>
           operationId: for-recognizer-bot-function
     ```

     Where:

     * `function_id`: `for-recognizer-bot` function ID.
     * `service_account_id`: `recognizer-bot-sa` service account ID.

  1. Run this command:

     ```bash
     yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml
     ```

     Where:

     * `--name`: API gateway name.
     * `--spec`: Specification file.

     Result:

     ```text
     done (5s)
     id: d5d1ud9bli1e********
     folder_id: b1gc1t4cb638********
     created_at: "2023-09-25T16:01:48.926Z"
     name: recognizer-bot-api-gw
     status: ACTIVE
     domain: d5dm1lba80md********.i9******.apigw.yandexcloud.net
     log_group_id: ckgefpleo5eg********
     connectivity: {}
     log_options:
       folder_id: b1gc1t4cb638********
     ```

- Terraform {#tf}

  To create an API gateway:

  1. Describe the `yandex_api_gateway` properties in the configuration file:

     ```hcl
     resource "yandex_api_gateway" "recognizer-bot-api-gw" {
       name        = "recognizer-bot-api-gw"
       spec = <<-EOT
         openapi: 3.0.0
         info:
           title: Sample API
           version: 1.0.0

         paths:
           /for-recognizer-bot-function:
             post:
               x-yc-apigateway-integration:
                 type: cloud_functions
                 function_id: <function_ID>
                 service_account_id: <service_account_ID>
               operationId: for-recognizer-bot-function
       EOT
     }
     ```

     Where:

     * `name`: API gateway name.
     * `spec`: API gateway specification.

     For more information about resource properties, see [this Terraform article](../../terraform/resources/api_gateway.md).

  1. Make sure the configuration files are correct.

     1. In the command line, navigate to the directory where you created the configuration file.
     1. Run a check using this command:

        ```bash
        terraform plan
        ```

     If the configuration description is correct, the terminal will display a list of the resources being created and their settings. Terraform will show any errors in the configuration.

  1. Deploy the cloud resources.

     1. If the configuration does not contain any errors, run this command:

        ```bash
        terraform apply
        ```

     1. Confirm creating the resources: type `yes` and press **Enter**.

- API {#api}

  To create an API gateway, use the [create](../apigateway/api-ref/ApiGateway/create.md) REST API method for the [ApiGateway](../apigateway/api-ref/ApiGateway/index.md) resource or the [ApiGatewayService/Create](../apigateway/api-ref/grpc/ApiGateway/create.md) gRPC API call.

{% endlist %}

## Configure a link between the function and the Telegram bot {#link-bot}

Set a webhook for your Telegram bot so that Telegram sends updates to the API gateway:

```bash
curl --request POST \
  --url 'https://api.telegram.org/bot<bot_token>/setWebhook' \
  --header 'content-type: application/json' \
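  --data '{"url": "https://<API_gateway_domain>/for-recognizer-bot-function"}'
```

Where:

* `<bot_token>`: Telegram bot token.
* `<API_gateway_domain>`: Default domain of the `recognizer-bot-api-gw` API gateway (the `domain` value in the CLI output), without the `https://` prefix, e.g., `d5dm1lba80md********.i9******.apigw.yandexcloud.net`. Telegram accepts only HTTPS webhook URLs.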

Result:

```json
{"ok":true,"result":true,"description":"Webhook was set"}
```
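
If you prefer Python to `curl`, the same `setWebhook` call can be made with the standard library. This is a sketch: `webhook_url`, `build_set_webhook_request`, and `call_bot_api` are hypothetical helper names, and the domain may be passed with or without the `https://` prefix.

```python
import json
import urllib.error
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def webhook_url(domain, path="/for-recognizer-bot-function"):
    """Build the HTTPS webhook URL from the API gateway domain."""
    host = domain.removeprefix("https://").removeprefix("http://").rstrip("/")
    return f"https://{host}{path}"


def build_set_webhook_request(token, domain):
    """Prepare a POST request equivalent to the curl command above."""
    data = json.dumps({"url": webhook_url(domain)}).encode("utf-8")
    return urllib.request.Request(
        f"{TELEGRAM_API}/bot{token}/setWebhook",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_bot_api(request, timeout=10):
    """Send a Bot API request and return the decoded JSON response."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # The Bot API returns a JSON error description with non-2xx statuses
        body = err.read()
    return json.loads(body)
```

For example, `call_bot_api(build_set_webhook_request("<bot_token>", "<API_gateway_domain>"))` should return the same JSON as the `curl` call. To see which URL is currently registered, send a request to the Bot API `getWebhookInfo` method and check the `url` field of `result`.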

## Test the bot {#test}

Chat with the bot:

1. Open Telegram and find the bot by the `username` you [specified](#bot-register).
1. Send `/start` to the chat.

   The bot should respond with:

   ```text
   The bot can do the following:
   * Recognize text in images.
   * Generate voice messages from text.
   * Convert voice messages to text.
   ```

1. Send a text message to the chat. The bot will respond with a voice message synthesized from your text.
1. Send a voice message to the chat. The bot will respond with the text recognized from your speech. The function is configured to recognize Russian speech (`ru-RU`).
1. Send an image with text to the chat. The bot will respond with a message containing the recognized text.

   {% note info %}

   The image must meet the [following requirements](https://aistudio.yandex.ru/docs/en/vision/concepts/ocr/index#image-requirements).

   {% endnote %}

## How to delete the resources you created {#clear-out}

Delete the resources you no longer need to avoid [paying](#paid-resources) for them:

* [Delete](../operations/api-gw-delete.md) the API gateway.
* [Delete](../../functions/operations/function/function-delete.md) the function in Cloud Functions.
* Delete the `index.zip` archive and the Object Storage bucket you created for it.