# Managing Spark jobs

## Creating a job {#create}

{% list tabs group=instructions %}

- Management console {#console}

    1. Open the [folder dashboard](https://console.yandex.cloud).
    1. Navigate to **Yandex Data Processing**.
    1. Click the name of your cluster and select the **Jobs** tab.
    1. Click **Submit job**.
    1. Optionally, enter a name for the job.
    1. In the **Job type** field, select `Spark`.
    1. In the **Main jar** field, specify the path to the application's main JAR file in one of the following formats:

        | File location                                                | Path format                                          |
        |-----------------------------------------------------------------|------------------------------------------------------|
        | Instance file system                                       | `file:///<path_to_file>`                             |
        | Distributed cluster file system                        | `hdfs:///<path_to_file>`                             |
        | [Object Storage bucket](../../storage/concepts/bucket.md) | `s3a://<bucket_name>/<path_to_file>`                  |
        | Internet                                                        | `http://<path_to_file>` or `https://<path_to_file>` |
        
        Archives in standard Linux formats, such as `zip`, `gz`, `xz`, and `bz2`, are supported.

        The cluster service account needs read access to all the files in the bucket. For instructions on setting up access to Object Storage, see [Editing a bucket ACL](../../storage/operations/buckets/edit-acl.md).

    1. In the **Main class** field, specify the name of the main application class.
    1. Specify job arguments.

        If an argument, variable, or property consists of several space-separated parts, specify each part separately. Make sure to preserve the order in which the arguments, variables, and properties are declared.

        For example, the `-mapper mapper.py` argument must be passed as two separate arguments, `-mapper` and `mapper.py`, in that order.

    1. Optionally, specify the paths to JAR files, if any.
    1. Optionally, configure advanced settings:

        * Specify paths to the required files and archives.
        * In the **Properties** field, specify [component properties](../concepts/settings-list.md) as `key-value` pairs.

    1. Click **Submit job**.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    By default, the CLI uses the folder specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search is limited to the default folder. If you access a resource by its ID, the search covers all folders you have access permissions for.

    To create a job:

    1. See the description of the CLI command for creating `spark` jobs:

        ```bash
        yc dataproc job create-spark --help
        ```

    1. Create a job (the example does not illustrate all available parameters):

        ```bash
        yc dataproc job create-spark \
           --cluster-name=<cluster_name> \
           --name=<job_name> \
           --main-class=<application_main_class_name> \
           --main-jar-file-uri=<path_to_main_jar_file> \
           --jar-file-uris=<path_to_jar_file> \
           --file-uris=<file_path> \
           --archive-uris=<path_to_archive> \
           --properties=<component_properties> \
           --args=<arguments> \
           --packages=<Maven_coordinates_of_jar_files> \
           --repositories=<additional_repositories> \
           --exclude-packages=<packages_to_exclude>
        ```

        Where: 
        
        * `--properties`: Component properties as `key-value` pairs.
        * `--args`: Arguments provided to the job.
        * `--packages`: Maven coordinates of JAR files in `groupId:artifactId:version` format.
        * `--repositories`: Additional repositories to search for `packages`.
        * `--exclude-packages`: Packages to exclude, in `groupId:artifactId` format.

        Provide the paths to the files required for the job in one of the following formats:

        | File location                                                | Path format                                          |
        |-----------------------------------------------------------------|------------------------------------------------------|
        | Instance file system                                       | `file:///<path_to_file>`                             |
        | Distributed cluster file system                        | `hdfs:///<path_to_file>`                             |
        | [Object Storage bucket](../../storage/concepts/bucket.md) | `s3a://<bucket_name>/<path_to_file>`                  |
        | Internet                                                        | `http://<path_to_file>` or `https://<path_to_file>` |
        
        Archives in standard Linux formats, such as `zip`, `gz`, `xz`, and `bz2`, are supported.

        The cluster service account needs read access to all the files in the bucket. For instructions on setting up access to Object Storage, see [Editing a bucket ACL](../../storage/operations/buckets/edit-acl.md).

    You can get the cluster ID and name with the [list of clusters in the folder](cluster-list.md#list).

- API {#api}

    Call the [create](../api-ref/Job/create.md) API method and provide the following in the request:

    * Cluster ID in the `clusterId` parameter.
    * Job name in the `name` parameter.
    * Job properties in the `sparkJob` parameter.

    You can get the cluster ID with the [list of clusters in the folder](cluster-list.md#list).

{% endlist %}
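
The argument-splitting rule described for job arguments can be sketched in shell. This is a minimal illustration rather than part of the Yandex Cloud CLI: `split_argument` is a hypothetical helper name, and the sketch assumes the parts are separated by plain whitespace with no quoting inside them.

```bash
# Hypothetical helper (not part of the yc CLI): splits one whitespace-separated
# string into its parts and prints them one per line, in the original order.
split_argument() {
  set -f          # disable glob expansion so characters like * stay literal
  set -- $1       # unquoted on purpose: the shell splits the value on whitespace
  set +f          # re-enable glob expansion
  printf '%s\n' "$@"
}

# "-mapper mapper.py" becomes two separate arguments, in the same order.
split_argument "-mapper mapper.py"
```

Each printed line corresponds to one value to enter as a separate job argument.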

## Canceling a job {#cancel}

{% note info %}

You cannot cancel jobs with the `ERROR`, `DONE`, or `CANCELLED` status. To find out the job status, get the [list of jobs](#list) in the cluster.

{% endnote %}

{% list tabs group=instructions %}

- Management console {#console}

  1. Open the [folder dashboard](https://console.yandex.cloud).
  1. Navigate to **Yandex Data Processing**.
  1. Click the name of your cluster and select the **Jobs** tab.
  1. Click the job name.
  1. Click **Cancel** in the top-right corner of the page.
  1. In the window that opens, select **Cancel**.

- CLI {#cli}

  If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

  By default, the CLI uses the folder specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search is limited to the default folder. If you access a resource by its ID, the search covers all folders you have access permissions for.

  To cancel a job, run this command:

  ```bash
  yc dataproc job cancel <job_name_or_ID> \
    --cluster-name=<cluster_name>
  ```

  You can get the job ID and name from the [list of jobs in the cluster](#list) and the cluster name from the [list of clusters in the folder](cluster-list.md#list).

- API {#api}

  Call the [cancel](../api-ref/Job/cancel.md) API method and provide the following in the request:
  * Cluster ID in the `clusterId` parameter.
  * Job ID in the `jobId` parameter.

  You can get the cluster ID from the [list of clusters in the folder](cluster-list.md#list) and the job ID from the [list of cluster jobs](#list).

{% endlist %}
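
The status restriction from the note above can be expressed as a small check to run before attempting a cancellation. This is a sketch under assumptions: `can_cancel` is a hypothetical helper name, the status is passed in as a plain string, and any value other than the three statuses named in the note is treated as cancelable, which is a simplification.

```bash
# Hypothetical helper: succeeds (exit code 0) if a job with the given status
# can still be canceled, and fails (exit code 1) for ERROR, DONE, or CANCELLED.
can_cancel() {
  case "$1" in
    ERROR|DONE|CANCELLED) return 1 ;;
    *) return 0 ;;
  esac
}

# Example check; RUNNING is used here only as a sample status value.
if can_cancel "RUNNING"; then
  echo "cancel allowed"
else
  echo "cancel not allowed"
fi
```

In a script, the status would come from the list of jobs in the cluster, and the cancel command would be run only when the check succeeds.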

## Getting a list of jobs {#list}

{% list tabs group=instructions %}

- Management console {#console}

    1. Open the [folder dashboard](https://console.yandex.cloud).
    1. Navigate to **Yandex Data Processing**.
    1. Click the name of your cluster and select the **Jobs** tab.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    By default, the CLI uses the folder specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search is limited to the default folder. If you access a resource by its ID, the search covers all folders you have access permissions for.

    To get a list of jobs, run the following command:

    ```bash
    yc dataproc job list --cluster-name=<cluster_name>
    ```

    You can get the cluster ID and name with the [list of clusters in the folder](cluster-list.md#list).

- API {#api}

    Call the [list](../api-ref/Job/list.md) API method, providing the cluster ID in the `clusterId` request parameter.

    You can get the cluster ID with the [list of clusters in the folder](cluster-list.md#list).

{% endlist %}

## Getting general info about a job {#get-info}

{% list tabs group=instructions %}

- Management console {#console}

    1. Open the [folder dashboard](https://console.yandex.cloud).
    1. Navigate to **Yandex Data Processing**.
    1. Click the name of your cluster and select the **Jobs** tab.
    1. Click the job name.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    By default, the CLI uses the folder specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search is limited to the default folder. If you access a resource by its ID, the search covers all folders you have access permissions for.

    To get general info about a job, run this command:

    ```bash
    yc dataproc job get \
       --cluster-name=<cluster_name> \
       --name=<job_name>
    ```

    You can get the cluster ID and name from the [list of clusters in the folder](cluster-list.md#list) and the job name from the [list of jobs in the cluster](#list).

- API {#api}

    Call the [get](../api-ref/Job/get.md) API method and provide the following in the request:

    * Cluster ID in the `clusterId` parameter. You can get it with the [list of clusters in the folder](cluster-list.md#list).
    * Job ID in the `jobId` parameter. You can get it with the [list of cluster jobs](#list).

{% endlist %}

## Getting job execution logs {#get-logs}

{% note info %}

You can view and search job logs using [Yandex Cloud Logging](../../logging/index.md). For more information, see [Working with logs](logging.md).

{% endnote %}

{% list tabs group=instructions %}

- Management console {#console}

    1. Open the [folder dashboard](https://console.yandex.cloud).
    1. Navigate to **Yandex Data Processing**.
    1. Click the name of your cluster and select the **Jobs** tab.
    1. Click the job name.

- CLI {#cli}

    If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../cli/quickstart.md#install).

    By default, the CLI uses the folder specified when [creating](../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search is limited to the default folder. If you access a resource by its ID, the search covers all folders you have access permissions for.

    To get the job execution logs, run the following command:

    ```bash
    yc dataproc job log \
       --cluster-name=<cluster_name> \
       --name=<job_name>
    ```

    You can get the cluster ID and name from the [list of clusters in the folder](cluster-list.md#list) and the job name from the [list of jobs in the cluster](#list).

- API {#api}

    Call the [listLog](../api-ref/Job/listLog.md) API method and provide the following in the request:

    * Cluster ID in the `clusterId` parameter. You can get it with the [list of clusters in the folder](cluster-list.md#list).
    * Job ID in the `jobId` parameter. You can get it with the [list of cluster jobs](#list).

{% endlist %}
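
To keep a job's logs for later inspection, the CLI command from the CLI tab can be wrapped in a small function that redirects its output to a file. This is a sketch under assumptions: `save_job_log` and the `YC` variable are hypothetical names introduced here, and the sketch assumes the command writes the log to standard output. Only the `yc dataproc job log` command and its `--cluster-name` and `--name` flags come from this guide.

```bash
# Hypothetical wrapper around the `yc dataproc job log` command from this guide.
# YC names the executable to run; override it (for example, YC=echo) for a dry run.
YC="${YC:-yc}"

# Usage: save_job_log <cluster_name> <job_name> <output_file>
save_job_log() {
  "$YC" dataproc job log \
    --cluster-name="$1" \
    --name="$2" > "$3"
}
```

With `YC=echo`, the function writes the command line it would run to the output file instead of calling the CLI, which is a quick way to check the arguments before running it against a real cluster.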