# General questions about Yandex Data Processing

* [What clusters can I move to a different availability zone?](#new-availability-zone)

* [What should I do if data on storage subcluster hosts is distributed unevenly?](#data-unevenly-distributed)

* [Where can I view Yandex Data Processing cluster logs?](#cluster-logs)

* [How do I get the logs of my actions in the services?](#logs)

* [Why is the cluster slow even though the computing resources are not used fully?](#throttling)

* [I get the `^M: bad interpreter` error when running the initialization script. How do I fix this?](#syntax-error)

* [When I run a PySpark job, I get an error related to `com/amazonaws/auth/AWSCredentialsProvider`. How do I fix this?](#sharedPrefixes-property)

* [When using dynamic partition overwrites, I get an error related to `PathOutputCommitProtocol`. How do I fix it?](#dynamic-partition-overwrite)

* [Why does the `NAT should be enabled on the subnet` error occur and how do I fix it?](#nat)

* [Why does the `Using fileUris is forbidden on lightweight cluster` error occur and how do I fix it?](#file-uri)

* [Why does the `Create Yandex Data Processing cluster Error: 0 Address space exhausted` error occur and how do I fix it?](#addresses-exhausted)

* [Why is my cluster's status `Unknown`?](#unknown)

* [What is the minimum computing power required for a subcluster with a master host?](#master-computing-power)

* [How do I upgrade the image version in Yandex Data Processing?](#upgrade)

* [How do I run jobs?](#jobs)

* [What security group limits are there?](#security-groups)

* [Can I get superuser permissions on hosts?](#connect-root)

* [How can I fix the no permission error when connecting a service account to the cluster?](#attach-service-account)

#### Which clusters can be moved to a different availability zone? {#new-availability-zone}

You can move [lightweight clusters](../operations/migration-to-an-availability-zone.md) and [HDFS clusters](../tutorials/hdfs-cluster-migration.md).

#### What should I do if data on storage subcluster hosts is distributed unevenly? {#data-unevenly-distributed}

[Connect](../operations/connect.md) to the cluster master host and run this command to rebalance the data:

```bash
sudo -u hdfs hdfs balancer
```

You can configure the balancer parameters. For example, to change the maximum amount of data to move, add the following argument: `-D dfs.balancer.max-size-to-move=<data-size-in-bytes>`.
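
The value of `dfs.balancer.max-size-to-move` is given in bytes. As a minimal sketch, the script below converts a size in GiB to bytes and prints the resulting balancer command so you can review it before running it on the master host. The 20 GiB figure is an arbitrary example, not a recommended setting.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Convert a whole number of GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example value only: limit the data to move to 20 GiB.
max_bytes=$(gib_to_bytes 20)

# Print the command instead of running it, so it can be checked first.
echo "sudo -u hdfs hdfs balancer -D dfs.balancer.max-size-to-move=${max_bytes}"
```

The script prints `sudo -u hdfs hdfs balancer -D dfs.balancer.max-size-to-move=21474836480`.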

#### Where can I view Yandex Data Processing cluster logs? {#cluster-logs}

Cluster logs are stored in a log group. To track the events of a cluster and its individual hosts, specify the relevant [log group](../../logging/concepts/log-group.md) in the cluster settings when [creating](../operations/cluster-create.md) or [updating](../operations/cluster-update.md) the cluster. If no log group is selected for the cluster, logs are sent to and stored in the default log group of the cluster's folder. For more information, see [Working with logs](../operations/logging.md).

#### Can I get logs of my operations in Yandex Cloud? {#logs}

Yes, you can request information about operations with your resources from Yandex Cloud logs. To do so, contact [support](https://center.yandex.cloud/support).

#### Why is the cluster slow even though the computing resources are not used fully? {#throttling}

Your storage may not provide enough maximum [IOPS and bandwidth](../../compute/concepts/storage-read-write.md) to process the current number of requests. In this case, [throttling](../../compute/concepts/storage-read-write.md#throttling) occurs, which degrades the performance of the entire cluster.

The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:

| Disk type                  | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
|-----------------------------|---------|------------------------------------|-----------------------------------------------|
| `network-hdd`               | 256     | 300/300                            | 30/30                                         |
| `network-ssd`               | 32      | 1,000/1,000                          | 15/15                                         |
| `network-ssd-nonreplicated`, `network-ssd-io-m3` | 93      | 28,000/5,600                         | 110/82                                        |
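
To estimate the limits for a particular disk, multiply the number of steps by the increments in the table. The sketch below only does this arithmetic for the three rows of the table. It assumes the disk size is a whole number of steps and ignores any per-disk maximums, which the table does not list; see the linked Compute Cloud page for the authoritative values.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print "read_iops write_iops read_mbps write_mbps" for a disk of the given
# type and size in GB, using the per-step increments from the table above.
# Assumptions: size_gb is a whole number of steps (any remainder is dropped),
# and per-disk maximums, which the table does not list, are ignored.
disk_limits() {
  local type=$1 size_gb=$2
  local step r_iops w_iops r_mbps w_mbps
  case "$type" in
    network-hdd)
      step=256; r_iops=300; w_iops=300; r_mbps=30; w_mbps=30 ;;
    network-ssd)
      step=32; r_iops=1000; w_iops=1000; r_mbps=15; w_mbps=15 ;;
    network-ssd-nonreplicated | network-ssd-io-m3)
      step=93; r_iops=28000; w_iops=5600; r_mbps=110; w_mbps=82 ;;
    *)
      echo "unknown disk type: $type" >&2
      return 1 ;;
  esac
  local steps=$(( size_gb / step ))
  echo "$(( steps * r_iops )) $(( steps * w_iops )) $(( steps * r_mbps )) $(( steps * w_mbps ))"
}

disk_limits network-ssd 256   # 8 steps: prints 8000 8000 120 120
```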

To increase the maximum IOPS and bandwidth values and make throttling less likely, consider moving to a new cluster with larger host storage or a faster disk type. You can transfer data to the new cluster, for example, using [Apache Hive™ Metastore](../../metadata-hub/concepts/metastore.md).

#### I get the "^M: bad interpreter" error when running the initialization script. How do I fix this? {#syntax-error}

Initialization scripts run on Linux (Ubuntu). Scripts created in Windows may fail with the `^M: bad interpreter` error because Windows uses `CR/LF` as the line break, while Linux uses `LF`. To fix the error, save the script file with Linux (`LF`) line endings. For more information, see [Syntax errors](../concepts/init-action.md#syntax-errors).
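
If you have a Linux shell at hand (for example, WSL on Windows), you can check a script for `CR` characters and strip them before uploading it. The sketch below relies on GNU `sed` and `grep`; the `dos2unix` utility, where installed, does the same job.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Succeeds if the file contains at least one carriage return (CR) character.
has_crlf() {
  grep -q $'\r' "$1"
}

# Removes a CR at the end of each line, converting CR/LF line endings to LF in place.
# Uses GNU sed syntax (`-i` without a backup suffix).
fix_line_endings() {
  sed -i 's/\r$//' "$1"
}

# Convert every script passed on the command line.
for script in "$@"; do
  if has_crlf "$script"; then
    fix_line_endings "$script"
    echo "converted: $script"
  fi
done
```

For example, if you save the sketch as `fix-line-endings.sh` (a hypothetical name), run `bash fix-line-endings.sh init.sh` to convert a script named `init.sh`.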

#### When I run a PySpark job, I get an error related to "com/amazonaws/auth/AWSCredentialsProvider". How do I fix this? {#sharedPrefixes-property}

If a Yandex Data Processing cluster is connected to an Apache Hive™ Metastore cluster, you may get the following error when running PySpark jobs:

```text
previously initiated loading for a different type with name "com/amazonaws/auth/AWSCredentialsProvider";
```

To fix this, [add](../operations/cluster-update.md) the `spark:spark.sql.hive.metastore.sharedPrefixes` property with the `com.amazonaws,ru.yandex.cloud` value to the Yandex Data Processing cluster.

#### When using dynamic partition overwrites, I get an error related to "PathOutputCommitProtocol". How do I fix it? {#dynamic-partition-overwrite}

When data processing uses dynamic partition overwrites, you may get this error:

```text
py4j.protocol.Py4JJavaError: An error occurred while calling o264.parquet.
: java.io.IOException: PathOutputCommitProtocol does not support dynamicPartitionOverwrite
```

To fix it, [add](../operations/cluster-update.md) the following properties to the Yandex Data Processing cluster:

* `spark:spark.sql.sources.partitionOverwriteMode : dynamic`
* `spark:spark.sql.parquet.output.committer.class : org.apache.parquet.hadoop.ParquetOutputCommitter`
* `spark:spark.sql.sources.commitProtocolClass : org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol`

You can also add properties when [creating a job](../operations/jobs.md).

#### Why does the "NAT should be enabled on the subnet" error occur and how do I fix it? {#nat}

This error occurs when trying to create a Yandex Data Processing cluster in a subnet with no NAT gateway configured. To fix it, [configure a network for Yandex Data Processing](../tutorials/configure-network.md).

#### Why does the "Using fileUris is forbidden on lightweight cluster" error occur and how do I fix it? {#file-uri}

This error occurs because the [lightweight cluster](../concepts/index.md#light-weight-clusters) configuration does not include HDFS. To fix the error, [create a cluster](../operations/cluster-create.md) with HDFS support.

We also recommend using [Yandex Object Storage buckets](../../storage/concepts/bucket.md) to work with jobs. You can [upload job scripts to a bucket](../../storage/operations/objects/upload.md), where they are stored as objects you can [get links](../../storage/operations/objects/link-for-download.md) to. You can then use these Object Storage links in your jobs instead of `file:/` links.

#### Why does the "Create Yandex Data Processing cluster Error: 0 Address space exhausted" error occur and how do I fix it? {#addresses-exhausted}

The error means that your Yandex Data Processing cluster's subnet has run out of IPs that can be allocated to cluster hosts. To check how many IPs are available, [view the list of addresses used](../../vpc/operations/subnet-used-addresses.md) in the subnet and check the subnet mask.

To fix the error, do one of the following:

* Delete the unnecessary resources taking up the subnet's IPs.
* Create a subnet with a CIDR block that suits your cluster's configuration. Next, create a Yandex Data Processing cluster in the new subnet.

For more information about subnet sizes, see the [Yandex Virtual Private Cloud](../../vpc/concepts/network.md#subnet) documentation.
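
The prefix length of a subnet's CIDR block determines how many addresses it contains: a `/N` block has 2^(32-N) addresses, some of which are reserved by the platform. The sketch below only does this arithmetic. Check the Virtual Private Cloud documentation for the number of reserved addresses before comparing the result with the number of hosts in your cluster.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the total number of IPv4 addresses in a CIDR block such as 10.1.0.0/24.
# Reserved addresses are NOT subtracted.
cidr_size() {
  local prefix=${1#*/}
  if ! [[ $prefix =~ ^[0-9]{1,2}$ ]] || (( 10#$prefix > 32 )); then
    echo "invalid CIDR: $1" >&2
    return 1
  fi
  echo $(( 1 << (32 - 10#$prefix) ))
}

cidr_size 10.1.0.0/24   # prints 256
cidr_size 10.1.0.0/28   # prints 16
```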

#### Why is my cluster's status "Unknown"? {#unknown}

If your cluster's status changed from `Alive` to `Unknown`:

1. Make sure you have [set up a network for Yandex Data Processing](../tutorials/configure-network.md). For a cluster to run, you need to create and configure the following resources:

   * Network
   * Subnet
   * NAT gateway
   * Route table
   * Security group
   * Service account for the cluster
   * Bucket to store job dependencies and results

1. Review the logs that describe the cluster status over the specified period:

   ```bash
   yc logging read \
      --group-id=<log_group_ID> \
      --resource-ids=<cluster_ID> \
      --filter=log_type=yandex-dataproc-agent \
      --since 'YYYY-MM-DDThh:mm:ssZ' \
      --until 'YYYY-MM-DDThh:mm:ssZ'
   ```

   In the `--since` and `--until` parameters, specify the period boundaries. Time format: `YYYY-MM-DDThh:mm:ssZ`, e.g., `2020-08-10T12:00:00Z`. Use the UTC time zone.

   For more information, see [Working with logs](../operations/logging.md).
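
To fill in `--since` and `--until` in the command above without converting time zones by hand, you can have GNU `date` produce UTC timestamps in the required format. The sketch below uses a two-hour window purely as an example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Format a GNU date expression (e.g., "now" or "2 hours ago") as YYYY-MM-DDThh:mm:ssZ in UTC.
utc_stamp() {
  date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}

# Example window: the last two hours.
start_time=$(utc_stamp '2 hours ago')
end_time=$(utc_stamp 'now')
echo "--since '$start_time' --until '$end_time'"
```

Pass the printed values to `yc logging read` as `--since` and `--until`.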

#### What is the minimum computing power required for a subcluster with a master host? {#master-computing-power}

It depends on the driver deploy mode:

* In `deployMode=cluster` mode, when the driver runs on one of the cluster's `compute` hosts, a subcluster with the `master` host requires 4–8 CPU cores and 16 GB of RAM.
* In `deployMode=client` mode, when the driver runs on the cluster's master host, the compute capacity depends on the job logic and the number of concurrent jobs.

For more information on driver deploy modes and computing resource usage, see [Resource allocation](../concepts/spark-sql.md#resource-management).

In Yandex Cloud, computing power depends on the host class. For the computing power of each host class, see [Host classes](../concepts/instance-types.md).

#### How do I upgrade the image version in Yandex Data Processing? {#upgrade}

The service has no built-in mechanism for [image version](../concepts/environment.md) upgrades. To upgrade your image version, create a new cluster.

To make sure the version you use is always up to date, [automate](../tutorials/airflow-automation.md) the creation and removal of temporary Yandex Data Processing clusters using Yandex Managed Service for Apache Airflow™. Besides Managed Service for Apache Airflow™, you can also [use](../tutorials/datasphere-integration.md) Yandex DataSphere to run jobs automatically.

#### How do I run jobs? {#jobs}

There are several ways to do it:

* [Create jobs in Yandex Data Processing](../operations/jobs.md). Once created, they will run automatically.
* [Run Apache Hive jobs](../tutorials/how-to-use-hive.md) using the Yandex Cloud CLI or Hive CLI.
* [Run Spark or PySpark applications](../tutorials/run-spark-job.md) using Spark Shell, `spark-submit`, or the Yandex Cloud CLI.
* Use `spark-submit` to [run jobs from remote hosts](../tutorials/remote-run-job.md) that are not part of the Yandex Data Processing cluster.
* Set up integration with [Yandex Managed Service for Apache Airflow™](../tutorials/airflow-automation.md) or [Yandex DataSphere](../tutorials/datasphere-integration.md). This will automate running the jobs.

#### What security group limits are there? {#security-groups}

You can create no more than five security groups per network. Each group may have a maximum of 50 rules. Learn more about [limits in Yandex Virtual Private Cloud](../../vpc/concepts/limits.md#vpc-limits).

#### Can I get superuser permissions on hosts? {#connect-root}

Yes. To switch to superuser, enter the following command after connecting to the host:

```bash
sudo su
```

However, you do not have to switch to superuser: just use `sudo`.

#### How can I fix the no permission error when connecting a service account to the cluster? {#attach-service-account}

Error message:

```text
ERROR: rpc error: code = PermissionDenied desc = you do not have permission to access the requested service account or service account does not exist
```

This error occurs if you attach a service account to a cluster when creating or updating the cluster, but your account does not have permission to use that service account.

**Solution**

[Assign](../../iam/operations/roles/grant.md) the [iam.serviceAccounts.user](../../iam/security/index.md#iam-serviceAccounts-user) role or higher to your Yandex Cloud account.