# Installing Kubernetes Node Remediation

[Kubernetes Node Remediation](https://www.medik8s.io/) is a tool comprising several Kubernetes operators that provide automatic recovery of failed Managed Service for Kubernetes cluster nodes and high availability for stateful workloads.

The solution consists of two controllers:

* Node Healthcheck Controller: Tracks node failures.
* Self Node Remediation Controller: Moves workloads off failed nodes and restores the nodes.

## Getting started {#before-you-begin}

1. If you do not have the Yandex Cloud CLI yet, [install and initialize it](../../../cli/quickstart.md#install).

   The folder used by default is the one specified when [creating](../../../cli/operations/profile/profile-create.md) the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using `--folder-name` or `--folder-id`. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.

1. [Make sure](../connect/security-groups.md) the security groups for the Managed Service for Kubernetes cluster and its node groups are configured correctly. If a rule is missing, [add it](../../../vpc/operations/security-group-add-rule.md).

    {% note warning %}
    
    The configuration of security groups determines the performance and availability of the cluster and of the services and applications running in it.
    
    {% endnote %}

1. [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl) and [configure it to work with the new cluster](../connect/index.md#kubectl-connect).
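
Before moving on, you may want to confirm that the required command-line tools are available. The sketch below is only an illustration: `missing_tools` is a hypothetical helper name, not part of the Yandex Cloud CLI or `kubectl`. It prints every tool from its argument list that is not found in `PATH`.

```bash
# Hypothetical helper: prints each listed tool that is not found in PATH.
# Prints nothing if all tools are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    # `command -v` succeeds only if the tool can be resolved by the shell.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools yc kubectl` should print nothing once both tools are installed. If you plan to install from a Helm chart, add `helm` to the list.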

## Installation from Yandex Cloud Marketplace {#marketplace-install}

1. In the [management console](https://console.yandex.cloud), select a folder.
1. Navigate to **Managed Service for&nbsp;Kubernetes**.
1. Click the name of your cluster and select the ![image](../../../_assets/console-icons/shopping-cart.svg) **Marketplace** tab.
1. Under **Application available for installation**, select [Kubernetes Node Remediation](https://yandex.cloud/en/marketplace/products/yc/kubernetes-node-remediation) and click **Go to install**.
1. Configure the application:
   * **Namespace**: Create a new [namespace](../../concepts/index.md#namespace), e.g., `remediation-space`. If you leave the default namespace, Kubernetes Node Remediation may work incorrectly.
   * **Application name**: Specify the application name.
1. Click **Install**.
1. Wait for the application to change its status to `Deployed`.
1. [Create the `NodeHealthCheck` resource](#create-resource).

## Installation using a Helm chart {#helm-install}

1. [Install Helm](https://helm.sh/docs/intro/install/) v3.8.0 or higher.
1. To install a [Helm chart](https://helm.sh/docs/topics/charts/) with Kubernetes Node Remediation, run this command:

   ```bash
   helm pull oci://cr.yandex/yc-marketplace/yandex-cloud/medik8s/kubernetes-node-remediation/chart/kubernetes-node-remediation \
     --version 1.0.1 \
     --untar && \
   helm install \
     --namespace <namespace> \
     --create-namespace \
     kubernetes-node-remediation ./kubernetes-node-remediation/
   ```

   If you specify the `default` namespace, Kubernetes Node Remediation may work incorrectly. We recommend specifying a value different from all the existing namespaces, e.g., `remediation-space`.

   {% note info %}
   
   If you are using a Helm version below 3.8.0, add the `export HELM_EXPERIMENTAL_OCI=1 && \` string at the beginning of the command to enable [Open Container Initiative](https://opencontainers.org/) (OCI) support in the Helm client.
   
   {% endnote %}

1. [Create the `NodeHealthCheck` resource](#create-resource).
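
If you are not sure whether your Helm client needs the `HELM_EXPERIMENTAL_OCI` variable mentioned in the note above, the following sketch may help. It assumes a Helm 3.x client and a version string such as `3.7.2` or `v3.8.0+gd141386`. `needs_experimental_oci` is a hypothetical helper name, not a Helm command.

```bash
# Hypothetical helper: succeeds (exit code 0) if the given Helm 3.x version
# is below 3.8.0, i.e., OCI support has to be enabled explicitly.
needs_experimental_oci() {
  local version="${1#v}"        # drop a leading "v", e.g., v3.7.2 -> 3.7.2
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text between the first and second dots
  [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 8 ]; }
}
```

One possible way to use it, assuming `helm version --short` prints a string like `v3.8.0+gd141386`: `needs_experimental_oci "$(helm version --short)" && export HELM_EXPERIMENTAL_OCI=1`.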

## Creating the NodeHealthCheck resource {#create-resource}

1. Create a file named `NodeHealthCheck` with the resource description:

    ```yml
    apiVersion: remediation.medik8s.io/v1alpha1
    kind: NodeHealthCheck
    metadata:
      name: nodehc-sample
    spec:
      minHealthy: 51%
      remediationTemplate:
        apiVersion: self-node-remediation.medik8s.io/v1alpha1
        kind: SelfNodeRemediationTemplate
        name: self-node-remediation-automatic-strategy-template
        namespace: <application_namespace>
      selector:
        matchLabels:
          beta.kubernetes.io/os: linux
      unhealthyConditions:
      - duration: 60s
        status: "False"
        type: Ready
      - duration: 60s
        status: Unknown
        type: Ready
    ```

    Where:

    * `spec.minHealthy`: Minimum percentage of nodes that must be healthy for the controller to initiate recovery.
    * `spec.unhealthyConditions`: List of [node status conditions](https://kubernetes.io/docs/reference/node/node-status/) the controller uses to determine if the node is unhealthy.

        * `duration`: Amount of time the condition must persist before the node recovery process begins.
        * `type`: Condition type.
        * `status`: Expected status for recognizing a node as unhealthy.

        In this example, the Node Healthcheck Controller initiates recovery if a node's `Ready` condition has the `False` or `Unknown` status for 60 seconds, i.e., the node is not ready or has stopped reporting its state.

    [Learn more about resource fields](https://github.com/medik8s/node-healthcheck-operator/blob/main/docs/configuration.md).

1. Navigate to the directory with the file and run this command:

    ```bash
    kubectl apply -f <file_name>
    ```
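
To get a feel for what a `spec.minHealthy` value such as `51%` means for a particular cluster, you can convert the percentage into a node count. The sketch below is an illustration only: `min_healthy_nodes` is a hypothetical helper, and it assumes the percentage is rounded up to a whole number of nodes, which you may want to verify against the operator documentation.

```bash
# Hypothetical helper: converts a minHealthy percentage into a node count.
# Assumption: the result is rounded up to a whole node.
# Usage: min_healthy_nodes <total_nodes> <percent>
min_healthy_nodes() {
  local total="$1" percent="$2"
  # Integer ceiling of total * percent / 100.
  echo $(( (total * percent + 99) / 100 ))
}
```

Under this assumption, `min_healthy_nodes 5 51` gives `3`: in a five-node cluster, recovery would be initiated only while at least three nodes remain healthy.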