[Yandex Cloud documentation](../../index.md) > [Yandex MPP Analytics for PostgreSQL](../index.md) > [Tutorials](index.md) > Getting data from external sources using named queries

# Getting data from external sources using named queries

You can use named queries to retrieve data from external systems through [PXF](../concepts/external-tables.md) in Greenplum®.

A _named query_ is a prebuilt SQL query stored in the `mdb_toolkit.pxf_named_queries` table of the Greenplum® cluster's system database. You specify the query name reference when creating an external table.

Named queries allow you to join tables and aggregate data in an external source when creating a view is not an option. With all computations performed externally, the cluster operates more efficiently.

Named queries can be used with data sources connected to a Greenplum® cluster through a JDBC connector.

To get data from an external source using a named query:

1. [Prepare your test data](#prepare-data).
1. [Create an external data source](#create-jdbc-source).
1. [Create a named query](#create-named-query).
1. [Create an external table and get data from it](#create-ext-table).

If you no longer need the resources you created, [delete them](#clear-out).


## Required paid resources {#paid-resources}

* Yandex MPP Analytics for PostgreSQL cluster, which includes the use of computing resources allocated to hosts, storage and backup size (see [Yandex MPP Analytics for PostgreSQL pricing](../pricing/index.md)).
* Yandex Managed Service for PostgreSQL cluster, which includes the use of computing resources allocated to hosts, storage and backup size (see [Managed Service for PostgreSQL pricing](../../managed-postgresql/pricing.md)).
* NAT gateway: Fee for hourly usage of NAT gateways and outgoing traffic they handle (see [Yandex Virtual Private Cloud pricing](../../vpc/pricing.md)).
* Public IP addresses if public access is enabled for cluster hosts (see [Virtual Private Cloud pricing](../../vpc/pricing.md)).


## Getting started {#before-you-begin}

Set up your infrastructure:

{% list tabs group=instructions %}

- Manually {#manual}

    1. [Create](../operations/cluster-create.md) a Greenplum® cluster with your preferred configuration.
    
    1. In the Greenplum® cluster [subnet](../../vpc/concepts/network.md#subnet), [set up a NAT gateway](../../vpc/operations/create-nat-gateway.md) and [create a security group](../../vpc/operations/security-group-create.md) allowing all incoming and outgoing traffic from all addresses.

    1. [Create](../../managed-postgresql/operations/cluster-create.md) a Managed Service for PostgreSQL cluster with publicly available hosts.

        {% note info %}
        
        Public access to cluster hosts is required if you plan to connect to the cluster via the internet. This connection option is simpler and is recommended for the purposes of this guide. You can connect to non-public hosts as well but only from Yandex Cloud virtual machines located in the same cloud network as the cluster.
        
        {% endnote %}

    1. In the Managed Service for PostgreSQL cluster subnet, [set up a NAT gateway](../../vpc/operations/create-nat-gateway.md) and [create a security group](../../vpc/operations/security-group-create.md) allowing all incoming and outgoing traffic from all addresses.

- Terraform {#tf}

    1. If you do not have Terraform yet, [install it](../../tutorials/infrastructure-management/terraform-quickstart.md#install-terraform).
    1. [Get the authentication credentials](../../tutorials/infrastructure-management/terraform-quickstart.md#get-credentials). You can add them to environment variables or specify them later in the provider configuration file.
    1. [Configure and initialize a provider](../../tutorials/infrastructure-management/terraform-quickstart.md#configure-provider). There is no need to create a provider configuration file manually, you can [download it](https://github.com/yandex-cloud-examples/yc-terraform-provider-settings/blob/main/provider.tf).
    1. Place the configuration file in a separate working directory and [specify the parameter values](../../tutorials/infrastructure-management/terraform-quickstart.md#configure-provider). If you did not add the authentication credentials to environment variables, specify them in the configuration file.
    1. Download the `pxf-named-queries-infrastructure.tf` [file](https://github.com/yandex-cloud-examples/yc-greenplum-pxf-named-queries/blob/main/pxf-named-queries-infrastructure.tf) to the working directory. This file describes:

         * Networks.
         * Subnets.
         * NAT gateways.
         * Security groups.
         * Greenplum® cluster in Yandex MPP Analytics for PostgreSQL.
         * Managed Service for PostgreSQL cluster.

    1. Specify the following in the file:

         * `mgp_password`: Greenplum® password.
         * `mgp_version`: Greenplum® version.
         * `mpg_password`: PostgreSQL database user password.
         * `mpg_version`: PostgreSQL version.

    1. To make sure the configuration files are correct, run this command:

         ```bash
         terraform validate
         ```

       Terraform will show any errors found in your configuration files.

    1. Create an infrastructure:

       1. Run this command to view the planned changes:
       
          ```bash
          terraform plan
          ```
       
          If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
       
       1. If everything looks correct, apply the changes:
          1. Run this command:
       
             ```bash
             terraform apply
             ```
       
          1. Confirm updating the resources.
          1. Wait for the operation to complete.

       All the required resources will be created in the specified folder. You can check resource availability and their settings in the [management console](https://console.yandex.cloud).

{% endlist %}

## Prepare your test data {#prepare-data}

1. [Connect](../../managed-postgresql/operations/connect/clients.md#bash) to the PostgreSQL database.

1. Create a table named `customers` and populate it with test data:

   ```sql
   CREATE TABLE customers(id int, name text, city text);
   INSERT INTO customers VALUES (111, 'Benjamin', 'Chicago');
   INSERT INTO customers VALUES (222, 'Mary', 'New York');
   INSERT INTO customers VALUES (333, 'Paul', 'Boston');
   INSERT INTO customers VALUES (444, 'Peter', 'Denver');
   INSERT INTO customers VALUES (555, 'Natalie', 'Austin');
   ```

1. Check the result:

   ```sql
   SELECT * FROM customers;
   ```

   ```text
   id  |  name   |    city
   ----+---------+-------------
   111 | Benjamin   | Chicago
   222 | Mary   | New York
   333 | Paul   | Boston
   444 | Peter    | Denver
   555 | Natalie | Austin
   ```

1. Create a table named `orders` and populate it with test data:

   ```sql
   CREATE TABLE orders(customer_id int, amount int, year int);
   INSERT INTO orders VALUES (111, 12, 2018);
   INSERT INTO orders VALUES (222, 234, 2019);
   INSERT INTO orders VALUES (333, 34, 2018);
   INSERT INTO orders VALUES (444, 456, 2019);
   INSERT INTO orders VALUES (555, 56, 2021);
   ```

1. Check the result:

   ```sql
   SELECT * FROM orders;
   ```

   ```text
   customer_id | amount | year
   ------------+--------+------
           111 |     12 | 2018
           222 |    234 | 2019
           333 |     34 | 2018
           444 |    456 | 2019
           555 |     56 | 2021
   ```

## Create an external data source {#create-jdbc-source}

In the Greenplum® cluster, [create an external data source](../operations/pxf/create-jdbc-source.md) with the following settings:

   * **Name**: `pgserver`.
   * **Driver**: `org.postgresql.Driver`.
   * **Url**: `jdbc:postgresql://c-<PostgreSQL_cluster_ID>.rw.mdb.yandexcloud.net:6432/<DB_name_in_PostgreSQL_cluster>`.
   * **User**: `<PostgreSQL_username>`.

   You can [get](../../managed-postgresql/operations/cluster-list.md#list-clusters) the cluster ID with the list of clusters in the folder.

## Create a named query {#create-named-query}

To create a named query, add it to the `mdb_toolkit.pxf_named_queries` table:

1. [Connect](../operations/connect/index.md) to Greenplum®.

1. Run this request:

   ```sql
   INSERT INTO mdb_toolkit.pxf_named_queries (pxf_profile, name, query) VALUES (
      'pgserver',
      'my_query',
      'SELECT c.name, o.year
      FROM customers c
      JOIN orders o ON c.id = o.customer_id;'
   );
   ```

   Where:

      * `pgserver`: Data source name.
      * `my_query`: Named query name.

   As an example of a named query, we will use one that joins the `customers` and `orders` tables by the `id = customer_id` field and returns the customer name (`c.name`) along with the year of their orders (`o.year`).

1. Check the result:

   ```sql
   SELECT * FROM mdb_toolkit.pxf_named_queries;
   ```

   ```text
   pxf_profile |    name     |              query
   ------------+-------------+--------------------------------------------
   pgserver    | my_query    | SELECT c.name, o.year                       
               |             |       FROM customers c                      
               |             |       JOIN orders o ON c.id = o.customer_id;
   ```

   The `query` column should contain the text of the named query.

## Create an external table and get data from it {#create-ext-table}

1. [Connect](../operations/connect/index.md) to Greenplum®.

1. Create an external table named `pxf_named_query`. It will reference the data the named query retrieves from the `customers` and `orders` tables in the PostgreSQL DB:

   ```sql
   CREATE READABLE EXTERNAL TABLE pxf_named_query(name text, year int)
   LOCATION ('pxf://query:my_query?PROFILE=JDBC&SERVER=pgserver')
   FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
   ```

   Learn more about the SQL syntax for creating external tables [here](../operations/pxf/create-table.md#sql-statement).


1. Get the data:

   ```sql
   SELECT * FROM pxf_named_query;
   ```

   Result:

   ```text
     name  | year
   --------+------
   Benjamin   | 2018
   Mary   | 2019
   Paul   | 2018
   Peter    | 2019
   Natalie | 2021
   ```

   {% note tip %}

   If the query fails and returns an error, wait a few minutes and try again. The changes may not have been applied yet.

   {% endnote %}

## Delete the resources you created {#clear-out}

Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them:

{% list tabs group=instructions %}

- Manually {#manual}

   1. [Delete](../operations/cluster-delete.md) the Greenplum® cluster.
   1. [Delete](../../managed-postgresql/operations/cluster-delete.md) the Managed Service for PostgreSQL cluster.
   1. [Delete](../../vpc/operations/delete-nat-gateway.md) the NAT gateways.

- Terraform {#tf}

   1. In the terminal window, go to the directory containing the infrastructure plan.
   
       {% note warning %}
   
       Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
   
       {% endnote %}
   
   1. Delete resources:
   
       1. Run this command:
   
           ```bash
           terraform destroy
           ```
   
       1. Confirm deleting the resources and wait for the operation to complete.
   
       All the resources described in the Terraform manifests will be deleted.

{% endlist %}

_Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries._