Functions v1.3

This reference section lists the functions available with Analytics Accelerator.

Refer to PGFS functions for Pipelines for details on how to create storage locations for PGAA.

Table functions

pgaa.lakehouse_table_stats()

Returns storage statistics for a specific analytical table, including the size of the latest active snapshot and the total cumulative size of all data versions (historical data and logs) stored in object storage.

Synopsis

SELECT * FROM pgaa.lakehouse_table_stats('table_name'::regclass);

Parameters

Parameter | Type | Description
relation | REGCLASS | The name or OID of the analytical table to investigate.

Returns

Column | Type | Description
latest_snapshot_size | BIGINT | The size in bytes of the latest active snapshot of the table.
total_size | BIGINT | The total size in bytes of all files associated with the table in object storage, including metadata, transaction logs, and historical snapshots.
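
Example

A sketch of comparing the latest snapshot size with the table's total storage footprint; the table name sales.orders is illustrative:

SELECT latest_snapshot_size,
       total_size,
       total_size - latest_snapshot_size AS historical_bytes
FROM pgaa.lakehouse_table_stats('sales.orders'::regclass);

A large gap between total_size and latest_snapshot_size indicates that historical snapshots and logs are accumulating in object storage.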

Catalog functions

pgaa.add_catalog()

Registers a new Iceberg catalog with PGAA. This function performs an automated connection check to validate credentials and accessibility before the catalog is registered in the system.

Synopsis

SELECT pgaa.add_catalog('catalog_name', 'catalog_type', 'catalog_options'::json);

Parameters

Parameter | Type | Description
catalog_name | VARCHAR | A unique name for the catalog within PGAA.
catalog_type | pgaa.catalog_type | The catalog type. Supported values are iceberg-rest (Iceberg REST catalog) and iceberg-s3tables (AWS S3 Tables).
catalog_options | JSON | A JSON object containing the connection and authentication parameters.

Depending on which storage you use, your JSON file for the catalog_options must specify different options.

  • For REST catalogs

    {
    "url": "https://your-catalog-endpoint.com",
    "warehouse_name": "your_warehouse_name",
    "warehouse": "your_warehouse_id",
    "token": "your_secret_auth_token",
    "danger_accept_invalid_certs": "false"
    }

    Where:

    • url: The base HTTP(S) endpoint of the REST catalog service.
    • warehouse_name: A human-readable alternative to the warehouse ID, supported by some REST providers for easier configuration.
    • warehouse: The unique identifier for the specific warehouse within the catalog service.
    • token: The secret token used to authenticate with the REST catalog service.
    • danger_accept_invalid_certs: If set to true, Postgres skips SSL certificate validation. Use this only for internal testing or with self-signed certificates; never use it for sensitive public connections.
  • For AWS S3 Tables:

    {
    "arn": "arn:aws:s3tables:us-east-1:1234567890:bucket/my-bucket",
    "region": "us-east-1"
    }

    Where:

    • arn: Specifies the Amazon Resource Name (ARN), the unique identifier for your S3 Table bucket.
    • region: Specifies the physical AWS data center location where your S3 Table bucket resides.

Returns

Returns the name of the catalog on success.
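
Example

A sketch of registering an Iceberg REST catalog; all names and option values are placeholders to replace with your own:

SELECT pgaa.add_catalog(
    'my_rest_catalog',
    'iceberg-rest',
    '{
        "url": "https://your-catalog-endpoint.com",
        "warehouse": "your_warehouse_id",
        "token": "your_secret_auth_token"
    }'::json
);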

pgaa.update_catalog()

Updates the configuration options (the JSON object) for an existing Iceberg catalog. Like pgaa.add_catalog(), this function performs a validation check to ensure the new connection parameters are functional before applying the changes.

Synopsis

SELECT * FROM pgaa.update_catalog('catalog_name', 'new_options'::json);

Parameters

Parameter | Type | Description
catalog_name | VARCHAR | The name of the existing catalog to update.
new_options | JSON | A JSON object containing the updated connection and authentication parameters.

See pgaa.add_catalog() for a detailed breakdown of the required JSON fields for each catalog type.

Returns

Returns the name of the catalog upon successful update.
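
Example

A sketch of rotating the authentication token for an existing REST catalog; all values are placeholders:

SELECT * FROM pgaa.update_catalog(
    'my_rest_catalog',
    '{
        "url": "https://your-catalog-endpoint.com",
        "warehouse": "your_warehouse_id",
        "token": "your_new_auth_token"
    }'::json
);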

pgaa.delete_catalog()

Removes a registered catalog from the database.

Synopsis

SELECT * FROM pgaa.delete_catalog('catalog_name');

Parameters

Parameter | Type | Description
catalog_name | VARCHAR | The name of the catalog to be removed.

Returns

Returns the name of the deleted catalog upon successful completion.

pgaa.list_catalogs()

Returns a list of all registered catalogs in the system, including their connection configuration, metadata synchronization timestamps, and current operational status.

Synopsis

SELECT * FROM pgaa.list_catalogs();

Parameters

None.

Returns

Column | Type | Description
name | TEXT | The name of the catalog.
type | pgaa.catalog_type | The catalog type. Supported values are iceberg-rest (Iceberg REST catalog) and iceberg-s3tables (AWS S3 Tables).
options | JSON | The connection parameters (URL, ARN, etc.) used for this catalog.
status | pgaa.catalog_status | The current health of the catalog (detached, attached, refresh_retry, or refresh_failed).
created_at | TIMESTAMPTZ | The timestamp when the catalog was first registered.
last_refreshed_at | TIMESTAMPTZ | The last time PGAA successfully synced metadata from this catalog.
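
Example

To spot catalogs that need attention, you can filter on the status column; this sketch lists every catalog that is not currently attached:

SELECT name, type, status, last_refreshed_at
FROM pgaa.list_catalogs()
WHERE status <> 'attached';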

pgaa.import_catalog()

Performs a one-time scan and import of table definitions from a registered Iceberg catalog into Postgres. This function creates the local metadata required for PGAA to query the remote tables. This is a manual, one-off import and does not enable automatic, continuous synchronization.

Synopsis

SELECT pgaa.import_catalog('catalog_name');

Parameters

Parameter | Type | Description
catalog_name | VARCHAR | The name of the previously registered catalog to import from.

Returns

None.

pgaa.attach_catalog()

Enables continuous metadata synchronization for a previously registered Iceberg catalog (using pgaa.add_catalog()). Once attached, PGAA automatically monitors the remote catalog for changes and updates the local Postgres metadata accordingly. See Catalog synchronization for polling rate configuration.

Synopsis

SELECT pgaa.attach_catalog('catalog_name');    

Parameters

Parameter | Type | Description
catalog_name | VARCHAR | The name of the registered catalog to start synchronizing.

Returns

None.
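
Example

A sketch of enabling continuous synchronization for a catalog registered earlier, then confirming the state change; the catalog name is illustrative:

SELECT pgaa.attach_catalog('my_rest_catalog');

SELECT name, status, last_refreshed_at
FROM pgaa.list_catalogs()
WHERE name = 'my_rest_catalog';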

pgaa.detach_catalog()

Stops continuous metadata synchronization for a registered Iceberg catalog and moves it to a detached state.

Synopsis

SELECT * FROM pgaa.detach_catalog('catalog_name');

Parameters

Parameter | Type | Description
catalog_name | VARCHAR | The name of the registered catalog to detach.

Returns

The function returns the row from the pgaa.catalog system table for the catalog being detached.

Column | Type | Description
name | TEXT | The name of the detached catalog.
type | pgaa.catalog_type | The catalog type.
status | pgaa.catalog_status | The new status, which will be detached.

pgaa.test_catalog()

Tests the connectivity and configuration of a registered Iceberg catalog. This function verifies that the Postgres instance can communicate with the remote catalog endpoint and, optionally, validates that the provided credentials have write permissions.

Synopsis

SELECT pgaa.test_catalog('catalog_name', test_writes:=true);

Parameters

Parameter | Type | Description
name | TEXT | The name of the registered catalog to test.
test_writes | BOOLEAN | If true, the function attempts a write operation to the catalog metadata service to verify permissions. If false, only read permissions are tested.

Returns

Returns NULL if the test is successful. Returns a descriptive error message if the test fails.
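
Example

A sketch of a read-only connectivity check followed by a check that also verifies write permissions; the catalog name is illustrative:

SELECT pgaa.test_catalog('my_rest_catalog', test_writes := false);

SELECT pgaa.test_catalog('my_rest_catalog', test_writes := true);

A NULL return value means the corresponding test passed.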

Spark functions

pgaa.spark_sql()

Executes a Spark SQL query directly on your Postgres cluster via the configured Spark Connect endpoint. This allows you to run Iceberg compaction routines or Spark functions that aren't available in Postgres.

See Spark procedures for a list of the available procedures.

To run this function, you must set the configuration parameter pgaa.spark_connect_url to point to an available Spark Connect service.

Synopsis

For a single catalog:

SELECT pgaa.spark_sql('query', 'catalog_name');

For multiple catalogs:

SELECT pgaa.spark_sql('query', ARRAY['catalog1', 'catalog2']);

Parameters

Parameter | Type | Description
query | TEXT | The Spark SQL statement to execute.
catalog | TEXT or TEXT[] | A single catalog name, or an array of catalog names to use for the query.

Returns

The result set of the Spark query formatted as a JSON object.

Example

Reduce metadata overhead via the rewrite_data_files Spark task:

SELECT pgaa.spark_sql($$  
            CALL preexisting.system.rewrite_data_files(  
              table => '"preexisting"."ns-1"."table-1"',  
              strategy => 'sort',  
              sort_order => 'value DESC',  
              options => map('rewrite-all', 'true')  
            )  
        $$);  

Background task functions

pgaa.launch_task()

Schedules a background maintenance task for an analytical Delta table.

Synopsis

SELECT pgaa.launch_task(
    'table_name'::regclass, 
    'task_type', 
    'task_options'::jsonb, 
    'scheduled_at'::timestamp
);

Parameters

Parameter | Type | Description
table_name | REGCLASS | The name or OID of the analytical table to run the task on.
task_type | TEXT | The maintenance operation: compaction, zorder, vacuum, or purge.
task_options | JSONB | Configuration specific to the task type (see below).
scheduled_at | TIMESTAMP | If provided, the task will wait until this time to execute. Default is NULL.

The values for the JSONB task_options depend on the task_type. All options are optional, except that zorder requires columns and purge requires both storage_location and path.

  • compaction: Merges small files into larger ones to speed up analytical scans. The available task_options are:

    {
    "target_size": 536870912,
    "preserve_insertion_order": true,
    "max_concurrent_tasks": 10,
    "max_spill_size": 2147483648,
    "min_commit_interval": 60
    }

    Where:

    • target_size: Specifies the size of the output files in bytes.
    • preserve_insertion_order: Whether to maintain the existing sort order of rows.
    • max_concurrent_tasks: Limits the number of parallel tasks the executor can run.
    • max_spill_size: Sets the maximum data size in bytes allowed to spill to disk during the process.
    • min_commit_interval: Sets the minimum wait time in seconds between committing updates to the Delta log.
  • zorder: A clustering technique that reorganizes data across multiple columns to improve "data skipping" for queries with filters on those columns. The available task_options are:

    {
    "columns": ["customer_id", "transaction_date"],        
    "target_size": 1073741824,
    "preserve_insertion_order": false,
    "max_concurrent_tasks": 4,
    "max_spill_size": 2147483648,
    "min_commit_interval": 30
    }

    Where:

    • columns: (Required) An array of strings representing the columns to be used for Z-ordering.
    • target_size: Specifies the size of the output files in bytes.
    • preserve_insertion_order: Whether to maintain the existing sort order of rows.
    • max_concurrent_tasks: Limits the number of parallel tasks the executor can run.
    • max_spill_size: Sets the maximum data size in bytes allowed to spill to disk during the process.
    • min_commit_interval: Sets the minimum wait time in seconds between committing updates to the Delta log.
  • vacuum: Deletes old data files that are no longer referenced by the Delta transaction logs, freeing up space in object storage. The available task_options are:

    {
    "retention_period": "168 hours",
    "dry_run": false,
    "enforce_retention_duration": true
    }

    Where:

    • retention_period: Defines the age at which unreferenced files become eligible for deletion.
    • dry_run: If true, calculates and logs the files that would be deleted, but performs no deletions.
    • enforce_retention_duration: If true, the task validates the retention_period against the system's global minimum safety limit.
  • purge: Explicitly removes data from a specific storage path. The available task_options are:

    {
    "storage_location": "s3_main",
    "path": "archive/2023/temp/"
    }

    Where:

    • storage_location: (Required) The name of the storage location.
    • path: (Required) The relative directory path or file prefix within the storage location that should be permanently deleted.

Returns

A unique task ID for the task.

You can check the task status by querying the pgaa.background_task table, filtering on the provided task ID.

Examples

  • Delete old data files prior to the last 7 days:

    SELECT pgaa.launch_task(
        'sales.transactions', 
        'vacuum', 
        '{"retention_period": "7 days", "dry_run": false}'::jsonb
    );
  • Perform compaction:

    SELECT pgaa.launch_task(
        'telemetry.logs', 
        'compaction', 
        '{
            "target_size": 536870912, 
            "max_concurrent_tasks": 2
        }'::jsonb
    );
  • Reorganize data rows by clustering on columns region and customer_id:

    SELECT pgaa.launch_task(
        'crm.customers', 
        'zorder', 
        '{
            "columns": ["region", "customer_id"], 
            "target_size": 1073741824
        }'::jsonb
    );
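  • Check the status of a previously launched task using the ID returned by pgaa.launch_task() (this sketch assumes pgaa.background_task exposes an id column; verify the column names in your installation):

    SELECT *
    FROM pgaa.background_task
    WHERE id = 'returned-task-id';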

Postgres Distributed (PGD) integration functions

pgaa.convert_to_analytics()

Converts a PGD-replicated table (HTAP) to the PGAA access method. This function points the table directly to the analytics data residing in object storage (Iceberg or Delta format) that has been synchronized via replication.

Synopsis

CALL pgaa.convert_to_analytics('relation_name'::regclass);

Parameters

Parameter | Type | Description
relation | REGCLASS | The name or OID of the PGD-replicated table (HTAP) to convert to analytics.

Returns

None.
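
Example

A sketch of converting a PGD-replicated table to the analytics access method; the table name is illustrative:

CALL pgaa.convert_to_analytics('public.sensor_readings'::regclass);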

Note

To convert a heap table to a PGAA analytics table, you can also run an ALTER TABLE command with SET ACCESS METHOD pgaa. See Table options for details.

pgaa.convert_to_tiered_table()

Converts a heap table into a PGD-managed tiered table with automatic partitioning and analytics offloading. This function automates the creation of a partitioned staging table, migrates existing data, and configures the lifecycle policies for both hot and cold tiers. See Implementing tiered tables for details.

Synopsis

CALL pgaa.convert_to_tiered_table(
    relation := 'relation_name'::regclass,
    range_partition_column := 'column_name',
    partition_increment := 'interval_string',
    analytics_offload_period := 'interval',
    initial_lower_bound := 'start_date_string',
    retention_period := 'interval' DEFAULT NULL,
    enable_replication := boolean DEFAULT false,
    minimum_advance_partitions := integer DEFAULT 2,
    maximum_advance_partitions := integer DEFAULT 5,
    drop_after_retention_period := boolean DEFAULT true,
    purge_analytics_target := boolean DEFAULT false
);
Note

The column used for range partitioning must be a primary key.

Tables with sequences or foreign key constraints are not supported with tiered tables. You must drop the sequence or constraint before converting to a tiered table.

Parameters

Parameter | Type | Description
relation | REGCLASS | The name or OID of the heap table to convert to a tiered table.
range_partition_column | TEXT | The column used for range partitioning. It must be DATE or TIMESTAMP, and it must also be included in the primary key.
partition_increment | TEXT | The time interval defining the size of each partition, such as '1 month'.
analytics_offload_period | INTERVAL | The threshold age at which a partition is moved from local storage to the cold tier in the data lake.
initial_lower_bound | TEXT | The starting value for the first partition's lower bound.
retention_period | INTERVAL | (Optional) The period after which data is purged from the cold tier. Default is NULL.
enable_replication | BOOLEAN | (Optional) If true, enables real-time replication to object storage for active partitions (as an HTAP table). Default is false.
minimum_advance_partitions | INTEGER | (Optional) Minimum number of future partitions to maintain. Default is 2.
maximum_advance_partitions | INTEGER | (Optional) Maximum number of future partitions to maintain. Default is 5.
drop_after_retention_period | BOOLEAN | (Optional) Whether to drop the partition locally after the retention period expires. Default is true.
purge_analytics_target | BOOLEAN | (Optional) If true, any pre-existing tables in the object store directory or catalog entry are deleted. If false and pre-existing tables are detected, the command fails with an error. Supported in PGD 6.3+. Default is false.

Returns

None.

Example

Converts a heap table into a tiered structure where data is partitioned by month, replicated to the cloud for real-time analytics, and automatically offloaded to cold storage once it is older than three months.

CALL pgaa.convert_to_tiered_table(
    relation                    := 'public.sensor_readings'::regclass,
    range_partition_column      := 'reading_time',
    partition_increment         := '1 month',
    analytics_offload_period    := '3 months',
    initial_lower_bound         := '2026-01-01',
    retention_period            := '2 years',
    enable_replication          := true,
    minimum_advance_partitions  := 3,
    maximum_advance_partitions  := 6,
    drop_after_retention_period := true,
    purge_analytics_target      := true
);

pgaa.disable_analytics_replication()

Disables PGD replication to object storage for an HTAP table. This function reverts the table from an HTAP state back to a Heap state by stopping the continuous replication stream to object storage. See Replicating to analytics for details.

Synopsis

CALL pgaa.disable_analytics_replication('relation_name'::regclass);

Parameters

Parameter | Type | Description
relation | REGCLASS | The name or OID of the HTAP table for which to disable analytics replication.

Returns

None.

Note

To disable replication for an HTAP table, you can also run an ALTER TABLE command and SET pgd.replicate_to_analytics = FALSE. See Table options for details.

pgaa.enable_analytics_replication()

Enables PGD replication for a heap table to object storage for analytics. This function transitions a standard Heap table to an HTAP table by initiating a continuous replication stream to the analytics engine. See Replicating to analytics for details.

Synopsis

CALL pgaa.enable_analytics_replication(
    relation regclass,
    purge_analytics_target boolean DEFAULT false
);

Parameters

Parameter | Type | Description
relation | REGCLASS | The name or OID of the heap table for which to enable analytics replication.
purge_analytics_target | BOOLEAN | (Optional) If true, purges existing data in the analytics target before starting replication. Requires PGD 6.3 or later. Default is false.

Returns

None.
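
Example

A sketch of enabling replication for a heap table while purging any stale data already in the analytics target; the table name is illustrative, and purge_analytics_target requires PGD 6.3 or later:

CALL pgaa.enable_analytics_replication(
    relation := 'public.sensor_readings'::regclass,
    purge_analytics_target := true
);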

Note

To enable replication for a heap table, you can also run an ALTER TABLE command and SET pgd.replicate_to_analytics = TRUE. See Table options for details.

pgaa.htap_table_stats_uncached()

Returns statistics for HTAP tables in PGD environments.

Synopsis

SELECT * FROM pgaa.htap_table_stats_uncached(
    htap_catalog text,
    htap_catalog_namespace text,
    htap_catalog_table text,
    htap_storage_location text,
    htap_storage_path text,
    htap_format text
);

Parameters

Parameter | Type | Description
htap_catalog | TEXT | The name of the HTAP catalog.
htap_catalog_namespace | TEXT | The namespace/schema within the HTAP catalog.
htap_catalog_table | TEXT | The specific table name in the HTAP catalog.
htap_storage_location | TEXT | The URI or identifier for the object storage location.
htap_storage_path | TEXT | The specific path within the storage location where data resides.
htap_format | TEXT | The data format used in storage.

Returns

Column | Type | Description
latest_snapshot_size | BIGINT | The size of the most recent data snapshot in bytes.
total_size | BIGINT | The total cumulative size of all data associated with the table in bytes.
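
Example

A sketch of fetching uncached statistics for an HTAP table stored in Delta format; every argument value here is a placeholder:

SELECT latest_snapshot_size, total_size
FROM pgaa.htap_table_stats_uncached(
    'my_catalog',
    'analytics',
    'events',
    'my_storage_location',
    'analytics/events',
    'delta'
);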

pgaa.list_tiered_tables()

Returns detailed information about PGD-managed tiered tables, including their partition settings, data retention policies, analytics offloading configurations, and storage size metrics.

Synopsis

SELECT * FROM pgaa.list_tiered_tables();

Parameters

None.

Returns

Column | Type | Description
schema_oid | OID | The internal object identifier for the schema.
table_oid | OID | The internal object identifier for the table.
schema_name | TEXT | Name of the schema containing the tiered table.
table_name | TEXT | Name of the tiered table.
partition_increment | INTERVAL | The interval defined for automatic partitioning.
retention | INTERVAL | The duration for which data is kept before being dropped.
analytics_offload | INTERVAL | The period after which data is offloaded to the analytics engine.
replication_enabled | BOOLEAN | Indicates if replication to analytics is active.
tiered_data_size | BIGINT | Total size of data stored in object storage in bytes.
untiered_data_size | BIGINT | Total size of data remaining in local storage in bytes.
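
Example

A sketch that reports tiered tables whose local (untiered) data still exceeds what has been offloaded to object storage:

SELECT schema_name,
       table_name,
       pg_size_pretty(tiered_data_size)   AS tiered,
       pg_size_pretty(untiered_data_size) AS untiered
FROM pgaa.list_tiered_tables()
WHERE untiered_data_size > tiered_data_size;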

pgaa.restore_from_analytics()

Converts a PGAA analytics table residing in object storage back to the standard PostgreSQL heap access method.

Synopsis

CALL pgaa.restore_from_analytics('relation_name'::regclass);

Parameters

Parameter | Type | Description
relation | REGCLASS | The name or OID of the PGAA analytics table to restore to a heap table.

Returns

None.

Note

To convert a PGAA analytics table to heap, you can also run an ALTER TABLE command and SET ACCESS METHOD heap. See Table options for details.