Postgres Analytics Accelerator (PGAA) v1.7

Postgres Analytics Accelerator (PGAA) is a high-performance extension that enables Postgres to query large-scale data stored in open table and file formats such as Delta Lake, Apache Iceberg, and Parquet. By offloading heavy analytical queries to a vectorized execution engine, PGAA bridges the gap between operational databases and data lakes.

Get started

  • Compatibility: Check supported PostgreSQL versions, operating systems, and other requirements.

  • Architecture: Understand the core architecture and how the vectorized engine works.

  • Core concepts: Understand the fundamental principles of vectorized execution, data lake integration, and DirectScan.

  • Quickstart guide: Install PGAA, create a storage location, and read a table from our sample benchmark datasets.

Using PGAA

Replicating with PGD

  • Implementing tiered tables: Combine PGD AutoPartition and PGAA to create an automated data lifecycle. Move older partitions to object storage while keeping recent data in Postgres tables.

  • Replicating to analytics: Convert standard heap tables into HTAP tables. Use continuous logical replication to maintain a real-time analytical copy of your transactional data in the data lake.

  • Offloading to analytics: Perform surgical storage management by manually moving entire HTAP tables to the cold tier, truncating local data to reclaim disk space immediately.

Performance & optimization

  • Accelerate with Spark: Offload massive datasets and complex distributed joins to a remote Spark cluster via Spark Connect. PGAA offers two integration modes depending on your performance requirements:

    • Standard Spark integration: Leverage a remote Spark cluster for high-concurrency analytical queries and distributed processing.

    • GPU-accelerated Spark: Integrate with the NVIDIA RAPIDS Accelerator for Apache Spark to leverage GPU acceleration.

  • Monitor and maintain your analytical tables: Audit storage utilization, monitor table health, and perform table maintenance tasks for PGAA-managed tables.

  • Optimize query performance: Maximize query speeds by managing DirectScan execution, configuring compute pushdowns, and troubleshooting path fallbacks.

Reference

  • Configuration parameters: The behavior of the PGAA extension is governed by Grand Unified Configuration (GUC) variables. These parameters allow you to switch executors, enable performance optimizations, and manage security credentials.

  • Functions: PGAA introduces a suite of SQL functions for administrative tasks, such as mapping new tables, monitoring storage health, and launching maintenance background jobs.

  • Table options: When mapping or creating analytical tables, specific options allow you to define how data is read from or written to your object store.

  • Data types: PGAA maps native Postgres data types to optimized columnar formats in the data lake.

  • Datasets: Access pre-configured schemas and data loading instructions for analytical datasets to baseline your performance.