Feat: Add GKE Postgres Benchmarking Support (Sysbench & CNPG HA) #6465

Draft
manojcns wants to merge 1 commit into GoogleCloudPlatform:master from manojcns:postgres-gke

Conversation


manojcns commented Feb 12, 2026

This PR introduces benchmarking capabilities for PostgreSQL on Google Kubernetes Engine (GKE), adding support for both standalone and High Availability (HA) topologies. It specifically targets newer infrastructure options like C4 machine types and Hyperdisk Balanced storage.

Summary

The contribution adds two new benchmarks:

  1. postgres_sysbench_gke: A standalone benchmark deploying PostgreSQL as a StatefulSet with a Sysbench client pod.
  2. postgres_cnpg_benchmark: An HA benchmark leveraging the CloudNativePG (CNPG) operator to deploy a Primary-Replica cluster.

These benchmarks allow for rigorous performance testing of GKE storage and networking, including support for advanced configurations like HugePages, Host Networking, and PostgreSQL parameter tuning.

Key Features

  • New Standalone Benchmark (postgres_sysbench_gke):

    • Pod-Integrated Sysbench: Runs the load generator within the cluster (Client Pod) to test internal network performance without VM hops.
    • Optimization Profiles: Includes predefined profiles (e.g., v6, v1+v6+v4) that automatically tune shared_buffers, effective_io_concurrency, and kernel limits.
    • Private Networking: Traffic is routed exclusively via private Pod IPs for security and realism.
  • New HA Benchmark (postgres_cnpg_benchmark):

    • Operator Lifecycle: Automates the installation of CloudNativePG and the provisioning of a 3-node HA cluster (1 Primary, 2 Replicas).
    • Host Networking Support: Bypasses the Kubernetes CNI overlay for maximum network throughput via the hostnetwork profile.
  • Infrastructure Intelligence:

    • Auto-Disk Selection: Automatically maps machine families to their best disk types (e.g., C4 -> hyperdisk-balanced, N2 -> pd-ssd).
    • HugePages Integration: Dynamically configures GKE node system configs to reserve HugePages and updates Postgres to use them.
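The auto-disk selection described above can be sketched as a simple lookup from a machine type's family prefix to a default disk type. This is an illustrative sketch based on the examples in this description (C4 -> hyperdisk-balanced, N2 -> pd-ssd); the function name, table, and fallback are assumptions, not the PR's actual code.

```python
# Illustrative mapping of GCE machine families to default disk types.
# The entries mirror the examples given in the PR description; the
# fallback to pd-ssd is an assumption for this sketch.
_FAMILY_TO_DISK = {
    "c4": "hyperdisk-balanced",
    "n2": "pd-ssd",
}


def default_disk_type(machine_type: str) -> str:
    """Return a default disk type for a machine type like 'c4-standard-16'."""
    family = machine_type.split("-", 1)[0].lower()
    return _FAMILY_TO_DISK.get(family, "pd-ssd")
```

A caller would consult this only when no explicit `--postgres_gke_disk_type` flag is given, so user overrides always win.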

Code Structure & Implementation Details

-- New Files (Benchmarks & Logic)

  • [perfkitbenchmarker/linux_benchmarks/postgres_sysbench_gke_benchmark.py]: Standalone Benchmark Logic. Implements optimization profiles (v1-v6), HugePages, and Host Networking for single-node Postgres.
  • [perfkitbenchmarker/linux_benchmarks/postgres_cnpg_benchmark.py]: HA Benchmark Logic. Orchestrates a CloudNativePG (CNPG) cluster (Primary + 2 Replicas) and manages the Sysbench client.
  • [perfkitbenchmarker/resources/kubernetes/__init__.py]: (New) Ensures proper package initialization for custom Kubernetes resources.
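One way to picture the optimization profiles (v1-v6) mentioned above is as named bundles of PostgreSQL settings that get rendered into server flags. The sketch below is purely illustrative: the profile names follow the description, but the specific setting values are placeholders, not the tuned values shipped in this PR.

```python
# Hypothetical shape of an optimization profile: a named bundle of
# PostgreSQL settings. Values here are placeholders for illustration.
OPTIMIZATION_PROFILES = {
    "baseline": {},
    "v6": {
        "shared_buffers": "8GB",
        "effective_io_concurrency": "256",
    },
}


def render_postgres_args(profile: str) -> list:
    """Render '-c key=value' server flags for the chosen profile."""
    settings = OPTIMIZATION_PROFILES[profile]
    return ["-c {}={}".format(k, v) for k, v in sorted(settings.items())]
```

Rendering profiles into `-c` flags keeps the StatefulSet template generic: the Jinja template only interpolates a flag list, and composite profiles like v1+v6+v4 become dictionary merges.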

-- New Files (Kubernetes Templates & Manifests)

  • [perfkitbenchmarker/data/container/postgres_sysbench/postgres_all.yaml.j2]: Main StatefulSet & Service template for the standalone Postgres database.
  • [perfkitbenchmarker/data/container/postgres_sysbench/client_pod.yaml.j2]: Template for the Sysbench oltp_read_write load generator pod.
  • [perfkitbenchmarker/data/container/postgres_sysbench/hugepages-daemonset.yaml.j2]: DaemonSet that enables a large HugePages reservation on GKE nodes (Optimization Profile v4).
  • [perfkitbenchmarker/data/container/postgres_sysbench/hugepages-node-config.yaml]: Supporting config for the HugePages daemonset.
  • [perfkitbenchmarker/data/container/postgres_cnpg/postgres_cluster.yaml.j2]: CNPG Cluster definition (HA Topology) for the postgres_cnpg benchmark.
  • [perfkitbenchmarker/data/container/postgres_cnpg/hyperdisk_storageclass.yaml]: Custom StorageClass definition for Hyperdisk Balanced (C4 Support).
  • [perfkitbenchmarker/data/container/postgres_cnpg/pd_ssd_storageclass.yaml]: Custom StorageClass definition for standard PD-SSD (N2 Support).
  • [perfkitbenchmarker/data/container/postgres_cnpg/kyverno_policy.yaml]: Policy manifest for enforcing cluster compliance during HA runs.

-- New Files (Documentation)

  • [docs/GKE_PostgreSQL_Quickstart_generic.MD]: Comprehensive Quickstart guide with example commands for Baseline, Optimized, and HA runs.
  • [docs/Technical_Architecture_PostgreSQL_PKB.md]: Deep-dive architecture document explaining private networking, optimization profiles (v6), and HA design.

-- Modified Files

  • [perfkitbenchmarker/resources/container_service/kubernetes_cluster.py]:
    * Restored: Added back ApplyManifest and WaitForResource methods to KubernetesCluster for backward compatibility.
    * Robustness: Increased PVC deletion timeout to 15 minutes (was 5m) to fix teardown flakes with large disks.

  • [perfkitbenchmarker/providers/gcp/google_kubernetes_engine.py]: Minor compatibility updates to support the restored container service methods.

  • [perfkitbenchmarker/linux_packages/sysbench.py]: (Minor) Updates to Sysbench package installation logic for compatibility with newer Debian/Ubuntu images.
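The restored WaitForResource method and the longer PVC deletion timeout both amount to "poll until a condition holds or a deadline passes." A minimal, generic sketch of that pattern is below; the function name and signature are illustrative, and the PR's actual method may instead delegate to `kubectl wait` rather than polling in Python.

```python
import time


def wait_for(condition, timeout_s, poll_interval_s=1.0):
    """Poll condition() until it returns True or timeout_s elapses.

    Illustrative stand-in for a WaitForResource-style helper. Returns
    True if the condition was met, False on timeout; callers decide
    whether a timeout is a hard failure or a teardown flake.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval_s)
    return False
```

With this shape, the fix described above is a one-line change: passing a 15-minute deadline instead of 5 minutes for PVC deletion, since large Hyperdisk volumes can take longer to detach and delete.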

Example Run Command

python3 pkb.py \
    --benchmarks=postgres_sysbench_gke \
    --cloud=GCP \
    --vm_platform=Kubernetes \
    --zone=us-central1-a \
    --project=$PROJECT_ID \
    --postgres_gke_server_machine_type=c4-standard-16 \
    --postgres_gke_client_machine_type=c4-standard-16 \
    --postgres_gke_client_mode=pod \
    --postgres_gke_disk_type=hyperdisk-balanced \
    --postgres_gke_disk_size=500 \
    --postgres_gke_optimization_profile=v6 \
    --sysbench_tables=10 \
    --sysbench_table_size=4000000 \
    --sysbench_run_threads=512 \
    --sysbench_run_seconds=300 \
    --sysbench_testname=oltp_read_write \
    --metadata=cloud:GCP \
    --metadata=geo:us-central1 \
    --metadata=scenario:postgres_optimized_v6 \
    --metadata=optimization_profile:v6 \
    --temp_dir=./pkb_temp \
    --run_stage_iterations=1 \
    --owner=$(whoami | tr '.' '-') \
    --log_level=error \
    --accept_licenses
