Skip to content

ndenev/k8sql

Repository files navigation

k8sql

Query Kubernetes clusters using SQL or PRQL. Powered by Apache DataFusion.

Features

  • Dual query languages — Write SQL or PRQL, auto-detected
  • Intuitive JSON paths — Use status.phase instead of status->>'phase'
  • Multi-cluster queries — Query across clusters with _cluster = '*'
  • Smart optimization — Filters pushed to K8s API (namespaces, labels, field selectors)
  • Multiple interfaces — Interactive REPL, batch mode, PostgreSQL wire protocol
  • Full CRD support — Automatically discovers Custom Resource Definitions

Installation

Quick Install (Linux/macOS)

curl -sSfL https://raw.githubusercontent.com/ndenev/k8sql/master/install.sh | sh

The script automatically detects your OS and architecture, downloads the latest release, and installs to /usr/local/bin or ~/.local/bin.

From Cargo

cargo install k8sql

Manual Download

Download pre-built binaries for Linux, macOS, and Windows from GitHub Releases.

Quick Start

# Interactive REPL
k8sql

# SQL query
k8sql -q "SELECT name, namespace, status.phase FROM pods"

# PRQL query (auto-detected)
k8sql -q "from pods | filter status.phase == 'Running' | select {name, namespace}"

# Multi-cluster query
k8sql -c "prod-*" -q "SELECT _cluster, name FROM pods WHERE status.phase = 'Failed'"

# Output as JSON (or YAML or CSV)
k8sql -q "SELECT * FROM deployments" -o json

SQL vs PRQL

k8sql supports both SQL and PRQL — just write your query and it's auto-detected:

Task SQL PRQL
Basic query SELECT name FROM pods LIMIT 5 from pods | select {name} | take 5
Filter WHERE namespace = 'default' filter namespace == "default"
Sort descending ORDER BY created DESC sort {-created}
Aggregate COUNT(*) ... GROUP BY namespace group namespace (aggregate {count this})
JSON field status.phase or status->>'phase' status.phase
Regex match WHERE name ~ 'nginx.*' filter name ~= "nginx.*"
Null coalesce COALESCE(namespace, 'default') namespace ?? "default"

CLI Options

-q, --query <SQL>       Execute a SQL query directly
-c, --context <CTX>     Kubernetes context(s): name, comma-separated, or glob pattern
-o, --output <FMT>      Output format: table, json, csv, yaml (default: table)
-f, --file <PATH>       Execute queries from a file
    --no-headers        Omit column headers in output
    --refresh-crds      Force refresh of cached CRD schemas
-v, --verbose           Enable verbose logging

Subcommands

k8sql interactive       # Start REPL (default)
k8sql daemon -p 15432   # PostgreSQL wire protocol server

SQL Extensions

The _cluster Column

Every table includes _cluster as the first column, enabling cross-cluster queries:

SELECT * FROM pods WHERE _cluster = 'prod'           -- Specific cluster
SELECT * FROM pods WHERE _cluster IN ('prod', 'dev') -- Multiple clusters
SELECT * FROM pods WHERE _cluster = '*'              -- All clusters (parallel)

Intuitive JSON Access

Intuitive JSON path access vs the more verbose -> operators.

-- Intuitive syntax (auto-converted)
SELECT name, status.phase FROM pods
SELECT name, spec.containers[0].image FROM pods
SELECT * FROM pods WHERE labels.app = 'nginx'

-- Array expansion (one row per container)
SELECT name, spec.containers[].image FROM pods

This works in both SQL and PRQL - no special syntax needed.

Supported patterns:

Syntax Meaning Converted To
status.phase Field access status->>'phase'
spec.selector.app Nested fields spec->'selector'->>'app'
spec.containers[0] Array index spec->'containers'->0
spec.containers[] Array expansion UNNEST(json_get_array(...))

Advanced: PostgreSQL arrow operators (also supported):

SELECT name, status->>'phase' FROM pods
SELECT name, status->'nodeInfo'->>'kubeletVersion' FROM nodes

JSON functions for complex access:

SELECT json_get_str(spec, 'containers', 0, 'image') FROM pods
SELECT json_get_int(spec, 'replicas') FROM deployments

Available JSON functions: json_get_str, json_get_int, json_get_float, json_get_bool, json_get_json, json_get_array, json_length, json_keys.

Why PRQL?

PRQL (Pipelined Relational Query Language) offers a more readable alternative to SQL:

  • Readable pipelines — Data flows left-to-right: from pods | filter ... | select ...
  • Less repetition — No need to repeat columns in GROUP BY
  • Composable — Build complex queries incrementally
  • Same JSON paths — Use status.phase just like in SQL

PRQL-specific operators:

Operator Meaning Example
~= Regex match filter name ~= "nginx.*"
?? Null coalesce derive ns = namespace ?? "default"
f"..." String interpolation derive full = f"{namespace}/{name}"
-col Sort descending sort {-created}

PRQL is auto-detected when queries start with from, let, or prql.

Query Optimization

k8sql pushes predicates to the Kubernetes API when possible:

Predicate Optimization
namespace = 'x' Namespaced API call
labels->>'app' = 'nginx' Label selector (server-side)
_cluster = 'prod' Only queries that cluster
_cluster = '*' Parallel queries to all clusters

Use -v to see which filters are pushed down.

Table Schema

All resources share a consistent schema:

Column Type Description
_cluster text Kubernetes context name
api_version text API version (v1, apps/v1, etc.)
kind text Resource kind
name text Resource name
namespace text Namespace (null for cluster-scoped)
uid text Unique identifier
created timestamp Creation timestamp
labels json Labels (access with labels->>'key')
annotations json Annotations
spec json Resource specification
status json Resource status

Special cases: ConfigMaps/Secrets have data instead of spec/status. Events have dedicated columns, etc. CRDs have their top level fields discovered.

REPL Commands

SHOW TABLES          List available tables
SHOW DATABASES       List kubectl contexts (* = active)
DESCRIBE <table>     Show table schema
USE <cluster>        Switch context(s) - supports globs: USE prod-*
\dt, \l, \d, \x, \q  Shortcuts

Context selection persists to ~/.k8sql/config.json and restores on next startup.

Daemon Mode

Run as a PostgreSQL-compatible server:

k8sql daemon --port 15432 --context "prod-*"
psql -h localhost -p 15432

Caching

k8sql caches CRD schemas for fast startup:

  • CRD schemas: Cached indefinitely (use --refresh-crds to force refresh)
  • Cluster CRD list: Checked hourly for new/removed CRDs
  • Cache location: ~/.k8sql/cache/

Example Queries

SQL

-- Pods with status (using intuitive dot notation)
SELECT name, namespace, status.phase FROM pods

-- Container images from pods
SELECT name, spec.containers[0].image FROM pods WHERE namespace = 'default'

-- Unhealthy deployments
SELECT name, spec.replicas as desired, status.readyReplicas as ready
FROM deployments
WHERE status.readyReplicas < spec.replicas

-- Cross-cluster pod count
SELECT _cluster, COUNT(*) FROM pods WHERE _cluster = '*' GROUP BY _cluster

-- Find pods by label (server-side filtering)
SELECT name FROM pods WHERE labels.app = 'nginx' AND namespace = 'default'

-- All container images (array expansion)
SELECT name, spec.containers[].image as container_image FROM pods

PRQL

# Pods with status
from pods | select {name, namespace, phase = status.phase}

# Filter by phase
from pods
filter status.phase == "Running"
select {name, namespace}
take 10

# Unhealthy deployments
from deployments
filter status.readyReplicas < spec.replicas
select {name, desired = spec.replicas, ready = status.readyReplicas}

# Pods by label
from pods
filter labels.app == "nginx" && namespace == "default"
select {name, namespace}

# Aggregation - pod count by namespace
from pods
group namespace (aggregate {count = count this})
sort {-count}

Built With

License

BSD-3-Clause

About

A tool to query Kubernetes clusters using SQL

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •