
Simian Forge

A command-line tool for generating synthetic data and sending it to Elasticsearch. Supports both host metrics (OpenTelemetry and Elastic Metricbeat formats) and weather station data (FieldSense format).

AI Coding Experiment

Note: This project was developed as an experiment to evaluate AI coding tools, specifically Claude Code. The goal was to create a complete, production-ready tool without writing a single line of code manually, instead relying entirely on AI guidance and code generation.

The experiment successfully demonstrates that AI coding assistants can:

  • Understand complex technical requirements and specifications
  • Generate comprehensive TypeScript applications with proper architecture
  • Implement industry-standard protocols (OpenTelemetry, Elasticsearch APIs)
  • Create realistic data simulation with proper statistical distributions
  • Handle error cases, logging, and production concerns
  • Produce well-documented, maintainable code

This serves as a proof-of-concept for AI-assisted software development workflows and the potential for natural language programming.

Overview

Simian Forge simulates realistic synthetic data for Elasticsearch, supporting four main datasets:

  1. Host Metrics: CPU, memory, network, disk I/O, filesystem, and process statistics in OpenTelemetry and/or Elastic Metricbeat formats
  2. Weather Station Data: Environmental sensors, solar panels, energy consumption, and system metrics in FieldSense format
  3. Unique Metrics: Configurable cardinality testing with unique metric names for system performance evaluation
  4. Histograms: Time series dataset with histogram (t-digest-like + HDR-like) and exponential_histogram fields for distribution testing

This makes it ideal for testing monitoring systems, dashboards, alerting rules, and time series visualizations.

Key Features

  • Multiple Datasets: Host metrics, weather station data, and unique metrics generation
  • Histogram Field Types: Generates histogram and exponential_histogram fields for testing distribution aggregations
  • Format Support: OpenTelemetry, Elastic Metricbeat, and FieldSense formats
  • Cardinality Testing: Configurable unique metric generation for performance testing
  • Realistic Data Generation: Correlated metrics, smooth transitions, and realistic patterns
  • Backfill & Real-time: Historical data backfill with configurable real-time generation
  • Time Series Support: Elasticsearch time series data streams with proper routing
  • Cloud Provider Simulation: Deterministic configurations with AWS, GCP, Azure specifics
  • OpenTelemetry Instrumentation: Full tracing support with configurable OTLP collector
  • Development Tools: Data stream purging for schema changes and fresh starts

Installation

Prerequisites

  • Docker (recommended)
  • OR Node.js 18+ and npm for local development
  • Elasticsearch cluster (optional, included in Docker Compose)
  • OpenTelemetry Collector (optional, included in Docker Compose)
  • Kibana Setup: Before indexing data, go to "Integrations" in Kibana and install the "System" integration to ensure proper index templates and mappings are configured

Docker Setup (Recommended)

The easiest way to get started is with Docker, which provides a complete testing environment:

  1. Clone the repository:
git clone <repository-url>
cd simian-forge
  2. Build the Docker image:
docker build -t simianhacker/simian-forge:latest .
  3. Run with Docker Compose (includes Elasticsearch, Kibana, and OpenTelemetry):
# Start the full stack
docker compose --profile full-stack up -d

# View logs
docker compose logs -f simian-forge

# Stop the stack
docker compose --profile full-stack down

Local Development Setup

  1. Clone the repository:
git clone <repository-url>
cd simian-forge
  2. Install dependencies:
npm install
  3. Build the project:
npm run build

Usage

Docker Usage (Recommended)

Using Docker Compose

The easiest way to get started with a complete testing environment:

# Start with default settings
docker compose --profile full-stack up -d

# Customize with environment variables
INTERVAL=30s COUNT=20 FORMAT=otel docker compose --profile full-stack up -d

# Run one-time data generation
docker compose run --rm simian-forge --purge --backfill now-2h

# Connect to external Elasticsearch with API key
ELASTICSEARCH_URL=https://my-cluster.com:9200 \
ELASTICSEARCH_API_KEY=your-api-key-here \
docker compose --profile full-stack up -d

# Or with username/password
ELASTICSEARCH_URL=https://my-cluster.com:9200 \
ELASTICSEARCH_AUTH=myuser:mypass \
docker compose --profile full-stack up -d

Using Docker Directly

Run the container directly (bring your own Elasticsearch):

# Basic usage with API key
docker run --rm simianhacker/simian-forge:latest \
  --elasticsearch-url http://your-elasticsearch:9200 \
  --elasticsearch-api-key your-api-key-here \
  --count 5 --interval 30s

# Basic usage with username/password
docker run --rm simianhacker/simian-forge:latest \
  --elasticsearch-url http://your-elasticsearch:9200 \
  --elasticsearch-auth elastic:yourpassword \
  --count 5 --interval 30s

# Generate weather data
docker run --rm simianhacker/simian-forge:latest \
  --dataset weather \
  --count 3 \
  --interval 1m \
  --elasticsearch-url http://your-elasticsearch:9200

# Connect to external services with API key
docker run --rm simianhacker/simian-forge:latest \
  --elasticsearch-url https://my-cluster.com:9200 \
  --elasticsearch-api-key your-api-key-here \
  --collector http://my-collector:4318 \
  --format otel

Docker Configuration

Copy .env.example to .env and customize:

cp .env.example .env
# Edit .env file with your preferred settings
docker compose --profile full-stack up -d

Local Usage

Generate metrics with default settings (backfills the last 5 minutes, then continues in real time):

./forge

Command Line Options

simian-forge [options]

Options:
  --interval <value>              Frequency of data generation (e.g., 30s, 5m) (default: "10s")
  --backfill <value>              How far back to backfill data (e.g., now-1h) (default: "now-5m")
  --count <number>                Number of entities to generate (default: "10")
  --dataset <name>                Name of the dataset: hosts, weather, unique-metrics, histograms (default: "hosts")
  --elasticsearch-url <url>       Elasticsearch cluster URL (default: "http://localhost:9200")
  --elasticsearch-auth <auth>     Elasticsearch auth in username:password format (default: "elastic:changeme")
  --elasticsearch-api-key <key>   Elasticsearch API key for authentication (default: "")
  --collector <url>               OpenTelemetry collector HTTP endpoint (default: "http://localhost:4318")
  --format <format>               Output format: otel, elastic, or both (hosts only) (default: "both")
  --purge                         Delete existing data streams for the dataset before starting
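
The --interval and --backfill values follow common duration and Elasticsearch date-math conventions. The README does not document the parser, so here is a minimal TypeScript sketch of how such values could be interpreted (the function names are illustrative, not simian-forge's actual API):

// Hypothetical helpers; the project's actual parsing may differ.
const UNIT_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };

// "30s" -> 30000, "5m" -> 300000
function parseInterval(value: string): number {
  const match = value.match(/^(\d+)([smhd])$/);
  if (!match) throw new Error(`Invalid interval: ${value}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// "now-1h" -> a Date one hour in the past
function parseBackfill(value: string, now = Date.now()): Date {
  const match = value.match(/^now-(\d+)([smhd])$/);
  if (!match) throw new Error(`Invalid backfill: ${value}`);
  return new Date(now - Number(match[1]) * UNIT_MS[match[2]]);
}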

Authentication Options

Simian Forge supports multiple authentication methods for Elasticsearch:

  1. API Key Authentication (Recommended):

    ./forge --elasticsearch-api-key "your-api-key-here"
  2. Username/Password Authentication:

    ./forge --elasticsearch-auth "username:password"
  3. No Authentication (for local development):

    ./forge --elasticsearch-url http://localhost:9200

Note: API key authentication takes precedence over username/password if both are provided. API keys are recommended for production use as they can be easily revoked and have fine-grained permissions.
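
A sketch of how this precedence could map onto the official @elastic/elasticsearch client (the option shapes are the client's real API; the surrounding logic is illustrative):

import { Client, ClientOptions } from '@elastic/elasticsearch';

// API key wins over username:password when both are supplied.
function buildClient(url: string, apiKey?: string, auth?: string): Client {
  const options: ClientOptions = { node: url };
  if (apiKey) {
    options.auth = { apiKey };
  } else if (auth) {
    const [username, password] = auth.split(':');
    options.auth = { username, password };
  }
  return new Client(options);
}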

Example Commands

Host Metrics

Generate only OpenTelemetry format metrics:

./forge --dataset hosts --format otel --interval 30s

Generate metrics with 1-hour backfill:

./forge --dataset hosts --backfill now-1h --interval 2m

Generate metrics for 25 hosts with custom interval:

./forge --dataset hosts --count 25 --interval 30s

Purge existing data and start fresh:

./forge --dataset hosts --purge --format both

Weather Station Data

Generate weather station data with 5 stations:

./forge --dataset weather --count 5 --interval 1m

Generate weather data with 24-hour backfill:

./forge --dataset weather --backfill now-24h --interval 10s

Purge existing weather data and start fresh:

./forge --dataset weather --purge --count 3 --backfill now-12h

Unique Metrics (Cardinality Testing)

Generate 1000 unique metrics for cardinality testing:

./forge --dataset unique-metrics --count 1000 --interval 30s

Test high cardinality with backfill:

./forge --dataset unique-metrics --count 5000 --backfill now-1h --interval 1m

Purge existing cardinality test data and start fresh:

./forge --dataset unique-metrics --purge --count 2500 --interval 15s

Histograms (Histogram + Exponential Histogram Fields)

Generate time series histogram samples (one doc per entity per interval):

./forge --dataset histograms --count 3 --interval 10s

Purge existing histogram data stream and start fresh:

./forge --dataset histograms --purge --count 3 --backfill now-30m --interval 10s

General Options

Connect to remote Elasticsearch with API key authentication:

./forge --elasticsearch-url https://my-cluster.com:9200 --elasticsearch-api-key your-api-key-here

Connect to remote Elasticsearch with username/password authentication:

./forge --elasticsearch-url https://my-cluster.com:9200 --elasticsearch-auth myuser:mypass

Generate with custom OpenTelemetry collector:

./forge --collector http://otel-collector:4318

Data Formats

Unique Metrics (Cardinality Testing)

Generates configurable numbers of unique metrics for testing system cardinality limits:

  • Configurable Count: --count parameter controls the exact number of unique metrics generated
  • Guaranteed Uniqueness: Each metric name includes a counter suffix (e.g., system.usage.total.1, system.usage.total.2)
  • Consistent Dimensions: Each metric has 3-5 dimensions that remain consistent across intervals
  • Index Distribution: Automatically distributes metrics across indices (500 metrics per index) to avoid Elasticsearch mapping limits
  • OTel Format: Uses OpenTelemetry format with _metric_names_hash for consistency
  • Dynamic Templates: Each metric is mapped as gauge_double type
  • Data Streams: Routes to metrics-uniquemetrics{N}.otel-default (e.g., metrics-uniquemetrics1.otel-default)

Example document:

{
  "@timestamp": "2025-01-17T15:30:00.000Z",
  "_metric_names_hash": "xyz789",
  "resource": {
    "attributes": {
      "service.name": "metrics-cardinality-test",
      "service.version": "1.0.0",
      "telemetry.sdk.name": "opentelemetry",
      "telemetry.sdk.language": "javascript",
      "telemetry.sdk.version": "1.0.0"
    }
  },
  "attributes": {
    "entity.id": "metric-01",
    "environment": "production",
    "region": "us-east-1",
    "service": "metrics-cardinality-test",
    "datacenter": "dc1",
    "availability_zone": "us-east-1a"
  },
  "metrics": {
    "system.usage.total.1": 0.75
  },
  "data_stream": {
    "type": "metrics",
    "dataset": "cardinality.otel",
    "namespace": "default"
  }
}
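
A sketch of the distribution rule described above (500 metrics per index), assuming a simple floor-division mapping from metric number to index number:

const METRICS_PER_INDEX = 500;

// Metric i (0-based) gets a counter-suffixed name and an index number.
function routeMetric(i: number) {
  const indexNumber = Math.floor(i / METRICS_PER_INDEX) + 1;
  return {
    name: `system.usage.total.${i + 1}`,
    dataStream: `metrics-uniquemetrics${indexNumber}.otel-default`,
  };
}

// routeMetric(0)   -> metrics-uniquemetrics1.otel-default
// routeMetric(500) -> metrics-uniquemetrics2.otel-default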

Key features:

  • Scalable Testing: Generate anywhere from 5 to 10,000+ unique metrics
  • Index Management: Automatically splits across multiple indices at 500 metrics per index
  • Deterministic Dimensions: Same dimensions per metric across all intervals for consistent cardinality
  • Performance Testing: Perfect for testing Elasticsearch mapping limits, query performance, and storage efficiency
  • Smart Purging: Calculates exact indices to delete based on metric count

Weather Station Data (FieldSense Format)

Generates comprehensive weather station metrics in FieldSense namespace:

  • Environmental Metrics: Temperature, humidity, wind, precipitation, pressure, solar radiation, soil conditions
  • Solar Panel Metrics: Individual panel voltage, current, power, temperature, and efficiency
  • Energy Metrics: Consumption, production, and battery status
  • System Metrics: CPU usage, memory, network traffic
  • Time Series Support: Proper geo_point coordinates and time series dimensions
  • Data Stream: Routes to fieldsense-station-metrics

Example document:

{
  "@timestamp": "2025-01-08T15:30:00.000Z",
  "_metric_names_hash": "def456",
  "station.id": "station-01",
  "station.name": "FieldSense Station 01",
  "station.location.coordinates": {
    "lat": 40.7128,
    "lon": -74.0060
  },
  "station.location.region": "us-east-1",
  "sensor.id": "temperature-1",
  "sensor.type": "temperature",
  "sensor.location": "ambient",
  "fieldsense.environmental.temperature.air": 22.5,
  "fieldsense.environmental.temperature.dewpoint": 18.3
}

Key features:

  • Realistic Correlations: Cloudy weather reduces solar output, temperature affects soil conditions
  • Smooth Transitions: Weather changes gradually with proper smoothing algorithms
  • Geo-spatial Support: Coordinates stored as geo_point for mapping and spatial queries
  • Comprehensive Coverage: 24+ different metric types per station
  • Time Series Optimized: Proper dimensions and metric routing for long-term storage
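
The README does not name the smoothing algorithm; a minimal sketch of one plausible approach (exponential smoothing), purely for illustration:

// Each tick, move the current value a fraction alpha toward the target.
// A small alpha yields slow, smooth weather transitions.
function smooth(current: number, target: number, alpha = 0.1): number {
  return current + alpha * (target - current);
}

// e.g., cloud cover drifting from 0.2 toward 0.9 over successive intervals:
// 0.2 -> 0.27 -> 0.333 -> ...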

Histograms (Histogram + Exponential Histogram Fields)

Generates a dedicated time series data stream containing distribution fields:

  • Data Stream: histograms-samples (Elasticsearch index.mode: time_series)
  • Dimensions: entity.id
  • Fields:
    • histogram.tdigest (histogram)
    • histogram.hdr (histogram)
    • histogram.exponential (exponential_histogram)

Example document:

{
  "@timestamp": "2025-01-08T15:30:00.000Z",
  "entity.id": "entity-01",
  "histogram.tdigest": {
    "values": [12.3, 18.7, 29.1],
    "counts": [10, 12, 8]
  },
  "histogram.hdr": {
    "values": [10.0, 14.1, 20.0],
    "counts": [5, 15, 10]
  },
  "histogram.exponential": {
    "scale": 8,
    "sum": 1234.0,
    "min": 5.1,
    "max": 420.2,
    "zero": { "threshold": 0, "count": 0 },
    "positive": {
      "indices": [12, 13, 14],
      "counts": [20, 7, 3]
    }
  }
}
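
For reference, OpenTelemetry-style exponential histograms derive bucket boundaries from the scale: the base is 2^(2^-scale), and bucket i covers (base^i, base^(i+1)]. A sketch of that mapping (assuming the OTel base-2 convention; not necessarily simian-forge's internal code):

// base = 2^(2^-scale); bucket index i covers (base^i, base^(i+1)].
function bucketIndex(value: number, scale: number): number {
  const base = Math.pow(2, Math.pow(2, -scale));
  return Math.ceil(Math.log(value) / Math.log(base)) - 1;
}

// Aggregate positive samples into the sparse indices/counts shape above.
function toPositiveBuckets(samples: number[], scale: number) {
  const counts = new Map<number, number>();
  for (const v of samples.filter((s) => s > 0)) {
    const i = bucketIndex(v, scale);
    counts.set(i, (counts.get(i) ?? 0) + 1);
  }
  const indices = [...counts.keys()].sort((a, b) => a - b);
  return { indices, counts: indices.map((i) => counts.get(i)!) };
}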

OpenTelemetry Format

Generates metrics following OpenTelemetry semantic conventions:

  • Resource Attributes: Comprehensive host attributes (name, type, arch, IPs, MACs, etc.)
  • Per-Core Metrics: Individual documents per CPU core with cpu and state attributes
  • Metric Types: system.cpu.utilization, system.cpu.time, system.memory.usage, etc.
  • Data Stream: Routes to metrics-hostmetricsreceiver.otel-default

Example document:

{
  "@timestamp": "2025-06-17T15:30:00.000Z",
  "_metric_names_hash": "abc123",
  "data_stream": {
    "dataset": "hostmetricsreceiver.otel",
    "namespace": "default",
    "type": "metrics"
  },
  "resource": {
    "attributes": {
      "host.name": "host-01",
      "host.type": "m5.large",
      "host.arch": "amd64",
      "cloud.provider": "aws"
    }
  },
  "attributes": { "cpu": "0", "state": "user" },
  "metrics": { "system.cpu.utilization": 0.45 },
  "unit": "1"
}
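
One host fans out into many OTel documents: one per CPU core per state. A sketch of that fan-out (illustrative; the state list and value generation here are assumptions, not the project's actual generator):

const CPU_STATES = ['user', 'system', 'idle', 'wait'] as const;

// Emit one document per (core, state) pair for a host at a timestamp.
function cpuDocuments(hostName: string, cores: number, timestamp: string) {
  const docs = [];
  for (let core = 0; core < cores; core++) {
    for (const state of CPU_STATES) {
      docs.push({
        '@timestamp': timestamp,
        resource: { attributes: { 'host.name': hostName } },
        attributes: { cpu: String(core), state },
        metrics: { 'system.cpu.utilization': Math.random() }, // placeholder value
        unit: '1',
      });
    }
  }
  return docs;
}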

Elastic Metricbeat Format

Generates metrics following Elastic Metricbeat patterns:

  • Metricsets: cpu, memory, load, network, diskio, filesystem, process
  • Normalized Metrics: Includes *.norm.pct fields for CPU metrics
  • Event Classification: event.dataset and event.module fields
  • Data Streams: Routes to metrics-system.{metricset}-default

Example document:

{
  "@timestamp": "2025-06-17T15:30:00.000Z",
  "data_stream": {
    "dataset": "system.cpu",
    "namespace": "default",
    "type": "metrics"
  },
  "event": {
    "dataset": "system.cpu",
    "module": "system"
  },
  "system": {
    "cpu": {
      "user": {
        "pct": 0.45,
        "norm": { "pct": 0.225 }
      },
      "cores": 2
    }
  }
}
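
In the example above, norm.pct is pct divided by the core count (0.45 / 2 = 0.225). A one-line sketch of that normalization:

// Metricbeat-style normalization: per-core sum -> fraction of total capacity.
const normPct = (pct: number, cores: number): number => pct / cores;
// normPct(0.45, 2) === 0.225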

Development

Project Structure

src/
├── index.ts                      # Main CLI entry point
├── tracing.ts                   # OpenTelemetry tracing setup
├── types/
│   ├── host-types.ts            # Host and metrics type definitions
│   ├── machine-types.ts         # Cloud machine type specifications
│   └── weather-types.ts         # Weather station type definitions
├── simulators/
│   ├── host-simulator.ts        # Host metrics simulator orchestrator
│   ├── host-generator.ts        # Host configuration generator
│   ├── metrics-generator.ts     # Host metrics generation
│   ├── weather-simulator.ts     # Weather station simulator orchestrator
│   ├── weather-generator.ts     # Weather station configuration generator
│   └── weather-metrics-generator.ts # Weather metrics generation
└── formatters/
    ├── base-formatter.ts        # Common formatter functionality
    ├── otel-formatter.ts        # OpenTelemetry format converter
    ├── elastic-formatter.ts     # Elastic Metricbeat format converter
    └── fieldsense-formatter.ts  # FieldSense weather format converter

Available Scripts

npm run build    # Compile TypeScript to JavaScript
npm run start    # Run the compiled application (use ./forge for easier CLI)
npm run dev      # Build and run the application
./forge          # Convenient wrapper for npm run start

Development Setup

  1. Install dependencies:
npm install
  2. Make changes to TypeScript files in src/

  3. Build and test:

npm run build
./forge --help

Development Workflow

When making schema changes or testing new features, use the --purge option to delete existing data streams and start fresh:

# Purge and restart with hosts data
./forge --dataset hosts --purge --format both

# Purge and restart with weather data
./forge --dataset weather --purge --count 3 --backfill now-6h

# Purge and restart with unique metrics (cardinality testing)
./forge --dataset unique-metrics --purge --count 1000

# Purge specific format for hosts
./forge --dataset hosts --format otel --purge

This ensures clean data streams with updated mappings and templates.
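
The README does not show the purge implementation; a sketch of what --purge plausibly does, using the @elastic/elasticsearch client's data stream API (the data stream name in the usage comment comes from the formats above):

import { Client } from '@elastic/elasticsearch';

// Delete a dataset's data streams so they are recreated with fresh mappings.
async function purge(client: Client, names: string[]): Promise<void> {
  for (const name of names) {
    // Skip streams that don't exist instead of failing the whole purge.
    await client.indices.deleteDataStream({ name }).catch(() => undefined);
  }
}

// e.g., purge(client, ['metrics-hostmetricsreceiver.otel-default']);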

Adding New Metrics

For Host Metrics:

  1. Update HostMetrics interface in src/types/host-types.ts
  2. Implement generation logic in src/simulators/metrics-generator.ts
  3. Add formatting logic to both src/formatters/otel-formatter.ts and src/formatters/elastic-formatter.ts
  4. Test with both formats: --format both

For Weather Station Metrics:

  1. Update WeatherStationMetrics interface in src/types/weather-types.ts
  2. Implement generation logic in src/simulators/weather-metrics-generator.ts
  3. Add formatting logic to src/formatters/fieldsense-formatter.ts
  4. Update Elasticsearch mappings in src/simulators/weather-simulator.ts
  5. Test with: --dataset weather --purge

For Unique Metrics (Cardinality Testing):

  1. Update UniqueMetricsConfig or UniqueMetricsMetrics interfaces in src/types/unique-metrics-types.ts
  2. Modify generation logic in src/simulators/unique-metrics-config-generator.ts or src/simulators/unique-metrics-metrics-generator.ts
  3. Update formatting logic in src/formatters/unique-metrics-formatter.ts
  4. Adjust index distribution logic in src/simulators/unique-metrics-simulator.ts if needed
  5. Test with various counts: --dataset unique-metrics --purge --count 100

Host Configuration

The tool generates deterministic host configurations including:

  • Machine Types: Realistic CPU/memory specs for AWS, GCP, Azure instances
  • Network Configuration: Multiple network interfaces with realistic IPs/MACs
  • Cloud Metadata: Provider-specific instance IDs, regions, availability zones
  • Disk Configuration: Multiple filesystems with realistic sizes

All configurations are deterministic based on hostname, ensuring consistent data across runs.
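
A sketch of how hostname-deterministic configuration can work: hash the hostname into a seed, then drive a seeded PRNG from it (the hash choice and machine-type list here are illustrative assumptions, not the project's actual code):

// FNV-1a hash of the hostname -> 32-bit seed.
function seedFrom(hostname: string): number {
  let h = 0x811c9dc5;
  for (const ch of hostname) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Mulberry32: a tiny deterministic PRNG.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) >>> 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Example AWS/GCP/Azure instance types (m5.large appears in the docs above).
const MACHINE_TYPES = ['m5.large', 'n2-standard-4', 'Standard_D4s_v3'];
const rand = mulberry32(seedFrom('host-01'));
const machineType = MACHINE_TYPES[Math.floor(rand() * MACHINE_TYPES.length)];
// Same hostname -> same seed -> same machine type on every run.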

Testing

Docker Compose Testing Environment

The included Docker Compose setup provides a complete testing environment:

# Start full stack (Elasticsearch, Kibana, OpenTelemetry Collector, Simian Forge)
docker compose --profile full-stack up -d

# Access services
# Elasticsearch: http://localhost:9200
# Kibana: http://localhost:5601
# OpenTelemetry Collector: http://localhost:4318

# View generated data in Kibana
# Go to http://localhost:5601 and explore the data streams

# Stop the stack
docker compose --profile full-stack down

Docker Environment Variables

Configure the Docker Compose setup with environment variables:

# Create environment file
cp .env.example .env

# Example configurations in .env:
INTERVAL=15s
COUNT=25
DATASET=hosts
FORMAT=both
BACKFILL=now-1h
ELASTICSEARCH_PORT=9200
KIBANA_PORT=5601

# Authentication options (choose one):
ELASTICSEARCH_API_KEY=your-api-key-here
# OR
ELASTICSEARCH_AUTH=username:password

Integration with Existing Docker Compose

Add Simian Forge to your existing Docker Compose setup:

version: '3.8'
services:
  # Your existing services...
  
  data-generator:
    image: simianhacker/simian-forge:latest
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - COLLECTOR=http://otel-collector:4318
      - INTERVAL=10s
      - COUNT=15
      - DATASET=hosts
      - FORMAT=both
    # Note: the ${VAR} references below are substituted by Docker Compose from
    # the host shell or a .env file, not from the environment: block above.
    command: [
      "--elasticsearch-url", "${ELASTICSEARCH_URL}",
      "--collector", "${COLLECTOR}",
      "--interval", "${INTERVAL}",
      "--count", "${COUNT}",
      "--dataset", "${DATASET}",
      "--format", "${FORMAT}"
    ]
    depends_on:
      - elasticsearch
    networks:
      - your-network

Local Testing

Local Elasticsearch

Start a local Elasticsearch instance:

docker run -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.13.0

Run simian-forge:

./forge --elasticsearch-url http://localhost:9200

With OpenTelemetry Collector

The included otel-collector-config.yaml provides a complete OpenTelemetry Collector configuration that exports to Elasticsearch:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  elasticsearch/traces:
    endpoints: ["http://elasticsearch:9200"]
    traces_index: traces-simian-forge
  elasticsearch/logs:
    endpoints: ["http://elasticsearch:9200"]
    logs_index: logs-simian-forge
  elasticsearch/metrics:
    endpoints: ["http://elasticsearch:9200"]
    metrics_index: metrics-simian-forge

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [elasticsearch/traces]
    logs:
      receivers: [otlp]
      exporters: [elasticsearch/logs]
    metrics:
      receivers: [otlp]
      exporters: [elasticsearch/metrics]

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with appropriate tests
  4. Submit a pull request

Author

Chris Cowan (@simianhacker)

Built with AI assistance from Claude Code

License

MIT License
