A command-line tool for generating synthetic data and sending it to Elasticsearch. Supports host metrics (OpenTelemetry and Elastic Metricbeat formats), weather station data (FieldSense format), and additional datasets for cardinality and histogram testing.
Note: This project was developed as an experiment to evaluate AI coding tools, specifically Claude Code. The goal was to create a complete, production-ready tool without writing a single line of code manually - instead relying entirely on AI guidance and code generation.
The experiment successfully demonstrates that AI coding assistants can:
- Understand complex technical requirements and specifications
- Generate comprehensive TypeScript applications with proper architecture
- Implement industry-standard protocols (OpenTelemetry, Elasticsearch APIs)
- Create realistic data simulation with proper statistical distributions
- Handle error cases, logging, and production concerns
- Produce well-documented, maintainable code
This serves as a proof-of-concept for AI-assisted software development workflows and the potential for natural language programming.
Simian Forge generates realistic synthetic data for Elasticsearch, supporting four main datasets:
- Host Metrics: CPU, memory, network, disk I/O, filesystem, and process statistics in OpenTelemetry and/or Elastic Metricbeat formats
- Weather Station Data: Environmental sensors, solar panels, energy consumption, and system metrics in FieldSense format
- Unique Metrics: Configurable cardinality testing with unique metric names for system performance evaluation
- Histograms: Time series dataset with histogram (t-digest-like and HDR-like) and exponential_histogram fields for distribution testing
This makes it ideal for testing monitoring systems, dashboards, alerting rules, and time series visualizations.
- Multiple Datasets: Host metrics, weather station data, and unique metrics generation
- Histogram Field Types: Generates histogram and exponential_histogram fields for aggregation/testing
- Format Support: OpenTelemetry, Elastic Metricbeat, and FieldSense formats
- Cardinality Testing: Configurable unique metric generation for performance testing
- Realistic Data Generation: Correlated metrics, smooth transitions, and realistic patterns
- Backfill & Real-time: Historical data backfill with configurable real-time generation
- Time Series Support: Elasticsearch time series data streams with proper routing
- Cloud Provider Simulation: Deterministic configurations with AWS, GCP, Azure specifics
- OpenTelemetry Instrumentation: Full tracing support with configurable OTLP collector
- Development Tools: Data stream purging for schema changes and fresh starts
- Docker (recommended)
- OR Node.js 18+ and npm for local development
- Elasticsearch cluster (optional, included in Docker Compose)
- OpenTelemetry Collector (optional, included in Docker Compose)
- Kibana Setup: Before indexing data, go to "Integrations" in Kibana and install the "System" integration to ensure proper index templates and mappings are configured
The easiest way to get started is with Docker, which provides a complete testing environment:
- Clone the repository:
git clone <repository-url>
cd simian-forge
- Build the Docker image:
docker build -t simianhacker/simian-forge:latest .
- Run with Docker Compose (includes Elasticsearch, Kibana, and OpenTelemetry):
# Start the full stack
docker compose --profile full-stack up -d
# View logs
docker compose logs -f simian-forge
# Stop the stack
docker compose --profile full-stack down
- Clone the repository:
git clone <repository-url>
cd simian-forge
- Install dependencies:
npm install
- Build the project:
npm run build
The easiest way to get started with a complete testing environment:
# Start with default settings
docker compose --profile full-stack up -d
# Customize with environment variables
INTERVAL=30s COUNT=20 FORMAT=otel docker compose --profile full-stack up -d
# Run one-time data generation
docker compose run --rm simian-forge --purge --backfill now-2h
# Connect to external Elasticsearch with API key
ELASTICSEARCH_URL=https://my-cluster.com:9200 \
ELASTICSEARCH_API_KEY=your-api-key-here \
docker compose --profile full-stack up -d
# Or with username/password
ELASTICSEARCH_URL=https://my-cluster.com:9200 \
ELASTICSEARCH_AUTH=myuser:mypass \
docker compose --profile full-stack up -d
Run the container directly (bring your own Elasticsearch):
# Basic usage with API key
docker run --rm simianhacker/simian-forge:latest \
--elasticsearch-url http://your-elasticsearch:9200 \
--elasticsearch-api-key your-api-key-here \
--count 5 --interval 30s
# Basic usage with username/password
docker run --rm simianhacker/simian-forge:latest \
--elasticsearch-url http://your-elasticsearch:9200 \
--elasticsearch-auth elastic:yourpassword \
--count 5 --interval 30s
# Generate weather data
docker run --rm simianhacker/simian-forge:latest \
--dataset weather \
--count 3 \
--interval 1m \
--elasticsearch-url http://your-elasticsearch:9200
# Connect to external services with API key
docker run --rm simianhacker/simian-forge:latest \
--elasticsearch-url https://my-cluster.com:9200 \
--elasticsearch-api-key your-api-key-here \
--collector http://my-collector:4318 \
--format otel
Copy .env.example to .env and customize:
cp .env.example .env
# Edit .env file with your preferred settings
docker compose --profile full-stack up -d
Generate metrics for 5 minutes with default settings:
./forge
simian-forge [options]
Options:
--interval <value> Frequency of data generation (e.g., 30s, 5m) (default: "10s")
--backfill <value> How far back to backfill data (e.g., now-1h) (default: "now-5m")
--count <number> Number of entities to generate (default: "10")
--dataset <name> Name of the dataset: hosts, weather, unique-metrics, histograms (default: "hosts")
--elasticsearch-url <url> Elasticsearch cluster URL (default: "http://localhost:9200")
--elasticsearch-auth <auth> Elasticsearch auth in username:password format (default: "elastic:changeme")
--elasticsearch-api-key <key> Elasticsearch API key for authentication (default: "")
--collector <url> OpenTelemetry collector HTTP endpoint (default: "http://localhost:4318")
--format <format> Output format: otel, elastic, or both (hosts only) (default: "both")
--purge Delete existing data streams for the dataset before starting
Simian Forge supports multiple authentication methods for Elasticsearch:
- API Key Authentication (Recommended):
./forge --elasticsearch-api-key "your-api-key-here"
- Username/Password Authentication:
./forge --elasticsearch-auth "username:password"
- No Authentication (for local development):
./forge --elasticsearch-url http://localhost:9200
Note: API key authentication takes precedence over username/password if both are provided. API keys are recommended for production use as they can be easily revoked and have fine-grained permissions.
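The snippet below is a minimal TypeScript sketch of how this precedence can be expressed with the official @elastic/elasticsearch client; it is illustrative only, not the tool's actual source.
import { Client } from '@elastic/elasticsearch';
// Illustrative sketch: choose the authentication method the way the CLI
// options describe it (API key wins over username:password).
function createClient(url: string, apiKey?: string, basicAuth?: string): Client {
  if (apiKey) {
    return new Client({ node: url, auth: { apiKey } });
  }
  if (basicAuth) {
    const [username, password] = basicAuth.split(':');
    return new Client({ node: url, auth: { username, password } });
  }
  // No auth block at all for unsecured local clusters
  return new Client({ node: url });
}
// Mirrors: ./forge --elasticsearch-url https://my-cluster.com:9200 --elasticsearch-api-key your-api-key-here
const client = createClient('https://my-cluster.com:9200', 'your-api-key-here');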
Generate only OpenTelemetry format metrics:
./forge --dataset hosts --format otel --interval 30s
Generate metrics with 1-hour backfill:
./forge --dataset hosts --backfill now-1h --interval 2m
Generate metrics for 25 hosts with custom interval:
./forge --dataset hosts --count 25 --interval 30s
Purge existing data and start fresh:
./forge --dataset hosts --purge --format both
Generate weather station data with 5 stations:
./forge --dataset weather --count 5 --interval 1m
Generate weather data with 24-hour backfill:
./forge --dataset weather --backfill now-24h --interval 10s
Purge existing weather data and start fresh:
./forge --dataset weather --purge --count 3 --backfill now-12h
Generate 1000 unique metrics for cardinality testing:
./forge --dataset unique-metrics --count 1000 --interval 30s
Test high cardinality with backfill:
./forge --dataset unique-metrics --count 5000 --backfill now-1h --interval 1m
Purge existing cardinality test data and start fresh:
./forge --dataset unique-metrics --purge --count 2500 --interval 15s
Generate time series histogram samples (one doc per entity per interval):
./forge --dataset histograms --count 3 --interval 10s
Purge existing histogram data stream and start fresh:
./forge --dataset histograms --purge --count 3 --backfill now-30m --interval 10s
Connect to remote Elasticsearch with API key authentication:
./forge --elasticsearch-url https://my-cluster.com:9200 --elasticsearch-api-key your-api-key-here
Connect to remote Elasticsearch with username/password authentication:
./forge --elasticsearch-url https://my-cluster.com:9200 --elasticsearch-auth myuser:mypass
Generate with custom OpenTelemetry collector:
./forge --collector http://otel-collector:4318
Generates configurable numbers of unique metrics for testing system cardinality limits:
- Configurable Count: The --count parameter controls the exact number of unique metrics generated
- Guaranteed Uniqueness: Each metric name includes a counter suffix (e.g., system.usage.total.1, system.usage.total.2)
- Consistent Dimensions: Each metric has 3-5 dimensions that remain consistent across intervals
- Index Distribution: Automatically distributes metrics across indices (500 metrics per index) to avoid Elasticsearch mapping limits (see the sketch after this list)
- OTel Format: Uses OpenTelemetry format with _metric_names_hash for consistency
- Dynamic Templates: Each metric is mapped as gauge_double type
- Data Streams: Routes to metrics-uniquemetrics{N}.otel-default (e.g., metrics-uniquemetrics1.otel-default)
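The sketch below illustrates the naming and 500-metrics-per-index distribution described above; the helper names are hypothetical and not the actual Simian Forge internals.
const METRICS_PER_INDEX = 500;
// Hypothetical helper: unique metric names carry a counter suffix.
function metricName(counter: number): string {
  // e.g. system.usage.total.1, system.usage.total.2, ...
  return `system.usage.total.${counter}`;
}
// Hypothetical helper: metrics 1-500 route to metrics-uniquemetrics1.otel-default,
// 501-1000 to metrics-uniquemetrics2.otel-default, and so on.
function dataStreamFor(counter: number): string {
  const indexNumber = Math.floor((counter - 1) / METRICS_PER_INDEX) + 1;
  return `metrics-uniquemetrics${indexNumber}.otel-default`;
}
// Purging can then derive how many data streams a given --count produced.
function dataStreamCount(totalMetrics: number): number {
  return Math.ceil(totalMetrics / METRICS_PER_INDEX);
}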
Example document:
{
"@timestamp": "2025-01-17T15:30:00.000Z",
"_metric_names_hash": "xyz789",
"resource": {
"attributes": {
"service.name": "metrics-cardinality-test",
"service.version": "1.0.0",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.language": "javascript",
"telemetry.sdk.version": "1.0.0"
}
},
"attributes": {
"entity.id": "metric-01",
"environment": "production",
"region": "us-east-1",
"service": "metrics-cardinality-test",
"datacenter": "dc1",
"availability_zone": "us-east-1a"
},
"metrics": {
"system.usage.total.1": 0.75
},
"data_stream": {
"type": "metrics",
"dataset": "cardinality.otel",
"namespace": "default"
}
}
Key features:
- Scalable Testing: Generate anywhere from 5 to 10,000+ unique metrics
- Index Management: Automatically splits across multiple indices at 500 metrics per index
- Deterministic Dimensions: Same dimensions per metric across all intervals for consistent cardinality
- Performance Testing: Perfect for testing Elasticsearch mapping limits, query performance, and storage efficiency
- Smart Purging: Calculates exact indices to delete based on metric count
Generates comprehensive weather station metrics in FieldSense namespace:
- Environmental Metrics: Temperature, humidity, wind, precipitation, pressure, solar radiation, soil conditions
- Solar Panel Metrics: Individual panel voltage, current, power, temperature, and efficiency
- Energy Metrics: Consumption, production, and battery status
- System Metrics: CPU usage, memory, network traffic
- Time Series Support: Proper geo_point coordinates and time series dimensions
- Data Stream: Routes to fieldsense-station-metrics
Example document:
{
"@timestamp": "2025-01-08T15:30:00.000Z",
"_metric_names_hash": "def456",
"station.id": "station-01",
"station.name": "FieldSense Station 01",
"station.location.coordinates": {
"lat": 40.7128,
"lon": -74.0060
},
"station.location.region": "us-east-1",
"sensor.id": "temperature-1",
"sensor.type": "temperature",
"sensor.location": "ambient",
"fieldsense.environmental.temperature.air": 22.5,
"fieldsense.environmental.temperature.dewpoint": 18.3
}
Key features:
- Realistic Correlations: Cloudy weather reduces solar output, temperature affects soil conditions
- Smooth Transitions: Weather changes gradually with proper smoothing algorithms (see the sketch after this list)
- Geo-spatial Support: Coordinates stored as geo_point for mapping and spatial queries
- Comprehensive Coverage: 24+ different metric types per station
- Time Series Optimized: Proper dimensions and metric routing for long-term storage
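A small illustrative sketch of the smoothing and correlation ideas above (not the actual generator code): an exponential moving average keeps readings from jumping between intervals, and cloud cover scales solar output down.
// Illustrative only: exponential smoothing toward a target value.
function smooth(previous: number, target: number, factor = 0.2): number {
  // factor controls how quickly the value drifts toward the target
  return previous + (target - previous) * factor;
}
// Illustrative only: heavier cloud cover (0 = clear, 1 = overcast) reduces solar production.
function solarOutputWatts(maxOutputWatts: number, cloudCover: number): number {
  return maxOutputWatts * (1 - 0.8 * cloudCover);
}
// Example: air temperature drifts smoothly from 22.5 toward a new target of 25.0
let airTemperature = 22.5;
airTemperature = smooth(airTemperature, 25.0); // ≈ 23.0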
Generates a dedicated time series data stream containing distribution fields:
- Data Stream: histograms-samples (Elasticsearch index.mode: time_series)
- Dimensions: entity.id
- Fields: histogram.tdigest (histogram), histogram.hdr (histogram), histogram.exponential (exponential_histogram); see the bucketing sketch after this list
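As background for the exponential_histogram field, the sketch below shows how a raw value maps to a positive bucket index, assuming the standard OpenTelemetry base-2 bucketing scheme that the scale/indices layout suggests (illustrative math, not the generator's code).
// Illustrative: base = 2^(2^-scale), and a value v falls into bucket
// index ceil(log_base(v)) - 1, which simplifies to ceil(log2(v) * 2^scale) - 1.
function positiveBucketIndex(value: number, scale: number): number {
  return Math.ceil(Math.log2(value) * Math.pow(2, scale)) - 1;
}
// Higher scale means finer buckets; at scale 8 each bucket spans roughly 0.27%.
positiveBucketIndex(5.1, 8);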
Example document:
{
"@timestamp": "2025-01-08T15:30:00.000Z",
"entity.id": "entity-01",
"histogram.tdigest": {
"values": [12.3, 18.7, 29.1],
"counts": [10, 12, 8]
},
"histogram.hdr": {
"values": [10.0, 14.1, 20.0],
"counts": [5, 15, 10]
},
"histogram.exponential": {
"scale": 8,
"sum": 1234.0,
"min": 5.1,
"max": 420.2,
"zero": { "threshold": 0, "count": 0 },
"positive": {
"indices": [12, 13, 14],
"counts": [20, 7, 3]
}
}
}
Generates metrics following OpenTelemetry semantic conventions:
- Resource Attributes: Comprehensive host attributes (name, type, arch, IPs, MACs, etc.)
- Per-Core Metrics: Individual documents per CPU core with cpu and state attributes
- Metric Types: system.cpu.utilization, system.cpu.time, system.memory.usage, etc.
- Data Stream: Routes to metrics-hostmetricsreceiver.otel-default
Example document:
{
"@timestamp": "2025-06-17T15:30:00.000Z",
"_metric_names_hash": "abc123",
"data_stream": {
"dataset": "hostmetricsreceiver.otel",
"namespace": "default",
"type": "metrics"
},
"resource": {
"attributes": {
"host.name": "host-01",
"host.type": "m5.large",
"host.arch": "amd64",
"cloud.provider": "aws"
}
},
"attributes": { "cpu": "0", "state": "user" },
"metrics": { "system.cpu.utilization": 0.45 },
"unit": "1"
}
Generates metrics following Elastic Metricbeat patterns:
- Metricsets: cpu, memory, load, network, diskio, filesystem, process
- Normalized Metrics: Includes *.norm.pct fields for CPU metrics (see the sketch after this list)
- Event Classification: event.dataset and event.module fields
- Data Streams: Routes to metrics-system.{metricset}-default
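The sketch below shows the relationship between the raw and normalized CPU percentages used in the example document that follows (illustrative only).
// Illustrative: Metricbeat-style normalization. The raw pct is summed across
// cores (so it can exceed 1.0 on multi-core hosts); norm.pct divides by the core count.
function normalizedPct(rawPct: number, cores: number): number {
  return rawPct / cores;
}
// Matches the example document below: 0.45 user CPU across 2 cores
normalizedPct(0.45, 2); // 0.225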
Example document:
{
"@timestamp": "2025-06-17T15:30:00.000Z",
"data_stream": {
"dataset": "system.cpu",
"namespace": "default",
"type": "metrics"
},
"event": {
"dataset": "system.cpu",
"module": "system"
},
"system": {
"cpu": {
"user": {
"pct": 0.45,
"norm": { "pct": 0.225 }
},
"cores": 2
}
}
}
src/
├── index.ts # Main CLI entry point
├── tracing.ts # OpenTelemetry tracing setup
├── types/
│ ├── host-types.ts # Host and metrics type definitions
│ ├── machine-types.ts # Cloud machine type specifications
│ └── weather-types.ts # Weather station type definitions
├── simulators/
│ ├── host-simulator.ts # Host metrics simulator orchestrator
│ ├── host-generator.ts # Host configuration generator
│ ├── metrics-generator.ts # Host metrics generation
│ ├── weather-simulator.ts # Weather station simulator orchestrator
│ ├── weather-generator.ts # Weather station configuration generator
│ └── weather-metrics-generator.ts # Weather metrics generation
└── formatters/
├── base-formatter.ts # Common formatter functionality
├── otel-formatter.ts # OpenTelemetry format converter
├── elastic-formatter.ts # Elastic Metricbeat format converter
└── fieldsense-formatter.ts # FieldSense weather format converter
npm run build # Compile TypeScript to JavaScript
npm run start # Run the compiled application (use ./forge for easier CLI)
npm run dev # Build and run the application
./forge # Convenient wrapper for npm run start
- Install dependencies:
npm install
- Make changes to TypeScript files in src/
- Build and test:
npm run build
./forge --help
When making schema changes or testing new features, use the --purge option to delete existing data streams and start fresh:
# Purge and restart with hosts data
./forge --dataset hosts --purge --format both
# Purge and restart with weather data
./forge --dataset weather --purge --count 3 --backfill now-6h
# Purge and restart with unique metrics (cardinality testing)
./forge --dataset unique-metrics --purge --count 1000
# Purge specific format for hosts
./forge --dataset hosts --format otel --purge
This ensures clean data streams with updated mappings and templates.
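Under the hood, purging amounts to deleting the dataset's data streams so they are recreated with fresh mappings on the next write. A rough sketch with the Elasticsearch JavaScript client (illustrative, not the tool's exact code):
import { Client } from '@elastic/elasticsearch';
const client = new Client({ node: 'http://localhost:9200' });
// Illustrative: the wildcard covers the Metricbeat-style streams for the hosts
// dataset, and the second call removes the OTel host metrics stream.
async function purgeHostsDataStreams(): Promise<void> {
  await client.indices.deleteDataStream({ name: 'metrics-system.*-default' });
  await client.indices.deleteDataStream({ name: 'metrics-hostmetricsreceiver.otel-default' });
}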
- Update the HostMetrics interface in src/types/host-types.ts (see the sketch after this list)
- Implement generation logic in src/simulators/metrics-generator.ts
- Add formatting logic to both src/formatters/otel-formatter.ts and src/formatters/elastic-formatter.ts
- Test with both formats: --format both
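As a hypothetical illustration of the first two steps (the field and helper names below are made up and not part of the current schema):
// Step 1 (hypothetical): extend the HostMetrics interface in src/types/host-types.ts
export interface HostMetrics {
  // ...existing metrics...
  cpu: {
    utilization: number;
    stealTime?: number; // hypothetical new metric
  };
}
// Step 2 (hypothetical): generate a smoothly varying value in src/simulators/metrics-generator.ts
function generateStealTime(previous: number): number {
  const next = previous + (Math.random() - 0.5) * 0.01; // small random walk between intervals
  return Math.min(Math.max(next, 0), 0.1);
}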
- Update the WeatherStationMetrics interface in src/types/weather-types.ts
- Implement generation logic in src/simulators/weather-metrics-generator.ts
- Add formatting logic to src/formatters/fieldsense-formatter.ts
- Update Elasticsearch mappings in src/simulators/weather-simulator.ts
- Test with: --dataset weather --purge
- Update the UniqueMetricsConfig or UniqueMetricsMetrics interfaces in src/types/unique-metrics-types.ts
- Modify generation logic in src/simulators/unique-metrics-config-generator.ts or src/simulators/unique-metrics-metrics-generator.ts
- Update formatting logic in src/formatters/unique-metrics-formatter.ts
- Adjust index distribution logic in src/simulators/unique-metrics-simulator.ts if needed
- Test with various counts: --dataset unique-metrics --purge --count 100
The tool generates deterministic host configurations including:
- Machine Types: Realistic CPU/memory specs for AWS, GCP, Azure instances
- Network Configuration: Multiple network interfaces with realistic IPs/MACs
- Cloud Metadata: Provider-specific instance IDs, regions, availability zones
- Disk Configuration: Multiple filesystems with realistic sizes
All configurations are deterministic based on hostname, ensuring consistent data across runs.
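A minimal sketch of what hostname-based determinism looks like in practice (illustrative, not the actual generator): a string hash seeds every per-host choice, so the same hostname always resolves to the same configuration.
// Illustrative only: deterministic selection keyed on the hostname.
const MACHINE_TYPES = ['m5.large', 'm5.xlarge', 'c5.large', 'n2-standard-4'];
function hashString(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0; // keep it a 32-bit unsigned value
  }
  return hash;
}
function machineTypeFor(hostname: string): string {
  return MACHINE_TYPES[hashString(hostname) % MACHINE_TYPES.length];
}
// "host-01" maps to the same machine type on every run
machineTypeFor('host-01');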
The included Docker Compose setup provides a complete testing environment:
# Start full stack (Elasticsearch, Kibana, OpenTelemetry Collector, Simian Forge)
docker compose --profile full-stack up -d
# Access services
# Elasticsearch: http://localhost:9200
# Kibana: http://localhost:5601
# OpenTelemetry Collector: http://localhost:4318
# View generated data in Kibana
# Go to http://localhost:5601 and explore the data streams
# Stop the stack
docker compose --profile full-stack down
Configure the Docker Compose setup with environment variables:
# Create environment file
cp .env.example .env
# Example configurations in .env:
INTERVAL=15s
COUNT=25
DATASET=hosts
FORMAT=both
BACKFILL=now-1h
ELASTICSEARCH_PORT=9200
KIBANA_PORT=5601
# Authentication options (choose one):
ELASTICSEARCH_API_KEY=your-api-key-here
# OR
ELASTICSEARCH_AUTH=username:password
Add Simian Forge to your existing Docker Compose setup:
version: '3.8'
services:
# Your existing services...
data-generator:
image: simianhacker/simian-forge:latest
environment:
- ELASTICSEARCH_URL=http://elasticsearch:9200
- COLLECTOR=http://otel-collector:4318
- INTERVAL=10s
- COUNT=15
- DATASET=hosts
- FORMAT=both
command: [
"--elasticsearch-url", "${ELASTICSEARCH_URL}",
"--collector", "${COLLECTOR}",
"--interval", "${INTERVAL}",
"--count", "${COUNT}",
"--dataset", "${DATASET}",
"--format", "${FORMAT}"
]
depends_on:
- elasticsearch
networks:
- your-network
Start a local Elasticsearch instance:
docker run -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.13.0
Run simian-forge:
./forge --elasticsearch-url http://localhost:9200
The included otel-collector-config.yaml provides a complete OpenTelemetry Collector configuration that exports to Elasticsearch:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
exporters:
elasticsearch/traces:
endpoints: ["http://elasticsearch:9200"]
traces_index: traces-simian-forge
elasticsearch/logs:
endpoints: ["http://elasticsearch:9200"]
logs_index: logs-simian-forge
elasticsearch/metrics:
endpoints: ["http://elasticsearch:9200"]
metrics_index: metrics-simian-forge
service:
pipelines:
traces:
receivers: [otlp]
exporters: [elasticsearch/traces]
logs:
receivers: [otlp]
exporters: [elasticsearch/logs]
metrics:
receivers: [otlp]
exporters: [elasticsearch/metrics]
- Fork the repository
- Create a feature branch
- Make changes with appropriate tests
- Submit a pull request
Chris Cowan (@simianhacker)
Built with AI assistance from Claude Code
MIT License