Add Logs Ingestion API scenario with synthetic sample data #296

Open
austinmccollum wants to merge 3 commits into microsoft:master from austinmccollum:logs-ingestion-scenario

@austinmccollum

Summary

Adds a new scenario under Scenarios/How to collect data with the Logs Ingestion API/ to support the Logs Ingestion API portal tutorial.

Problem

The tutorial's inline sample data (~200 lines of Apache access logs) contained references to a real website (almhuette-raith.at), real page paths, and real user-agent strings — a privacy concern for published documentation.

Changes

| File | Description |
| --- | --- |
| README.md | Describes the scenario, links to the tutorial, and explains the sample data and generator script |
| Generate-SampleAccessLog.ps1 | PowerShell script that generates synthetic Apache Combined Log Format entries with a configurable entry count. Uses RFC 5737 documentation IPs (198.51.100.x, 203.0.113.x), contoso.example.com domains, and weighted response codes |
| sample_access.log | Pre-generated 200-entry synthetic log file, ready to use with the tutorial |
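
As a rough illustration of how a reviewer might confirm that a log file contains only synthetic entries, the Python sketch below parses Apache Combined Log Format lines and flags any line whose client IP falls outside the two RFC 5737 ranges named in this PR. It is not part of the PR; the function names and the regular expression are assumptions made for this example.

```python
import ipaddress
import re

# The two RFC 5737 documentation ranges named in the PR description.
DOC_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# Apache Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = COMBINED_RE.match(line.strip())
    return match.groupdict() if match else None


def is_documentation_ip(host):
    """True if host is an IP address inside one of the documentation ranges."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    return any(addr in net for net in DOC_NETWORKS)


def find_non_synthetic_lines(lines):
    """Return 1-based numbers of lines that fail to parse or use another IP range."""
    bad = []
    for number, line in enumerate(lines, start=1):
        entry = parse_line(line)
        if entry is None or not is_documentation_ip(entry["host"]):
            bad.append(number)
    return bad
```

Reading sample_access.log and passing its lines to `find_non_synthetic_lines` should return an empty list if every entry uses the documented ranges.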

Design decisions

  • RFC 5737 IPs (198.51.100.x, 203.0.113.x) and RFC 2606 domains (example.com): both are reserved for documentation, so they cannot refer to real hosts or sites
  • Weighted response codes: 65% 200, 8% 404, 8% 304, with the remainder spread across 301/400/401/403/500/502/503, so the tutorial's where ResponseCode != 200 KQL filter produces meaningful output
  • Fixed random seed (42) in the generator, so repeated runs produce the same output
  • Varied user agents (desktop, mobile, bot, CLI): a realistic mix without real device fingerprints
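
The design decisions above can be sketched in a few lines. The following is a minimal Python illustration, not the PR's PowerShell script: it combines a fixed seed, weighted status codes, and RFC 5737 address prefixes. The 65/8/8 weights come from the PR description; the split of the remaining 19% across the other codes, the paths, the user-agent strings, the start time, and the 30-second spacing are all assumptions invented for this example.

```python
import random
from datetime import datetime, timedelta, timezone

# Weights for 200, 404, and 304 are from the PR description.
# The split of the remaining 19% is an assumption for illustration.
STATUS_WEIGHTS = [
    (200, 65), (404, 8), (304, 8),
    (301, 5), (400, 2), (401, 2), (403, 3), (500, 3), (502, 2), (503, 2),
]

IP_PREFIXES = ["198.51.100", "203.0.113"]  # RFC 5737 documentation ranges

# Hypothetical paths and user agents, not taken from the PR.
PATHS = ["/", "/index.html", "/products", "/api/orders", "/images/logo.png"]
USER_AGENTS = [
    "ExampleDesktopBrowser/1.0",
    "ExampleMobileBrowser/1.0",
    "ExampleBot/1.0",
    "example-cli/1.0",
]


def generate_entries(count=200, seed=42):
    """Return `count` synthetic Combined Log Format lines, reproducible per seed."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the output
    codes = [code for code, _ in STATUS_WEIGHTS]
    weights = [weight for _, weight in STATUS_WEIGHTS]
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # assumed start time

    entries = []
    for i in range(count):
        ip = f"{rng.choice(IP_PREFIXES)}.{rng.randint(1, 254)}"
        stamp = (start + timedelta(seconds=30 * i)).strftime("%d/%b/%Y:%H:%M:%S %z")
        status = rng.choices(codes, weights=weights, k=1)[0]
        path = rng.choice(PATHS)
        agent = rng.choice(USER_AGENTS)
        size = rng.randint(200, 5000)
        entries.append(
            f'{ip} - - [{stamp}] "GET {path} HTTP/1.1" {status} {size} '
            f'"https://contoso.example.com/" "{agent}"'
        )
    return entries
```

Because the generator is seeded, two calls with the same arguments return identical lists, which is the property that keeps a checked-in sample file in sync with its generator.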
