@@ -0,0 +1,203 @@
<#
.SYNOPSIS
Generates synthetic Apache Combined Log Format access log data for use with
the Azure Monitor Logs Ingestion API tutorial.

.DESCRIPTION
Creates a sample_access.log file with realistic but fully synthetic entries.
All IP addresses, domains, paths, and user agents are fabricated — no real
PII is included.

The output matches the Apache Combined Log Format expected by the tutorial's
KQL parse transformation:
IP - - [timestamp] "METHOD /path HTTP/1.1" status size "referer" "user-agent" "-"

.PARAMETER Count
Number of log entries to generate. Default: 200.

.PARAMETER Output
Path to the output file. Default: sample_access.log in the current directory.

.PARAMETER StartDate
Starting timestamp for log entries. Default: 2024-03-15T08:00:00.

.EXAMPLE
.\Generate-SampleAccessLog.ps1
.\Generate-SampleAccessLog.ps1 -Count 500 -Output "my_access.log"
.\Generate-SampleAccessLog.ps1 -Count 100 -StartDate "2024-06-01T12:00:00"
#>
param(
[int]$Count = 200,
[string]$Output = "sample_access.log",
[datetime]$StartDate = [datetime]"2024-03-15T08:00:00"
)

# --- Pools of synthetic values ---

$methods = @("GET", "GET", "GET", "GET", "GET", "POST", "PUT", "DELETE", "HEAD")
$httpVersions = @("HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/2.0")

$paths = @(
"/"
"/index.html"
"/about.html"
"/contact.html"
"/products"
"/products/catalog"
"/products/details?id=1042"
"/products/details?id=2087"
"/products/details?id=3291"
"/api/v1/status"
"/api/v1/health"
"/api/v1/users"
"/api/v1/orders"
"/api/v1/inventory"
"/api/v2/search?q=monitor"
"/api/v2/search?q=logs"
"/images/logo.png"
"/images/banner.jpg"
"/images/hero-bg.webp"
"/css/main.css"
"/css/theme.css"
"/js/app.js"
"/js/analytics.js"
"/fonts/opensans.woff2"
"/favicon.ico"
"/robots.txt"
"/sitemap.xml"
"/docs/getting-started"
"/docs/api-reference"
"/docs/faq"
"/blog/2024/new-features"
"/blog/2024/performance-tips"
"/login"
"/dashboard"
"/dashboard/settings"
"/admin/reports"
"/admin/users"
"/download/latest"
"/pricing"
"/support/tickets"
"/support/kb/1001"
"/support/kb/2045"
)

# Weighted status codes: mostly 200, some errors
$statusWeights = @(
@{ Code = 200; Weight = 65 }
@{ Code = 301; Weight = 3 }
@{ Code = 302; Weight = 2 }
@{ Code = 304; Weight = 8 }
@{ Code = 400; Weight = 3 }
@{ Code = 401; Weight = 3 }
@{ Code = 403; Weight = 3 }
@{ Code = 404; Weight = 8 }
@{ Code = 500; Weight = 3 }
@{ Code = 502; Weight = 1 }
@{ Code = 503; Weight = 1 }
)

# Build expanded status array for weighted random selection
$statusPool = @()
foreach ($s in $statusWeights) {
for ($i = 0; $i -lt $s.Weight; $i++) {
$statusPool += $s.Code
}
}

$userAgents = @(
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Safari/605.1.15'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0'
'Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.64 Mobile Safari/537.36'
'Mozilla/5.0 (Linux; Android 13; SM-S911B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36'
'Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Mobile/15E148 Safari/604.1'
'Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/122.0.6261.62 Mobile/15E148 Safari/604.1'
'Mozilla/5.0 (iPad; CPU OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Mobile/15E148 Safari/604.1'
'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'
'Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)'
'curl/8.5.0'
'Python-urllib/3.12'
'axios/1.6.7'
)

$referers = @(
"-"
"-"
"-"
"-"
"https://www.contoso-web.example.com/"
"https://www.contoso-web.example.com/products"
"https://www.contoso-web.example.com/docs/getting-started"
"https://www.contoso-web.example.com/blog/2024/new-features"
"https://search.contoso.example.com/results?q=monitor+logs"
"https://portal.contoso.example.com/dashboard"
)

# --- Helper functions ---

function Get-SyntheticIP {
# Generate IPs from the RFC 5737 documentation ranges (198.51.100.x, 203.0.113.x)
# and the 10.0.x.x private range. Note that Get-Random -Maximum is exclusive.
$ranges = @(
@{ Prefix = "198.51.100"; Max = 254 }
@{ Prefix = "203.0.113"; Max = 254 }
@{ Prefix = "10.0"; TwoOctet = $true }
)
$range = $ranges | Get-Random
if ($range.TwoOctet) {
return "$($range.Prefix).$(Get-Random -Minimum 1 -Maximum 255).$(Get-Random -Minimum 1 -Maximum 255)"
}
return "$($range.Prefix).$(Get-Random -Minimum 1 -Maximum $range.Max)"
}

function Get-ResponseSize {
param([int]$StatusCode, [string]$Path)
switch ($StatusCode) {
304 { return 0 }
{ $_ -ge 400 } { return Get-Random -Minimum 150 -Maximum 600 }
default {
if ($Path -match '\.(png|jpg|webp|woff2)$') { return Get-Random -Minimum 5000 -Maximum 150000 }
if ($Path -match '\.(css|js)$') { return Get-Random -Minimum 800 -Maximum 45000 }
if ($Path -match '^/api/') { return Get-Random -Minimum 50 -Maximum 8000 }
return Get-Random -Minimum 1200 -Maximum 35000
}
}
}

# --- Generate entries ---

$null = Get-Random -SetSeed 42 # Seed Get-Random so repeated runs produce the same log
$entries = [System.Collections.Generic.List[string]]::new($Count)
$currentTime = $StartDate

for ($i = 0; $i -lt $Count; $i++) {
$ip = Get-SyntheticIP
$method = $methods | Get-Random
$path = $paths | Get-Random
$httpVersion = $httpVersions | Get-Random
$status = $statusPool | Get-Random
$size = Get-ResponseSize -StatusCode $status -Path $path
$ua = $userAgents | Get-Random
$referer = $referers | Get-Random

# Format timestamp as Apache CLF: [dd/Mon/yyyy:HH:mm:ss +0000]
$ts = $currentTime.ToString("dd/MMM/yyyy:HH:mm:ss +0000", [System.Globalization.CultureInfo]::InvariantCulture)

$entry = '{0} - - [{1}] "{2} {3} {4}" {5} {6} "{7}" "{8}" "-"' -f `
$ip, $ts, $method, $path, $httpVersion, $status, $size, $referer, $ua

$entries.Add($entry)

# Advance time by 1-90 seconds
$currentTime = $currentTime.AddSeconds((Get-Random -Minimum 1 -Maximum 91))
}

# --- Write output ---

$entries | Set-Content -Path $Output -Encoding UTF8
Write-Host "Generated $Count synthetic Apache access log entries in: $Output"
@@ -0,0 +1,40 @@
# How to collect data with the Logs Ingestion API

The [Logs Ingestion API](https://learn.microsoft.com/azure/azure-monitor/logs/logs-ingestion-api-overview) lets you send external data to a Log Analytics workspace in Azure Monitor using a REST API call. Use it to ingest custom logs from any source that can make HTTP requests.
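
As a rough illustration of what such a REST call involves, the sketch below assembles the upload URI and a JSON body in PowerShell without sending anything. The endpoint, DCR immutable ID, stream name, record fields, and the helper name `New-LogsIngestionUri` are placeholders assumed for this example. The URI shape and `api-version` are taken from the overview linked above, so verify them there before relying on this.

```powershell
# Hypothetical helper (not part of the tutorial): builds the upload URI.
function New-LogsIngestionUri {
    param(
        [string]$Endpoint,        # logs ingestion endpoint (placeholder below)
        [string]$DcrImmutableId,  # immutable ID of the data collection rule
        [string]$StreamName,      # stream declared in the DCR
        [string]$ApiVersion = '2023-01-01'
    )
    return '{0}/dataCollectionRules/{1}/streams/{2}?api-version={3}' -f $Endpoint.TrimEnd('/'), $DcrImmutableId, $StreamName, $ApiVersion
}

# Placeholder values only; these do not refer to real resources.
$params = @{
    Endpoint       = 'https://example-dce.eastus-1.ingest.monitor.azure.com'
    DcrImmutableId = 'dcr-00000000000000000000000000000000'
    StreamName     = 'Custom-MyTableRawData'
}
$uri = New-LogsIngestionUri @params

# The request body is a JSON array of records. The field names here are assumptions.
$records = @(
    @{
        Time    = '2024-03-15T08:00:00Z'
        RawData = '198.51.100.7 - - [15/Mar/2024:08:00:00 +0000] "GET / HTTP/1.1" 200 1234 "-" "curl/8.5.0" "-"'
    }
)
# -InputObject keeps a single-element array as a JSON array.
$body = ConvertTo-Json -InputObject $records -Depth 3
```

Sending the request would additionally need a bearer token and an HTTP POST (for example with `Invoke-RestMethod`); the tutorial below covers those steps.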

## Tutorial

For a complete walkthrough of configuring a custom table, data collection rule (DCR), data collection endpoint (DCE), and sending data with PowerShell, see:

**[Tutorial: Send data to Azure Monitor Logs with Logs ingestion API (Azure portal)](https://learn.microsoft.com/azure/azure-monitor/logs/tutorial-logs-ingestion-portal)**

## Sample data

This folder contains synthetic Apache access log data for use with the tutorial:

| File | Description |
|------|-------------|
| [sample_access.log](sample_access.log) | Pre-generated synthetic Apache access log (~200 entries). Ready to use with the tutorial's `LogGenerator.ps1` script. |
| [Generate-SampleAccessLog.ps1](Generate-SampleAccessLog.ps1) | PowerShell script to generate your own synthetic access log with a configurable number of entries. |

### Using the sample data

1. Download `sample_access.log` from this folder.
1. Follow the [tutorial](https://learn.microsoft.com/azure/azure-monitor/logs/tutorial-logs-ingestion-portal) to set up your DCR, DCE, and custom table.
1. Use the `LogGenerator.ps1` script from the tutorial to convert and send the data:

```powershell
.\LogGenerator.ps1 -Log "sample_access.log" -Type "file" -Output "data_sample.json"
```
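
Before sending a log file, it can help to confirm that every line has the expected layout. The following is a minimal sketch, assuming the line format written by `Generate-SampleAccessLog.ps1`; the helper name `Get-MalformedLogLine` and the regular expression are illustrative and are not part of the tutorial.

```powershell
# Hypothetical helper: returns the lines that do NOT match the expected layout.
function Get-MalformedLogLine {
    param([string[]]$Line)
    # IP - - [timestamp] "METHOD /path HTTP/x.y" status size "referer" "user-agent" "-"
    $pattern = '^(\S+) - - \[([^\]]+)\] "(\S+) (\S+) (HTTP/[\d.]+)" (\d{3}) (\d+) "([^"]*)" "([^"]*)" "-"$'
    $Line | Where-Object { $_ -notmatch $pattern }
}

# Two in-memory lines for demonstration: one valid, one not.
$sample = @(
    '203.0.113.9 - - [15/Mar/2024:08:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "curl/8.5.0" "-"'
    'this line is not an access log entry'
)
$bad = @(Get-MalformedLogLine -Line $sample)
```

To check a file on disk, pass its lines instead, for example `Get-MalformedLogLine -Line (Get-Content sample_access.log)`; an empty result means every line matched.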

### Generating your own data

Run `Generate-SampleAccessLog.ps1` to create a custom-sized log file:

```powershell
.\Generate-SampleAccessLog.ps1 -Count 500 -Output "my_access.log"
```
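
To get a quick feel for a generated file, you could summarize its status codes. This sketch assumes the space-separated layout produced by the generator script, where the status code is the ninth field; `Get-StatusSummary` is a hypothetical helper name.

```powershell
# Hypothetical helper: counts log entries per HTTP status code.
function Get-StatusSummary {
    param([string[]]$Line)
    # Splitting on spaces puts the status code at index 8
    # (IP, -, -, [date, zone], "METHOD, path, HTTP/x.y", status, size, ...).
    $Line |
        ForEach-Object { ($_ -split ' ')[8] } |
        Group-Object |
        Sort-Object Count -Descending |
        Select-Object Name, Count
}

# In-memory demonstration lines; replace with Get-Content for a real file.
$demo = @(
    '198.51.100.3 - - [15/Mar/2024:08:00:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "curl/8.5.0" "-"'
    '198.51.100.4 - - [15/Mar/2024:08:00:30 +0000] "GET /faq HTTP/1.1" 404 310 "-" "curl/8.5.0" "-"'
    '203.0.113.8 - - [15/Mar/2024:08:01:05 +0000] "GET /pricing HTTP/1.1" 200 4096 "-" "curl/8.5.0" "-"'
)
$summary = @(Get-StatusSummary -Line $demo)
```

For a file on disk, `Get-StatusSummary -Line (Get-Content my_access.log)` should show mostly 200 responses with a smaller share of redirects and errors, reflecting the weights in the generator script.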

## Why synthetic data?

The sample data in this folder is fully synthetic. IP addresses come from the RFC 5737 documentation ranges (198.51.100.x, 203.0.113.x) and the 10.0.x.x private range, referer domains use the reserved `example.com` namespace, and no real user traffic is represented. This avoids the privacy concerns that can arise with real-world access log datasets. The entries follow the Apache Combined Log Format layout that the tutorial's KQL `parse` transformation expects.
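
If you want to confirm that a log file contains only synthetic addresses, a check along these lines may help. It is a sketch that assumes the three address ranges used by `Generate-SampleAccessLog.ps1`; `Test-SyntheticIP` is a hypothetical helper name.

```powershell
# Hypothetical helper: true if the address is in one of the ranges the generator uses.
function Test-SyntheticIP {
    param([string]$Address)
    $isDocRange = $Address -match '^(198\.51\.100|203\.0\.113)\.\d{1,3}$'
    $isPrivate  = $Address -match '^10\.0\.\d{1,3}\.\d{1,3}$'
    return ($isDocRange -or $isPrivate)
}

# The client IP is the first space-separated field of each log line.
$line = '203.0.113.42 - - [15/Mar/2024:08:00:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "curl/8.5.0" "-"'
$ip = ($line -split ' ')[0]
$isSynthetic = Test-SyntheticIP -Address $ip
```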