193 changes: 193 additions & 0 deletions AGENTS.md
@@ -0,0 +1,193 @@
# AGENTS.md

Coding agent guidelines for the Kagi Bangs repository.

## Project Overview

This repository contains bang definitions for [Kagi Search](https://kagi.com). Bangs are search shortcuts (e.g., `!gh` for GitHub) that redirect queries to specific websites.

- **Primary data files**: `data/bangs.json` (community bangs), `data/kagi_bangs.json` (internal Kagi bangs)
- **Schema**: `data/bangs.schema.json`
- **Language**: Ruby for scripts/tests, JSON for data

## Build/Test Commands

```bash
# Install dependencies
make install
# or: bundle install

# Run all tests
make test
# or: bundle exec rspec

# Run specific test by line number
bundle exec rspec spec/bangs_spec.rb:127

# Run specific test by description pattern
bundle exec rspec -e "doesn't have duplicate bang triggers"

# Health check all bang URLs (slow)
ruby scripts/health_check.rb

# Other utilities
ruby scripts/generate_alt_domains.rb
ruby scripts/taken_region_codes.rb
ruby scripts/deduplicate_bangs.rb
```

## Code Style

### General (EditorConfig)

- **Charset**: UTF-8
- **Line endings**: LF (`\n`)
- **Indent**: 2 spaces (no tabs)
- **Final newline**: Required
- **Trim trailing whitespace**: Yes

### Ruby Style

```ruby
# Require statements at top
require "json"
require "rspec"

# Double quotes for strings
bangs_json = JSON.parse(File.read("data/bangs.json"))

# Method definitions with snake_case
def find_dups(*arr)
  arr.flatten
    .group_by { |element| element }
    .select { |k, v| v.size > 1 }
    .keys
end

# The snippets below are excerpts: `bang`, `mutex`, and `errored`
# come from the surrounding method or script.

# Guard clauses for early returns
return if bang["u"].start_with?("/")

# Conditional assignment
bang["ts"] ||= []

# String interpolation
puts "#{bang["s"]} (#{bang["t"]})"

# Exception handling
begin
  # ... code that may raise ...
rescue => e
  mutex.synchronize { errored << [bang, e] }
end
```

### JSON Bang Object Style

Key order: `s`, `d`, `ad`, `t`, `ts`, `u`, `x`, `c`, `sc`, `skip_tests`, `fmt`

Website names in `s` must follow `WEBSITE_NAMING_SPEC.md`. In short: preserve official brand styling for the base site name, and use `Site Name (Qualifier)` for variants such as filters, sections, scoped entities, users, region variants, and translation pairs.

```json
{
  "s": "Site Name",
  "d": "example.com",
  "ad": "alt-domain.com",
  "t": "trigger",
  "ts": ["alias1", "alias2"],
  "u": "https://example.com/search?q={{{s}}}",
  "x": "^pattern$",
  "c": "Category",
  "sc": "Subcategory",
  "skip_tests": false,
  "fmt": ["url_encode_placeholder"]
}
```
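Because Ruby hashes keep insertion order, a bang can be normalized to the key order listed above before it is written back as JSON. A minimal sketch follows; `KEY_ORDER` and `canonicalize_keys` are hypothetical names, not part of the repository's scripts, and any key outside the documented list is kept at the end in its original order.

```ruby
# Canonical key order for bang objects (hypothetical constant mirroring the list above).
KEY_ORDER = %w[s d ad t ts u x c sc skip_tests fmt].freeze

# Hypothetical helper: returns a copy of `bang` with its keys in canonical order.
# Unknown keys are appended after the known ones, preserving their relative order.
def canonicalize_keys(bang)
  known = KEY_ORDER.select { |key| bang.key?(key) }
  unknown = bang.keys - KEY_ORDER
  (known + unknown).to_h { |key| [key, bang[key]] }
end

bang = {
  "u" => "https://example.com/search?q={{{s}}}",
  "t" => "ex",
  "s" => "Example",
  "d" => "example.com"
}
p canonicalize_keys(bang).keys # => ["s", "d", "t", "u"]
```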

#### Required Fields

| Key | Description |
|-----|-------------|
| `s` | Website name (display name, following `WEBSITE_NAMING_SPEC.md`) |
| `d` | Domain (must match URL host) |
| `t` | Trigger (lowercase, letters/numbers/dashes/periods/underscores only) |
| `u` | URL template with `{{{s}}}` placeholder |
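A rough pre-check for the required fields might look like the sketch below. `required_field_errors` is a hypothetical helper, and the trigger pattern is an assumed reading of the rule in the table: Unicode letters and digits plus `-`, `.`, `_`, with lowercase enforced the way the spec's test does it, by comparing the trigger against its `downcase`. `spec/bangs_spec.rb` remains the authoritative check.

```ruby
# Required keys from the table above.
REQUIRED_KEYS = %w[s d t u].freeze

# Assumed reading of the trigger rule: letters, numbers, dashes, periods,
# and underscores. Lowercase is checked separately via downcase.
TRIGGER_CHARS = /\A[\p{L}\p{N}._-]+\z/

# Hypothetical helper: returns an array of problems; empty means the bang passes.
def required_field_errors(bang)
  errors = REQUIRED_KEYS
    .reject { |key| bang[key].is_a?(String) && !bang[key].empty? }
    .map { |key| "missing or empty required field: #{key}" }

  trigger = bang["t"]
  if trigger.is_a?(String) && !(trigger.match?(TRIGGER_CHARS) && trigger == trigger.downcase)
    errors << "invalid trigger: #{trigger.inspect}"
  end

  errors
end

# Reports the missing d and u fields and the uppercase trigger.
p required_field_errors({ "s" => "Example", "t" => "Ex" })
```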

#### Optional Fields

| Key | Description |
|-----|-------------|
| `ad` | Alternative/snap domain |
| `ts` | Array of additional trigger aliases |
| `x` | Regex pattern for complex query parsing |
| `c` | Category (see README for valid values) |
| `sc` | Subcategory |
| `skip_tests` | Boolean to skip spec tests |
| `fmt` | Array of format flags (exhaustive): `open_base_path`, `open_snap_domain`, `url_encode_placeholder`, `url_encode_space_to_plus` |
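Optional fields with a fixed shape can be type-checked in the same spirit. In this sketch `optional_field_errors` is a hypothetical helper; the allowed `fmt` values are exactly the four listed in the table, while the expected types of `ts` and `skip_tests` follow their descriptions (an array of aliases, a boolean).

```ruby
# The four format flags documented in the table above.
FMT_FLAGS = %w[
  open_base_path
  open_snap_domain
  url_encode_placeholder
  url_encode_space_to_plus
].freeze

# Hypothetical helper: type-checks the optional fields that have a fixed shape.
# Returns an array of problems; empty means the checks passed.
def optional_field_errors(bang)
  errors = []

  if bang.key?("ts") && !(bang["ts"].is_a?(Array) && bang["ts"].all?(String))
    errors << "ts must be an array of strings"
  end

  if bang.key?("skip_tests") && ![true, false].include?(bang["skip_tests"])
    errors << "skip_tests must be a boolean"
  end

  if bang.key?("fmt")
    if bang["fmt"].is_a?(Array)
      unknown = bang["fmt"] - FMT_FLAGS
      errors << "unknown fmt flag(s): #{unknown.join(", ")}" unless unknown.empty?
    else
      errors << "fmt must be an array"
    end
  end

  errors
end

# Flags "lowercase", which is not one of the documented format flags.
p optional_field_errors({ "fmt" => ["url_encode_placeholder", "lowercase"] })
```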

### RSpec Test Patterns

```ruby
describe "bangs.json" do
  it "doesn't have duplicate bang triggers" do
    dups = find_dups(bang_triggers)
    expect(dups).to be_empty, "Duplicate trigger(s) found: #{dups.join(", ")}"
  end

  bangs_json.each do |bang|
    it "trigger should be lowercase (#{bang["s"]})" do
      expect(bang["t"]).to eq(bang["t"].downcase)
    end
  end
end
```

## Data Validation Rules

Tests enforce (see `spec/bangs_spec.rb`):

- No duplicate triggers across `bangs.json` and `kagi_bangs.json`
- No duplicate templates or sites
- Triggers must be lowercase
- Templates must contain exactly one `{{{s}}}` placeholder
- Templates must be HTTPS URLs or paths starting with `/`
- Domains must match the URL template host
- Domains must not be URI-encoded
- Alternative domains (`ad`) must not contain protocol, commas, or spaces
- Regex patterns (`x`) must be valid regex
- Bangs with path-only URLs (`/search`) must have an `ad` field and `d: "kagi.com"`
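Several of these rules can be approximated in plain Ruby, which helps when debugging a single entry before running the full suite. This is a sketch, not the spec's code: `template_errors` is a hypothetical helper, the domain check compares `d` to the template host exactly (the real test may be more lenient), the URI-encoding check only looks for `%`, and `x` is compiled with Ruby's `Regexp`, which may not be the regex engine Kagi uses when applying the pattern.

```ruby
require "uri"

PLACEHOLDER = "{{{s}}}"

# Hypothetical helper approximating some of the validation rules above.
# Returns an array of problems; an empty array means every check passed.
def template_errors(bang)
  errors = []
  template = bang["u"].to_s
  domain = bang["d"].to_s

  count = template.scan(PLACEHOLDER).size
  errors << "expected exactly one #{PLACEHOLDER}, found #{count}" unless count == 1
  errors << "d looks URI-encoded" if domain.include?("%")

  if template.start_with?("/")
    # Path-only templates need d: "kagi.com" and an ad field.
    errors << "path-only template requires d: \"kagi.com\"" unless domain == "kagi.com"
    errors << "path-only template requires an ad field" unless bang.key?("ad")
  elsif template.start_with?("https://")
    begin
      # Substitute the placeholder so URI can parse the template as a URL.
      host = URI.parse(template.gsub(PLACEHOLDER, "test")).host
      errors << "d (#{domain}) does not match host (#{host})" unless host == domain
    rescue URI::InvalidURIError => e
      errors << "unparseable template: #{e.message}"
    end
  else
    errors << "template must be an HTTPS URL or a path starting with /"
  end

  # ad must not contain a protocol, commas, or whitespace.
  errors << "invalid ad value" if bang["ad"].to_s.match?(%r{://|[,\s]})

  if bang["x"]
    begin
      Regexp.new(bang["x"])
    rescue RegexpError => e
      errors << "invalid regex in x: #{e.message}"
    end
  end

  errors
end

p template_errors({ "d" => "example.com", "u" => "http://example.com/?q={{{s}}}" })
```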

## File Structure

```
bangs/
├── data/
│   ├── bangs.json          # Community bangs (edit this)
│   ├── kagi_bangs.json     # Internal Kagi bangs
│   └── bangs.schema.json   # JSON Schema
├── spec/
│   └── bangs_spec.rb       # RSpec tests
├── scripts/                # Utility scripts
├── Gemfile                 # Ruby dependencies
├── Makefile                # Build commands
└── .editorconfig           # Formatting rules
```

## Common Tasks

### Adding a New Bang

1. Add entry to `data/bangs.json` (alphabetical by trigger)
2. Set `s` according to `WEBSITE_NAMING_SPEC.md`
3. Run `bundle exec rspec` to validate
4. Verify the URL works with a test query

### Adding Additional Triggers

Add to the `ts` array instead of creating duplicate entries:

```json
{
  "s": "GitHub",
  "d": "github.com",
  "t": "gh",
  "ts": ["github", "git"],
  "u": "https://github.com/search?q={{{s}}}"
}
```
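Since the spec rejects duplicate triggers across `bangs.json` and `kagi_bangs.json`, it is worth checking a candidate alias against both files first. The sketch assumes aliases in `ts` count as triggers for that check and that both files are already parsed into arrays; `all_triggers` is a hypothetical helper, and the inline data stands in for the real files.

```ruby
require "set"

# Hypothetical helper: collects every trigger and alias across the given bang lists.
def all_triggers(*bang_lists)
  bang_lists.flatten.each_with_object(Set.new) do |bang, seen|
    seen << bang["t"]
    Array(bang["ts"]).each { |alias_trigger| seen << alias_trigger }
  end
end

community = [{ "t" => "gh", "ts" => ["github", "git"] }]
internal = [{ "t" => "k" }]
taken = all_triggers(community, internal)

%w[git gitlab].each do |candidate|
  puts "#{candidate}: #{taken.include?(candidate) ? "taken" : "available"}"
end
```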
139 changes: 139 additions & 0 deletions WEBSITE_NAMING_SPEC.md
@@ -0,0 +1,139 @@
# Website Name (`s` field) Specification

This document defines the naming convention for the `s` (Website Name) field in bang definitions.

## Core Principles

1. **Preserve official brand styling**: Keep the site's own capitalization and spelling where known, such as `eBay`, `iFixit`, `npm`, `dev.to`, `WordReference`, or names in native script
2. **Use Title Case for generic qualifiers**: Capitalize descriptive qualifiers inside parentheses, such as `History`, `Creative Commons`, `Past Day`, or `Orders`
3. **Site name first**: Never lead with qualifier, filter, or feature
4. **Parentheses for variants**: Use `()` consistently for qualifiers, variants, scopes, and filters; do not use dashes or bare suffixes
5. **No redundant "Search"**: Omit "Search" from names unless it is part of the official product name
6. **Prefer user-facing meaning**: Name the bang for what users understand it does, not for raw URL parameters or implementation details

## Format Patterns

| Category | Format | Examples |
|----------|--------|----------|
| **Default** | `Site Name` | `"GitHub"`, `"YouTube"`, `"Amazon"` |
| **Feature/Section** | `Site Name (Feature)` | `"GitHub (Stars)"`, `"GitHub (Notifications)"`, `"YouTube (History)"` |
| **Content Filter** | `Site Name (Filter)` | `"YouTube (Creative Commons)"`, `"YouTube (Past Day)"`, `"YouTube (Long)"` |
| **Item Type** | `Site Name (Type)` | `"YouTube (Video)"`, `"GitHub (Repo)"`, `"Amazon (ASIN)"` |
| **User/Profile** | `Site Name (User)` | `"Reddit (User)"`, `"GitHub (User)"` |
| **Scoped Entity** | `Site Name (Entity)` | `"YouTube (Game Grumps)"`, `"GitHub (NixOS/nixpkgs)"` |
| **Sub-site Community** | `Site Name (/r/name)` | `"Reddit (/r/AskReddit)"`, `"Reddit (/r/ProgrammerHumor)"` |
| **Market/Region Variant** | `Site Name (Region)` | `"Amazon (UK)"`, `"eBay (Germany)"` |
| **Translation Pair** | `Site Name (xx-yy)` | `"Google Translate (de-en)"`, `"DeepL (en-fr)"`, `"DeepL (pt-en)"` |
| **Auto-Detect Translation** | `Site Name (auto-yy)` | `"Google Translate (auto-en)"`, `"DeepL (auto-de)"` |
| **Kagi-Routed** | `Site Name (Kagi)` | `"4chan (Kagi)"`, `"Ada 2005 Manual (Kagi)"` |
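Every pattern in the table shares one shape: a site name, optionally followed by a single parenthesized qualifier. That shape can be checked with a regular expression. A minimal sketch with a hypothetical `split_name` helper; it checks shape only, so a name like `"YouTube - long"` still passes as a bare site name and has to be caught by review against the mistakes listed later in this document.

```ruby
# Matches "Site Name" or "Site Name (Qualifier)". The site part may not contain
# parentheses, and the qualifier must be the last thing in the string.
NAME_FORMAT = /\A(?<site>[^()]+?)(?: \((?<qualifier>[^()]+)\))?\z/

# Hypothetical helper: returns [site, qualifier] (qualifier may be nil), or nil
# when the name does not follow either pattern.
def split_name(name)
  match = NAME_FORMAT.match(name)
  match && [match[:site], match[:qualifier]]
end

p split_name("Reddit (/r/AskReddit)") # => ["Reddit", "/r/AskReddit"]
p split_name("GitHub")                # => ["GitHub", nil]
p split_name("(User) GitHub")         # => nil
```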

## Language and Region Rules

- **Translation codes**: Use lowercase language codes in source-target form: `xx-yy`
- **Auto-detect source**: Use `auto-yy` for automatic source language detection
- **Script or regional language variants**: Keep the canonical code when needed, such as `zh-CN` or `pt-BR`
- **Normalize legacy codes**: Prefer modern standard codes in names, such as `he` instead of `iw`
- **Do not spell out language names**: Use `de-en`, not `German to English`
- **Region or market variants**: Use a stable, human-readable region label such as `US`, `UK`, `Germany`, or `Japan`; do not mix region codes and country names within the same site family
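The translation rules reduce to a small grammar: an `auto` or language-code source, a language-code target, and an optional region suffix on either code. A sketch with hypothetical names (`PAIR_FORMAT`, `pair_errors`); it assumes region suffixes are two uppercase letters like `CN` or `BR`, so script subtags such as `zh-Hant` would need an extra rule, and it only knows the one legacy mapping (`iw` to `he`) named above.

```ruby
# Assumed code shape: 2-3 lowercase letters, optionally followed by a
# two-letter uppercase region such as "CN" or "BR" (covers zh-CN, pt-BR).
LANG_CODE = /[a-z]{2,3}(?:-[A-Z]{2})?/
# Source is a language code or "auto"; target is always a language code.
PAIR_FORMAT = /\A(?<source>auto|#{LANG_CODE})-(?<target>#{LANG_CODE})\z/

# Legacy codes to normalize; the spec names iw -> he, extend as needed.
LEGACY_CODES = { "iw" => "he" }.freeze

# Hypothetical helper: checks the qualifier of a translation bang, e.g. the
# "de-en" in "Google Translate (de-en)". Returns an array of problems.
def pair_errors(qualifier)
  match = PAIR_FORMAT.match(qualifier)
  return ["not in xx-yy or auto-yy form: #{qualifier}"] unless match

  [match[:source], match[:target]].filter_map do |code|
    language = code.split("-").first
    "use #{LEGACY_CODES[language]} instead of #{language}" if LEGACY_CODES.key?(language)
  end
end

p pair_errors("zh-CN-en") # => []
p pair_errors("iw-en")    # => ["use he instead of iw"]
```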

## Variant Model

Use this simple model whenever possible:

- Base bang: `Site`
- Variant bang: `Site (Qualifier)`
- Scoped entity bang: `Site (Entity)` or `Site (/r/Entity)` when the slash form is meaningful to users

Prefer a single qualifier over stacked qualifiers. If multiple qualifiers could apply, create separate bangs instead of combining them.

If combining is truly necessary, order qualifiers as follows:

1. Scope or entity
2. Feature or section
3. Filter or type
4. Region or locale
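When stacking cannot be avoided, the ordering above is a simple ranking. The spec does not say how stacked qualifiers are written inside the name, so this hypothetical `order_qualifiers` sketch only returns the labels in order and leaves formatting to the caller.

```ruby
# Rank of each qualifier kind, in the order listed above.
QUALIFIER_RANK = {
  scope: 0, entity: 0,
  feature: 1, section: 1,
  filter: 2, type: 2,
  region: 3, locale: 3
}.freeze

# Hypothetical helper: takes [kind, label] pairs and returns the labels in the
# documented order. Ties keep their input order (index used as a tiebreaker).
def order_qualifiers(qualifiers)
  qualifiers
    .each_with_index
    .sort_by { |(kind, _label), index| [QUALIFIER_RANK.fetch(kind), index] }
    .map { |(_kind, label), _index| label }
end

p order_qualifiers([[:region, "UK"], [:filter, "Past Day"], [:scope, "Game Grumps"]])
# => ["Game Grumps", "Past Day", "UK"]
```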

## Common Mistakes to Avoid

| Don't | Do Instead |
|-------|------------|
| `"YouTube - long"` | `"YouTube (Long)"` |
| `"Google Translate da-en"` | `"Google Translate (da-en)"` |
| `"Translate English to Danish"` | `"Google Translate (en-da)"` |
| `"deepl.com"` | `"DeepL (fr-en)"` |
| `"reddit.com/r/GlobalOffensive"` | `"Reddit (/r/GlobalOffensive)"` |
| `"GitHub Code Search"` | `"GitHub (Code)"` |
| `"Amazon.com order history"` | `"Amazon (Orders)"` |
| `"101 Domain"` and `"101domain"` | Pick one: `"101 Domain"` |
| `"/r/AskReddit"` | `"Reddit (/r/AskReddit)"` |
| `"r/ADHD"` | `"Reddit (/r/ADHD)"` |
| `"GitHub User"` | `"GitHub (User)"` |
| `"YouTube Video"` | `"YouTube (Video)"` |
| `"Google Translate (de2fr)"` | `"Google Translate (de-fr)"` |
| `"Google Translate (to Arabic)"` | `"Google Translate (auto-ar)"` |
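Several of these mistakes have a recognizable surface form, so a quick lint pass can surface candidates for review. The rules below are hypothetical heuristics derived from the table, not an existing script, and they report warnings rather than errors because some official brand names do contain a domain suffix or the word "Search". Rows such as `"GitHub User"` still need a human decision.

```ruby
# Hypothetical lint rules derived from the "Don't" column above. Each pattern
# flags a likely mistake; matches are warnings for review, not hard errors.
NAME_LINT_RULES = [
  [/ - /, "use parentheses, not a dash: Site (Qualifier)"],
  [%r{\A/?r/}, "name subreddits as Reddit (/r/Name)"],
  [/\.(com|org|net)\b/i, "use the site's brand name, not its domain"],
  [/\bSearch\z/, "drop \"Search\" unless it is part of the product name"],
  [/ [a-z]{2}-[a-z]{2}\z/, "put the language pair in parentheses"],
  [/\(\w{2}2\w{2}\)/, "write language pairs as xx-yy"],
  [/\(to [A-Z]/, "write auto-detected pairs as auto-yy"]
].freeze

# Returns the hints whose pattern matches `name` (empty when nothing is flagged).
def name_warnings(name)
  NAME_LINT_RULES.filter_map { |pattern, hint| hint if name.match?(pattern) }
end

["YouTube - long", "GitHub Code Search", "Reddit (/r/AskReddit)"].each do |name|
  puts "#{name}: #{name_warnings(name).join("; ")}"
end
```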

## Examples by Site

### GitHub
```json
{ "s": "GitHub" }
{ "s": "GitHub (Code)" }
{ "s": "GitHub (Stars)" }
{ "s": "GitHub (Notifications)" }
{ "s": "GitHub (Trending)" }
{ "s": "GitHub (Topic)" }
{ "s": "GitHub (User)" }
{ "s": "GitHub (Repo)" }
{ "s": "GitHub (Private)" }
{ "s": "GitHub (JavaScript)" }
{ "s": "GitHub (NixOS/nixpkgs)" }
```

### YouTube
```json
{ "s": "YouTube" }
{ "s": "YouTube (Video)" }
{ "s": "YouTube (Playlists)" }
{ "s": "YouTube (Creative Commons)" }
{ "s": "YouTube (Past Day)" }
{ "s": "YouTube (Long)" }
{ "s": "YouTube (DE)" }
{ "s": "YouTube (US)" }
{ "s": "YouTube (Channel)" }
{ "s": "YouTube (History)" }
{ "s": "YouTube (Game Grumps)" }
```

### Reddit
```json
{ "s": "Reddit" }
{ "s": "Reddit (User)" }
{ "s": "Reddit (/r/AskReddit)" }
{ "s": "Reddit (/r/ProgrammerHumor)" }
{ "s": "Reddit (/r/GlobalOffensive)" }
{ "s": "Reddit (Subreddits)" }
```

### Translation Services
```json
{ "s": "Google Translate (de-en)" }
{ "s": "Google Translate (en-ja)" }
{ "s": "Google Translate (fr-en)" }
{ "s": "Google Translate (auto-ar)" }
{ "s": "Google Translate (zh-CN-en)" }
{ "s": "DeepL (en-de)" }
{ "s": "DeepL (en-fr)" }
{ "s": "DeepL (pt-en)" }
```

### Amazon
```json
{ "s": "Amazon" }
{ "s": "Amazon (ASIN)" }
{ "s": "Amazon (Books)" }
{ "s": "Amazon (Kindle)" }
{ "s": "Amazon (Prime Video)" }
{ "s": "Amazon (Orders)" }
{ "s": "Amazon (Fresh)" }
{ "s": "Amazon (Automotive)" }
```