|
| 1 | +--- |
| 2 | +myst: |
| 3 | +  html_meta: |
| 4 | +    description: Learn how to enable Botstopper, choose an AI policy, and write custom |
| 5 | +      Botstopper policies on Hypernode. |
| 6 | +    title: How to use Botstopper on Hypernode | Hypernode |
| 7 | +--- |
| 8 | + |
| 9 | +# How to Use Botstopper on Hypernode |
| 10 | + |
| 11 | +Bot traffic has changed. Some bots still identify themselves with clear user agents, but many scrapers now rotate through large sets of user agents or pretend to be a normal Chrome browser. That makes simple user-agent blocking less reliable. |
| 12 | + |
| 13 | +Botstopper gives you control over this traffic before it reaches Magento, Shopware, or another application. It checks incoming requests and decides whether they should be allowed, blocked, or challenged. Botstopper is the commercial derivative of the open source project [Anubis](https://anubis.techaro.lol/). |
| 14 | + |
| 15 | +Use Botstopper when bots cause high load, scrape content, crawl expensive layered navigation URLs, or ignore your `robots.txt` file. Botstopper also lets each merchant choose how strict they want to be with AI crawlers and AI clients. |
| 16 | + |
| 17 | +```{tip} |
| 18 | +For more background on bot traffic and Magento performance, see [How to Fix Performance Issues Caused by Bots and Crawlers](../../best-practices/performance/how-to-fix-performance-issues-caused-by-bots-and-crawlers.md). |
| 19 | +``` |
| 20 | + |
| 21 | +## Enable Botstopper |
| 22 | + |
| 23 | +Botstopper is disabled by default on a Hypernode. Enable it with: |
| 24 | + |
| 25 | +```bash |
| 26 | +hypernode-systemctl settings botstopper_enabled True |
| 27 | +``` |
| 28 | + |
| 29 | +Disable it again with: |
| 30 | + |
| 31 | +```bash |
| 32 | +hypernode-systemctl settings botstopper_enabled False |
| 33 | +``` |
| 34 | + |
| 35 | +## Configure Botstopper Per Vhost |
| 36 | + |
| 37 | +Botstopper is enabled per vhost by default. This means that when you enable Botstopper at the Hypernode level, it becomes active for all managed vhosts, except for vhosts where you have disabled it. See [Hypernode Managed Vhosts](../nginx/hypernode-managed-vhosts.md) for more information about vhost configuration. |
| 38 | + |
| 39 | +Disable Botstopper for one vhost with: |
| 40 | + |
| 41 | +```bash |
| 42 | +hypernode-manage-vhosts example.com --disable-botstopper |
| 43 | +``` |
| 44 | + |
| 45 | +Enable it again for that vhost with: |
| 46 | + |
| 47 | +```bash |
| 48 | +hypernode-manage-vhosts example.com --botstopper |
| 49 | +``` |
| 50 | + |
| 51 | +## Choose an AI Policy |
| 52 | + |
| 53 | +Botstopper has three AI policies. The default policy is `aggressive`. |
| 54 | + |
| 55 | +```bash |
| 56 | +hypernode-systemctl settings botstopper_ai_policy aggressive |
| 57 | +hypernode-systemctl settings botstopper_ai_policy moderate |
| 58 | +hypernode-systemctl settings botstopper_ai_policy permissive |
| 59 | +``` |
| 60 | + |
| 61 | +| Policy | Behavior | |
| 62 | +| ------------ | ---------------------------------------------------------------------------------------------------------------- | |
| 63 | +| `aggressive` | Blocks AI training crawlers, AI search crawlers, and AI clients as much as possible. | |
| 64 | +| `moderate` | Blocks AI training crawlers and unknown AI bots. Allows documented AI search bots and user-triggered AI clients. | |
| 65 | +| `permissive` | Allows documented AI bots. Blocks unknown AI-style bots. | |
| 66 | + |
| 67 | +Use `aggressive` if you want the strictest AI blocking. Use `moderate` if you want to block AI training while keeping documented AI search and user tools working. Use `permissive` if you only want to block unclear or undocumented AI crawlers. |
| 68 | + |
| 69 | +Some AI crawlers also require `robots.txt` rules before they respect your opt-out. Botstopper blocks requests at the webserver layer, but `robots.txt` is still useful for crawlers that require policy signals there. See the [Magento 1 robots.txt](../../ecommerce-applications/magento-1/how-to-create-a-robots-txt-for-your-magento-1-shop.md) or [Magento 2 robots.txt](../../ecommerce-applications/magento-2/how-to-create-a-robots-txt-for-magento-2-x.md) articles if you need to configure one. |
| 70 | + |
| 71 | +## How Botstopper Handles Requests |
| 72 | + |
| 73 | +Botstopper evaluates policy rules from top to bottom. A rule can allow, deny, challenge, or weigh a request. |
| 74 | + |
| 75 | +| Action | What happens | |
| 76 | +| ----------- | ------------------------------------------------------------ | |
| 77 | +| `ALLOW` | The request is sent to your shop immediately. | |
| 78 | +| `DENY` | The request is blocked with HTTP `403`. | |
| 79 | +| `CHALLENGE` | The visitor receives a browser challenge. | |
| 80 | +| `WEIGH` | Suspicion points are added or removed. Evaluation continues. | |
| 81 | + |
| 82 | +`ALLOW`, `DENY`, and `CHALLENGE` stop evaluation immediately. The first matching rule wins. |
| 83 | + |
| 84 | +`WEIGH` does not stop evaluation. Multiple `WEIGH` rules can match the same request. After all rules are checked, Botstopper uses the final weight to decide whether the request should be allowed or challenged. |
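| | + |
| | +The following hypothetical rules illustrate this order. The rule names, paths, and weight value are made up for illustration and are not part of the standard Hypernode policy: |
| | + |
| | +```yaml |
| | +# Hypothetical rules, evaluated from top to bottom. |
| | +- name: weigh-generic-bot-ua |
| | +  action: WEIGH # adds weight, evaluation continues |
| | +  user_agent_regex: (?i:bot) |
| | +  weight: |
| | +    adjust: 10 |
| | +- name: deny-wordpress-probes |
| | +  action: DENY # terminal: evaluation stops on a match |
| | +  path_regex: ^/wp-(admin|login) |
| | +- name: allow-health-check |
| | +  action: ALLOW # terminal: evaluation stops on a match |
| | +  path_regex: ^/health$ |
| | +``` |
| | + |
| | +Take a request from `ExampleBot/1.0` to `/wp-login.php`. It gets 10 points from the first rule, then the second rule denies it with HTTP `403`, so the third rule is never evaluated. The same user agent requesting `/health` also gets 10 points, but the `ALLOW` rule then sends it to your shop and the weight is not used. A request that matches none of the terminal rules continues down the policy, and its final weight decides whether it is allowed or challenged. |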
| 85 | + |
| 86 | +Challenge responses use HTTP `200`. This is intentional. Many aggressive scraper bots stop retrying once they receive a `200` response. |
| 87 | + |
| 88 | +## Standard Hypernode Policies |
| 89 | + |
| 90 | +Hypernode ships Botstopper with a standard policy that keeps important services working and blocks common abusive traffic. |
| 91 | + |
| 92 | +The standard policy does the following: |
| 93 | + |
| 94 | +1. Allows Hypernode platform services, payment providers, monitoring tools, and common e-commerce integrations. |
| 95 | +1. Allows IP addresses on the Hypernode WAF allowlist. |
| 96 | +1. Runs your custom pre-policy from `/data/web/botstopper/pre.policy.yml`. |
| 97 | +1. Denies sensitive Magento media paths, such as `/media/customer/`, `/media/import/`, and `/media/downloadable/`. |
| 98 | +1. Allows storefront assets, such as files under `/static/` and regular `/media/` files. |
| 99 | +1. Denies or weighs known bad bots, headless browsers, abusive cloud ranges, and suspicious HTTP clients. |
| 100 | +1. Applies the configured AI policy. |
| 101 | +1. Allows known good search engine crawlers when they are verified by IP ranges or reverse DNS. |
| 102 | +1. Allows common public files, such as `robots.txt`, `sitemap.xml`, `favicon.ico`, and `.well-known` paths. |
| 103 | +1. Adds suspicion weight for some high-risk countries, networks, and browser-like user agents. |
| 104 | +1. Runs your custom post-policy from `/data/web/botstopper/post.policy.yml`. |
| 105 | +1. Uses the final suspicion weight to allow or challenge the request. |
| 106 | + |
| 107 | +The WAF allowlist is shared with the Hypernode firewall allowlist. Botstopper allows these IPs before your custom `pre.policy.yml` and before the standard deny and challenge rules. |
| 108 | + |
| 109 | +Add a trusted IP to the WAF allowlist with: |
| 110 | + |
| 111 | +```bash |
| 112 | +hypernode-systemctl whitelist add waf 1.2.3.4 --description "Office IP" |
| 113 | +``` |
| 114 | + |
| 115 | +View the WAF allowlist with: |
| 116 | + |
| 117 | +```bash |
| 118 | +hypernode-systemctl whitelist get --type waf |
| 119 | +``` |
| 120 | + |
| 121 | +See [How to allowlist FTP, WAF and database](../../best-practices/firewall/ftp-waf-database-allowlist.md) for more details. |
| 122 | + |
| 123 | +The order matters. For example, a broad `DENY` rule in `pre.policy.yml` can block a good crawler before the standard verified crawler allow rules are reached. |
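| | + |
| | +As a sketch, compare a broad `DENY` rule with a narrow one. `ExampleScraper` is a hypothetical user agent; use the one you actually see in your logs: |
| | + |
| | +```yaml |
| | +# Too broad for pre.policy.yml (commented out): "bot" also matches Googlebot |
| | +# and bingbot, so they would be denied before the standard verified-crawler |
| | +# allow rules are reached. |
| | +# - name: deny-all-bots |
| | +#   action: DENY |
| | +#   user_agent_regex: (?i:bot) |
| | + |
| | +# Narrow: only the one scraper you have identified. |
| | +- name: deny-example-scraper |
| | +  action: DENY |
| | +  user_agent_regex: ^ExampleScraper/ |
| | +``` |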
| 124 | + |
| 125 | +## Write Custom Policies |
| 126 | + |
| 127 | +You can add your own rules in these files: |
| 128 | + |
| 129 | +| File | When it runs | Use it for | |
| 130 | +| -------------------------------------- | ---------------------------------------------------- | ----------------------------------------------------------------- | |
| 131 | +| `/data/web/botstopper/pre.policy.yml` | Before most standard deny and challenge rules | Allowing trusted traffic or blocking very specific traffic early. | |
| 132 | +| `/data/web/botstopper/post.policy.yml` | After standard rules, before final weight thresholds | Adding suspicion weight or handling fallback cases. | |
| 133 | + |
| 134 | +Both files contain a YAML list of policy rules. An empty file looks like this: |
| 135 | + |
| 136 | +```yaml |
| 137 | +[] |
| 138 | +``` |
| 139 | + |
| 140 | +Edit the files as the `app` user. For example: |
| 141 | + |
| 142 | +```bash |
| 143 | +sensible-editor /data/web/botstopper/pre.policy.yml |
| 144 | +``` |
| 145 | + |
| 146 | +After changing a policy file, restart Botstopper: |
| 147 | + |
| 148 | +```bash |
| 149 | +hypernode-servicectl restart techaro-botstopper@default.service |
| 150 | +``` |
| 151 | + |
| 152 | +## Policy Conditions |
| 153 | + |
| 154 | +A policy rule can match on request details, such as the client IP, user agent, path, or headers. |
| 155 | + |
| 156 | +Common fields are: |
| 157 | + |
| 158 | +| Field | Checks | |
| 159 | +| ------------------ | ----------------------------------------------------- | |
| 160 | +| `remote_addresses` | Client IP address against CIDR ranges. | |
| 161 | +| `user_agent_regex` | The `User-Agent` header against a regular expression. | |
| 162 | +| `path_regex` | The request path against a regular expression. | |
| 163 | +| `headers_regex` | Request headers against regular expressions. | |
| 164 | +| `expression` | A custom expression for advanced matching. | |
| 165 | + |
| 166 | +When a rule has multiple conditions, all conditions must match. This is useful for trusted allow rules. For example, you can allow a monitoring tool only when both its user agent and source IP match. |
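| | + |
| | +For example, this hypothetical rule challenges scripted HTTP clients only on the search pages. The user agents and path are illustrative: |
| | + |
| | +```yaml |
| | +- name: challenge-scripted-search |
| | +  action: CHALLENGE |
| | +  user_agent_regex: (?i:curl|python-requests) |
| | +  path_regex: ^/catalogsearch/ |
| | +``` |
| | + |
| | +A `curl` request to `/catalogsearch/result/?q=shoes` matches both conditions and is challenged. The same client requesting `/checkout/` does not match because the path condition fails. A regular browser on `/catalogsearch/result/` does not match either, because the user agent condition fails. |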
| 167 | + |
| 168 | +## Examples |
| 169 | + |
| 170 | +Allow a trusted monitoring service: |
| 171 | + |
| 172 | +```yaml |
| 173 | +- name: allow-my-monitor |
| 174 | +  action: ALLOW |
| 175 | +  user_agent_regex: MyMonitor |
| 176 | +  remote_addresses: |
| 177 | +    - 203.0.113.10/32 |
| 178 | +``` |
| 179 | + |
| 180 | +Block a specific bot: |
| 181 | + |
| 182 | +```yaml |
| 183 | +- name: block-bad-bot |
| 184 | +  action: DENY |
| 185 | +  user_agent_regex: BadBot |
| 186 | +``` |
| 187 | + |
| 188 | +Challenge traffic to an expensive search page: |
| 189 | + |
| 190 | +```yaml |
| 191 | +- name: challenge-suspicious-search |
| 192 | +  action: CHALLENGE |
| 193 | +  path_regex: ^/catalogsearch/result/.* |
| 194 | +``` |
| 195 | + |
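| 196 | +Add suspicion weight for bots crawling layered navigation URLs. In a default Magento setup, layered navigation filters such as `?color=49` are part of the query string rather than the request path, so this rule checks them with a CEL expression on `query`: |
| 197 | + |
| 198 | +```yaml |
| 199 | +- name: weigh-layered-navigation-bots |
| 200 | +  action: WEIGH |
| 201 | +  user_agent_regex: (?i:bot|crawler|spider) |
| 202 | +  expression: |
| 203 | +    any: |
| 204 | +      - '"color" in query' |
| | +      - '"size" in query' |
| | +      - '"brand" in query' |
| | +  weight: |
| | +    adjust: 20 |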
| 205 | +``` |
| 206 | + |
| 207 | +Protect a trusted integration from a broader custom rule: |
| 208 | + |
| 209 | +```yaml |
| 210 | +- name: allow-partner-feed |
| 211 | +  action: ALLOW |
| 212 | +  path_regex: ^/partner/feed/.* |
| 213 | +  user_agent_regex: PartnerFeedClient |
| 214 | +  remote_addresses: |
| 215 | +    - 198.51.100.0/24 |
| 216 | +``` |
| 217 | + |
| 218 | +Allow JSON API requests, using [CEL expressions](https://anubis.techaro.lol/docs/admin/configuration/expressions): |
| 219 | + |
| 220 | +```yaml |
| 221 | +- name: allow-api-requests |
| 222 | +  action: ALLOW |
| 223 | +  expression: |
| 224 | +    all: |
| 225 | +      - '"Accept" in headers' |
| 226 | +      - 'headers["Accept"] == "application/json"' |
| 227 | +      - 'path.startsWith("/api/")' |
| 228 | +``` |
| 229 | + |
| 230 | +You usually do not need allow rules for API or webhook traffic. Botstopper allows traffic by default. Use an `ALLOW` rule when you already have, or plan to add, a broader custom rule that could otherwise challenge or block this trusted traffic. |
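| | + |
| | +Here is a minimal sketch for `pre.policy.yml`. It assumes a hypothetical ERP integration that calls the Magento REST API from `198.51.100.20` with a generic `python-requests` user agent, and that you want to challenge all other `python-requests` traffic: |
| | + |
| | +```yaml |
| | +# The narrow ALLOW comes first: the first matching ALLOW, DENY, or CHALLENGE rule wins. |
| | +- name: allow-erp-sync |
| | +  action: ALLOW |
| | +  path_regex: ^/rest/ |
| | +  user_agent_regex: ^python-requests/ |
| | +  remote_addresses: |
| | +    - 198.51.100.20/32 |
| | +- name: challenge-python-requests |
| | +  action: CHALLENGE |
| | +  user_agent_regex: ^python-requests/ |
| | +``` |
| | + |
| | +If the two rules were swapped, the `CHALLENGE` rule would match the integration's requests before the `ALLOW` rule is reached. If the integration only needs an exception based on its source IP, the WAF allowlist is usually the simpler option. |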
| 231 | + |
| 232 | +## Logging |
| 233 | + |
| 234 | +The Botstopper service logs to `/var/log/botstopper/botstopper.log`. The log file uses the [JSON Lines](https://jsonlines.org/) format, which means that each line is a separate JSON value that can be parsed on its own. |
| 235 | + |
| 236 | +Pretty-print the entire log file with `jq`: |
| 237 | + |
| 238 | +```bash |
| 239 | +jq . /var/log/botstopper/botstopper.log |
| 240 | +``` |
| 241 | + |
| 242 | +Or follow the log file as new entries are written: |
| 243 | + |
| 244 | +```bash |
| 245 | +tail -f /var/log/botstopper/botstopper.log | jq . |
| 246 | +``` |
| 247 | + |
| 248 | +## Safe Policy Changes |
| 249 | + |
| 250 | +Use specific rules whenever possible. Broad user-agent rules can block legitimate crawlers or integrations. |
| 251 | + |
| 252 | +Prefer `WEIGH` when you are not fully sure traffic should be blocked. A `WEIGH` rule lets Botstopper combine multiple signals before it challenges the request. |
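| | + |
| | +For example, instead of denying requests on a single weak signal, these hypothetical rules each add a small amount of weight. The signals and values are illustrative assumptions, not recommended settings: |
| | + |
| | +```yaml |
| | +- name: weigh-scripted-user-agent |
| | +  action: WEIGH |
| | +  user_agent_regex: (?i:python-requests|go-http-client) |
| | +  weight: |
| | +    adjust: 5 |
| | +- name: weigh-missing-accept-language |
| | +  action: WEIGH |
| | +  expression: |
| | +    all: |
| | +      - '!("Accept-Language" in headers)' |
| | +  weight: |
| | +    adjust: 5 |
| | +``` |
| | + |
| | +A request that matches both rules gets 10 extra points, and a request that matches only one gets 5. Only the total weight is compared against the final thresholds, so you can tune how much each signal contributes instead of blocking on any one of them. |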
| 253 | + |
| 254 | +Use `ALLOW` with both a user agent and an IP range for trusted services when possible. User agents can be spoofed; IP ranges are harder to fake. |
| 255 | + |
| 256 | +Use the WAF allowlist for trusted source IPs that should always bypass Botstopper checks. This is often better than maintaining your own IP allow rule in `pre.policy.yml`. |
| 257 | + |
| 258 | +Keep custom `DENY` rules narrow. A broad `DENY` rule in `pre.policy.yml` can override the standard Hypernode allow rules that run later. |
| 259 | + |
| 260 | +Do not add allow rules for every API endpoint or webhook. Add them when a specific Botstopper rule would otherwise match that traffic. |
| 261 | + |
| 262 | +## Anubis Documentation |
| 263 | + |
| 264 | +Because Botstopper is the commercial derivative of Anubis, the Anubis documentation is a useful reference when you want to understand the underlying concepts or write advanced custom policies: |
| 265 | + |
| 266 | +- [How Anubis works](https://anubis.techaro.lol/docs/design/how-anubis-works) |
| 267 | +- [Anubis policies](https://anubis.techaro.lol/docs/admin/policies/) |
| 268 | +- [Anubis policy thresholds](https://anubis.techaro.lol/docs/admin/configuration/thresholds) |