Commit 4fc0f2a

Merge branch 'master' into akeneo-3-staging-guide-fixes
2 parents a4afb41 + afdc760 commit 4fc0f2a

12 files changed

Lines changed: 355 additions & 7 deletions

docs/_static/css/main.css

Lines changed: 1 addition & 1 deletion

docs/_static/scss/components/defaults/_module.scss

Lines changed: 14 additions & 0 deletions
@@ -25,6 +25,20 @@ p {
2525
}
2626
}
2727

28+
.rst-content table.docutils thead th,
29+
.rst-content table.docutils thead td,
30+
.rst-content table.field-list thead th,
31+
.rst-content table.field-list thead td,
32+
.wy-table thead th,
33+
.wy-table thead td {
34+
p,
35+
a {
36+
font-size: inherit;
37+
font-weight: inherit;
38+
line-height: inherit;
39+
}
40+
}
41+
2842
a {
2943
text-decoration: none;
3044

docs/best-practices/firewall/ftp-waf-database-allowlist.md

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ Follow these steps to whitelist an IP addresses for FTP:
2626

2727
The `hypernode-systemctl whitelist` command allows you to manage allowlist entries for different services on your Hypernode. You can use it to add, remove, or list allowlist entries for FTP, WAF, database, and SSH.
2828

29+
When [Botstopper](../../hypernode-platform/botstopper/how-to-use-botstopper.md) is enabled, IP addresses on the WAF allowlist are also allowed by Botstopper before custom and standard Botstopper deny or challenge rules are evaluated.
30+
2931
### Command structure
3032

3133
```

docs/best-practices/performance/how-to-fix-performance-issues-caused-by-bots-and-crawlers.md

Lines changed: 3 additions & 2 deletions
@@ -20,7 +20,7 @@ Among the thousands of shops on our platform, **excessive bot traffic is the num
2020

2121
Layered navigation pages do not require crawling, and in fact, their indexation could **produce a penalty** for your search engine ranking, as it produces a lot of duplicate content. So you are advised to resolve this, both for performance and SEO reasons.
2222

23-
With these four measures, you will resolve this situation completely.
23+
With these four measures, you will resolve this situation completely. If you want Hypernode to challenge, deny, or allow bot traffic before it reaches your shop, see [How to Use Botstopper on Hypernode](../../hypernode-platform/botstopper/how-to-use-botstopper.md).
2424

2525
# How to Block Abusive Bots (If Any)
2626

@@ -35,7 +35,7 @@ app@abcdef-example-magweb-cmbl:~$ pnl --yesterday --php --bots --fields ua | sor
3535

3636
```
3737

38-
In this example, there were almost 4K Bingbot pageviews, 2K Google pageviews and almost 13K MegaIndex pageviews. So you could eliminate a large chunk of load by blocking MegaIndex (a shady crawler whose benefits to you are disputable). [Here](../../hypernode-platform/nginx/how-to-block-user-agents-and-referrer-sites.md) are instructions on blocking specific bots on Hypernode.
38+
In this example, there were almost 4K Bingbot pageviews, 2K Google pageviews and almost 13K MegaIndex pageviews. So you could eliminate a large chunk of load by blocking MegaIndex (a shady crawler whose benefits to you are disputable). You can use [Botstopper](../../hypernode-platform/botstopper/how-to-use-botstopper.md) for policy-based bot handling, or use [Nginx rules](../../hypernode-platform/nginx/how-to-block-user-agents-and-referrer-sites.md) to block specific user agents yourself.
3939

4040
# How to Block Bot Access to Layered Navigation
4141

@@ -89,3 +89,4 @@ Make sure all URLs in the layered navigation have “nofollow” in its links. H
8989

9090
- [How to Block Specific Countries From Accessing Your Shop](../../hypernode-platform/nginx/how-to-block-your-webshop-for-specific-countries.md)
9191
- [How to Resolve 429 Too Many Requests](../../hypernode-platform/nginx/how-to-resolve-rate-limited-requests-429-too-many-requests.md)
92+
- [How to Use Botstopper on Hypernode](../../hypernode-platform/botstopper/how-to-use-botstopper.md)

docs/ecommerce-applications/woocommerce/how-to-use-redis-with-woocommerce-and-wordpress-on-hypernode.md

Lines changed: 30 additions & 4 deletions
@@ -11,18 +11,44 @@ myst:
1111

1212
Remote Dictionary Server (Redis) is an in-memory, persistent, key-value database known as a data structure server. Unlike similar servers, Redis can store and manipulate high-level data types such as lists, maps, sets, and sorted sets.
1313

14-
By storing important data in its memory, Redis ensures fast data retrieval, significantly boosting performance and reducing response times.
14+
Because Redis stores data in memory, it can return frequently requested data very quickly. This can improve the performance of WordPress and WooCommerce by reducing database load and speeding up response times.
1515

1616
## Which Plugins Can We Use for Redis in WordPress/WooCommerce?
1717

1818
There are several plugins available for Redis. The two most commonly used are [Redis Object Cache](https://wordpress.org/plugins/redis-cache/) and [W3 Total Cache](https://wordpress.org/plugins/w3-total-cache/).
1919

2020
Due to the complexity of the cache module in "W3 Total Cache" and the possibility that you may already be using other cache plugins, we recommend the "Redis Object Cache" plugin.
2121

22+
## How to set a TTL on Redis keys
23+
24+
By default, most Redis plugins for WordPress do not set a TTL (time to live) on keys stored in Redis. This means cached keys may remain in memory indefinitely, which can eventually fill up Redis memory and lead to performance issues or downtime.
25+
26+
To set a TTL for all keys stored in Redis, add the following lines to your wp-config.php file:
27+
28+
```php
29+
define('WP_REDIS_PREFIX', 'example');
30+
define('WP_REDIS_MAXTTL', '900');
31+
define('WP_REDIS_SELECTIVE_FLUSH', true);
32+
```
33+
34+
```{important}
35+
Be sure to change the example prefix to a unique name for your application so Redis keys do not get mixed up when Redis is used by multiple applications on the same Hypernode.
36+
```
37+
38+
### Explanation of the wp-config.php options
39+
40+
- **WP_REDIS_PREFIX** adds a clear prefix to your Redis keys. This helps prevent key collisions, especially when multiple applications use Redis.
41+
- **WP_REDIS_MAXTTL** sets a maximum lifetime for cached items, in this example 900 seconds.
42+
- **WP_REDIS_SELECTIVE_FLUSH** ensures that only keys related to this WordPress installation are flushed, instead of clearing the entire Redis database.
43+
2244
## How to Install Redis Object Cache
2345

24-
Redis is already active on the server on port `6379`.
46+
Redis is already available on Hypernode and listens on port 6379.
47+
48+
Install the Redis Object Cache plugin through the WordPress Dashboard or with Composer. For general plugin installation steps, see the standard WordPress plugin installation documentation.
49+
50+
After installing and activating the plugin, go to Settings -> Redis, or to Network Admin -> Settings -> Redis on Multisite networks.
2551

26-
Next, install the Redis Object Cache plugin via the WordPress Dashboard or using Composer. For detailed installation instructions, please refer to the standard installation procedure for WordPress plugins.
52+
Enable object caching and verify that the plugin connects to Redis automatically.
2753

28-
After installing and activating the plugin, navigate to `WordPress` -> `Settings` -> `Redis` or `Network Admin` -> `Settings` -> `Redis on Multisite networks`. Enable the cache and check if the plugin can connect automatically.
54+
If the plugin does not connect automatically, check whether Redis is reachable on 127.0.0.1:6379 and confirm that your WordPress configuration does not override the default connection settings.
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
---
2+
myst:
3+
html_meta:
4+
description: This table of contents gives you a summary of all Hypernode platform
5+
knowledge base articles that include information about Botstopper.
6+
title: Botstopper | Hypernode platform
7+
---
8+
9+
# Botstopper
10+
11+
```{toctree}
12+
---
13+
caption: Table of Contents
14+
maxdepth: 1
15+
glob:
16+
---
17+
botstopper/*
18+
```
Lines changed: 268 additions & 0 deletions
@@ -0,0 +1,268 @@
1+
---
2+
myst:
3+
html_meta:
4+
description: Learn how to enable Botstopper, choose an AI policy, and write custom
5+
Botstopper policies on Hypernode.
6+
title: How to use Botstopper on Hypernode | Hypernode
7+
---
8+
9+
# How to Use Botstopper on Hypernode
10+
11+
Bot traffic has changed. Some bots still identify themselves with clear user agents, but many scrapers now use large sets of user agents or pretend to be a normal Chrome browser. That makes simple user-agent blocking less reliable.
12+
13+
Botstopper gives you control over this traffic before it reaches Magento, Shopware, or another application. It checks incoming requests and decides whether they should be allowed, blocked, or challenged. Botstopper is the commercial derivative of the open source project [Anubis](https://anubis.techaro.lol/).
14+
15+
Use Botstopper when bots cause high load, scrape content, crawl expensive layered navigation URLs, or ignore your `robots.txt` file. Botstopper also lets each merchant choose how strict they want to be with AI crawlers and AI clients.
16+
17+
```{tip}
18+
For more background on bot traffic and Magento performance, see [How to Fix Performance Issues Caused by Bots and Crawlers](../../best-practices/performance/how-to-fix-performance-issues-caused-by-bots-and-crawlers.md).
19+
```
20+
21+
## Enable Botstopper
22+
23+
Botstopper is disabled by default on a Hypernode. Enable it with:
24+
25+
```bash
26+
hypernode-systemctl settings botstopper_enabled True
27+
```
28+
29+
Disable it again with:
30+
31+
```bash
32+
hypernode-systemctl settings botstopper_enabled False
33+
```
34+
35+
## Configure Botstopper Per Vhost
36+
37+
Botstopper is enabled per vhost by default. This means that when you enable Botstopper at the Hypernode level, it becomes active for all managed vhosts unless you disable it for a specific vhost. See [Hypernode Managed Vhosts](../nginx/hypernode-managed-vhosts.md) for more information about vhost configuration.
38+
39+
Disable Botstopper for one vhost with:
40+
41+
```bash
42+
hypernode-manage-vhosts example.com --disable-botstopper
43+
```
44+
45+
Enable it again for that vhost with:
46+
47+
```bash
48+
hypernode-manage-vhosts example.com --botstopper
49+
```
50+
51+
## Choose an AI Policy
52+
53+
Botstopper has three AI policies. The default policy is `aggressive`.
54+
55+
```bash
56+
hypernode-systemctl settings botstopper_ai_policy aggressive
57+
hypernode-systemctl settings botstopper_ai_policy moderate
58+
hypernode-systemctl settings botstopper_ai_policy permissive
59+
```
60+
61+
| Policy | Behavior |
62+
| ------------ | ---------------------------------------------------------------------------------------------------------------- |
63+
| `aggressive` | Blocks AI training crawlers, AI search crawlers, and AI clients as much as possible. |
64+
| `moderate` | Blocks AI training crawlers and unknown AI bots. Allows documented AI search bots and user-triggered AI clients. |
65+
| `permissive` | Allows documented AI bots. Blocks unknown AI-style bots. |
66+
67+
Use `aggressive` if you want the strictest AI blocking. Use `moderate` if you want to block AI training while keeping documented AI search and user tools working. Use `permissive` if you only want to block unclear or undocumented AI crawlers.
68+
69+
Some AI crawlers also require `robots.txt` rules before they respect your opt-out. Botstopper blocks requests at the webserver layer, but `robots.txt` is still useful for crawlers that require policy signals there. See the [Magento 1 robots.txt](../../ecommerce-applications/magento-1/how-to-create-a-robots-txt-for-your-magento-1-shop.md) or [Magento 2 robots.txt](../../ecommerce-applications/magento-2/how-to-create-a-robots-txt-for-magento-2-x.md) articles if you need to configure one.
70+
71+
## How Botstopper Handles Requests
72+
73+
Botstopper evaluates policy rules from top to bottom. A rule can allow, deny, challenge, or weigh a request.
74+
75+
| Action | What happens |
76+
| ----------- | ------------------------------------------------------------ |
77+
| `ALLOW` | The request is sent to your shop immediately. |
78+
| `DENY` | The request is blocked with HTTP `403`. |
79+
| `CHALLENGE` | The visitor receives a browser challenge. |
80+
| `WEIGH` | Suspicion points are added or removed. Evaluation continues. |
81+
82+
`ALLOW`, `DENY`, and `CHALLENGE` stop evaluation immediately. The first matching rule wins.
83+
84+
`WEIGH` does not stop evaluation. Multiple `WEIGH` rules can match the same request. After all rules are checked, Botstopper uses the final weight to decide whether the request should be allowed or challenged.
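
The evaluation order described above can be modelled in a few lines of Python. This is only an illustrative sketch, not Botstopper's implementation: the `Rule` class, the `evaluate` function, the request dictionary, and the `CHALLENGE_THRESHOLD` value of 10 are all assumptions made for the example. It shows that terminal actions return on the first match, while `WEIGH` rules accumulate until the end.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    action: str  # "ALLOW", "DENY", "CHALLENGE", or "WEIGH"
    matches: Callable[[dict], bool]
    adjust: int = 0  # only used by WEIGH rules


# Hypothetical threshold for this sketch; real thresholds are part of the policy.
CHALLENGE_THRESHOLD = 10


def evaluate(rules: list[Rule], request: dict) -> str:
    """Walk the rules top to bottom and return the resulting action."""
    weight = 0
    for rule in rules:
        if not rule.matches(request):
            continue
        if rule.action == "WEIGH":
            weight += rule.adjust  # WEIGH never stops evaluation
            continue
        return rule.action  # ALLOW, DENY, CHALLENGE: first match wins
    # No terminal rule matched: decide on the accumulated weight.
    return "CHALLENGE" if weight >= CHALLENGE_THRESHOLD else "ALLOW"


rules = [
    Rule("allow-monitor", "ALLOW", lambda r: r["user_agent"] == "MyMonitor"),
    Rule("weigh-bot-ua", "WEIGH", lambda r: "bot" in r["user_agent"].lower(), adjust=6),
    Rule("weigh-search", "WEIGH", lambda r: r["path"].startswith("/catalogsearch/"), adjust=6),
    Rule("block-bad-bot", "DENY", lambda r: "BadBot" in r["user_agent"]),
]
```

In this model, a request that matches only one of the two `WEIGH` rules stays below the assumed threshold and is allowed, while a request that matches both is challenged.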
85+
86+
Challenge responses use HTTP `200`. This is intentional. Many aggressive scraper bots stop retrying once they receive a `200` response.
87+
88+
## Standard Hypernode Policies
89+
90+
Hypernode ships Botstopper with a standard policy that keeps important services working and blocks common abusive traffic.
91+
92+
The standard policy does the following:
93+
94+
1. Allows Hypernode platform services, payment providers, monitoring tools, and common e-commerce integrations.
95+
1. Allows IP addresses on the Hypernode WAF allowlist.
96+
1. Runs your custom pre-policy from `/data/web/botstopper/pre.policy.yml`.
97+
1. Denies sensitive Magento media paths, such as `/media/customer/`, `/media/import/`, and `/media/downloadable/`.
98+
1. Allows storefront assets, such as `/static/` and normal `/media/` files.
99+
1. Denies or weighs known bad bots, headless browsers, abusive cloud ranges, and suspicious HTTP clients.
100+
1. Applies the configured AI policy.
101+
1. Allows known good search engine crawlers when they are verified by IP ranges or reverse DNS.
102+
1. Allows common public files, such as `robots.txt`, `sitemap.xml`, `favicon.ico`, and `.well-known` paths.
103+
1. Adds suspicion weight for some high-risk countries, networks, and browser-like user agents.
104+
1. Runs your custom post-policy from `/data/web/botstopper/post.policy.yml`.
105+
1. Uses the final suspicion weight to allow or challenge the request.
106+
107+
The WAF allowlist is shared with the Hypernode firewall allowlist. Botstopper allows these IPs before your custom `pre.policy.yml` and before the standard deny and challenge rules.
108+
109+
Add a trusted IP to the WAF allowlist with:
110+
111+
```bash
112+
hypernode-systemctl whitelist add waf 1.2.3.4 --description "Office IP"
113+
```
114+
115+
View the WAF allowlist with:
116+
117+
```bash
118+
hypernode-systemctl whitelist get --type waf
119+
```
120+
121+
See [How to allowlist FTP, WAF and database](../../best-practices/firewall/ftp-waf-database-allowlist.md) for more details.
122+
123+
The order matters. For example, a broad `DENY` rule in `pre.policy.yml` can block a good crawler before the standard verified crawler allow rules are reached.
124+
125+
## Write Custom Policies
126+
127+
You can add your own rules in these files:
128+
129+
| File | When it runs | Use it for |
130+
| -------------------------------------- | ---------------------------------------------------- | ----------------------------------------------------------------- |
131+
| `/data/web/botstopper/pre.policy.yml` | Before most standard deny and challenge rules | Allowing trusted traffic or blocking very specific traffic early. |
132+
| `/data/web/botstopper/post.policy.yml` | After standard rules, before final weight thresholds | Adding suspicion weight or handling fallback cases. |
133+
134+
Both files contain a YAML list of policy rules. An empty file looks like this:
135+
136+
```yaml
137+
[]
138+
```
139+
140+
Edit the files as the `app` user. For example:
141+
142+
```bash
143+
sensible-editor /data/web/botstopper/pre.policy.yml
144+
```
145+
146+
After changing a policy file, restart Botstopper:
147+
148+
```bash
149+
hypernode-servicectl restart techaro-botstopper@default.service
150+
```
151+
152+
## Policy Conditions
153+
154+
A policy rule can match on request details, such as the client IP, user agent, path, or headers.
155+
156+
Common fields are:
157+
158+
| Field | Checks |
159+
| ------------------ | ----------------------------------------------------- |
160+
| `remote_addresses` | Client IP address against CIDR ranges. |
161+
| `user_agent_regex` | The `User-Agent` header against a regular expression. |
162+
| `path_regex` | The request path against a regular expression. |
163+
| `headers_regex` | Request headers against regular expressions. |
164+
| `expression` | A custom expression for advanced matching. |
165+
166+
When a rule has multiple conditions, all conditions must match. This is useful for trusted allow rules. For example, you can allow a monitoring tool only when both its user agent and source IP match.
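
The "all conditions must match" behaviour can be illustrated with a small Python sketch. The `rule_matches` helper and the request dictionary are hypothetical and exist only for this example. Python's `re` module also differs in details from the regular expression engine Botstopper uses, so treat this as a model of the logic rather than a way to test real policies.

```python
import ipaddress
import re


def rule_matches(rule: dict, request: dict) -> bool:
    """Return True only if every condition present in the rule matches."""
    checks = []
    if "user_agent_regex" in rule:
        checks.append(re.search(rule["user_agent_regex"], request["user_agent"]) is not None)
    if "path_regex" in rule:
        checks.append(re.search(rule["path_regex"], request["path"]) is not None)
    if "remote_addresses" in rule:
        client = ipaddress.ip_address(request["remote_addr"])
        checks.append(
            any(client in ipaddress.ip_network(cidr) for cidr in rule["remote_addresses"])
        )
    # A rule without conditions matches nothing in this sketch.
    return bool(checks) and all(checks)


monitor_rule = {
    "user_agent_regex": "MyMonitor",
    "remote_addresses": ["203.0.113.10/32"],
}

genuine = {"user_agent": "MyMonitor/2.0", "path": "/", "remote_addr": "203.0.113.10"}
spoofed = {"user_agent": "MyMonitor/2.0", "path": "/", "remote_addr": "198.51.100.7"}
```

Here `rule_matches(monitor_rule, genuine)` is true, while the `spoofed` request fails the IP condition even though its user agent matches.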
167+
168+
## Examples
169+
170+
Allow a trusted monitoring service:
171+
172+
```yaml
173+
- name: allow-my-monitor
174+
action: ALLOW
175+
user_agent_regex: MyMonitor
176+
remote_addresses:
177+
- 203.0.113.10/32
178+
```
179+
180+
Block a specific bot:
181+
182+
```yaml
183+
- name: block-bad-bot
184+
action: DENY
185+
user_agent_regex: BadBot
186+
```
187+
188+
Challenge traffic to an expensive search page:
189+
190+
```yaml
191+
- name: challenge-suspicious-search
192+
action: CHALLENGE
193+
path_regex: ^/catalogsearch/result/.*
194+
```
195+
196+
Add suspicion weight for bots crawling layered navigation URLs:
197+
198+
```yaml
199+
- name: weigh-layered-navigation-bots
200+
action: WEIGH
201+
path_regex: ^/.*(color|size|brand)=.*
202+
user_agent_regex: (?i:bot|crawler|spider)
203+
weight:
204+
adjust: 20
205+
```
206+
207+
Protect a trusted integration from a broader custom rule:
208+
209+
```yaml
210+
- name: allow-partner-feed
211+
action: ALLOW
212+
path_regex: ^/partner/feed/.*
213+
user_agent_regex: PartnerFeedClient
214+
remote_addresses:
215+
- 198.51.100.0/24
216+
```
217+
218+
Allow JSON API requests, using [CEL expressions](https://anubis.techaro.lol/docs/admin/configuration/expressions):
219+
220+
```yaml
221+
- name: allow-api-requests
222+
action: ALLOW
223+
expression:
224+
all:
225+
- '"Accept" in headers'
226+
- 'headers["Accept"] == "application/json"'
227+
- 'path.startsWith("/api/")'
228+
```
229+
230+
You usually do not need allow rules for API or webhook traffic. Botstopper allows traffic by default. Use an `ALLOW` rule when you already have, or plan to add, a broader custom rule that could otherwise challenge or block this trusted traffic.
231+
232+
## Logging
233+
234+
The Botstopper service logs to `/var/log/botstopper/botstopper.log`. The log file consists of [JSON Lines](https://jsonlines.org/), meaning that each line in the file is a standalone JSON document.
235+
236+
You can render the entire log file:
237+
238+
```bash
239+
cat /var/log/botstopper/botstopper.log | jq .
240+
```
241+
242+
Or follow the log file:
243+
244+
```bash
245+
tail -f /var/log/botstopper/botstopper.log | jq .
246+
```
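
Because every line is a separate JSON document, the log can also be summarised with a short script. The sketch below is a minimal example in Python. The `count_by_key` helper is hypothetical, and the `level` and `msg` keys in the sample data are assumptions, so check the keys in your own log before relying on them.

```python
import json
from collections import Counter


def count_by_key(lines, key):
    """Count how often each value of `key` occurs in JSON Lines input."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or non-JSON lines
        if isinstance(entry, dict):
            counts[str(entry.get(key, "<missing>"))] += 1
    return counts


# Hypothetical sample entries; real Botstopper log keys may differ.
sample = [
    '{"level": "INFO", "msg": "example entry"}',
    '{"level": "WARN", "msg": "another example"}',
    "not json",
    '{"level": "INFO", "msg": "third example"}',
]
summary = count_by_key(sample, "level")
```

To run it against the real log, pass an open file object instead of `sample`, for example `count_by_key(open("/var/log/botstopper/botstopper.log"), "level")`.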
247+
248+
## Safe Policy Changes
249+
250+
Use specific rules whenever possible. Broad user-agent rules can block legitimate crawlers or integrations.
251+
252+
Prefer `WEIGH` when you are not fully sure traffic should be blocked. A `WEIGH` rule lets Botstopper combine multiple signals before it challenges the request.
253+
254+
Use `ALLOW` with both a user agent and an IP range for trusted services when possible. User agents can be spoofed, while IP ranges are harder to fake.
255+
256+
Use the WAF allowlist for trusted source IPs that should always bypass Botstopper checks. This is often better than maintaining your own IP allow rule in `pre.policy.yml`.
257+
258+
Keep custom `DENY` rules narrow. A broad `DENY` rule in `pre.policy.yml` can override the standard Hypernode allow rules that run later.
259+
260+
Do not add allow rules for every API endpoint or webhook. Add them when a specific Botstopper rule would otherwise match that traffic.
261+
262+
## Anubis Documentation
263+
264+
Because Botstopper is the commercial derivative of Anubis, the Anubis documentation is a useful reference when you want to understand the underlying concepts or write advanced custom policies:
265+
266+
- [How Anubis works](https://anubis.techaro.lol/docs/design/how-anubis-works)
267+
- [Anubis policies](https://anubis.techaro.lol/docs/admin/policies/)
268+
- [Anubis policy thresholds](https://anubis.techaro.lol/docs/admin/configuration/thresholds)

docs/hypernode-platform/nginx/how-to-block-user-agents-and-referrer-sites.md

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,8 @@ redirect_from:
1515

1616
**Blocking IP addresses, User Agents or Referrers may cause unforeseen issues, since it's easy to block more than expected.**
1717

18+
If your goal is to manage bot traffic more broadly, consider [Botstopper](../botstopper/how-to-use-botstopper.md). Botstopper can allow, deny, challenge, or weigh requests before they reach your application.
19+
1820
## How to Block User Agents
1921

2022
First you need to know what User Agent you wish to block. You can retrieve such information from the access logs (`/var/log/nginx/access.log`)
