Commit 010b5cf

Add googlebot IP passthru [minor] (#71)
1 parent 5db058d commit 010b5cf

8 files changed: +374 -194 lines changed

CLAUDE.md

Lines changed: 0 additions & 187 deletions
This file was deleted.

README.md

Lines changed: 9 additions & 4 deletions
@@ -62,9 +62,10 @@ services:
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.captchaProvider: turnstile
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.siteKey: ${TURNSTILE_SITE_KEY}
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.secretKey: ${TURNSTILE_SECRET_KEY}
-traefik.http.middlewares.captcha-protect.plugin.captcha-protect.goodBots: apple.com,archive.org,commoncrawl.org,duckduckgo.com,facebook.com,google.com,googlebot.com,googleusercontent.com,instagram.com,kagibot.org,linkedin.com,msn.com,openalex.org,twitter.com,x.com
+traefik.http.middlewares.captcha-protect.plugin.captcha-protect.goodBots: apple.com,archive.org,commoncrawl.org,duckduckgo.com,facebook.com,google.com,instagram.com,kagibot.org,linkedin.com,msn.com,openalex.org,twitter.com,x.com
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.persistentStateFile: /tmp/state.json
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableStateReconciliation: "false"
+traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableGooglebotIPCheck: "true"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.periodSeconds: 30
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.failureThreshold: 3
 networks:
@@ -82,7 +83,7 @@ services:
 --providers.docker=true
 --providers.docker.network=default
 --experimental.plugins.captcha-protect.modulename=github.com/libops/captcha-protect
---experimental.plugins.captcha-protect.version=v1.11.1
+--experimental.plugins.captcha-protect.version=v1.12.0
 volumes:
 - /var/run/docker.sock:/var/run/docker.sock:z
 - /CHANGEME/TO/A/HOST/PATH/FOR/STATE/FILE:/tmp/state.json:rw
@@ -117,6 +118,7 @@ services:
 | `ipForwardedHeader` | `string` | `""` | Header to check for the original client IP if Traefik is behind a load balancer. |
 | `ipDepth` | `int` | `0` | How deep past the last non-exempt IP to fetch the real IP from `ipForwardedHeader`. Default 0 returns the last IP in the forward header |
 | `goodBots` | `[]string` (encouraged) | *see below* | List of second-level domains for bots that are never challenged or rate-limited. |
+| `enableGooglebotIPCheck`| `string`. | `"false"` | Treat IPs coming from googlebot's known IP ranges as good bots |
 | `protectParameters` | `string` | `"false"` | Forces rate limiting even for good bots if URL parameters are present. Useful for protecting faceted search pages. |
 | `protectFileExtensions` | `[]string` | `""` | Comma-separated file extensions to protect. By default, your protected routes only protect html files. This is to prevent files like CSS/JS/img from tripping the rate limit. |
 | `protectHttpMethods` | `[]string` | `"GET,HEAD"` | Comma-separated list of HTTP methods to protect against |
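
Review note: the table rows above describe how `goodBots` and `protectParameters` interact. The following is a minimal Go sketch of that decision, not the plugin's actual code. It assumes the hostname comes from a reverse DNS lookup of the client IP; the helper names `secondLevelDomain` and `exemptFromChallenge` are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// secondLevelDomain returns the last two labels of a hostname.
// Naive on purpose: it does not handle multi-part public suffixes
// such as "co.uk".
func secondLevelDomain(host string) string {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	labels := strings.Split(host, ".")
	if len(labels) < 2 {
		return host
	}
	return strings.Join(labels[len(labels)-2:], ".")
}

// exemptFromChallenge is a hypothetical helper. It reports whether a
// request should skip rate limiting, given the hostname resolved for
// the client (assumed to come from reverse DNS), the configured
// goodBots list, the protectParameters flag, and the raw query string.
func exemptFromChallenge(host string, goodBots []string, protectParameters bool, rawQuery string) bool {
	// With protectParameters enabled, any query string removes the exemption.
	if protectParameters && rawQuery != "" {
		return false
	}
	sld := secondLevelDomain(host)
	for _, bot := range goodBots {
		if sld == strings.ToLower(strings.TrimSpace(bot)) {
			return true
		}
	}
	return false
}

func main() {
	goodBots := []string{"bing.com", "duckduckgo.com"}
	// Hypothetical hostname used purely as sample input.
	host := "crawler-1.search.bing.com."
	fmt.Println(exemptFromChallenge(host, goodBots, false, ""))
	fmt.Println(exemptFromChallenge(host, goodBots, true, "bar=baz"))
}
```

The query-string check runs first so that a faceted search URL is rate-limited regardless of who is asking, which matches the README wording that `protectParameters` applies "even for good bots".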
@@ -152,14 +154,17 @@ The circuit breaker provides automatic failover when the primary captcha provide
 
 ### Good Bots
 
-To avoid having this middleware impact your SEO score, it's recommended to provide a value for `goodBots`. By default, no bots will be allowed to crawl your protected routes beyond the rate limit unless their second level domain (e.g. `google.com`) is configured as a good bot.
+To avoid having this middleware impact your SEO score, it's recommended to provide a value for `goodBots`. By default, no bots will be allowed to crawl your protected routes beyond the rate limit unless their second level domain (e.g. `bing.com`) is configured as a good bot.
 
 A good default value for `goodBots` would be:
 
 ```
-goodBots: apple.com,archive.org,duckduckgo.com,facebook.com,google.com,googlebot.com,googleusercontent.com,instagram.com,kagibot.org,linkedin.com,msn.com,openalex.org,twitter.com,x.com
+enableGooglebotIPCheck: "true"
+goodBots: apple.com,archive.org,duckduckgo.com,facebook.com,google.com,instagram.com,kagibot.org,linkedin.com,msn.com,openalex.org,twitter.com,x.com
 ```
 
+Since google publishes their bot IPs, we can also leverage their API to let google crawl the site unchallenged based on client IP. This can be enabled with `enableGooglebotIPCheck: "true"`
+
 **However** if you set the config parameter `protectParameters="true"`, even good bots won't be allowed to crawl protected routes if a URL parameter is on the request (e.g. `/foo?bar=baz`). This `protectParameters` feature is meant to help protect faceted search pages.

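
Review note: the added README paragraph describes matching the client IP against Google's published Googlebot ranges. Below is a minimal Go sketch of that idea, not the code from this commit. It assumes the published file is JSON with a `prefixes` array whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key. The sample payload uses documentation-only address ranges as placeholders, and `parsePrefixes` and `inRanges` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// ipRanges mirrors the assumed shape of the published ranges file.
type ipRanges struct {
	Prefixes []struct {
		IPv4Prefix string `json:"ipv4Prefix"`
		IPv6Prefix string `json:"ipv6Prefix"`
	} `json:"prefixes"`
}

// parsePrefixes decodes the JSON payload into a list of CIDR prefixes.
func parsePrefixes(data []byte) ([]netip.Prefix, error) {
	var r ipRanges
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	var out []netip.Prefix
	for _, p := range r.Prefixes {
		for _, s := range []string{p.IPv4Prefix, p.IPv6Prefix} {
			if s == "" {
				continue
			}
			pfx, err := netip.ParsePrefix(s)
			if err != nil {
				return nil, fmt.Errorf("bad prefix %q: %w", s, err)
			}
			out = append(out, pfx.Masked())
		}
	}
	return out, nil
}

// inRanges reports whether ip falls inside any of the prefixes.
// Unparseable input is treated as "not in range".
func inRanges(ip string, prefixes []netip.Prefix) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	// Unmap so an IPv4-mapped IPv6 address can match an IPv4 prefix.
	addr = addr.Unmap()
	for _, p := range prefixes {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder payload using documentation ranges, not real Googlebot data.
	sample := []byte(`{"prefixes":[{"ipv4Prefix":"192.0.2.0/24"},{"ipv6Prefix":"2001:db8::/32"}]}`)
	prefixes, err := parsePrefixes(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(inRanges("192.0.2.10", prefixes))   // true
	fmt.Println(inRanges("198.51.100.1", prefixes)) // false
}
```

A real deployment would presumably fetch and periodically refresh the list rather than embed it; the sketch leaves that out so it runs without network access.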

ci/docker-compose.yml

Lines changed: 4 additions & 0 deletions
@@ -18,7 +18,9 @@ services:
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableStatsPage: "true"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.ipForwardedHeader: "X-Forwarded-For"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.logLevel: "DEBUG"
+traefik.http.middlewares.captcha-protect.plugin.captcha-protect.protectParameters: "${PROTECT_PARAMETERS:-false}"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.goodBots: ""
+traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableGooglebotIPCheck: "true"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.protectRoutes: "/"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.persistentStateFile: "/tmp/state.json"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableStateReconciliation: "true"
@@ -47,7 +49,9 @@ services:
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableStatsPage: "true"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.ipForwardedHeader: "X-Forwarded-For"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.logLevel: "DEBUG"
+traefik.http.middlewares.captcha-protect.plugin.captcha-protect.protectParameters: "${PROTECT_PARAMETERS:-false}"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.goodBots: ""
+traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableGooglebotIPCheck: "true"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.protectRoutes: "/"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.persistentStateFile: "/tmp/state.json"
 traefik.http.middlewares.captcha-protect.plugin.captcha-protect.enableStateReconciliation: "true"
