@tkotthakota-adobe commented Dec 20, 2025

  • Add CHALLENGE_PATTERNS for additional bot-protection checks
  • Add helper methods for use in the content scraper
  • Read allowlisted IPs from environment variables (stored in the secrets manager)
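The first and third bullets can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR. The regexes in `CHALLENGE_PATTERNS`, the `ALLOWLISTED_IPS` variable name, and the helpers `isChallengePage` and `parseAllowlistedIps` are all hypothetical.

```typescript
// Hypothetical sketch: the pattern list, env var name, and function names
// are illustrative assumptions, not spacecat-shared's actual implementation.

// Body patterns that suggest an interstitial challenge page instead of real content.
const CHALLENGE_PATTERNS: readonly RegExp[] = [
  /just a moment\.\.\./i,
  /checking your browser/i,
  /captcha/i,
];

// Returns true if any challenge pattern matches the response body.
// The regexes have no `g` flag, so `test` keeps no state between calls.
function isChallengePage(body: string): boolean {
  return CHALLENGE_PATTERNS.some((pattern) => pattern.test(body));
}

// Reads a comma-separated IP allowlist from an environment-like record
// (for example, process.env after secrets are injected), dropping blanks.
function parseAllowlistedIps(
  env: Record<string, string | undefined>,
  key = "ALLOWLISTED_IPS",
): string[] {
  const raw = env[key];
  if (!raw) return [];
  return raw
    .split(",")
    .map((ip) => ip.trim())
    .filter((ip) => ip.length > 0);
}
```

In Node, the allowlist would be read with `parseAllowlistedIps(process.env)`. Passing the record in as a parameter keeps the helper testable without touching the real environment.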

Detected Bot Protection Types:

  • ✅ Cloudflare (cf-ray header + 403)
  • ✅ Imperva/Incapsula (x-iinfo header + 403)
  • ✅ Akamai (x-akamai-request-id + 403) - Added in v1.86.0
  • ✅ Fastly (x-served-by + 403) - Added in v1.86.0
  • ✅ AWS CloudFront (x-amz-cf-id + 403) - Added in v1.86.0
  • ✅ HTTP/2 stream errors (NGHTTP2_INTERNAL_ERROR)
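The header-plus-403 rule above can be sketched as a lookup over signature headers. This is a sketch only. The header names come from the list, but `detectBlocker`, `isHttp2StreamError`, and the `BlockerType` labels are hypothetical. Matching on the error message text is also an assumption about how the HTTP/2 failure surfaces.

```typescript
// Hypothetical sketch: header names are taken from the list above; the
// function names and type labels are illustrative assumptions.
type BlockerType = "cloudflare" | "imperva" | "akamai" | "fastly" | "cloudfront";

// Signature header for each blocker, checked in this order.
const BLOCKER_SIGNATURES: ReadonlyArray<[BlockerType, string]> = [
  ["cloudflare", "cf-ray"],
  ["imperva", "x-iinfo"],
  ["akamai", "x-akamai-request-id"],
  ["fastly", "x-served-by"],
  ["cloudfront", "x-amz-cf-id"],
];

// Returns the blocker type when a 403 carries a known signature header, else null.
function detectBlocker(
  status: number,
  headers: Record<string, string | undefined>,
): BlockerType | null {
  // A signature header alone is not a block; it must come with a 403.
  if (status !== 403) return null;
  // HTTP header names are case-insensitive, so normalize before lookup.
  const present = new Set(
    Object.keys(headers)
      .filter((name) => headers[name] !== undefined)
      .map((name) => name.toLowerCase()),
  );
  for (const [type, header] of BLOCKER_SIGNATURES) {
    if (present.has(header)) return type;
  }
  return null;
}

// Treats an error whose message mentions NGHTTP2_INTERNAL_ERROR as an HTTP/2 stream error.
function isHttp2StreamError(err: unknown): boolean {
  return err instanceof Error && err.message.includes("NGHTTP2_INTERNAL_ERROR");
}
```

Requiring the 403 keeps ordinary CDN-served pages from being flagged. Headers such as `x-served-by` and `x-amz-cf-id` also appear on successful responses.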

Confidence Levels:

  • 1.0 (ABSOLUTE) - Site is crawlable (200 OK)
  • 0.99 (HIGH) - Known bot blocker detected
  • 0.95 (MEDIUM) - HTTP/2 protocol errors
  • 0.5 - Unknown 403 without blocker signature
  • 0.3 - Unknown error
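The confidence levels can be read as an ordered decision, sketched below. Only the numeric values come from the list above. The `ProbeResult` shape, the `assess` function, and the reason labels are hypothetical. Treating every status other than 200 and 403 as an unknown error is also an assumption.

```typescript
// Hypothetical sketch: only the confidence values come from the PR description.
interface ProbeResult {
  status?: number;       // HTTP status code; absent if the request failed outright
  knownBlocker: boolean; // a blocker signature header accompanied the response
  http2Error: boolean;   // an HTTP/2 stream error such as NGHTTP2_INTERNAL_ERROR occurred
}

type Reason = "crawlable" | "known-blocker" | "http2-error" | "unknown-403" | "unknown-error";

interface Assessment {
  reason: Reason;
  confidence: number;
}

// Checks the strongest signals first so they take precedence over weaker ones.
function assess(result: ProbeResult): Assessment {
  if (result.status === 200) return { reason: "crawlable", confidence: 1.0 };
  if (result.status === 403 && result.knownBlocker) {
    return { reason: "known-blocker", confidence: 0.99 };
  }
  if (result.http2Error) return { reason: "http2-error", confidence: 0.95 };
  if (result.status === 403) return { reason: "unknown-403", confidence: 0.5 };
  // Any other status or failure falls through to the lowest confidence.
  return { reason: "unknown-error", confidence: 0.3 };
}
```

The order of the checks mirrors the descending confidence values. A 403 with a recognized signature is therefore reported as a known blocker before any weaker explanation is considered.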

https://jira.corp.adobe.com/browse/SITES-37727

@tkotthakota-adobe marked this pull request as draft December 23, 2025 02:28
github-actions bot commented Jan 5, 2026

This PR will trigger no release when merged.

@tkotthakota-adobe marked this pull request as ready for review January 11, 2026 21:04
@tkotthakota-adobe merged commit 0c34a8d into main Jan 26, 2026
7 checks passed
@tkotthakota-adobe deleted the SITES-37727 branch January 26, 2026 22:32
solaris007 pushed a commit that referenced this pull request Jan 26, 2026
# [@adobe/spacecat-shared-utils-v1.89.1](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-utils-v1.89.0...@adobe/spacecat-shared-utils-v1.89.1) (2026-01-26)

### Bug Fixes

* Additional checks and methods on bot protection ([#1250](#1250)) ([0c34a8d](0c34a8d))
@solaris007 (Member)

🎉 This PR is included in version @adobe/spacecat-shared-utils-v1.89.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

anshikag-adobe pushed a commit that referenced this pull request Jan 27, 2026
anshikag-adobe pushed a commit that referenced this pull request Jan 27, 2026