46 changes: 24 additions & 22 deletions src/content/docs/cache/troubleshooting/always-online.mdx
@@ -9,44 +9,46 @@ head:
content: Always Online - Troubleshooting
---

Observe the following best practices when enabling Always Online with Internet Archive integration.
[Always Online](/cache/how-to/always-online/) serves archived versions of your website when your origin server is unreachable. It relies on the [Internet Archive](https://archive.org/) to crawl and store your pages. The following best practices help avoid common problems with this integration.

- **Allow requests from the Internet Archive IP addresses.** Origin servers receive requests from the Internet Archive IPs. Make sure you are not blocking requests from the Internet Archive IP range: `207.241.224.0/20` and `208.70.24.0/21`.
- **The Internet Archive does not consider your origin server's cache-control header.** When the Internet Archive is crawling sites, it will crawl sites regardless of their cache-control, since the Internet Archive does not cache assets, but archives them.
- **Consider potential conflicts with Cloudflare features that transform URIs.** Always Online with Internet Archive integration may cause issues with Cache Rules and other Cloudflare features that transform URIs due to the way the Internet Archive crawls pages to archive. Specifically, some redirects that take place at the edge may cause the Internet Archive's crawler not to archive the target URL. Before enabling Origin Cache Control, review [how Cloudflare caches resources by default](/cache/concepts/default-cache-behavior/) as well as any Cache Rules you have configured so that you can avoid these issues. If you experience problems, disable Always Online.
- **Do not block Known Bots or Verified Bots via a WAF custom rule.** If you block either of these bot lists, the Internet Archive will not be able to crawl.
## Best practices

- **Allow requests from the Internet Archive IP addresses.** The Internet Archive crawler sends requests directly to your origin server. Make sure you are not blocking the Internet Archive IP ranges: `207.241.224.0/20` and `208.70.24.0/21`.
- **The Internet Archive does not consider your origin server's `Cache-Control` header.** The Internet Archive archives pages regardless of `Cache-Control` directives because it archives content rather than caching it.
Contributor

consider reframing this bullet in terms of what the user should do or not do

- **Consider potential conflicts with Cloudflare features that transform URIs.** Always Online with Internet Archive integration may cause issues with [Cache Rules](/cache/how-to/cache-rules/) and other Cloudflare features that transform URIs. Some redirects that take place at the edge may prevent the Internet Archive crawler from archiving the target URL. Review [how Cloudflare caches resources by default](/cache/concepts/default-cache-behavior/) and any Cache Rules you have configured. If you experience problems, disable Always Online.
- **Do not block Known Bots or Verified Bots via a WAF custom rule.** If you block either of these bot categories, the Internet Archive crawler cannot reach your site.
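
To spot-check whether an address from your firewall or origin logs belongs to the Internet Archive ranges listed above, a minimal Python sketch using the standard `ipaddress` module could look like this. The function name `is_internet_archive_ip` is hypothetical, and the sketch only tests membership; it does not change any allowlist.

```python
import ipaddress

# Internet Archive crawler ranges listed on this page.
INTERNET_ARCHIVE_RANGES = [
    ipaddress.ip_network("207.241.224.0/20"),
    ipaddress.ip_network("208.70.24.0/21"),
]


def is_internet_archive_ip(addr: str) -> bool:
    """Return True if addr falls inside one of the Internet Archive ranges."""
    ip = ipaddress.ip_address(addr)
    # Membership checks across IP versions (IPv6 address vs IPv4 network) return False.
    return any(ip in net for net in INTERNET_ARCHIVE_RANGES)


if __name__ == "__main__":
    for addr in ["207.241.230.5", "208.70.31.255", "203.0.113.10"]:
        print(addr, is_internet_archive_ip(addr))
```

If an address from these ranges shows up as blocked in your security events, that block is likely what prevents the crawler from archiving your pages.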

## Incompatible configurations

Do not use Always Online with:

- API traffic.
- An [IP Access rule](/waf/tools/ip-access-rules/) or a [WAF custom rule](/waf/custom-rules/) that blocks the United States or
- Bypass Cache cache rules. Always Online ignores Bypass Cache cache rules and serves Always Online cached assets.
- An [IP Access rule](/waf/tools/ip-access-rules/) or a [WAF custom rule](/waf/custom-rules/) that blocks the United States.
- Bypass Cache cache rules. Always Online ignores Bypass Cache cache rules and serves Always Online cached assets instead.

## Limitations

Always Online has the following limitations:

1. Always Online is not immediately active for recently added sites due to:
- DNS record propagation, which can take 24-72 hours
- Always Online has not initially crawled the website
2. Cloudflare cannot show private content behind logins or handle form submission (POSTs) if your origin web server is offline.
- DNS record propagation, which can take 24-72 hours.
- The Internet Archive has not yet crawled the website.
2. Cloudflare cannot show private content behind logins or handle form submissions (`POST` requests) when your origin server is offline.

Always Online does not trigger for HTTP response codes such as [404](/support/troubleshooting/http-status-codes/4xx-client-error/error-404/), [503](/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-503/), or [500](/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-500/) errors such as database connection errors or internal server errors.
Always Online only activates when your origin is completely unreachable (Cloudflare [520-527](/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-520/) errors). It does not activate for HTTP response codes that the origin itself returns, such as [404](/support/troubleshooting/http-status-codes/4xx-client-error/error-404/), [500](/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-500/), or [503](/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-503/).
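
A minimal Python sketch of the activation rule as stated in this section, assuming the only input is the status code the visitor would otherwise receive. The function name is hypothetical, and the model ignores any other conditions Cloudflare may apply.

```python
# Cloudflare 52x errors that this section says indicate an unreachable origin.
ALWAYS_ONLINE_TRIGGERS = range(520, 528)  # 520 through 527 inclusive


def always_online_may_activate(status: int) -> bool:
    """Simplified model: only Cloudflare 520-527 errors can trigger Always Online."""
    return status in ALWAYS_ONLINE_TRIGGERS


if __name__ == "__main__":
    # Origin-returned codes such as 404, 500, and 503 never trigger it.
    for status in (404, 500, 503, 521, 522, 527):
        print(status, always_online_may_activate(status))
```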

## Frequently asked questions
Oxyjun marked this conversation as resolved.

1. How can I know if a page has been crawled?
### How can I check if a page has been crawled?

Search for the page URL on the [Internet Archive](https://web.archive.org/) or query the [Internet Archive Availability API](https://archive.org/help/wayback_api.php).
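
A minimal Python sketch of querying the Availability API, assuming the `https://archive.org/wayback/available?url=...` endpoint and the `archived_snapshots.closest` response shape described in the Internet Archive's API documentation (verify both against the current docs). The helper names are hypothetical, and `check_page` makes a live network request.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query_url(page_url: str) -> str:
    """Build the Availability API URL for a single page."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(payload: dict):
    """Return (snapshot_url, timestamp), or None if the page has not been archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def check_page(page_url: str):
    """Query the API over the network and return the closest snapshot, if any."""
    with urllib.request.urlopen(build_query_url(page_url), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

For example, `check_page("example.com/login")` returns `None` when no snapshot exists, or a `(snapshot_url, timestamp)` pair otherwise.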

- You can go to the [Internet Archive](https://web.archive.org/) and search for the page URL to see if it has been crawled or not.
- You can also check this via the [Internet Archive Availability API](https://archive.org/help/wayback_api.php).
### Why are some pages not crawled?

2. Why were not pages x, y, and z crawled?
Cloudflare only requests the Internet Archive to crawl the most popular pages on your site, based on `GET` requests that returned a `200` status code in the previous five hours. Some pages may not be archived. To archive a specific page, submit it through the [Internet Archive save page](https://web.archive.org/save).
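
Cloudflare does not document the selection logic beyond the criteria above. For a rough idea of which pages are likely candidates, you could approximate those criteria from your own access logs. In the Python sketch below, the `LogRecord` shape and the `top_n` cutoff are assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogRecord:
    # Hypothetical shape of one request from your own access logs.
    timestamp: datetime
    method: str
    status: int
    path: str


def likely_crawl_candidates(records, now, top_n=10, window=timedelta(hours=5)):
    """Rank paths by successful GET requests within the last `window`.

    Approximates the criteria described above; top_n is an assumed cutoff.
    """
    cutoff = now - window
    counts = Counter(
        r.path
        for r in records
        if r.method == "GET" and r.status == 200 and r.timestamp >= cutoff
    )
    return [path for path, _ in counts.most_common(top_n)]
```

Pages that rarely appear in such a ranking are good candidates for manual submission through the save page.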

- Since Cloudflare only requests to crawl the most popular pages on the site, it is possible that there will be missing pages. If you really want to archive a page, then you can visit the [Internet Archive](https://web.archive.org/save) save page and ask them to crawl a particular page.
### What IP addresses should I allowlist?

3. What IP addresses do we need to allowlist to make sure crawling works?
Add the Internet Archive IP ranges to your allowlist: `207.241.224.0/20` and `208.70.24.0/21`. These IP addresses belong to the Internet Archive, not Cloudflare.

- IP Range: `207.241.224.0/20` and `208.70.24.0/21`. Note that this ip range belongs to Internet Archive and NOT Cloudflare, since it is the Internet Archive that does the crawling.
### What user agent should the origin expect?

4. What user agent should the origin expect to see?
- Currently the Internet Archive uses: `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/605.1.15 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/605.1.15`.
The Internet Archive currently uses: `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/605.1.15 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/605.1.15`.
@@ -31,11 +31,11 @@ This usually happens when all of the following are true:
- The origin sends a `Set-Cookie` header.
- An [Edge TTL](/cache/how-to/cache-rules/settings/#edge-ttl) or status-code TTL overrides origin cache directives.

In this configuration, Cloudflare can cache the response and remove the `Set-Cookie` header before the response is stored at the edge. As a result, the browser receives the login page but never gets the session cookie required for the next request.
When an Edge TTL override forces caching, Cloudflare removes the `Set-Cookie` header from the stored response. Subsequent visitors receive the cached login page without the session cookie their browser needs for the next request. For the full set of rules governing this behavior, refer to [Interaction of Set-Cookie response header with Cache](/cache/concepts/cache-behavior/#interaction-of-set-cookie-response-header-with-cache).

### How to confirm

Check the response for the login page or other dynamic route.
Check the response headers for the login page or other dynamic route.

If you see both of the following, the page is probably cached when it should not be:

@@ -46,7 +46,7 @@ You may also see framework-specific failures after form submission, for example:

- A redirect back to the login page
- A `403` or `500` after sign-in
- CSRF validation errors
- CSRF (cross-site request forgery) validation errors
- Missing server-side session state

This issue is common with frameworks that rely on a session or CSRF cookie on the first page load, including JavaServer Faces, ASP.NET, PHP session handlers, Django, Rails, and Laravel.
@@ -62,8 +62,6 @@ Instead:
3. If the origin must control caching, remove any Edge TTL override that forces the page to be cached.
4. Verify the fixed response now returns `CF-Cache-Status: DYNAMIC`, `MISS`, or `BYPASS`, and preserves `Set-Cookie`.
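
Step 4 can be expressed as a small check over the response headers. In this Python sketch, the function name and the plain-dict header input are assumptions; collect the headers with any HTTP client that shows them. Header names are compared case-insensitively, as HTTP requires.

```python
# CF-Cache-Status values that indicate the page is no longer served from cache.
FIXED_CACHE_STATUSES = {"DYNAMIC", "MISS", "BYPASS"}


def looks_fixed(headers: dict) -> bool:
    """Return True if the response is not cached and still carries Set-Cookie."""
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_status = lowered.get("cf-cache-status", "").strip().upper()
    return cache_status in FIXED_CACHE_STATUSES and "set-cookie" in lowered
```

A `HIT` together with a missing `Set-Cookie` header is the failing combination this section describes.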

For more information on cookie behavior, refer to [Interaction of Set-Cookie response header with Cache](/cache/concepts/cache-behavior/#interaction-of-set-cookie-response-header-with-cache).

## Challenge loops on login or form flows

Security challenges can also interrupt dynamic flows.
2 changes: 1 addition & 1 deletion src/content/docs/cache/troubleshooting/index.mdx
@@ -13,6 +13,6 @@ sidebar:

import { DirectoryListing } from "~/components"

The following topics are useful for troubleshooting Cache issues.
The following topics cover common Cache issues and how to resolve them.

<DirectoryListing />
@@ -3,22 +3,26 @@ title: Issues with MP4 videos on iOS and Safari
pcx_content_type: troubleshooting
products:
- cache
description: Learn how to resolve issues with MP4 videos not playing on iOS and Safari.
description: Resolve MP4 videos not playing on iOS and Safari.
sidebar:
order: 2
---

import { DashButton } from "~/components";

When traffic is proxied through Cloudflare, Safari on macOS and iOS devices may fail to load MP4 video files.
When traffic is proxied through Cloudflare, Safari on macOS and iOS devices may fail to load MP4 video files. Videos may not play at all or may display as black screens.

This issue occurs because Safari handles HTTP range requests differently than other browsers, particularly in how it processes ETags during video streaming.
### Why this happens

Safari and iOS devices rely on HTTP range requests to support video features such as seeking to specific timestamps and resuming interrupted downloads.
Safari relies on HTTP range requests to stream video. Range requests allow the browser to fetch specific portions of a file — for example, when a user seeks to a particular timestamp or resumes an interrupted download.

When Cloudflare's caching layer processes these range requests with weak ETags, Safari may reject the cached response entirely, resulting in videos that fail to load or display as black screens.
To verify that each portion belongs to the same file, Safari checks the [ETag header](/cache/reference/etag-headers/) in the response. Safari requires a strong ETag (an exact byte-for-byte identifier) for range requests. By default, Cloudflare may convert strong ETags to weak ETags during caching. When Safari receives a weak ETag in a range response, it can reject the response entirely.
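
The strength of an ETag is visible in the header value: a weak ETag carries a `W/` prefix, for example `W/"abc123"`, while a strong ETag is the quoted value alone. HTTP range requests that use `If-Range` rely on strong comparison, where both tags must be strong and identical (RFC 9110). A minimal Python sketch of that comparison follows; Safari's internal logic is not public, so this only illustrates the HTTP rule, and the function names are hypothetical.

```python
def is_weak_etag(etag: str) -> bool:
    # RFC 9110: a weak entity tag starts with the case-sensitive prefix "W/".
    return etag.strip().startswith("W/")


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and byte-for-byte identical."""
    a, b = a.strip(), b.strip()
    return not is_weak_etag(a) and not is_weak_etag(b) and a == b
```

Under strong comparison, two identical weak tags such as `W/"abc"` still do not match, which is why a weakened ETag can break range requests.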

To resolve this issue, configure two cache rules in the following order.
### Resolution

To resolve this issue, configure two [cache rules](/cache/how-to/cache-rules/) in the following order. The first rule ensures Cloudflare preserves strong ETags for MP4 files. The second rule bypasses cache so that range requests go directly to your origin server, which avoids serving cached responses with potentially mismatched ETags.

The first rule must appear above the second rule in the Cache Rules list. Cloudflare evaluates cache rules in order, and the strong ETag setting from the first rule applies even though the second rule ultimately bypasses the cache.

## 1. Create the strong ETags rule

@@ -49,8 +53,4 @@ Create another cache rule that applies to all MP4 files and bypasses cache entir
5. Select **Bypass cache** in the **Cache eligibility** section.
6. Select **Last** as **Place at**.

## Why this order matters

The first rule preserves strong ETags for MP4 files, which satisfies Safari's requirements for range request handling. The second rule bypasses cache so that Cloudflare forwards range requests to the origin server instead of serving cached responses with potentially mismatched ETags.

The first rule must appear above the second rule in the Cache Rules list.