Cachew is a server and tooling for incredibly efficient, protocol-aware caching. It is designed to be used at scale, with minimal impact on upstream systems. By "protocol-aware", we mean that the proxy is not just a naive HTTP proxy: it understands the higher-level protocol being proxied (Git, Docker, etc.) and can make more efficient decisions.
Git causes a number of problems for us, but the most obvious are:
- Rate limiting by service providers.
- `git clone` is very slow, even discounting network overhead.
To solve this we apply two different strategies on the server:
- Periodic full `.tar.zst` snapshots of the repository. These snapshots restore 4-5x faster than `git clone`.
- Passthrough caching of the packs returned by `POST /repo.git/git-upload-pack` to support incremental pulls.
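The passthrough caching idea in the second strategy can be sketched generically in Python. This is only an illustration of the technique, not Cachew's implementation: the `PassthroughCache` class, the in-memory store, the `fake_upstream` function, and the choice of cache key (request path plus a SHA-256 digest of the request body) are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict


class PassthroughCache:
    """Forward cache misses upstream and replay stored responses on hits."""

    def __init__(self, fetch_upstream: Callable[[str, bytes], bytes]) -> None:
        self._fetch_upstream = fetch_upstream
        self._store: Dict[str, bytes] = {}

    @staticmethod
    def cache_key(path: str, body: bytes) -> str:
        # Assumption: the request path plus a digest of the request body
        # is enough to identify a reusable response.
        return hashlib.sha256(path.encode("utf-8") + b"\0" + body).hexdigest()

    def post(self, path: str, body: bytes) -> bytes:
        key = self.cache_key(path, body)
        if key not in self._store:
            # Miss: pass the request through and remember the response.
            self._store[key] = self._fetch_upstream(path, body)
        return self._store[key]


upstream_calls = []


def fake_upstream(path: str, body: bytes) -> bytes:
    """Hypothetical upstream that records each call and returns a dummy pack."""
    upstream_calls.append(path)
    return b"PACK" + body


cache = PassthroughCache(fake_upstream)
first = cache.post("/repo.git/git-upload-pack", b"want abc")
second = cache.post("/repo.git/git-upload-pack", b"want abc")
print(len(upstream_calls))  # 1: the second request was served from the cache
```

Two clients that send the same request reach the upstream only once, which is the property that reduces rate-limit pressure.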
On the client we redirect Git to the proxy:

    [url "https://cachew.local/github/"]
        insteadOf = https://github.com/

As Git itself isn't aware of the snapshots, Git-specific code in the Cachew CLI can be used to reconstruct a repository.
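To make the effect of the `insteadOf` rule concrete, here is a small Python sketch of the prefix rewrite Git performs (Git uses the longest matching prefix when several rules match). The `rewrite_url` helper and the `org/repo.git` repository name are hypothetical and exist only for illustration.

```python
from typing import Dict, List


def rewrite_url(url: str, rules: Dict[str, List[str]]) -> str:
    """Apply Git-style insteadOf rules: the longest matching prefix wins."""
    best_base, best_prefix = None, ""
    for base, prefixes in rules.items():
        for prefix in prefixes:
            if url.startswith(prefix) and len(prefix) > len(best_prefix):
                best_base, best_prefix = base, prefix
    if best_base is None:
        return url  # no rule matched, so the URL is left untouched
    return best_base + url[len(best_prefix):]


# Mirrors the client configuration: base URL -> list of insteadOf prefixes.
RULES = {"https://cachew.local/github/": ["https://github.com/"]}

print(rewrite_url("https://github.com/org/repo.git", RULES))
# https://cachew.local/github/org/repo.git
```

URLs for other hosts do not match the prefix and are passed through unchanged.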
Cachew uses Open Policy Agent (OPA) for request authorization. A default policy is
always active even without any configuration, allowing any request from 127.0.0.1 and GET and HEAD requests from
elsewhere.
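As a rough mental model, the default policy behaves like the following Python predicate. This is a sketch of the described behaviour only; the real policy is evaluated by OPA, and the `default_allow` function is a hypothetical stand-in that assumes the client address arrives in `ip:port` form.

```python
def default_allow(method: str, remote_addr: str) -> bool:
    """Approximate the default policy: loopback may do anything, others read only."""
    # Assumption: remote_addr looks like "ip:port".
    if remote_addr.startswith("127.0.0.1:"):
        return True
    return method in ("GET", "HEAD")


print(default_allow("POST", "127.0.0.1:54321"))  # True
print(default_allow("POST", "10.0.0.5:54321"))   # False
print(default_allow("GET", "10.0.0.5:54321"))    # True
```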
To customise the policy, add an `opa` block to your configuration with either an inline policy or a path to a `.rego` file:

    # Inline policy
    opa {
      policy = <<EOF
    package cachew.authz
    default allow := false
    allow if input.method == "GET"
    allow if input.method == "HEAD"
    allow if { input.method == "POST"; input.path[0] == "api" }
    EOF
    }

    # Or reference an external file
    opa {
      policy-file = "./policy.rego"
    }

Policies must be written under `package cachew.authz` and define a boolean `allow` rule. The input document available to policies contains:
| Field | Type | Description |
|---|---|---|
| `input.method` | `string` | HTTP method (GET, POST, etc.) |
| `input.path` | `[]string` | URL path split by `/` (e.g. `["api", "v1", "object"]`) |
| `input.headers` | `map[string]string` | Request headers (lowercased keys) |
| `input.remote_addr` | `string` | Client address (`ip:port`) |
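The shape of the input document can be illustrated with a short Python sketch that assembles the four fields from a request. The `build_input` helper is hypothetical, and dropping empty path segments (for example from the leading `/`) is an assumption inferred from the `["api", "v1", "object"]` example rather than documented behaviour.

```python
from typing import Dict, List, Union

InputDoc = Dict[str, Union[str, List[str], Dict[str, str]]]


def build_input(method: str, raw_path: str, headers: Dict[str, str], remote_addr: str) -> InputDoc:
    """Assemble a policy input document shaped like the table above."""
    return {
        "method": method,
        # Assumption: empty segments (leading or doubled slashes) are dropped.
        "path": [segment for segment in raw_path.split("/") if segment],
        # Header keys are lowercased so policies can match them reliably.
        "headers": {key.lower(): value for key, value in headers.items()},
        "remote_addr": remote_addr,
    }


doc = build_input("POST", "/api/v1/object", {"Authorization": "Bearer t"}, "10.0.0.5:40000")
print(doc["path"])     # ['api', 'v1', 'object']
print(doc["headers"])  # {'authorization': 'Bearer t'}
```

With this shape, a rule such as `input.path[0] == "api"` compares against the first path segment.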
Since `remote_addr` includes the port, use `startswith` to match by IP:

    allow if startswith(input.remote_addr, "127.0.0.1:")

Policies can reference external data that becomes available as `data.*` in Rego. Provide it inline via `data` or from a file via `data-file`:

    # Inline JSON data
    opa {
      policy-file = "./policy.rego"
      data = <<EOF
    {"allowed_cidrs": ["10.0.0.0/8"], "jwks": {"keys": [...]}}
    EOF
    }

    # Or from a file
    opa {
      policy-file = "./policy.rego"
      data-file = "./opa-data.json"
    }

Example data (`opa-data.json`):

    {"allowed_cidrs": ["10.0.0.0/8"], "jwks": {"keys": [...]}}

Example policy (`policy.rego`) that uses it:

    package cachew.authz
    default allow := false
    # remote_addr is ip:port, so strip the port first (IPv4 form assumed)
    allow if net.cidr_contains(data.allowed_cidrs[_], split(input.remote_addr, ":")[0])

If neither `data` nor `data-file` is set, `data.*` is empty, but policies can still use `http.send` to fetch data at evaluation time.
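Because the client address is given as `ip:port`, a CIDR membership check needs the port removed before the comparison. The Python sketch below shows that logic using the standard `ipaddress` module; the `host_of` and `in_allowed_cidrs` helpers are hypothetical and only mirror what a policy would need to do, including bracketed IPv6 addresses.

```python
import ipaddress
from typing import Iterable


def host_of(remote_addr: str) -> str:
    """Strip the trailing :port, and any [] around an IPv6 host."""
    host, _, _port = remote_addr.rpartition(":")
    return host.strip("[]")


def in_allowed_cidrs(remote_addr: str, cidrs: Iterable[str]) -> bool:
    """Return True if the client IP falls inside any of the given CIDR ranges."""
    ip = ipaddress.ip_address(host_of(remote_addr))
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)


ALLOWED = ["10.0.0.0/8"]  # same range as the example data
print(in_allowed_cidrs("10.1.2.3:4567", ALLOWED))   # True
print(in_allowed_cidrs("192.168.1.1:80", ALLOWED))  # False
```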
Caches Hermit package downloads from all sources (golang.org, npm, GitHub releases, etc.).
URL pattern: `/hermit/{host}/{path...}`
Example: `GET /hermit/golang.org/dl/go1.21.0.tar.gz`
GitHub releases are automatically redirected to the `github-releases` strategy.
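The URL pattern can be read as "first segment after `/hermit/` is the upstream host, everything else is the path". A minimal Python sketch of that split follows; the `parse_hermit_path` helper is hypothetical, and how Cachew itself handles malformed paths is not covered here.

```python
from typing import Optional, Tuple


def parse_hermit_path(path: str) -> Optional[Tuple[str, str]]:
    """Split /hermit/{host}/{path...} into (host, path), or None if it does not match."""
    prefix = "/hermit/"
    if not path.startswith(prefix):
        return None
    host, _, rest = path[len(prefix):].partition("/")
    if not host or not rest:
        return None  # both a host and a remaining path are required
    return host, rest


print(parse_hermit_path("/hermit/golang.org/dl/go1.21.0.tar.gz"))
# ('golang.org', 'dl/go1.21.0.tar.gz')
```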