
FastHTTP migration and performance improvements #5

Merged
jkyberneees merged 17 commits into main from perf/preload-and-fast-path on Mar 8, 2026

@jkyberneees
Contributor

No description provided.

…e, and GC tuning

- Add cache.Preload() to read all eligible files into LRU at startup
- Replace http.ServeContent with direct w.Write() for non-Range cache hits
- Pre-format Content-Type and Content-Length header slices on CachedFile
- Add sync.Map-based PathCache to eliminate per-request EvalSymlinks
- Pre-warm path cache from preloaded file paths
- Add gc_percent config option to tune Go runtime GOGC
- Add --preload, --gc-percent, --benchmark-mode CLI flags
- Add STATIC_CACHE_PRELOAD and STATIC_CACHE_GC_PERCENT env vars
- SIGHUP now flushes both file cache and path-safety cache
- Add minimal benchmark handler for isolated throughput testing

Production mode with preload + GC 400 reaches ~137k req/sec on Apple
M2 Pro, beating Bun native static serve (~129k) by ~6%.

- README: new end-to-end benchmarks (137k req/sec), preload/gc_percent
  config and env vars, updated architecture diagram with path cache
- CLI.md: add --preload and --gc-percent to flag reference
- USER_GUIDE.md: new 'Preloading for Maximum Performance' section,
  GC tuning guide, Docker preload example, updated SIGHUP docs
- Landing page: hero stats show 137k req/sec, benchmark table replaces
  old Docker numbers with localhost results (beats Bun), updated feature
  cards, config tabs, structured data, and meta tags

--benchmark-mode produced identical throughput (~76k req/sec) to
--preload --gc-percent 400 in bare-metal benchmarks. Remove the
dedicated minimal handler, its tests, all CLI flag wiring, and the
benchmark-mode test from baremetal.sh. The production preload path
is now the single optimised configuration.

Replace inflated 137k req/sec claims with honest reproducible numbers:
static-web ~76k req/sec, Bun ~90k req/sec (Apple M-series, bombardier
-c 50 -n 100000). Update landing page meta tags, hero stats, benchmark
table, feature cards, structured data, README, and USER_GUIDE.

Replace the entire net/http stack with valyala/fasthttp for ~85% higher
throughput on cached static file serving (140k req/s vs 76k on net/http).

Key changes:
- http.Handler/HandlerFunc → fasthttp.RequestHandler throughout
- http.ResponseWriter + *http.Request → single *fasthttp.RequestCtx
- http.Server → fasthttp.Server with Serve(ln) / ShutdownWithContext
- Custom Range request impl (parseRange/serveRange) replaces http.ServeContent
- Compress middleware becomes post-processing (compress body after handler)
- Security middleware uses manual status/body writes to preserve headers
  (fasthttp ctx.Error resets all headers)
- CachedFile header fields changed from []string to string
- Listener uses tcp4 to match fasthttp internals and avoid dual-stack overhead
- net/http retained only for http.DetectContentType (standalone utility)
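Since http.ServeContent is no longer available, Range handling has to be written by hand. The following is a minimal sketch of single-range parsing following RFC 9110 semantics (closed, open-ended and suffix ranges; end clamped to the file size). The name `parseSingleRange` is hypothetical and is not the project's `parseRange`; as an assumption, multi-range requests are simply rejected so the caller can fall back to a full 200 response.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errUnsatisfiable = errors.New("range not satisfiable")

// parseSingleRange parses one "bytes=" range for a resource of the given size
// and returns inclusive [start, end] offsets.
func parseSingleRange(header string, size int64) (start, end int64, err error) {
	const prefix = "bytes="
	if !strings.HasPrefix(header, prefix) {
		return 0, 0, errors.New("unsupported range unit")
	}
	spec := strings.TrimSpace(header[len(prefix):])
	if strings.Contains(spec, ",") {
		return 0, 0, errors.New("multiple ranges not supported")
	}
	first, last, found := strings.Cut(spec, "-")
	if !found {
		return 0, 0, errors.New("malformed range")
	}
	first, last = strings.TrimSpace(first), strings.TrimSpace(last)

	if first == "" { // suffix form "bytes=-N": the last N bytes
		n, err := strconv.ParseInt(last, 10, 64)
		if err != nil || n < 0 {
			return 0, 0, errors.New("malformed suffix length")
		}
		if n == 0 || size == 0 {
			return 0, 0, errUnsatisfiable
		}
		if n > size {
			n = size
		}
		return size - n, size - 1, nil
	}

	start, err = strconv.ParseInt(first, 10, 64)
	if err != nil || start < 0 {
		return 0, 0, errors.New("malformed range start")
	}
	if start >= size {
		return 0, 0, errUnsatisfiable
	}
	end = size - 1 // open-ended "bytes=N-" runs to the last byte
	if last != "" {
		end, err = strconv.ParseInt(last, 10, 64)
		if err != nil || end < start {
			return 0, 0, errors.New("malformed range end")
		}
		if end >= size {
			end = size - 1 // clamp past-the-end requests
		}
	}
	return start, end, nil
}

func main() {
	for _, h := range []string{"bytes=0-99", "bytes=500-", "bytes=-100", "bytes=2000-"} {
		s, e, err := parseSingleRange(h, 1000)
		fmt.Println(h, s, e, err)
	}
}
```

On success the caller would reply 206 with `Content-Range: bytes start-end/size` and the body `data[start:end+1]`; on `errUnsatisfiable` it would reply 416 with `Content-Range: bytes */size`.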

Benchmark (bare-metal, 100 conns, 100k reqs, preload+gc400):
  static-web (fasthttp): 140,662 req/s  p50=619µs  p99=2.46ms  469 MB/s
  Bun native static:      90,346 req/s  p50=1.05ms p99=2.33ms  306 MB/s
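The relative speed-ups quoted in these commit messages can be recomputed from the raw benchmark lines above. A minimal sketch; the 140k/76k pair uses the rounded fasthttp and net/http figures from the commit text. Note that the gain is in throughput and median latency: Bun's p99 (2.33ms) is slightly lower than static-web's (2.46ms).

```go
package main

import "fmt"

// pctFaster returns how much larger a is than b, in percent.
func pctFaster(a, b float64) float64 { return (a/b - 1) * 100 }

func main() {
	fmt.Printf("req/s vs Bun:         +%.1f%%\n", pctFaster(140662, 90346))
	fmt.Printf("MB/s vs Bun:          +%.1f%%\n", pctFaster(469, 306))
	fmt.Printf("fasthttp vs net/http: +%.1f%%\n", pctFaster(140, 76))
}
```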

Dependencies added: fasthttp v1.69.0, andybalholm/brotli v1.2.0,
klauspost/compress v1.18.2, valyala/bytebufferpool v1.0.0

…ate baremetal script

Add standalone hello-world servers for fasthttp (140k req/s) and net/http
(74k req/s) to establish raw HTTP-layer baselines independent of the
application. Update baremetal.sh to use a pre-built binary instead of
rebuilding from source on each run.

- Update benchmark numbers: ~141k req/sec (55% faster than Bun)
- Update architecture diagram: post-processing compress, ctx.SetBody() fast path
- Update key design decisions: fasthttp engine, tcp4 listener, custom Range impl
- Update DoS mitigations: MaxRequestBodySize replaces MaxHeaderBytes
- Update landing page (docs/index.html): hero stats, meta tags, JSON-LD, perf cards
- Update USER_GUIDE.md: preload section, GC tuning notes
- Add CHANGELOG.md v1.3.0 entry with full migration details

… such field)

fasthttp uses a single ReadTimeout covering the full read phase (headers +
body). The separate read_header_timeout config field was parsed and defaulted
but never applied to the server — dead code since the fasthttp migration.

- Remove struct field, default, and env override from config.go
- Remove assertion from config_test.go
- Remove from both config.toml.example files
- Update README.md, USER_GUIDE.md, docs/index.html: remove setting from
  config tables and env vars tables, note ReadTimeout provides Slowloris
  protection
jkyberneees merged commit 9682183 into main on Mar 8, 2026
4 of 5 checks passed