Conversation
…length - New FileCacheEx module alongside original FileCache (non-invasive) - Configurable max_header_length (was hardcoded HTTP_HEADER_MAX_LENGTH) - Configurable max_file_size, max_cache_num at runtime - prepend_header() returns bool to report success/failure - Expose header metrics: get_header_reserve(), get_header_used(), header_fits() - Fix stat() name collision in is_modified() using ::stat() - Add FileCacheEx.h to exported headers in cmake/vars.cmake
- Add per-entry mutex to file_cache_ex_s for thread-safe mutations - Hold fc->mutex during is_modified() / stat check to prevent data race - Hold fc->mutex during resize_buf() / read to prevent use-after-free - Make prepend_header() thread-safe with lock_guard - Defer put() into LRU cache until entry is fully initialized (prevents stale/incomplete cache entries on ERR_OVER_LIMIT) - Use read() loop to handle partial reads (EINTR) - Invalidate httpbuf pointers after resize_buf() reallocation - Thread-safe get_header_used() / get_header_remaining() accessors
- Revert _read/_close to plain read/close (POSIX compat via <io.h>) - Keep is_modified() using ::stat() (only called on non-Windows path) - Consistent with original FileCache.cpp Windows handling patterns
- Add docs/FileCacheEx.md (English API documentation) - Add docs/cn/FileCacheEx.md (Chinese API documentation) - Fix include comments to follow libhv convention (add 'import' keyword)
Prevent integer overflow when reserved < 0 is passed to resize_buf(), which would cause (size_t)reserved to become a very large value.
There was a problem hiding this comment.
Pull request overview
This PR introduces FileCacheEx as an enhanced replacement for the HTTP server’s existing FileCache, aiming to avoid silent HTTP header truncation, improve thread-safety, and make cache limits configurable at runtime.
Changes:
- Replace HTTP server usage of
FileCachewith the newFileCacheEximplementation. - Add
FileCacheExmodule with configurable header reserve/max file size, safer buffer resizing, deferred cache insertion, and partial-read handling. - Add English/Chinese documentation and install the new header via CMake.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| http/server/HttpServer.cpp | Switch server privdata cache type to FileCacheEx and adjust timer callback casts. |
| http/server/HttpHandler.h | Replace FileCache pointers/types with FileCacheEx equivalents. |
| http/server/HttpHandler.cpp | Use FileCacheEx::OpenParam; adjust max-file-size eviction logic to runtime getter. |
| http/server/FileCacheEx.h | New enhanced cache entry + cache class API (mutex per entry, header reserve/metrics). |
| http/server/FileCacheEx.cpp | New cache implementation (deferred put, read loop, configurable reserve/limits). |
| docs/cn/FileCacheEx.md | Add Chinese documentation for FileCacheEx. |
| docs/FileCacheEx.md | Add English documentation for FileCacheEx. |
| cmake/vars.cmake | Add FileCacheEx.h to installed HTTP server headers list. |
Comments suppressed due to low confidence (1)
http/server/HttpHandler.cpp:806
fc->prepend_header(...)now returnsbool, but the return value is ignored. If the generated response header exceeds the reserved space,prepend_headerreturns false and leaveshttpbufunchanged (potentially stale/NULL), so this code can send an incorrect header or invalid buffer. Please handle the failure path (e.g., fall back to sending header and file content separately, or rebuild a contiguous buffer) instead of unconditionally usingfc->httpbuf.
header = pResp->Dump(true, false);
fc->prepend_header(header.c_str(), header.size());
*data = fc->httpbuf.base;
*len = fc->httpbuf.len;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
http/server/FileCacheEx.cpp
Outdated
| // Loop to handle partial reads (EINTR, etc.) | ||
| char* dst = fc->filebuf.base; | ||
| size_t remaining = fc->filebuf.len; | ||
| while (remaining > 0) { | ||
| int nread = read(fd, dst, remaining); | ||
| if (nread <= 0) { | ||
| hloge("Failed to read file: %s", filepath); | ||
| param->error = ERR_READ_FILE; | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
The file read loop uses int nread = read(...) and treats any nread <= 0 as fatal. For POSIX read, the return type is ssize_t, and -1 with errno == EINTR should be retried rather than failing the request. Please switch to ssize_t and explicitly handle EINTR (and any other retryable errors you want) inside the loop.
http/server/FileCacheEx.cpp
Outdated
| fd = open(filepath, flags); | ||
| #endif | ||
| if (fd < 0) { | ||
| param->error = ERR_OPEN_FILE; | ||
| return NULL; | ||
| } | ||
| defer(if (fd > 0) { close(fd); }) | ||
| if (fc == NULL) { |
There was a problem hiding this comment.
defer(if (fd > 0) { close(fd); }) can leak a valid file descriptor if open() returns 0 (possible if stdin is closed). Since this is new code, it would be better to close any fd >= 0 that actually represents an opened file, and avoid using 0 as a directory sentinel (or use a separate boolean) to keep the close logic correct.
http/server/FileCacheEx.cpp
Outdated
| void FileCacheEx::RemoveExpiredFileCache() { | ||
| time_t now = time(NULL); | ||
| remove_if([this, now](const std::string& filepath, const file_cache_ex_ptr& fc) { | ||
| return (now - fc->stat_time > expired_time); |
There was a problem hiding this comment.
RemoveExpiredFileCache reads fc->stat_time without taking the per-entry mutex. Since Open() updates stat_time under fc->mutex, this is a data race and also contradicts the thread-safety guarantees described in the docs. Please lock fc->mutex (or make stat_time atomic) when evaluating expiry in the remove_if predicate.
| return (now - fc->stat_time > expired_time); | |
| std::lock_guard<std::mutex> lock(fc->mutex); | |
| time_t stat_time = fc->stat_time; | |
| return (now - stat_time > expired_time); |
http/server/FileCacheEx.h
Outdated
| int get_header_reserve() const { return header_reserve; } | ||
| int get_header_used() const { std::lock_guard<std::mutex> lock(mutex); return header_used; } | ||
| int get_header_remaining() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve - header_used; } | ||
| bool header_fits(int len) const { return len > 0 && len <= header_reserve; } |
There was a problem hiding this comment.
These are labeled as "thread-safe accessors", but get_header_reserve() and header_fits() read header_reserve without locking, even though resize_buf() can modify header_reserve under the mutex. This introduces a data race and makes the thread-safety claim inaccurate. Please either take the mutex in these accessors as well, or document them as requiring the caller to hold mutex.
| int get_header_reserve() const { return header_reserve; } | |
| int get_header_used() const { std::lock_guard<std::mutex> lock(mutex); return header_used; } | |
| int get_header_remaining() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve - header_used; } | |
| bool header_fits(int len) const { return len > 0 && len <= header_reserve; } | |
| int get_header_reserve() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve; } | |
| int get_header_used() const { std::lock_guard<std::mutex> lock(mutex); return header_used; } | |
| int get_header_remaining() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve - header_used; } | |
| bool header_fits(int len) const { std::lock_guard<std::mutex> lock(mutex); return len > 0 && len <= header_reserve; } |
http/server/FileCacheEx.h
Outdated
| * 1. Configurable max_header_length (no more hardcoded 4096) | ||
| * 2. prepend_header() returns bool to report success/failure | ||
| * 3. Exposes header/buffer metrics via accessors | ||
| * 4. Fixes stat() name collision in is_modified() | ||
| * 5. max_cache_num / max_file_size configurable at runtime | ||
| * 6. Reserved header space can be tuned per-instance | ||
| * 7. Fully backward-compatible struct layout |
There was a problem hiding this comment.
The header comment says "Configurable max_header_length (no more hardcoded 4096)" and "Fully backward-compatible struct layout", but FILECACHE_EX_DEFAULT_HEADER_LENGTH is still 4096 by default and the struct layout is not compatible with file_cache_t (added mutex + new fields). Please clarify this comment to avoid misleading readers about ABI/layout compatibility and defaults.
| * 1. Configurable max_header_length (no more hardcoded 4096) | |
| * 2. prepend_header() returns bool to report success/failure | |
| * 3. Exposes header/buffer metrics via accessors | |
| * 4. Fixes stat() name collision in is_modified() | |
| * 5. max_cache_num / max_file_size configurable at runtime | |
| * 6. Reserved header space can be tuned per-instance | |
| * 7. Fully backward-compatible struct layout | |
| * 1. Configurable max_header_length (default 4096, tunable per-instance) | |
| * 2. prepend_header() returns bool to report success/failure | |
| * 3. Exposes header/buffer metrics via accessors | |
| * 4. Fixes stat() name collision in is_modified() | |
| * 5. max_cache_num / max_file_size configurable at runtime | |
| * 6. Reserved header space can be tuned per-instance | |
| * 7. Source-level API compatible; struct layout differs from file_cache_t (no ABI/layout compatibility) |
| set(HTTP_SERVER_HEADERS | ||
| http/server/HttpServer.h | ||
| http/server/HttpService.h | ||
| http/server/HttpContext.h | ||
| http/server/HttpResponseWriter.h | ||
| http/server/WebSocketServer.h | ||
| http/server/FileCacheEx.h | ||
| ) |
There was a problem hiding this comment.
FileCacheEx.h is added to the CMake installed header list, but the repo also has a Makefile-based install flow (Makefile.vars) with its own HTTP_SERVER_HEADERS list. To keep both build systems consistent, please add http/server/FileCacheEx.h to Makefile.vars as well (otherwise make install won’t install the new public header).
|
直接删除原来的FileCache,然后重命名FileCacheEx为FileCache吧,没必要还保留原来的实现了 |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
http/server/FileCache.cpp:76
- In the slow path you
stat()into localst, but for already-cached entries you never assignfc->st = stunderfc->mutexbefore later usingfc->st(size/type) whenparam->need_readis true. If the file changes withinstat_interval(somodifiedstays false), a follow-up GET can re-read using a stalefc->st.st_sizeand mis-handle reads/ETag. Consider updatingfc->stfrom the freshly obtainedstwhenever you take this path (especially when re-reading content).
if (fc == NULL || modified || param->need_read) {
struct stat st;
int flags = O_RDONLY;
#ifdef O_BINARY
flags |= O_BINARY;
#endif
int fd = -1;
#ifdef OS_WIN
if (wfilepath.empty()) wfilepath = hv::utf8_to_wchar(filepath);
if (_wstat(wfilepath.c_str(), (struct _stat*)&st) != 0) {
param->error = ERR_OPEN_FILE;
return NULL;
}
if (S_ISREG(st.st_mode)) {
fd = _wopen(wfilepath.c_str(), flags);
} else if (S_ISDIR(st.st_mode)) {
fd = 0;
}
#else
if (::stat(filepath, &st) != 0) {
param->error = ERR_OPEN_FILE;
return NULL;
}
fd = open(filepath, flags);
#endif
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| bool prepend_header(const char* header, int len) { | ||
| std::lock_guard<std::mutex> lock(mutex); | ||
| if (len <= 0 || len > header_reserve) return false; | ||
| httpbuf.base = filebuf.base - len; | ||
| httpbuf.len = len + filebuf.len; | ||
| httpbuf.len = (size_t)len + filebuf.len; |
There was a problem hiding this comment.
file_cache_t::prepend_header now returns false when the header doesn’t fit, but it leaves httpbuf unchanged (and resize_buf() explicitly invalidates httpbuf). Callers that still read fc->httpbuf unconditionally can end up sending an empty/invalid response. Consider either (1) always setting httpbuf to a safe fallback (e.g., point at filebuf / clear and document) on failure, and/or (2) requiring/updating callers to check the return value and fall back to non-cached header+body sending.
http/server/FileCache.h
Outdated
| void SetMaxHeaderLength(int len) { max_header_length = len; } | ||
| void SetMaxFileSize(int size) { max_file_size = size; } |
There was a problem hiding this comment.
SetMaxHeaderLength / SetMaxFileSize accept negative values, which can lead to surprising behavior (e.g., reserved header space clamped to 0 at resize time, or max_file_size < 0). It would be safer to validate/clamp in the setters (e.g., minimum 0/1) so misconfiguration fails fast and predictably.
| void SetMaxHeaderLength(int len) { max_header_length = len; } | |
| void SetMaxFileSize(int size) { max_file_size = size; } | |
| void SetMaxHeaderLength(int len) { max_header_length = len < 0 ? 0 : len; } | |
| void SetMaxFileSize(int size) { max_file_size = size < 1 ? 1 : size; } |
http/server/FileCache.cpp
Outdated
| gmtime_fmt(fc->st.st_mtime, fc->last_modified); | ||
| snprintf(fc->etag, sizeof(fc->etag), ETAG_FMT, (size_t)fc->st.st_mtime, (size_t)fc->st.st_size); | ||
| snprintf(fc->etag, sizeof(fc->etag), ETAG_FMT, | ||
| (size_t)fc->st.st_mtime, (size_t)fc->st.st_size); | ||
| // Cache the fully initialized entry | ||
| put(filepath, fc); |
There was a problem hiding this comment.
Potential deadlock: Open() holds fc->mutex for the remainder of initialization and then calls put(filepath, fc), which acquires the LRUCache internal mutex. Meanwhile RemoveExpiredFileCache() acquires the LRUCache mutex (via remove_if) and then locks fc->mutex. This lock-order inversion can deadlock under concurrent traffic. Avoid calling put/remove/contains/remove_if while holding fc->mutex, or change the expiration logic to avoid locking fc->mutex while LRUCache is locked (e.g., snapshot fields needed for eviction without taking fc->mutex, or collect keys to remove and erase them after releasing the cache lock).
http/server/FileCache.cpp
Outdated
| void FileCache::RemoveExpiredFileCache() { | ||
| time_t now = time(NULL); | ||
| remove_if([this, now](const std::string& filepath, const file_cache_ptr& fc) { | ||
| std::lock_guard<std::mutex> lock(fc->mutex); |
There was a problem hiding this comment.
remove_if runs the predicate while holding the LRUCache internal mutex; locking fc->mutex inside the predicate introduces a lock-order inversion with Open() (which locks fc->mutex and later calls put()). This can deadlock. Consider avoiding per-entry locking inside remove_if predicates (e.g., store stat_time as an atomic, or do a two-phase approach: gather expired keys under cache lock without locking entries, then remove them).
| std::lock_guard<std::mutex> lock(fc->mutex); | |
| std::unique_lock<std::mutex> lock(fc->mutex, std::try_to_lock); | |
| if (!lock.owns_lock()) { | |
| return false; | |
| } |
http/server/FileCache.cpp
Outdated
| #ifdef OS_WIN | ||
| defer(if (fd > 0) { close(fd); }) // fd=0 is Windows directory sentinel | ||
| #else |
There was a problem hiding this comment.
On Windows, using fd=0 as a directory sentinel combined with defer(if (fd > 0) close(fd);) can leak a real file descriptor if _wopen() happens to return 0 (possible if stdin is closed). Consider using a non-valid sentinel (e.g., -1/-2) for directories or tracking is_dir separately, and close any real fd with fd >= 0 when appropriate.
| // NOTE: remove file cache if > max_file_size | ||
| if (fc && fc->filebuf.len > files->GetMaxFileSize()) { | ||
| files->Close(fc->filepath.c_str()); | ||
| } |
There was a problem hiding this comment.
files->GetMaxFileSize() is used here to decide whether to evict the cached entry, but FileCache::max_file_size is never wired up to HttpService::max_file_cache_size (which controls OpenParam.max_read). If max_file_cache_size is increased, entries may still be evicted immediately due to the default 4MB FileCache::max_file_size. Consider initializing FileCache::max_file_size from service->max_file_cache_size when the server starts (same place stat_interval/expired_time are configured) so caching behavior is consistent.
| // NOTE: remove file cache if > max_file_size | |
| if (fc && fc->filebuf.len > files->GetMaxFileSize()) { | |
| files->Close(fc->filepath.c_str()); | |
| } | |
| // Avoid immediately evicting the just-served cached file based on a | |
| // potentially stale FileCache max_file_size setting. Cache size policy | |
| // should be enforced where the cache is configured/populated so it stays | |
| // consistent with the service-level max_file_cache_size. |
…add max_file_size setter
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // FileCache | ||
| // NOTE: no copy filebuf, more efficient | ||
| header = pResp->Dump(true, false); | ||
| fc->prepend_header(header.c_str(), header.size()); | ||
| *data = fc->httpbuf.base; | ||
| *len = fc->httpbuf.len; | ||
| state = SEND_DONE; | ||
| return *len; | ||
| if (fc->prepend_header(header.c_str(), header.size())) { | ||
| *data = fc->httpbuf.base; | ||
| *len = fc->httpbuf.len; | ||
| state = SEND_DONE; | ||
| return *len; | ||
| } |
There was a problem hiding this comment.
fc->prepend_header() writes the response header into the shared cache buffer (fc->buf) and then GetSendData() returns a pointer into that shared buffer. Since FileCache is shared across worker threads and multiple requests can serve the same file_cache_t concurrently, another thread can call prepend_header() and overwrite the header region while the current thread is still inside hio_write() copying/issuing the write, leading to a data race and potentially corrupted responses. To make this safe, avoid mutating shared cache memory for per-request headers (e.g., send header separately and then send fc->filebuf), or refactor so the entry mutex is held across the entire write/copy window (would require changing the GetSendData/SendHttpResponse interface).
| std::lock_guard<std::mutex> lock(mutex); | ||
| if (len <= 0 || len > header_reserve) { | ||
| // Safe fallback: point httpbuf at filebuf so callers always get valid data | ||
| httpbuf = filebuf; |
There was a problem hiding this comment.
On prepend_header() failure (len <= 0 || len > header_reserve), header_used is not reset. If a previous call succeeded, metrics returned by get_header_used() / get_header_remaining() will be stale/misleading after a failed prepend. Consider setting header_used = 0 in the failure path as well.
| httpbuf = filebuf; | |
| httpbuf = filebuf; | |
| header_used = 0; |
| // --- configurable parameters (were hardcoded macros before) --- | ||
| int stat_interval; // seconds between stat() checks | ||
| int expired_time; // seconds before cache entry expires | ||
| int max_header_length; // reserved header bytes per entry | ||
| int max_file_size; // max cached file size (larger = large-file path) | ||
|
|
||
| FileCache(size_t capacity = FILE_CACHE_MAX_NUM); | ||
| explicit FileCache(size_t capacity = FILE_CACHE_DEFAULT_MAX_NUM); | ||
|
|
||
| struct OpenParam { | ||
| bool need_read; | ||
| int max_read; | ||
| const char* path; | ||
| size_t filesize; | ||
| int error; | ||
| bool need_read; | ||
| int max_read; // per-request override for max file size | ||
| const char* path; // URL path (for directory listing) | ||
| size_t filesize; // [out] actual file size | ||
| int error; // [out] error code if Open returns NULL |
There was a problem hiding this comment.
max_file_size / OpenParam::max_read are declared as int, but they represent a byte size and are compared against st.st_size (typically off_t, potentially >2GB). Using int can overflow/truncate on large files and makes it harder to configure sizes beyond INT_MAX. Consider switching these fields and related APIs to size_t (or uint64_t) so large-file thresholds work correctly on 64-bit platforms.
| struct OpenParam { | ||
| bool need_read; | ||
| int max_read; | ||
| const char* path; | ||
| size_t filesize; | ||
| int error; | ||
| bool need_read; | ||
| int max_read; // per-request override for max file size | ||
| const char* path; // URL path (for directory listing) | ||
| size_t filesize; // [out] actual file size | ||
| int error; // [out] error code if Open returns NULL | ||
|
|
||
| OpenParam() { | ||
| need_read = true; | ||
| max_read = FILE_CACHE_MAX_SIZE; | ||
| max_read = FILE_CACHE_DEFAULT_MAX_FILE_SIZE; | ||
| path = "/"; | ||
| filesize = 0; | ||
| error = 0; | ||
| } |
There was a problem hiding this comment.
OpenParam::max_read defaults to FILE_CACHE_DEFAULT_MAX_FILE_SIZE, while FileCache now has a runtime-configurable max_file_size (and HttpServer sets it from service->max_file_cache_size). Callers that don't explicitly set param.max_read will silently ignore the instance’s configured max_file_size, which can lead to inconsistent caching behavior across call sites. Consider defaulting OpenParam::max_read from the owning FileCache instance (e.g., in Open() when param->max_read is unset/0), or remove one of these knobs to keep a single source of truth.
http/server/HttpServer.cpp
Outdated
| FileCache* filecache = &privdata->filecache; | ||
| filecache->stat_interval = service->file_cache_stat_interval; | ||
| filecache->expired_time = service->file_cache_expired_time; | ||
| filecache->max_file_size = service->max_file_cache_size; |
There was a problem hiding this comment.
This assigns filecache->max_file_size directly, bypassing the new SetMaxFileSize() clamping/validation logic. If service->max_file_cache_size can be 0 or negative (e.g., misconfig to disable caching), this can cause surprising behavior in later comparisons/removals. Prefer calling filecache->SetMaxFileSize(service->max_file_cache_size) here to keep invariants consistent.
| filecache->max_file_size = service->max_file_cache_size; | |
| filecache->SetMaxFileSize(service->max_file_cache_size); |
…use setter for max_file_size
|
基本问题已修复 2、int 溢出(>2GB 文件) 3、OpenParam::max_read 默认值与实例配置不一致 |
背景
我加入了
FileCacheEx作为FileCache的增强版。最初发现的问题,是 HTTP 响应头超过 1KB(比如多个
Set-Cookie、Content-Security-Policy、CORS 头等)时,FileCache会静默截断,于是我加入了FileCacheEx用来正确处理和错误报错提示。虽然可以修改源码来增加 HEADER 长度,但是需要重新编译 libhv,不方便自动化引入。
FileCacheEx vs FileCache 对比
HTTP_HEADER_MAX_LENGTH 1024)prepend_header()bool,溢出返回falseis_modified()stat()与 POSIXstat函数名冲突::stat()修复命名冲突SetMaxFileSize()get_header_used()/get_header_remaining()std::mutex,prepend_header()和指标读取加锁resize_buf()reserved参数解决可能存在的问题
1:
stat()函数名冲突struct stat st是成员变量,而stat()是 POSIX 函数。在某些编译器下非限定调用stat()可能会解析到成员自身,可能导致编译错误或非预期行为。处理方式:使用
::stat()显式指定全局作用域。2:
read()短读未处理POSIX
read()不保证一次读完所有数据(信号中断EINTR等都可能导致短读)。大多数情况下单次调用没问题,但特定条件下可能返回不完整数据。处理方式:循环读取直到全部读完:
3:缓存条目在初始化完成前就加入 LRU
在多线程场景下,如果另一个线程在
put()之后、文件读取完成之前调用Get(),可能会取到一个内容为空的缓存条目。处理方式:延迟
put()到条目完全初始化之后(文件读取 + ETag + Content-Type 全部设好再入缓存)。4:
resize_buf()后httpbuf可能悬空HBuf::resize()可能重新分配内存,如果httpbuf.base之前被设置过,可能变成悬空指针。处理方式:
resize_buf()之后显式置空httpbuf:5:
prepend_header()静默失败header 超过预留空间时静默丢弃,调用方无法感知。在响应头较多的场景下(多个 Set-Cookie、CSP、CORS 等),可能导致响应发送不完整。
处理方式:返回
bool,失败返回false,调用方可以回退到非缓冲发送。6:所有参数编译期固定,无法运行时调整
处理方式:运行时可配置:
默认头部预留从 1024 提升到 4096 字节。