Skip to content

feat: Enhance filecache#823

Open
Yundi339 wants to merge 11 commits intoithewei:masterfrom
Yundi339:feature/enhanced-filecache
Open

feat: Enhance filecache#823
Yundi339 wants to merge 11 commits intoithewei:masterfrom
Yundi339:feature/enhanced-filecache

Conversation

@Yundi339
Copy link
Copy Markdown

代码方面,我提供了技术方向讨论和指导,实现由Claude辅助完成,PR仅供参考,若不合适可随时close。

背景

我加入了 FileCacheEx 作为 FileCache 的增强版。
最初发现的问题,是 HTTP 响应头超过 1KB(比如多个 Set-CookieContent-Security-Policy、CORS 头等)时, FileCache 会静默截断,于是我加入了 FileCacheEx 用来正确处理和错误报错提示。
虽然可以修改源码来增加 HEADER 长度,但是需要重新编译 libhv,不方便自动化引入。


FileCacheEx vs FileCache 对比

差异 FileCache FileCacheEx
header 预留空间 硬编码 1KB (HTTP_HEADER_MAX_LENGTH 1024) 可配置,默认 4KB
prepend_header() 无返回值,溢出时静默丢弃 返回 bool,溢出返回 false
is_modified() 调用 stat() 与 POSIX stat 函数名冲突 使用 ::stat() 修复命名冲突
最大缓存文件 硬编码 4MB 运行时可配 SetMaxFileSize()
最大缓存数量 硬编码 100 运行时可配(构造函数参数)
header 指标 暴露 get_header_used() / get_header_remaining()
线程安全 无 mutex 每个 entry 有 std::mutexprepend_header() 和指标读取加锁
resize_buf() 无法调整 header 预留 支持自定义 reserved 参数

解决可能存在的问题

1:stat() 函数名冲突

// file_cache_s::is_modified()
bool is_modified() {
    time_t mtime = st.st_mtime;
    stat(filepath.c_str(), &st);   // ← 成员变量 st 遮蔽了 POSIX stat()
    return mtime != st.st_mtime;
}

struct stat st 是成员变量,而 stat() 是 POSIX 函数。在某些编译器下非限定调用 stat() 可能会解析到成员自身,可能导致编译错误或非预期行为。

处理方式:使用 ::stat() 显式指定全局作用域。


2:read() 短读未处理

// FileCache::Open()
int nread = read(fd, fc->filebuf.base, fc->filebuf.len);
if (nread != fc->filebuf.len) {   // ← 只调一次 read,可能短读

POSIX read() 不保证一次读完所有数据(信号中断 EINTR 等都可能导致短读)。大多数情况下单次调用没问题,但特定条件下可能返回不完整数据。

处理方式:循环读取直到全部读完:

while (remaining > 0) {
    int nread = read(fd, dst, remaining);
    if (nread <= 0) { /* 错误处理 */ }
    dst += nread;
    remaining -= nread;
}

3:缓存条目在初始化完成前就加入 LRU

// FileCache::Open()
fc = std::make_shared<file_cache_t>();
put(filepath, fc);   // ← 立即放入缓存
// ... 后面才读文件内容、设置 ETag 等

在多线程场景下,如果另一个线程在 put() 之后、文件读取完成之前调用 Get(),可能会取到一个内容为空的缓存条目。

处理方式:延迟 put() 到条目完全初始化之后(文件读取 + ETag + Content-Type 全部设好再入缓存)。


4:resize_buf()httpbuf 可能悬空

// file_cache_s::resize_buf()
void resize_buf(int filesize) {
    buf.resize(HTTP_HEADER_MAX_LENGTH + filesize);
    filebuf.base = buf.base + HTTP_HEADER_MAX_LENGTH;
    filebuf.len = filesize;
    // httpbuf.base 仍然指向旧的(已释放的)内存!
}

HBuf::resize() 可能重新分配内存,如果 httpbuf.base 之前被设置过,可能变成悬空指针。

处理方式resize_buf() 之后显式置空 httpbuf

httpbuf.base = NULL;
httpbuf.len = 0;
header_used = 0;

5:prepend_header() 静默失败

void prepend_header(const char* header, int len) {
    if (len > HTTP_HEADER_MAX_LENGTH) return;  // ← 返回 void,调用方不知道失败了

header 超过预留空间时静默丢弃,调用方无法感知。在响应头较多的场景下(多个 Set-Cookie、CSP、CORS 等),可能导致响应发送不完整。

处理方式:返回 bool,失败返回 false,调用方可以回退到非缓冲发送。


6:所有参数编译期固定,无法运行时调整

#define HTTP_HEADER_MAX_LENGTH  1024       // 生产环境常常不够用
#define FILE_CACHE_MAX_SIZE     (1 << 22)  // 4MB,不可调
  • 1024 字节的 HTTP 头预留空间,在添加 CSP、CORS、缓存控制、Cookie 等响应头后可能不够用
  • 最大缓存文件大小无法根据服务器内存状况调整

处理方式:运行时可配置:

filecache.SetMaxHeaderLength(8192);   // 8K 头部预留
filecache.SetMaxFileSize(1 << 24);    // 16MB 最大缓存文件

默认头部预留从 1024 提升到 4096 字节。

…length

- New FileCacheEx module alongside original FileCache (non-invasive)
- Configurable max_header_length (was hardcoded HTTP_HEADER_MAX_LENGTH)
- Configurable max_file_size, max_cache_num at runtime
- prepend_header() returns bool to report success/failure
- Expose header metrics: get_header_reserve(), get_header_used(), header_fits()
- Fix stat() name collision in is_modified() using ::stat()
- Add FileCacheEx.h to exported headers in cmake/vars.cmake
- Add per-entry mutex to file_cache_ex_s for thread-safe mutations
- Hold fc->mutex during is_modified() / stat check to prevent data race
- Hold fc->mutex during resize_buf() / read to prevent use-after-free
- Make prepend_header() thread-safe with lock_guard
- Defer put() into LRU cache until entry is fully initialized
  (prevents stale/incomplete cache entries on ERR_OVER_LIMIT)
- Use read() loop to handle partial reads (EINTR)
- Invalidate httpbuf pointers after resize_buf() reallocation
- Thread-safe get_header_used() / get_header_remaining() accessors
- Revert _read/_close to plain read/close (POSIX compat via <io.h>)
- Keep is_modified() using ::stat() (only called on non-Windows path)
- Consistent with original FileCache.cpp Windows handling patterns
- Add docs/FileCacheEx.md (English API documentation)
- Add docs/cn/FileCacheEx.md (Chinese API documentation)
- Fix include comments to follow libhv convention (add 'import' keyword)
Prevent integer overflow when reserved < 0 is passed to resize_buf(),
which would cause (size_t)reserved to become a very large value.
Copilot AI review requested due to automatic review settings March 29, 2026 09:13
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces FileCacheEx as an enhanced replacement for the HTTP server’s existing FileCache, aiming to avoid silent HTTP header truncation, improve thread-safety, and make cache limits configurable at runtime.

Changes:

  • Replace HTTP server usage of FileCache with the new FileCacheEx implementation.
  • Add FileCacheEx module with configurable header reserve/max file size, safer buffer resizing, deferred cache insertion, and partial-read handling.
  • Add English/Chinese documentation and install the new header via CMake.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
http/server/HttpServer.cpp Switch server privdata cache type to FileCacheEx and adjust timer callback casts.
http/server/HttpHandler.h Replace FileCache pointers/types with FileCacheEx equivalents.
http/server/HttpHandler.cpp Use FileCacheEx::OpenParam; adjust max-file-size eviction logic to runtime getter.
http/server/FileCacheEx.h New enhanced cache entry + cache class API (mutex per entry, header reserve/metrics).
http/server/FileCacheEx.cpp New cache implementation (deferred put, read loop, configurable reserve/limits).
docs/cn/FileCacheEx.md Add Chinese documentation for FileCacheEx.
docs/FileCacheEx.md Add English documentation for FileCacheEx.
cmake/vars.cmake Add FileCacheEx.h to installed HTTP server headers list.
Comments suppressed due to low confidence (1)

http/server/HttpHandler.cpp:806

  • fc->prepend_header(...) now returns bool, but the return value is ignored. If the generated response header exceeds the reserved space, prepend_header returns false and leaves httpbuf unchanged (potentially stale/NULL), so this code can send an incorrect header or invalid buffer. Please handle the failure path (e.g., fall back to sending header and file content separately, or rebuild a contiguous buffer) instead of unconditionally using fc->httpbuf.
                header = pResp->Dump(true, false);
                fc->prepend_header(header.c_str(), header.size());
                *data = fc->httpbuf.base;
                *len = fc->httpbuf.len;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +111 to +120
// Loop to handle partial reads (EINTR, etc.)
char* dst = fc->filebuf.base;
size_t remaining = fc->filebuf.len;
while (remaining > 0) {
int nread = read(fd, dst, remaining);
if (nread <= 0) {
hloge("Failed to read file: %s", filepath);
param->error = ERR_READ_FILE;
return NULL;
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file read loop uses int nread = read(...) and treats any nread <= 0 as fatal. For POSIX read, the return type is ssize_t, and -1 with errno == EINTR should be retried rather than failing the request. Please switch to ssize_t and explicitly handle EINTR (and any other retryable errors you want) inside the loop.

Copilot uses AI. Check for mistakes.
Comment on lines +75 to +82
fd = open(filepath, flags);
#endif
if (fd < 0) {
param->error = ERR_OPEN_FILE;
return NULL;
}
defer(if (fd > 0) { close(fd); })
if (fc == NULL) {
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer(if (fd > 0) { close(fd); }) can leak a valid file descriptor if open() returns 0 (possible if stdin is closed). Since this is new code, it would be better to close any fd >= 0 that actually represents an opened file, and avoid using 0 as a directory sentinel (or use a separate boolean) to keep the close logic correct.

Copilot uses AI. Check for mistakes.
void FileCacheEx::RemoveExpiredFileCache() {
time_t now = time(NULL);
remove_if([this, now](const std::string& filepath, const file_cache_ex_ptr& fc) {
return (now - fc->stat_time > expired_time);
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RemoveExpiredFileCache reads fc->stat_time without taking the per-entry mutex. Since Open() updates stat_time under fc->mutex, this is a data race and also contradicts the thread-safety guarantees described in the docs. Please lock fc->mutex (or make stat_time atomic) when evaluating expiry in the remove_if predicate.

Suggested change
return (now - fc->stat_time > expired_time);
std::lock_guard<std::mutex> lock(fc->mutex);
time_t stat_time = fc->stat_time;
return (now - stat_time > expired_time);

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +107
int get_header_reserve() const { return header_reserve; }
int get_header_used() const { std::lock_guard<std::mutex> lock(mutex); return header_used; }
int get_header_remaining() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve - header_used; }
bool header_fits(int len) const { return len > 0 && len <= header_reserve; }
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are labeled as "thread-safe accessors", but get_header_reserve() and header_fits() read header_reserve without locking, even though resize_buf() can modify header_reserve under the mutex. This introduces a data race and makes the thread-safety claim inaccurate. Please either take the mutex in these accessors as well, or document them as requiring the caller to hold mutex.

Suggested change
int get_header_reserve() const { return header_reserve; }
int get_header_used() const { std::lock_guard<std::mutex> lock(mutex); return header_used; }
int get_header_remaining() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve - header_used; }
bool header_fits(int len) const { return len > 0 && len <= header_reserve; }
int get_header_reserve() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve; }
int get_header_used() const { std::lock_guard<std::mutex> lock(mutex); return header_used; }
int get_header_remaining() const { std::lock_guard<std::mutex> lock(mutex); return header_reserve - header_used; }
bool header_fits(int len) const { std::lock_guard<std::mutex> lock(mutex); return len > 0 && len <= header_reserve; }

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +14
* 1. Configurable max_header_length (no more hardcoded 4096)
* 2. prepend_header() returns bool to report success/failure
* 3. Exposes header/buffer metrics via accessors
* 4. Fixes stat() name collision in is_modified()
* 5. max_cache_num / max_file_size configurable at runtime
* 6. Reserved header space can be tuned per-instance
* 7. Fully backward-compatible struct layout
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header comment says "Configurable max_header_length (no more hardcoded 4096)" and "Fully backward-compatible struct layout", but FILECACHE_EX_DEFAULT_HEADER_LENGTH is still 4096 by default and the struct layout is not compatible with file_cache_t (added mutex + new fields). Please clarify this comment to avoid misleading readers about ABI/layout compatibility and defaults.

Suggested change
* 1. Configurable max_header_length (no more hardcoded 4096)
* 2. prepend_header() returns bool to report success/failure
* 3. Exposes header/buffer metrics via accessors
* 4. Fixes stat() name collision in is_modified()
* 5. max_cache_num / max_file_size configurable at runtime
* 6. Reserved header space can be tuned per-instance
* 7. Fully backward-compatible struct layout
* 1. Configurable max_header_length (default 4096, tunable per-instance)
* 2. prepend_header() returns bool to report success/failure
* 3. Exposes header/buffer metrics via accessors
* 4. Fixes stat() name collision in is_modified()
* 5. max_cache_num / max_file_size configurable at runtime
* 6. Reserved header space can be tuned per-instance
* 7. Source-level API compatible; struct layout differs from file_cache_t (no ABI/layout compatibility)

Copilot uses AI. Check for mistakes.
Comment on lines 100 to 107
set(HTTP_SERVER_HEADERS
http/server/HttpServer.h
http/server/HttpService.h
http/server/HttpContext.h
http/server/HttpResponseWriter.h
http/server/WebSocketServer.h
http/server/FileCacheEx.h
)
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileCacheEx.h is added to the CMake installed header list, but the repo also has a Makefile-based install flow (Makefile.vars) with its own HTTP_SERVER_HEADERS list. To keep both build systems consistent, please add http/server/FileCacheEx.h to Makefile.vars as well (otherwise make install won’t install the new public header).

Copilot uses AI. Check for mistakes.
@ithewei
Copy link
Copy Markdown
Owner

ithewei commented Mar 30, 2026

直接删除原来的FileCache,然后重命名FileCacheEx为FileCache吧,没必要还保留原来的实现了

Copilot AI review requested due to automatic review settings April 5, 2026 16:15
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

http/server/FileCache.cpp:76

  • In the slow path you stat() into local st, but for already-cached entries you never assign fc->st = st under fc->mutex before later using fc->st (size/type) when param->need_read is true. If the file changes within stat_interval (so modified stays false), a follow-up GET can re-read using a stale fc->st.st_size and mis-handle reads/ETag. Consider updating fc->st from the freshly obtained st whenever you take this path (especially when re-reading content).
    if (fc == NULL || modified || param->need_read) {
        struct stat st;
        int flags = O_RDONLY;
#ifdef O_BINARY
        flags |= O_BINARY;
#endif
        int fd = -1;
#ifdef OS_WIN
        if (wfilepath.empty()) wfilepath = hv::utf8_to_wchar(filepath);
        if (_wstat(wfilepath.c_str(), (struct _stat*)&st) != 0) {
            param->error = ERR_OPEN_FILE;
            return NULL;
        }
        if (S_ISREG(st.st_mode)) {
            fd = _wopen(wfilepath.c_str(), flags);
        } else if (S_ISDIR(st.st_mode)) {
            fd = 0;
        }
#else
        if (::stat(filepath, &st) != 0) {
            param->error = ERR_OPEN_FILE;
            return NULL;
        }
        fd = open(filepath, flags);
#endif

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +90 to +94
bool prepend_header(const char* header, int len) {
std::lock_guard<std::mutex> lock(mutex);
if (len <= 0 || len > header_reserve) return false;
httpbuf.base = filebuf.base - len;
httpbuf.len = len + filebuf.len;
httpbuf.len = (size_t)len + filebuf.len;
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file_cache_t::prepend_header now returns false when the header doesn’t fit, but it leaves httpbuf unchanged (and resize_buf() explicitly invalidates httpbuf). Callers that still read fc->httpbuf unconditionally can end up sending an empty/invalid response. Consider either (1) always setting httpbuf to a safe fallback (e.g., point at filebuf / clear and document) on failure, and/or (2) requiring/updating callers to check the return value and fall back to non-cached header+body sending.

Copilot uses AI. Check for mistakes.
Comment on lines +147 to +148
void SetMaxHeaderLength(int len) { max_header_length = len; }
void SetMaxFileSize(int size) { max_file_size = size; }
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetMaxHeaderLength / SetMaxFileSize accept negative values, which can lead to surprising behavior (e.g., reserved header space clamped to 0 at resize time, or max_file_size < 0). It would be safer to validate/clamp in the setters (e.g., minimum 0/1) so misconfiguration fails fast and predictably.

Suggested change
void SetMaxHeaderLength(int len) { max_header_length = len; }
void SetMaxFileSize(int size) { max_file_size = size; }
void SetMaxHeaderLength(int len) { max_header_length = len < 0 ? 0 : len; }
void SetMaxFileSize(int size) { max_file_size = size < 1 ? 1 : size; }

Copilot uses AI. Check for mistakes.
Comment on lines +154 to +158
gmtime_fmt(fc->st.st_mtime, fc->last_modified);
snprintf(fc->etag, sizeof(fc->etag), ETAG_FMT, (size_t)fc->st.st_mtime, (size_t)fc->st.st_size);
snprintf(fc->etag, sizeof(fc->etag), ETAG_FMT,
(size_t)fc->st.st_mtime, (size_t)fc->st.st_size);
// Cache the fully initialized entry
put(filepath, fc);
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential deadlock: Open() holds fc->mutex for the remainder of initialization and then calls put(filepath, fc), which acquires the LRUCache internal mutex. Meanwhile RemoveExpiredFileCache() acquires the LRUCache mutex (via remove_if) and then locks fc->mutex. This lock-order inversion can deadlock under concurrent traffic. Avoid calling put/remove/contains/remove_if while holding fc->mutex, or change the expiration logic to avoid locking fc->mutex while LRUCache is locked (e.g., snapshot fields needed for eviction without taking fc->mutex, or collect keys to remove and erase them after releasing the cache lock).

Copilot uses AI. Check for mistakes.
void FileCache::RemoveExpiredFileCache() {
time_t now = time(NULL);
remove_if([this, now](const std::string& filepath, const file_cache_ptr& fc) {
std::lock_guard<std::mutex> lock(fc->mutex);
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove_if runs the predicate while holding the LRUCache internal mutex; locking fc->mutex inside the predicate introduces a lock-order inversion with Open() (which locks fc->mutex and later calls put()). This can deadlock. Consider avoiding per-entry locking inside remove_if predicates (e.g., store stat_time as an atomic, or do a two-phase approach: gather expired keys under cache lock without locking entries, then remove them).

Suggested change
std::lock_guard<std::mutex> lock(fc->mutex);
std::unique_lock<std::mutex> lock(fc->mutex, std::try_to_lock);
if (!lock.owns_lock()) {
return false;
}

Copilot uses AI. Check for mistakes.
Comment on lines +81 to +83
#ifdef OS_WIN
defer(if (fd > 0) { close(fd); }) // fd=0 is Windows directory sentinel
#else
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Windows, using fd=0 as a directory sentinel combined with defer(if (fd > 0) close(fd);) can leak a real file descriptor if _wopen() happens to return 0 (possible if stdin is closed). Consider using a non-valid sentinel (e.g., -1/-2) for directories or tracking is_dir separately, and close any real fd with fd >= 0 when appropriate.

Copilot uses AI. Check for mistakes.
Comment on lines +845 to 848
// NOTE: remove file cache if > max_file_size
if (fc && fc->filebuf.len > files->GetMaxFileSize()) {
files->Close(fc->filepath.c_str());
}
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files->GetMaxFileSize() is used here to decide whether to evict the cached entry, but FileCache::max_file_size is never wired up to HttpService::max_file_cache_size (which controls OpenParam.max_read). If max_file_cache_size is increased, entries may still be evicted immediately due to the default 4MB FileCache::max_file_size. Consider initializing FileCache::max_file_size from service->max_file_cache_size when the server starts (same place stat_interval/expired_time are configured) so caching behavior is consistent.

Suggested change
// NOTE: remove file cache if > max_file_size
if (fc && fc->filebuf.len > files->GetMaxFileSize()) {
files->Close(fc->filepath.c_str());
}
// Avoid immediately evicting the just-served cached file based on a
// potentially stale FileCache max_file_size setting. Cache size policy
// should be enforced where the cache is configured/populated so it stays
// consistent with the service-level max_file_cache_size.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 801 to +809
// FileCache
// NOTE: no copy filebuf, more efficient
header = pResp->Dump(true, false);
fc->prepend_header(header.c_str(), header.size());
*data = fc->httpbuf.base;
*len = fc->httpbuf.len;
state = SEND_DONE;
return *len;
if (fc->prepend_header(header.c_str(), header.size())) {
*data = fc->httpbuf.base;
*len = fc->httpbuf.len;
state = SEND_DONE;
return *len;
}
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fc->prepend_header() writes the response header into the shared cache buffer (fc->buf) and then GetSendData() returns a pointer into that shared buffer. Since FileCache is shared across worker threads and multiple requests can serve the same file_cache_t concurrently, another thread can call prepend_header() and overwrite the header region while the current thread is still inside hio_write() copying/issuing the write, leading to a data race and potentially corrupted responses. To make this safe, avoid mutating shared cache memory for per-request headers (e.g., send header separately and then send fc->filebuf), or refactor so the entry mutex is held across the entire write/copy window (would require changing the GetSendData/SendHttpResponse interface).

Copilot uses AI. Check for mistakes.
std::lock_guard<std::mutex> lock(mutex);
if (len <= 0 || len > header_reserve) {
// Safe fallback: point httpbuf at filebuf so callers always get valid data
httpbuf = filebuf;
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On prepend_header() failure (len <= 0 || len > header_reserve), header_used is not reset. If a previous call succeeded, metrics returned by get_header_used() / get_header_remaining() will be stale/misleading after a failed prepend. Consider setting header_used = 0 in the failure path as well.

Suggested change
httpbuf = filebuf;
httpbuf = filebuf;
header_used = 0;

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +129
// --- configurable parameters (were hardcoded macros before) ---
int stat_interval; // seconds between stat() checks
int expired_time; // seconds before cache entry expires
int max_header_length; // reserved header bytes per entry
int max_file_size; // max cached file size (larger = large-file path)

FileCache(size_t capacity = FILE_CACHE_MAX_NUM);
explicit FileCache(size_t capacity = FILE_CACHE_DEFAULT_MAX_NUM);

struct OpenParam {
bool need_read;
int max_read;
const char* path;
size_t filesize;
int error;
bool need_read;
int max_read; // per-request override for max file size
const char* path; // URL path (for directory listing)
size_t filesize; // [out] actual file size
int error; // [out] error code if Open returns NULL
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_file_size / OpenParam::max_read are declared as int, but they represent a byte size and are compared against st.st_size (typically off_t, potentially >2GB). Using int can overflow/truncate on large files and makes it harder to configure sizes beyond INT_MAX. Consider switching these fields and related APIs to size_t (or uint64_t) so large-file thresholds work correctly on 64-bit platforms.

Copilot uses AI. Check for mistakes.
Comment on lines 124 to 137
struct OpenParam {
bool need_read;
int max_read;
const char* path;
size_t filesize;
int error;
bool need_read;
int max_read; // per-request override for max file size
const char* path; // URL path (for directory listing)
size_t filesize; // [out] actual file size
int error; // [out] error code if Open returns NULL

OpenParam() {
need_read = true;
max_read = FILE_CACHE_MAX_SIZE;
max_read = FILE_CACHE_DEFAULT_MAX_FILE_SIZE;
path = "/";
filesize = 0;
error = 0;
}
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenParam::max_read defaults to FILE_CACHE_DEFAULT_MAX_FILE_SIZE, while FileCache now has a runtime-configurable max_file_size (and HttpServer sets it from service->max_file_cache_size). Callers that don't explicitly set param.max_read will silently ignore the instance’s configured max_file_size, which can lead to inconsistent caching behavior across call sites. Consider defaulting OpenParam::max_read from the owning FileCache instance (e.g., in Open() when param->max_read is unset/0), or remove one of these knobs to keep a single source of truth.

Copilot uses AI. Check for mistakes.
FileCache* filecache = &privdata->filecache;
filecache->stat_interval = service->file_cache_stat_interval;
filecache->expired_time = service->file_cache_expired_time;
filecache->max_file_size = service->max_file_cache_size;
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assigns filecache->max_file_size directly, bypassing the new SetMaxFileSize() clamping/validation logic. If service->max_file_cache_size can be 0 or negative (e.g., misconfig to disable caching), this can cause surprising behavior in later comparisons/removals. Prefer calling filecache->SetMaxFileSize(service->max_file_cache_size) here to keep invariants consistent.

Suggested change
filecache->max_file_size = service->max_file_cache_size;
filecache->SetMaxFileSize(service->max_file_cache_size);

Copilot uses AI. Check for mistakes.
@Yundi339
Copy link
Copy Markdown
Author

Yundi339 commented Apr 5, 2026

基本问题已修复
遗留问题:
1、prepend_header 多线程竞争写共享 buffer
原理正确,但这是 libhv 原始设计的固有问题。两个线程同时服务同一个缓存文件,都调用 prepend_header,第二个会覆盖第一个正在发送的 header 区域。解决需要重构整个 GetSendData/SendHttpResponse 接口(复制到 per-request buffer 或持锁跨越整个写操作)。超出本 PR 范围,不修。

2、int 溢出(>2GB 文件)
理论正确,但 HttpService::max_file_cache_size 本身就是 int,缓存 >2GB 文件不合理。改 size_t 需改 HttpService 公共 API。不修。

3、OpenParam::max_read 默认值与实例配置不一致
理论正确,但主要调用者都已显式设置 param.max_read = service->max_file_cache_size。error page 路径默认 4MB 也足够。不修。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants