feat: add FileStore implementation for cache (#427) #443
base: master
Conversation
};

if expiry.is_expired(None) {
    return Ok(false);
should we not remove the file here (early on), right when we find out the file has expired, making this a single source of truth?
Yes, that would be more efficient. Initially, I was aiming to match the eviction policy where only read() and contains_key() may delete. I guess it would be better to eagerly remove it once expired.
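A minimal sketch of the eager-eviction idea being discussed, using std-only synchronous I/O rather than the PR's tokio code; `check_entry` and its `is_expired` parameter are hypothetical stand-ins for the real expiry-header check:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Sketch: delete the cache file the moment the expiry check fails, so
// removal has a single source of truth instead of being spread across
// read() and contains_key().
fn check_entry(path: &Path, is_expired: bool) -> io::Result<bool> {
    if is_expired {
        // Evict eagerly, right when we learn the entry has expired.
        fs::remove_file(path)?;
        return Ok(false);
    }
    Ok(true)
}
```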
cot/src/cache/store/file.rs
Outdated
let mut buffer = Vec::new();

// advances cursor by the expiry header offset
file.seek(SeekFrom::Start(8))
We seem to seek here again after doing so when we parse the expiry. Does that not make this seek redundant?
Thinking about it again, I agree
I don't agree. Firstly, the parse_expiry function resets the cursor back to the beginning (as it should), so this is needed. Secondly, this should stay here for correctness purposes: we can't assume the cursor on the file will always be at the beginning.
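An illustrative sketch (not the actual cot code) of the point above: if `parse_expiry` rewinds the cursor before reading the header, the caller's explicit seek past the header stays necessary for correctness, since the cursor position can't be assumed. The little-endian `u64` header here is an assumption about the format:

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

const EXPIRY_HEADER_LEN: u64 = 8;

fn parse_expiry<R: Read + Seek>(r: &mut R) -> io::Result<u64> {
    // Reset first: the cursor may be anywhere when we're called.
    r.seek(SeekFrom::Start(0))?;
    let mut header = [0u8; 8];
    r.read_exact(&mut header)?;
    Ok(u64::from_le_bytes(header))
}

fn read_value<R: Read + Seek>(r: &mut R) -> io::Result<Vec<u8>> {
    let _expiry = parse_expiry(r)?;
    // Seek past the expiry header before reading the payload; explicit
    // seeking keeps this correct even if parse_expiry's cursor behavior
    // changes later.
    r.seek(SeekFrom::Start(EXPIRY_HEADER_LEN))?;
    let mut payload = Vec::new();
    r.read_to_end(&mut payload)?;
    Ok(payload)
}
```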
}

#[cfg(test)]
mod tests {
For the failing nightly tests, I suspect it might be broken upstream; I have that pinned in my PR (cot/.github/workflows/rust.yml, line 20 in 26ef6bb).
m4tx
left a comment
Thank you for your patience waiting for my review! This looks very good already, but there are a few things I'd like to see clarified before we merge.
cot/src/cache/store/file.rs
Outdated
use std::path::Path;

use chrono::{DateTime, Utc};
use md5::{Digest, Md5};
Maybe it's worth using sha2 instead? I know md5 is already in our indirect dependencies, but sha2 is a direct dependency (used in auth) and is probably less prone to collisions. After all, it doesn't matter that much, but it might be worth sticking to one algorithm as broadly as possible.
Got it!
Actually, for caching purposes we don't need a classic cryptographically secure hashing algorithm; we should focus on a fast one instead. I'd suggest using BLAKE3 for this case, and it has an official Rust implementation. If we trust their benchmarks, it's much faster than SHA-2.
@seqre only if we (or any of our dependencies) use it already. I don't see much value in having the fastest function for hashing the cache keys (which will typically be small).
After looking at this again, I think I agree we should use blake3 for this. Currently we use:
- md5 for MySQL and PostgreSQL (both behind feature flags)
- blake2 for hashing passwords (not behind feature flags, but maybe it should be)
- sha2 for hashing static file contents (not behind a feature flag)
I think it's okay to use blake3 here, and we can update the static file content hashing to use blake3, too, to avoid extra dependencies.
@m4tx sha2 and blake2 are smaller dependencies than blake3, though.
The upside is that blake3 offers more features and is more performant than the rest (as they claim). I agree that this can be beneficial long term if we reuse the dependency in other cases, such as static file content hashing.
I think it's fine to use blake3 even if it's bigger than blake2. With much better performance it'll be a big benefit for the static files. At the end of the day, changing the hashing algorithm when needed is trivial.
key: &str,
) -> CacheStoreResult<Option<(tokio::fs::File, std::path::PathBuf)>> {
    let key_hash = FileStore::create_key_hash(key);
    let path = self.dir_path.join(&key_hash);
What do we do in case there's a collision when hashing?
Currently, there's no collision resolution. My idea is to embed the real key name in the file as a header (this is simpler, but may incur an extra syscall when the real name doesn't match the file). Another option would be to implement a jump table and sync it to file whenever a new hash is pushed.
I don't think we really need a collision discovery unless it'd be simple to implement and fast at runtime. If we switch to another algorithm, their collision resistance is pretty high. For example for BLAKE3, it's 2**128. At that level of resistance, getting any collision is a feat and getting a specific collision for an attack seems almost impossible.
cot/src/cache/store/file.rs
Outdated
let data = serde_json::to_string(&value)
    .map_err(|e| FileCacheStoreError::Serialize(Box::new(e)))?;

let mut buffer: Vec<u8> = Vec::with_capacity(8 + data.len());
Let's extract the magic number to a named constant to clarify what it means.
And since we essentially create a custom binary format, it might be worth documenting this at the module level.
Got it!
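A minimal sketch of the suggestion: name the header size constant and document the on-disk layout. The `[u64 expiry | JSON payload]` layout matches what the snippets in this thread imply, but the constant name and the little-endian choice here are assumptions:

```rust
/// On-disk entry layout: `[expiry: u64, little-endian][serialized JSON]`.
const EXPIRY_HEADER_LEN: usize = 8;

// Encode one cache entry into the custom binary format described above.
fn encode_entry(expiry_unix: u64, json: &str) -> Vec<u8> {
    let mut buffer: Vec<u8> = Vec::with_capacity(EXPIRY_HEADER_LEN + json.len());
    buffer.extend_from_slice(&expiry_unix.to_le_bytes());
    buffer.extend_from_slice(json.as_bytes());
    buffer
}
```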
cot/src/cache/store/file.rs
Outdated
if let Ok(meta) = entry.metadata().await
    && meta.is_file()
{
    total_size += meta.len();
This will return the number of bytes; not the number of entries, no?
Yes, this would return the total bytes. I was under the impression that approx_size depends on the cache type to track its quantity unit. If this is changed to the number of entries, should this byte-aggregation function be kept around for future use (maybe for monitoring)?
I wouldn't say so, until we add functionality for that for all cache stores, it would be just dead code. We can always recreate it!
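A sketch of the reviewer's point: count entries rather than summing bytes. This is a synchronous std stand-in for the PR's async tokio::fs directory walk, and `approx_entry_count` is a hypothetical name:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Count the cache entries (regular files) in the store directory instead
// of aggregating their sizes in bytes.
fn approx_entry_count(dir: &Path) -> io::Result<u64> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let meta = entry?.metadata()?;
        if meta.is_file() {
            count += 1;
        }
    }
    Ok(count)
}
```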
seqre
left a comment
Thank you for your contribution, it's a great start! There are some things that need changing before we merge it, though.
cot/src/cache/store/file.rs
Outdated
//! let key = "example_key".to_string();
//! let value = serde_json::json!({"data": "example_value"});
//!
//! store.insert(key.clone(), value.clone(), Default::default()).await.unwrap();
nit: I'd change it to Timeout::default() so that people reading the docs know what the 3rd argument is.
cot/src/cache/store/file.rs
Outdated
use std::path::Path;

use chrono::{DateTime, Utc};
use md5::{Digest, Md5};
Actually, for caching purposes we don't need a classic cryptographically secure hashing algorithm; we should focus on a fast one instead. I'd suggest using BLAKE3 for this case, and it has an official Rust implementation. If we trust their benchmarks, it's much faster than SHA-2.
cot/src/cache/store/file.rs
Outdated
use crate::config::Timeout;
use crate::error::error_impl::impl_into_cot_error;

const ERROR_PREFIX: &str = "file based cache store error:";
nit:
- const ERROR_PREFIX: &str = "file based cache store error:";
+ const ERROR_PREFIX: &str = "file-based cache store error:";
cot/src/cache/store/file.rs
Outdated
#[error("{ERROR_PREFIX} file dir creation error: {0}")]
DirCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occurred during temp file creation
#[error("{ERROR_PREFIX} file temp file creation error: {0}")]
TempFileCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occurred during write/stream file
#[error("{ERROR_PREFIX} file io error: {0}")]
nit: "file" is already included in ERROR_PREFIX
- #[error("{ERROR_PREFIX} file dir creation error: {0}")]
+ #[error("{ERROR_PREFIX} dir creation error: {0}")]
  DirCreation(Box<dyn std::error::Error + Send + Sync>),
  /// An error occurred during temp file creation
- #[error("{ERROR_PREFIX} file temp file creation error: {0}")]
+ #[error("{ERROR_PREFIX} temp file creation error: {0}")]
  TempFileCreation(Box<dyn std::error::Error + Send + Sync>),
  /// An error occurred during write/stream file
- #[error("{ERROR_PREFIX} file io error: {0}")]
+ #[error("{ERROR_PREFIX} io error: {0}")]
cot/src/cache/store/file.rs
Outdated
store.create_dir_sync_root()?;

Ok(store)
}

fn create_dir_sync_root(&self) -> CacheStoreResult<()> {
nit: I think sync should be a suffix to show it's another version of existing async function
- store.create_dir_sync_root()?;
+ store.create_dir_root_sync()?;
  Ok(store)
  }
- fn create_dir_sync_root(&self) -> CacheStoreResult<()> {
+ fn create_dir_root_sync(&self) -> CacheStoreResult<()> {
cot/src/cache/store/file.rs
Outdated
&self,
file: &mut tokio::fs::File,
) -> CacheStoreResult<Option<Value>> {
    if !self.parse_expiry(file).await? {
I'm not fully convinced parse_expiry is the best name; I had to spend a few seconds parsing this line mentally. It's what the function does technically, but logically it checks whether the file has expired. I wonder if it would be better to rename it to check_expiry, is_expired, or similar.
key: &str,
) -> CacheStoreResult<Option<(tokio::fs::File, std::path::PathBuf)>> {
    let key_hash = FileStore::create_key_hash(key);
    let path = self.dir_path.join(&key_hash);
I don't think we really need a collision discovery unless it'd be simple to implement and fast at runtime. If we switch to another algorithm, their collision resistance is pretty high. For example for BLAKE3, it's 2**128. At that level of resistance, getting any collision is a feat and getting a specific collision for an attack seems almost impossible.
cot/src/cache/store/file.rs
Outdated
Ok((temp_file, temp_path))
}

async fn file_open(
nit: maybe open_file_for_reading instead? It seems more aligned with what it does, or at least it says more about what it specifically does.
cot/src/cache/store/file.rs
Outdated
if let Ok(meta) = entry.metadata().await
    && meta.is_file()
{
    total_size += meta.len();
I wouldn't say so, until we add functionality for that for all cache stores, it would be just dead code. We can always recreate it!
cot/src/cache/store/file.rs
Outdated
}

async fn contains_key(&self, key: &str) -> CacheStoreResult<bool> {
    let Ok(Some(mut file_tuple)) = self.file_open(key).await else {
Please deconstruct the tuple here with pattern matching into specific parts, so you don't have to use .0 and .1 below.
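A tiny sketch of the requested change, with a hypothetical stub standing in for the real `(tokio::fs::File, PathBuf)` pair returned by file_open:

```rust
// Stand-in for file_open: returns a (path, length) pair.
fn file_open_stub() -> Option<(String, usize)> {
    Some(("abc123".to_string(), 7))
}

fn demo() -> Option<String> {
    // Destructure into named bindings instead of reaching for .0 and .1.
    let (path, _len) = file_open_stub()?;
    Some(path)
}
```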
Force-pushed ccd8c53 to 9e6cda3.
Hi everyone, thank you for the feedback. And... sorry for the wait! I think that is all, excluding
m4tx
left a comment
This looks good - however, I've identified one major problem: I think we need to lock the files before writing to avoid data races and broken cache entries. Please have a look at my comment for more details.
In addition to that, please ensure the change can be cleanly merged into the master branch; at the moment it seems to make some changes in unrelated files (looks like 2721b6a hasn't properly got into your branch?).
I think after these are fixed, this will be ready to be merged!
cot/src/cache/store/file.rs
Outdated
#[non_exhaustive]
pub enum FileCacheStoreError {
    /// An error occurred during directory creation
    #[error("{ERROR_PREFIX} dir creation error: {0}")]
- #[error("{ERROR_PREFIX} dir creation error: {0}")]
+ #[error("{ERROR_PREFIX} directory creation error: {0}")]
nitpick: clarity
cot/src/cache/store/file.rs
Outdated
DirCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occurred during temp file creation
#[error("{ERROR_PREFIX} temp file creation error: {0}")]
- #[error("{ERROR_PREFIX} temp file creation error: {0}")]
+ #[error("{ERROR_PREFIX} temporary file creation error: {0}")]
nitpick
cot/src/cache/store/file.rs
Outdated
TempFileCreation(Box<dyn std::error::Error + Send + Sync>),

/// An error occurred during write/stream file
#[error("{ERROR_PREFIX} io error: {0}")]
- #[error("{ERROR_PREFIX} io error: {0}")]
+ #[error("{ERROR_PREFIX} I/O error: {0}")]
nitpick
cot/src/cache/store/file.rs
Outdated
use std::path::Path;

use chrono::{DateTime, Utc};
use md5::{Digest, Md5};
I think it's fine to use blake3 even if it's bigger than blake2. With much better performance it'll be a big benefit for the static files. At the end of the day, changing the hashing algorithm when needed is trivial.
cot/src/cache/store/file.rs
Outdated
file.sync_data()
    .await
    .map_err(|e| FileCacheStoreError::Io(Box::new(e)))?;
tokio::fs::rename(file_path, self.dir_path.join(&key_hash))
Should we close the file before renaming?
I think fs currently has a no-op flush() on Windows and Unix. And since we use write_all(), which will not return mid-flight, this is safe to call: all write operations are completed before rename(). Are there other cases we're missing here?
&self,
key_hash: &str,
) -> CacheStoreResult<(tokio::fs::File, std::path::PathBuf)> {
    let temp_path = self.dir_path.join(format!("{key_hash}.{TEMPFILE_SUFFIX}"));
One thing I'm worried about is that this can lead to data races. Since we are using a multithreaded runtime (but this would also apply to a case where we would have multiple workers set up with the same cache directory), it's possible that we'll have two competing threads trying to write to the same file, resulting in a garbled output.
I see that Django seems to be locking the files exclusively before attempting to write to them - perhaps we should implement a similar mechanism using something like fs4?
I think if we lock the file, there's still a TOCTOU race condition on rename(): if the lock is released before the rename completes, another thread can start loading new data.
What about retrying on create_new() instead, since that method returns an error if the file already exists? I'll test this approach right away.
I've tested the create_new() approach. That works, but with a catch:
There may be ghosted .tmp files left over from a crash mid-write. Therefore, create_dir_root_sync() was modified to clean the directory of all the leftovers.
The only problem arises if separate processes (not threads) write cache into the same cache directory: cleanup on sync() would then delete an ongoing write from another process.
Edit: on second thought, I'll try the approach of locking the temp file.
Update: the locking approach on the temp file works, but it can lead to race conditions on rename(). That's manageable by making the file cache best-effort, which would mirror Django's approach.
However, with this approach, store initiation/cleanup won't corrupt mid-flight writes.
Which one should we implement?
Note: try_lock_shared() for cleaning up orphaned .tmp files requires MSRV 1.89 to be stable. Alternatively, we can use try_lock_exclusive(), which doesn't corrupt the write, but sometimes removes the data pre-rename if it grabs the file first, effectively making the file disappear.
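A std-only sketch of the temp-file-plus-rename write path debated in this thread (no fs4 locking); `atomic_write` and the `.tmp` suffix handling here are illustrative assumptions, not the PR's actual code:

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

// Write to a uniquely named temp file, then rename over the final path.
// create_new() is the retry point: it errors if another writer already
// owns the temp file. On POSIX, rename() swaps the entry in atomically,
// so readers see either the old complete value or the new one.
fn atomic_write(final_path: &Path, data: &[u8]) -> io::Result<()> {
    let temp_path = final_path.with_extension("tmp");
    let mut temp = OpenOptions::new()
        .write(true)
        .create_new(true) // fail instead of clobbering a concurrent writer
        .open(&temp_path)?;
    temp.write_all(data)?;
    temp.sync_data()?;
    drop(temp); // close before rename (matters on Windows)
    fs::rename(&temp_path, final_path)
}
```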
Force-pushed 9e6cda3 to 4b0489c.
Force-pushed cd74051 to 0adfa9e.
I've tested 0adfa9e on Windows and it seems to work properly. The tests also pass on Miri (Ubuntu build). The problem was that I didn't check the result boolean on … Hopefully, this will work now.
Looks good overall. I'll need to take a longer look (hopefully tomorrow), but I'm not expecting to find anything major that needs to be changed.
Regarding the CI license check: you're free to add the BSD-2-Clause license to the cargo-deny config.
Just one question about the implementation: my understanding is that we could just as well skip using temporary files, and achieve the same thing with lock_shared/lock_exclusive. Temporary files just give us potentially better performance, because reading will never lock. Is that correct? (just to clarify, the current approach is fine; I just want to make sure it's clear it's not the only valid one)
email_address.workspace = true
fake = { workspace = true, optional = true, features = ["derive", "chrono"] }
form_urlencoded.workspace = true
fs4.workspace = true
Since cache is an optional feature, this could be an optional dependency, too.
async-trait.workspace = true
axum = { workspace = true, features = ["http1", "tokio"] }
backtrace.workspace = true
blake3.workspace = true
Technically this should also be an optional dependency, but since we plan to switch static files to use blake3 instead of sha2, it's fine to leave this as-is.
//! provide a simple asynchronous interface for putting, getting, and managing
//! cached values, optionally with expiration policies.
pub mod file;
For a full implementation, please take a look at config.rs to check whether it also needs to be modified, so that users can easily switch between cache implementations in the config TOML file. This will also require a change here (line 791 in 01aeef5):
_ => {
Yes, I agree. We could use … With the swap-based approach, readers never need to acquire a lock, and contention is naturally pushed onto writers instead. I think this approach potentially gives us more performance and efficiency (due to minimal locking during reads) when the caching workload is read-heavy.
Pull Request to Issue #427
Key info:
Note: