19 commits
1c6c1bc
refactor(block): expand ConfigSpace to full virtio-blk layout
kalyazin Apr 24, 2026
a0adea0
refactor(block): add Discard request type and discard segment struct
kalyazin Apr 24, 2026
08fb060
feat(block): add discard method to FileEngine using fallocate
kalyazin Apr 24, 2026
cb2ef05
style(seccomp): fix indentation and trailing whitespace in filter files
kalyazin Apr 24, 2026
9321e95
feat(seccomp): allow fallocate syscall in vmm thread filter
kalyazin Apr 24, 2026
c06070a
feat(block): handle VIRTIO_BLK_T_DISCARD requests
kalyazin Apr 24, 2026
4b3b62d
chore(snapshot): fix ConfigSpace restore for VIRTIO_BLK_F_DISCARD
kalyazin Apr 24, 2026
4ca982a
feat(block): advertise VIRTIO_BLK_F_DISCARD for non-read-only devices
kalyazin Apr 24, 2026
83d35bd
doc(block): document VIRTIO_BLK_F_DISCARD discard support
kalyazin Apr 24, 2026
6de6d87
test(block): add unit tests for VIRTIO_BLK_F_DISCARD
kalyazin Apr 24, 2026
58127c0
test(block): add pytest integration tests for VIRTIO_BLK_F_DISCARD
kalyazin Apr 24, 2026
4a78790
refactor(block): extend ConfigSpace with write-zeroes fields
kalyazin May 6, 2026
997e58c
refactor(block): add WriteZeroes request type and supporting variants
kalyazin May 6, 2026
87b8fbb
feat(block): add write_zeroes method to FileEngine using fallocate
kalyazin May 6, 2026
32302ed
feat(block): handle VIRTIO_BLK_T_WRITE_ZEROES requests
kalyazin May 6, 2026
5e17017
feat(block): advertise VIRTIO_BLK_F_WRITE_ZEROES for non-read-only de…
kalyazin May 6, 2026
b0e9d4a
test(block): add unit tests for VIRTIO_BLK_F_WRITE_ZEROES
kalyazin May 6, 2026
e028273
test(block): add pytest integration tests for VIRTIO_BLK_F_WRITE_ZEROES
kalyazin May 6, 2026
d06e144
doc(block): document VIRTIO_BLK_F_WRITE_ZEROES support
kalyazin May 6, 2026
42 changes: 42 additions & 0 deletions docs/api_requests/block-discard.md
@@ -0,0 +1,42 @@
# Block device discard (TRIM)

Firecracker supports the `VIRTIO_BLK_F_DISCARD` feature, which allows the guest
to issue discard (TRIM) requests to the block device. Discard requests tell the
host that a range of sectors is no longer needed, enabling the host to reclaim
space on sparse or thin-provisioned backing files.

## How it works

For all non-read-only block devices, Firecracker automatically advertises the
`VIRTIO_BLK_F_DISCARD` feature to the guest driver. No API configuration is
required — discard support is always-on for writable drives.

When the guest driver issues a `VIRTIO_BLK_T_DISCARD` request, Firecracker calls
`fallocate(2)` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE` on the backing
file for each discard segment. This punches a hole in the file, freeing the
underlying disk blocks without changing the file size.
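
The per-segment translation can be sketched as follows. This is a minimal illustration, not Firecracker's implementation. It assumes the virtio spec's 16-byte little-endian segment layout (`sector: le64`, `num_sectors: le32`, `flags: le32`), 512-byte virtio sectors, and exactly one segment per request. The names `parse_discard`, `FallocateArgs`, and `DiscardError` are hypothetical; the mode constants are the values from Linux's `<linux/falloc.h>`.

```rust
/// Mode bits from <linux/falloc.h>.
const FALLOC_FL_KEEP_SIZE: i32 = 0x01;
const FALLOC_FL_PUNCH_HOLE: i32 = 0x02;
/// virtio-blk request sectors are always 512 bytes.
const SECTOR_SIZE: u64 = 512;
/// Size of one discard segment on the wire.
const SEGMENT_SIZE: usize = 16;

/// Hypothetical arguments for a single fallocate(fd, mode, offset, len) call.
#[derive(Debug, PartialEq)]
struct FallocateArgs {
    mode: i32,
    offset: u64,
    len: u64,
}

#[derive(Debug, PartialEq)]
enum DiscardError {
    /// Request data was not exactly one segment (max_discard_seg = 1).
    BadLength(usize),
    /// Discard segments must have every flag bit clear.
    NonZeroFlags(u32),
    /// The byte range does not fit in a signed 64-bit off_t.
    Overflow,
}

fn le_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_le_bytes(a)
}

fn le_u32(b: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(b);
    u32::from_le_bytes(a)
}

/// Decodes one discard segment into hole-punching fallocate arguments.
fn parse_discard(data: &[u8]) -> Result<FallocateArgs, DiscardError> {
    if data.len() != SEGMENT_SIZE {
        return Err(DiscardError::BadLength(data.len()));
    }
    let sector = le_u64(&data[0..8]);
    let num_sectors = le_u32(&data[8..12]);
    let flags = le_u32(&data[12..16]);
    if flags != 0 {
        return Err(DiscardError::NonZeroFlags(flags));
    }
    let offset = sector.checked_mul(SECTOR_SIZE).ok_or(DiscardError::Overflow)?;
    // A u32 sector count times 512 always fits in u64.
    let len = u64::from(num_sectors) * SECTOR_SIZE;
    // fallocate takes signed 64-bit offsets, so the end must stay <= i64::MAX.
    let end = offset.checked_add(len).ok_or(DiscardError::Overflow)?;
    if end > i64::MAX as u64 {
        return Err(DiscardError::Overflow);
    }
    Ok(FallocateArgs {
        mode: FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
        offset,
        len,
    })
}
```

A device would typically also check `offset + len` against the disk size before issuing `fallocate(fd, mode, offset, len)`; that step is left out of this sketch.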

Guest tools that trigger discard include:

- `fstrim -v /` — manually trim a mounted filesystem
- `discard` mount option — automatic discard on file deletion
- `blkdiscard /dev/vda` — discard the entire block device

## Host requirements

The backing file must reside on a filesystem and kernel that support
`FALLOC_FL_PUNCH_HOLE`. This is supported on ext4, xfs, btrfs, and tmpfs on
Linux 3.7+ (tmpfs gained support in 3.5, btrfs in 3.7). On filesystems that do not support hole-punching, `fallocate`
returns `EOPNOTSUPP`. Firecracker detects this on the first discard, logs a
one-time warning, and replies to the guest with `VIRTIO_BLK_S_UNSUPP`. The Linux
virtio-blk driver propagates `VIRTIO_BLK_S_UNSUPP` through the block layer and
stops issuing further discard requests. Firecracker short-circuits any remaining
discard requests with `VIRTIO_BLK_S_UNSUPP` immediately — no additional
`fallocate` calls are made.
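
The fallback described above amounts to a sticky per-device flag. A minimal sketch, assuming the `fallocate` call is passed in as a closure; `UnsuppLatch` and `BlkStatus` are hypothetical names. The status values (`VIRTIO_BLK_S_OK = 0`, `VIRTIO_BLK_S_IOERR = 1`, `VIRTIO_BLK_S_UNSUPP = 2`) come from the virtio spec, and `EOPNOTSUPP` is errno 95 on Linux.

```rust
use std::io;

/// errno value of EOPNOTSUPP on Linux.
const EOPNOTSUPP: i32 = 95;

/// virtio-blk request status codes.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u8)]
enum BlkStatus {
    Ok = 0,     // VIRTIO_BLK_S_OK
    IoErr = 1,  // VIRTIO_BLK_S_IOERR
    Unsupp = 2, // VIRTIO_BLK_S_UNSUPP
}

/// Remembers, for the lifetime of a device, that the backing file
/// rejected the operation with EOPNOTSUPP.
#[derive(Default)]
struct UnsuppLatch {
    unsupported: bool,
}

impl UnsuppLatch {
    /// Runs `op` (e.g. a closure wrapping fallocate) unless an earlier call
    /// already failed with EOPNOTSUPP, and maps the outcome to a status.
    fn run<F>(&mut self, op: F) -> BlkStatus
    where
        F: FnOnce() -> io::Result<()>,
    {
        if self.unsupported {
            // Short-circuit: the closure (and so the syscall) is never invoked.
            return BlkStatus::Unsupp;
        }
        match op() {
            Ok(()) => BlkStatus::Ok,
            Err(e) if e.raw_os_error() == Some(EOPNOTSUPP) => {
                // Only reachable once per latch, so the warning is one-time.
                self.unsupported = true;
                eprintln!("warning: backing file does not support discard; disabling it");
                BlkStatus::Unsupp
            }
            Err(_) => BlkStatus::IoErr,
        }
    }
}
```

Taking the operation as a closure keeps the latch independent of how the syscall is issued, and makes the "no additional `fallocate` calls" property easy to check.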

## Limitations

- Discard is only available for non-read-only block devices.
- At most one discard segment per request is supported (`max_discard_seg = 1`).
- The discard segment flags field must be zero; non-zero flags are rejected with
an I/O error.
66 changes: 66 additions & 0 deletions docs/api_requests/block-write-zeroes.md
@@ -0,0 +1,66 @@
# Block device write-zeroes

Firecracker supports the `VIRTIO_BLK_F_WRITE_ZEROES` feature, which allows the
guest to ask the device to zero a range of sectors without transferring a buffer
of zeros over the virtqueue. Common consumers are `mkfs` (clearing inode tables
and journals), filesystem snapshots, the initial wipe of encrypted volumes, and
userspace zeroing via `blkdiscard -z` or the `BLKZEROOUT` ioctl.

## How it works

For all non-read-only block devices, Firecracker automatically advertises the
`VIRTIO_BLK_F_WRITE_ZEROES` feature to the guest driver. No API configuration
is required — write-zeroes support is always-on for writable drives.

Each `VIRTIO_BLK_T_WRITE_ZEROES` request carries a 16-byte segment with a
`flags` field. Bit 0 (`VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP`) tells the device
whether it may also deallocate the underlying backing-file blocks. Firecracker
advertises `write_zeroes_may_unmap=1`, so guests are free to set this flag.
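
The `write_zeroes_may_unmap` value, like the per-request segment limits listed under Limitations, reaches the guest through the virtio-blk config space. The sketch below serializes the discard and write-zeroes fields, which occupy bytes 36 through 59 of `struct virtio_blk_config` in the virtio 1.1 layout. `DiscardWzConfig` is a hypothetical name, and the sector-count and alignment values in the usage example are placeholders, not Firecracker's actual settings.

```rust
/// Byte offset of `max_discard_sectors` in `struct virtio_blk_config`.
const DISCARD_WZ_CONFIG_OFFSET: usize = 36;

/// Fields guarded by VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES.
struct DiscardWzConfig {
    max_discard_sectors: u32,
    max_discard_seg: u32,
    discard_sector_alignment: u32,
    max_write_zeroes_sectors: u32,
    max_write_zeroes_seg: u32,
    write_zeroes_may_unmap: u8,
}

impl DiscardWzConfig {
    /// Serializes the fields little-endian, including the trailing
    /// `unused1[3]` padding, giving config-space bytes 36..60.
    fn to_le_bytes(&self) -> [u8; 24] {
        let mut out = [0u8; 24];
        out[0..4].copy_from_slice(&self.max_discard_sectors.to_le_bytes());
        out[4..8].copy_from_slice(&self.max_discard_seg.to_le_bytes());
        out[8..12].copy_from_slice(&self.discard_sector_alignment.to_le_bytes());
        out[12..16].copy_from_slice(&self.max_write_zeroes_sectors.to_le_bytes());
        out[16..20].copy_from_slice(&self.max_write_zeroes_seg.to_le_bytes());
        out[20] = self.write_zeroes_may_unmap;
        // out[21..24] stays zero (unused1).
        out
    }
}
```

For example, a device following this document would set `max_discard_seg: 1`, `max_write_zeroes_seg: 1`, and `write_zeroes_may_unmap: 1`.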

Firecracker translates the guest's UNMAP bit into a `fallocate(2)` mode on the
backing file:

| UNMAP | fallocate mode | Effect |
|-------|---------------------------------------------|---------------------------------------|
| 0 | `FALLOC_FL_ZERO_RANGE \| FALLOC_FL_KEEP_SIZE` | zeros in place, no deallocation |
| 1 | `FALLOC_FL_PUNCH_HOLE \| FALLOC_FL_KEEP_SIZE` | zeros + deallocate (sparse holes) |

The virtio spec requires that when UNMAP is clear the device MUST NOT
deallocate sectors (so `ZERO_RANGE` is mandatory for that path); when UNMAP
is set, the device MAY deallocate, and `PUNCH_HOLE` reads as zeros on every
filesystem that supports it.
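
The UNMAP mapping in the table, together with the reserved-bit check listed under Limitations, fits in a small pure function. A sketch, assuming the mode constants from `<linux/falloc.h>` and `VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP = 1` from the virtio spec; `write_zeroes_mode` and `WzError` are hypothetical names, not Firecracker's code.

```rust
/// Mode bits from <linux/falloc.h>.
const FALLOC_FL_KEEP_SIZE: i32 = 0x01;
const FALLOC_FL_PUNCH_HOLE: i32 = 0x02;
const FALLOC_FL_ZERO_RANGE: i32 = 0x10;

/// Bit 0 of the write-zeroes segment flags.
const VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP: u32 = 1 << 0;

#[derive(Debug, PartialEq)]
enum WzError {
    /// One or more of the reserved bits 1..=31 was set.
    ReservedFlags(u32),
}

/// Picks the fallocate mode for a write-zeroes segment's flags.
fn write_zeroes_mode(flags: u32) -> Result<i32, WzError> {
    let reserved = flags & !VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
    if reserved != 0 {
        return Err(WzError::ReservedFlags(reserved));
    }
    if (flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP) != 0 {
        // Device MAY deallocate: punched holes read back as zeros.
        Ok(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
    } else {
        // Device MUST NOT deallocate: zero the range in place.
        Ok(FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE)
    }
}
```

Keeping the decision separate from the syscall makes the spec's MUST NOT rule for UNMAP=0 testable without touching a backing file.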

## Host requirements

The backing file must reside on a filesystem that supports the corresponding
`fallocate` mode:

- `FALLOC_FL_PUNCH_HOLE` (UNMAP=1) is widely supported: ext4, xfs, btrfs, tmpfs.
- `FALLOC_FL_ZERO_RANGE` (UNMAP=0) is supported on ext4, xfs, btrfs; on tmpfs
it requires Linux 6.8+. Other filesystems may not support it.

If `fallocate` returns `EOPNOTSUPP` for either mode, Firecracker logs a one-time
warning and replies with `VIRTIO_BLK_S_UNSUPP`. The Linux virtio-blk driver
propagates that status through the block layer and stops issuing further
write-zeroes requests, so subsequent zeroing falls back to plain
`REQ_OP_WRITE` traffic. Firecracker answers any later write-zeroes
requests with `VIRTIO_BLK_S_UNSUPP` for the rest of the device's lifetime; no
additional `fallocate` calls are made.

The `EOPNOTSUPP` cache is shared across the UNMAP=0 and UNMAP=1 paths: a single
fallback flag disables both. This is conservative (a filesystem that
supports `PUNCH_HOLE` but not `ZERO_RANGE` will see UNMAP=1 requests rejected
once an UNMAP=0 request fails), but it matches the discard fallback design
and avoids tracking per-mode support state on the host.

## Limitations

- Write-zeroes is only available for non-read-only block devices.
- At most one segment per request is supported (`max_write_zeroes_seg = 1`).
- Only bit 0 (UNMAP) of the segment flags may be set; a request with any
reserved bit set is rejected with an I/O error.
- `EOPNOTSUPP` errors from the io_uring async engine are surfaced as
`VIRTIO_BLK_S_IOERR` rather than `VIRTIO_BLK_S_UNSUPP`, because the
`EOPNOTSUPP` detection and cache currently exist only in the synchronous
engine. For async-engine devices, place the backing file on a filesystem
that supports both `fallocate` modes.
14 changes: 9 additions & 5 deletions resources/seccomp/aarch64-unknown-linux-musl.json
@@ -42,6 +42,10 @@
{
"syscall": "fsync"
},
+{
+"syscall": "fallocate",
+"comment": "Used by the block device for VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES (FALLOC_FL_PUNCH_HOLE, FALLOC_FL_ZERO_RANGE)"
+},
{
"syscall": "close"
},
@@ -110,8 +114,8 @@
"comment": "sigaltstack is used by Rust stdlib to remove alternative signal stack during thread teardown."
},
{
-"syscall": "getrandom",
-"comment": "getrandom is used by aws-lc library which we consume in virtio-rng"
+"syscall": "getrandom",
+"comment": "getrandom is used by aws-lc library which we consume in virtio-rng"
},
{
"syscall": "accept4",
@@ -213,7 +217,7 @@
},
{
"syscall": "madvise",
-"comment": "Used by the VirtIO balloon device and by musl for some customer workloads. It is also used by aws-lc during random number generation. They setup a memory page that mark with MADV_WIPEONFORK to be able to detect forks. They also call it with -1 to see if madvise is supported in certain platforms."
+"comment": "Used by the VirtIO balloon device and by musl for some customer workloads. It is also used by aws-lc during random number generation. They setup a memory page that mark with MADV_WIPEONFORK to be able to detect forks. They also call it with -1 to see if madvise is supported in certain platforms."
},
{
"syscall": "msync",
@@ -544,8 +548,8 @@
"comment": "sigaltstack is used by Rust stdlib to remove alternative signal stack during thread teardown."
},
{
-"syscall": "getrandom",
-"comment": "getrandom is used by `HttpServer` to reinialize `HashMap` after moving to the API thread"
+"syscall": "getrandom",
+"comment": "getrandom is used by `HttpServer` to reinialize `HashMap` after moving to the API thread"
},
{
"syscall": "accept4",
8 changes: 6 additions & 2 deletions resources/seccomp/x86_64-unknown-linux-musl.json
@@ -45,6 +45,10 @@
{
"syscall": "fsync"
},
+{
+"syscall": "fallocate",
+"comment": "Used by the block device for VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES (FALLOC_FL_PUNCH_HOLE, FALLOC_FL_ZERO_RANGE)"
+},
{
"syscall": "close"
},
@@ -559,8 +563,8 @@
"comment": "sigaltstack is used by Rust stdlib to remove alternative signal stack during thread teardown."
},
{
-"syscall": "getrandom",
-"comment": "getrandom is used by `HttpServer` to reinialize `HashMap` after moving to the API thread"
+"syscall": "getrandom",
+"comment": "getrandom is used by `HttpServer` to reinialize `HashMap` after moving to the API thread"
},
{
"syscall": "accept4",