Optimize data cluster format for VHD and QCOW #6895

Open
last-genius wants to merge 9 commits into xapi-project:26.1-lcm from last-genius:asv/8.3-sparse-vhd-qcow

@last-genius commented Feb 6, 2026

mirage/ocaml-qcow#134 changes the type of the data structure containing info on allocated data clusters, returning allocated intervals instead of all the virtual cluster addresses. Change qcow2-to-stdout to the new interval-based format.
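
To illustrate the difference between the two representations, here is a minimal sketch that collapses a sorted list of allocated cluster addresses into intervals. The function name `to_intervals` is hypothetical, and treating intervals as inclusive (first, last) pairs is an assumption of this sketch, not something confirmed by ocaml-qcow.

```ocaml
(* Hypothetical helper: collapse a sorted, duplicate-free list of allocated
   cluster indices into (first, last) intervals.
   Assumption: both bounds are inclusive. *)
let to_intervals (clusters : int list) : (int * int) list =
  let rec go acc (first, last) = function
    | [] ->
        (* Flush the interval being built and restore ascending order. *)
        List.rev ((first, last) :: acc)
    | c :: rest ->
        if c = last + 1 then
          (* c extends the current run of consecutive clusters. *)
          go acc (first, c) rest
        else
          (* Gap found: close the current interval and start a new one. *)
          go ((first, last) :: acc) (c, c) rest
  in
  match clusters with [] -> [] | c :: rest -> go [] (c, c) rest
```

With this representation, a fully allocated disk is described by a single pair rather than by one entry per cluster.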

Add a vhd-tool read_headers_interval command which also conforms to this new format, and change the parsing code in stream_vdi to accept both formats depending on a feature flag. Add cram tests verifying that the legacy format is preserved as-is.

I've run the VM export and VDI integrity quicktests and tested this extensively locally. The PR will only build once the new ocaml-qcow version is packaged into xs-opam, so I'm keeping this as a draft for now.

@last-genius
Contributor Author

Opened the corresponding xs-opam PR: xapi-project/xs-opam#755

@last-genius last-genius marked this pull request as ready for review February 18, 2026 09:47
| x :: y :: _ ->
    (to_int x, to_int y)
| _ ->
    raise (Invalid_argument "Invalid JSON")
Contributor

You might want to report the JSON.

Contributor Author

It can be rather large and pollute the logs... This shouldn't happen unless you have a version incompatibility, really

…rmat

Qcow_stream now uses Qcow_mapping to store information on allocated clusters,
which offers .to_interval_seq, outputting a list of pairs representing
intervals of allocated virtual clusters.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>

This allows switching on the more efficient interval format later.
(QCOW always uses the new format)

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>

This is just an easy way to make sure the semantics are preserved in any
future refactorings, without having to run full VHD exports.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>

This command returns a more efficient representation of allocated clusters
(when compared to read_headers), utilizing a sparse interval format instead of
returning every single allocated cluster.

This is the more efficient option, decreasing the file size and memory usage in
vhd-tool, but it's currently under a feature flag, so it's added as a new
command instead of replacing read_headers immediately.

Cram test for read_headers is still passing, so this refactoring has preserved
the legacy format.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>

Since the runtime feature flag vhd_legacy_blocks_format determines which block
format is used to describe allocated VHD clusters, this requires duplicate
parse_header_interval functions for VHD and QCOW.

The right functions are selected in stream_vdi based on the feature flag.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
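
The selection described in the commit message above could be sketched roughly as follows. Only the flag name vhd_legacy_blocks_format and the command names read_headers and read_headers_interval come from this PR; the types and function names are hypothetical and are not the actual stream_vdi code.

```ocaml
(* Hypothetical types for this sketch. *)
type image_format = Vhd | Qcow

type block_format = Legacy_blocks | Interval_blocks

(* QCOW always uses the interval format; VHD follows the feature flag. *)
let block_format ~vhd_legacy_blocks_format = function
  | Qcow ->
      Interval_blocks
  | Vhd ->
      if vhd_legacy_blocks_format then Legacy_blocks else Interval_blocks

(* vhd-tool command that produces the chosen format for a VHD image. *)
let vhd_tool_command = function
  | Legacy_blocks ->
      "read_headers"
  | Interval_blocks ->
      "read_headers_interval"
```

Keeping the decision in one place means the rest of the streaming code only needs to know which parser matches the chosen format.
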
…er allocation

Instead of using a set with every individual allocated cluster index as a
member, use a sorted list of intervals to check whether a cluster is allocated.
This uses much less memory and follows directly from the JSON format that
qcow-stream-tool and vhd-tool now output.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
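
As an illustration of the lookup described in the commit message above, here is a minimal sketch of a membership check against a sorted interval list. The name `is_allocated` is hypothetical, and the sketch assumes intervals are sorted by start, non-overlapping, and inclusive on both ends; the real code in this PR may differ.

```ocaml
(* Hypothetical helper: is [cluster] covered by one of the intervals?
   Assumptions: intervals are sorted by start, do not overlap, and both
   bounds are inclusive. *)
let rec is_allocated (intervals : (int * int) list) (cluster : int) : bool =
  match intervals with
  | [] ->
      false
  | (first, last) :: rest ->
      if cluster < first then
        (* Sorted input: every later interval starts even further right. *)
        false
      else if cluster <= last then true
      else is_allocated rest cluster
```

The memory cost is proportional to the number of allocated runs rather than the number of allocated clusters, at the price of a linear walk per lookup; a caller that visits clusters in increasing order could instead drop intervals it has already passed.
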
…_clusters

nonzero_clusters no longer contains every single allocated cluster; it now
holds intervals of allocated clusters.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>

… files

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
@last-genius last-genius force-pushed the asv/8.3-sparse-vhd-qcow branch from d41347a to bcf47a8 Compare February 19, 2026 10:33
@last-genius
Contributor Author

Changed the approach:

  • Refactored the QCOW-only side to the interval-based format (qcow2-to-stdout.py + new interface in ocaml-qcow)
  • Added the interval-based format to vhd-tool alongside the "legacy" format
  • Added a vhd_legacy_blocks_format feature flag to control which format stream_vdi uses for VHD and QCOW. It defaults to legacy.
  • Added cram tests to verify that the legacy vhd-tool read_headers format is preserved across this refactoring.

This way, XS (which only uses VHD) can keep using the legacy format until they can ensure the new format doesn't accidentally break anything else. XCP-ng will flip the feature flag to make VHD consistent with QCOW, so we will use the interval-based format.
