Rotation metadata support #1173
Conversation
```cpp
// Extracts the rotation angle in degrees from the stream's display matrix
// side data. The display matrix is used to specify how the video should be
// rotated for correct display.
std::optional<double> getRotationFromStream(const AVStream* avStream);
```
Not sure there is a huge difference between using `optional` rather than 0 to represent no rotation.
Would it be useful to distinguish a video with no rotation metadata from a video that is known not to be rotated (i.e., a rotation of 0 degrees)?
I'm not sure this distinction is important to users or FFmpeg, but `optional` seems like a reasonable type to use here. Correct me if I'm wrong, but the only difference is that we need to check that rotation is not None?
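For illustration, a minimal sketch of what `optional` buys at a call site (hypothetical caller, not code from this PR):

```cpp
#include <optional>

// Hypothetical caller: std::optional distinguishes "no display matrix side
// data at all" from "metadata explicitly encodes a 0-degree rotation".
void handleRotation(std::optional<double> rotation) {
  if (!rotation.has_value()) {
    // No rotation metadata in the stream (most videos).
  } else if (*rotation == 0.0) {
    // Metadata present, explicitly 0 degrees.
  } else {
    // Non-trivial rotation to apply before display.
  }
}
```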
```cpp
#elif LIBAVFORMAT_VERSION_MAJOR >= 60 // FFmpeg 6.0
  // FFmpeg 6.0: Use av_stream_get_side_data (deprecated but still available)
  // Suppress deprecation warning for this specific call
  #pragma GCC diagnostic push
```
I'm not sure if there is a better way to handle FFmpeg 6 deprecating `av_stream_get_side_data` while its replacement, `codecpar->coded_side_data`, is not available until FFmpeg 6.1.
This is awful 😄. But I think you're doing the right thing!
I'd just add a comment at the top of the function, with a high-level explanation of what's going on. Something like this (please check me on the versions and the API names):
`av_stream_get_side_data` was deprecated in FFmpeg 6.0, but its replacement (`av_packet_side_data_get` + `codecpar->coded_side_data`) is only available from version 6.1. We need some `#pragma` magic to silence the deprecation warning, which our compile chain would otherwise treat as an error.
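For concreteness, a sketch of the version dispatch under discussion; the version cutoffs and branch bodies are assumptions to be checked against the FFmpeg headers:

```cpp
// Assumed mapping: LIBAVFORMAT_VERSION_MAJOR 61+ ~ FFmpeg 7, 60 ~ FFmpeg 6.
#if LIBAVFORMAT_VERSION_MAJOR >= 61
  // FFmpeg 7+: the non-deprecated replacement API.
  const AVPacketSideData* sideData = av_packet_side_data_get(
      avStream->codecpar->coded_side_data,
      avStream->codecpar->nb_coded_side_data,
      AV_PKT_DATA_DISPLAYMATRIX);
#elif LIBAVFORMAT_VERSION_MAJOR >= 60
  // FFmpeg 6.0: av_stream_get_side_data is deprecated, but its replacement
  // only arrived in 6.1, so silence the warning for this one call.
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  uint8_t* data =
      av_stream_get_side_data(avStream, AV_PKT_DATA_DISPLAYMATRIX, nullptr);
  #pragma GCC diagnostic pop
#else
  // FFmpeg 4/5: same call; the size parameter type differs across these
  // versions (int* vs size_t*), so pass nullptr since we don't need it.
  uint8_t* data =
      av_stream_get_side_data(avStream, AV_PKT_DATA_DISPLAYMATRIX, nullptr);
#endif
```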
(CC @scotts, I'm sure you will enjoy this).
test/test_decoders.py (Outdated)

```python
    dimension_order=dimension_order,
)

frame = decoder[0]
```
Let's modify this test to check accuracy for multiple frames.
Also, the nasa_13013 video contains multiple video streams, so we could parametrize over `stream_index` as well.
Agreed on Daniel's suggestions. Kinda related: do we actually need to check against FFmpeg? I'm hoping we can get away with just comparing the outputs of NASA_VIDEO and NASA_VIDEO_ROTATED (where we manually apply the 90° rotation to one of them, in order to compare them)?
To be clear, I do agree that comparing against FFmpeg is the "golden standard" test... but I think we can get away with the simpler approach described above.
On parametrizing over streams: if we're going to check in JSON files for both stream 0 and stream 3, I agree, we might as well parametrize over both. But do we need to? Would a single stream be enough here?
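A rough sketch of the simpler comparison (decoder construction and resource paths are assumptions, not the PR's actual test):

```python
import torch

from torchcodec.decoders import VideoDecoder


def test_rotated_matches_unrotated():
    # Hypothetical paths standing in for NASA_VIDEO and NASA_VIDEO_ROTATED.
    plain = VideoDecoder("nasa_13013.mp4")
    rotated = VideoDecoder("nasa_13013_rotated.mp4")
    for i in range(5):
        # Manually apply the 90-degree CCW rotation to the unrotated frames
        # and compare; dims=(-2, -1) selects (H, W) on CHW frames.
        expected = torch.rot90(plain[i], k=1, dims=(-2, -1))
        torch.testing.assert_close(rotated[i], expected)
```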
test/test_decoders.py (Outdated)

```python
# Frame is (H, W, C)
frame_h, frame_w = frame.shape[0], frame.shape[1]

# Pixel-exact comparison with ffmpeg CLI (autorotate is on by default)
```
Does autorotate refer to an ffmpeg CLI option? It's not used in the command below, so this comment is a bit confusing.
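For context: autorotate is ffmpeg's default behavior of applying display-matrix rotation on decode, not a flag that needs to be passed; the opt-out flag is `-noautorotate`. A sketch of the two behaviors (file names are placeholders):

```python
import subprocess

# Default behavior: ffmpeg honors the display matrix ("autorotate"), so the
# extracted frame comes out rotated without any extra flag.
subprocess.run(
    ["ffmpeg", "-y", "-i", "rotated.mp4", "-frames:v", "1", "rotated.png"],
    check=True,
)

# With -noautorotate, ffmpeg ignores the display matrix and emits the frame
# in its stored (unrotated) orientation.
subprocess.run(
    ["ffmpeg", "-y", "-noautorotate", "-i", "rotated.mp4",
     "-frames:v", "1", "unrotated.png"],
    check=True,
)
```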
```python
if self._dimension_order == "NCHW":
    dims = (-2, -1)
else:
    dims = (-3, -2)
```
Is there a reason we use negative indices here? It works, but I had to think about it for a second.
I was using negative indices to account for batched vs. unbatched calls, but it might not be applicable after moving the logic to C++.
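A standalone illustration of the batched vs. unbatched point (not the PR's code):

```python
import torch

# With NCHW ordering, dims=(-2, -1) always targets (H, W), whether or not a
# leading batch dimension exists.
chw = torch.zeros(3, 270, 480)      # unbatched: (C, H, W)
nchw = torch.zeros(5, 3, 270, 480)  # batched:   (N, C, H, W)
print(torch.rot90(chw, k=1, dims=(-2, -1)).shape)   # (3, 480, 270)
print(torch.rot90(nchw, k=1, dims=(-2, -1)).shape)  # (5, 3, 480, 270)

# With NHWC ordering, (H, W) live at (-3, -2) instead.
nhwc = torch.zeros(5, 270, 480, 3)  # (N, H, W, C)
print(torch.rot90(nhwc, k=1, dims=(-3, -2)).shape)  # (5, 480, 270, 3)
```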
```python
# Precompute rotation parameters for torch.rot90
# k is the number of 90-degree counter-clockwise rotations
rotation = self.metadata.rotation
if rotation is not None and rotation != 0:
```
Checking `rotation != 0` shouldn't matter here; we will set `self._rotation_k` to 0 either way.
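A sketch of the `k` computation the diff is precomputing (the rounding to a multiple of 90 is an assumption based on the review discussion below):

```python
def rotation_to_k(rotation):
    # Number of 90-degree counter-clockwise rotations for torch.rot90.
    # None (no metadata) and an explicit 0 both map to k=0, which is why
    # the extra `rotation != 0` check is redundant.
    if rotation is None:
        return 0
    return round(rotation / 90) % 4


assert rotation_to_k(None) == 0
assert rotation_to_k(0) == 0
assert rotation_to_k(90) == 1
assert rotation_to_k(-90) == 3  # 90 degrees clockwise == three CCW turns
```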
```cpp
// Extracted from the display matrix side data. This indicates how the video
// should be rotated for correct display. Common for videos recorded on mobile
// devices.
std::optional<double> rotation;
```
The information from this comment is also in _metadata.py; it's probably not necessary here.
```cpp
    AV_PKT_DATA_DISPLAYMATRIX);
if (sideData != nullptr && sideData->size >= 9 * sizeof(int32_t)) {
  displayMatrix = reinterpret_cast<const int32_t*>(sideData->data);
}
```
Why do we verify the size of sideData here to be >= 9 * sizeof(int32_t)? I found one example in FFmpeg where there is only a nullptr check, but it's possible I am missing something.
Thanks for the suggestion! I think I was being overly cautious in checking that the display matrix has 9 elements.
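For reference, a sketch of the extraction path around this check; `av_display_rotation_get` is the libavutil helper, the rest is an approximation of the PR's logic:

```cpp
#include <cmath>
#include <optional>

extern "C" {
#include <libavutil/display.h>
}

// Turn a 3x3 display matrix (9 int32s, as stored in the side data) into a
// rotation angle. The >= 9 * sizeof(int32_t) size guard discussed above is
// belt-and-braces: FFmpeg writes a full matrix for this side-data type, so
// a nullptr check alone is likely sufficient.
std::optional<double> angleFromDisplayMatrix(const int32_t* displayMatrix) {
  if (displayMatrix == nullptr) {
    return std::nullopt;
  }
  // Returns degrees of counter-clockwise rotation, or NaN for a degenerate
  // matrix.
  double angle = av_display_rotation_get(displayMatrix);
  if (std::isnan(angle)) {
    return std::nullopt;
  }
  return angle;
}
```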
Since we test that rotation is present in the metadata, it makes sense to check in a video.
test_rotation_applied_to_frames only needs a few frames; perhaps we can reduce the size of the video file and the associated all_frames_info JSONs?
```diff
 frame_data, *_ = core.get_frame_at_index(self._decoder, frame_index=key)
-return frame_data
+return self._apply_rotation(frame_data)
```
A side effect of applying the rotation at the Python level is that we have to update each frame retrieval API.
I wonder if this should be handled in C++, similar to maybePermuteHWC2CHW. We would still need to add it to every SingleStreamDecoder::getFrame... method, but it keeps the complexity in the C++ implementation.
Let me know your thoughts @mollyxu @NicolasHug
Thanks for the suggestion! I think it would make the public API cleaner to do something similar to maybePermuteHWC2CHW.
I was also considering applying the FFmpeg transpose filter as an option, to consolidate the calls in one place. The con of that would be that the pixels get moved, instead of it just being a view operation.
That's a good call-out, I agree that we should try to implement this logic in C++. The main reason is that, while we don't want to have a public C++ decoding API, we should still aim to have a fully-featured C++ decoder. Ideally, we want the python layer to be as slim as possible (with no logic). We can't always do that, but that's what we should aim for.
It makes a lot of sense to me to treat this rotation logic in the same way we treat maybePermuteHWC2CHW.
On the implementation, fully agreed that we should prefer using torch::rot90 (which should still exist in C++) so that we can return a view instead of incurring a copy through filtergraph.
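A sketch of what that could look like, modeled on maybePermuteHWC2CHW (the function name and call site are hypothetical):

```cpp
#include <torch/torch.h>

// Hypothetical helper, applied at the same point in the pipeline as
// maybePermuteHWC2CHW. rotationK is the precomputed number of 90-degree
// counter-clockwise rotations derived from the stream metadata.
torch::Tensor maybeRotate(const torch::Tensor& frame, int64_t rotationK) {
  if (rotationK == 0) {
    return frame;
  }
  // torch::rot90 stays inside the tensor library rather than round-tripping
  // through an FFmpeg transpose filter; dims {-2, -1} selects (H, W) for
  // both batched and unbatched NCHW tensors.
  return torch::rot90(frame, /*k=*/rotationK, /*dims=*/{-2, -1});
}
```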
test/test_decoders.py (Outdated)

```python
    Verifies pixel-exact match with ffmpeg CLI.
    """
    import subprocess
```
If we keep the test against FFmpeg, this should be a top-level import.
src/torchcodec/_core/_metadata.py (Outdated)

```python
    display. Common values are 0, 90, 180, or -90, -180 degrees. Videos recorded on
    mobile devices often have rotation metadata. TorchCodec automatically
    applies this rotation during decoding, so the returned frames are
    in the correct orientation (float or None)."""
```
Great comment! We should indicate that we're rounding to the closest multiple of 90.
Also: the height and width that we report, are they pre-rotation or post-rotation? I think we'll want them to be post-rotation, since that is what we'll return from our decoding methods? Whatever we choose, we should make sure to document it!
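To make the post-rotation convention concrete, a small sketch (the helper name is hypothetical):

```python
def reported_dims(stored_height, stored_width, rotation):
    # Hypothetical helper: if metadata reports post-rotation dimensions,
    # an odd number of 90-degree turns swaps height and width.
    if rotation is None:
        return stored_height, stored_width
    k = round(rotation / 90) % 4  # round to the nearest multiple of 90
    if k % 2 == 1:
        return stored_width, stored_height
    return stored_height, stored_width


# A stream stored as 1080x1920 (HxW) with 90-degree rotation metadata
# decodes to 1920x1080 frames:
assert reported_dims(1080, 1920, 90) == (1920, 1080)
```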
Add rotation metadata support
Context: #1084
Videos recorded on mobile devices often store rotation metadata indicating how the video should be displayed. TorchCodec now extracts this metadata and applies the rotation during decoding so frames are returned in the correct orientation.
This PR adds support for extracting and applying rotation metadata from video streams. The rotation angle (in degrees, counter-clockwise) is extracted from the display matrix side data and exposed as `VideoStreamMetadata.rotation`.

The `rotation` property is:
- `None`: the video has no rotation metadata (most videos)
- `float`: the rotation angle in degrees (counter-clockwise) needed for correct display. Common values are 0, 90, 180, -90, or -180.

Implementation

Metadata extraction (`FFMPEGCommon.cpp`):
- Added `getRotationFromStream()`, which extracts the rotation from `AV_PKT_DATA_DISPLAYMATRIX` side data
- Uses `av_packet_side_data_get()` with `codecpar->coded_side_data` where available
- Falls back to `av_stream_get_side_data()` with deprecation warning suppression for FFmpeg 6
- Uses `av_stream_get_side_data()` with an `int*` size parameter on older FFmpeg versions
- Uses `av_display_rotation_get()` to compute the angle from the display matrix

Metadata propagation:
- The `StreamMetadata` struct now includes a `std::optional<double> rotation` field (`Metadata.h`)
- `SingleStreamDecoder` populates this during stream scanning
- The `VideoStreamMetadata` dataclass exposes `rotation: float | None` (`_metadata.py`)
- The field is surfaced through the core ops (`custom_ops.cpp`)

Test coverage
- Added a `nasa_13013_rotated.mp4` test resource with 90° rotation metadata
- `test_rotation_metadata` (in `test_metadata.py`): verifies rotation is correctly extracted for rotated videos and is `None` for non-rotated videos
- `test_rotation_applied_to_frames` (in `test_decoders.py`): performs a pixel-exact comparison with ffmpeg CLI output