[TRTLLM-12533][refactor] Move Media IO modality loading into MediaIO Interfaces#14010
[TRTLLM-12533][refactor] Move Media IO modality loading into MediaIO Interfaces#14010aswinvisva wants to merge 1 commit into
Conversation
…interfaces Signed-off-by: Aswin Visva <31215515+aswinvisva@users.noreply.github.com>
📝 WalkthroughWalkthroughThis PR refactors the media I/O system from a kwargs-resolution function to a generic ChangesMedia I/O Refactoring
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
tensorrt_llm/serve/chat_utils.py (1)
94-105: 💤 Low valueConsider using
MediaModalityLiteral type for themodalityparameter.The
modalityparameter accepts any string, butMEDIA_IO_REGISTRYonly contains specific keys ("image", "video", "audio"). Using theMediaModalityLiteral type (already defined intensorrt_llm.inputs.media_io) would provide compile-time type safety.🔧 Proposed fix
+from tensorrt_llm.inputs.media_io import MEDIA_IO_REGISTRY, BaseMediaIO, MediaModality def _make_media_io( - modality: str, + modality: MediaModality, server_config: Optional[MultimodalServerConfig], request_kwargs: Optional[Dict[str, Dict[str, Any]]], ) -> BaseMediaIO:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/serve/chat_utils.py` around lines 94 - 105, The _make_media_io function currently types modality as str but should use the narrower MediaModality literal type to match MEDIA_IO_REGISTRY keys; update the signature of _make_media_io to accept modality: MediaModality, import MediaModality from tensorrt_llm.inputs.media_io, and adjust any callers if needed to satisfy the stronger type (or add explicit casts where caller values are validated) so the registry lookup and create call remain unchanged.tensorrt_llm/inputs/media_io.py (2)
124-128: 💤 Low valueConsider renaming
formatparameter to avoid shadowing the builtin.The
formatparameter shadows Python's built-informat()function. Consider renaming toimage_format,output_format, ordata_formatfor clarity and to avoid the shadowing.♻️ Proposed fix
- def __init__(self, format: str = "pt", device: str = "cpu", **kwargs: Any) -> None: - assert format in ["pt", "pil"], "format must be either Pytorch or PIL" - super().__init__(format=format, device=device, **kwargs) - self.format = format + def __init__(self, output_format: str = "pt", device: str = "cpu", **kwargs: Any) -> None: + assert output_format in ["pt", "pil"], "format must be either Pytorch or PIL" + super().__init__(format=output_format, device=device, **kwargs) + self.format = output_format self.device = deviceAs per coding guidelines, avoid shadowing builtins in Python code.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/inputs/media_io.py` around lines 124 - 128, The constructor parameter name format shadows Python's built-in format()—rename it (e.g., image_format) in the __init__ signature, update the assertion to check image_format in ["pt","pil"], pass image_format as the keyword to super().__init__(...) instead of format, and set self.image_format (or chosen name) instead of self.format; also update any other references to self.format or the parameter elsewhere in the class/module to the new name to avoid the builtin shadowing.
186-208: 💤 Low valueConsider renaming
formatparameter to avoid shadowing the builtin.The
formatparameter shadows Python's built-informat()function. Consider renaming tovideo_format,output_format, ordata_format.♻️ Proposed fix
def __init__( self, num_frames: int = 10, fps: int = 30, - format: str = "pt", + output_format: str = "pt", device: str = "cpu", extract_audio: bool = False, **kwargs: Any, ) -> None: - assert format in ["pt", "pil"], "format must be either Pytorch or PIL" + assert output_format in ["pt", "pil"], "format must be either Pytorch or PIL" super().__init__( num_frames=num_frames, fps=fps, - format=format, + format=output_format, device=device, extract_audio=extract_audio, **kwargs, ) self.num_frames = num_frames self.fps = fps - self.format = format + self.format = output_format self.device = device self.extract_audio = extract_audioAs per coding guidelines, avoid shadowing builtins in Python code.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/inputs/media_io.py` around lines 186 - 208, The constructor currently uses the parameter name `format`, shadowing Python's built-in `format()`; rename this parameter to something like `video_format` (or `output_format`) in the __init__ signature, update the assert to check `video_format in ["pt","pil"]`, pass `video_format=video_format` to the super().__init__ call (replacing the old format=...), and update the instance attribute assignment (self.video_format = video_format) and any internal uses of `self.format` to `self.video_format`; also update any callers/tests that construct this class to use the new parameter name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tensorrt_llm/inputs/media_io.py`:
- Around line 124-128: The constructor parameter name format shadows Python's
built-in format()—rename it (e.g., image_format) in the __init__ signature,
update the assertion to check image_format in ["pt","pil"], pass image_format as
the keyword to super().__init__(...) instead of format, and set
self.image_format (or chosen name) instead of self.format; also update any other
references to self.format or the parameter elsewhere in the class/module to the
new name to avoid the builtin shadowing.
- Around line 186-208: The constructor currently uses the parameter name
`format`, shadowing Python's built-in `format()`; rename this parameter to
something like `video_format` (or `output_format`) in the __init__ signature,
update the assert to check `video_format in ["pt","pil"]`, pass
`video_format=video_format` to the super().__init__ call (replacing the old
format=...), and update the instance attribute assignment (self.video_format =
video_format) and any internal uses of `self.format` to `self.video_format`;
also update any callers/tests that construct this class to use the new parameter
name.
In `@tensorrt_llm/serve/chat_utils.py`:
- Around line 94-105: The _make_media_io function currently types modality as
str but should use the narrower MediaModality literal type to match
MEDIA_IO_REGISTRY keys; update the signature of _make_media_io to accept
modality: MediaModality, import MediaModality from tensorrt_llm.inputs.media_io,
and adjust any callers if needed to satisfy the stronger type (or add explicit
casts where caller values are validated) so the registry lookup and create call
remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0670a46b-59a9-41e4-b7d5-fc1a3e8bcf9b
📒 Files selected for processing (4)
tensorrt_llm/inputs/media_io.pytensorrt_llm/inputs/utils.pytensorrt_llm/serve/chat_utils.pytests/unittest/llmapi/apps/test_chat_utils.py
|
|
||
| from tensorrt_llm.inputs.utils import ( | ||
| VideoData, | ||
| _get_aiohttp_session, |
There was a problem hiding this comment.
Do you think these should moved in here? Conceptually, it makes more sense, since media_io is a more descriptive module name for media loading code than utils.
It also means we don't have to import protected functions.
| import soundfile | ||
| from PIL import Image | ||
|
|
||
| from tensorrt_llm.inputs.utils import ( |
There was a problem hiding this comment.
#13800 moved this to multimodal_data instead of utils.
|
|
||
| Receives the original URL string (not a parsed path) so that | ||
| subclasses can match the modality-specific normalization done by the | ||
| legacy free functions. |
There was a problem hiding this comment.
Nit: let's not mention these "legacy free functions".
| """ | ||
| ... | ||
|
|
||
| async def async_load(self, url: str) -> _MediaT: |
| else: | ||
| raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}") | ||
|
|
||
| async def load_base64_async(self, media_type: str, data: str) -> _MediaT: |
There was a problem hiding this comment.
Is this necessary at the interface level, or can callers take care of this?
| super().__init__(**kwargs) | ||
|
|
||
| async def async_load(self, url: str) -> Tuple[np.ndarray, int]: | ||
| # Override to match `async_load_audio`, which does not accept `data:` |
There was a problem hiding this comment.
Sorry, I don't understand this part - why do we need to overload this?
| extract_audio=extract_audio, | ||
| **kwargs, | ||
| ) | ||
| self.num_frames = num_frames |
There was a problem hiding this comment.
Similar question here re whether these need to be public.
| """ | ||
|
|
||
| def __init__(self, **kwargs: Any) -> None: | ||
| self._kwargs = kwargs |
There was a problem hiding this comment.
It seems the only real use for this is to log debug statements in chat_utils.py - could the logging be moved to create instead? It can include the cls.__name__ in the debug string.
| media_io_cls = MEDIA_IO_REGISTRY.get(modality, BaseMediaIO) | ||
| return media_io_cls.merge_kwargs(default_kwargs, runtime_kwargs) | ||
| return media_io_cls.create( | ||
| (server_kwargs or {}).get(modality), |
There was a problem hiding this comment.
Nit: could just instantiate server_kwargs = ... else {} at line 100.
| "tensorrt_llm.serve.chat_utils.async_load_image", return_value=MagicMock() | ||
| ) as mock_loader: | ||
|
|
||
| from tensorrt_llm.serve import chat_utils as _chat_utils |
There was a problem hiding this comment.
Nit: move to module-level?
Summary by CodeRabbit
Release Notes
Description
This PR created the
MediaIOinterface for video, image and audio. It added just the modality-specific logic for merging media IO kwargs at the request level and the server level. This PR moves some more modality-specific logic for loading IO data (URLs, data, local file) and postprocessing the IO data into theMediaIOinterface.Test Coverage
Tests added to the test_chat_utils.py which covers IO loading, updated the tests to load with the new MediaIO functions.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.