[ContentUnderstanding] Add ContentRange samples for document, video, and audio.#45679
changjian-wang wants to merge 12 commits into main
- Implemented ContentRange functionality in sample scripts for analyzing binary documents and URLs.
- Added examples for analyzing specific pages and combined page ranges in `sample_analyze_binary.py`.
- Enhanced `sample_analyze_url.py` with ContentRange examples for documents, videos, and audio, including time-based ranges.
- Created unit tests for ContentRange functionality, covering various scenarios and edge cases.
- Updated existing tests to validate ContentRange behavior in document and media analysis.
Pull request overview
This PR introduces a new ContentRange value type for the Azure AI Content Understanding SDK, enabling users to specify content ranges (document pages or audio/video time intervals) when analyzing content. The class provides factory methods for constructing ranges and is integrated into the begin_analyze_binary API.
Changes:
- Added `ContentRange` class with factory methods (`page`, `pages`, `pages_from`, `time_range`, `time_range_from`, `combine`) and exported it in the models namespace.
- Updated `begin_analyze_binary` (sync and async) to accept `ContentRange` objects in addition to raw strings for the `content_range` parameter.
- Added comprehensive sample code and tests demonstrating `ContentRange` usage for document, video, and audio analysis scenarios.
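
To make the factory-method design concrete, here is a minimal sketch of what such a value type could look like. It is not the SDK's implementation: the class body, the validation rules, the 1-based page numbering, and the millisecond time unit are assumptions inferred from the range strings quoted later in this thread (`"1-3,5,9-"`, `"0-5000"`, `"10000-"`). Only the factory method names come from the overview above.

```python
from dataclasses import dataclass
from datetime import timedelta


def _ms(value: timedelta) -> int:
    """Convert a timedelta to whole milliseconds (assumed unit)."""
    return value // timedelta(milliseconds=1)


@dataclass(frozen=True)  # frozen dataclass gives equality and hashing for free
class ContentRange:
    """Illustrative stand-in, not the SDK class: wraps a range string."""

    value: str

    @classmethod
    def page(cls, number: int) -> "ContentRange":
        if number < 1:
            raise ValueError("page numbers are assumed to be 1-based")
        return cls(str(number))

    @classmethod
    def pages(cls, start: int, end: int) -> "ContentRange":
        if start < 1 or end < start:
            raise ValueError("expected 1 <= start <= end")
        return cls(f"{start}-{end}")

    @classmethod
    def pages_from(cls, start: int) -> "ContentRange":
        if start < 1:
            raise ValueError("page numbers are assumed to be 1-based")
        return cls(f"{start}-")  # open-ended: from start to the last page

    @classmethod
    def time_range(cls, start: timedelta, end: timedelta) -> "ContentRange":
        start_ms, end_ms = _ms(start), _ms(end)
        if start_ms < 0 or end_ms < start_ms:
            raise ValueError("expected 0 <= start <= end")
        return cls(f"{start_ms}-{end_ms}")

    @classmethod
    def time_range_from(cls, start: timedelta) -> "ContentRange":
        start_ms = _ms(start)
        if start_ms < 0:
            raise ValueError("start must not be negative")
        return cls(f"{start_ms}-")  # open-ended: from start to the end of media

    @classmethod
    def combine(cls, *ranges: "ContentRange") -> "ContentRange":
        if not ranges:
            raise ValueError("at least one range is required")
        return cls(",".join(r.value for r in ranges))

    def __str__(self) -> str:
        return self.value


doc_range = ContentRange.combine(
    ContentRange.pages(1, 3), ContentRange.page(5), ContentRange.pages_from(9)
)
print(doc_range)  # 1-3,5,9-
```

Because the type only wraps a string, callers who already have a range string lose nothing by passing it directly.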
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `models/_content_range.py` | New ContentRange class with factory methods, equality, hashing, and string conversion |
| `models/_patch.py` | Exports ContentRange in `__all__` |
| `_patch.py` | Updates sync `begin_analyze_binary` to accept ContentRange and convert to string |
| `aio/_patch.py` | Updates async `begin_analyze_binary` to accept ContentRange and convert to string |
| `tests/test_content_range.py` | Unit tests for ContentRange construction, validation, equality, and integration with AnalysisInput |
| `tests/samples/test_sample_analyze_url.py` | Integration tests for ContentRange with document, video, and audio URL analysis |
| `tests/samples/test_sample_analyze_binary.py` | Integration tests for ContentRange with binary document analysis |
| `samples/sample_analyze_url.py` | Sync sample showing ContentRange usage for URL-based analysis |
| `samples/async_samples/sample_analyze_url_async.py` | Async sample showing ContentRange usage for URL-based analysis |
| `samples/sample_analyze_binary.py` | Sync sample showing ContentRange usage for binary analysis |
| `samples/async_samples/sample_analyze_binary_async.py` | Async sample showing ContentRange usage for async binary analysis |
| `assets.json` | Updated asset tag for new test recordings |
...ntentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary.py (outdated, resolved)
...tanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_content_range.py (outdated, resolved)
...tanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_content_range.py (outdated, resolved)
...tanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_content_range.py (outdated, resolved)
…validations in audio and video segments
API Change Check
APIView identified API level changes in this PR and created the following API reviews
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@copilot
@changjian-wang I've opened a new pull request, #45788, to work on those changes. Once the pull request is ready, I'll request review from you.
…s for content range (#45788)

* Initial plan

* Update samples and tests to pass content range as plain strings directly

Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>

* Remove ContentRange class and use plain strings for content range

Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>

* Remove duplicate raw content range sample sections

Now that ContentRange class is removed and all content ranges are plain strings, the [START *_with_raw_content_range] sections duplicated values already shown in the typed content range sections (e.g. "1-3,5,9-" appeared twice in binary samples, "0-5000" twice in video URL samples). Removed the redundant sections from all four sample files.

Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>
@copilot check
@changjian-wang I've opened a new pull request, #45791, to work on those changes. Once the pull request is ready, I'll request review from you.
@copilot Audio analysis with a content range needs to be implemented; follow this:
@changjian-wang I've opened a new pull request, #45792, to work on those changes. Once the pull request is ready, I'll request review from you.
…cenarios (#45792)

* Initial plan

* Update audio content range samples/tests to match video scenarios: 0-5000, 10000-, 1200-3651, 0-3000,30000-

Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>
@copilot The assertion in file
@changjian-wang I've opened a new pull request, #45793, to work on those changes. Once the pull request is ready, I'll request review from you.
…e tests (#45793)

* Initial plan

* Add precise page number assertions to test_sample_analyze_binary_with_content_range

Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: changjian-wang <15209050+changjian-wang@users.noreply.github.com>
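
For readers wondering what a "precise page number assertion" might involve, the sketch below expands a page range string into the exact list of page numbers a test could compare against. The helper name `expected_pages`, the total page count, and the `returned_page_numbers` list are hypothetical and not taken from the test file; the range grammar (comma-separated segments, `N`, `N-M`, and open-ended `N-`) is inferred from the strings quoted in this thread.

```python
from typing import List


def expected_pages(content_range: str, total_pages: int) -> List[int]:
    """Expand a page range string such as "1-3,5,9-" into sorted page numbers.

    Assumes 1-based page numbers and that an open-ended segment ("9-")
    runs to the last page of the document.
    """
    pages = set()
    for part in content_range.split(","):
        start_text, dash, end_text = part.strip().partition("-")
        start = int(start_text)
        if not dash:
            end = start          # single page, e.g. "5"
        elif end_text:
            end = int(end_text)  # closed range, e.g. "1-3"
        else:
            end = total_pages    # open-ended range, e.g. "9-"
        pages.update(range(start, min(end, total_pages) + 1))
    return sorted(pages)


# Hypothetical stand-in for the page numbers read back from an analysis result.
returned_page_numbers = [1, 2, 3, 5, 9, 10]
assert returned_page_numbers == expected_pages("1-3,5,9-", total_pages=10)
```

Comparing against the full expected list, rather than only checking the page count, catches cases where the right number of pages comes back but from the wrong part of the document.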
This pull request introduces a new `ContentRange` value type for specifying content ranges in analysis requests, and updates both the SDK and sample code to support and demonstrate its usage. The changes let users restrict analysis to specific document pages or time ranges in audio/video content, and update documentation and sample code to showcase these features.

ContentRange value type introduction and integration
- Added the `ContentRange` class in `models/_content_range.py`, providing methods to construct ranges for document pages and audio/video time intervals, combine multiple ranges, and convert to string. This enables precise specification of content segments to analyze.
- Integrated `ContentRange` into the SDK: updated `begin_analyze_binary` and its async variant to accept either a string or a `ContentRange` object for the `content_range` parameter, converting it to a string as needed. Documentation for these methods was updated to reflect the new parameter type and usage.
- Exported `ContentRange` in `models/_patch.py` for public access.

Sample code updates
- Updated `sample_analyze_binary_async.py` to demonstrate analyzing specific pages and combined page ranges using the new `ContentRange` class, including example output for these scenarios.
- Updated `sample_analyze_url_async.py` to show how to restrict analysis to a single page using `ContentRange`, and changed the document URL to a more complex sample. Also imported `timedelta` for potential time range usage.

Asset metadata update
- Updated `assets.json` to reflect the new version, ensuring asset tracking aligns with these changes.
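
The "accept either a string or a `ContentRange` and convert to string" behavior described for `begin_analyze_binary` can be pictured with the short sketch below. The helper name `normalize_content_range` and the stand-in class are hypothetical; the actual SDK patch code is not shown in this thread.

```python
from typing import Optional, Union


class ContentRange:
    """Minimal stand-in for illustration; only string conversion matters here."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        return self._value


def normalize_content_range(
    content_range: Optional[Union[str, ContentRange]],
) -> Optional[str]:
    """Return the wire-format string, leaving None (no range) untouched."""
    if content_range is None:
        return None
    # str() is a no-op for plain strings and calls __str__ on the value type.
    return str(content_range)


print(normalize_content_range("1-3,5,9-"))              # 1-3,5,9-
print(normalize_content_range(ContentRange("0-5000")))  # 0-5000
```

Keeping the conversion in one place means the sync and async entry points can share it, and removing the value type later only requires dropping the non-string branch of the type hint.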