feat(tool): Add video multiModal tools #921
Open. guanxuc wants to merge 6 commits into `agentscope-ai:main` from `guanxuc:multiModal-video-tool` (+2,162 −76).
Commits:

- 91eb485 feat(tool): Add video multiModal tools (guanxuc)
- 7fa8b12 test: add await timeout for windows ci (guanxuc)
- 09044fa docs: Update java doc (guanxuc)
- 36ef49f style: Update docs (guanxuc)
- f983556 chore: Compress video (guanxuc)
- 863459e style: spotless apply (guanxuc)
Files changed:

- `...tscope-core/src/main/java/io/agentscope/core/tool/multimodal/DashScopeMultiModalTool.java`: 683 changes (680 additions, 3 deletions). Large diff not rendered.
- `...core/src/test/java/io/agentscope/core/tool/multimodal/DashScopeMultiModalToolE2ETest.java`: 348 changes (330 additions, 18 deletions). Large diff not rendered.
- `...pe-core/src/test/java/io/agentscope/core/tool/multimodal/DashScopeMultiModalToolTest.java`: 996 changes (942 additions, 54 deletions). Large diff not rendered.
- Binary file (not shown).
- `...les/quickstart/src/main/java/io/agentscope/examples/quickstart/MultiModalToolExample.java`: 197 changes (197 additions, 0 deletions).
```java
/*
 * Copyright 2024-2026 the original author or authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.agentscope.examples.quickstart;

import io.agentscope.core.ReActAgent;
import io.agentscope.core.formatter.dashscope.DashScopeChatFormatter;
import io.agentscope.core.hook.Hook;
import io.agentscope.core.hook.HookEvent;
import io.agentscope.core.hook.PostActingEvent;
import io.agentscope.core.hook.PreActingEvent;
import io.agentscope.core.memory.InMemoryMemory;
import io.agentscope.core.message.AudioBlock;
import io.agentscope.core.message.Base64Source;
import io.agentscope.core.message.ContentBlock;
import io.agentscope.core.message.ImageBlock;
import io.agentscope.core.message.Source;
import io.agentscope.core.message.TextBlock;
import io.agentscope.core.message.ToolResultBlock;
import io.agentscope.core.message.URLSource;
import io.agentscope.core.message.VideoBlock;
import io.agentscope.core.model.DashScopeChatModel;
import io.agentscope.core.tool.Toolkit;
import io.agentscope.core.tool.multimodal.DashScopeMultiModalTool;
import java.util.List;
import reactor.core.publisher.Mono;

/**
 * MultiModalToolExample - Demonstrates how to equip an Agent with multimodal tools.
 */
public class MultiModalToolExample {

    public static void main(String[] args) throws Exception {
        // Print welcome message
        ExampleUtils.printWelcome(
                "MultiModal Tool Calling Example",
                "This example demonstrates how to equip an Agent with multimodal tools.\n"
                        + "The agent has image, audio and video multimodal tools.");

        // Get API key
        String apiKey = ExampleUtils.getDashScopeApiKey();

        // Create and register tools
        Toolkit toolkit = new Toolkit();
        toolkit.registerTool(new DashScopeMultiModalTool(apiKey));
        printRegisterTools();

        // Create Agent with tools
        ReActAgent agent =
                ReActAgent.builder()
                        .name("MultiModalToolAgent")
                        .sysPrompt(
                                "You are a helpful assistant with access to multimodal"
                                        + " tools. Use tools when needed to answer questions"
                                        + " accurately. Always explain what you're doing when using"
                                        + " tools.")
                        .model(
                                DashScopeChatModel.builder()
                                        .apiKey(apiKey)
                                        .modelName("qwen-plus")
                                        .stream(true)
                                        .enableThinking(false)
                                        .formatter(new DashScopeChatFormatter())
                                        .build())
                        .hook(new ToolCallLoggingHook())
                        .toolkit(toolkit)
                        .memory(new InMemoryMemory())
                        .build();

        printExamplePrompts();

        ExampleUtils.startChat(agent);
    }

    private static void printRegisterTools() {
        String registeredTools =
                """
                Registered tools:
                - dashscope_text_to_image: Generate image(s) based on the given text.
                - dashscope_image_to_text: Generate text based on the given images.
                - dashscope_text_to_audio: Convert the given text to audio.
                - dashscope_audio_to_text: Convert the given audio to text.
                - dashscope_text_to_video: Generate a video based on the given text prompt.
                - dashscope_image_to_video: Generate a video from a single input image and an optional text prompt.
                - dashscope_first_and_last_frame_image_to_video: Generate a video that transitions from a first frame to a last frame, guided by an optional text prompt.
                - dashscope_video_to_text: Analyze a video and generate a text description or answer questions about its content.
                """;

        System.out.println(registeredTools);
        System.out.println("\n");
    }

    private static void printExamplePrompts() {
        String examplePrompts =
                """
                Example Prompts:
                [dashscope_text_to_image]:
                Generate an image of a black dog and return its URL.
                [dashscope_image_to_text]:
                Describe the image at 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png'.
                [dashscope_text_to_audio]:
                Convert the text 'hello, qwen!' to audio and return its URL.
                [dashscope_audio_to_text]:
                Convert the audio at 'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav' to text.
                [dashscope_text_to_video]:
                Generate a video of a smart cat running in the moonlight.
                [dashscope_image_to_video]:
                Generate a video of a tiger running in the moonlight based on the image at 'https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png'.
                [dashscope_first_and_last_frame_image_to_video]:
                Generate a video of a black kitten curiously looking at the sky, using 'https://wanx.alicdn.com/material/20250318/first_frame.png' as the first frame and 'https://wanx.alicdn.com/material/20250318/last_frame.png' as the last frame.
                [dashscope_video_to_text]:
                Describe the video at 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4'.
                """;
        System.out.println(examplePrompts);
        System.out.println("\n");
    }

    static class ToolCallLoggingHook implements Hook {

        @Override
        public <T extends HookEvent> Mono<T> onEvent(T event) {
            if (event instanceof PreActingEvent preActing) {
                System.out.println(
                        "\n[HOOK] PreActingEvent - Tool: "
                                + preActing.getToolUse().getName()
                                + ", Input: "
                                + preActing.getToolUse().getInput());

            } else if (event instanceof PostActingEvent postActingEvent) {
                ToolResultBlock toolResult = postActingEvent.getToolResult();
                List<ContentBlock> contentBlocks = toolResult.getOutput();
                if (contentBlocks != null && !contentBlocks.isEmpty()) {
                    for (ContentBlock cb : contentBlocks) {
                        if (cb instanceof ImageBlock ib) {
                            Source source = ib.getSource();
                            if (source instanceof URLSource urlSource) {
                                System.out.println(
                                        "\n[HOOK] PostActingEvent - Tool Result: \nImage URL: "
                                                + urlSource.getUrl());
                            } else if (source instanceof Base64Source base64Source) {
                                System.out.println(
                                        "\n"
                                                + "[HOOK] PostActingEvent - Tool Result: \n"
                                                + "Image Base64 data: "
                                                + base64Source.getData());
                            }
                        } else if (cb instanceof AudioBlock ab) {
                            Source source = ab.getSource();
                            if (source instanceof URLSource urlSource) {
                                System.out.println(
                                        "\n[HOOK] PostActingEvent - Tool Result: \nAudio URL: "
                                                + urlSource.getUrl());
                            } else if (source instanceof Base64Source base64Source) {
                                System.out.println(
                                        "\n"
                                                + "[HOOK] PostActingEvent - Tool Result: \n"
                                                + "Audio Base64 data: "
                                                + base64Source.getData());
                            }
                        } else if (cb instanceof VideoBlock vb) {
                            Source source = vb.getSource();
                            if (source instanceof URLSource urlSource) {
                                System.out.println(
                                        "\n[HOOK] PostActingEvent - Tool Result: \nVideo URL: "
                                                + urlSource.getUrl());
                            } else if (source instanceof Base64Source base64Source) {
                                System.out.println(
                                        "\n"
                                                + "[HOOK] PostActingEvent - Tool Result: \n"
                                                + "Video Base64 data: "
                                                + base64Source.getData());
                            }
                        } else if (cb instanceof TextBlock tb) {
                            System.out.println(
                                    "\n[HOOK] PostActingEvent - Tool Result: \nText: "
                                            + tb.getText());
                        }
                    }
                    System.out.println("\n");
                }
            }
            return Mono.just(event);
        }
    }
}
```
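`ToolCallLoggingHook` repeats the same URL/Base64 branching for images, audio and video; only the label changes. The following is a minimal, self-contained sketch of collapsing that duplication into one helper. It does not use agentscope: `Source`, `UrlSource`, `Base64Data`, `Block`, `Image`, `Audio`, `Video`, `Text` and `ToolResultDescriber` are hypothetical stand-ins that only mimic the accessors the hook calls (`getSource()`, `getUrl()`, `getData()`, `getText()`), and the real classes may be shaped differently. It assumes Java 16+ (records and `instanceof` patterns), the same language level the example already relies on for pattern matching.

```java
import java.util.ArrayList;
import java.util.List;

public class ToolResultDescriber {

    // Hypothetical stand-ins for URLSource and Base64Source (NOT the agentscope classes).
    interface Source {}

    record UrlSource(String url) implements Source {}

    record Base64Data(String data) implements Source {}

    // Hypothetical stand-ins for ImageBlock, AudioBlock, VideoBlock and TextBlock.
    interface Block {}

    record Image(Source source) implements Block {}

    record Audio(Source source) implements Block {}

    record Video(Source source) implements Block {}

    record Text(String text) implements Block {}

    // One helper for every media kind: the URL/Base64 check is written once,
    // and only the label ("Image", "Audio", "Video") varies.
    static String describeMedia(String kind, Source source) {
        if (source instanceof UrlSource u) {
            return kind + " URL: " + u.url();
        } else if (source instanceof Base64Data b) {
            return kind + " Base64 data: " + b.data();
        }
        return kind + ": <unknown source>";
    }

    // Maps one block to the line the hook prints after "Tool Result:".
    static String describe(Block block) {
        if (block instanceof Image i) {
            return describeMedia("Image", i.source());
        } else if (block instanceof Audio a) {
            return describeMedia("Audio", a.source());
        } else if (block instanceof Video v) {
            return describeMedia("Video", v.source());
        } else if (block instanceof Text t) {
            return "Text: " + t.text();
        }
        return "<unsupported block>";
    }

    // Describes every block of a tool result; null output yields an empty list,
    // mirroring the hook's null/empty guard.
    static List<String> describeAll(List<Block> output) {
        List<String> lines = new ArrayList<>();
        if (output == null) {
            return lines;
        }
        for (Block block : output) {
            lines.add(describe(block));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder URL, for illustration only.
        List<Block> output =
                List.of(
                        new Video(new UrlSource("https://example.com/cat.mp4")),
                        new Text("A cat runs under the moon."));
        describeAll(output).forEach(System.out::println);
    }
}
```

Running `main` prints `Video URL: https://example.com/cat.mp4` followed by `Text: A cat runs under the moon.`. Applied to the real hook, the same idea would keep the existing `ImageBlock`/`AudioBlock`/`VideoBlock`/`TextBlock` branches but route each media branch through a single source-describing helper, so supporting another media type means adding one branch instead of another copy of the URL/Base64 check.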