AgentOpera

AgentOpera is an operating system (OS) controlling agent that combines AI models with heuristic task management. It integrates OmniParser, a vision-based GUI parsing tool, to interact with graphical user interfaces (GUIs) and execute tasks autonomously. AgentOpera interprets user prompts, generates actionable plans, and performs operations such as mouse clicks, keyboard input, and file management, which makes it suited to automating multi-step workflows.

Demo Video

How to Use

Clone the Repository: Ensure you have the project files on your local machine.
Navigate to the Project Directory: Open a terminal and run:
```
cd AgentOpera
```
Create a Virtual Environment: Set up a Python virtual environment to manage dependencies:
```
python3 -m venv venv
```
Activate the Virtual Environment:

On macOS/Linux:
```
source venv/bin/activate
```
On Windows:
```
venv\Scripts\activate
```
Install Dependencies: Install the required Python packages:
```
pip install -r requirements.txt
```
Run the Application: Start the PyQt-based GUI application:
```
python3 main_ui.py
```
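Before launching the UI, it can help to confirm that the virtual environment is active and that you are in the project directory. The following is a minimal sketch and not part of the repository: the helper names `in_virtualenv` and `missing_project_files` are hypothetical, and the file names checked are the ones used in the steps above.

```python
from __future__ import annotations

import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_project_files(
    root: Path, required: tuple[str, ...] = ("requirements.txt", "main_ui.py")
) -> list[str]:
    """List the expected project files that are absent from root (hypothetical helper)."""
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("missing files:", missing_project_files(Path.cwd()))
```

If the first line reports `False`, repeat the activation step; if any files are listed as missing, you are probably not in the AgentOpera directory.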

Key Features

OmniParser Integration:
AgentOpera uses OmniParser, a screen parsing tool, to analyze GUI screenshots and extract structured data. This data grounds AI-generated actions in specific regions of the interface, so each action targets a concrete on-screen element. OmniParser supports multiple models, which can be orchestrated for enhanced performance.
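To illustrate what grounding an action in a screen region can look like, here is a minimal sketch. It assumes the parser yields labeled elements with normalized bounding boxes; the `ParsedElement` schema and the `find_element` and `click_point` helpers are hypothetical and do not reflect OmniParser's actual output format or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedElement:
    """One GUI element as a screen parser might report it (assumed schema)."""

    label: str
    # Bounding box as (x_min, y_min, x_max, y_max), normalized to [0, 1].
    bbox: tuple[float, float, float, float]


def find_element(elements: list[ParsedElement], query: str) -> ParsedElement | None:
    """Return the first element whose label contains query (case-insensitive)."""
    needle = query.lower()
    for element in elements:
        if needle in element.label.lower():
            return element
    return None


def click_point(element: ParsedElement, screen_width: int, screen_height: int) -> tuple[int, int]:
    """Convert the center of a normalized bounding box to pixel coordinates."""
    x_min, y_min, x_max, y_max = element.bbox
    center_x = (x_min + x_max) / 2 * screen_width
    center_y = (y_min + y_max) / 2 * screen_height
    return round(center_x), round(center_y)


if __name__ == "__main__":
    parsed = [
        ParsedElement("Cancel", (0.0, 0.5, 0.25, 1.0)),
        ParsedElement("Save file", (0.25, 0.5, 0.75, 1.0)),
    ]
    target = find_element(parsed, "save")
    if target is not None:
        print(click_point(target, 1920, 1080))
```

The point of the sketch is the separation of concerns: the model names an element by label, and the parsed screen data decides where on the screen the click lands.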
OS Control Capabilities:
The system can perform a variety of OS-level operations, such as:
- Simulating keyboard inputs and mouse clicks.
- Navigating GUIs based on parsed screen data.
- Automating repetitive tasks with minimal user intervention.
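The operations listed above can be pictured as a plan of small actions dispatched to an input backend. The sketch below is an assumption-laden illustration, not the project's `operate` module: the action dictionary format, `RecordingBackend`, and `run_plan` are all hypothetical. The backend only records calls, so the example runs without touching the real mouse or keyboard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RecordingBackend:
    """Stand-in input backend that records calls instead of controlling the OS."""

    calls: list[tuple] = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", x, y))

    def write(self, text: str) -> None:
        self.calls.append(("write", text))

    def press(self, key: str) -> None:
        self.calls.append(("press", key))


def run_plan(plan: list[dict], backend) -> int:
    """Execute action dicts in order and return how many steps ran.

    Raises ValueError on an unknown action so a malformed plan stops early.
    """
    for step in plan:
        kind = step["action"]
        if kind == "click":
            backend.click(step["x"], step["y"])
        elif kind == "write":
            backend.write(step["text"])
        elif kind == "press":
            backend.press(step["key"])
        else:
            raise ValueError(f"unsupported action: {kind!r}")
    return len(plan)


if __name__ == "__main__":
    backend = RecordingBackend()
    plan = [
        {"action": "click", "x": 960, "y": 810},
        {"action": "write", "text": "report.txt"},
        {"action": "press", "key": "enter"},
    ]
    run_plan(plan, backend)
    print(backend.calls)
```

A real backend would expose the same three methods and forward them to an input automation library; PyAutoGUI, for example, provides `click`, `write`, and `press` functions, though whether AgentOpera uses it is not stated here.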
Modular Design:
The project is organized into modular components, making it easy to extend and customize. Key modules include operate, which handles OS-level operations, and OmniParser, which focuses on GUI parsing.


JayDoshi2406/Opera
