Autonomous Bioinformatics Agent

An AI-powered pipeline that takes a plain-English research question, discovers relevant NCBI GEO datasets, downloads and preprocesses them, identifies the most highly expressed genes, and produces a Markdown report with a Gemini-generated biological interpretation.

🚀 Features

- Intelligent Planning: Gemini decomposes your prompt into a logical task sequence.
- Smart Discovery: Uses Gemini to refine natural language questions into precise NCBI GEO search queries.
- Automated Workflow:
  - Fetches and decompresses GEO series matrix files.
  - Cleans and normalizes expression data (log2 transformation).
  - Ranks genes by expression levels across samples.
- Biological Interpretation: Gemini explains the scientific significance of the top genes found.
- Structured Reporting: Generates comprehensive Markdown reports in the reports/ directory.
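The automated workflow steps above can be sketched roughly as follows. This is an illustrative, standard-library-only sketch and not the code in tools/: the function names (`series_matrix_url`, `parse_series_matrix`, `log2_transform`, `rank_genes`) are hypothetical, the URL layout assumes the usual GEO FTP convention for single-platform series, the `log2(x + 1)` offset is an assumption, and ranking by mean expression is one plausible reading of "ranks genes by expression levels".

```python
import gzip
import math


def series_matrix_url(accession):
    """Build the conventional GEO FTP URL for a series matrix file (assumed layout)."""
    digits = accession[3:]
    # GEO groups series into folders such as GSE2nnn; short accessions use GSEnnn.
    stub = accession[:-3] + "nnn" if len(digits) > 3 else "GSEnnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/{accession}/matrix/{accession}_series_matrix.txt.gz"
    )


def load_series_matrix(path):
    """Decompress a downloaded .txt.gz series matrix file into text."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def parse_series_matrix(text):
    """Return {row_id: [values]} from the table section of a series matrix file."""
    rows = {}
    in_table = False
    header_seen = False
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        if not in_table or not line.strip():
            continue
        fields = [field.strip().strip('"') for field in line.split("\t")]
        if not header_seen:
            header_seen = True  # first table row holds ID_REF and the sample IDs
            continue
        values = []
        for raw in fields[1:]:
            try:
                value = float(raw)
            except ValueError:
                continue  # skip empty or non-numeric cells
            if not math.isnan(value):
                values.append(value)
        if values:
            rows[fields[0]] = values
    return rows


def log2_transform(rows):
    """Apply log2(x + 1); negative values are clamped to 0 (an assumption)."""
    return {
        row_id: [math.log2(max(value, 0.0) + 1.0) for value in values]
        for row_id, values in rows.items()
    }


def rank_genes(rows, top_n=10):
    """Rank rows by mean expression across samples, highest first."""
    means = {row_id: sum(values) / len(values) for row_id, values in rows.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

Series matrix rows are keyed by platform probe IDs (the ID_REF column), so mapping them to gene symbols would be a further step that this sketch leaves out. Series that span several platforms use different file names, which the URL helper does not handle.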

🛠️ Installation

Clone the repository:

```
git clone https://github.com/yourusername/autonomous-bioinformatics-agent.git
cd autonomous-bioinformatics-agent
```

Set up a virtual environment:

```
python3 -m venv .venv
source .venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Set your Gemini API key. The agent requires a Google Gemini API key, which you can get from Google AI Studio:
```
export GEMINI_API_KEY="your_api_key_here"
```
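For illustration, a script could check for the key at startup and fail early with a clear message. This is a minimal sketch: `get_gemini_api_key` is a hypothetical helper, and the project may read the variable differently.

```python
import os


def get_gemini_api_key(env=None):
    """Return the Gemini API key from the environment, or raise if it is missing."""
    # `env` is injectable so the helper can be tested without touching os.environ.
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before running the agent."
        )
    return key
```

Failing before any dataset is downloaded avoids discovering a missing key only when the interpretation step runs.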

📈 Usage

You can run the agent by passing your research question as a command-line argument:

```
python main.py "Which genes are most highly expressed in breast cancer GSE2034?"
```

Or run it interactively by just calling the script:

```
python main.py
```
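The two invocation modes could be handled along these lines. This is a sketch under assumptions rather than the actual contents of main.py: `get_question` and the prompt text are hypothetical.

```python
import sys


def get_question(argv, prompt_fn=input):
    """Use the command-line argument if given, otherwise prompt interactively."""
    if len(argv) > 1:
        # Join all arguments so an unquoted question still works.
        return " ".join(argv[1:]).strip()
    return prompt_fn("Research question: ").strip()


# Typical call from an entry point:
#     question = get_question(sys.argv)
```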

🏗️ Project Structure

```
.
├── agents/             # AI Agents for planning, discovery, and interpretation
├── tools/              # Core bioinformatics tools for analysis and reporting
├── data/               # Local storage for raw and processed datasets (ignored by git)
├── reports/            # Generated Markdown reports (ignored by git)
├── tests/              # Pytest suite for code verification
├── config.py           # Global settings and model configuration
├── main.py             # Main entry point
└── requirements.txt    # Project dependencies
```

⚙️ Configuration

Defaults can be adjusted in config.py:

- GEMINI_MODEL: The model version used (defaults to models/gemini-flash-latest).
- TOP_GENE_COUNT: Number of top-ranked genes to include in reports.
- MAX_DOWNLOAD_RETRIES: Number of attempts for fetching remote datasets.
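As a rough illustration, the settings might look like the following, together with one way a retry limit could be applied. Only the `GEMINI_MODEL` default comes from this README. The numeric values are placeholders, and `fetch_with_retries` is a hypothetical helper, not a function from this repository.

```python
import time

GEMINI_MODEL = "models/gemini-flash-latest"  # default stated in this README
TOP_GENE_COUNT = 20        # placeholder value, not the project's actual default
MAX_DOWNLOAD_RETRIES = 3   # placeholder value, not the project's actual default


def fetch_with_retries(fetch, max_attempts=MAX_DOWNLOAD_RETRIES,
                       delay_seconds=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds or `max_attempts` is exhausted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError as error:  # urllib's URLError is a subclass of OSError
            last_error = error
            if attempt < max_attempts:
                sleep(delay_seconds * attempt)  # simple linear backoff
    raise RuntimeError(
        f"download failed after {max_attempts} attempts"
    ) from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real delays.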

🧪 Testing

Run the test suite to ensure everything is configured correctly:

```
pytest tests/
```

⚖️ License

This project is open source. See the LICENSE file for details, if one is included.
