psMDHamdan/Autonomous-Bioinformatics-Agent

Autonomous Bioinformatics Agent

An AI-powered pipeline that takes a plain-English research question, discovers relevant NCBI GEO datasets, downloads and preprocesses them, identifies the most highly expressed genes, and produces a Markdown report with a Gemini-generated biological interpretation.
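The discovery and download steps ultimately resolve to NCBI URLs. The following sketch is an illustration, not the repository's code. It assumes the agent searches the GEO DataSets ("gds") database through the E-utilities esearch endpoint and fetches series matrix files from the GEO FTP site over HTTPS. The helper names build_gds_search_url and series_matrix_url are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for searching Entrez databases.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gds_search_url(query: str, retmax: int = 20) -> str:
    """Build an esearch URL over the GEO DataSets ("gds") database (hypothetical helper)."""
    params = {"db": "gds", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def series_matrix_url(accession: str) -> str:
    """Build the HTTPS URL of a GEO series matrix file (hypothetical helper).

    GEO groups series into directories by replacing the last three digits
    of the accession with "nnn" (e.g. GSE2034 -> GSE2nnn, GSE123 -> GSEnnn).
    """
    accession = accession.upper()
    if not accession.startswith("GSE") or not accession[3:].isdigit():
        raise ValueError(f"not a GEO series accession: {accession!r}")
    number = accession[3:]
    stub = f"GSE{number[:-3]}nnn"
    return (
        f"https://ftp.ncbi.nlm.nih.gov/geo/series/{stub}/{accession}/matrix/"
        f"{accession}_series_matrix.txt.gz"
    )
```

Series that span several platforms are stored as GSExxxx-GPLyyyy_series_matrix.txt.gz files instead. A real downloader would therefore list the matrix/ directory rather than assume a single file name.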

🚀 Features

  • Intelligent Planning: Gemini decomposes your prompt into a logical task sequence.
  • Smart Discovery: Uses Gemini to refine natural language questions into precise NCBI GEO search queries.
  • Automated Workflow:
    • Fetches and decompresses GEO series matrix files.
    • Cleans and normalizes expression data (log2 transformation).
    • Ranks genes by expression levels across samples.
  • Biological Interpretation: Gemini explains the scientific significance of the top genes found.
  • Structured Reporting: Generates comprehensive Markdown reports in the reports/ directory.
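The automated workflow steps can be sketched with the standard library alone. This is a hedged illustration built on several assumptions. It applies log2 with a pseudocount of 1. It skips the transform when values already look log-scaled (maximum of at most 100, a simplified heuristic). It ranks rows by their mean across samples. Matrix rows are keyed by probe IDs, and mapping them to gene symbols requires the platform (GPL) annotation, which is not shown. All function names are hypothetical.

```python
import gzip
import math
from statistics import mean


def parse_series_matrix(lines):
    """Return {probe_id: [values, ...]} from the lines of a GEO series matrix.

    The expression table sits between the "!series_matrix_table_begin" and
    "!series_matrix_table_end" markers; its first row is a header
    ("ID_REF" followed by the sample GSM accessions).
    """
    table, in_table, header_seen = {}, False, False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break
        if not in_table or not line:
            continue
        fields = [field.strip('"') for field in line.split("\t")]
        if not header_seen:
            header_seen = True  # skip the ID_REF / GSM... header row
            continue
        values = []
        for field in fields[1:]:
            try:
                values.append(float(field))
            except ValueError:
                values.append(math.nan)  # "null" or empty cells become NaN
        table[fields[0]] = values
    return table


def load_series_matrix(path):
    """Parse a gzip-compressed *_series_matrix.txt.gz file from disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_series_matrix(handle)


def log2_transform(table, pseudocount=1.0):
    """Apply log2(x + pseudocount), unless the data already look log-scaled."""
    finite = [v for vals in table.values() for v in vals if not math.isnan(v)]
    if finite and max(finite) <= 100:
        return table  # assumed heuristic: raw intensities usually exceed 100
    return {
        probe: [
            v if math.isnan(v) else math.log2(max(v, 0.0) + pseudocount)
            for v in vals
        ]
        for probe, vals in table.items()
    }


def top_genes(table, n=10):
    """Rank probes by mean expression across samples, ignoring NaNs."""
    scores = {}
    for probe, vals in table.items():
        finite = [v for v in vals if not math.isnan(v)]
        if finite:
            scores[probe] = mean(finite)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

A typical call chain would be top_genes(log2_transform(load_series_matrix(path)), n=TOP_GENE_COUNT), where path points at a downloaded matrix file.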

🛠️ Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/autonomous-bioinformatics-agent.git
    cd autonomous-bioinformatics-agent
  2. Set up a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set your Gemini API key: the agent requires a Google Gemini API key, which you can create in Google AI Studio.

    export GEMINI_API_KEY="your_api_key_here"
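A missing key is easiest to diagnose if the program checks for it at startup. The following minimal sketch assumes the key is read from the GEMINI_API_KEY environment variable, as the export above suggests. The helper name require_gemini_api_key is hypothetical.

```python
import os
import sys


def require_gemini_api_key(env=os.environ) -> str:
    """Return GEMINI_API_KEY, or exit with a readable message (hypothetical helper)."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            "GEMINI_API_KEY is not set. Create a key in Google AI Studio and run:\n"
            '  export GEMINI_API_KEY="your_api_key_here"'
        )
    return key
```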

📈 Usage

You can run the agent by passing your research question as a command-line argument:

python main.py "Which genes are most highly expressed in breast cancer GSE2034?"

Or run it interactively by calling the script without arguments, in which case it prompts for a question:

python main.py
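Both entry modes can be handled by one small function. In the following sketch, the name get_question is hypothetical, and it assumes that multiple unquoted arguments should be joined into a single question. The prompt parameter exists only so the function can be tested without a terminal.

```python
import sys


def get_question(argv=None, prompt=input) -> str:
    """Join command-line arguments into a question, or prompt until one is given."""
    args = sys.argv[1:] if argv is None else argv
    question = " ".join(args).strip()
    while not question:
        question = prompt("Enter your research question: ").strip()
    return question
```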

🏗️ Project Structure

.
├── agents/             # AI Agents for planning, discovery, and interpretation
├── tools/              # Core bioinformatics tools for analysis and reporting
├── data/               # Local storage for raw and processed datasets (ignored by git)
├── reports/            # Generated Markdown reports (ignored by git)
├── tests/              # Pytest suite for code verification
├── config.py           # Global settings and model configuration
├── main.py             # Main entry point
└── requirements.txt    # Project dependencies
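The layout suggests that the agents in agents/ decide what to do and the tools in tools/ carry it out. The following minimal orchestration sketch works under that assumption. A stubbed planner, standing in for the Gemini call, returns an ordered list of task names. Each name is then dispatched to a tool function that takes and returns a shared context dictionary. The task names, plan, and run_pipeline are all hypothetical.

```python
from typing import Callable, Dict, List

# Shared state passed from one stage to the next (hypothetical design).
Context = Dict[str, object]
Tool = Callable[[Context], Context]


def plan(question: str) -> List[str]:
    """Stand-in for the Gemini planner in agents/: returns an ordered task list."""
    return ["discover", "download", "preprocess", "rank", "interpret", "report"]


def run_pipeline(question: str, tools: Dict[str, Tool]) -> Context:
    """Execute each planned task by dispatching to the matching tool."""
    context: Context = {"question": question}
    for task in plan(question):
        tool = tools.get(task)
        if tool is None:
            raise KeyError(f"planner produced an unknown task: {task!r}")
        context = tool(context)
    return context
```

Failing loudly on an unknown task name guards against the planner, which is a language model, emitting a step that no tool implements.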

⚙️ Configuration

Defaults can be adjusted in config.py:

  • GEMINI_MODEL: The model version used (defaults to models/gemini-flash-latest).
  • TOP_GENE_COUNT: Number of top-ranked genes to include in reports.
  • MAX_DOWNLOAD_RETRIES: Number of attempts for fetching remote datasets.
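The following minimal config.py sketch shows how these settings might be consumed. Only the GEMINI_MODEL default comes from this README. The values of TOP_GENE_COUNT and MAX_DOWNLOAD_RETRIES, the linear backoff, and the fetch_with_retries helper are assumptions for illustration.

```python
import time
import urllib.request

# Hypothetical config values; only the GEMINI_MODEL default is documented.
GEMINI_MODEL = "models/gemini-flash-latest"
TOP_GENE_COUNT = 20        # assumed value
MAX_DOWNLOAD_RETRIES = 3   # assumed value


def fetch_with_retries(url, retries=MAX_DOWNLOAD_RETRIES, backoff=2.0,
                       opener=urllib.request.urlopen):
    """Download url, retrying on network errors with a linear backoff."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=60) as response:
                return response.read()
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * attempt)
    raise RuntimeError(f"failed to fetch {url} after {retries} attempts") from last_error
```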

🧪 Testing

Run the test suite to ensure everything is configured correctly:

pytest tests/
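Pytest discovers tests in test_*.py files and test_* functions and checks them with plain assert statements. The following is a hypothetical example. It is written against an illustrative log2 helper defined inline, since the repository's real module and function names are not documented here. The file name tests/test_normalization.py is likewise an assumption.

```python
# tests/test_normalization.py (hypothetical file name)
import math


def log2_with_pseudocount(values, pseudocount=1.0):
    """Illustrative helper under test: log2(x + pseudocount), clipping negatives to 0."""
    return [math.log2(max(v, 0.0) + pseudocount) for v in values]


def test_powers_of_two_map_to_integers():
    assert log2_with_pseudocount([0.0, 1.0, 3.0, 1023.0]) == [0.0, 1.0, 2.0, 10.0]


def test_negative_values_are_clipped():
    assert log2_with_pseudocount([-5.0]) == [0.0]


def test_output_is_monotonic():
    out = log2_with_pseudocount([2.0, 10.0, 100.0])
    assert all(a < b for a, b in zip(out, out[1:]))
```

A single file can be run with pytest tests/test_normalization.py -q.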

⚖️ License

This project is open-source. See the LICENSE file, if one is included, for details.
