Merged
2 changes: 1 addition & 1 deletion .github/workflows/run-tests.yml
@@ -32,7 +32,7 @@ jobs:
platform:
- ubuntu-latest
- macos-latest
- windows-latest
# - windows-latest
runs-on: ${{ matrix.platform }}
name: Python ${{ matrix.python }}, ${{ matrix.platform }}
steps:
6 changes: 2 additions & 4 deletions CHANGELOG.md
@@ -1,7 +1,5 @@
# Changelog

## Version 0.1 (development)
## Version 0.0.1

- Feature A added
- FIX: nasty bug #1729 fixed
- add your changes here!
- Initial implementation to access EnsemblDb sqlite files from AnnotationHub.
105 changes: 101 additions & 4 deletions README.md
@@ -1,11 +1,11 @@
[![PyPI-Server](https://img.shields.io/pypi/v/ensembldb.svg)](https://pypi.org/project/ensembldb/)
![Unit tests](https://github.com/YOUR_ORG_OR_USERNAME/ensembldb/actions/workflows/run-tests.yml/badge.svg)
![Unit tests](https://github.com/BiocPy/ensembldb/actions/workflows/run-tests.yml/badge.svg)

# ensembldb
# EnsemblDb

> Access EnsemblDB objects
**EnsemblDb** provides a Python interface to **Ensembl Annotation Databases (EnsDb)**. It mirrors the functionality of the Bioconductor `ensembldb` package, allowing users to efficiently query gene, transcript, and exon annotations from SQLite-based annotation files.

A longer description of your project goes here...
This package is part of the **BiocPy** ecosystem and integrates seamlessly with [GenomicRanges](https://github.com/biocpy/genomicranges).

## Install

@@ -15,6 +15,103 @@ To get started, install the package from [PyPI](https://pypi.org/project/ensembl
pip install ensembldb
```

## Usage

### 1. Connecting to an EnsDb

You can manage and download standard Ensembl databases via the registry (backed by AnnotationHub).

```py
from ensembldb import EnsDbRegistry

# Initialize the registry
registry = EnsDbRegistry()

# List available databases
available = registry.list_ensdbs()
print(available[:5])
# ['AH53211', 'AH53212', ...]

# Load a specific database (e.g., Larimichthys crocea)
# This automatically downloads and caches the SQLite file
db = registry.load_db("AH113677")

# View metadata
print(db.metadata)
```
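
For readers curious what "downloads and caches" amounts to, here is a minimal sketch of the download-once pattern using only the standard library. It is not how `EnsDbRegistry` or `pybiocfilecache` is implemented; the helper name `fetch_cached`, the URL-hash file naming, and the `.sqlite` suffix are all assumptions made for illustration.

```py
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path) -> Path:
    """Return a local copy of `url`, downloading it only on the first call.

    Hypothetical helper: not part of ensembldb or pybiocfilecache.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the cached file after a hash of the URL so distinct URLs never collide.
    target = cache_dir / (hashlib.sha256(url.encode()).hexdigest() + ".sqlite")
    if not target.exists():
        tmp = target.with_suffix(".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        # Publish the file only after the download has completed.
        tmp.replace(target)
    return target
```

A second call with the same URL returns the existing file without touching the network, which is the behaviour the registry comment above describes from the user's point of view.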

### 2. Retrieving Genomic Features

EnsemblDb allows you to extract features as GenomicRanges objects.

#### Fetch Genes

```py
genes = db.genes()
print(genes)
# GenomicRanges with 23958 ranges and 3 metadata columns
# seqnames ranges strand gene_id gene_name gene_biotype
# <str> <IRanges> <ndarray[int8]> <list> <list> <list>
# ENSLCRG00005000002 MT 1 - 69 + | ENSLCRG00005000002 Mt_tRNA
# ENSLCRG00005000003 MT 70 - 1016 + | ENSLCRG00005000003 Mt_rRNA
# ENSLCRG00005000004 MT 1017 - 1087 + | ENSLCRG00005000004 Mt_tRNA
# ... ... ... | ... ... ...
# ENSLCRG00005023957 VI 22289079 - 22304889 - | ENSLCRG00005023957 FILIP1 protein_coding
# ENSLCRG00005023958 VI 22328118 - 22347657 + | ENSLCRG00005023958 SENP6 protein_coding
# ENSLCRG00005023959 VI 22351962 - 22451867 + | ENSLCRG00005023959 myo6a protein_coding
# ------
# seqinfo(496 sequences): I II III ... XXII XXIII XXIV
```

#### Fetch Transcripts and Exons

```py
transcripts = db.transcripts()
print(transcripts)

exons = db.exons()
print(exons)
```

### 3. Filtering

You can filter results by passing a dictionary to the `filter` argument. Keys should match column names in the database (e.g., `gene_id`, `gene_name`, `tx_biotype`); values can be a single value or a list of values.

#### Filter by Gene Name

```py
# Get coordinates for a specific gene
senp6 = db.genes(filter={"gene_name": "SENP6"})
print(senp6)
```

#### Filter by ID list

```py
# Get transcripts for a list of gene IDs
ids = ["ENSLCRG00005023958", "ENSLCRG00005000003"]
txs = db.transcripts(filter={"gene_id": ids})
print(txs)
```

#### Filter Exons by Transcript ID

```py
# Get all exons associated with a specific transcript
tx_exons = db.exons(filter={"tx_id": "ENSLCRT00005000003"})
print(tx_exons)
```
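
To make the dictionary-based filtering above more concrete, the sketch below shows one way such a dictionary could be translated into a parameterized SQL `WHERE` clause. This is an illustration under stated assumptions, not the package's actual implementation: `build_where` is a hypothetical helper, and the in-memory `gene` table is a toy stand-in for a real EnsDb schema.

```py
import sqlite3


def build_where(filters: dict) -> tuple:
    """Turn {"column": value_or_list} into (where_clause, params).

    Hypothetical helper for illustration only.
    """
    clauses, params = [], []
    for column, value in filters.items():
        # Column names cannot be bound as parameters, so validate them strictly.
        if not column.isidentifier():
            raise ValueError(f"invalid column name: {column!r}")
        values = list(value) if isinstance(value, (list, tuple, set)) else [value]
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


# Toy table standing in for a real EnsDb `gene` table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id TEXT, gene_name TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("G1", "SENP6"), ("G2", "FILIP1"), ("G3", "myo6a")],
)

where, params = build_where({"gene_name": ["SENP6", "myo6a"]})
rows = conn.execute("SELECT gene_id FROM gene" + where + " ORDER BY gene_id", params).fetchall()
print(rows)
# [('G1',), ('G3',)]
```

Binding values through `?` placeholders keeps user-supplied filter values out of the SQL text, which is why only the column names need explicit validation here.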

### 4. Direct SQL Access

If you need more complex queries not covered by the standard methods, you can execute SQL directly against the underlying database.

```py
# Get a BiocFrame from a raw SQL query
df = db._query_as_biocframe("SELECT * FROM gene LIMIT 5")
print(df)
```
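
As a rough picture of what a raw-SQL-to-frame helper does, the sketch below runs a query with the standard `sqlite3` module and collects the result column by column, which is the shape a column-oriented frame is typically built from. It does not reproduce `_query_as_biocframe`; the helper `query_as_columns` and the toy `gene` table are assumptions for illustration.

```py
import sqlite3


def query_as_columns(conn: sqlite3.Connection, sql: str, params=()) -> dict:
    """Run `sql` and return the result as {column_name: [values, ...]}.

    Hypothetical helper; duplicate column names in the result would collide.
    """
    cursor = conn.execute(sql, params)
    names = [description[0] for description in cursor.description]
    columns = {name: [] for name in names}
    for row in cursor:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


# Toy table standing in for a real EnsDb `gene` table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id TEXT, gene_biotype TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("G1", "Mt_tRNA"), ("G2", "Mt_rRNA"), ("G3", "protein_coding")],
)

columns = query_as_columns(conn, "SELECT * FROM gene ORDER BY gene_id LIMIT 2")
print(columns)
# {'gene_id': ['G1', 'G2'], 'gene_biotype': ['Mt_tRNA', 'Mt_rRNA']}
```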

<!-- biocsetup-notes -->

## Note
18 changes: 8 additions & 10 deletions docs/index.md
@@ -1,18 +1,16 @@
# ensembldb
# EnsemblDb

Access EnsemblDB objects
**EnsemblDb** provides a Python interface to **Ensembl Annotation Databases (EnsDb)**. It mirrors the functionality of the Bioconductor `ensembldb` package, allowing users to efficiently query gene, transcript, and exon annotations from SQLite-based annotation files.

This package is part of the **BiocPy** ecosystem and integrates seamlessly with [GenomicRanges](https://github.com/biocpy/genomicranges).

## Note
## Install

> This is the main page of your project's [Sphinx] documentation. It is
> formatted in [Markdown]. Add additional pages by creating md-files in
> `docs` or rst-files (formatted in [reStructuredText]) and adding links to
> them in the `Contents` section below.
>
> Please check [Sphinx] and [MyST] for more information
> about how to document your project and how to configure your preferences.
To get started, install the package from [PyPI](https://pypi.org/project/ensembldb/):

```bash
pip install ensembldb
```

## Contents

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -11,7 +11,7 @@ version_scheme = "no-guess-dev"
[tool.ruff]
line-length = 120
src = ["src"]
exclude = ["tests"]
# exclude = ["tests"]
lint.extend-ignore = ["F821"]

[tool.ruff.lint.pydocstyle]
6 changes: 5 additions & 1 deletion setup.cfg
@@ -5,7 +5,7 @@

[metadata]
name = ensembldb
description = Access EnsemblDB objects
description = Access EnsemblDb resources from Bioconductor's AnnotationHub
author = Jayaram Kancherla
author_email = jayaram.kancherla@gmail.com
license = MIT
@@ -49,6 +49,10 @@ package_dir =
# For more information, check out https://semver.org/.
install_requires =
    importlib-metadata; python_version<"3.8"
    pybiocfilecache
    biocframe
    genomicranges
    iranges


[options.packages.find]
4 changes: 4 additions & 0 deletions src/ensembldb/__init__.py
@@ -14,3 +14,7 @@
    __version__ = "unknown"
finally:
    del version, PackageNotFoundError

from .record import EnsDbRecord
from .registry import EnsDbRegistry
from .ensdb import EnsDb
7 changes: 7 additions & 0 deletions src/ensembldb/_ahub.py
@@ -0,0 +1,7 @@
"""Configuration for accessing AnnotationHub metadata for EnsDb."""

__author__ = "Jayaram Kancherla"
__copyright__ = "Jayaram Kancherla"
__license__ = "MIT"

AHUB_METADATA_URL = "https://annotationhub.bioconductor.org/metadata/annotationhub.sqlite3"