5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog

## Version 0.9.0

- Added support for parsing `.RData`/`.rda` files.
- Bumped the version of the rds2cpp library.

## Version 0.8.0 - 0.8.1

- Implement parsers for compressed list objects.
38 changes: 10 additions & 28 deletions README.md
@@ -4,18 +4,7 @@

# rds2py

Parse and construct Python representations for datasets stored in RDS files. `rds2py` supports various base classes from R, and Bioconductor's `SummarizedExperiment` and `SingleCellExperiment` S4 classes. **_For more details, check out [rds2cpp library](https://github.com/LTLA/rds2cpp)._**

---

**Version 0.5.0** brings major changes to the package,

- Complete overhaul of the codebase using pybind11
- Streamlined readers for R data types
- Updated API for all classes and methods

Please refer to the [documentation](https://biocpy.github.io/rds2py/) for the latest usage guidelines. Previous versions may have incompatible APIs.

Parse and construct Python representations for datasets stored in **RDS or RData** files. `rds2py` supports various base classes from R, and Bioconductor's `SummarizedExperiment` and `SingleCellExperiment` S4 classes. **_For more details, check out the [rds2cpp library](https://github.com/LTLA/rds2cpp)._**

## Installation

@@ -32,36 +21,29 @@ By default, the package does not install packages to convert python representati

## Usage

If you do not have an RDS object handy, feel free to download one from [single-cell-test-files](https://github.com/jkanche/random-test-files/releases).
> [!NOTE]
>
> If you do not have an RDS object handy, feel free to download one from [single-cell-test-files](https://github.com/jkanche/random-test-files/releases).

```python
from rds2py import read_rds
r_obj = read_rds("path/to/file.rds")
from rds2py import read_rds, read_rda
r_obj = read_rds("path/to/file.rds") # or read_rda("path/to/file.rda")
```

The returned `r_obj` is either an instance of an appropriate Python class, if a parser is already implemented, or a dictionary containing the data from the RDS file.

To just get the parsed dictionary representation of the RDS file,

```python
from rds2py import parse_rds

robject_dict = parse_rds("path/to/file.rds")
print(robject_dict)
```

### Write-your-own-reader

Reading RDS files as dictionary representations allows users to write their own custom readers into appropriate Python representations.
Reading RDS or RData files as dictionary representations allows users to write their own custom readers into appropriate Python representations.

```python
from rds2py import parse_rds
from rds2py import parse_rds, parse_rda

robject = parse_rds("path/to/file.rds")
robject = parse_rds("path/to/file.rds") # or use parse_rda for RData files
print(robject)
```

if you know this RDS file contains an `GenomicRanges` object, you can use the built-in reader or write your own reader to convert this dictionary.
If you know this RDS file contains a `GenomicRanges` object, you can use the built-in reader or write your own reader to convert this dictionary.

```python
from rds2py.read_granges import read_genomic_ranges
11 changes: 9 additions & 2 deletions lib/CMakeLists.txt
@@ -11,16 +11,23 @@ include(FetchContent)
FetchContent_Declare(
rds2cpp
GIT_REPOSITORY https://github.com/LTLA/rds2cpp
GIT_TAG v1.1.0
GIT_TAG master
)

FetchContent_Declare(
byteme
GIT_REPOSITORY https://github.com/LTLA/byteme
GIT_TAG v1.2.2
GIT_TAG master
)

FetchContent_Declare(
sanisizer
GIT_REPOSITORY https://github.com/LTLA/sanisizer
GIT_TAG master
)

FetchContent_MakeAvailable(byteme)
FetchContent_MakeAvailable(sanisizer)
FetchContent_MakeAvailable(rds2cpp)

# Defining the targets.
64 changes: 63 additions & 1 deletion lib/src/rdswrapper.cpp
@@ -148,7 +148,8 @@ class RdsObject {
public:
    RdsObject(const std::string& file) {
        try {
            parsed = std::make_unique<rds2cpp::Parsed>(rds2cpp::parse_rds(file));
            rds2cpp::ParseRdsOptions options;
            parsed = std::make_unique<rds2cpp::Parsed>(rds2cpp::parse_rds(file, options));
            if (!parsed || !parsed->object) {
                throw std::runtime_error("Failed to parse RDS file");
            }
@@ -164,13 +165,74 @@ class RdsObject {
    }
};

class RdaObject {
private:
    std::unique_ptr<rds2cpp::RdaFile> parsed;

public:
    RdaObject(const std::string& file) {
        try {
            rds2cpp::ParseRdaOptions options;
            parsed = std::make_unique<rds2cpp::RdaFile>(rds2cpp::parse_rda(file, options));
        } catch (const std::exception& e) {
            throw std::runtime_error(std::string("Error in 'RdaObject' constructor: ") + e.what());
        }
    }

    py::list get_object_names() const {
        if (!parsed) throw std::runtime_error("Null parsed in 'get_object_names'");
        const auto& pairlist = parsed->contents;
        py::list names;
        for (size_t i = 0; i < pairlist.tag_names.size(); ++i) {
            if (pairlist.has_tag[i]) {
                names.append(pairlist.tag_names[i]);
            } else {
                names.append(py::none());
            }
        }
        return names;
    }

    int get_object_count() const {
        if (!parsed) throw std::runtime_error("Null parsed in 'get_object_count'");
        return static_cast<int>(parsed->contents.data.size());
    }

    RdsReader* get_object_by_index(int index) const {
        if (!parsed) throw std::runtime_error("Null parsed in 'get_object_by_index'");
        const auto& data = parsed->contents.data;
        if (index < 0 || static_cast<size_t>(index) >= data.size()) {
            throw std::out_of_range("Object index out of range");
        }
        return new RdsReader(data[index].get());
    }

    RdsReader* get_object_by_name(const std::string& name) const {
        if (!parsed) throw std::runtime_error("Null parsed in 'get_object_by_name'");
        const auto& pairlist = parsed->contents;
        for (size_t i = 0; i < pairlist.tag_names.size(); ++i) {
            if (pairlist.has_tag[i] && pairlist.tag_names[i] == name) {
                return new RdsReader(pairlist.data[i].get());
            }
        }
        throw std::runtime_error("Object not found: " + name);
    }
};
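
For readers more at home in Python, the lookup logic of `RdaObject` can be sketched roughly as below. This is an illustration only: `PairList` is a hypothetical stand-in for the parsed pairlist (three parallel sequences, as the C++ code accesses them), not the rds2cpp type, and the `KeyError` raised for a missing name is a choice made for the sketch, whereas the C++ code throws `std::runtime_error`.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PairList:
    """Hypothetical stand-in for the parsed pairlist: three parallel sequences."""

    data: List[Any] = field(default_factory=list)
    has_tag: List[bool] = field(default_factory=list)
    tag_names: List[str] = field(default_factory=list)


def object_names(pl: PairList) -> List[Optional[str]]:
    # Untagged entries are reported as None, mirroring get_object_names.
    return [name if tagged else None for name, tagged in zip(pl.tag_names, pl.has_tag)]


def object_by_index(pl: PairList, index: int) -> Any:
    # Negative indices are rejected rather than wrapping around.
    if index < 0 or index >= len(pl.data):
        raise IndexError("Object index out of range")
    return pl.data[index]


def object_by_name(pl: PairList, name: str) -> Any:
    # Linear scan; the first tagged entry with a matching name wins.
    for i, (tag, tagged) in enumerate(zip(pl.tag_names, pl.has_tag)):
        if tagged and tag == name:
            return pl.data[i]
    raise KeyError(f"Object not found: {name}")


# Toy data: the middle entry has no tag.
pl = PairList(
    data=[1, "two", 3.0],
    has_tag=[True, False, True],
    tag_names=["x", "", "y"],
)
```

With the toy data above, `object_names(pl)` yields a `None` for the untagged middle entry, which is what the Python-side parser later has to turn into a usable dictionary key.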

PYBIND11_MODULE(lib_rds_parser, m) {
    py::register_exception<std::runtime_error>(m, "RdsParserError");

    py::class_<RdsObject>(m, "RdsObject")
        .def(py::init<const std::string&>())
        .def("get_robject", &RdsObject::get_robject, py::return_value_policy::reference_internal);

    py::class_<RdaObject>(m, "RdaObject")
        .def(py::init<const std::string&>())
        .def("get_object_names", &RdaObject::get_object_names)
        .def("get_object_count", &RdaObject::get_object_count)
        .def("get_object_by_index", &RdaObject::get_object_by_index, py::return_value_policy::take_ownership, py::keep_alive<0, 1>())
        .def("get_object_by_name", &RdaObject::get_object_by_name, py::return_value_policy::take_ownership, py::keep_alive<0, 1>());

    py::class_<RdsReader>(m, "RdsReader")
        .def(py::init<const rds2cpp::RObject*>())
        .def("get_rtype", &RdsReader::get_rtype)
2 changes: 1 addition & 1 deletion setup.cfg
@@ -62,7 +62,6 @@ exclude =
# `pip install rds2py[PDF]` like:
# PDF = ReportLab; RXP
optional =
    pandas
    hdf5array
    scipy
    biocframe
@@ -72,6 +71,7 @@ optional =
    multiassayexperiment>=0.6.0
    compressed_lists>=0.4.4
    biocutils>=0.3.4
    compressed_lists

# Add here test requirements (semicolon/line-separated)
testing =
111 changes: 111 additions & 0 deletions src/rds2py/PyRdaReader.py
@@ -0,0 +1,111 @@
"""Low-level interface for reading RData files.

This module provides the core functionality for parsing RData (.RData/.rda) files
and converting them into dictionary representations that can be further processed
by higher-level functions.
"""

from typing import Any, Dict

from .lib_rds_parser import RdaObject, RdsReader
from .PyRdsReader import PyRdsParser

__author__ = "jkanche"
__copyright__ = "jkanche"
__license__ = "MIT"


class PyRdaParserError(Exception):
    """Exception raised for errors during RData parsing."""

    pass


class PyRdaParser:
    """Parser for reading RData files.

    This class provides low-level access to RData file contents, handling the binary
    format and converting it into Python data structures. It reuses the same
    ``RdsReader``-based object processing from :py:class:`~.PyRdsParser`.

    Attributes:
        rda_object:
            Internal representation of the RData file.
    """

    def __init__(self, file_path: str):
        """Initialize the parser.

        Args:
            file_path:
                Path to the RData file to be read.
        """
        try:
            self.rda_object = RdaObject(file_path)
        except Exception as e:
            raise PyRdaParserError(f"Error initializing 'PyRdaParser': {str(e)}")

    def get_object_names(self):
        """Get the names of all objects stored in the RData file.

        Returns:
            A list of object names (strings).
        """
        return list(self.rda_object.get_object_names())

    def get_object_count(self) -> int:
        """Get the number of objects stored in the RData file.

        Returns:
            Number of objects.
        """
        return self.rda_object.get_object_count()

    def parse(self) -> Dict[str, Dict[str, Any]]:
        """Parse all objects in the RData file.

        Returns:
            A dictionary mapping object names to their parsed representations.
            Each value has the same structure as the output of
            :py:meth:`~rds2py.PyRdsReader.PyRdsParser.parse`.
        """
        try:
            helper = _RdsProcessorHelper()

            result = {}
            names = self.get_object_names()
            for i, name in enumerate(names):
                reader = self.rda_object.get_object_by_index(i)
                key = name if name is not None else f"__unnamed_{i}"
                result[key] = helper._process_object(reader)

            return result
        except Exception as e:
            raise PyRdaParserError(f"Error parsing RData file: {str(e)}")

    def parse_object(self, name: str) -> Dict[str, Any]:
        """Parse a single named object from the RData file.

        Args:
            name:
                Name of the object to parse.

        Returns:
            A dictionary containing the parsed data for the requested object.
        """
        try:
            helper = _RdsProcessorHelper()
            reader = self.rda_object.get_object_by_name(name)
            return helper._process_object(reader)
        except Exception as e:
            raise PyRdaParserError(f"Error parsing object '{name}': {str(e)}")


class _RdsProcessorHelper(PyRdsParser):
    """Helper that reuses PyRdsParser's object processing without requiring a file."""

    def __init__(self):
        self.R_MIN = -2147483648

    def _process_object(self, obj: RdsReader) -> Dict[str, Any]:
        return super()._process_object(obj)
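
The key-naming step in `PyRdaParser.parse` can be isolated as a small sketch, which makes one property easier to see: because results are stored in a plain dictionary, two entries that share a name would leave only the later one. Whether an RData file can actually contain repeated names is not established by this diff; `build_result` and `find_duplicates` are hypothetical helper names used only for illustration.

```python
from typing import Any, Dict, List, Optional


def build_result(names: List[Optional[str]], values: List[Any]) -> Dict[str, Any]:
    """Mirror the key selection used in parse(): fall back to __unnamed_<i>."""
    result: Dict[str, Any] = {}
    for i, name in enumerate(names):
        key = name if name is not None else f"__unnamed_{i}"
        # A repeated key silently overwrites the earlier value.
        result[key] = values[i]
    return result


def find_duplicates(names: List[Optional[str]]) -> List[str]:
    """Hypothetical pre-check: list names that occur more than once."""
    seen = set()
    dupes: List[str] = []
    for name in names:
        if name is None:
            continue
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes
```

A caller who cares about such collisions could run a check like `find_duplicates` on the output of `get_object_names()` before calling `parse()`, and fall back to index-based access if it reports anything.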
5 changes: 3 additions & 2 deletions src/rds2py/__init__.py
@@ -15,5 +15,6 @@
finally:
    del version, PackageNotFoundError

from .generics import read_rds
from .rdsutils import parse_rds

from .generics import read_rds, read_rda
from .rdsutils import parse_rds, parse_rda
31 changes: 30 additions & 1 deletion src/rds2py/generics.py
@@ -16,9 +16,10 @@
"""

from importlib import import_module
from typing import List, Optional
from warnings import warn

from .rdsutils import get_class, parse_rds
from .rdsutils import get_class, parse_rda, parse_rds

__author__ = "jkanche"
__copyright__ = "jkanche"
@@ -105,6 +106,34 @@ def read_rds(path: str, **kwargs):
    return _dispatcher(_robj, **kwargs)


def read_rda(path: str, objects: Optional[List[str]] = None, **kwargs) -> dict:
    """Read an RData file and convert each object to an appropriate Python type.

    This function parses all (or selected) objects and dispatches each one
    through the same type registry used by :py:func:`~.read_rds`.

    Args:
        path:
            Path to the RData (.RData/.rda) file to be read.

        objects:
            Optional list of object names to read. If ``None``,
            all objects in the file are read.

        **kwargs:
            Additional arguments passed to specific parser functions.

    Returns:
        A dictionary mapping object names to their converted Python
        representations.
    """
    parsed = parse_rda(path=path, objects=objects)
    result = {}
    for name, robj in parsed.items():
        result[name] = _dispatcher(robj, **kwargs)
    return result
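
The overall flow of `read_rda` (select objects, then dispatch each one) can be sketched without the package as follows. This is a sketch under stated assumptions: how `parse_rda` applies the `objects` argument is not shown in this diff, so the filtering and the error for unknown names are guesses, and `convert_objects` and `toy_dispatcher` are hypothetical names standing in for the real parsing and dispatch functions.

```python
from typing import Any, Callable, Dict, List, Optional


def convert_objects(
    parsed: Dict[str, dict],
    dispatcher: Callable[[dict], Any],
    objects: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Optionally restrict to the requested names, then dispatch each object."""
    if objects is not None:
        # Assumption: unknown names are treated as an error.
        missing = [name for name in objects if name not in parsed]
        if missing:
            raise KeyError(f"Objects not found: {missing}")
        parsed = {name: parsed[name] for name in objects}
    return {name: dispatcher(robj) for name, robj in parsed.items()}


def toy_dispatcher(robj: dict) -> Any:
    # Hypothetical stand-in for the type registry: convert integer vectors
    # to lists and return anything unrecognized as the raw dictionary.
    if robj.get("type") == "integer":
        return list(robj["data"])
    return robj


# Toy parsed output: one recognized object and one that falls through.
parsed = {
    "counts": {"type": "integer", "data": (1, 2, 3)},
    "meta": {"type": "S4", "attributes": {}},
}
```

Objects without a matching converter come back as their dictionary representation, in line with what the README describes for `read_rds`.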


def _dispatcher(robject: dict, **kwargs):
    """Internal function to dispatch R objects to appropriate parser functions.
