Merged
39 changes: 39 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,39 @@
name: CI

on:
  push:
    branches: [master, main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ['3.11', '3.12', '3.13']
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true

      - name: Set up Python ${{ matrix.python-version }}
        run: uv python install ${{ matrix.python-version }}

      - name: Install dependencies
        run: uv sync --python ${{ matrix.python-version }}

      - name: Lint
        run: uv run ruff check .

      - name: Format check
        run: uv run ruff format --check .

      - name: Type check
        run: uv run mypy

      - name: Test
        run: uv run pytest --cov=unit_parser --cov-report=term-missing
16 changes: 0 additions & 16 deletions .travis.yml

This file was deleted.

209 changes: 178 additions & 31 deletions CLAUDE.md
@@ -1,72 +1,219 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Guidance for Claude Code (claude.ai/code) when working in this repository.

This file is the single source of truth for project conventions. The README is
written for end users; this file is written for contributors (human or agent).

## What this library is

`unit_parser` is a small, dependency-free dimensional-analysis library. Its
distinguishing feature is that **units are parsed from strings**, including
compound units like `kilogram_meter_per_second_squared`. The original use case
was parsing free-form physical quantities out of plain-text input files (JSON
configs, etc.) without forcing the author to commit to one system of
measurement — if a user wants to specify a distance in furlongs, the library
can accept it.

## Commands

```bash
# Run all tests
pytest unit_parser/test_units.py
uv run pytest

# Run a single test
pytest unit_parser/test_units.py::test_feet_to_meters
uv run pytest tests/test_units.py::test_feet_to_meters

# Install dev environment
uv sync

# CLI
uv run convert 5 feet meters
uv run convert 5 feet to inches # the "to" filler word is accepted
```

# Install in editable mode
pip install -e .
There is no build step and no runtime dependencies.

# Use the CLI
convert 5 feet meters
convert 5 feet to inches
## Modernization roadmap

The codebase predates current Python packaging conventions and is mid-cleanup.
Treat the following as the **target state** when making changes — prefer to
move the code toward this state rather than entrench what exists today.

| Area | Status |
| ----------------- | -------------------------------- |
| Python version | Done — `requires-python = ">=3.11"` in `pyproject.toml` |
| Dependency mgmt | Done — `uv` + `uv.lock`; dev deps in `[dependency-groups]` |
| Linting | Done — `ruff` (lint + format), single-quote style |
| Type checking | Done — `mypy --strict`; pyre headers removed |
| CI | Done — GitHub Actions on 3.11/3.12/3.13 (`.github/workflows/ci.yml`) |
| Packaging | Done — PEP 621 only; `MANIFEST.in` and `requirements.txt` deleted |
| Tests | Done — `tests/` directory; coverage threshold 95 % |
| Class naming | Done — `README.md` updated to `UnitParser`; CI badge points at GitHub Actions |

Local commands:

```bash
uv sync # install dev deps + editable install
uv run pytest # tests
uv run ruff check . # lint
uv run ruff format . # format
uv run mypy # type-check
```

There is no build step; this is a pure Python package with no external runtime dependencies.
Don't introduce a runtime dependency just to satisfy a lint rule; the
zero-dependency property is a feature.

## Architecture

`unit_parser` is a dimensional-analysis library for parsing and converting physical quantities. The main class is `unit_parser` in `unit_parser/units.py`.
Everything lives in `unit_parser/units.py`. The package is small enough that a
single-module layout is correct — resist splitting it up prematurely.

### Core representation

Every unit is stored in an internal dict `self._units` with two fields:
Each unit is stored in `UnitParser._units` as a `_UnitSpec` dataclass with two
fields:

- **`signature`**: tuple of integers (length 6 by default) holding dimensional
exponents in the order `[length, mass, time, angle, temperature, charge]`.
Example: `newton` is `(1, 1, -2, 0, 0, 0)`.
- **`quantity`**: float conversion factor relative to the SI base unit for
that dimension.

- **`signature`**: a list of integers (length 6) representing dimensional exponents — `[length, mass, time, angle, temperature, charge]`. For example, `newton` has signature `[1, 1, -2, 0, 0, 0]`.
- **`quantity`**: a float giving the conversion factor relative to the SI base unit for that dimension.
Conversion between two compatible units is `value * from.quantity /
to.quantity`. Two units are *compatible* iff their signatures are equal.

Conversion between two compatible units is `result = value * from_quantity / to_quantity`. Two units are compatible iff their signatures are equal.
The signature length is **not hardcoded to 6** — it is set by the first
signature-form line encountered in the unit file (`self._sig_len`). All
subsequent signature-form lines must agree. A custom unit file can therefore
use any consistent dimension count.

### Compound unit parsing
Both `_UnitSpec` fields are immutable, so the cache lookup in
`_signature_and_quantity_for_unit` returns the stored spec directly without
defensive copying. Anything that needs to mutate (the per-call accumulators
inside that method) builds its own local `list[int]` and packages the result
back into a `tuple` before returning.
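
As a rough illustration of the representation described above, here is a minimal sketch. The class name `UnitSpec`, the helper `convert_value`, and the three sample units are hypothetical stand-ins for this example; the library's private `_UnitSpec` may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitSpec:
    """Hypothetical stand-in for the private `_UnitSpec` (assumed frozen)."""

    signature: tuple[int, ...]  # dimensional exponents, e.g. (length, mass, time, ...)
    quantity: float             # conversion factor relative to the SI base unit


def convert_value(value: float, src: UnitSpec, dst: UnitSpec) -> float:
    """Apply value * from.quantity / to.quantity after checking compatibility."""
    if src.signature != dst.signature:
        raise ValueError('incompatible units: signatures differ')
    return value * src.quantity / dst.quantity


# Sample units, assumed here only for illustration.
meter = UnitSpec((1, 0, 0, 0, 0, 0), 1.0)
foot = UnitSpec((1, 0, 0, 0, 0, 0), 0.3048)
second = UnitSpec((0, 0, 1, 0, 0, 0), 1.0)

feet_in_meters = convert_value(5, foot, meter)  # computes 5 * 0.3048 / 1.0
```

Because the dataclass is frozen and the signature is a tuple, a cached spec can be handed out without copying; attempting to assign to `foot.quantity` raises an error instead of corrupting the cache.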

`_signature_and_quantity_for_unit` handles underscore-separated compound unit strings such as `kilogram_meter_per_second_squared`. It splits on underscores and processes tokens left-to-right, maintaining separate numerator and denominator buffers. Special tokens:
### Compound unit parsing (`_signature_and_quantity_for_unit`)

- `per` — switches subsequent tokens to the denominator (allowed at most once)
- `squared` / `cubed` — multiply exponents and quantity of the preceding unit by 2 or 3
Splits the input on `_` and walks tokens left-to-right, maintaining `signature`
/ `quantity` accumulators plus a `sig_buffer` / `quantity_buffer` holding the
*most recently seen unit token* (so that `squared` / `cubed` can re-apply it).

### Unit definition file
Special tokens:

`unit_parser/units/units.txt` is the built-in unit database. Lines are either:
- `per` — switches subsequent tokens to the denominator. **Allowed at most
once** per specification.
- `squared` — re-applies the previous unit token once more (so `meter_squared`
contributes the meter exponent twice).
- `cubed` — re-applies it twice more. Implemented by doubling `sig_buffer` and
squaring `quantity_buffer` before merging, so that `unit + 2*unit = 3*unit`.
Be careful when refactoring: if you change the order of "merge buffer into
accumulator" vs. "scale buffer", you will silently get the wrong exponent.
- `squared`/`cubed` cannot follow `per` directly or another modifier
(`squared_squared` is rejected).

- **Signature form**: `second: [0 0 1 0 0 0]` — defines a base unit with its dimensional signature
- **Quantity form**: `minute: 60 seconds` — defines a derived unit in terms of an existing one
`squared`/`cubed` bind only to the immediately preceding token —
`second_meter_squared` means "second times meter²", not "(second·meter)²". For
the latter, write `second_squared_meter_squared`.
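
The token rules above can be sketched as follows. This is a simplified illustration, not the library's implementation: it applies each unit immediately and treats `squared` / `cubed` as "apply the previous unit N more times" instead of using the `sig_buffer` / `quantity_buffer` scaling described above. The function name `parse_compound` and the three-dimensional `BASE_UNITS` table are assumptions made for the example.

```python
# Hypothetical three-dimensional unit table: (length, mass, time) exponents and SI factor.
BASE_UNITS = {
    'meter': ((1, 0, 0), 1.0),
    'kilogram': ((0, 1, 0), 1.0),
    'second': ((0, 0, 1), 1.0),
    'minute': ((0, 0, 1), 60.0),
}
# Number of *extra* applications of the preceding unit token.
MODIFIERS = {'squared': 1, 'cubed': 2}


def parse_compound(spec: str) -> tuple[tuple[int, ...], float]:
    signature = [0, 0, 0]
    quantity = 1.0
    sign = 1          # +1 while in the numerator, -1 after 'per'
    seen_per = False
    previous = None   # most recent unit token; None when a modifier is not allowed

    for token in spec.split('_'):
        if token == 'per':
            if seen_per:
                raise ValueError("'per' may appear at most once")
            seen_per, sign, previous = True, -1, None
        elif token in MODIFIERS:
            if previous is None:
                raise ValueError(f'{token!r} must directly follow a unit')
            unit_sig, unit_qty = previous
            extra = MODIFIERS[token]
            for i, exp in enumerate(unit_sig):
                signature[i] += sign * extra * exp
            quantity *= unit_qty ** (sign * extra)
            previous = None  # rejects chained modifiers such as squared_squared
        elif token in BASE_UNITS:
            unit_sig, unit_qty = BASE_UNITS[token]
            for i, exp in enumerate(unit_sig):
                signature[i] += sign * exp
            quantity *= unit_qty ** sign
            previous = (unit_sig, unit_qty)
        else:
            raise ValueError(f'unknown unit {token!r}')
    return tuple(signature), quantity


newton_like = parse_compound('kilogram_meter_per_second_squared')
```

Resetting `previous` to `None` after `per` and after a modifier is what enforces the two rejection rules; the binding rule falls out of `previous` holding only the single most recent unit token.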

Custom unit files can be passed to the `unit_parser` constructor.
### Unit definition file (`unit_parser/units/units.txt`)

Two line forms:

- **Signature form**: `second: [0 0 1 0 0 0]` — defines a base unit with
implicit quantity 1.0.
- **Quantity form**: `minute: 60 seconds` — defines a derived unit in terms of
any previously-defined unit (or compound expression).

`#` introduces a comment. Custom unit files can be passed to the
`UnitParser(unit_definitions=...)` constructor.
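
To make the two line forms concrete, here is a small sketch of how a single line could be classified. The regular expressions, the `parse_line` name, and the tuple return shape are all assumptions for illustration; the real `_parse_unit_file` may accept a wider grammar (the quantity form here takes a single unit token, and a compound expression such as `meter_per_second` still matches because `_` is a word character).

```python
import re

# Assumed patterns for the two documented line forms.
SIGNATURE_LINE = re.compile(r'^(\w+):\s*\[([-\d\s]+)\]$')
QUANTITY_LINE = re.compile(r'^(\w+):\s*(\S+)\s+(\w+)$')


def parse_line(line: str):
    """Classify one unit-file line; returns None for blank or comment-only lines."""
    text = line.split('#', 1)[0].strip()  # '#' introduces a comment
    if not text:
        return None
    if m := SIGNATURE_LINE.match(text):
        exponents = tuple(int(e) for e in m.group(2).split())
        return ('signature', m.group(1), exponents)
    if m := QUANTITY_LINE.match(text):
        return ('quantity', m.group(1), float(m.group(2)), m.group(3))
    raise ValueError(f'unrecognized unit definition: {line!r}')


base = parse_line('second: [0 0 1 0 0 0]')
derived = parse_line('minute: 60 seconds  # sixty of them')
```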

#### Known data-file quirks (worth fixing during modernization)

These are not bugs in the parser — they're choices in `units.txt`:

- `teaspoon: 0.3333333 tablespoons` — should be exactly 1/3. Consider allowing
rational expressions in the quantity form, or just use more digits.
- `year: 365 days` — uses the common-year convention, not Julian (365.25).
Document the choice or change it.
- `degF: 0.5555555555555 degC` — temperature **differences** only. No offset
is supported anywhere in the code, so converting an absolute temperature
through this unit will produce nonsense. Document this loudly, or extend
`_UnitSpec` with an offset and teach `convert` to refuse mixing absolute and
relative temperatures.
- Aliases like `seconds: 1 second` are how plurals/abbreviations work — there
is no built-in plural handling.
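
The `degF` quirk is easiest to see with plain arithmetic. The sketch below does not use the library at all; both helper names are hypothetical, and the exact factor 5/9 is used in place of the truncated value from `units.txt`.

```python
DEGF_TO_DEGC = 5 / 9  # scale factor only; units.txt approximates it as 0.5555555555555


def fahrenheit_difference_to_celsius(delta_f: float) -> float:
    """Correct for temperature differences: a pure scale factor."""
    return delta_f * DEGF_TO_DEGC


def fahrenheit_absolute_to_celsius(temp_f: float) -> float:
    """Correct for absolute readings: needs the 32-degree offset as well."""
    return (temp_f - 32) * DEGF_TO_DEGC


# The boiling point of water is 212 degF, which is 100 degC.
scaled_only = fahrenheit_difference_to_celsius(212)  # about 117.8, wrong as an absolute reading
with_offset = fahrenheit_absolute_to_celsius(212)    # about 100
```

A scale-only conversion is right for "the temperature rose by 9 degF" (a 5 degC rise) and wrong for "the water is at 212 degF", which is the case an offset-aware `_UnitSpec` would need to handle or refuse.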

### Public API

All public names are re-exported from `unit_parser/__init__.py`. The key entry point is the `unit_parser` class:
Re-exported from `unit_parser/__init__.py`:

```python
from unit_parser import unit_parser
up = unit_parser()
from unit_parser import UnitParser
up = UnitParser()
up.convert("5 feet", "meters") # 2-arg form
up.convert(5, "feet", "meters") # 3-arg form
up.add("3 meters", "2 feet", "meters")
up.subtract(...)
up.subtract("3 meters", "2 feet", "meters")
up.multiply("2 kg", "3 meter_per_second_squared", "newtons")
up.divide(...)
up.divide("10 meters", "2 seconds", "meter_per_second")
```

All arithmetic operations verify that the result unit's signature matches the expected signature for the operation (e.g., multiply checks that output signature equals sum of input signatures).
Every arithmetic method takes the **desired result units** as its last
argument and validates that the dimensional signature of the result matches
that unit. Mismatches raise `ValueError` — they don't silently coerce.

### CLI
`convert` uses `*args: str | float` to accept both 2-arg and 3-arg forms. When
modernizing, replace this with `@typing.overload` so callers get accurate
types.

### CLI (`unit_parser/convert.py`)

Registered as the `convert` console script via `pyproject.toml`. Accepts:

```
convert <value> <from_unit> [to] <to_unit>
```

`unit_parser/convert.py` defines a `main()` entry point registered as the `convert` command via `setup.py`. It accepts `convert <value> <from_unit> [to] <to_unit>`.
The CLI accepts either two trailing positionals (`<from> <to>`) or three
where the middle one is the literal word `"to"`. Anything else (a stray
filler word, an extra argument) is rejected via `argparse.ArgumentParser.error`,
which prints the usage and exits 2.
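
One way to get this behaviour from `argparse` is sketched here. The function name `parse_args`, the `units` destination, and the error text are assumptions; the actual `convert.py` may structure its parser differently.

```python
import argparse


def parse_args(argv: list[str]) -> tuple[float, str, str]:
    """Parse `<value> <from_unit> [to] <to_unit>` into (value, from_unit, to_unit)."""
    parser = argparse.ArgumentParser(prog='convert')
    parser.add_argument('value', type=float)
    parser.add_argument('units', nargs='+', metavar='unit')
    ns = parser.parse_args(argv)

    if len(ns.units) == 2:
        from_unit, to_unit = ns.units
    elif len(ns.units) == 3 and ns.units[1] == 'to':
        from_unit, _, to_unit = ns.units
    else:
        # ArgumentParser.error prints the usage line and exits with status 2.
        parser.error('expected <from_unit> [to] <to_unit>')
    return ns.value, from_unit, to_unit


plain = parse_args(['5', 'feet', 'meters'])
with_filler = parse_args(['5', 'feet', 'to', 'inches'])
```

Collecting the trailing positionals with `nargs='+'` and validating them by hand keeps the optional `to` out of the argparse grammar, which has no direct way to express an optional literal word between two positionals.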

## Conventions

- **No new runtime dependencies.** Test/dev dependencies are fine; runtime
deps are not.
- **Public surface = `unit_parser/__init__.py`.** Anything not re-exported
there is private and may change without notice. Tests reach into `_`-prefixed
methods on purpose; new external callers should not.
- **Errors are `ValueError`.** Don't introduce custom exception types unless
there's a real need to discriminate them at a call site.
- **String quoting**: existing code uses single quotes. Ruff's default is
double quotes — pick one in `pyproject.toml` and let the formatter enforce
it; don't mix.
- **Docstrings** are NumPy-style. Keep that consistent if you add new ones.
- **Don't write throwaway planning docs.** Work from this file and the code.

## Test coverage

The suite covers `cubed` (signature, conversion, and denominator placement),
loading a valid custom unit file with a non-default dimension count and a
compound-expression definition, the `_parse_physical_quantity` happy path
(simple, decimal, no-space, compound units), parametrized round-trip
conversions across length / mass / volume / velocity / force, and
case-sensitive unit lookup. Coverage sits at 99 %; the two missing lines are
the `if __name__ == '__main__'` guard in `convert.py` and a
defense-in-depth `ValueError` in `_parse_unit_file` that the outer regex
already prevents from firing.

Conventions for new tests:

- Prefer `pytest.approx` over exact float equality for any conversion that
goes through more than one multiplication.
- New custom-unit-file fixtures live in `unit_parser/test_files/`. Use the
`get_cwd()` helper for paths.
- Reach into `_`-prefixed methods on `UnitParser` only from tests; that's the
documented escape hatch for white-box checks (signature exponents, dimension
count, etc.).
4 changes: 0 additions & 4 deletions MANIFEST.in

This file was deleted.

28 changes: 10 additions & 18 deletions README.md
@@ -12,16 +12,8 @@
<tr>
<td>Build Status</td>
<td>
<a href="https://travis-ci.org/rwilson4/unit_parser">
<img src="https://travis-ci.org/rwilson4/unit_parser.svg?branch=master&label=Travis%20CI" alt="travis build status" />
</a>
</td>
</tr>
<tr>
<td>Code Coverage</td>
<td>
<a href="https://codecov.io/gh/rwilson4/unit_parser">
<img src="https://codecov.io/gh/rwilson4/unit_parser/branch/master/graph/badge.svg" />
<a href="https://github.com/rwilson4/unit_parser/actions/workflows/ci.yml">
<img src="https://github.com/rwilson4/unit_parser/actions/workflows/ci.yml/badge.svg" alt="CI status" />
</a>
</td>
</tr>
@@ -42,8 +34,8 @@ operations.
The parsing function does double duty as a method for converting
between units and is thus called "convert".
```sh
>>> from unit_parser import unit_parser
>>> up = unit_parser()
>>> from unit_parser import UnitParser
>>> up = UnitParser()
>>> up.convert("3 gallons", "liters")
11.356235352
```
@@ -56,10 +48,10 @@ works as well:
11.356235352
```
Note the unit parser must be initialized before being used by calling
the unit_parser() function without any arguments. That uses the
the `UnitParser()` constructor without any arguments. That uses the
built-in unit specification file to define the units recognized by
this library. If a unit is not supported, you can create your own unit
specification file and provide the file name to this function.
specification file and pass its path to the constructor.

The next thing we see is that physical quantities and units are
represented by strings. I find this to be the most intuitive way of
@@ -77,8 +69,8 @@ gallons". The code that is parsing this input might call:

```sh
>>> import json
>>> from unit_parser import unit_parser
>>> up = unit_parser()
>>> from unit_parser import UnitParser
>>> up = UnitParser()
>>> config = json.load(open('example.json', 'r'))
>>> water_volume = config['water_volume']
>>> water_volume_liters = up.convert(water_volume, "liters")
@@ -123,8 +115,8 @@ takes three arguments: two physical quantities, and the desired units
of the answer.

```sh
>>> from unit_parser import unit_parser
>>> up = unit_parser()
>>> from unit_parser import UnitParser
>>> up = UnitParser()
>>> up.add("5 meters", "2 feet", "yards")
6.13473315836
>>> up.subtract("5 meters", "2 feet", "yards")