
Schema for variable-length registers #220

@glopesdev

Description

  • Proposed
  • Prototype: Not Started
  • Implementation: Not Started
  • Specification: Not Started

Summary

This proposal concretizes the schema-level work needed for variable-length registers in device.yml. It combines the variable-length-registers discussion in #116 with the constraints introduced by the ExtendedLength proposal in #218. It specifies how a register declares a variable length and a maximum bound, and defines how the wire-format encoding is derived from the schema rather than declared in it.

Motivation

Two threads converge here.

The first thread, #116, has been open since 2025-02-20 and proposes lifting the fixed-length restriction on registers in device.yml. Several candidate registers benefit directly: R_TAG, R_SERIAL_NUMBER, R_DEVICE_NAME, and any future register storing strings, error messages, semantic version strings, or device-self-description payloads. The thread has substantial discussion but no merged spec text.

The second thread, #218, introduces the ExtendedLength flag for messages that do not fit within the 254-byte capacity of the regular-format Length field. For a controller to know, per register, whether to use regular or extended framing, the schema must declare each register's maximum payload size. Without a schema-level bound, controllers would have to probe by attempting writes and recovering from errors.

These two needs are best addressed together. Variable-length register declarations subsume the schema work needed for ExtendedLength-using registers, and the codegen pipelines that produce parsers from device.yml are touched once rather than twice.

Detailed Design

Schema additions to registers.json

A new optional attribute on the register definition:

  • maxLength (integer, minimum 1): the maximum number of payload elements the register can hold. A register declared with maxLength is variable-length and may carry any number of elements from 0 to maxLength, inclusive. The element type is given by the existing type attribute. maxLength is mutually exclusive with the existing length attribute.

The existing length attribute is unchanged: registers declared with length: <integer> continue to be fixed-length and carry exactly that many elements. Registers with neither length nor maxLength continue to default to a single element, as today.
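The length-mode rules above can be sketched as a small classifier. This is a minimal sketch, assuming register definitions have already been loaded from device.yml into plain dictionaries; the function name `length_mode` is hypothetical, and the requirement that `length` also be an integer of at least 1 is an assumption, not something this proposal states. In the schema itself, mutual exclusivity would more likely be expressed declaratively, for example with `"not": {"required": ["length", "maxLength"]}` in registers.json.

```python
from typing import Any, Mapping


def _is_positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def length_mode(register: Mapping[str, Any]) -> tuple[str, int]:
    """Classify a register definition (hypothetical helper).

    Returns ("fixed", n) for registers carrying exactly n elements, or
    ("variable", max_n) for registers carrying 0..max_n elements.
    Raises ValueError when the declaration breaks the proposed rules.
    """
    has_length = "length" in register
    has_max = "maxLength" in register
    if has_length and has_max:
        raise ValueError("length and maxLength are mutually exclusive")
    if has_max:
        if not _is_positive_int(register["maxLength"]):
            raise ValueError("maxLength must be an integer >= 1")
        return ("variable", register["maxLength"])
    if has_length:
        # Assumption: existing fixed lengths are also positive integers.
        if not _is_positive_int(register["length"]):
            raise ValueError("length must be an integer >= 1")
        return ("fixed", register["length"])
    # Neither attribute: a single-element fixed-length register, as today.
    return ("fixed", 1)
```

Because an omitted maxLength never yields a variable-length result, this sketch also enforces the recommendation, listed under Unresolved Questions, that every variable-length register declare an explicit bound.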

Example device.yml declarations:

# Existing fixed-length register (no change)
DeviceName:
  address: 12
  type: U8
  length: 25
  access: Read, Write

# New variable-length register
Tag:
  address: 17
  type: U8
  maxLength: 64
  access: Read

# New variable-length register requiring ExtendedLength framing
FirmwareImage:
  address: 200
  type: U8
  maxLength: 1048576
  access: Write

# Fixed-length register that requires ExtendedLength framing
WaveformPreset:
  address: 100
  type: U16
  length: 4096
  access: Read, Write

Encoding strategy is derived, not declared

A register's wire-format encoding is determined from its declared payload size. For fixed-length registers, the size is length × sizeof(type) bytes. For variable-length registers, the maximum size is maxLength × sizeof(type) bytes.

If the resulting size, plus header overhead and the optional timestamp, fits within the regular-format Length field's 254-byte capacity, the register uses regular framing. Otherwise, the register uses the ExtendedLength framing specified in #218. The rule applies symmetrically: declaring a fixed-length register with a large length is valid and results in ExtendedLength framing on every message to and from that register, just as it would for a variable-length register whose maxLength × sizeof(type) exceeds the regular-format ceiling.
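The derivation rule can be sketched as follows. This is a hedged illustration, not the codegen implementation. It assumes the regular Harp message layout (MessageType, Length, Address, Port, PayloadType, optional 6-byte timestamp, Payload, Checksum), with Length counting every byte after itself. It also assumes that the type names in `TYPE_SIZES` match those used in device.yml. `max_payload_bytes` and `framing` are hypothetical names. The check includes the timestamp by default because framing is fixed at code-generation time, so the conservative choice is to make the largest message that register can produce fit.

```python
# Element sizes in bytes (assumed to match the device.yml type names).
TYPE_SIZES = {
    "U8": 1, "S8": 1,
    "U16": 2, "S16": 2,
    "U32": 4, "S32": 4,
    "U64": 8, "S64": 8,
    "Float": 4,
}

# Assumed regular-format bytes counted by Length, excluding the payload:
# Address (1) + Port (1) + PayloadType (1) + Checksum (1).
BASE_OVERHEAD = 4
TIMESTAMP_SIZE = 6          # seconds (4 bytes) + microseconds (2 bytes)
REGULAR_LENGTH_MAX = 254    # regular-format Length capacity, per this proposal


def max_payload_bytes(type_name: str, elements: int) -> int:
    """elements is `length` for fixed registers, `maxLength` for variable ones."""
    return elements * TYPE_SIZES[type_name]


def framing(type_name: str, elements: int, timestamped: bool = True) -> str:
    """Return "regular" if the largest message fits, otherwise "extended"."""
    length = BASE_OVERHEAD + max_payload_bytes(type_name, elements)
    if timestamped:
        length += TIMESTAMP_SIZE
    return "regular" if length <= REGULAR_LENGTH_MAX else "extended"
```

Under these assumptions, Tag (U8, maxLength 64) and DeviceName (U8, length 25) use regular framing. FirmwareImage (U8, maxLength 1048576) and WaveformPreset (U16, length 4096, i.e. 8192 payload bytes) use ExtendedLength framing. The rule does not depend on whether a register is fixed- or variable-length.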

The schema does not declare which framing to use; codegen and validation tools derive it at code-generation time. This keeps the schema to three axes (type, length mode, and maximum size) and avoids redundant declarations.

Backwards compatibility

The change is additive. Every existing device.yml file that uses length: <integer> continues to validate and produce identical generated code. Variable-length support is opt-in per-register through the new maxLength attribute.

Drawbacks

  • One additional schema attribute. Adds a small amount of validation surface, mitigated by the mutual-exclusivity constraint between length and maxLength being expressible directly in JSON Schema.
  • Variable-length payloads complicate bulk-data parsing and random access into logged binary streams, as flagged in the discussion at #116. Locating a message in a sequence of variable-length messages requires parsing headers sequentially to find boundaries, whereas in a fixed-length sequence the offset can be computed directly. This is a downstream tooling concern rather than a schema-level one; analysis tools that need indexed access can demux per register or switch to a database-backed store.
  • Downstream codegen and tooling pipelines that consume device.yml will need updates to handle variable-length registers, including but not limited to harp-tech/generators, bonsai-rx/harp, harp-tech/harp-python, and harp-tech/toolkit. Each pipeline already handles length for fixed-length registers, so the variable-length path is a small extension rather than a rewrite. The scope and timing of those updates are outside this proposal and left to the respective maintainers.
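The random-access drawback can be made concrete with a short sketch contrasting the two cases. This is illustrative only. It assumes regular framing, where byte 0 is MessageType and byte 1 is Length, and Length counts every byte after itself. ExtendedLength framing from #218 is not handled. `fixed_offset` and `scan_offsets` are hypothetical helpers, not part of any existing Harp tool.

```python
def fixed_offset(index: int, message_size: int) -> int:
    # Fixed-length stream: every message has the same size, so the
    # k-th message starts at a directly computable offset.
    return index * message_size


def scan_offsets(stream: bytes) -> list[int]:
    """Locate message boundaries in a stream of regular-format messages.

    Each message occupies 2 + Length bytes (MessageType + Length + the
    bytes Length counts), so boundaries can only be found by walking
    the headers in order.
    """
    offsets = []
    pos = 0
    while pos < len(stream):
        if pos + 2 > len(stream):
            raise ValueError(f"truncated header at offset {pos}")
        total = 2 + stream[pos + 1]
        if pos + total > len(stream):
            raise ValueError(f"truncated message at offset {pos}")
        offsets.append(pos)
        pos += total
    return offsets
```

An offset list like the one `scan_offsets` returns is the kind of data a separate per-register index file would need to persist to restore random access.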

Alternatives

Use length: variable sentinel

This is the form originally proposed at #116. It allows expressing variable length without introducing a new attribute and is concise in device.yml. Rejected for this proposal because expressing the bound (maximum length) still requires a second attribute, so the saving is illusory. It also requires loosening the JSON Schema type for length from a clean integer constraint to a union with a string sentinel, which complicates validation.

Use length: null sentinel (bruno-f-cruz suggestion)

Similar to the sentinel approach. null is slightly less self-documenting than variable. Same trade-offs as above; rejected for the same reason.

Use object-form length: { max: 256 }

Replaces the integer with an object when variable. Allows declaring length-mode and maxLength in one attribute. Rejected because the union of integer and object complicates JSON Schema validation, and existing fixed-length declarations would not require any change anyway, so the union exists only to accommodate the variable case.

Bundle bruno-f-cruz's payloadSpec variadic extension

The discussion at #116 includes a proposal for payloadSpec entries with their own variable-length elements, enabling typed heterogeneous variable payloads. This is a strict superset of the current proposal. Rejected for this issue because it expands the schema design surface significantly and was raised as an open exploration rather than a concrete commitment. The current proposal is forward-compatible: a future extension can add variable-length payloadSpec members on top of register-level maxLength.

Unresolved Questions

  • Attribute naming. maxLength follows JSON Schema's own vocabulary (maxLength is a standard validation keyword for strings and arrays) but might collide with that meaning in interpreter tooling. Alternative names: lengthMax, lengthCap, lengthBound. Worth confirming at SRM.
  • Default upper bound when maxLength is omitted but the register is somehow declared variable. Under the strict either-length-or-maxLength rule, this case never arises. If a register may be variable without an explicit bound, a default is needed (presumably the 4 GB ceiling from #218). Recommendation: require maxLength for every variable-length register and disallow unbounded ones.
  • Logging-format compatibility for variable-length messages. The existing Harp logging pattern is agnostic to register semantics: messages are demuxed by Address, stored flat in a per-register binary file, and bulk-loaded into an N-dimensional matrix on read. The bulk-load step relies on every payload in a per-register stream sharing the same shape. Variable-length registers break this assumption: messages in a variable-length stream must be parsed sequentially to locate boundaries, and the resulting stream is not shape-aligned, so the existing bulk-load workflow that downstream Harp tooling depends on does not apply unmodified. The decision point for SRM is whether this proposal should commit to a logging strategy for variable-length registers (e.g., a per-register stream format that retains random access, a separate per-register index file, or some other approach) or defer the choice to downstream tooling without prescription. Comments at #116 raised earlier versions of this concern.
  • Payload spec extension. Whether to follow up with bruno-f-cruz's payloadSpec variadic proposal as a separate issue once this one lands.

Related Issues

Design Meetings

To be populated as this proposal progresses through SRM.
