
Add support for Component specified parsing and return type typing #291

@nrfulton

Discussed in #236

Originally posted by jakelorocco November 20, 2025

Proposal

If we define Components as a generic, we can use that typing information to encode the expected parsed_repr type of the ModelOutputThunk returned when generating from that component. For example:

class IntComponent(Component[int]):
    ...

ic = IntComponent()
mot = generate(ic, ...) # mot is typed as a ModelOutputThunk[int]

Each component will then be required to specify a parsing function that produces the return type. This parsing function will be called automatically once a generation request completes. We can include helper functions to get this value by default (i.e., instead of mot.avalue(), we could define mot.aparsed_value(), which returns the parsed / typed value).
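
A minimal sketch of how such a helper might look, assuming a thunk that stores the raw string together with the component's parse function. SketchThunk, its constructor arguments, and the bodies of avalue / aparsed_value are illustrative assumptions here, not Mellea's actual implementation:

```python
import asyncio
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SketchThunk(Generic[T]):
    """Hypothetical stand-in for ModelOutputThunk; not Mellea's real class."""

    def __init__(self, raw_value: str, parse: Callable[[str], T]) -> None:
        self._raw_value = raw_value
        self._parse = parse

    async def avalue(self) -> str:
        # A real thunk would await the backend here; this sketch returns at once.
        return self._raw_value

    async def aparsed_value(self) -> T:
        # Apply the component-supplied parse function to the raw string.
        return self._parse(await self.avalue())


async def main() -> int:
    thunk = SketchThunk("42", int)  # inferred as SketchThunk[int]
    return await thunk.aparsed_value()


result = asyncio.run(main())
```

Keeping avalue() as the raw accessor and adding aparsed_value() alongside it would let existing callers keep working while new callers opt into the typed value.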

This parsing function will also be responsible for handling any errors that arise during parsing. Mellea will propagate any error that the parsing function raises.
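
As a rough illustration of that contract, the sketch below has a component wrap its own parsing failure in a descriptive error, while the calling code makes no attempt to catch it. The generate_and_parse helper and its raw_output argument are hypothetical stand-ins for a real generation call:

```python
from typing import Generic, TypeVar

S = TypeVar("S")


class Component(Generic[S]):
    def parse(self, text: str) -> S:
        raise NotImplementedError


class IntComponent(Component[int]):
    def parse(self, text: str) -> int:
        try:
            return int(text.strip())
        except ValueError as err:
            # Assumed convention: re-raise with the offending text for context.
            raise ValueError(f"expected an integer, got {text!r}") from err


def generate_and_parse(c: Component[S], raw_output: str) -> S:
    # raw_output plays the role of the model's text in this sketch.
    # No try/except here, so parse errors reach the caller unchanged.
    return c.parse(raw_output)


ok = generate_and_parse(IntComponent(), " 7 ")

try:
    generate_and_parse(IntComponent(), "seven")
    error_message = None
except ValueError as err:
    error_message = str(err)
```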

As a part of this, we should also consider changing the naming of these fields within ModelOutputThunks so that it's clear which value is the parsed/typed value and which is the underlying, raw string.

Here is a simplified implementation of what this might look like in Mellea with the inferred types as comments:

from typing import TypeVar, Generic # For Python < 3.12
import typing_extensions

T = TypeVar("T")
S = typing_extensions.TypeVar("S", default=str)
# S = TypeVar("S", default=str) # Need the typing_extensions version for default values.


# Syntax needed for < 3.12
class Component(Generic[S]):
    def parse(self, text: str) -> S:
        ...

class UndefinedComponent(Component): # i.e., we don't populate the Generic[S]
    def parse(self, text: str) -> str:
        return text

class Grade(Component[float]):
    def parse(self, text: str) -> float:
        return 99.5

class LowGrade(Grade):
    def parse(self, text: str) -> float:
        return 85.5

class Essay(Component[Grade]):
    def parse(self, text: str) -> Grade:
        return Grade()

class BadEssay(Essay):
    def parse(self, text: str) -> Grade:
        return LowGrade()

class Coursework(Component[list[Essay]]):
    def parse(self, text: str) -> list[Essay]:
        return [
            Essay(),
            BadEssay()
        ]

def gen(c: Component[T]) -> T:
    return c.parse("Hello.")

# All this works for parsing.
coursework = Coursework()
essays = gen(coursework) # typed as list[Essay]

for essay in essays:
    grade = gen(essay) # typed as Grade

bad_essay = BadEssay()
low_grade = gen(bad_essay) # typed as Grade; cannot easily subclass a component subclass and change its type

out = gen(UndefinedComponent()) # typed correctly as str

# Works for ModelOutputThunks as well.
M = TypeVar("M")
class ModelOutputThunk(Generic[M]):
    def __init__(self, raw_value: str) -> None:
        self.raw_value = raw_value
        self.value: M = None

    def parsed_repr(self) -> M:
        return self.value
        # return 1.0

mot_out = ModelOutputThunk[float]("text").parsed_repr() # typed as float

# Putting it all together.
def generate(c: Component[T]) -> ModelOutputThunk[T]:
    mot = ModelOutputThunk("hello") # correctly typed as ModelOutputThunk[Unknown]
    mot.value = c.parse("hello") # TODO: I think we can parse this and init our ModelOutputThunk with it...
    return mot # typed as ModelOutputThunk[T@generate]

mot = generate(bad_essay) # typed as ModelOutputThunk[Grade]
grade = mot.value # typed as Grade

mot_with_default_component = generate(UndefinedComponent()) # typed as ModelOutputThunk[str]
string_val = mot_with_default_component.value # typed as str

Potential Issues:

  1. Does not allow for changing the return type when subclassing a Component subclass. See the low_grade = gen(bad_essay) line in the example above. In other words, subclasses are constrained to their parent class' return type.
  2. This method will not allow us to use the format parameter as a return type. Instead, a generation that utilizes a generic component and a structured output format directive will still show its return type as the default str. To get a typed output, you will have to define a new component that parses that structured output.
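
The workaround described in the second point could look roughly like the sketch below: a dedicated component whose parse method turns JSON text into a typed object. GradeReport, GradeReportComponent, and the two-argument gen helper are hypothetical names introduced for illustration, and the JSON shape is an assumption:

```python
import json
from dataclasses import dataclass
from typing import Generic, TypeVar

S = TypeVar("S")


class Component(Generic[S]):
    def parse(self, text: str) -> S:
        raise NotImplementedError


@dataclass
class GradeReport:
    student: str
    score: float


class GradeReportComponent(Component[GradeReport]):
    def parse(self, text: str) -> GradeReport:
        # Assumes the model was asked to emit a JSON object with these two keys.
        data = json.loads(text)
        return GradeReport(student=str(data["student"]), score=float(data["score"]))


def gen(c: Component[S], raw_output: str) -> S:
    # raw_output stands in for the structured text a real generation would return.
    return c.parse(raw_output)


report = gen(GradeReportComponent(), '{"student": "Ada", "score": 91.5}')  # typed as GradeReport
```

The cost is one small component class per structured output shape, but the return type is then visible to the type checker instead of falling back to str.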
