Description
Discussed in #236
Originally posted by jakelorocco November 20, 2025
Proposal
If we define Components as a generic, we can use that typing information to encode the expected parsed_repr type of the ModelOutputThunk returned when generating from that component. For example:
class IntComponent(Component[int]):
    ...

ic = IntComponent()
mot = generate(ic, ...)  # mot is typed as a ModelOutputThunk[int]

Each component will then be required to specify a parsing function that produces the return type. This parsing function will be called automatically after a generation request completes. We can include helper functions to get this value by default (i.e., instead of mot.avalue(), we could define mot.aparsed_value(), which returns the parsed / typed value).
This parsing function will also be responsible for handling any errors that arise during parsing. Any errors it raises will be propagated by Mellea.
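To make the error behaviour concrete, here is a minimal sketch. It assumes a stand-in `Component` class and a hypothetical `parse_output` helper representing the post-generation parsing step; neither is Mellea's actual API. The point is only that nothing catches the exception between `parse` and the caller:

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Component(Generic[T]):
    """Stand-in for the proposed generic Component (not Mellea's real class)."""

    def parse(self, text: str) -> T:
        raise NotImplementedError


class IntComponent(Component[int]):
    def parse(self, text: str) -> int:
        # int() raises ValueError on non-numeric text; it is not caught here.
        return int(text.strip())


def parse_output(c: Component[T], raw_text: str) -> T:
    """Hypothetical helper standing in for the automatic parsing step."""
    # No try/except: any error raised by parse() reaches the caller unchanged.
    return c.parse(raw_text)


parsed = parse_output(IntComponent(), " 42 ")

try:
    parse_output(IntComponent(), "forty-two")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Under this sketch, a component that wants to recover from bad output would do so inside its own `parse`; anything it lets escape surfaces to whoever requested the generation.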
As a part of this, we should also consider changing the naming of these fields within ModelOutputThunks so that it's clear which value is the parsed/typed value and which is the underlying, raw string.
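One possible naming scheme, sketched under assumptions: the field names `raw_value` and `parsed_value` are illustrative, and `avalue()` is assumed to keep returning the raw string while the proposed `aparsed_value()` returns the typed result. None of this is confirmed as the final design:

```python
import asyncio
from typing import Generic, Optional, TypeVar

M = TypeVar("M")


class ModelOutputThunk(Generic[M]):
    """Illustrative stand-in; field and method names are assumptions."""

    def __init__(self, raw_value: str, parsed_value: Optional[M] = None) -> None:
        self.raw_value = raw_value        # underlying string from the model
        self.parsed_value = parsed_value  # typed result of the component's parse()

    async def avalue(self) -> str:
        # Assumed here to return the raw string.
        return self.raw_value

    async def aparsed_value(self) -> M:
        # Fail loudly rather than hand back None in place of a typed value.
        if self.parsed_value is None:
            raise RuntimeError("output has not been parsed yet")
        return self.parsed_value


async def main() -> tuple[str, int]:
    mot: ModelOutputThunk[int] = ModelOutputThunk("42", parsed_value=42)
    return await mot.avalue(), await mot.aparsed_value()


raw, parsed = asyncio.run(main())
```

Keeping the two accessors separate means a reader can tell from the name alone whether they are getting the model's string or the parsed value.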
Here is a simplified implementation of what this might look like in Mellea with the inferred types as comments:
from typing import TypeVar, Generic  # For Python < 3.12
import typing_extensions

T = TypeVar("T")
S = typing_extensions.TypeVar("S", default=str)
# S = TypeVar("S", default=str)  # Need the typing_extensions version for default values.

# Syntax needed for < 3.12
class Component(Generic[S]):
    def parse(self, text: str) -> S:
        ...

class UndefinedComponent(Component):  # ie we don't populate the Generic[S]
    def parse(self, text: str) -> str:
        return text

class Grade(Component[float]):
    def parse(self, text: str) -> float:
        return 99.5

class LowGrade(Grade):
    def parse(self, text: str) -> float:
        return 85.5

class Essay(Component[Grade]):
    def parse(self, text: str) -> Grade:
        return Grade()

class BadEssay(Essay):
    def parse(self, text: str) -> Grade:
        return LowGrade()

class Coursework(Component[list[Essay]]):
    def parse(self, text: str) -> list[Essay]:
        return [
            Essay(),
            BadEssay()
        ]

def gen(c: Component[T]) -> T:
    return c.parse("Hello.")

# All this works for parsing.
coursework = Coursework()
essays = gen(coursework)  # typed as list[Essay]
for essay in essays:
    grade = gen(essay)  # typed as Grade

bad_essay = BadEssay()
low_grade = gen(bad_essay)  # typed as Grade; cannot easily subclass a component subclass and change its type
out = gen(UndefinedComponent())  # typed correctly as str

# Works for ModelOutputThunks as well.
M = TypeVar("M")

class ModelOutputThunk(Generic[M]):
    def __init__(self, raw_value: str) -> None:
        self.raw_value = raw_value
        self.value: M = None

    def parsed_repr(self) -> M:
        return self.value
        # return 1.0

mot_out = ModelOutputThunk[float]("text").parsed_repr()  # typed as float

# Putting it all together.
def generate(c: Component[T]) -> ModelOutputThunk[T]:
    mot = ModelOutputThunk("hello")  # correctly typed as ModelOutputThunk[Unknown]
    mot.value = c.parse("hello")  # TODO: I think we can parse this and init our ModelOutputThunk with it...
    return mot  # typed as ModelOutputThunk[T@generate]

mot = generate(bad_essay)  # typed as ModelOutputThunk[Grade]
grade = mot.value  # typed as Grade
mot_with_default_component = generate(UndefinedComponent())  # typed as ModelOutputThunk[str]
string_val = mot_with_default_component.value  # typed as str

Potential Issues:
- Does not allow for changing the return type when subclassing a Component subclass. See the low_grade = gen(bad_essay) line in the example above. In other words, subclasses are constrained to their parent class' return type.
- This method will not allow us to use the format parameter as a return type. Instead, a generation that utilizes a generic component and a structured output format directive will still show its return type as the default str. To get a typed output, you will have to define a new component that parses that structured output.
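The workaround in the second point could look roughly like the sketch below. It assumes the structured output arrives as a JSON string; `GradeReport` and `GradeReportComponent` are hypothetical names invented for illustration, and `Component` and `gen` are simplified stand-ins for the definitions in the example above:

```python
import json
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


class Component(Generic[T]):
    """Stand-in for the proposed generic Component."""

    def parse(self, text: str) -> T:
        raise NotImplementedError


@dataclass
class GradeReport:  # hypothetical structured-output schema
    student: str
    score: float


class GradeReportComponent(Component[GradeReport]):
    def parse(self, text: str) -> GradeReport:
        # json.loads raises on malformed JSON and the lookups raise KeyError
        # on missing fields; both are left to propagate to the caller.
        data = json.loads(text)
        return GradeReport(student=str(data["student"]), score=float(data["score"]))


def gen(c: Component[T], raw_text: str) -> T:
    # raw_text stands in for the model's structured output.
    return c.parse(raw_text)


report = gen(GradeReportComponent(), '{"student": "Ada", "score": 91.5}')
```

Here a type checker should infer `report` as `GradeReport` from the component's type parameter, not from any format directive, which is the trade-off the second point describes.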