
feat: implement templating system based on tstrings#409

Open
NickCrews wants to merge 29 commits into duckdb:main from NickCrews:t-strings

Conversation


@NickCrews NickCrews commented Mar 29, 2026

This PR implements a SQL templating system for duckdb-python based on Python t-strings. Discussion started in #370

It is still a WIP, but it is now good enough that I think the shape of the API is 90% there. I wanted to open up this PR so @evertlammerts can take a look at it and give some high level comments. Once we iron the large scale things out then I can fix those up and do more polishing before I ask you for a more detailed review.

Example Usage

Simple bound parameters

This is one of the most basic use cases, but I imagine also would be one of the more popular:

untrusted_input_from_api = "DROP TABLE USERS"
conn.sql(t"SELECT * FROM users WHERE user_name ILIKE '%{untrusted_input_from_api}%'")
# This results in actually executing
# "SELECT * FROM users WHERE user_name ILIKE '%$p0_untrusted_input_from_api%'"
# with the params
# {"p0_untrusted_input_from_api": "DROP TABLE USERS"}
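The auto-generated ID above follows a `p{index}_{expression}` pattern. A minimal, self-contained sketch of such a naming scheme (the helper name and sanitization rules are my assumptions, not the PR's actual internals):

```python
import re

def auto_param_name(index: int, source_expr: str) -> str:
    """Hypothetical sketch: derive a named-param ID from the interpolation's
    source text (e.g. "untrusted_input_from_api"), sanitized to an identifier
    and prefixed with a position counter so repeated expressions stay unique."""
    ident = re.sub(r"\W+", "_", source_expr).strip("_") or "param"
    return f"p{index}_{ident}"

print(auto_param_name(0, "untrusted_input_from_api"))
# p0_untrusted_input_from_api
```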

More complex bound params

By default, we generate IDs for the anonymous params.
If you want more control over the naming, you can explicitly create a Param object,
which is just a simple dataclass:

@dataclasses.dataclass(frozen=True, slots=True)
class Param:
    value: object
    name: str | None = None
    exact: bool = False

You can also create this with the duckdb.param() factory function:

conn.sql(t"SELECT * FROM users WHERE user_name ILIKE '%{duckdb.param(untrusted_input_from_api, 'my_param', exact=True)}%'")
# "SELECT * FROM users WHERE user_name ILIKE '%$my_param%'"
# {"my_param": "DROP TABLE USERS"}

Builtin duckdb objects are interpolated correctly

Types from the duckdb package, such as DuckDBPyRelation, datatype constants, expressions, etc., are converted to SQL and params correctly. This makes it easy to build up analysis chains where individual lines can be commented out or re-ordered, a pattern common in exploratory analysis:

t = duckdb.sql("SELECT * FROM read_parquet('data.parquet')")
t = duckdb.sql(t"SELECT a::{duckdb.list_type(int)}, {duckdb.ColumnExpression("b", "c")} FROM {t}")
t = duckdb.sql(t"SELECT * FROM {t} WHERE age >= {duckdb.param(config.min_age, "min_age", exact=True)};")

A template is just a sequence of interleaved str's and param-ish things

The t-string literal syntax t"foo{bar}" is just syntactic sugar, available in Python 3.14+.
For older versions of Python, for programmatic construction, or for any other reason, you can create a SqlTemplate object with the duckdb.template() function:

duckdb.sql(
   duckdb.template("SELECT * FROM ", t, " WHERE age >= ", duckdb.param(config.min_age, "min_age", exact=True))
)

Build Higher-Order components using __duckdb_template__()

We define a Protocol that makes an implementer compatible with duckdb.template():

@runtime_checkable
class SupportsDuckdbTemplate(Protocol):
    """Something that can be converted into a SqlTemplate by implementing the __duckdb_template__ method."""

    def __duckdb_template__(
        self, /, **future_kwargs
    ) -> (
        str
        | IntoInterpolation # A Protocol that looks like string.templatelib.Interpolation
        | Param
        | SupportsDuckdbTemplate
        | object # will be treated as a Param
        | Iterable[str | IntoInterpolation | Param | SupportsDuckdbTemplate | object]
    ):
        """Convert self into something that template() understands."""

Here is an example where a user defines their own higher-order components:

conn = duckdb.connect()

class Users:
    def __init__(self, columns: list[str], active: bool):
        self.columns = columns
        self.active = active

    def __duckdb_template__(self, **kwargs):
        active_str = "active" if self.active else "not active"
        # Note the !s format which means to treat the given str not as a param, but as raw SQL
        return t"SELECT {duckdb.ColumnExpression(*self.columns)} FROM users WHERE {active_str!s}"
        # Or, for python <3.14:
        # return "SELECT ", duckdb.ColumnExpression(*self.columns), " FROM users WHERE " + active_str

@dataclasses.dataclass
class Age:
    age: int

    def __post_init__(self):
        if self.age < 0:
            raise ValueError()

    def __duckdb_template__(self, **kwargs):
        return duckdb.param(self.age, "age")

@dataclasses.dataclass
class UserFilters:
    min_age: Age | None = None
    max_age: Age | None = None
    name_ilike: str | None = None

    def __duckdb_template__(self, **kwargs):
        parts = ["true"]
        if self.min_age is not None:
            parts.extend([" and age >= ", self.min_age])
        if self.max_age is not None:
            parts.extend([" and age <= ", self.max_age])
        if self.name_ilike is not None:
            parts.extend([" and name ilike '%", self.name_ilike, "%'"])
        return parts

inactive_users = conn.sql(Users(['name', 'age'], False))
filters = UserFilters(min_age=Age(18))
duckdb.sql(t"SELECT * FROM ({inactive_users}) WHERE {filters}")
# "SELECT * FROM (SELECT name, age FROM users WHERE not active) WHERE age >= $p1_age"
# using the params:
# {"p1_age": 18}

There is a well-defined lifecycle of IntoSqlTemplate -> SqlTemplate -> ResolvedSqlTemplate -> CompiledSql

Most users will go straight from IntoSqlTemplate (anything that duckdb.template() understands) to execution, but the intermediate processing is well defined and part of the public API: the user can step into the middle of the process and view/modify the intermediate data structures.

  • An IntoSqlTemplate isn't actually a concrete type, but the typing union of all the things that can be turned into a SqlTemplate, i.e. all the things accepted by the duckdb.template() function.

  • A SqlTemplate is what you get back from the duckdb.template() function. It is quite similar to the actual string.templatelib.Template built in to python 3.14. It contains a sequence of str's and Interpolation objects, just like string.templatelib.Template. This is potentially nested, with the Interpolation objects containing other SqlTemplates, Params, str, or anything else. It contains an additional .resolve() method that recursively resolves all the inner Interpolations into str's and Params, resulting in a ResolvedSqlTemplate.

  • A ResolvedSqlTemplate is semantically a Sequence[str | Param]. The actual final param IDs haven't been resolved yet, but any nesting has been flattened. You can call the ResolvedSqlTemplate.compile() method to combine adjacent str's and to resolve the param IDs to their final form, resulting in a CompiledSql object.

  • The final object is a CompiledSql object, which is just a simple dataclass like CompiledSql(sql: str, params: dict[str, object]). This gets used as conn.execute(compiled.sql, compiled.params).

Behavior notes:

  • Nested templates are resolved recursively.
  • I made the __duckdb_template__() protocol accept unused keyword arguments for future-proofing. If we later decide we need to pass config or other metadata during the compilation step, we can start passing it in without breaking existing handlers.
  • Friendly param naming, either with exact param names, autogenerating names, or autogenerating with a suffix.
  • Duplicate param names are rejected.
  • Params are always passed as named params, never positional. I chose this because the generated SQL/params are friendlier to debug. I don't see a downside, but let me know if I'm missing one.
  • Compiled params are merged with user-provided params:
    • dict + dict merge supported
    • duplicate key collisions raise
    • mixing compiled named params with non-empty positional params raises
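The merge rules above might look roughly like this (hypothetical helper; the real logic lives in the query entry points):

```python
def merge_params(compiled_params: dict, user_params) -> dict:
    """Sketch of the merge rules: dict + dict merges, duplicate keys raise,
    and mixing compiled named params with non-empty positional params raises."""
    if not user_params:
        return dict(compiled_params)
    if not isinstance(user_params, dict):  # positional params (list/tuple)
        if compiled_params:
            raise TypeError("cannot mix named compiled params with positional params")
        return user_params
    overlap = compiled_params.keys() & user_params.keys()
    if overlap:
        raise ValueError(f"duplicate param keys: {sorted(overlap)}")
    return {**compiled_params, **user_params}

print(merge_params({"a": 1}, {"b": 2}))  # {'a': 1, 'b': 2}
```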

Query API integration

The main SQL entry points (sql, query, from_query, execute, executemany) now accept template-ish inputs (SqlTemplate / CompiledSql) in addition to existing string/statement inputs. This behavior could use some careful thought, as this is one of the biggest ways we are painting ourselves into a corner for future API changes, or we cause footguns from unexpected behavior. Everyone already uses these APIs, and always will.

  • If object has .sql/.params, those are consumed directly.
  • If object has .compile(), it is compiled before statement parsing.

This allows passing compiled/template objects directly into execution/query paths without requiring manual .sql/.params unpacking by users.
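The dispatch described above can be sketched as a small adapter (hypothetical; the real entry points are wired through the compiled C++ layer):

```python
def coerce_to_sql_and_params(obj):
    """Sketch of input coercion for sql()/execute():
    - plain strings pass through with no params,
    - objects with .sql/.params (CompiledSql-like) are consumed directly,
    - objects with .compile() (SqlTemplate-like) are compiled first."""
    if isinstance(obj, str):
        return obj, {}
    if hasattr(obj, "sql") and hasattr(obj, "params"):
        return obj.sql, obj.params
    if hasattr(obj, "compile"):
        compiled = obj.compile()
        return compiled.sql, compiled.params
    raise TypeError(f"unsupported query input: {type(obj).__name__}")

class FakeCompiled:  # stand-in for a CompiledSql result
    sql = "SELECT 1"
    params = {}

print(coerce_to_sql_and_params(FakeCompiled()))  # ('SELECT 1', {})
```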

I'm not sure if we should be EVEN more coercive, and support accepting any of the things that template() accepts. Eg should we accept conn.sql(["SELECT * FROM users WHERE id = ", 123])?

Testing

Adds broad coverage across:

  • pure-Python template internals (construction, parsing, resolution, compilation, naming, protocol behavior, nesting, conversion semantics)
  • Python 3.14 t-string-specific behavior. This test module can only be parsed on Python 3.14+, so the harness is wired up to load it only on those versions.
  • end-to-end integration with connection/module SQL APIs. These rely on the compiled C++.
