Add inline SQL support with configurable sanitization#6
Conversation
- Engine/AsyncEngine/TransactionManagers now accept raw SQL strings
alongside registry keys; whitespace presence distinguishes the two
- Positional params supported: scalar, list, or tuple are coerced to
tuple before binding (dict remains named-param style)
- New SQLSanitizer dataclass (strip_comments, block_multiple_statements,
allowed_verbs) applied only to inline SQL; registry queries are trusted
- SQLSanitizationError added to exception hierarchy
- SQLite adapter: params or {} → params if params is not None else {}
so positional tuples are not coerced to empty dict
- 65 new unit tests covering all sanitizer code paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds inline SQL support to the row_query library, allowing users to pass raw SQL strings directly to engine/transaction methods alongside the existing registry-based approach. The implementation includes a configurable SQL sanitizer that provides defense-in-depth measures for inline SQL through comment stripping, statement blocking, and verb allow-listing. The sanitizer is only applied to inline SQL strings (detected by presence of whitespace), while registry queries remain trusted and unsanitized.
Changes:
- Added
SQLSanitizerclass with configurable sanitization options (comment stripping, multi-statement blocking, verb restrictions) - Extended all engine and transaction methods to accept inline SQL strings in addition to registry keys
- Added
is_raw_sql()function to distinguish between inline SQL and registry keys based on whitespace presence - Added
coerce_params()function to flexibly handle dict, tuple/list, and scalar parameter types - Added 65 comprehensive unit tests covering sanitizer functionality and edge cases
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit/test_sanitizer.py |
Comprehensive test suite (65 tests) covering SQL sanitizer, parameter coercion, and raw SQL detection |
row_query/core/sanitizer.py |
New SQL sanitizer with tokenization, comment stripping, statement validation, and verb checking |
row_query/core/params.py |
Added is_raw_sql() and coerce_params() helpers for inline SQL support |
row_query/core/exceptions.py |
Added SQLSanitizationError exception as subclass of ExecutionError |
row_query/core/engine.py |
Updated all query methods to support inline SQL via _resolve_sql() helper and flexible params |
row_query/core/transaction.py |
Updated all transaction methods to support inline SQL (mirrors engine changes) |
row_query/adapters/sqlite.py |
Changed params or {} to params if params is not None else {} for correctness with falsy dicts |
row_query/__init__.py |
Exported SQLSanitizer and SQLSanitizationError in public API |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def execute( | ||
| self, | ||
| query_name: str, | ||
| params: dict[str, Any] | None = None, | ||
| query: str, | ||
| params: Any = None, | ||
| ) -> int: | ||
| """Execute a write query within this transaction.""" | ||
| """Execute a write query within this transaction. | ||
|
|
||
| *query* may be a registry key or an inline SQL string. | ||
| *params* may be a dict, tuple/list, or scalar. | ||
| """ | ||
| self._check_active() | ||
| sql = self._registry.get(query_name) | ||
| sql, _label = _resolve_sql(query, self._registry, self._sanitizer) | ||
| sql = normalize_params(sql, self._paramstyle) | ||
| cursor = self._adapter.execute(self._connection, sql, params) | ||
| cursor = self._adapter.execute(self._connection, sql, coerce_params(params)) | ||
| return int(cursor.rowcount) | ||
|
|
||
| def fetch_one( | ||
| self, | ||
| query_name: str, | ||
| params: dict[str, Any] | None = None, | ||
| query: str, | ||
| params: Any = None, | ||
| ) -> dict[str, Any] | None: | ||
| """Fetch a single row within transaction context.""" | ||
| """Fetch a single row within transaction context. | ||
|
|
||
| *query* may be a registry key or an inline SQL string. | ||
| *params* may be a dict, tuple/list, or scalar. | ||
| """ | ||
| self._check_active() | ||
| sql = self._registry.get(query_name) | ||
| sql, _label = _resolve_sql(query, self._registry, self._sanitizer) | ||
| sql = normalize_params(sql, self._paramstyle) | ||
| cursor = self._adapter.execute(self._connection, sql, params) | ||
| cursor = self._adapter.execute(self._connection, sql, coerce_params(params)) | ||
| rows = _rows_to_dicts(cursor) | ||
| if not rows: | ||
| return None | ||
| return rows[0] | ||
|
|
||
| def fetch_all( | ||
| self, | ||
| query_name: str, | ||
| params: dict[str, Any] | None = None, | ||
| query: str, | ||
| params: Any = None, | ||
| ) -> list[dict[str, Any]]: | ||
| """Fetch all rows within transaction context.""" | ||
| """Fetch all rows within transaction context. | ||
|
|
||
| *query* may be a registry key or an inline SQL string. | ||
| *params* may be a dict, tuple/list, or scalar. | ||
| """ | ||
| self._check_active() | ||
| sql = self._registry.get(query_name) | ||
| sql, _label = _resolve_sql(query, self._registry, self._sanitizer) | ||
| sql = normalize_params(sql, self._paramstyle) | ||
| cursor = self._adapter.execute(self._connection, sql, params) | ||
| cursor = self._adapter.execute(self._connection, sql, coerce_params(params)) | ||
| return _rows_to_dicts(cursor) |
There was a problem hiding this comment.
The transaction methods (execute, fetch_one, fetch_all) don't wrap adapter.execute calls in try-except blocks to convert driver exceptions to ParameterBindingError, unlike the corresponding engine methods. This creates an inconsistency where driver exceptions can leak through transaction methods but are properly wrapped in engine methods. Additionally, the label returned from _resolve_sql is unused in transaction methods (prefixed with underscore), while it's used in engine methods for error messages. Consider adding the same error handling pattern used in the engine methods to maintain consistency and provide better error messages for inline SQL.
|
|
||
|
|
||
| def _resolve_sql( | ||
| query: str, | ||
| registry: SQLRegistry, | ||
| sanitizer: SQLSanitizer | None = None, | ||
| ) -> tuple[str, str]: | ||
| """Return ``(sql_text, label)`` for *query* (raw SQL or registry key).""" | ||
| if is_raw_sql(query): | ||
| sql = sanitizer.sanitize(query) if sanitizer is not None else query | ||
| return sql, "<inline>" | ||
| return registry.get(query), query | ||
|
|
||
|
|
There was a problem hiding this comment.
The _resolve_sql function is duplicated in both row_query/core/engine.py and row_query/core/transaction.py. This violates the DRY (Don't Repeat Yourself) principle and makes maintenance harder. Consider moving this function to a shared location such as row_query/core/params.py or creating a new utility module, since both files need to import and use the same logic.
| def _resolve_sql( | |
| query: str, | |
| registry: SQLRegistry, | |
| sanitizer: SQLSanitizer | None = None, | |
| ) -> tuple[str, str]: | |
| """Return ``(sql_text, label)`` for *query* (raw SQL or registry key).""" | |
| if is_raw_sql(query): | |
| sql = sanitizer.sanitize(query) if sanitizer is not None else query | |
| return sql, "<inline>" | |
| return registry.get(query), query | |
| from row_query.core.engine import _resolve_sql |
| def _tokenize(sql: str) -> list[tuple[str, str]]: | ||
| """Split *sql* into ``('string', …)`` and ``('code', …)`` tokens. | ||
|
|
||
| String literals (single-quoted, with ``''`` escapes) are preserved as-is. | ||
| Everything else is a ``'code'`` token. | ||
| """ | ||
| tokens: list[tuple[str, str]] = [] | ||
| i = 0 | ||
| n = len(sql) | ||
| last = 0 | ||
|
|
||
| while i < n: | ||
| if sql[i] == "'": | ||
| if i > last: | ||
| tokens.append(("code", sql[last:i])) | ||
| j = i + 1 | ||
| while j < n: | ||
| if sql[j] == "'": | ||
| j += 1 | ||
| if j >= n or sql[j] != "'": | ||
| break # end of literal | ||
| j += 1 # '' escape — continue | ||
| else: | ||
| j += 1 | ||
| tokens.append(("string", sql[i:j])) | ||
| last = j | ||
| i = j | ||
| else: | ||
| i += 1 | ||
|
|
||
| if last < n: | ||
| tokens.append(("code", sql[last:])) | ||
|
|
||
| return tokens |
There was a problem hiding this comment.
The _tokenize function only handles single-quoted string literals but does not handle double-quoted identifiers. In PostgreSQL, MySQL (with ANSI_QUOTES mode), and other databases, double quotes are used for identifiers (table/column names) and can contain comment-like syntax. For example, a column named my--column or my/*comment*/column would be written as "my--column" or "my/*comment*/column" and the current tokenizer would incorrectly strip these as comments. Consider extending the tokenizer to also preserve double-quoted identifiers to avoid incorrect comment stripping.
| def _tokenize(sql: str) -> list[tuple[str, str]]: | ||
| """Split *sql* into ``('string', …)`` and ``('code', …)`` tokens. | ||
|
|
||
| String literals (single-quoted, with ``''`` escapes) are preserved as-is. | ||
| Everything else is a ``'code'`` token. | ||
| """ | ||
| tokens: list[tuple[str, str]] = [] | ||
| i = 0 | ||
| n = len(sql) | ||
| last = 0 | ||
|
|
||
| while i < n: | ||
| if sql[i] == "'": | ||
| if i > last: | ||
| tokens.append(("code", sql[last:i])) | ||
| j = i + 1 | ||
| while j < n: | ||
| if sql[j] == "'": | ||
| j += 1 | ||
| if j >= n or sql[j] != "'": | ||
| break # end of literal | ||
| j += 1 # '' escape — continue | ||
| else: | ||
| j += 1 | ||
| tokens.append(("string", sql[i:j])) | ||
| last = j | ||
| i = j | ||
| else: | ||
| i += 1 | ||
|
|
||
| if last < n: | ||
| tokens.append(("code", sql[last:])) | ||
|
|
||
| return tokens |
There was a problem hiding this comment.
MySQL uses backticks () for quoting identifiers, which can also contain comment-like syntax. For example, ``SELECT my--columnFROM table`` would have the identifier incorrectly processed if it contains--or/* */`. The tokenizer should also preserve backtick-quoted identifiers to ensure correct handling across different SQL dialects.
| def is_raw_sql(query: str) -> bool: | ||
| """Return True if query is an inline SQL string rather than a registry key. | ||
|
|
||
| Registry keys use dot-notation (e.g. ``users.get_by_id``) and never | ||
| contain whitespace. Any SQL statement will contain at least one space. | ||
| """ | ||
| return any(c.isspace() for c in query) |
There was a problem hiding this comment.
The is_raw_sql function determines whether a string is inline SQL or a registry key by checking for whitespace characters. However, this approach could be fragile if a registry key accidentally contains whitespace (e.g., due to user error or copy-paste). This would cause the registry key to be treated as inline SQL and potentially sanitized, leading to unexpected behavior. Consider adding validation when registry keys are registered to ensure they don't contain whitespace, or use a more explicit mechanism to distinguish between inline SQL and registry keys (e.g., requiring a prefix or using a separate parameter).
| def coerce_params( | ||
| params: dict[str, Any] | tuple[Any, ...] | list[Any] | Any, | ||
| ) -> dict[str, Any] | tuple[Any, ...] | None: | ||
| """Normalize *params* to a dict, tuple, or None. | ||
|
|
||
| * ``None`` / ``dict`` → returned as-is (named parameter binding). | ||
| * ``tuple`` / ``list`` → converted to ``tuple`` (positional binding). | ||
| * Any other scalar → wrapped in a single-element tuple. | ||
| """ | ||
| if params is None or isinstance(params, dict): | ||
| return params | ||
| if isinstance(params, (tuple, list)): | ||
| return tuple(params) | ||
| return (params,) |
There was a problem hiding this comment.
The new inline SQL feature allows users to write SQL with ? placeholders and pass positional parameters (tuples), but the registry queries use :name placeholders. This mixing of parameter styles could be confusing for users. While technically it works (SQLite and other databases support both), it creates an inconsistency in the API where registry queries use named parameters and inline SQL can use positional parameters. Consider documenting this clearly or normalizing inline SQL to also use the same parameter style as registry queries for consistency.
| class SQLSanitizer: | ||
| """Configurable sanitizer for inline SQL strings. | ||
|
|
||
| Applied only to raw SQL passed directly to engine/transaction methods. | ||
| Registry-loaded queries are always trusted and never sanitized. | ||
|
|
||
| Attributes: | ||
| strip_comments: Strip ``--`` and ``/* */`` comments before execution. | ||
| block_multiple_statements: Reject SQL that contains a statement- | ||
| terminating ``;`` followed by additional content (prevents query | ||
| stacking such as ``SELECT 1; DROP TABLE users``). | ||
| allowed_verbs: If not ``None``, only SQL statements whose first keyword | ||
| appears in this set are permitted. ``None`` means no restriction. | ||
| Example: ``frozenset({"SELECT", "INSERT", "UPDATE", "DELETE"})``. | ||
| """ |
There was a problem hiding this comment.
The SQLSanitizer class documentation should include a prominent warning that it does not protect against SQL injection if user-provided data is concatenated directly into the SQL string. Users must still use parameterized queries (with ? or :name placeholders) to prevent SQL injection. The sanitizer only provides defense-in-depth measures (comment stripping, statement blocking, verb restrictions) but is not a substitute for proper parameterization. Consider adding this warning to the docstring to prevent misuse.
| def _tokenize(sql: str) -> list[tuple[str, str]]: | ||
| """Split *sql* into ``('string', …)`` and ``('code', …)`` tokens. | ||
|
|
||
| String literals (single-quoted, with ``''`` escapes) are preserved as-is. | ||
| Everything else is a ``'code'`` token. | ||
| """ | ||
| tokens: list[tuple[str, str]] = [] | ||
| i = 0 | ||
| n = len(sql) | ||
| last = 0 | ||
|
|
||
| while i < n: | ||
| if sql[i] == "'": | ||
| if i > last: | ||
| tokens.append(("code", sql[last:i])) | ||
| j = i + 1 | ||
| while j < n: | ||
| if sql[j] == "'": | ||
| j += 1 | ||
| if j >= n or sql[j] != "'": | ||
| break # end of literal | ||
| j += 1 # '' escape — continue | ||
| else: | ||
| j += 1 | ||
| tokens.append(("string", sql[i:j])) | ||
| last = j | ||
| i = j | ||
| else: | ||
| i += 1 | ||
|
|
||
| if last < n: | ||
| tokens.append(("code", sql[last:])) | ||
|
|
||
| return tokens |
There was a problem hiding this comment.
The _tokenize function does not detect or handle unterminated string literals. If a user provides SQL with an unclosed string literal (e.g., SELECT 'unclosed text), the tokenizer will scan to the end of the input looking for the closing quote, potentially treating legitimate SQL code as part of the string literal. This could bypass sanitization checks (e.g., comment stripping, statement checking) for the portion incorrectly identified as a string. Consider adding validation to detect unterminated string literals and raise an error, or at minimum, add test coverage for this edge case to document the current behavior.
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@MaksimShevtsov I've opened a new pull request, #7, to work on those changes. Once the pull request is ready, I'll request review from you. |
…ction methods Co-authored-by: MaksimShevtsov <20194438+MaksimShevtsov@users.noreply.github.com>
Address PR review feedback: DRY violations, error handling, and sanitizer edge cases
…ting SQL execution calls
Summary
engine.fetch_one("SELECT * FROM users WHERE id = ?", 1)) in addition to registry keys. A whitespace check distinguishes the two — registry keys likeusers.get_by_idnever contain spaces.paramsargument now accepts adict(named binding),tuple/list(positional binding), or a scalar that is automatically wrapped in a single-element tuple.SQLSanitizer: A new opt-in sanitizer applied only to inline SQL (registry queries remain trusted). Three independent flags:strip_comments— removes--and/* */comments while preserving string literalsblock_multiple_statements— rejects SQL with a;followed by more contentallowed_verbs— restricts the leading SQL keyword to an explicit allow-listSQLSanitizationError: New exception in the hierarchy (ExecutionErrorsubclass).Test plan
uv run pytest tests/unit/test_sanitizer.py— all 65 sanitizer tests passuv run pytest tests/— full suite passes (190 tests, 1 pre-existing skip for missingaiosqlite)engine.fetch_one("SELECT * FROM t WHERE id = ?", 1)SELECT 1; DROP TABLE twhenblock_multiple_statements=True🤖 Generated with Claude Code