Skip to content

Add inline SQL support with configurable sanitization#6

Merged
MaksimShevtsov merged 8 commits into
mainfrom
001-sql-projection-engine
Feb 17, 2026
Merged

Add inline SQL support with configurable sanitization#6
MaksimShevtsov merged 8 commits into
mainfrom
001-sql-projection-engine

Conversation

@MaksimShevtsov
Copy link
Copy Markdown
Owner

Summary

  • Inline SQL strings: All fetch/execute methods now accept raw SQL directly (e.g. engine.fetch_one("SELECT * FROM users WHERE id = ?", 1)) in addition to registry keys. A whitespace check distinguishes the two — registry keys like users.get_by_id never contain spaces.
  • Flexible params: The params argument now accepts a dict (named binding), tuple/list (positional binding), or a scalar that is automatically wrapped in a single-element tuple.
  • Configurable SQLSanitizer: A new opt-in sanitizer applied only to inline SQL (registry queries remain trusted). Three independent flags:
    • strip_comments — removes -- and /* */ comments while preserving string literals
    • block_multiple_statements — rejects SQL with a ; followed by more content
    • allowed_verbs — restricts the leading SQL keyword to an explicit allow-list
  • SQLSanitizationError: New exception in the hierarchy (ExecutionError subclass).
  • 65 new unit tests covering all sanitizer code paths, including edge cases around string literals, comment ordering, and param coercion.

Test plan

  • uv run pytest tests/unit/test_sanitizer.py — all 65 sanitizer tests pass
  • uv run pytest tests/ — full suite passes (190 tests, 1 pre-existing skip for missing aiosqlite)
  • Verify inline SQL with positional scalar: engine.fetch_one("SELECT * FROM t WHERE id = ?", 1)
  • Verify sanitizer blocks SELECT 1; DROP TABLE t when block_multiple_statements=True
  • Verify registry queries bypass sanitizer entirely

🤖 Generated with Claude Code

MaksimShevtsov and others added 3 commits February 17, 2026 22:30
- Engine/AsyncEngine/TransactionManagers now accept raw SQL strings
  alongside registry keys; whitespace presence distinguishes the two
- Positional params supported: scalar, list, or tuple are coerced to
  tuple before binding (dict remains named-param style)
- New SQLSanitizer dataclass (strip_comments, block_multiple_statements,
  allowed_verbs) applied only to inline SQL; registry queries are trusted
- SQLSanitizationError added to exception hierarchy
- SQLite adapter: params or {} → params if params is not None else {}
  so positional tuples are not coerced to empty dict
- 65 new unit tests covering all sanitizer code paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds inline SQL support to the row_query library, allowing users to pass raw SQL strings directly to engine/transaction methods alongside the existing registry-based approach. The implementation includes a configurable SQL sanitizer that provides defense-in-depth measures for inline SQL through comment stripping, statement blocking, and verb allow-listing. The sanitizer is only applied to inline SQL strings (detected by presence of whitespace), while registry queries remain trusted and unsanitized.

Changes:

  • Added SQLSanitizer class with configurable sanitization options (comment stripping, multi-statement blocking, verb restrictions)
  • Extended all engine and transaction methods to accept inline SQL strings in addition to registry keys
  • Added is_raw_sql() function to distinguish between inline SQL and registry keys based on whitespace presence
  • Added coerce_params() function to flexibly handle dict, tuple/list, and scalar parameter types
  • Added 65 comprehensive unit tests covering sanitizer functionality and edge cases

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/unit/test_sanitizer.py Comprehensive test suite (65 tests) covering SQL sanitizer, parameter coercion, and raw SQL detection
row_query/core/sanitizer.py New SQL sanitizer with tokenization, comment stripping, statement validation, and verb checking
row_query/core/params.py Added is_raw_sql() and coerce_params() helpers for inline SQL support
row_query/core/exceptions.py Added SQLSanitizationError exception as subclass of ExecutionError
row_query/core/engine.py Updated all query methods to support inline SQL via _resolve_sql() helper and flexible params
row_query/core/transaction.py Updated all transaction methods to support inline SQL (mirrors engine changes)
row_query/adapters/sqlite.py Changed params or {} to params if params is not None else {} for correctness with falsy dicts
row_query/__init__.py Exported SQLSanitizer and SQLSanitizationError in public API

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 99 to 148
def execute(
self,
query_name: str,
params: dict[str, Any] | None = None,
query: str,
params: Any = None,
) -> int:
"""Execute a write query within this transaction."""
"""Execute a write query within this transaction.

*query* may be a registry key or an inline SQL string.
*params* may be a dict, tuple/list, or scalar.
"""
self._check_active()
sql = self._registry.get(query_name)
sql, _label = _resolve_sql(query, self._registry, self._sanitizer)
sql = normalize_params(sql, self._paramstyle)
cursor = self._adapter.execute(self._connection, sql, params)
cursor = self._adapter.execute(self._connection, sql, coerce_params(params))
return int(cursor.rowcount)

def fetch_one(
self,
query_name: str,
params: dict[str, Any] | None = None,
query: str,
params: Any = None,
) -> dict[str, Any] | None:
"""Fetch a single row within transaction context."""
"""Fetch a single row within transaction context.

*query* may be a registry key or an inline SQL string.
*params* may be a dict, tuple/list, or scalar.
"""
self._check_active()
sql = self._registry.get(query_name)
sql, _label = _resolve_sql(query, self._registry, self._sanitizer)
sql = normalize_params(sql, self._paramstyle)
cursor = self._adapter.execute(self._connection, sql, params)
cursor = self._adapter.execute(self._connection, sql, coerce_params(params))
rows = _rows_to_dicts(cursor)
if not rows:
return None
return rows[0]

def fetch_all(
self,
query_name: str,
params: dict[str, Any] | None = None,
query: str,
params: Any = None,
) -> list[dict[str, Any]]:
"""Fetch all rows within transaction context."""
"""Fetch all rows within transaction context.

*query* may be a registry key or an inline SQL string.
*params* may be a dict, tuple/list, or scalar.
"""
self._check_active()
sql = self._registry.get(query_name)
sql, _label = _resolve_sql(query, self._registry, self._sanitizer)
sql = normalize_params(sql, self._paramstyle)
cursor = self._adapter.execute(self._connection, sql, params)
cursor = self._adapter.execute(self._connection, sql, coerce_params(params))
return _rows_to_dicts(cursor)
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transaction methods (execute, fetch_one, fetch_all) don't wrap adapter.execute calls in try-except blocks to convert driver exceptions to ParameterBindingError, unlike the corresponding engine methods. This creates an inconsistency where driver exceptions can leak through transaction methods but are properly wrapped in engine methods. Additionally, the label returned from _resolve_sql is unused in transaction methods (prefixed with underscore), while it's used in engine methods for error messages. Consider adding the same error handling pattern used in the engine methods to maintain consistency and provide better error messages for inline SQL.

Copilot uses AI. Check for mistakes.
Comment thread row_query/core/transaction.py Outdated
Comment on lines 16 to 29


def _resolve_sql(
query: str,
registry: SQLRegistry,
sanitizer: SQLSanitizer | None = None,
) -> tuple[str, str]:
"""Return ``(sql_text, label)`` for *query* (raw SQL or registry key)."""
if is_raw_sql(query):
sql = sanitizer.sanitize(query) if sanitizer is not None else query
return sql, "<inline>"
return registry.get(query), query


Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _resolve_sql function is duplicated in both row_query/core/engine.py and row_query/core/transaction.py. This violates the DRY (Don't Repeat Yourself) principle and makes maintenance harder. Consider moving this function to a shared location such as row_query/core/params.py or creating a new utility module, since both files need to import and use the same logic.

Suggested change
def _resolve_sql(
query: str,
registry: SQLRegistry,
sanitizer: SQLSanitizer | None = None,
) -> tuple[str, str]:
"""Return ``(sql_text, label)`` for *query* (raw SQL or registry key)."""
if is_raw_sql(query):
sql = sanitizer.sanitize(query) if sanitizer is not None else query
return sql, "<inline>"
return registry.get(query), query
from row_query.core.engine import _resolve_sql

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +56
def _tokenize(sql: str) -> list[tuple[str, str]]:
"""Split *sql* into ``('string', …)`` and ``('code', …)`` tokens.

String literals (single-quoted, with ``''`` escapes) are preserved as-is.
Everything else is a ``'code'`` token.
"""
tokens: list[tuple[str, str]] = []
i = 0
n = len(sql)
last = 0

while i < n:
if sql[i] == "'":
if i > last:
tokens.append(("code", sql[last:i]))
j = i + 1
while j < n:
if sql[j] == "'":
j += 1
if j >= n or sql[j] != "'":
break # end of literal
j += 1 # '' escape — continue
else:
j += 1
tokens.append(("string", sql[i:j]))
last = j
i = j
else:
i += 1

if last < n:
tokens.append(("code", sql[last:]))

return tokens
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _tokenize function only handles single-quoted string literals but does not handle double-quoted identifiers. In PostgreSQL, MySQL (with ANSI_QUOTES mode), and other databases, double quotes are used for identifiers (table/column names) and can contain comment-like syntax. For example, a column named my--column or my/*comment*/column would be written as "my--column" or "my/*comment*/column" and the current tokenizer would incorrectly strip these as comments. Consider extending the tokenizer to also preserve double-quoted identifiers to avoid incorrect comment stripping.

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +56
def _tokenize(sql: str) -> list[tuple[str, str]]:
"""Split *sql* into ``('string', …)`` and ``('code', …)`` tokens.

String literals (single-quoted, with ``''`` escapes) are preserved as-is.
Everything else is a ``'code'`` token.
"""
tokens: list[tuple[str, str]] = []
i = 0
n = len(sql)
last = 0

while i < n:
if sql[i] == "'":
if i > last:
tokens.append(("code", sql[last:i]))
j = i + 1
while j < n:
if sql[j] == "'":
j += 1
if j >= n or sql[j] != "'":
break # end of literal
j += 1 # '' escape — continue
else:
j += 1
tokens.append(("string", sql[i:j]))
last = j
i = j
else:
i += 1

if last < n:
tokens.append(("code", sql[last:]))

return tokens
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL uses backticks () for quoting identifiers, which can also contain comment-like syntax. For example, ``SELECT my--columnFROM table`` would have the identifier incorrectly processed if it contains--or/* */`. The tokenizer should also preserve backtick-quoted identifiers to ensure correct handling across different SQL dialects.

Copilot uses AI. Check for mistakes.
Comment thread row_query/core/params.py
Comment on lines +59 to +65
def is_raw_sql(query: str) -> bool:
"""Return True if query is an inline SQL string rather than a registry key.

Registry keys use dot-notation (e.g. ``users.get_by_id``) and never
contain whitespace. Any SQL statement will contain at least one space.
"""
return any(c.isspace() for c in query)
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_raw_sql function determines whether a string is inline SQL or a registry key by checking for whitespace characters. However, this approach could be fragile if a registry key accidentally contains whitespace (e.g., due to user error or copy-paste). This would cause the registry key to be treated as inline SQL and potentially sanitized, leading to unexpected behavior. Consider adding validation when registry keys are registered to ensure they don't contain whitespace, or use a more explicit mechanism to distinguish between inline SQL and registry keys (e.g., requiring a prefix or using a separate parameter).

Copilot uses AI. Check for mistakes.
Comment thread row_query/core/params.py
Comment on lines +68 to +81
def coerce_params(
params: dict[str, Any] | tuple[Any, ...] | list[Any] | Any,
) -> dict[str, Any] | tuple[Any, ...] | None:
"""Normalize *params* to a dict, tuple, or None.

* ``None`` / ``dict`` → returned as-is (named parameter binding).
* ``tuple`` / ``list`` → converted to ``tuple`` (positional binding).
* Any other scalar → wrapped in a single-element tuple.
"""
if params is None or isinstance(params, dict):
return params
if isinstance(params, (tuple, list)):
return tuple(params)
return (params,)
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new inline SQL feature allows users to write SQL with ? placeholders and pass positional parameters (tuples), but the registry queries use :name placeholders. This mixing of parameter styles could be confusing for users. While technically it works (SQLite and other databases support both), it creates an inconsistency in the API where registry queries use named parameters and inline SQL can use positional parameters. Consider documenting this clearly or normalizing inline SQL to also use the same parameter style as registry queries for consistency.

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +144
class SQLSanitizer:
"""Configurable sanitizer for inline SQL strings.

Applied only to raw SQL passed directly to engine/transaction methods.
Registry-loaded queries are always trusted and never sanitized.

Attributes:
strip_comments: Strip ``--`` and ``/* */`` comments before execution.
block_multiple_statements: Reject SQL that contains a statement-
terminating ``;`` followed by additional content (prevents query
stacking such as ``SELECT 1; DROP TABLE users``).
allowed_verbs: If not ``None``, only SQL statements whose first keyword
appears in this set are permitted. ``None`` means no restriction.
Example: ``frozenset({"SELECT", "INSERT", "UPDATE", "DELETE"})``.
"""
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQLSanitizer class documentation should include a prominent warning that it does not protect against SQL injection if user-provided data is concatenated directly into the SQL string. Users must still use parameterized queries (with ? or :name placeholders) to prevent SQL injection. The sanitizer only provides defense-in-depth measures (comment stripping, statement blocking, verb restrictions) but is not a substitute for proper parameterization. Consider adding this warning to the docstring to prevent misuse.

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +56
def _tokenize(sql: str) -> list[tuple[str, str]]:
"""Split *sql* into ``('string', …)`` and ``('code', …)`` tokens.

String literals (single-quoted, with ``''`` escapes) are preserved as-is.
Everything else is a ``'code'`` token.
"""
tokens: list[tuple[str, str]] = []
i = 0
n = len(sql)
last = 0

while i < n:
if sql[i] == "'":
if i > last:
tokens.append(("code", sql[last:i]))
j = i + 1
while j < n:
if sql[j] == "'":
j += 1
if j >= n or sql[j] != "'":
break # end of literal
j += 1 # '' escape — continue
else:
j += 1
tokens.append(("string", sql[i:j]))
last = j
i = j
else:
i += 1

if last < n:
tokens.append(("code", sql[last:]))

return tokens
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _tokenize function does not detect or handle unterminated string literals. If a user provides SQL with an unclosed string literal (e.g., SELECT 'unclosed text), the tokenizer will scan to the end of the input looking for the closing quote, potentially treating legitimate SQL code as part of the string literal. This could bypass sanitization checks (e.g., comment stripping, statement checking) for the portion incorrectly identified as a string. Consider adding validation to detect unterminated string literals and raise an error, or at minimum, add test coverage for this edge case to document the current behavior.

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 17, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 61.29032% with 84 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
row_query/core/sanitizer.py 71.42% 34 Missing ⚠️
row_query/core/engine.py 40.00% 24 Missing ⚠️
row_query/core/transaction.py 37.14% 22 Missing ⚠️
row_query/core/params.py 85.71% 2 Missing ⚠️
row_query/core/registry.py 33.33% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@MaksimShevtsov
Copy link
Copy Markdown
Owner Author

@copilot open a new pull request to apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 17, 2026

@MaksimShevtsov I've opened a new pull request, #7, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 4 commits February 17, 2026 22:04
…ction methods

Co-authored-by: MaksimShevtsov <20194438+MaksimShevtsov@users.noreply.github.com>
Address PR review feedback: DRY violations, error handling, and sanitizer edge cases
@MaksimShevtsov MaksimShevtsov merged commit 5e44da2 into main Feb 17, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants