Commit f499a30

author
Samson Gebre
committed
Block SELECT * queries with ValidationError — intentional design decision to prevent expensive wildcard selects on wide entities
1 parent 5942c4a commit f499a30

9 files changed

Lines changed: 131 additions & 410 deletions

README.md

Lines changed: 0 additions & 3 deletions
@@ -400,9 +400,6 @@ results = client.query.sql(
     "GROUP BY a.name"
 )

-# SELECT * is auto-expanded by the SDK
-results = client.query.sql("SELECT * FROM account")
-
 # SQL results directly as a DataFrame
 df = client.dataframe.sql(
     "SELECT name, revenue FROM account ORDER BY revenue DESC"

examples/README.md

Lines changed: 0 additions & 1 deletion
@@ -45,7 +45,6 @@ Deep-dive into production-ready patterns and specialized functionality:
 - Full SQL capabilities: SELECT, WHERE, TOP, ORDER BY, LIKE, IN, BETWEEN
 - JOINs (INNER, LEFT, multi-table), GROUP BY, DISTINCT, aggregates
 - OFFSET FETCH for server-side pagination
-- SELECT * auto-expansion (SDK rewrites for server compatibility)
 - Polymorphic lookups via SQL (ownerid, customerid, createdby)
 - SQL read -> DataFrame transform -> SDK write-back (full round-trip)
 - SQL-driven bulk create, update, and delete patterns

examples/advanced/sql_examples.py

Lines changed: 28 additions & 28 deletions
@@ -9,7 +9,7 @@
 based on extensive testing of the Dataverse SQL endpoint (353 test queries).

 Capabilities PROVEN to work:
-- SELECT with specific columns, SELECT * (auto-expanded by SDK)
+- SELECT with specific columns
 - INNER JOIN, LEFT JOIN (up to 6+ tables)
 - COUNT(*), SUM(), AVG(), MIN(), MAX() aggregates
 - GROUP BY, DISTINCT, DISTINCT TOP
@@ -304,19 +304,23 @@ def _run_examples(client):
         print(f" {r.get('new_code', ''):<12s} Budget={r.get('new_budget')} Active={r.get('new_active')}")

     # ==============================================================
-    # 5. SELECT * (auto-expanded by SDK)
+    # 5. SELECT * -- Rejected by Design
     # ==============================================================
-    heading(5, "SELECT * (Auto-Expanded by SDK)")
+    heading(5, "SELECT * -- Rejected by Design")
     print(
-        "The server blocks SELECT * directly. The SDK auto-resolves\n"
-        "all column names via list_columns() and rewrites the query."
+        "SELECT * is deliberately rejected -- not a server workaround,\n"
+        "but an intentional design decision. Wide entities (e.g. account\n"
+        "has 307 columns) make SELECT * extremely expensive on shared\n"
+        "infrastructure. Specify columns explicitly instead.\n"
+        "Use client.query.sql_columns('account') to discover column names."
     )
-    sql = f"SELECT * FROM {parent_table}"
-    log_call(f'client.query.sql("{sql}")')
-    results = backoff(lambda: client.query.sql(sql))
-    if results:
-        keys = [k for k in results[0].keys() if not k.startswith("@")]
-        print(f"[OK] {len(results)} rows, {len(keys)} columns")
+    from PowerPlatform.Dataverse.core.errors import ValidationError as _VE
+
+    try:
+        client.query.sql(f"SELECT * FROM {parent_table}")
+        print("[UNEXPECTED] SELECT * did not raise -- check SDK version")
+    except _VE as exc:
+        print(f"[OK] ValidationError raised as expected: {exc}")

     # ==============================================================
     # 6. WHERE clause
@@ -1102,19 +1106,15 @@ def _run_examples(client):
 can have 5000+ rows. Unfiltered queries return max rows.
 FIX: Always add WHERE filters and TOP when querying system tables.

-4. SELECT * ON WIDE TABLES
-BAD: SELECT * FROM account (307 columns!)
-WHY: The SDK auto-expands * into all 260+ non-virtual columns.
-Every column is transferred over the network.
-NOTE: With JOINs, SELECT * only expands the FIRST (FROM) table's
-columns -- joined table columns will NOT be included.
-Example: SELECT * FROM account a JOIN contact c ON ...
-expands to account columns only; contact columns are missing.
+4. SELECT * (BLOCKED -- ValidationError)
+BAD: SELECT * FROM account
+WHY: SELECT * is intentionally rejected -- not a technical limitation.
+Wide entities (account has 307 columns) make wildcard selects
+extremely expensive on shared database infrastructure.
 FIX: List only the columns you need: SELECT name, revenue FROM account
-Or use the SDK helper:
-cols = client.query.sql_select("account")
-sql = f"SELECT TOP 10 {{cols}} FROM account"
-For JOINs, always specify columns from each table explicitly:
+Or discover columns first:
+cols = client.query.sql_columns("account")
+For JOINs, always qualify columns from each table:
 SELECT a.name, c.fullname FROM account a JOIN contact c ON ...

 5. DEEP JOINS WITHOUT TOP
@@ -1125,10 +1125,10 @@ def _run_examples(client):
 FIX: Always include TOP N for multi-table JOINs.

 SDK guardrails:
-- Patterns #1 (writes) and unsupported syntax (CROSS/RIGHT/FULL JOIN,
-UNION, HAVING, CTE, subqueries) -> ValidationError (blocked).
-- Pattern #2 (cartesian FROM a, b) and #4 (SELECT * + JOIN)
--> UserWarning (advisory).
+- Patterns #1 (writes), unsupported syntax (CROSS/RIGHT/FULL JOIN,
+UNION, HAVING, CTE, subqueries), and #4 (SELECT *)
+-> ValidationError (blocked).
+- Pattern #2 (cartesian FROM a, b) -> UserWarning (advisory).
 - Server enforces 5000-row cap on all queries (#3, #5).
 - Use sql_columns() or sql_select() to discover valid column names.
 - Use sql_joins() or sql_join() to discover valid JOIN clauses.
@@ -1143,7 +1143,7 @@ def _run_examples(client):
 | Feature | SQL | Notes / SDK Fallback |
 +-------------------------------+----------+----------------------------------------+
 | SELECT col1, col2 | YES | Use LogicalName (lowercase) |
-| SELECT * | YES (*) | SDK auto-expands via list_columns() |
+| SELECT * | NO | Specify columns explicitly |
 | WHERE =, !=, >, <, LIKE, IN | YES | |
 | AND, OR, parentheses | YES | Full boolean logic |
 | NOT IN, NOT LIKE | YES | |
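
Note on migrating callers: code that relied on SELECT * now needs an explicit column list. The sketch below is one way to build such a query. `build_select` is a hypothetical helper, not part of the SDK, and it assumes the caller already has plain logical column names in hand (for example, picked from whatever `client.query.sql_columns("account")` returns; the diff does not show that return shape).

```python
import re
from typing import Iterable, Optional

# Conservative identifier check (an assumption of this sketch):
# lowercase logical names made of letters, digits, and underscores.
_IDENT_RE = re.compile(r"^[a-z_][a-z0-9_]*$")


def build_select(table: str, columns: Iterable[str], top: Optional[int] = None) -> str:
    """Build 'SELECT [TOP n] col1, col2 FROM table' from explicit names."""
    cols = list(columns)
    if not cols:
        raise ValueError("at least one column is required")
    for name in [table, *cols]:
        # Rejects '*', qualified names, and anything that is not a plain identifier.
        if not _IDENT_RE.match(name):
            raise ValueError(f"not a plain logical name: {name!r}")
    top_clause = f"TOP {int(top)} " if top is not None else ""
    return f"SELECT {top_clause}{', '.join(cols)} FROM {table}"


print(build_select("account", ["name", "revenue"], top=10))
# prints: SELECT TOP 10 name, revenue FROM account
```

Validating names before interpolation keeps a stray `*` or user-supplied text out of the SQL string; the resulting query lists only the columns the caller asked for.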

src/PowerPlatform/Dataverse/data/_odata.py

Lines changed: 34 additions & 59 deletions
@@ -791,12 +791,6 @@ def _do_request(url: str, *, params: Optional[Dict[str, Any]] = None) -> Dict[st
             yield [x for x in items if isinstance(x, dict)]
             next_link = data.get("@odata.nextLink") or data.get("odata.nextLink") if isinstance(data, dict) else None

-    # ----------------------- SELECT * detection -----------------------
-    _SELECT_STAR_RE = re.compile(
-        r"\bSELECT\b(\s+(?:DISTINCT\s+)?(?:TOP\s+\d+(?:\s+PERCENT)?\s+)?)\*\s",
-        re.IGNORECASE,
-    )
-
     # ----------------------- SQL guardrail patterns --------------------
     _SQL_WRITE_RE = re.compile(
         r"^\s*(?:INSERT|UPDATE|DELETE|DROP|TRUNCATE|ALTER|CREATE|EXEC|GRANT|REVOKE|BULK)\b",
@@ -808,7 +802,6 @@ def _do_request(url: str, *, params: Optional[Dict[str, Any]] = None) -> Dict[st
         r"\bFROM\s+[A-Za-z0-9_]+(?:\s+[A-Za-z0-9_]+)?\s*,\s*[A-Za-z0-9_]+",
         re.IGNORECASE,
     )
-    _SQL_HAS_JOIN_RE = re.compile(r"\bJOIN\b", re.IGNORECASE)
     # Server-blocked SQL patterns (save the round-trip by catching early)
     _SQL_UNSUPPORTED_JOIN_RE = re.compile(
         r"\b(?:CROSS\s+JOIN|RIGHT\s+(?:OUTER\s+)?JOIN|FULL\s+(?:OUTER\s+)?JOIN)\b",
@@ -821,44 +814,14 @@ def _do_request(url: str, *, params: Optional[Dict[str, Any]] = None) -> Dict[st
         r"\bIN\s*\(\s*SELECT\b|\bEXISTS\s*\(\s*SELECT\b|\(\s*SELECT\b.*\bFROM\b",
         re.IGNORECASE,
     )
-
-    def _expand_select_star(self, sql: str, table: str) -> str:
-        """Replace ``SELECT *`` with explicit column names.
-
-        When the Dataverse SQL endpoint receives ``SELECT *`` it returns
-        an error ("SELECT * is not supported"). This helper resolves all
-        columns via ``_list_columns`` and rewrites the query so the user
-        never has to know the server limitation.
-
-        For JOIN queries, the expansion only includes columns from the first
-        (FROM) table. A warning is emitted so the user knows to specify
-        columns explicitly for multi-table queries.
-        """
-        if not self._SELECT_STAR_RE.search(sql):
-            return sql
-
-        # Warn on SELECT * with JOINs -- expansion uses only the FROM table
-        if self._SQL_HAS_JOIN_RE.search(sql):
-            warnings.warn(
-                "SELECT * with JOIN: the SDK expands * using columns from "
-                "the first table only. Columns from joined tables will not "
-                "be included. Specify columns explicitly for JOINs "
-                "(e.g. SELECT a.name, c.fullname FROM account a "
-                "JOIN contact c ON ...).",
-                UserWarning,
-                stacklevel=4,
-            )
-
-        cols = self._list_columns(
-            table,
-            select=["LogicalName"],
-            filter="AttributeType ne 'Virtual'",
-        )
-        col_names = sorted({c["LogicalName"] for c in cols if "LogicalName" in c})
-        if not col_names:
-            return sql  # Fallback: let the server decide
-        col_list = ", ".join(col_names)
-        return self._SELECT_STAR_RE.sub(lambda m: f"SELECT{m.group(1)}{col_list} ", sql, count=1)
+    # SELECT * is intentionally rejected -- not a technical limitation but a
+    # deliberate design decision. Wide entities (e.g. account has 307 columns)
+    # make SELECT * extremely expensive on shared database infrastructure.
+    # COUNT(*) is NOT matched because COUNT appears before the *.
+    _SQL_SELECT_STAR_RE = re.compile(
+        r"\bSELECT\b\s+(?:DISTINCT\s+)?(?:TOP\s+\d+(?:\s+PERCENT)?\s+)?\*\s",
+        re.IGNORECASE,
+    )

     def _sql_guardrails(self, sql: str) -> str:
         """Apply safety guardrails to a SQL query before sending to the server.
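
Note on what the new pattern matches: the standalone sketch below copies the `_SQL_SELECT_STAR_RE` expression from this hunk and runs it over a few probe queries. The probe strings and the `is_select_star` wrapper are illustrative assumptions, not SDK code.

```python
import re

# Expression copied verbatim from the added _SQL_SELECT_STAR_RE.
SELECT_STAR_RE = re.compile(
    r"\bSELECT\b\s+(?:DISTINCT\s+)?(?:TOP\s+\d+(?:\s+PERCENT)?\s+)?\*\s",
    re.IGNORECASE,
)


def is_select_star(sql: str) -> bool:
    """Return True when the guardrail pattern would fire on this SQL text."""
    return SELECT_STAR_RE.search(sql) is not None


# Hypothetical probe queries, chosen for illustration only.
PROBES = [
    "SELECT * FROM account",                 # bare wildcard
    "select top 10 * from account",          # case-insensitive, TOP n
    "SELECT DISTINCT TOP 5 * FROM account",  # DISTINCT + TOP n
    "SELECT COUNT(*) FROM account",          # aggregate, not a wildcard select
    "SELECT a.* FROM account a",             # alias-qualified wildcard
    "SELECT name, * FROM account",           # wildcard after another column
]

for query in PROBES:
    label = "BLOCK" if is_select_star(query) else "pass"
    print(f"{label:5s}  {query}")
```

The first three probes match and `COUNT(*)` does not, as the code comment states. The last two probes also do not match, because the pattern requires `*` to come directly after SELECT, DISTINCT, or TOP n. This client-side check therefore lets `a.*` and `name, *` through; the diff does not say how the server treats them.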
@@ -873,19 +836,22 @@ def _sql_guardrails(self, sql: str) -> str:
         4. HAVING clause (server rejects)
         5. CTE / WITH clause (server rejects)
         6. Subqueries -- IN (SELECT ...), EXISTS (SELECT ...) (server rejects)
+        7. SELECT * -- intentional design decision, not a technical limitation.
+           Wide entities make wildcard selects extremely expensive on shared
+           database infrastructure. ``COUNT(*)`` is not affected.

         **Warned** (``UserWarning`` -- query still executes):

-        7. Leading-wildcard LIKE (full table scan)
-        8. Implicit cross join FROM a, b (cartesian product)
+        8. Leading-wildcard LIKE (full table scan)
+        9. Implicit cross join FROM a, b (cartesian product)

         All blocked patterns are also blocked by the server, but catching
         them here saves the network round-trip and provides clearer error
         messages. To bypass a specific check (e.g., if the server adds
         support in the future), all checks are in this single method.

         :param sql: The SQL string (already stripped).
-        :return: The SQL string (unchanged unless rewritten).
+        :return: The SQL string (unchanged).
         :raises ValidationError: If the SQL contains a blocked pattern.
         """
         # --- BLOCKED (save server round-trip) ---
@@ -944,9 +910,22 @@ def _sql_guardrails(self, sql: str) -> str:
                 subcode=VALIDATION_SQL_UNSUPPORTED_SYNTAX,
             )

+        # 7. Block SELECT * -- intentional design decision.
+        # Wide entities (e.g. account has 307 columns) make wildcard selects
+        # extremely expensive on shared database infrastructure.
+        # COUNT(*) is NOT matched: _SQL_SELECT_STAR_RE requires * to be the
+        # first token after SELECT/DISTINCT/TOP N, so COUNT appears before *.
+        if self._SQL_SELECT_STAR_RE.search(sql):
+            raise ValidationError(
+                "SELECT * is not supported. Specify column names explicitly "
+                "(e.g. SELECT name, revenue FROM account). "
+                "Use client.query.sql_columns('account') to discover available columns.",
+                subcode=VALIDATION_SQL_UNSUPPORTED_SYNTAX,
+            )
+
         # --- WARNED (query still executes) ---

-        # 7. Warn on leading-wildcard LIKE
+        # 8. Warn on leading-wildcard LIKE
         if self._SQL_LEADING_WILDCARD_RE.search(sql):
             warnings.warn(
                 "Query contains a leading-wildcard LIKE pattern "
@@ -957,7 +936,7 @@ def _sql_guardrails(self, sql: str) -> str:
                 stacklevel=4,
             )

-        # 8. Warn on implicit cross joins (server allows but risky)
+        # 9. Warn on implicit cross joins (server allows but risky)
         if self._SQL_IMPLICIT_CROSS_JOIN_RE.search(sql):
             warnings.warn(
                 "Query uses an implicit cross join (FROM table1, table2). "
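
Note on blocked versus warned patterns: after this commit SELECT * raises, while implicit cross joins still only emit a `UserWarning`. Callers who want the advisory checks to fail hard can use the standard `warnings` filters. The sketch below is self-contained: `check_advisory` is a stand-in for the warning half of the guardrails, and its expression is copied from the `FROM a, b` pattern that appears as context earlier in this file, on the assumption that it is the implicit cross-join pattern.

```python
import re
import warnings

# Expression copied from the FROM a, b pattern shown earlier in this diff.
IMPLICIT_CROSS_JOIN_RE = re.compile(
    r"\bFROM\s+[A-Za-z0-9_]+(?:\s+[A-Za-z0-9_]+)?\s*,\s*[A-Za-z0-9_]+",
    re.IGNORECASE,
)


def check_advisory(sql: str) -> str:
    """Stand-in for the advisory guardrail: warn, then return the SQL unchanged."""
    if IMPLICIT_CROSS_JOIN_RE.search(sql):
        warnings.warn(
            "Query uses an implicit cross join (FROM table1, table2).",
            UserWarning,
            stacklevel=2,
        )
    return sql


sql = "SELECT a.name, c.fullname FROM account a, contact c"

# Default behaviour: the warning is recorded and the query text passes through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = check_advisory(sql)
    print(f"passed through: {result == sql}, warnings recorded: {len(caught)}")

# Opt-in strictness (for example in CI): promote UserWarning to an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        check_advisory(sql)
    except UserWarning as exc:
        print(f"escalated to error: {exc}")
```

Using `catch_warnings()` as a context manager keeps the stricter filter scoped to the block, so the rest of the program keeps its normal warning behaviour.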
@@ -987,8 +966,9 @@ def _query_sql(self, sql: str) -> list[dict[str, Any]]:
         .. note::
             Endpoint form: ``GET /{entity_set}?sql=<encoded select>``. The client
             extracts the logical table name, resolves the entity set (metadata
-            cached), then issues the request. ``SELECT *`` is automatically
-            expanded into explicit column names because the server blocks it.
+            cached), then issues the request. ``SELECT *`` raises
+            :class:`~PowerPlatform.Dataverse.core.errors.ValidationError` --
+            it is deliberately rejected, not silently rewritten.
         """
         if not isinstance(sql, str):
             raise ValidationError("sql must be a string", subcode=VALIDATION_SQL_NOT_STRING)
@@ -1008,13 +988,8 @@ def _query_sql(self, sql: str) -> list[dict[str, Any]]:
                 subcode=VALIDATION_SQL_WRITE_BLOCKED,
             )

-        # Extract logical table name via helper (robust to identifiers ending with 'from')
-        logical = self._extract_logical_table(sql)
-
-        # Auto-expand SELECT * into explicit column names
-        sql = self._expand_select_star(sql, logical)
-
-        # Apply safety guardrails (block unsupported syntax, warn on risky patterns)
+        # Apply safety guardrails (block unsupported syntax, warn on risky patterns).
+        # SELECT * raises ValidationError here before any table resolution.
         sql = self._sql_guardrails(sql)

         r = self._execute_raw(self._build_sql(sql))
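
Note on the new control flow: with the expansion step removed, a SELECT * query is rejected before any request is built. The sketch below imitates that ordering with stand-ins only. The local `ValidationError` class, `query_sql`, and `fake_execute` are assumptions made for illustration and do not reproduce the SDK internals.

```python
import re
from typing import Any, Callable, Dict, List


class ValidationError(ValueError):
    """Local stand-in for PowerPlatform.Dataverse.core.errors.ValidationError."""


# Expression copied from the added _SQL_SELECT_STAR_RE in this diff.
SELECT_STAR_RE = re.compile(
    r"\bSELECT\b\s+(?:DISTINCT\s+)?(?:TOP\s+\d+(?:\s+PERCENT)?\s+)?\*\s",
    re.IGNORECASE,
)


def query_sql(sql: Any, execute: Callable[[str], List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Validate first, then pass the SQL to `execute` (a stand-in for the HTTP call)."""
    if not isinstance(sql, str):
        raise ValidationError("sql must be a string")
    sql = sql.strip()
    if SELECT_STAR_RE.search(sql):
        raise ValidationError("SELECT * is not supported. Specify column names explicitly.")
    return execute(sql)


calls: List[str] = []


def fake_execute(sql: str) -> List[Dict[str, Any]]:
    """Record the call and return a canned row instead of contacting a server."""
    calls.append(sql)
    return [{"name": "Contoso"}]


try:
    query_sql("SELECT * FROM account", fake_execute)
except ValidationError as exc:
    print(f"blocked, requests sent so far: {len(calls)} ({exc})")

rows = query_sql("SELECT name FROM account", fake_execute)
print(f"requests sent: {len(calls)}, rows: {rows}")
```

Because the check runs before `execute`, the rejected query never reaches the transport layer in this sketch, which matches the intent stated in the added comment above.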

src/PowerPlatform/Dataverse/operations/query.py

Lines changed: 6 additions & 10 deletions
@@ -107,12 +107,13 @@ def sql(self, sql: str) -> List[Record]:
             OFFSET n ROWS FETCH NEXT m ROWS ONLY
             COUNT(*), SUM(), AVG(), MIN(), MAX()

-        ``SELECT *`` is automatically expanded into explicit column names
-        by the SDK (the server rejects ``*`` directly).
+        ``SELECT *`` is not supported -- specify column names explicitly.
+        Use :meth:`sql_columns` to discover available column names for a table.

-        Not supported: subqueries, CTE, HAVING, UNION, RIGHT/FULL/CROSS
-        JOIN, CASE, COALESCE, window functions, string/date/math functions,
-        INSERT/UPDATE/DELETE. For writes, use ``client.records`` methods.
+        Not supported: SELECT *, subqueries, CTE, HAVING, UNION,
+        RIGHT/FULL/CROSS JOIN, CASE, COALESCE, window functions,
+        string/date/math functions, INSERT/UPDATE/DELETE. For writes, use
+        ``client.records`` methods.

         :param sql: Supported SQL SELECT statement.
         :type sql: :class:`str`
@@ -140,11 +141,6 @@ def sql(self, sql: str) -> List[Record]:
                 "GROUP BY a.name"
             )

-        SELECT * (auto-expanded by SDK)::
-
-            rows = client.query.sql(
-                "SELECT * FROM account"
-            )
         """
         with self._client._scoped_odata() as od:
             rows = od._query_sql(sql)