
Commit 7b233c5

ShapeParser: accept naics(...) / psc(...) as canonical (tango#2266)
Tango server (makegov/tango#2259) now accepts both spellings as expand aliases:

- `naics(code,description)`: canonical
- `naics_code(code,description)`: alias rewritten to `naics` at parse time
- Same pair for `psc(...)` / `psc_code(...)`.

Pre-fix, the SDK's ShapeParser rejected the canonical server form and demanded the `_code` alias, exactly opposite to the server. Mirror the server's `_EXPAND_ALIASES` map locally so both spellings are accepted client-side when used as expansions (with parens or wildcards). Bare scalar leaves `naics_code` / `psc_code` are untouched, matching server semantics.

Also added explicit `naics` / `psc` expand entries to Contract, Forecast, Opportunity, Notice, and Vehicle schemas in `tango/shapes/explicit_schemas.py` so the canonical form validates locally too (IDV already had them).

Tests: 11 new cases in `TestShapeParserExpandAliases`. 261/261 unit tests pass on Python 3.12 and 3.13. (Python 3.14 in the worktree's default venv has pre-existing typing-related failures unrelated to this change.)

Refs: makegov/tango#2266, makegov/tango#2259

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c7c7731 commit 7b233c5

4 files changed

Lines changed: 245 additions & 2 deletions


CHANGELOG.md

Lines changed: 4 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -7,8 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10-
> Combined release: API parity (formerly tracked as PR #25) + subject-based
11-
> webhook removal (formerly tracked as PR #27 / issue #2275). Bumped to
10+
> Combined release: API parity (formerly tracked as PR #25), subject-based
11+
> webhook removal (formerly tracked as PR #27 / issue #2275), and shape-validator
12+
> alias support (formerly tracked as PR #28 / issue #2266). Bumped to
1213
> `0.7.0` because removing `create_webhook_subscription` /
1314
> `update_webhook_subscription` / `delete_webhook_subscription` /
1415
> `list_webhook_subscriptions` / `get_webhook_subscription` and the
@@ -45,6 +46,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
4546
- `TangoClient._post()` and `_patch()` now accept both `json_data=` (positional) and `json=` (keyword) for backward compatibility. Internal callers and docs examples that use `json=` no longer fail with `TypeError`.
4647
- `tango webhooks endpoints create` CLI now accepts and requires `--name` (passed through to `create_webhook_endpoint(name=...)`). Previously the option was absent, meaning the CLI could never set a custom endpoint name and every call would 400 server-side (the server enforces `unique(user, name)`).
4748
- `WebhookAlert.query_type` and `WebhookAlert.filters` tightened from `Optional` to non-optional (`str` and `dict[str, Any]` respectively). Legacy nullable rows were purged by the tango#2275 migration; the server model and serializer guarantee non-null values for all current data. `WebhookAlert.status` narrowed from `str` to `Literal["active", "paused"]` — the server serializer produces exactly those two values.
49+
- **Shape validator agrees with server on `naics(...)` / `psc(...)` expansions.** The client-side `ShapeParser.validate()` previously rejected the canonical `shape=naics(code,description)` form (which the server has always accepted) and also rejected the alias `shape=naics_code(code,description)`. The parser now mirrors the server's `_EXPAND_ALIASES` (introduced in Tango PR makegov/tango#2259) and rewrites `naics_code(...)` / `psc_code(...)` to their canonical `naics(...)` / `psc(...)` form at parse time. Bare scalar leaves (`shape=naics_code` / `shape=psc_code`) are left untouched and still return the raw column value, matching the server. Schemas for `Contract`, `Forecast`, `Opportunity`, `Notice`, and `Vehicle` gained explicit `naics` / `psc` expand entries backed by the existing `CodeDescription` nested model. Fixes makegov/tango#2266.
4850

4951
## [0.6.0] - 2026-05-07
5052

tango/shapes/explicit_schemas.py

Lines changed: 45 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -327,6 +327,11 @@
327327
),
328328
"major_program": FieldSchema(name="major_program", type=str, is_optional=True, is_list=False),
329329
"naics_code": FieldSchema(name="naics_code", type=int, is_optional=True, is_list=False),
330+
# Expand form: shape=naics(code,description). Server PR #2259 also accepts
331+
# naics_code(...) as an alias which the SDK parser normalizes to naics.
332+
"naics": FieldSchema(
333+
name="naics", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
334+
),
330335
"number_of_actions": FieldSchema(
331336
name="number_of_actions", type=int, is_optional=True, is_list=False
332337
),
@@ -355,6 +360,11 @@
355360
name="price_evaluation_percent_difference", type=str, is_optional=True, is_list=False
356361
),
357362
"psc_code": FieldSchema(name="psc_code", type=str, is_optional=True, is_list=False),
363+
# Expand form: shape=psc(code,description). Server PR #2259 also accepts
364+
# psc_code(...) as an alias which the SDK parser normalizes to psc.
365+
"psc": FieldSchema(
366+
name="psc", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
367+
),
358368
"purchase_card_as_payment_method": FieldSchema(
359369
name="purchase_card_as_payment_method", type=str, is_optional=True, is_list=False
360370
),
@@ -572,6 +582,11 @@
572582
"id": FieldSchema(name="id", type=int, is_optional=False, is_list=False),
573583
"is_active": FieldSchema(name="is_active", type=bool, is_optional=True, is_list=False),
574584
"naics_code": FieldSchema(name="naics_code", type=str, is_optional=True, is_list=False),
585+
# Expand form: shape=naics(code,description). Server PR #2259 also accepts
586+
# naics_code(...) as an alias which the SDK parser normalizes to naics.
587+
"naics": FieldSchema(
588+
name="naics", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
589+
),
575590
"place_of_performance": FieldSchema(
576591
name="place_of_performance", type=str, is_optional=False, is_list=False
577592
),
@@ -611,6 +626,11 @@
611626
),
612627
"meta": FieldSchema(name="meta", type=dict, is_optional=False, is_list=False),
613628
"naics_code": FieldSchema(name="naics_code", type=int, is_optional=True, is_list=False),
629+
# Expand form: shape=naics(code,description). Server PR #2259 also accepts
630+
# naics_code(...) as an alias which the SDK parser normalizes to naics.
631+
"naics": FieldSchema(
632+
name="naics", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
633+
),
614634
"notice_history": FieldSchema(
615635
name="notice_history",
616636
type=dict,
@@ -631,6 +651,11 @@
631651
name="primary_contact", type=dict, is_optional=True, is_list=False, nested_model="Contact"
632652
),
633653
"psc_code": FieldSchema(name="psc_code", type=str, is_optional=True, is_list=False),
654+
# Expand form: shape=psc(code,description). Server PR #2259 also accepts
655+
# psc_code(...) as an alias which the SDK parser normalizes to psc.
656+
"psc": FieldSchema(
657+
name="psc", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
658+
),
634659
"response_deadline": FieldSchema(
635660
name="response_deadline", type=datetime, is_optional=False, is_list=False
636661
),
@@ -654,10 +679,20 @@
654679
name="last_updated", type=datetime, is_optional=False, is_list=False
655680
),
656681
"naics_code": FieldSchema(name="naics_code", type=str, is_optional=True, is_list=False),
682+
# Expand form: shape=naics(code,description). Server PR #2259 also accepts
683+
# naics_code(...) as an alias which the SDK parser normalizes to naics.
684+
"naics": FieldSchema(
685+
name="naics", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
686+
),
657687
"notice_id": FieldSchema(name="notice_id", type=str, is_optional=False, is_list=False),
658688
"opportunity": FieldSchema(name="opportunity", type=dict, is_optional=False, is_list=False),
659689
"posted_date": FieldSchema(name="posted_date", type=datetime, is_optional=False, is_list=False),
660690
"psc_code": FieldSchema(name="psc_code", type=str, is_optional=False, is_list=False),
691+
# Expand form: shape=psc(code,description). Server PR #2259 also accepts
692+
# psc_code(...) as an alias which the SDK parser normalizes to psc.
693+
"psc": FieldSchema(
694+
name="psc", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
695+
),
661696
"response_deadline": FieldSchema(
662697
name="response_deadline", type=datetime, is_optional=False, is_list=False
663698
),
@@ -1120,7 +1155,17 @@
11201155
),
11211156
"opportunity_id": FieldSchema(name="opportunity_id", type=str, is_optional=True, is_list=False),
11221157
"naics_code": FieldSchema(name="naics_code", type=int, is_optional=True, is_list=False),
1158+
# Expand form: shape=naics(code,description). Server PR #2259 also accepts
1159+
# naics_code(...) as an alias which the SDK parser normalizes to naics.
1160+
"naics": FieldSchema(
1161+
name="naics", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
1162+
),
11231163
"psc_code": FieldSchema(name="psc_code", type=str, is_optional=True, is_list=False),
1164+
# Expand form: shape=psc(code,description). Server PR #2259 also accepts
1165+
# psc_code(...) as an alias which the SDK parser normalizes to psc.
1166+
"psc": FieldSchema(
1167+
name="psc", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
1168+
),
11241169
"set_aside": FieldSchema(name="set_aside", type=str, is_optional=True, is_list=False),
11251170
# Shaping expansions
11261171
"awardees": FieldSchema(
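
The schema entries added above give the validator something to match when a shape uses the expand form. The following is a minimal sketch of that idea, not the SDK's actual validation logic: `FieldSchema` here is a stand-in dataclass that only copies the keyword arguments visible in the diff, and `can_expand` is a hypothetical helper that assumes an expansion is allowed when the entry names a nested model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSchema:
    """Stand-in mirroring the keyword arguments used in the diff."""

    name: str
    type: type
    is_optional: bool
    is_list: bool
    nested_model: str | None = None


# Two entries copied from the Contract hunk: the scalar column and the new
# expand entry backed by the CodeDescription nested model.
CONTRACT_FIELDS = {
    "naics_code": FieldSchema(name="naics_code", type=int, is_optional=True, is_list=False),
    "naics": FieldSchema(
        name="naics", type=dict, is_optional=True, is_list=False, nested_model="CodeDescription"
    ),
}


def can_expand(schema: dict[str, FieldSchema], field_name: str) -> bool:
    """Hypothetical rule: `field(...)` is valid only if the entry has a nested model."""
    entry = schema.get(field_name)
    return entry is not None and entry.nested_model is not None


naics_ok = can_expand(CONTRACT_FIELDS, "naics")            # expand entry exists
naics_code_ok = can_expand(CONTRACT_FIELDS, "naics_code")  # scalar only
psc_ok = can_expand(CONTRACT_FIELDS, "psc")                # not registered in this sketch
```

Under this assumed rule, `naics(code,description)` can only validate once a `naics` entry exists, which is why the parser rewrites `naics_code(...)` to `naics(...)` before validation rather than trying to expand the scalar entry.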

tango/shapes/parser.py

Lines changed: 60 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -26,6 +26,62 @@
2626
from tango.shapes.models import FieldSpec, ShapeSpec
2727
from tango.shapes.schema import SchemaRegistry
2828

29+
# Global expand-name aliases. Mirrors the server's `_EXPAND_ALIASES` in
30+
# `api/shaping/grammar.py` (Tango PR #2259, issue #2266). Keys are user-typed
31+
# alias names; values are the canonical names returned by the API and used by
32+
# the schema registry. The rewrite only fires when the alias is used as an
33+
# *expansion* (has nested fields or a wildcard) — bare scalars like
34+
# `shape=naics_code` are left alone and continue to return the raw column.
35+
#
36+
# The canonical name (``naics`` / ``psc``) becomes the output key on the
37+
# response regardless of which spelling the caller used. Keep this list short:
38+
# aliases are for well-known historical spellings, not naming inconsistencies.
39+
_EXPAND_ALIASES: dict[str, str] = {
40+
"naics_code": "naics",
41+
"psc_code": "psc",
42+
}
43+
44+
45+
def _normalize_expand_aliases(fields: list[FieldSpec]) -> None:
46+
"""Rewrite expand-form alias names to their canonical form, in place.
47+
48+
Walks ``fields`` recursively. A field is treated as an "expansion" (and
49+
therefore eligible for alias rewriting) when it has ``nested_fields`` or
50+
``is_wildcard`` set. Bare scalar leaves are left untouched so callers can
51+
still request the raw column value via ``shape=naics_code``.
52+
53+
If both the alias and its canonical name appear as expansions at the same
54+
level (e.g. ``shape=naics(code),naics_code(description)``), the canonical
55+
wins and the alias entry is dropped silently — this matches the server's
56+
behavior and avoids emitting two output keys for the same data.
57+
58+
Args:
59+
fields: List of FieldSpec objects to normalize (mutated in place).
60+
"""
61+
# First pass: collect names of expand-form fields at this level so we can
62+
# detect canonical/alias collisions before rewriting.
63+
expand_names = {f.name for f in fields if f.nested_fields or f.is_wildcard}
64+
65+
# Second pass: rewrite or drop aliases, then recurse into nested fields.
66+
rewritten: list[FieldSpec] = []
67+
for field in fields:
68+
is_expand = bool(field.nested_fields) or field.is_wildcard
69+
canonical = _EXPAND_ALIASES.get(field.name) if is_expand else None
70+
71+
if canonical is not None:
72+
if canonical in expand_names and canonical != field.name:
73+
# Canonical already requested at this level — drop the alias.
74+
continue
75+
field.name = canonical
76+
77+
if field.nested_fields:
78+
_normalize_expand_aliases(field.nested_fields)
79+
80+
rewritten.append(field)
81+
82+
# Replace the contents of the input list (caller holds the reference).
83+
fields[:] = rewritten
84+
2985

3086
def _suggest_field_correction(invalid_field: str, valid_fields: list[str]) -> str | None:
3187
"""Suggest a correction for an invalid field name
@@ -148,6 +204,10 @@ def parse(self, shape: str) -> ShapeSpec:
148204
# Parse the shape
149205
try:
150206
fields = self._parse_field_list(shape, 0)[0]
207+
# Rewrite expand-form aliases (e.g. naics_code(...) -> naics(...))
208+
# to their canonical names. Mirrors server's `_EXPAND_ALIASES` so
209+
# both spellings are accepted client-side. See issue #2266.
210+
_normalize_expand_aliases(fields)
151211
shape_spec = ShapeSpec(fields=fields)
152212

153213
# Cache the result
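
The normalization pass in this hunk can be exercised without the SDK. The sketch below restates the same two-pass logic against a stand-in `Field` dataclass (a hypothetical substitute for `FieldSpec`, carrying only the three attributes the pass reads); the helper names `normalize` and `names` are illustrative and do not exist in the package.

```python
from __future__ import annotations

from dataclasses import dataclass

# Same two entries as the _EXPAND_ALIASES map in the diff above.
EXPAND_ALIASES: dict[str, str] = {"naics_code": "naics", "psc_code": "psc"}


@dataclass
class Field:
    """Hypothetical stand-in for tango.shapes.models.FieldSpec."""

    name: str
    nested_fields: list[Field] | None = None
    is_wildcard: bool = False


def normalize(fields: list[Field]) -> None:
    """Rewrite expand-form aliases in place; bare scalars keep their name."""
    # First pass: names used as expansions at this level.
    expand_names = {f.name for f in fields if f.nested_fields or f.is_wildcard}
    kept: list[Field] = []
    for field in fields:
        is_expand = bool(field.nested_fields) or field.is_wildcard
        canonical = EXPAND_ALIASES.get(field.name) if is_expand else None
        if canonical is not None:
            if canonical in expand_names:
                continue  # canonical form also requested here: drop the alias
            field.name = canonical
        if field.nested_fields:
            normalize(field.nested_fields)  # recurse into child fields
        kept.append(field)
    fields[:] = kept  # mutate the caller's list


def names(fields: list[Field]) -> list[str]:
    return [f.name for f in fields]


# Scalar and expand forms of the same alias coexist.
scalar_and_expand = [Field("naics_code"), Field("naics_code", [Field("code")])]
normalize(scalar_and_expand)

# Canonical and alias expansions at the same level: the alias is dropped.
collision = [Field("naics", [Field("code")]), Field("naics_code", [Field("description")])]
normalize(collision)

# An alias nested inside another expansion is rewritten by the recursive walk.
nested = [Field("recipient", [Field("uei"), Field("psc_code", is_wildcard=True)])]
normalize(nested)
```

Assigning through `fields[:]` rather than rebinding the name is what lets the real `parse()` call the function for its side effect and then build the `ShapeSpec` from the same list.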

tests/test_shapes.py

Lines changed: 136 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -465,6 +465,142 @@ def test_validate_wildcard_always_valid(self):
465465
parser.validate(spec, MockModel) # Should not raise
466466

467467

468+
class TestShapeParserExpandAliases:
469+
"""Test naics_code/psc_code -> naics/psc expand-alias normalization.
470+
471+
Mirrors the server's `_EXPAND_ALIASES` map (Tango PR #2259, issue #2266):
472+
when used as an expansion (with parens / wildcard), `naics_code(...)` is
473+
rewritten to `naics(...)` and `psc_code(...)` to `psc(...)`. Bare scalar
474+
leaves are left alone so `shape=naics_code` still returns the raw column.
475+
"""
476+
477+
def test_canonical_naics_expand_accepted_on_contract(self):
478+
"""Canonical `naics(code,description)` validates against Contract."""
479+
parser = ShapeParser()
480+
spec = parser.parse("naics(code,description)")
481+
482+
assert spec.fields[0].name == "naics"
483+
assert spec.fields[0].nested_fields is not None
484+
assert [f.name for f in spec.fields[0].nested_fields] == ["code", "description"]
485+
486+
parser.validate(spec, Contract) # Should not raise
487+
488+
def test_alias_naics_code_expand_rewritten_to_naics(self):
489+
"""Alias form `naics_code(code,description)` is rewritten to `naics`."""
490+
parser = ShapeParser()
491+
spec = parser.parse("naics_code(code,description)")
492+
493+
# Name is rewritten at parse time so downstream type generation and
494+
# factory parsing see the canonical key the server returns.
495+
assert spec.fields[0].name == "naics"
496+
assert [f.name for f in spec.fields[0].nested_fields] == ["code", "description"]
497+
498+
parser.validate(spec, Contract) # Should not raise
499+
500+
def test_canonical_psc_expand_accepted_on_contract(self):
501+
"""Canonical `psc(code,description)` validates against Contract."""
502+
parser = ShapeParser()
503+
spec = parser.parse("psc(code,description)")
504+
505+
assert spec.fields[0].name == "psc"
506+
parser.validate(spec, Contract) # Should not raise
507+
508+
def test_alias_psc_code_expand_rewritten_to_psc(self):
509+
"""Alias form `psc_code(code,description)` is rewritten to `psc`."""
510+
parser = ShapeParser()
511+
spec = parser.parse("psc_code(code,description)")
512+
513+
assert spec.fields[0].name == "psc"
514+
parser.validate(spec, Contract) # Should not raise
515+
516+
def test_bare_naics_code_scalar_is_not_rewritten(self):
517+
"""Bare scalar `naics_code` keeps its name (returns raw column)."""
518+
parser = ShapeParser()
519+
spec = parser.parse("naics_code")
520+
521+
# Scalar form is NOT touched — the alias only fires for expansions.
522+
assert spec.fields[0].name == "naics_code"
523+
assert spec.fields[0].nested_fields is None
524+
assert spec.fields[0].is_wildcard is False
525+
526+
parser.validate(spec, Contract) # naics_code is a real scalar field
527+
528+
def test_bare_psc_code_scalar_is_not_rewritten(self):
529+
"""Bare scalar `psc_code` keeps its name (returns raw column)."""
530+
parser = ShapeParser()
531+
spec = parser.parse("psc_code")
532+
533+
assert spec.fields[0].name == "psc_code"
534+
assert spec.fields[0].nested_fields is None
535+
536+
parser.validate(spec, Contract)
537+
538+
def test_alias_naics_code_wildcard_expand_rewritten(self):
539+
"""Wildcard expansion `naics_code(*)` is rewritten to `naics(*)`."""
540+
parser = ShapeParser()
541+
spec = parser.parse("naics_code(*)")
542+
543+
assert spec.fields[0].name == "naics"
544+
assert spec.fields[0].is_wildcard is True
545+
546+
parser.validate(spec, Contract) # Should not raise
547+
548+
def test_alias_collision_drops_alias_keeps_canonical(self):
549+
"""When both `naics(...)` and `naics_code(...)` appear, canonical wins.
550+
551+
Matches server behavior — emitting two output keys for the same data
552+
would surprise callers, so the alias entry is dropped silently.
553+
"""
554+
parser = ShapeParser()
555+
spec = parser.parse("naics(code),naics_code(description)")
556+
557+
# Only one entry should remain — the canonical `naics` one.
558+
names = [f.name for f in spec.fields]
559+
assert names == ["naics"]
560+
assert [f.name for f in spec.fields[0].nested_fields] == ["code"]
561+
562+
def test_scalar_and_expand_alias_coexist(self):
563+
"""Scalar `naics_code` and expand `naics_code(...)` both survive.
564+
565+
The expand gets rewritten to `naics`; the scalar stays as
566+
`naics_code`. They're now distinct keys with distinct meanings —
567+
the scalar returns the raw int/str, the expand returns the dict.
568+
"""
569+
parser = ShapeParser()
570+
spec = parser.parse("key,naics_code,naics_code(code,description)")
571+
572+
names = [f.name for f in spec.fields]
573+
assert names == ["key", "naics_code", "naics"]
574+
assert spec.fields[1].nested_fields is None # scalar
575+
assert spec.fields[2].nested_fields is not None # expand
576+
577+
def test_alias_rewrite_applies_in_nested_expansions(self):
578+
"""Aliases nested inside another expansion are also rewritten.
579+
580+
The parent expansion field is unrelated; we just want to confirm the
581+
normalization walks recursively through ``nested_fields``.
582+
"""
583+
parser = ShapeParser()
584+
# `recipient` is a valid expansion on Contract; nest a naics_code
585+
# alias inside it to confirm the walk recurses.
586+
spec = parser.parse("recipient(uei,naics_code(code))")
587+
588+
assert [f.name for f in spec.fields] == ["recipient"]
589+
assert [f.name for f in spec.fields[0].nested_fields] == ["uei", "naics"]
590+
591+
def test_alias_accepted_on_opportunity(self):
592+
"""Server accepts the alias on opportunities too — schema covers it."""
593+
# Contract is already covered above. Smoke-test that the validator
594+
# also finds `naics` on Opportunity, a schema that previously only
595+
# had `naics_code`.
596+
from tango.models import Opportunity
597+
598+
parser = ShapeParser()
599+
spec = parser.parse("naics_code(code,description)")
600+
assert spec.fields[0].name == "naics"
601+
parser.validate(spec, Opportunity) # Should not raise
602+
603+
468604
class TestShapeParserCaching:
469605
"""Test shape parser caching"""
470606
