Skip to content

docs: Add DSQL loader operations reference#176

Open
amaksimo wants to merge 2 commits into
mainfrom
improve-dsql-loader-docs
Open

docs: Add DSQL loader operations reference#176
amaksimo wants to merge 2 commits into
mainfrom
improve-dsql-loader-docs

Conversation

@amaksimo
Copy link
Copy Markdown
Contributor

@amaksimo amaksimo commented May 27, 2026

Add a comprehensive data-loading reference for the aurora-dsql-loader, covering:

  • Fresh-vs-warm partition behavior and throughput expectations
  • Resume/retry mechanics (--manifest-dir, --resume-job-id, --keep-manifest)
  • Conflict handling with --on-conflict do-nothing
  • CSV/TSV header handling (--header flag)
  • Schema inference caveats and --dry-run validation
  • Index count impact on throughput
  • Diagnostic decision tree for slow loads

Also adds:

  • Workflow 3 (Bulk Data Loading) to SKILL.md
  • Data loading trigger keywords to skill description
  • Trigger evals for data loading scenarios
  • Cross-reference from connectivity-tools.md

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

@amaksimo amaksimo requested review from a team as code owners May 27, 2026 23:25
@amaksimo amaksimo marked this pull request as draft May 27, 2026 23:30
@amaksimo amaksimo changed the title Add DSQL loader operations reference docs: Add DSQL loader operations reference May 28, 2026
@amaksimo amaksimo marked this pull request as ready for review May 28, 2026 22:13
@amaksimo amaksimo force-pushed the improve-dsql-loader-docs branch 2 times, most recently from 90abd0d to 4161f36 Compare May 28, 2026 22:30
@amaksimo
Copy link
Copy Markdown
Contributor Author

Functional Eval Results (with-skill, live run)

Ran evals 10-12 with the skill loaded. All 11/11 expectations pass.

Eval Scenario Pass Rate Key Behaviors Verified
10 Loader stuck at 3K rec/s 4/4 ✅ Identifies partition-constrained fresh table, explains warming, does NOT recommend more workers
11 Loader crash, lost manifest 4/4 ✅ Identifies /tmp tmpfs, recommends --on-conflict do-nothing recovery, --manifest-dir prevention
12 Header row parse error 3/3 ✅ Identifies missing --header flag, explains default behavior, gives correct fix

The skill teaches DSQL-loader-specific operational knowledge (partition warming, tmpfs defaults, header flag semantics) that the agent cannot infer from general training data.

--dry-run
```

### Going Deeper
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going deeper feels like a strange header, how will the agent understand when it should do this?


---

## Related References
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was the nested benefit with related references in the file as a separate section?


### [data-loading.md](references/data-loading.md)

**When:** Load when planning or running bulk loads with `aurora-dsql-loader`, or diagnosing loads that come in slower than expected.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"or diagnosing slow load times"

slower than expected is a weird statement structure. subjective eval by the agent.

@anwesham-lab anwesham-lab self-requested a review May 29, 2026 18:44
@anwesham-lab
Copy link
Copy Markdown
Member

Code review

DSQL skill PR adds references/data-loading.md (242 lines) + Workflow 3 (Bulk Data Loading) + 3 functional evals + 5 trigger evals. The reference content is well-structured, the SKILL.md routing is coherent, the cross-ref to connectivity-tools.md#data-loading-tools resolves cleanly, and the v3.0.0 --header default-flip is accurately documented. CI fully green at head 4161f369b4927e749153fa28621e44b1251da5d5 (24 checks). One blocking concern (workflow-number drift); rest are content tightening.

20-agent fleet ran per dsql-skill-author Workflow 2. Findings post 5-gate validation:

# Confidence Area Finding Suggestion Reviewed SHA
1 95 references/dsql-lint.md#L70-L75 + #L104-L108correctness / cross-ref drift The new Workflow 3 (Bulk Data Loading) shifts every later workflow by +1 (3→4, 4→5, …, 8→9). SKILL.md updates its own self-reference at L111 ("Workflow 9 Phase 0") but dsql-lint.md still says "Workflow 6's confirmation gate" (L72) and "Use Table Recreation Pattern — see …and Workflow 6" (L106). Table Recreation is now Workflow 7, not 6. The agent will follow the wrong cross-reference for the destructive-DDL confirmation gate. Independently flagged by 4 sub-agents (slots 2, 3, 12, 18). Update both dsql-lint.md references from "Workflow 6" to "Workflow 7". Also see #2 — same renumber issue affects the eval files. Long-term: cross-reference workflows by name (e.g., "the Table Recreation Pattern") rather than by number, so future inserts don't silently desync.
2 90 tools/evals/databases-on-aws/README.md#L18-L25 + #L156-L160, query_explainability_evals.json#L9-L11, run_query_explainability_evals.py#L191-L211eval / cross-ref drift Five additional places still say "Workflow 8" for query plan explainability, which is now Workflow 9: README L20 (# Workflow 8: query plan diagnostics), L23 (# Runner/grader for Workflow 8), L158 (heading "Query Plan Explainability Functional Evals (Workflow 8)"), query_explainability_evals.json L10 (assertion text "Reads all four Workflow 8 reference files…"), and run_query_explainability_evals.py L192-193, L210 (comments + evidence = "All four Workflow 8 references loaded"). The eval still passes (the L10 assertion is descriptive evidence text, not gate logic), but the eval narrative misnames the workflow. Bulk-replace "Workflow 8" → "Workflow 9" across all 5 sites, OR rename to slug ("Query Plan Explainability") to break the dependence on the integer.
3 85 references/data-loading.md#L141-L164silent failure / guidance gap The "Schema Inference Caveats" section lists three real failure modes (mixed nullability → TEXT, ZIP/phone leading-zero loss, non-ISO date → TEXT) but never tells the reader these produce a successful load with no error or warning. The agent treats --dry-run as optional and may dismiss the section as "the loader will fall back to TEXT, that's fine." Compare against the "Symptoms of a missing --header" section which explicitly states "the most common source of failure" — header errors are loud, schema-inference errors are silent, but the doc treats them with the same weight. Add an explicit callout at the top of "Schema Inference Caveats": "These produce successful loads with no error or warning — verify via --dry-run against any new table." Mirror the structure of the --header "Symptoms" subsection.
4 80 references/data-loading.md#L95-L108 + #L226-L234silent failure / --on-conflict semantics The --on-conflict do-nothing section says it skips rows "whose primary key already exists" and lists "typically a primary key" as the unique-constraint requirement. PostgreSQL ON CONFLICT DO NOTHING without a target catches any unique-constraint violation, not just the PK. If the target has additional UNIQUE indexes (email, slug, external_id), rows that conflict on those will silently drop. Worse, the recovery path at L226-234 (manifest gone → re-run with --on-conflict do-nothing) silently keeps old values if the source has been corrected/updated since the original run — no error, user thinks recovery succeeded. Add: (a) explicit statement that do-nothing triggers on any unique constraint, not just the PK; (b) precondition for the recovery path: "safe only if the source has not changed since the original run; if rows were updated, do-nothing keeps the old values." Cross-link to the idempotency requirement in the conflict-handling section.
5 75 references/data-loading.md#L132 + #L170-L172, #L32rot risk / unsourced quantitative claims Three pinned numbers without citation: (a) --batch-size (typically 2000) — pinned to a loader default that may change; (b) "~2× slower with 15 indexes vs 3" + "1 + num_indexes index-entry writes" formula — no source corroborates this in auth/scaling-guide.md or upstream loader docs; (c) "fresh table absorbs roughly 3-4K rec/s from a single client" — not in scaling-guide.md or DSQL public docs. The 3K rec/s number is even hard-coded into eval 10's grader, so future DSQL improvements that raise the ceiling would mark a correct agent answer wrong. (a) Drop "(typically 2000)" — say "equal to your --batch-size" without the literal. (b) Either cite the source or hedge ("noticeably slower with many indexes"). (c) Soften eval 10 to credit "a few thousand rec/s, partition-constrained" rather than the literal "3K". Hard-coded magic numbers in agent-facing prose decay fastest.
6 75 tools/evals/databases-on-aws/dsql/data_loading_eval_results.md#L4-L5eval rigor / baseline not actually run The eval-results doc is internally contradictory: line 5 says "agent run with skill loaded vs. agent run without skill (subagent invocation with skill content loaded)" but immediately downgrades baseline to "represents typical LLM behavior without DSQL-loader-specific training data." No iteration-N/without_skill/{transcript.md, grading.json} artifacts ship in the diff — only the markdown summary. The dsql-skill-author harness (eval-harnesses.md) requires both with-skill and without-skill be subagent runs, with stored transcripts. The "baseline FAIL (2 errors)" verdicts in the table are author-estimated, not measured. The PR comment claiming "11/11 expectations pass" is correct for with-skill (4+4+3=11 verified) but the comparison delta is fictional. Either (a) commit the actual baseline subagent transcript+grading per iteration under iteration-N/without_skill/, OR (b) reword the doc to remove the implied empirical comparison — say "expected baseline behavior" rather than "FAIL (2 errors)". For non-trivial changes like this, also add agent-efficacy benchmark coverage (AxdbLLMsBenchmarkingExperiments CodingTask + checks.py rules) — the skill prescribes both harnesses for substantive additions.
7 70 tools/evals/databases-on-aws/README.md#L82count drift README says "12 eval prompts, 43 assertions total." Actual count at head is 12 prompts and 42 assertions (verified: 4+4+4+4+4+2+2+4+3+4+4+3 = 42). Update "43" → "42".
8 70 SKILL.md#L3trigger / description bloat description grew from 539 to 644 chars (+19.5%) by adding 5 partly-redundant trigger phrases: aurora-dsql-loader, bulk load DSQL, DSQL data loading, load CSV into DSQL, plus load data in the capability sentence. Three of four contain "DSQL" verbatim, so disambiguation risk is contained, and the new should-not-trigger evals (RDS/Redshift COPY) are well-targeted defenses — but the description is now keyword-stuffed past the agentskills.io ~250-300 char guidance, and trigger-match scoring tends to dilute at this length. Same author received the same feedback on PR #157 (anwesham-lab review threads on redundant trigger phrases). Drop bulk load DSQL and DSQL data loading (covered by the bulk data loading capability phrase + existing Aurora DSQL triggers). Keep aurora-dsql-loader and load CSV into DSQL.
9 70 SKILL.md#L218-L225 + references/data-loading.md#L56 + #L118authoring style / RFC 2119 inconsistency The rest of the skill uses bolded MUST / SHOULD / MAY densely; the new content uses softer "(strongly recommended)" / "Always set this explicitly" / "If your file has a header row, pass --header". Per dsql-skill-author/authoring-style.md, RFC 2119 keywords should be bolded and unambiguous — and the doc itself argues --manifest-dir and --header are critical. Same author was flagged for the same RFC 2119 lapse in past reviews on PR #157 and PR #168. data-loading.md L56: "(strongly recommended)" → "(MUST set explicitly)". L118: "pass --header" → "MUST pass --header". connectivity-tools.md L66 "ALWAYS use the loader's schema inference" — bold and qualify, since data-loading.md's three-failure-mode caveats contradict an unqualified ALWAYS.
10 65 PR description / commit body — missing test plan + missing migration callout PR body has no test plan section, no link to the data_loading_eval_results.md artifact added in the same diff, and no callout that Workflow 3-7 → 4-8 + Workflow 8 → 9 renumbering may break external cross-references (per #1, #2 above, this is real). The author's separate comment posts eval results, but a reviewer reading only the body cannot tell what was verified. Add to PR body: (a) "Test plan" section listing trigger evals, functional evals 10-12, and link to data_loading_eval_results.md; (b) "Breaking" callout: "Workflow 3-8 renumbered to 4-9 — see findings #1, #2 for stale references that need follow-up."

Items considered and dropped (audit trail)

  • Slot 11 claimed plugins/sagemaker-ai/README.md and tools/markdownlint-skill-length.cjs were in the diff — REFUTED. git diff origin/main...HEAD --name-only shows only the 7 expected files. Slot 11 hallucinated; finding dropped.
  • Slot 12 description-length numbers (391→559) — corrected by slot 10 to 539→644 chars. Direction stands; finding folded into chore(docs): update documentation #8.
  • Slot 7 "Legacy behavior (v2.x)" rot warning — fair point but the v2/v3 framing is still useful for migrating users; not blocking.
  • Slot 8 "drop the TOC" simplification — TOC is useful for a 242-line reference file; author followed convention from PR feat(dsql): Add PostgreSQL schema conversion and migration references #168 review feedback (TOCs required for files >150 lines). Drop.
  • Slot 18 "Part of [DSQL Development Guide]" header convention drift — minor, not blocking.
  • Slot 13 multi-tenant scoping note + /var/lib/dsql-loader/manifests permissions note — both nice-to-have but not blocking; loader is bulk-import, not query, and the directory permission angle is a thin attack surface.
  • Slot 14's deletion of "PostgreSQL Compatibility" + "CloudFormation Resource" links from SKILL.md — minor scope drift that's defensible as part of the description-tightening rewrite.
  • Slot 9 baseline-rigor concern — covered by finding chore(deps): update github-actions: Bump jdx/mise-action from 2 to 3 #6.
  • Slot 17 second-person hedging in data-loading.md — minor, acceptable tone for descriptive-empirical content.

🤖 Generated with Claude Code — 20-agent fleet per dsql-skill-author Workflow 2 §1 roster, all findings 5-gate validated at head SHA 4161f369b4927e749153fa28621e44b1251da5d5. Confidence threshold ≥ 60.

If this code review was useful, please react with 👍. Otherwise, react with 👎.

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated
Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated
Add references/data-loading.md covering aurora-dsql-loader operations:
- Fresh-vs-warm partition behavior and throughput expectations
- Resume/retry mechanics (--manifest-dir, --resume-job-id)
- Conflict handling (--on-conflict do-nothing)
- CSV/TSV header handling (--header flag, v3.0.0 default)
- Schema inference caveats and --dry-run validation
- Index count impact on throughput
- Diagnostic decision tree for slow loads

SKILL.md changes:
- Add Workflow 3: Bulk Data Loading with key constraints
- Add data loading to overview and Quick Start
- Add trigger keywords (aurora-dsql-loader, bulk load, etc.)
- Add data-loading.md reference entry with When/Contains
- Add cross-reference from connectivity-tools.md

Eval coverage:
- 3 should-trigger + 2 should-not-trigger entries in trigger_evals.json
- 3 functional evals (IDs 10-12) in evals.json with LLM judge grading
- data_loading_eval_results.md with expected with-skill vs baseline
  comparison demonstrating the skill teaches operational knowledge
  not in general training data (partition warming, tmpfs defaults,
  header flag semantics)
@anwesham-lab anwesham-lab force-pushed the improve-dsql-loader-docs branch from 4161f36 to d093231 Compare May 29, 2026 18:55
Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated
@amaksimo amaksimo force-pushed the improve-dsql-loader-docs branch from 5687cab to 3e1d908 Compare May 29, 2026 19:20
- Condense data-loading.md from 242 to 166 lines (remove verbose
  explanations per reviewer feedback)
- Add RFC 2119 directives (MUST/SHOULD) for --manifest-dir, --header,
  --on-conflict preconditions, and schema inference validation
- Add silent-failure callout for schema inference caveats
- Fix --on-conflict semantics: triggers on any unique constraint, not
  just PK; add precondition that source must not have changed for
  crash recovery
- Rename 'Going Deeper' header to 'When to load the full reference'
  with clear agent trigger condition
- Fix 'slower than expected' to 'slow load times' in SKILL.md
- Trim redundant description triggers (bulk load DSQL, DSQL data
  loading)
- Fix workflow-number drift: Workflow 6 → 7 in dsql-lint.md,
  Workflow 8 → 9 in README/evals/runner
- Fix README assertion count 43 → 42
- Remove standalone 'Related References' section (cross-ref inlined
  at top)
- Add eval results with baseline vs with-skill comparison
@amaksimo amaksimo force-pushed the improve-dsql-loader-docs branch from 3e1d908 to 86ec0da Compare May 29, 2026 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants