docs: Add DSQL loader operations reference by amaksimo · Pull Request #176 · awslabs/agent-plugins

amaksimo · 2026-05-27T23:25:43Z

Add a comprehensive data-loading reference for the aurora-dsql-loader, covering:

Fresh-vs-warm partition behavior and throughput expectations
Resume/retry mechanics (--manifest-dir, --resume-job-id, --keep-manifest)
Conflict handling with --on-conflict do-nothing
CSV/TSV header handling (--header flag)
Schema inference caveats and --dry-run validation
Index count impact on throughput
Diagnostic decision tree for slow loads

Also adds:

Workflow 3 (Bulk Data Loading) to SKILL.md
Data loading trigger keywords to skill description
Trigger evals for data loading scenarios
Cross-reference from connectivity-tools.md

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

amaksimo · 2026-05-28T22:30:54Z

Functional Eval Results (with-skill, live run)

Ran evals 10-12 with the skill loaded. All 11/11 expectations pass.

Eval	Scenario	Pass Rate	Key Behaviors Verified
10	Loader stuck at 3K rec/s	4/4 ✅	Identifies partition-constrained fresh table, explains warming, does NOT recommend more workers
11	Loader crash, lost manifest	4/4 ✅	Identifies /tmp tmpfs, recommends --on-conflict do-nothing recovery, --manifest-dir prevention
12	Header row parse error	3/3 ✅	Identifies missing --header flag, explains default behavior, gives correct fix

The skill teaches DSQL-loader-specific operational knowledge (partition warming, tmpfs defaults, header flag semantics) that the agent cannot infer from general training data.

anwesham-lab · 2026-05-29T18:34:46Z

  --dry-run
 ```
+
+### Going Deeper


going deeper feels like a strange header, how will the agent understand when it should do this?

anwesham-lab · 2026-05-29T18:35:23Z

+
+---
+
+## Related References


what was the nested benefit with related references in the file as a separate section?

anwesham-lab · 2026-05-29T18:43:55Z


+### [data-loading.md](references/data-loading.md)
+
+**When:** Load when planning or running bulk loads with `aurora-dsql-loader`, or diagnosing loads that come in slower than expected.


"or diagnosing slow load times"

slower than expected is a weird statement structure. subjective eval by the agent.

anwesham-lab · 2026-05-29T18:46:23Z

Code review

DSQL skill PR adds references/data-loading.md (242 lines) + Workflow 3 (Bulk Data Loading) + 3 functional evals + 5 trigger evals. The reference content is well-structured, the SKILL.md routing is coherent, the cross-ref to connectivity-tools.md#data-loading-tools resolves cleanly, and the v3.0.0 --header default-flip is accurately documented. CI fully green at head 4161f369b4927e749153fa28621e44b1251da5d5 (24 checks). One blocking concern (workflow-number drift); rest are content tightening.

20-agent fleet ran per dsql-skill-author Workflow 2. Findings post 5-gate validation:

#	Confidence	Area	Finding	Suggestion
1	95	`references/dsql-lint.md#L70-L75` + `#L104-L108` — correctness / cross-ref drift	The new Workflow 3 (Bulk Data Loading) shifts every later workflow by +1 (3→4, 4→5, …, 8→9). SKILL.md updates its own self-reference at L111 ("Workflow 9 Phase 0") but `dsql-lint.md` still says "Workflow 6's confirmation gate" (L72) and "Use Table Recreation Pattern — see …and Workflow 6" (L106). Table Recreation is now Workflow 7, not 6. The agent will follow the wrong cross-reference for the destructive-DDL confirmation gate. Independently flagged by 4 sub-agents (slots 2, 3, 12, 18).	Update both `dsql-lint.md` references from "Workflow 6" to "Workflow 7". Also see #2 — same renumber issue affects the eval files. Long-term: cross-reference workflows by name (e.g., "the Table Recreation Pattern") rather than by number, so future inserts don't silently desync.
2	90	`tools/evals/databases-on-aws/README.md#L18-L25` + `#L156-L160`, `query_explainability_evals.json#L9-L11`, `run_query_explainability_evals.py#L191-L211` — eval / cross-ref drift	Five additional places still say "Workflow 8" for query plan explainability, which is now Workflow 9: README L20 (`# Workflow 8: query plan diagnostics`), L23 (`# Runner/grader for Workflow 8`), L158 (heading "Query Plan Explainability Functional Evals (Workflow 8)"), `query_explainability_evals.json` L10 (assertion text "Reads all four Workflow 8 reference files…"), and `run_query_explainability_evals.py` L192-193, L210 (comments + `evidence = "All four Workflow 8 references loaded"`). The eval still passes (the L10 assertion is descriptive evidence text, not gate logic), but the eval narrative misnames the workflow.	Bulk-replace "Workflow 8" → "Workflow 9" across all 5 sites, OR rename to slug ("Query Plan Explainability") to break the dependence on the integer.
3	85	`references/data-loading.md#L141-L164` — silent failure / guidance gap	The "Schema Inference Caveats" section lists three real failure modes (mixed nullability → TEXT, ZIP/phone leading-zero loss, non-ISO date → TEXT) but never tells the reader these produce a successful load with no error or warning. The agent treats `--dry-run` as optional and may dismiss the section as "the loader will fall back to TEXT, that's fine." Compare against the "Symptoms of a missing `--header`" section which explicitly states "the most common source of failure" — header errors are loud, schema-inference errors are silent, but the doc treats them with the same weight.	Add an explicit callout at the top of "Schema Inference Caveats": "These produce successful loads with no error or warning — verify via `--dry-run` against any new table." Mirror the structure of the `--header` "Symptoms" subsection.
4	80	`references/data-loading.md#L95-L108` + `#L226-L234` — silent failure / `--on-conflict` semantics	The `--on-conflict do-nothing` section says it skips rows "whose primary key already exists" and lists "typically a primary key" as the unique-constraint requirement. PostgreSQL `ON CONFLICT DO NOTHING` without a target catches any unique-constraint violation, not just the PK. If the target has additional UNIQUE indexes (email, slug, external_id), rows that conflict on those will silently drop. Worse, the recovery path at L226-234 (manifest gone → re-run with `--on-conflict do-nothing`) silently keeps old values if the source has been corrected/updated since the original run — no error, user thinks recovery succeeded.	Add: (a) explicit statement that `do-nothing` triggers on any unique constraint, not just the PK; (b) precondition for the recovery path: "safe only if the source has not changed since the original run; if rows were updated, `do-nothing` keeps the old values." Cross-link to the idempotency requirement in the conflict-handling section.
5	75	`references/data-loading.md#L132` + `#L170-L172`, `#L32` — rot risk / unsourced quantitative claims	Three pinned numbers without citation: (a) `--batch-size (typically 2000)` — pinned to a loader default that may change; (b) "~2× slower with 15 indexes vs 3" + "1 + num_indexes index-entry writes" formula — no source corroborates this in `auth/scaling-guide.md` or upstream loader docs; (c) "fresh table absorbs roughly 3-4K rec/s from a single client" — not in `scaling-guide.md` or DSQL public docs. The 3K rec/s number is even hard-coded into eval 10's grader, so future DSQL improvements that raise the ceiling would mark a correct agent answer wrong.	(a) Drop "(typically 2000)" — say "equal to your `--batch-size`" without the literal. (b) Either cite the source or hedge ("noticeably slower with many indexes"). (c) Soften eval 10 to credit "a few thousand rec/s, partition-constrained" rather than the literal "3K". Hard-coded magic numbers in agent-facing prose decay fastest.
6	75	`tools/evals/databases-on-aws/dsql/data_loading_eval_results.md#L4-L5` — eval rigor / baseline not actually run	The eval-results doc is internally contradictory: line 5 says "agent run with skill loaded vs. agent run without skill (subagent invocation with skill content loaded)" but immediately downgrades baseline to "represents typical LLM behavior without DSQL-loader-specific training data." No `iteration-N/without_skill/{transcript.md, grading.json}` artifacts ship in the diff — only the markdown summary. The dsql-skill-author harness (`eval-harnesses.md`) requires both with-skill and without-skill be subagent runs, with stored transcripts. The "baseline FAIL (2 errors)" verdicts in the table are author-estimated, not measured. The PR comment claiming "11/11 expectations pass" is correct for with-skill (4+4+3=11 verified) but the comparison delta is fictional.	Either (a) commit the actual baseline subagent transcript+grading per iteration under `iteration-N/without_skill/`, OR (b) reword the doc to remove the implied empirical comparison — say "expected baseline behavior" rather than "FAIL (2 errors)". For non-trivial changes like this, also add agent-efficacy benchmark coverage (`AxdbLLMsBenchmarkingExperiments` `CodingTask` + `checks.py` rules) — the skill prescribes both harnesses for substantive additions.
7	70	`tools/evals/databases-on-aws/README.md#L82` — count drift	README says "12 eval prompts, 43 assertions total." Actual count at head is 12 prompts and 42 assertions (verified: `4+4+4+4+4+2+2+4+3+4+4+3 = 42`).	Update "43" → "42".
8	70	`SKILL.md#L3` — trigger / description bloat	`description` grew from 539 to 644 chars (+19.5%) by adding 5 partly-redundant trigger phrases: `aurora-dsql-loader`, `bulk load DSQL`, `DSQL data loading`, `load CSV into DSQL`, plus `load data` in the capability sentence. Three of four contain "DSQL" verbatim, so disambiguation risk is contained, and the new should-not-trigger evals (RDS/Redshift COPY) are well-targeted defenses — but the description is now keyword-stuffed past the agentskills.io ~250-300 char guidance, and trigger-match scoring tends to dilute at this length. Same author received the same feedback on PR #157 (anwesham-lab review threads on redundant trigger phrases).	Drop `bulk load DSQL` and `DSQL data loading` (covered by the `bulk data loading` capability phrase + existing `Aurora DSQL` triggers). Keep `aurora-dsql-loader` and `load CSV into DSQL`.
9	70	`SKILL.md#L218-L225` + `references/data-loading.md#L56` + `#L118` — authoring style / RFC 2119 inconsistency	The rest of the skill uses bolded MUST / SHOULD / MAY densely; the new content uses softer "(strongly recommended)" / "Always set this explicitly" / "If your file has a header row, pass `--header`". Per `dsql-skill-author/authoring-style.md`, RFC 2119 keywords should be bolded and unambiguous — and the doc itself argues `--manifest-dir` and `--header` are critical. Same author was flagged for the same RFC 2119 lapse in past reviews on PR #157 and PR #168.	data-loading.md L56: "(strongly recommended)" → "(MUST set explicitly)". L118: "pass `--header`" → "MUST pass `--header`". connectivity-tools.md L66 "ALWAYS use the loader's schema inference" — bold and qualify, since data-loading.md's three-failure-mode caveats contradict an unqualified ALWAYS.
10	65	PR description / commit body — missing test plan + missing migration callout	PR body has no test plan section, no link to the `data_loading_eval_results.md` artifact added in the same diff, and no callout that Workflow 3-7 → 4-8 + Workflow 8 → 9 renumbering may break external cross-references (per #1, #2 above, this is real). The author's separate comment posts eval results, but a reviewer reading only the body cannot tell what was verified.	Add to PR body: (a) "Test plan" section listing trigger evals, functional evals 10-12, and link to `data_loading_eval_results.md`; (b) "Breaking" callout: "Workflow 3-8 renumbered to 4-9 — see findings #1, #2 for stale references that need follow-up."

Items considered and dropped (audit trail)

Slot 11 claimed plugins/sagemaker-ai/README.md and tools/markdownlint-skill-length.cjs were in the diff — REFUTED. git diff origin/main...HEAD --name-only shows only the 7 expected files. Slot 11 hallucinated; finding dropped.
Slot 12 description-length numbers (391→559) — corrected by slot 10 to 539→644 chars. Direction stands; finding folded into chore(docs): update documentation #8.
Slot 7 "Legacy behavior (v2.x)" rot warning — fair point but the v2/v3 framing is still useful for migrating users; not blocking.
Slot 8 "drop the TOC" simplification — TOC is useful for a 242-line reference file; author followed convention from PR feat(dsql): Add PostgreSQL schema conversion and migration references #168 review feedback (TOCs required for files >150 lines). Drop.
Slot 18 "Part of [DSQL Development Guide]" header convention drift — minor, not blocking.
Slot 13 multi-tenant scoping note + /var/lib/dsql-loader/manifests permissions note — both nice-to-have but not blocking; loader is bulk-import, not query, and the directory permission angle is a thin attack surface.
Slot 14's deletion of "PostgreSQL Compatibility" + "CloudFormation Resource" links from SKILL.md — minor scope drift that's defensible as part of the description-tightening rewrite.
Slot 9 baseline-rigor concern — covered by finding chore(deps): update github-actions: Bump jdx/mise-action from 2 to 3 #6.
Slot 17 second-person hedging in data-loading.md — minor, acceptable tone for descriptive-empirical content.

🤖 Generated with Claude Code — 20-agent fleet per dsql-skill-author Workflow 2 §1 roster, all findings 5-gate validated at head SHA 4161f369b4927e749153fa28621e44b1251da5d5. Confidence threshold ≥ 60.

_{If this code review was useful, please react with 👍. Otherwise, react with 👎.}

Add references/data-loading.md covering aurora-dsql-loader operations: - Fresh-vs-warm partition behavior and throughput expectations - Resume/retry mechanics (--manifest-dir, --resume-job-id) - Conflict handling (--on-conflict do-nothing) - CSV/TSV header handling (--header flag, v3.0.0 default) - Schema inference caveats and --dry-run validation - Index count impact on throughput - Diagnostic decision tree for slow loads SKILL.md changes: - Add Workflow 3: Bulk Data Loading with key constraints - Add data loading to overview and Quick Start - Add trigger keywords (aurora-dsql-loader, bulk load, etc.) - Add data-loading.md reference entry with When/Contains - Add cross-reference from connectivity-tools.md Eval coverage: - 3 should-trigger + 2 should-not-trigger entries in trigger_evals.json - 3 functional evals (IDs 10-12) in evals.json with LLM judge grading - data_loading_eval_results.md with expected with-skill vs baseline comparison demonstrating the skill teaches operational knowledge not in general training data (partition warming, tmpfs defaults, header flag semantics)

- Condense data-loading.md from 242 to 166 lines (remove verbose explanations per reviewer feedback) - Add RFC 2119 directives (MUST/SHOULD) for --manifest-dir, --header, --on-conflict preconditions, and schema inference validation - Add silent-failure callout for schema inference caveats - Fix --on-conflict semantics: triggers on any unique constraint, not just PK; add precondition that source must not have changed for crash recovery - Rename 'Going Deeper' header to 'When to load the full reference' with clear agent trigger condition - Fix 'slower than expected' to 'slow load times' in SKILL.md - Trim redundant description triggers (bulk load DSQL, DSQL data loading) - Fix workflow-number drift: Workflow 6 → 7 in dsql-lint.md, Workflow 8 → 9 in README/evals/runner - Fix README assertion count 43 → 42 - Remove standalone 'Related References' section (cross-ref inlined at top) - Add eval results with baseline vs with-skill comparison

amaksimo requested review from a team, krokoko, scottschreckengaust and theagenticguy May 27, 2026 23:25

amaksimo requested review from a team as code owners May 27, 2026 23:25

amaksimo requested review from Benjscho, Morlej, anwesham-lab, gxjx-x, jaichabria, pkale and praba2210 May 27, 2026 23:25

amaksimo marked this pull request as draft May 27, 2026 23:30

amaksimo changed the title ~~Add DSQL loader operations reference~~ docs: Add DSQL loader operations reference May 28, 2026

amaksimo marked this pull request as ready for review May 28, 2026 22:13

amaksimo assigned anwesham-lab May 28, 2026

amaksimo force-pushed the improve-dsql-loader-docs branch 2 times, most recently from 90abd0d to 4161f36 Compare May 28, 2026 22:30

amaksimo mentioned this pull request May 28, 2026

feat(dsql): Add PostgreSQL schema conversion and migration references #168

Open

anwesham-lab reviewed May 29, 2026

View reviewed changes

anwesham-lab self-requested a review May 29, 2026 18:44

anwesham-lab reviewed May 29, 2026

View reviewed changes

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

anwesham-lab reviewed May 29, 2026

View reviewed changes

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

anwesham-lab force-pushed the improve-dsql-loader-docs branch from 4161f36 to d093231 Compare May 29, 2026 18:55

anwesham-lab reviewed May 29, 2026

View reviewed changes

Comment thread plugins/databases-on-aws/skills/dsql/references/data-loading.md Outdated

amaksimo force-pushed the improve-dsql-loader-docs branch from 5687cab to 3e1d908 Compare May 29, 2026 19:20

amaksimo force-pushed the improve-dsql-loader-docs branch from 3e1d908 to 86ec0da Compare May 29, 2026 19:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: Add DSQL loader operations reference#176

docs: Add DSQL loader operations reference#176
amaksimo wants to merge 2 commits into
mainfrom
improve-dsql-loader-docs

amaksimo commented May 27, 2026 •

edited

Loading

Uh oh!

amaksimo commented May 28, 2026

Uh oh!

anwesham-lab May 29, 2026

Uh oh!

anwesham-lab May 29, 2026

Uh oh!

anwesham-lab May 29, 2026

Uh oh!

anwesham-lab commented May 29, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		### [data-loading.md](references/data-loading.md)

		When: Load when planning or running bulk loads with `aurora-dsql-loader`, or diagnosing loads that come in slower than expected.

Conversation

amaksimo commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amaksimo commented May 28, 2026

Functional Eval Results (with-skill, live run)

Uh oh!

anwesham-lab May 29, 2026

Choose a reason for hiding this comment

Uh oh!

anwesham-lab May 29, 2026

Choose a reason for hiding this comment

Uh oh!

anwesham-lab May 29, 2026

Choose a reason for hiding this comment

Uh oh!

anwesham-lab commented May 29, 2026

Code review

Items considered and dropped (audit trail)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amaksimo commented May 27, 2026 •

edited

Loading