Commit 77868ad

docs: explicitly describe ABC audit pipeline in reports
1 parent 4ed88cc commit 77868ad

2 files changed: 8 additions, 0 deletions


docs/WHITE_PAPER_REPORT_V2.md

Lines changed: 4 additions & 0 deletions
@@ -1100,6 +1100,8 @@ The negative correlation between output tokens and reward (-0.187) suggests that
 | **Infrastructure confounds** | Error fingerprinting (12 patterns) separates infra failures from task failures |
 | **Preamble effects** | V5 preamble isolated: leads with truncation constraint, avoids prescriptive workflow |

+In addition to the six-dimension QA framework, we run an explicit ABC audit via `scripts/abc_audit.py` that scores criteria across three dimensions: **Task Validity**, **Outcome Validity**, and **Reporting**. The ABC audit is used as a structured benchmark-quality gate and complements pre-flight/runtime task validation.
+
 ### 12.2 External Validity

 | Threat | Mitigation |
@@ -1257,6 +1259,8 @@ Major architectural decisions emerged through iterative dialogue:

 ### Appendix E: QA Audit Framework (6 Dimensions)

+This report uses two complementary audit layers: (1) the operational six-dimension QA audit below, and (2) an explicit ABC audit (`scripts/abc_audit.py`) across Task Validity, Outcome Validity, and Reporting.
+
 | Dimension | Focus | Example Finding |
 | --------------------------------------- | --------------------------------------------- | ----------------------------------------- |
 | **1. Instruction Contamination** | MCP/SG refs in baseline instructions | 30/156 instructions had SG refs (cleaned) |

docs/technical_reports/TECHNICAL_REPORT_V1.md

Lines changed: 4 additions & 0 deletions
@@ -1100,6 +1100,8 @@ The negative correlation between output tokens and reward (-0.187) suggests that
 | **Infrastructure confounds** | Error fingerprinting (12 patterns) separates infra failures from task failures |
 | **Preamble effects** | V5 preamble isolated: leads with truncation constraint, avoids prescriptive workflow |

+In addition to the six-dimension QA framework, we run an explicit ABC audit via `scripts/abc_audit.py` that scores criteria across three dimensions: **Task Validity**, **Outcome Validity**, and **Reporting**. The ABC audit is used as a structured benchmark-quality gate and complements pre-flight/runtime task validation.
+
 ### 12.2 External Validity

 | Threat | Mitigation |
@@ -1257,6 +1259,8 @@ Major architectural decisions emerged through iterative dialogue:

 ### Appendix E: QA Audit Framework (6 Dimensions)

+This report uses two complementary audit layers: (1) the operational six-dimension QA audit below, and (2) an explicit ABC audit (`scripts/abc_audit.py`) across Task Validity, Outcome Validity, and Reporting.
+
 | Dimension | Focus | Example Finding |
 | --------------------------------------- | --------------------------------------------- | ----------------------------------------- |
 | **1. Instruction Contamination** | MCP/SG refs in baseline instructions | 30/156 instructions had SG refs (cleaned) |
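
Reviewer note: the added paragraphs say the ABC audit scores criteria across Task Validity, Outcome Validity, and Reporting and acts as a benchmark-quality gate. The sketch below illustrates one way such per-dimension scoring and gating could work. It is not the interface of `scripts/abc_audit.py`, whose contents are not part of this commit. The `Criterion` type, the criterion IDs, `score_by_dimension`, `passes_gate`, and the default threshold are all hypothetical.

```python
from dataclasses import dataclass

# Dimension names are taken from the commit text; everything else is assumed.
DIMENSIONS = ("Task Validity", "Outcome Validity", "Reporting")


@dataclass(frozen=True)
class Criterion:
    criterion_id: str  # hypothetical identifier, e.g. "T.1"
    dimension: str     # must be one of DIMENSIONS
    passed: bool


def score_by_dimension(criteria):
    """Return the pass fraction per dimension, or None if nothing was checked."""
    counts = {d: [0, 0] for d in DIMENSIONS}  # dimension -> [passed, total]
    for c in criteria:
        if c.dimension not in counts:
            raise ValueError(f"unknown dimension: {c.dimension!r}")
        counts[c.dimension][1] += 1
        if c.passed:
            counts[c.dimension][0] += 1
    return {
        d: (passed / total if total else None)
        for d, (passed, total) in counts.items()
    }


def passes_gate(scores, threshold=1.0):
    """Gate passes only if every dimension was checked and meets the threshold."""
    return all(s is not None and s >= threshold for s in scores.values())


# Illustrative input only; these are not real audit results.
example = [
    Criterion("T.1", "Task Validity", True),
    Criterion("T.2", "Task Validity", False),
    Criterion("O.1", "Outcome Validity", True),
]
scores = score_by_dimension(example)
gate_ok = passes_gate(scores)
```

In this sketch an unchecked dimension (here Reporting) fails the gate rather than being silently skipped, which is a conservative choice for a quality gate. The real script may treat missing criteria differently.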
