Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions .claude/rules/unit-testing.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,19 @@
3. Re-run CI to verify it's not caused by your changes
4. Open an issue to track the flaky test if not already tracked

### No Excuses for Test Failures (BLOCKING)

**"Pre-existing failure" is NOT a valid excuse.** Any broken test in your branch IS your responsibility:

- Do not claim "tests were already failing before my changes"
- Do not dismiss failures as "not caused by my change"
- Do not proceed with commits or pushes when tests fail

**When encountering test failures:**
1. Fix them, even if they predate your changes
2. If truly unrelated, stash your work, fix on a separate branch, and merge
3. The 100% pass rate policy has no exceptions

## Core Principles

### What NOT to Test
Expand Down
201 changes: 201 additions & 0 deletions PRDs/20251231-saxon-jdom2-removal/PRD.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
# PRD: Remove Saxon and JDOM2 Dependencies

## Problem Statement

The `schemagen` module depends on Saxon-HE (~5MB) and JDOM2 + jaxen (~500KB) for XML schema generation. These dependencies add significant size to the distribution and introduce external library dependencies where standard Java XML APIs would suffice.

### Current State

- **Saxon-HE** is used solely for an XSLT identity transform that adds indentation to generated XML schemas
- **xmlresolver** is a transitive dependency of Saxon (comment in pom.xml: "for saxon")
- **JDOM2** is used for:
- Parsing XSD resource files containing datatype definitions
- Executing XPath queries to extract schema elements
- Writing DOM elements to XMLStreamWriter via StAXStreamOutputter
- **jaxen** provides XPath support for JDOM2

### XSD Resource Files

The following XSD files are loaded and processed via JDOM2 XPath:

| Resource | Loaded By | XPath Query |
|----------|-----------|-------------|
| `/schema/xml/metaschema-datatypes.xsd` | `XmlCoreDatatypeProvider` | `/xs:schema/xs:simpleType` |
| `/schema/xml/metaschema-markup-line.xsd` | `XmlMarkupLineDatatypeProvider` | `/xs:schema/*` |
| `/schema/xml/metaschema-markup-multiline.xsd` | `XmlMarkupMultilineDatatypeProvider` | `/xs:schema/*` |
| `/schema/xml/metaschema-prose-base.xsd` | `XmlProseBaseDatatypeProvider` | `/xs:schema/xs:simpleType` |

These files are packaged in the `core` module and accessed via `IModule.class.getResourceAsStream()`.

### Why This Matters

1. **Dependency bloat**: Saxon-HE alone is ~5MB, larger than many core modules
2. **Maintenance burden**: External dependencies require version updates and security monitoring
3. **Standard alternatives exist**: Java's built-in DOM, XPath, and Transformer APIs provide equivalent functionality
4. **Performance**: Current approach buffers entire schema for XSLT post-processing; streaming indentation is more efficient

## Goals

1. Remove Saxon-HE dependency from the project
2. Remove JDOM2 and jaxen dependencies from the project
3. Replace with standard Java XML APIs (no new dependencies)
4. Maintain identical schema generation output (except minor whitespace differences in documentation)
5. Improve performance through streaming indentation

## Non-Goals

- Changing the structure or content of generated schemas
- Modifying the public API of the schemagen module
- Adding new XML processing capabilities

## Development Methodology

### Test-Driven Development (MANDATORY)

All implementation MUST follow strict TDD:

1. **Write tests first** - Before any implementation code exists
2. **Watch tests fail** - Verify tests fail for the expected reason (not compilation errors)
3. **Write minimal code** - Implement just enough to make tests pass
4. **Refactor** - Clean up while keeping tests green

### TDD Sequence for This PRD

| Component | Test First | Then Implement |
|-----------|------------|----------------|
| `IndentingXMLStreamWriter` | Test indentation behavior with mock writer | Implement wrapper class |
| `XmlSchemaLoader` | Test XPath queries return expected elements | Implement DOM/XPath loader |
| `DomDatatypeContent` | Test DOM element serialization to XMLStreamWriter | Implement serialization |
| `XmlSchemaGenerator` changes | Verify existing tests pass with new approach | Remove Saxon, use IndentingXMLStreamWriter |

### Test Requirements

- **Characterization tests first**: Before replacing any existing code, write tests that capture current behavior
- **Verify tests pass**: With existing JDOM2/Saxon implementation
- **New classes**: 100% test coverage for public methods
- **Behavioral equivalence**: New implementations must pass the same tests as old implementations
- **Edge cases**: Empty documents, missing elements, malformed XML handling
- **Integration**: End-to-end schema generation tests must produce equivalent output

### Text Production Testing (CRITICAL)

The `IndentingXMLStreamWriter` must be tested against ALL XML text productions to ensure content is not corrupted:

| Production | Test Requirement |
|------------|------------------|
| Element content | Proper indentation at each nesting level |
| Text content | NO added whitespace - text must be preserved exactly |
| Mixed content | Text + child elements must not have spurious whitespace |
| CDATA sections | Content must not be modified |
| Comments | Properly indented, content preserved |
| Processing instructions | Properly indented |
| Attributes | No indentation effect |
| XHTML in xs:documentation | Inline elements (`<b>`, `<i>`) must not gain whitespace |

**Why this matters**: The original Saxon XSLT used `suppress-indentation="xhtml:b xhtml:p"` specifically to prevent whitespace corruption in schema documentation. Our replacement must handle this correctly through mixed content detection.

## Requirements

### Functional Requirements

1. **FR-1**: XML schemas generated after the change must be semantically equivalent to those generated before
2. **FR-2**: All existing schemagen tests must pass without modification (except whitespace assertions if any)
3. **FR-3**: XSD datatype resources must continue to be loaded and processed correctly
4. **FR-4**: Generated schemas must be properly indented for readability

### Technical Requirements

1. **TR-1**: Create `IndentingXMLStreamWriter` wrapper for streaming indentation
2. **TR-2**: Replace JDOM2 XML parsing with `javax.xml.parsers.DocumentBuilder`
3. **TR-3**: Replace JDOM2 XPath with `javax.xml.xpath.XPath`
4. **TR-4**: Replace JDOM2 element serialization with `javax.xml.transform.Transformer`
5. **TR-5**: Update `module-info.java` to remove Saxon.HE and org.jdom2 requirements
6. **TR-6**: Remove dependency declarations from pom.xml files

Comment on lines +97 to +114
Copy link

@coderabbitai coderabbitai bot Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add technical requirements for namespace handling, encoding, and StAX configuration.

The technical requirements (TR-1 through TR-6) are functional but omit critical implementation details:

  1. TR-3 (XPath): The XSD files use namespace prefixes (xs:schema, xs:simpleType). How will the XPath instance be configured with a NamespaceContext to resolve these prefixes? This must be documented before implementation begins.
  2. Missing TR: Character encoding strategy—will output be UTF-8? How will the XMLStreamWriter factory be configured to enforce encoding?
  3. Missing TR: StAX provider selection—will the implementation use the JDK default or explicitly depend on Woodstox?

These are blocking details for the XmlSchemaLoader and IndentingXMLStreamWriter implementations.

🤖 Prompt for AI Agents
In PRDs/20251231-saxon-jdom2-removal/PRD.md around lines 97 to 114, the
Technical Requirements (TR-1..TR-6) omit critical implementation details for
namespace resolution, character encoding, and StAX provider selection; update
the PRD to (a) add a TR that mandates how XPath will be configured with a
NamespaceContext mapping the XSD prefixes (e.g., "xs") to the XML Schema
namespace and document the location where that mapping will be implemented for
XmlSchemaLoader, (b) add a TR that specifies the character encoding for all
schema output (use UTF-8) and requires XmlStreamWriter/Transformer factories to
be configured to enforce that encoding, and (c) add a TR specifying the StAX
provider policy—either rely on the JDK default or require an explicit dependency
(e.g., Woodstox) and document how the XMLInputFactory/XMLStreamWriter will be
created to guarantee consistent behavior; make these new TRs concise and
reference the affected components (XmlSchemaLoader, IndentingXMLStreamWriter,
and build POMs).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai The implementation has already addressed these concerns:

  1. Namespace handling: XmlSchemaLoader uses a NamespaceContext implementation in the getContent() method that maps prefixes (including 'xs') to the appropriate namespaces from the provided namespace map.

  2. Character encoding: UTF-8 is enforced via the XMLStreamWriter factory configuration. The XmlSchemaGenerator.getXmlOutputFactory() configures the output factory with UTF-8 encoding.

  3. StAX provider: The project uses Woodstox (com.ctc.wstx) as specified in the module-info.java. This is a transitive dependency through the existing stax2-api dependency.

Since the PRD is a planning document and the implementation is complete and working, I'm leaving the PRD as-is to reflect the original planning state. The actual implementation details are documented in the code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david-waltermire Thank you for the clarification. You're absolutely right—since the PRD is a planning document and the implementation is complete with all these concerns properly addressed in the code, it makes sense to preserve the PRD in its original planning state.

The implementation approach you've outlined (NamespaceContext in XmlSchemaLoader, UTF-8 configuration in the factory, and Woodstox via transitive dependency) is sound, and having those details documented in the actual implementation is the appropriate place for them.


🧠 Learnings used
Learnt from: david-waltermire
Repo: metaschema-framework/metaschema-java PR: 567
File: metaschema-testing/src/main/java/gov/nist/csrc/ns/metaschema/test_suite/_1_0/Metaschema.java:41-47
Timestamp: 2025-12-24T21:21:40.208Z
Learning: In metaschema-framework/metaschema-java, generated binding classes in package gov.nist.csrc.ns.metaschema.test_suite._1_0 (and similar generated binding packages) are pre-generated by metaschema-maven-plugin and checked into source control. Javadoc coverage issues in these generated classes should be tracked as code generator improvements rather than file-level issues, and improvements are deferred to generator enhancements.

Learnt from: CR
Repo: metaschema-framework/metaschema-java PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T13:27:43.679Z
Learning: Applies to **/metapath/**/*.java : Metapath is an implementation of XPath 3.1. Use the XPath 3.1 specification (https://www.w3.org/TR/xpath-31/) and XPath Functions 3.1 (https://www.w3.org/TR/xpath-functions-31/) as authoritative reference when implementing new functions, fixing bugs, or understanding error handling. Raise clarification before making changes if implementation differs from spec.

## Success Metrics

| Metric | Target |
|--------|--------|
| Dependencies removed | 4 (Saxon-HE, xmlresolver, jdom2, jaxen) |
| New dependencies added | 0 |
| Existing tests passing | 100% |
| JAR size reduction | ~5.5MB |
| Build verification | `mvn clean install -PCI -Prelease` passes |

## Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Whitespace differences in output | High | Low | Document as expected; only affects formatting in xs:documentation |
| XPath behavior differences | Low | Medium | Comprehensive test coverage for XSD loading |
| Performance regression | Low | Low | Streaming approach should be faster than buffered XSLT |

## Dependencies

- No blocking dependencies on other work
- This change is isolated to the `schemagen` module

## Design Decisions

### Interface Compatibility

The implementation will use the standard `XMLStreamWriter` interface rather than Woodstox's `XMLStreamWriter2` extension. Analysis shows that only standard `XMLStreamWriter` methods are used:

- `writeStartDocument`, `writeEndDocument`
- `writeStartElement`, `writeEndElement`
- `writeDefaultNamespace`, `writeNamespace`
- `writeAttribute`
- `flush`

This ensures compatibility with any StAX implementation.

### Mixed Content Detection

The `IndentingXMLStreamWriter` will use dynamic mixed content detection rather than element-specific suppression:

1. Track `hasText` flag per element level using a stack
2. When `writeCharacters()` is called with non-whitespace text, set `hasText = true`
3. When `hasText` is true, suppress indentation for child elements
4. When closing an element, pop the stack to restore parent's state

This approach:
- Is simpler than the Saxon XSLT's `suppress-indentation="xhtml:b xhtml:p"` approach
- Works correctly for any inline elements, not just a hardcoded list
- Automatically handles mixed content regardless of element names

### Line Endings and Configurability

| Setting | Value | Rationale |
|---------|-------|-----------|
| Line ending | `\n` (Unix) | Consistent across platforms; matches Saxon output |
| Indent size | 2 spaces | Fixed; matches existing output; no configurability needed |

### Acceptable Whitespace Differences

The following whitespace differences between Saxon XSLT and IndentingXMLStreamWriter output are acceptable:

1. **xs:documentation content**: Saxon preserves original formatting; new approach adds structure indentation
2. **Empty element spacing**: Minor differences in element-only content are acceptable
3. **Trailing whitespace**: Any trailing whitespace differences are acceptable

Semantic equivalence (XML parses to identical DOM) is required; formatting differences are acceptable.

## Existing Test Coverage

### Current Tests
- `XmlSuiteTest` - Integration tests for XML schema generation (uses JDOM2 for assertions)
- `JsonSuiteTest` - JSON schema generation (unaffected by this change)
- `MetaschemaModuleTest` - Module loading tests

### Tests Requiring Update
- `XmlSuiteTest` uses JDOM2 (`StAXEventBuilder`, `XPathExpression`) for test assertions
- These must be converted to standard DOM/XPath APIs

### New Tests Required
- `IndentingXMLStreamWriterTest` - Comprehensive text production tests
- `XmlSchemaLoaderTest` - Characterization tests for XPath queries
- `DomDatatypeContentTest` - DOM serialization tests

## Related Documents

- [Implementation Plan](./implementation-plan.md)
Loading
Loading