Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- **`--color` flag for CI log output control** — Controls ANSI color output with `--color <never|always|auto>`. Use `--color=never` to suppress ANSI codes in CI logs (GitHub Actions, GitLab CI), `--color=always` to force colors in piped output, `--color=auto` (default) to auto-detect based on terminal. Respects `NO_COLOR=1` environment variable.

- **`# Errors` sections for core public APIs** — Added `# Errors` sections to documentation for core public APIs per Rust API Guidelines C409:
- `parse_unified_diff`
- `compile_rules`
- `RuleOverrideMatcher::compile`
- `run_check`

- **`bench` crate for performance benchmarking** — Criterion-based benchmark infrastructure:
- Parsing benchmarks: measures `parse_unified_diff()` at 0, 100, 1K, 10K, 100K lines
- Evaluation benchmarks: measures `evaluate_lines()` at 0, 1, 10, 100, 500 rules
Expand Down Expand Up @@ -56,6 +62,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Affects all output formats (markdown, SARIF, GitLab Quality JSON, JUnit, CSV)
- Rationale: enterprises need to onboard existing codebases without flagging pre-existing issues

### Internal

- **Extracted duplicated `escape_xml` function** from `checkstyle.rs` and `junit.rs` into shared `xml_utils.rs` module

## [0.2.0] - 2026-04-06

### Added
Expand Down
110 changes: 110 additions & 0 deletions adr-011-parse-blame-porcelain-result.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,110 @@
# ADR-011: Remove Unnecessary Result Wrapper from parse_blame_porcelain

**Status:** Accepted

**Date:** 2026-04-11

**Work Item:** work-430b0729

---

## Context

Issue #141 reports that `parse_blame_porcelain` in `crates/diffguard/src/main.rs` (line 1768) is typed to return `Result<BTreeMap<u32, BlameLineMeta>>` but never actually returns `Err`. This is dead code — the function silently skips invalid entries via `continue` rather than propagating errors, and always reaches `Ok(out)` at line 1818.

Clippy detects this pattern with the `unnecessary_result_bool` lint (or equivalent): *"this function's return value is unnecessarily wrapped by `Result`"*.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

What is the exact Clippy lint name for functions that unnecessarily return Result (i.e., can never return Err)?

💡 Result:

The exact Clippy lint name is unnecessary_wraps. It detects private functions that only return Ok or Some, meaning they unnecessarily wrap values in Result or Option when no Err or None is ever returned.

Citations:


Update Clippy lint name in ADR.

The ADR references unnecessary_result_bool, which is not a Clippy lint. The correct lint name is unnecessary_wraps—it detects private functions that unnecessarily return Result or Option when they never return Err or None. Update line 15 to use the accurate lint name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adr-011-parse-blame-porcelain-result.md` at line 15, Update the ADR text to
use the correct Clippy lint name: replace the incorrect
`unnecessary_result_bool` mention with `unnecessary_wraps` so the document
accurately states that the lint detecting private functions that unnecessarily
return Result/Option is `unnecessary_wraps`; ensure the change is applied to the
sentence that currently references `unnecessary_result_bool` (the one describing
Clippy detecting functions whose return values are unnecessarily wrapped).


---

## Decision

Change `parse_blame_porcelain` to return `BTreeMap<u32, BlameLineMeta>` directly, removing the `Result` wrapper.

### Changes Required

1. **Function signature (line 1768):**
```rust
// Before
fn parse_blame_porcelain(blame_text: &str) -> Result<BTreeMap<u32, BlameLineMeta>>

// After
fn parse_blame_porcelain(blame_text: &str) -> BTreeMap<u32, BlameLineMeta>
```

2. **Return expression (line 1818):**
```rust
// Before
Ok(out)

// After
out
```

3. **Caller in `collect_blame_allowed_lines` (lines 1861-1862):**
```rust
// Before
let blame_map = parse_blame_porcelain(&blame_text)
.with_context(|| format!("parse git blame for {}", path))?;

// After
let blame_map = parse_blame_porcelain(&blame_text);
```

4. **Test at line 4068:**
```rust
// Before
let map = parse_blame_porcelain(porcelain).expect("parse");

// After
let map = parse_blame_porcelain(porcelain);
```

### Rationale for Silent-Skip Behavior

The parsing logic skips malformed entries rather than failing because:
- Git blame output for files with unusual content (binary, untrusted encoding) may contain partial/invalid entries
- The function is used to extract allowed-line metadata for diff checking — incomplete data is tolerable, hard failure is not
- This behavior is established and users depend on it

---

## Alternatives Considered

### 1. Keep Result and document the never-err case
Adding a comment like `// SAFETY: this function never returns Err` would suppress the lint but leave unnecessary complexity for callers.

### 2. Return Option instead of bare BTreeMap
`Option<BTreeMap<u32, BlameLineMeta>>` would allow `None` for parse failures, but no call site checks for `Err` so `None` would be equally unused. The bare type is cleaner.

### 3. Make the function return Result and propagate real errors
Adding proper error propagation would be a breaking change to the call sites' logic and is out of scope for this fix.

---

## Consequences

**Positive:**
- Removes dead error-handling code from callers
- Eliminates Clippy lint
- Improves code clarity — readers know the function cannot fail
- Removes `.expect()` from test, making test failure messages cleaner

**Negative:**
- None — this is purely a refactor with no behavioral change

**Neutral:**
- The `anyhow::Result` type alias used throughout the crate remains; this only affects one function's return type

---

## Files Affected

- `crates/diffguard/src/main.rs` — function definition (line 1768), return (line 1818), caller (lines 1861-1862), test (line 4068)

---

## Verification

After applying changes:
1. Run `cargo clippy -p diffguard` — confirm no lint warnings related to `parse_blame_porcelain`
2. Run `cargo test -p diffguard` — confirm all tests pass, especially `parse_blame_porcelain_extracts_line_metadata`
9 changes: 9 additions & 0 deletions crates/diffguard-core/src/check.rs
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,15 @@ pub enum PathFilterError {
},
}

/// Run a policy check over a unified diff text.
///
/// # Errors
///
/// Returns an error if:
/// - The diff text cannot be parsed ([`diffguard_diff::DiffParseError`])
/// - Path filter globs are invalid ([`PathFilterError`])
/// - Rule compilation fails ([`diffguard_domain::RuleCompileError`])
/// - Override compilation fails ([`diffguard_domain::OverrideCompileError`])
pub fn run_check(
plan: &CheckPlan,
config: &diffguard_types::ConfigFile,
Expand Down
32 changes: 1 addition & 31 deletions crates/diffguard-core/src/checkstyle.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@

use std::collections::BTreeMap;

use super::xml_utils::escape_xml;
use diffguard_types::{CheckReceipt, Finding, Severity};

/// Renders a CheckReceipt as a Checkstyle XML report.
Expand Down Expand Up @@ -77,24 +78,6 @@ pub fn render_checkstyle_for_receipt(receipt: &CheckReceipt) -> String {
out
}

/// Escape characters that have special meaning in XML.
///
/// Required for: description, message, path, rule_id, and any other text content.
fn escape_xml(s: &str) -> String {
let mut out = String::with_capacity(s.len());
for c in s.chars() {
match c {
'&' => out.push_str("&amp;"),
'<' => out.push_str("&lt;"),
'>' => out.push_str("&gt;"),
'"' => out.push_str("&quot;"),
'\'' => out.push_str("&apos;"),
_ => out.push(c),
}
}
out
}

#[cfg(test)]
mod tests {
use super::*;
Expand Down Expand Up @@ -290,17 +273,4 @@ mod tests {
assert!(xml.contains("<checkstyle version=\"5.0\">"));
assert!(xml.contains("</checkstyle>"));
}

#[test]
fn escape_xml_handles_all_special_chars() {
assert_eq!(escape_xml("&"), "&amp;");
assert_eq!(escape_xml("<"), "&lt;");
assert_eq!(escape_xml(">"), "&gt;");
assert_eq!(escape_xml("\""), "&quot;");
assert_eq!(escape_xml("'"), "&apos;");
assert_eq!(
escape_xml("a&b<c>d\"e'f"),
"a&amp;b&lt;c&gt;d&quot;e&apos;f"
);
}
}
28 changes: 1 addition & 27 deletions crates/diffguard-core/src/junit.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

use std::collections::BTreeMap;

use super::xml_utils::escape_xml;
use diffguard_types::{CheckReceipt, Finding, Severity};

/// Renders a CheckReceipt as a JUnit XML report.
Expand Down Expand Up @@ -103,22 +104,6 @@ pub fn render_junit_for_receipt(receipt: &CheckReceipt) -> String {
out
}

/// Escapes special XML characters in a string.
fn escape_xml(s: &str) -> String {
let mut out = String::with_capacity(s.len());
for c in s.chars() {
match c {
'&' => out.push_str("&amp;"),
'<' => out.push_str("&lt;"),
'>' => out.push_str("&gt;"),
'"' => out.push_str("&quot;"),
'\'' => out.push_str("&apos;"),
_ => out.push(c),
}
}
out
}

#[cfg(test)]
mod tests {
use super::*;
Expand Down Expand Up @@ -327,15 +312,4 @@ mod tests {
let xml = render_junit_for_receipt(&receipt);
insta::assert_snapshot!(xml);
}

#[test]
fn escape_xml_handles_all_special_chars() {
assert_eq!(escape_xml("&"), "&amp;");
assert_eq!(escape_xml("<"), "&lt;");
assert_eq!(escape_xml(">"), "&gt;");
assert_eq!(escape_xml("\""), "&quot;");
assert_eq!(escape_xml("'"), "&apos;");
assert_eq!(escape_xml("normal text"), "normal text");
assert_eq!(escape_xml("<a & b>"), "&lt;a &amp; b&gt;");
}
}
1 change: 1 addition & 0 deletions crates/diffguard-core/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ mod render;
mod sarif;
mod sensor;
mod sensor_api;
pub mod xml_utils;

pub use check::{CheckPlan, CheckRun, PathFilterError, run_check};
pub use checkstyle::render_checkstyle_for_receipt;
Expand Down
86 changes: 86 additions & 0 deletions crates/diffguard-core/src/xml_utils.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
//! XML utility functions for diffguard output formatters.
//!
//! Provides shared XML escaping functionality used by JUnit, Checkstyle,
//! and other XML-based output formats.

/// Escapes special XML characters and illegal control characters in a string.
///
/// Handles:
/// - 5 named XML entities: `&`, `<`, `>`, `"`, `'`
/// - Illegal control characters (0x00-0x1F except tab/LF/CR) as `&#xNN;` entities
///
/// Legal control characters (tab=0x09, LF=0x0A, CR=0x0D) are preserved as-is
/// since they are allowed in XML character content.
pub fn escape_xml(s: &str) -> String {
let mut out = String::with_capacity(s.len());
for c in s.chars() {
match c {
'&' => out.push_str("&amp;"),
'<' => out.push_str("&lt;"),
'>' => out.push_str("&gt;"),
'"' => out.push_str("&quot;"),
'\'' => out.push_str("&apos;"),
// Illegal XML control characters (0x00-0x1F except tab/LF/CR)
c if c <= '\u{001F}' && c != '\t' && c != '\n' && c != '\r' => {
out.push_str(&format!("&#x{:X};", c as u32));
}
Comment on lines +24 to +26
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In XML 1.0, are numeric character references like &#x0; or &#x1B; valid for forbidden control characters U+0000-U+001F (except tab, LF, CR)?

💡 Result:

No, in XML 1.0, numeric character references like � (U+0000) or � (U+001B) are not valid for forbidden control characters in the range U+0000-U+001F (except U+0009 tab, U+000A LF, U+000D CR). The XML 1.0 specification defines valid characters in production [2] Char as #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. This excludes all other control characters in U+0000-U+001F except the three exceptions. Section 4.1 specifies a well-formedness constraint [WFC: Legal Character]: "Characters referred to using character references must match the production for Char." Therefore, references to non-Char codepoints, including � and �, are invalid and must be rejected by conforming processors. U+0000 is always invalid. For the other forbidden controls (e.g., U+001B), they are not legal Chars, so their NCRs are forbidden in XML 1.0. XML 1.1 introduced support for NCRs to these controls (except U+0000), but the question specifies XML 1.0. This is confirmed by multiple sources quoting the spec, including Wikipedia, Stack Overflow discussions citing the XML 1.1 rationale (which explicitly states these are "forbidden in XML 1.0"), and W3C pages on control characters.

Citations:


Numeric character references for forbidden XML 1.0 control characters are invalid and will break XML parsers.

The code at lines 24–25 encodes forbidden control characters (U+0000–U+001F except tab, LF, CR) as numeric character references like &#xNN;. However, in XML 1.0, these numeric references are not valid even though they reference valid Unicode codepoints. Section 4.1 of the XML 1.0 specification requires that "Characters referred to using character references must match the production for Char," which explicitly excludes these control characters. Conforming XML parsers will reject output containing such references.

Replace the encoding with dropping the forbidden characters:

Proposed fix
-            c if c <= '\u{001F}' && c != '\t' && c != '\n' && c != '\r' => {
-                out.push_str(&format!("&#x{:X};", c as u32));
-            }
+            // Forbidden XML 1.0 control characters—drop them
+            '\u{0000}'..='\u{0008}'
+            | '\u{000B}'
+            | '\u{000C}'
+            | '\u{000E}'..='\u{001F}' => {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
c if c <= '\u{001F}' && c != '\t' && c != '\n' && c != '\r' => {
out.push_str(&format!("&#x{:X};", c as u32));
}
// Forbidden XML 1.0 control characters—drop them
'\u{0000}'..='\u{0008}'
| '\u{000B}'
| '\u{000C}'
| '\u{000E}'..='\u{001F}' => {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/diffguard-core/src/xml_utils.rs` around lines 24 - 26, The match arm
that currently encodes control chars (the branch matching c if c <= '\u{001F}'
&& c != '\t' && c != '\n' && c != '\r') must stop emitting numeric character
references and instead drop those forbidden XML 1.0 control characters; update
the logic in that branch (the code that manipulates out using
out.push_str(&format!("&#x{:X};", c as u32))) to simply skip/ignore the
character so nothing is appended to out. Keep all other escaping behavior
unchanged.

_ => out.push(c),
}
}
out
}

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn escape_xml_handles_all_special_chars() {
assert_eq!(escape_xml("&"), "&amp;");
assert_eq!(escape_xml("<"), "&lt;");
assert_eq!(escape_xml(">"), "&gt;");
assert_eq!(escape_xml("\""), "&quot;");
assert_eq!(escape_xml("'"), "&apos;");
assert_eq!(escape_xml("normal text"), "normal text");
assert_eq!(escape_xml("<a & b>"), "&lt;a &amp; b&gt;");
}

#[test]
fn escape_xml_escapes_illegal_control_chars() {
// NUL
let result = escape_xml("a\x00b");
assert!(result.contains("&#x0;"));
assert!(!result.contains('\x00'));

// BEL (0x07)
let result = escape_xml("a\x07b");
assert!(result.contains("&#x7;"));

// ESC (0x1B)
let result = escape_xml("a\x1Bb");
assert!(result.contains("&#x1B;"));
}

#[test]
fn escape_xml_preserves_legal_control_chars() {
// Tab
let result = escape_xml("a\tb");
assert!(result.contains('\t'));
assert!(!result.contains("&#x9;"));

// LF
let result = escape_xml("a\nb");
assert!(result.contains('\n'));
assert!(!result.contains("&#xA;"));

// CR
let result = escape_xml("a\rb");
assert!(result.contains('\r'));
assert!(!result.contains("&#xD;"));
}

#[test]
fn escape_xml_empty_string() {
assert_eq!(escape_xml(""), "");
}
}
Loading
Loading