Merged
27 commits
45efe94
fixed table parsing issue
smervs Apr 4, 2023
73516f2
README: Fix strike delimiter comment
Tobbe Jun 20, 2023
03109ff
chore: update dependencies to latest CJS-compatible versions
claude Nov 7, 2025
f1a5046
fix: eliminate circular dependency between config and utilities
claude Nov 7, 2025
eae5d0d
fix: remove perf functions from browser builds
claude Nov 7, 2025
10e5c2a
fix: handle mixed-case HTML tags correctly
claude Nov 7, 2025
8c5aa49
docs: add comprehensive newline handling documentation
claude Nov 7, 2025
5aea8d9
fix: merge PR #53 and PR #47 - docs and table fixes
claude Nov 7, 2025
0eaaee3
fix: correct nested list indentation to 2 spaces
claude Nov 7, 2025
14913a6
fix: preserve whitespace and newlines in code blocks
claude Nov 7, 2025
1414207
docs: add validation report for issue #63
claude Nov 7, 2025
dde5687
fix: preserve whitespace before inline formatting elements
claude Nov 7, 2025
d4f7729
merge: Agent 00 - Dependency updates
claude Nov 7, 2025
71dc817
merge: Agent 01 - Critical build issues (#74, #58, #63)
claude Nov 7, 2025
e87e854
merge: Agent 02 - PR merges (#47, #35) for table cells and critical p…
claude Nov 7, 2025
63497f3
merge: Agent 03 - Validation and critical parsing analysis
claude Nov 7, 2025
e2a8920
merge: Agent 04 - Whitespace before inline elements (#61, #34)
claude Nov 7, 2025
fe044ad
merge: Agent 05 - Nested list indents (#57) with configurable indent
claude Nov 7, 2025
29a1a57
merge: Agent 07 - Code block whitespace fixes (#52, #24)
claude Nov 7, 2025
cc0b233
fix: resolve merge conflict in defaultCodeBlockTranslators
claude Nov 7, 2025
a181639
fix: prevent multiplicative indentation in nested lists (#57)
claude Nov 7, 2025
4b9ee2d
fix: trim trailing whitespace while preserving two-space line breaks
claude Nov 7, 2025
b938034
refactor!: Explicitly deny special handling for elements that should …
nonara Nov 14, 2025
5d0c03e
test: Updated tests for all current changes
nonara Nov 14, 2025
a11053a
build: Added local configs to .gitignore
nonara Nov 14, 2025
9a8a2a2
build(npm): Updated lockfile
nonara Nov 14, 2025
4a8c79c
Merge branch 'claude/agent-issues-cleanup-011CUsYjWB7NMJYAHfjx8RPr' i…
nonara Nov 14, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -17,8 +17,10 @@ package-lock.json
.env
.vscode
.idea/jsLibraryMappings.xml
.idea/copilot.*.xml
old
TODO.md
.claude/settings.local.json

# Junk
temp/
2 changes: 1 addition & 1 deletion README.md
@@ -155,7 +155,7 @@ export interface NodeHtmlMarkdownOptions {
   strongDelimiter: string,

   /**
-   * Strong delimiter
+   * Strike delimiter
    * @default ~~
    */
   strikeDelimiter: string,
83 changes: 83 additions & 0 deletions VALIDATION-ISSUE-63.md
@@ -0,0 +1,83 @@
# Validation Report: Issue #63 - Mixed-case HTML Tags

**Branch:** `fixes/agent-03-critical-parsing`
**Parent:** `fixes/agent-01-critical-build`
**Validator:** Agent 03
**Date:** 2025-11-07

## Executive Summary

✅ **CONFIRMED FIXED** - Issue #63 (mixed-case HTML tags) has been successfully resolved by Agent 01.

## What Was Fixed

Agent 01's commit `10e5c2a` implements case-insensitive tag matching:

### Changes Made
1. **src/utilities.ts**: Set `lowerCaseTagName: true` so the parser normalizes all HTML tag names to lowercase
2. **src/visitor.ts**: Updated tag lookups to use `.toUpperCase()` for case-insensitive matching
3. **test/special-cases.test.ts**: Added 12 comprehensive test cases covering various mixed-case scenarios

### Test Coverage

The following mixed-case scenarios are now fully supported:

| Test Case | Input | Status |
|-----------|-------|--------|
| Mixed-case `<Br>` | `Foo<Br>Bar` | ✅ Pass |
| Uppercase `<BR>` | `Hello<BR>World` | ✅ Pass |
| Uppercase `<DIV>` | `<DIV>content</DIV>` | ✅ Pass |
| Capitalized `<Div>` | `<Div>test</Div>` | ✅ Pass |
| Uppercase `<P>` | `<P>Hello</P>` | ✅ Pass |
| Crazy mixed `<pArAgRaPh>` | `<pArAgRaPh>Strange case</pArAgRaPh>` | ✅ Pass |
| Mixed formatting | `<Strong>Bold</Strong>` | ✅ Pass |
| Mixed `<Hr>` | `Before<Hr>After` | ✅ Pass |
| Mixed lists | `<Ul><Li>Item</Li></Ul>` | ✅ Pass |
| Mixed headings | `<H1>Title</H1>` | ✅ Pass |
| All lowercase | `<br><div>content</div>` | ✅ Pass |
| Nested mixed-case | `<Div><P>Text</P><Br></Div>` | ✅ Pass |

## Validation Results

### Core Functionality
- ✅ `<Br>` tag works correctly
- ✅ `<DIV>` tag works correctly
- ✅ `<pArAgRaPh>` tag works correctly
- ✅ No data loss occurs with any mixed-case tags
- ✅ All void elements (br, hr, img) handle mixed case properly
- ✅ All block elements handle mixed case properly
- ✅ All inline elements handle mixed case properly

### Test Suite Status
```
Test Suites: 5 passed, 5 total
Tests: 77 passed, 77 total
```

**All tests pass ✅**

## Technical Implementation

### Root Cause (Original Bug)
HTML tag names are case-insensitive by spec, but the library failed to process tags written in mixed case. With `lowerCaseTagName: false`, the HTML parser preserved the original case and did not recognize mixed-case void elements such as `<Br>` as self-closing. As a result, content after the tag was incorrectly parsed as children of that tag.

### Solution
1. Normalize all tag names to lowercase during parsing (`lowerCaseTagName: true`)
2. Convert tag names to uppercase for translator lookups to ensure case-insensitive matching
3. Update all tag comparisons to be case-insensitive

### Impact
- **Breaking Changes:** None
- **Performance:** Negligible impact (case conversion is trivial)
- **Compatibility:** Fully backward compatible - lowercase tags still work
- **Data Loss:** Eliminated - no content is lost with mixed-case tags

## Conclusion

Issue #63 is **FULLY RESOLVED**. The implementation is comprehensive, well-tested, and introduces no regressions. The library now correctly handles HTML tags regardless of their capitalization, as per the HTML specification.

## Recommendations

- ✅ Ready to merge
- ✅ No additional work needed
- ✅ Comprehensive test coverage in place
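
Review note: the lookup strategy described in the report (normalize once, then match on a canonical case) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the library's code: `Translator`, `buildTable`, and `lookup` are hypothetical names, and the comma-separated key format is borrowed from the `'td,th'` style keys in `src/config.ts`.

```typescript
// Hypothetical sketch of case-insensitive translator lookup.
// Translator, buildTable, and lookup are illustrative names, not library APIs.
type Translator = { prefix?: string; postfix?: string; content?: string };
type TranslatorTable = { [tagName: string]: Translator };

// Expand comma-separated keys such as 'b,strong' into one upper-cased entry per tag.
function buildTable(config: { [keys: string]: Translator }): TranslatorTable {
  const table: TranslatorTable = {};
  Object.keys(config).forEach((keys) => {
    keys.split(',').forEach((key) => {
      table[key.trim().toUpperCase()] = config[keys];
    });
  });
  return table;
}

// Canonicalize at lookup time so <Br>, <BR>, and <br> resolve to the same entry.
function lookup(table: TranslatorTable, tagName: string): Translator | undefined {
  return table[tagName.toUpperCase()];
}

const table = buildTable({
  'br': { content: '  \n' },
  'b,strong': { prefix: '**', postfix: '**' },
});

console.log(lookup(table, 'Br') === lookup(table, 'BR')); // true: same entry
console.log(lookup(table, 'span'));                        // undefined: no translator
```

Canonicalizing on both sides (when the table is built and when a tag is looked up) keeps the comparison in one place, so individual translators never need to know which case the source HTML used.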
25 changes: 17 additions & 8 deletions src/config.ts
@@ -127,18 +127,18 @@ export const defaultTranslators: TranslatorConfigObject = {
   }),

   /* List Item */
-  'li': ({ options: { bulletMarker }, indentLevel, listKind, listItemNumber }) => {
+  'li': ({ options: { bulletMarker, indent }, indentLevel, listKind, listItemNumber }) => {
     const indentationLevel = +(indentLevel || 0);
     return {
-      prefix: ' '.repeat(+(indentLevel || 0)) +
+      prefix: indent.repeat(+(indentLevel || 0)) +
         (((listKind === 'OL') && (listItemNumber !== undefined)) ? `${listItemNumber}. ` : `${bulletMarker} `),
       surroundingNewlines: 1,
       postprocess: ({ content }) =>
         isWhiteSpaceOnly(content)
         ? PostProcessResult.RemoveNode
         : content
           .trim()
-          .replace(/([^\r\n])(?:\r?\n)+/g, `$1 \n${' '.repeat(indentationLevel)}`)
+          .replace(/([^\r\n])(?:\r?\n)+(?!\s*[-*+]|\s*\d+\.)/g, `$1 \n${indent.repeat(indentationLevel)}`)
           .replace(/(\S+?)[^\S\r\n]+$/gm, '$1 ')
     }
   },
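
Review note: the negative lookahead added to the continuation-line replacement is what stops already-rendered nested items from being re-indented at every level. A minimal sketch of that behavior, assuming a hypothetical helper name `indentContinuations` and assuming the intended hard break is two spaces before the newline (the diff as rendered shows one space, which may be collapsed whitespace):

```typescript
// Sketch only: mirrors the replacement shown in the 'li' translator diff.
// indentContinuations is a hypothetical name, and the two-space hard break
// ("  \n") is an assumption about the intended Markdown output.
function indentContinuations(content: string, indent: string, level: number): string {
  return content
    .trim()
    .replace(
      // A line end followed by plain text gets a hard break plus indentation.
      // The lookahead skips line ends that are followed by a list marker
      // (-, *, +, or "1."), which belong to an already-formatted nested item.
      /([^\r\n])(?:\r?\n)+(?!\s*[-*+]|\s*\d+\.)/g,
      `$1  \n${indent.repeat(level)}`
    );
}

// Plain continuation text is re-indented under its list item.
const plain = indentContinuations('a\nb', '  ', 1);
// A nested bullet is left alone, so its indentation is not multiplied.
const nested = indentContinuations('a\n  * b', '  ', 1);
```

One limitation of this pattern, as far as the sketch goes: a continuation line that happens to start with `-`, `*`, `+`, or a number followed by a period is treated as a list marker and is not re-indented.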
@@ -171,13 +171,15 @@ export const defaultTranslators: TranslatorConfigObject = {
       const language = node.getAttribute('class')?.match(/language-(\S+)/)?.[1] || '';
       return {
         noEscape: true,
+        preserveWhitespace: true,
         prefix: codeFence + language + '\n',
         postfix: '\n' + codeFence,
         childTranslators: visitor.instance.codeBlockTranslators
       }
     } else {
       return {
         noEscape: true,
+        preserveWhitespace: true,
         postprocess: ({ content }) => content.replace(/^/gm, ' '),
         childTranslators: visitor.instance.codeBlockTranslators
       }
@@ -190,7 +192,7 @@ export const defaultTranslators: TranslatorConfigObject = {
     childTranslators: visitor.instance.tableTranslators,
     postprocess: ({ content, nodeMetadata, node }) => {
       // Split and trim leading + trailing pipes
-      const rawRows = splitSpecial(content).map(({ text }) => text.replace(/^(?:\|\s+)?(.+)\s*\|\s*$/, '$1'));
+      const rawRows = splitSpecial(content).map(({ text }) => text.replace(/^(?:\|)?(.+)\s*\|\s*$/, '$1'));

       /* Get Row Data */
       const rows: string[][] = [];
@@ -239,6 +241,9 @@ export const defaultTranslators: TranslatorConfigObject = {
     }
   }),

+  /* Table Columns */
+  'td,th': { preserveIfEmpty: true },
+
   /* Link */
   'a': ({ node, options, visitor }) => {
     const href = node.getAttribute('href');
@@ -348,10 +353,14 @@ export const defaultCodeBlockTranslators: TranslatorConfigObject = {
   'br': { content: `\n`, recurse: false },
   'hr': { content: '---', recurse: false },
   'h1,h2,h3,h4,h5,h6': { prefix: '[', postfix: ']' },
-  'ol,ul': defaultTranslators['ol,ul'],
-  'li': defaultTranslators['li'],
-  'tr': { surroundingNewlines: true },
-  'img': { recurse: false }
+  'img': { recurse: false },
+
+  // Block elements should not add newlines in code blocks (fixes #52, #24)
+  'div,p,section,article,aside,header,footer,main,nav': { surroundingNewlines: false },
+  'ol,ul,li': { surroundingNewlines: false },
+  'table,thead,tbody,tfoot,tr,td,th': { surroundingNewlines: false },
+  'blockquote,pre': { surroundingNewlines: false },
+  'dl,dt,dd': { surroundingNewlines: false }
 }

 export const aTagTranslatorConfig: TranslatorConfigObject = {
6 changes: 6 additions & 0 deletions src/options.ts
@@ -20,6 +20,12 @@ export interface NodeHtmlMarkdownOptions {
    * @default *
    */
   bulletMarker: string,
+
+  /**
+   * Indentation string for nested lists
+   * @default ' '
+   */
+  indent: string,

   /**
    * Style for code block
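
Review note: to make the effect of the new `indent` option concrete, here is a small sketch of how a list-item prefix is composed from it. `listItemPrefix` and `ListKind` are hypothetical names used only for illustration; the composition itself follows the `prefix` expression in the `'li'` translator in `src/config.ts`.

```typescript
// Hypothetical helper: composes a list-item prefix the same way the 'li'
// translator's `prefix` expression does. Not part of the library's API.
type ListKind = 'OL' | 'UL';

function listItemPrefix(
  indent: string,
  bulletMarker: string,
  indentLevel: number,
  listKind: ListKind,
  listItemNumber?: number
): string {
  // Ordered items use their number; everything else uses the bullet marker.
  const marker = listKind === 'OL' && listItemNumber !== undefined
    ? `${listItemNumber}. `
    : `${bulletMarker} `;
  // The indent string is repeated once per nesting level.
  return indent.repeat(indentLevel) + marker;
}

console.log(JSON.stringify(listItemPrefix('  ', '*', 2, 'UL')));    // "    * "
console.log(JSON.stringify(listItemPrefix('\t', '-', 1, 'OL', 3))); // "\t3. "
```

Because the option is a string rather than a width, callers could presumably pass tabs or any fixed run of spaces; whether every Markdown renderer accepts the resulting nesting depends on the renderer.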
23 changes: 19 additions & 4 deletions src/visitor.ts
@@ -164,9 +164,21 @@ export class Visitor {
         (<any>node).trimmedText ??= trimNewLines((<any>node).wholeText);
       }

-      return node.isWhitespace && !metadata?.preserveWhitespace
-        ? (!result.text.length || result.trailingNewlineStats.whitespace > 0) ? void 0 : this.appendResult(' ')
-        : this.appendResult(this.processText(metadata?.preserveWhitespace ? node.text : node.trimmedText, metadata));
+      if (node.isWhitespace && !metadata?.preserveWhitespace) {
+        return (!result.text.length || result.trailingNewlineStats.whitespace > 0) ? void 0 : this.appendResult(' ');
+      }
+
+      // Fix for issues #61 and #34: Process original text to preserve trailing whitespace before inline elements
+      const sourceText = metadata?.preserveWhitespace ? node.text : node.text || node.trimmedText;
+      let processedText = this.processText(sourceText, metadata);
+
+      // Trim leading spaces only if original started with newline; keep trailing spaces for inline elements
+      if (!metadata?.preserveWhitespace && processedText) {
+        if (sourceText && /^\n/.test(sourceText)) processedText = processedText.replace(/^ /, '');
+        if (sourceText && !/\s$/.test(sourceText)) processedText = processedText.replace(/ +$/, '');
+      }
+
+      return this.appendResult(processedText);
     }

     if (textOnly || !isElementNode(node)) return;
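
Review note: the trimming rules in this hunk can be exercised in isolation. The sketch below is an approximation under assumptions: `collapseWhitespace` is a stand-in for the library's `processText` (whose behavior is not shown in this diff), and `normalizeTextNode` is a hypothetical name. It only illustrates why `1\n<b>2</b>` can now yield `1 **2**`.

```typescript
// Stand-in for the library's text processing (an assumption): collapse every
// whitespace run, including newlines, to a single space.
function collapseWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ');
}

// Hypothetical helper mirroring the two trimming rules in the hunk above.
function normalizeTextNode(sourceText: string): string {
  let processed = collapseWhitespace(sourceText);
  // Drop a leading space only when the source started with a newline.
  if (/^\n/.test(sourceText)) processed = processed.replace(/^ /, '');
  // Drop trailing spaces only when the source did not end in whitespace,
  // so a trailing newline before an inline element survives as one space.
  if (!/\s$/.test(sourceText)) processed = processed.replace(/ +$/, '');
  return processed;
}

// Text node "1\n" followed by <b>2</b>: the newline becomes the separating space.
console.log(JSON.stringify(normalizeTextNode('1\n') + '**2**')); // "1 **2**"
```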
@@ -304,7 +316,10 @@ export function getMarkdownForHtmlNodes(instance: NodeHtmlMarkdown, rootNode: Ht
     '$1'
   );

-  return trimNewLines(result);
+  // Trim newlines and any trailing space (but preserve two-space line breaks)
+  result = trimNewLines(result);
+  if (result.endsWith(' ') && !result.endsWith('  ')) result = result.trimEnd();
+  return result;
 }

 // endregion
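
Review note: a quick sketch of the final trimming step, to show which trailing whitespace is kept. The `trimNewLines` below is a simplified stand-in written for this example (the library has a helper of that name, but its implementation is not part of this diff), and `finalizeMarkdown` is a hypothetical name.

```typescript
// Simplified stand-in (an assumption): strip leading and trailing newlines only.
function trimNewLines(text: string): string {
  return text.replace(/^(?:\r?\n)+|(?:\r?\n)+$/g, '');
}

// Hypothetical helper: remove a single stray trailing space, but keep a
// trailing two-space sequence, which Markdown reads as a hard line break.
function finalizeMarkdown(result: string): string {
  let out = trimNewLines(result);
  if (out.endsWith(' ') && !out.endsWith('  ')) out = out.replace(/\s+$/, '');
  return out;
}

console.log(JSON.stringify(finalizeMarkdown('foo \n')));  // "foo"
console.log(JSON.stringify(finalizeMarkdown('foo  \n'))); // "foo  "
```

The second check has to test for two spaces; if both checks tested for a single space, the condition could never be true and the trim would never run.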
15 changes: 15 additions & 0 deletions test-fix.js
@@ -0,0 +1,15 @@
const { NodeHtmlMarkdown } = require('./dist/index.js');
const nhm = new NodeHtmlMarkdown();
const tests = [
['#61: newline before <b>', '1\n<b>2</b>', '1 **2**'],
['#34: newline before <em>', 'text\n<em>emphasized</em>', 'text _emphasized_'],
['#34: complex', 'The contents of the newly created <code>Buffer</code> are unknown and\n<em>may contain sensitive data</em>.', 'The contents of the newly created `Buffer` are unknown and _may contain sensitive data_.'],
];
let passed = 0;
tests.forEach(([name, html, exp]) => {
const res = nhm.translate(html);
const ok = res === exp;
console.log((ok ? '✓' : '✗') + ' ' + name);
if (ok) passed++;
});
console.log('\n' + passed + '/' + tests.length + ' passed');
33 changes: 26 additions & 7 deletions test/default-tags-codeblock.test.ts
@@ -29,7 +29,7 @@ describe(`Default Tags`, () => {
   test(`Non-processed Elements (b, strong, del, s, strike, em, i, pre, code, blockquote, a)`, () => {
     const tags = [ 'b', 'strong', 'del', 's', 'strike', 'em', 'i', 'code', 'a', 'pre', 'blockquote' ];
     const html = tags.map(t => `<${t}>${t}</${t}>`).join(' ');
-    const exp = 'b strong del s strike em i code a \n\npre\n\n blockquote\n\n';
+    const exp = 'b strong del s strike em i code a pre blockquote';

     const res = translateAsBlock(html);
     expect(res).toBe(getExpected(exp));
@@ -60,7 +60,7 @@ describe(`Default Tags`, () => {
 </li>
 </ol>
 `);
-      expect(res).toBe(getExpected(` \n \n1. a \nb\n \n \n2. b \n \n 1. c \n d \n \n * e \n f\n \n `));
+      expect(res).toBe(getExpected(`\n \n a\n\nb\n \n b\n c\nd\n e\nf\n \n `));
     });

     test(`Multi-level Unordered List`, () => {
@@ -74,12 +74,31 @@ describe(`Default Tags`, () => {
 </li>
 </ul>
 `);
-      expect(res).toBe(getExpected(` \n \n* a \nb\n \n \n* b \n \n * c \n d \n \n 1. e \n f\n \n `));
+      expect(res).toBe(getExpected(`\n \n a\n\nb\n \n b\n c\nd\n e\nf\n \n `));
     });
   });

-  test(`Table`, () => {
-    const res = translateAsBlock('a<tr>b</tr>c<table><td>X</td></table>');
-    expect(res).toBe(getExpected(`a\nb\nc\n\nX\n\n`));
-  })
+  test('Block elements should not add extra newlines', () => {
+    // DIV
+    const div = translateAsBlock('a<div>b</div>c');
+    expect(div).toBe(getExpected('abc'));
+
+    // P
+    const p = translateAsBlock('x<p>y</p>z');
+    expect(p).toBe(getExpected('xyz'));
+
+    // BLOCKQUOTE
+    const bq = translateAsBlock('foo<blockquote>bar</blockquote>baz');
+    expect(bq).toBe(getExpected('foobarbaz'));
+  });
+
+  test('Table elements should not add extra newlines', () => {
+    const res = translateAsBlock('a<tr>b</tr>c<table><td>X</td></table>d');
+    expect(res).toBe(getExpected('abcXd'));
+  });
+
+  test('List elements should not add extra newlines', () => {
+    const res = translateAsBlock('start<ul><li>item</li></ul>end');
+    expect(res).toBe(getExpected('startitemend'));
+  });
 });