fix: preserve underline formatting in DOCX to Markdown conversion #1696

Open
nitinbhatia-dev wants to merge 1 commit into microsoft:main from
nitinbhatia-dev:fix/docx-underline-preservation

@nitinbhatia-dev
Summary

Closes #35

Underlined text in .docx files was silently dropped during conversion. Two things were missing:

  • Mammoth was not configured to emit <u> HTML tags for underlined runs (it requires an explicit style_map rule)
  • _CustomMarkdownify had no convert_u handler, so even if <u> tags arrived they would have been stripped

Changes

  • _docx_converter.py: Set u => u as the default Mammoth style_map so underlined runs are emitted as <u> elements. Any user-supplied style_map is appended and takes precedence.
  • _markdownify.py: Add a convert_u handler to _CustomMarkdownify that renders underlined text as inline <u>text</u> HTML. CommonMark has no native underline syntax, so inline HTML is the only way to express it.
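
The two changes above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `DEFAULT_STYLE_MAP`, `build_style_map`, and `UnderlineMixin` are hypothetical names. It assumes Mammoth style maps are newline-separated rule strings and that markdownify dispatches each tag to a `convert_<tag>` method on the converter.

```python
# Hypothetical names throughout; this only mirrors the behaviour described above.

DEFAULT_STYLE_MAP = "u => u"  # rule asking Mammoth to emit <u> for underlined runs


def build_style_map(user_style_map=None):
    """Return the default rule with any user-supplied rules appended after it.

    Mammoth style maps are newline-separated, so appending is a string join.
    How Mammoth resolves conflicts between rules is not modelled here.
    """
    if not user_style_map:
        return DEFAULT_STYLE_MAP
    return f"{DEFAULT_STYLE_MAP}\n{user_style_map}"


class UnderlineMixin:
    """Sketch of a convert_u handler, meant to be mixed into a converter class.

    Extra positional and keyword arguments are accepted because the handler
    signature differs between markdownify versions (an assumption to verify).
    """

    def convert_u(self, el, text, *args, **kwargs):
        # Skip empty runs so the output contains no bare <u></u> pairs.
        if not text:
            return ""
        return f"<u>{text}</u>"


if __name__ == "__main__":
    print(build_style_map("p[style-name='Quote'] => blockquote"))
    print(UnderlineMixin().convert_u(None, "underlined"))
```

The combined string would then be passed as the `style_map` argument to Mammoth's HTML conversion. Whether the appended user rules really win over the default depends on Mammoth's rule ordering, which is worth confirming against its documentation.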

Test plan

  • Convert a .docx file containing underlined text and verify the output contains <u>...</u> wrapping the underlined spans
  • Verify bold and italic text still convert correctly (**bold**, *italic*)
  • Verify passing a custom style_map kwarg still works and takes precedence over the default
  • Run the existing test suite: hatch test
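
The first two checks in the plan could be automated with a small helper like the one below. It is a sketch only: `underlined_spans` is a hypothetical helper, and `sample` is a hand-written stand-in for converter output, not actual output from this PR.

```python
import re

# Non-greedy match so adjacent underlined spans are captured separately.
UNDERLINE_RE = re.compile(r"<u>(.*?)</u>", re.DOTALL)


def underlined_spans(markdown):
    """Return the text of every <u>...</u> span found in the Markdown string."""
    return UNDERLINE_RE.findall(markdown)


# Hypothetical stand-in for the Markdown produced from a test .docx file.
sample = "Plain, **bold**, *italic*, and <u>underlined</u> text."

assert underlined_spans(sample) == ["underlined"]
assert "**bold**" in sample and "*italic*" in sample
```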

🤖 Generated with Claude Code

Underlined text in .docx files was silently dropped during conversion
because Mammoth's default style map does not emit <u> tags, and
_CustomMarkdownify had no handler for them.

- Add `u => u` as the default Mammoth style_map in DocxConverter so
  underlined runs are emitted as <u> HTML elements. User-supplied
  style_map is appended and still takes precedence.
- Add `convert_u` to _CustomMarkdownify to render <u> elements as
  `<u>text</u>` in the Markdown output (inline HTML is the only way
  to express underline in CommonMark).

Fixes microsoft#35

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nitinbhatia-dev
Author

@microsoft-github-policy-service agree

Development

Successfully merging this pull request may close these issues.

Underline not preserved

1 participant