feat(pptx): preserve hyperlinks and list hierarchy in PPTX conversion #1

Open
CheenDing wants to merge 1 commit into main from feat-pptx-hyperlinks-and-lists


@CheenDing
Owner

  • Add _convert_text_frame_to_markdown() to extract text paragraph-by-paragraph
  • Add _convert_paragraph_to_markdown() to preserve hyperlinks as [text](url)
  • Handle bullet list indentation based on paragraph.level
  • Escape brackets in hyperlink text to avoid breaking markdown syntax
  • Add test PPT and test vector for hyperlinks and nested lists

Fixes the issue where PPTX hyperlinks were lost and bullet list hierarchy was flattened into plain text.
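
The hyperlink handling and bracket escaping listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Run`, `escape_link_text`, and `runs_to_markdown` are hypothetical names, and `Run` is a plain stand-in for a text run rather than a python-pptx object.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a text run; not the python-pptx class."""
    text: str
    url: Optional[str] = None  # hyperlink target, if the run has one


def escape_link_text(text: str) -> str:
    # Backslash-escape square brackets so they cannot end the [text] part early.
    return text.replace("[", "\\[").replace("]", "\\]")


def runs_to_markdown(runs: List[Run]) -> str:
    # Join the runs of one paragraph, emitting linked runs as [text](url).
    parts = []
    for run in runs:
        if run.url:
            parts.append(f"[{escape_link_text(run.text)}]({run.url})")
        else:
            parts.append(run.text)
    return "".join(parts)


print(runs_to_markdown([Run("See "), Run("Microsoft [docs]", "https://microsoft.com")]))
```

Only the link text is escaped here; whether plain (unlinked) text also needs escaping is a separate design choice that this sketch leaves open.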

Summary

This PR improves PPTX-to-Markdown conversion by preserving hyperlinks and **bullet list hierarchy** that were previously lost.

Problem

Before this change:

  • Hyperlinks in PPTX text were stripped, leaving only plain text
  • Bullet list indentation levels were flattened into a single text block

Example input (PPTX slide):
Visit Microsoft (hyperlink to https://microsoft.com)
  • Level 1 item
      • Level 2 item

Before (incorrect):
Visit Microsoft Level 1 item Level 2 item

After (correct):
[Visit Microsoft](https://microsoft.com)
• Level 1 item
  • Level 2 item
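
The level-based indentation can be sketched in the same spirit. This is an assumption-laden illustration rather than the PR's implementation: `Paragraph`, its `is_bullet` flag, and `paragraphs_to_markdown` are hypothetical, the `-` list marker and two-space indent are choices made for the example, and `level` is assumed to be zero-based.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    """Hypothetical stand-in for a slide paragraph; text is already rendered."""
    text: str
    level: int = 0           # assumed zero-based indentation level
    is_bullet: bool = False  # hypothetical flag; bullet detection is out of scope


def paragraphs_to_markdown(paragraphs: List[Paragraph], indent: str = "  ") -> str:
    lines = []
    for p in paragraphs:
        if not p.text.strip():
            continue  # skip empty paragraphs
        if p.is_bullet:
            # One indent unit per level, then an assumed "-" list marker.
            lines.append(f"{indent * p.level}- {p.text}")
        else:
            lines.append(p.text)
    return "\n".join(lines)


slide = [
    Paragraph("[Visit Microsoft](https://microsoft.com)"),
    Paragraph("Level 1 item", level=0, is_bullet=True),
    Paragraph("Level 2 item", level=1, is_bullet=True),
]
print(paragraphs_to_markdown(slide))
```

Processing paragraph by paragraph is what keeps each list item on its own line instead of collapsing the frame into one text block.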

@CheenDing CheenDing added this to the 11111 milestone May 24, 2026
@CheenDing CheenDing self-assigned this May 24, 2026
@CheenDing CheenDing added the bug and documentation labels May 24, 2026