Add rowspan support for HTML tables #237
Open
Add rowspan Support for HTML Tables
Problem Statement
Previously, python-markdownify did not properly handle HTML tables with rowspan attributes. When a table cell had rowspan > 1, the resulting Markdown table was missing cells in the subsequent rows, which produced a malformed table structure and misaligned columns.
Example of the problem:
Previous (incorrect) output:
Solution
This PR implements comprehensive rowspan support by adding:
- a _table_has_rowspan() method to detect tables containing rowspan attributes
- a _build_rowspan_cells() method that:
  - Empty cells (| |) are inserted where rowspan cells span multiple rows
New (correct) output:
Implementation Details
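To make the empty-cell placement concrete, here is a minimal standalone sketch of the general idea: track which columns are still covered by a rowspan cell from an earlier row, and emit an empty cell there. It is not the code from this PR. The function name expand_rowspans, the (text, rowspan) tuple input, and the sample data are all hypothetical, and the sketch ignores colspan entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape for illustration: (cell text, rowspan value).
Cell = Tuple[str, int]


def expand_rowspans(rows: List[List[Cell]]) -> List[List[str]]:
    """Return rows of cell text, with "" wherever an earlier cell spans down."""
    pending: Dict[int, int] = {}  # column index -> rows still covered below
    grid: List[List[str]] = []
    for row in rows:
        out: List[str] = []
        cells = iter(row)
        col = 0
        while True:
            if pending.get(col, 0) > 0:
                # This column is occupied by a rowspan cell from a previous row,
                # so the Markdown row needs an empty placeholder here.
                out.append("")
                pending[col] -= 1
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break  # no more real cells and no covered column at this index
            text, rowspan = cell
            out.append(text)
            if rowspan > 1:
                # Remember how many following rows this cell still covers.
                pending[col] = rowspan - 1
            col += 1
        grid.append(out)
    return grid


# Sample table: "Core" spans two rows in the second column.
sample = [
    [("Name", 1), ("Team", 1), ("Role", 1)],
    [("Ann", 1), ("Core", 2), ("Dev", 1)],
    [("Bob", 1), ("QA", 1)],
]
for line in expand_rowspans(sample):
    print("| " + " | ".join(line) + " |")
```

In the sample, the third row has only two real cells; the sketch places an empty cell in the second column so that every row ends up with three columns. The exact text emitted for an empty cell in the PR may differ from what this sketch prints.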
Core Changes in markdownify/__init__.py:
- convert_tr() method: Enhanced to detect and handle rowspan tables
- _table_has_rowspan() method: Efficient detection of tables with rowspan attributes
- _build_rowspan_cells() method: Algorithm to calculate empty cell placement for each row
Key Features:
- <thead> sections
Testing
New Test Coverage in tests/test_tables.py:
Added comprehensive test cases covering various rowspan scenarios:
Test Examples:
Test Integration:
- test_table() and test_table_infer_header() functions
- table_infer_header=True modes
Compatibility
Files Changed
- markdownify/__init__.py: Core rowspan implementation (+~100 lines)
- tests/test_tables.py: Comprehensive test coverage (+~50 lines)
Testing Results
All existing tests pass, indicating no regressions. The rowspan cases live in test_table and test_table_infer_header, both of which pass on their own and as part of the full 83-test run shown below.
-------------------------------test result----------------------------
===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 2 items
tests/test_tables.py::test_table PASSED [ 50%]
tests/test_tables.py::test_table_infer_header PASSED [100%]
====================================================== 2 passed in 0.06s =======================================================
===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 83 items
tests/test_advanced.py::test_chomp PASSED [ 1%]
tests/test_advanced.py::test_nested PASSED [ 2%]
tests/test_advanced.py::test_ignore_comments PASSED [ 3%]
tests/test_advanced.py::test_ignore_comments_with_other_tags PASSED [ 4%]
tests/test_advanced.py::test_code_with_tricky_content PASSED [ 6%]
tests/test_advanced.py::test_special_tags PASSED [ 7%]
tests/test_args.py::test_strip PASSED [ 8%]
tests/test_args.py::test_do_not_strip PASSED [ 9%]
tests/test_args.py::test_convert PASSED [ 10%]
tests/test_args.py::test_do_not_convert PASSED [ 12%]
tests/test_args.py::test_strip_document PASSED [ 13%]
tests/test_args.py::test_strip_pre PASSED [ 14%]
tests/test_basic.py::test_single_tag PASSED [ 15%]
tests/test_basic.py::test_soup PASSED [ 16%]
tests/test_basic.py::test_whitespace PASSED [ 18%]
tests/test_conversions.py::test_a PASSED [ 19%]
tests/test_conversions.py::test_a_spaces PASSED [ 20%]
tests/test_conversions.py::test_a_with_title PASSED [ 21%]
tests/test_conversions.py::test_a_shortcut PASSED [ 22%]
tests/test_conversions.py::test_a_no_autolinks PASSED [ 24%]
tests/test_conversions.py::test_a_in_code PASSED [ 25%]
tests/test_conversions.py::test_b PASSED [ 26%]
tests/test_conversions.py::test_b_spaces PASSED [ 27%]
tests/test_conversions.py::test_blockquote PASSED [ 28%]
tests/test_conversions.py::test_blockquote_with_nested_paragraph PASSED [ 30%]
tests/test_conversions.py::test_blockquote_with_paragraph PASSED [ 31%]
tests/test_conversions.py::test_blockquote_nested PASSED [ 32%]
tests/test_conversions.py::test_br PASSED [ 33%]
tests/test_conversions.py::test_code PASSED [ 34%]
tests/test_conversions.py::test_dl PASSED [ 36%]
tests/test_conversions.py::test_del PASSED [ 37%]
tests/test_conversions.py::test_div_section_article PASSED [ 38%]
tests/test_conversions.py::test_em PASSED [ 39%]
tests/test_conversions.py::test_figcaption PASSED [ 40%]
tests/test_conversions.py::test_header_with_space PASSED [ 42%]
tests/test_conversions.py::test_h1 PASSED [ 43%]
tests/test_conversions.py::test_h2 PASSED [ 44%]
tests/test_conversions.py::test_hn PASSED [ 45%]
tests/test_conversions.py::test_hn_chained PASSED [ 46%]
tests/test_conversions.py::test_hn_nested_tag_heading_style PASSED [ 48%]
tests/test_conversions.py::test_hn_nested_simple_tag PASSED [ 49%]
tests/test_conversions.py::test_hn_nested_img PASSED [ 50%]
tests/test_conversions.py::test_hn_atx_headings PASSED [ 51%]
tests/test_conversions.py::test_hn_atx_closed_headings PASSED [ 53%]
tests/test_conversions.py::test_hn_newlines PASSED [ 54%]
tests/test_conversions.py::test_head PASSED [ 55%]
tests/test_conversions.py::test_hr PASSED [ 56%]
tests/test_conversions.py::test_i PASSED [ 57%]
tests/test_conversions.py::test_img PASSED [ 59%]
tests/test_conversions.py::test_video PASSED [ 60%]
tests/test_conversions.py::test_kbd PASSED [ 61%]
tests/test_conversions.py::test_p PASSED [ 62%]
tests/test_conversions.py::test_pre PASSED [ 63%]
tests/test_conversions.py::test_q PASSED [ 65%]
tests/test_conversions.py::test_script PASSED [ 66%]
tests/test_conversions.py::test_style PASSED [ 67%]
tests/test_conversions.py::test_s PASSED [ 68%]
tests/test_conversions.py::test_samp PASSED [ 69%]
tests/test_conversions.py::test_strong PASSED [ 71%]
tests/test_conversions.py::test_strong_em_symbol PASSED [ 72%]
tests/test_conversions.py::test_sub PASSED [ 73%]
tests/test_conversions.py::test_sup PASSED [ 74%]
tests/test_conversions.py::test_lang PASSED [ 75%]
tests/test_conversions.py::test_lang_callback PASSED [ 77%]
tests/test_conversions.py::test_spaces PASSED [ 78%]
tests/test_custom_converter.py::test_custom_conversion_functions PASSED [ 79%]
tests/test_custom_converter.py::test_soup PASSED [ 80%]
tests/test_escaping.py::test_asterisks PASSED [ 81%]
tests/test_escaping.py::test_underscore PASSED [ 83%]
tests/test_escaping.py::test_xml_entities PASSED [ 84%]
tests/test_escaping.py::test_named_entities PASSED [ 85%]
tests/test_escaping.py::test_hexadecimal_entities PASSED [ 86%]
tests/test_escaping.py::test_single_escaping_entities PASSED [ 87%]
tests/test_escaping.py::test_misc PASSED [ 89%]
tests/test_lists.py::test_ol PASSED [ 90%]
tests/test_lists.py::test_nested_ols PASSED [ 91%]
tests/test_lists.py::test_ul PASSED [ 92%]
tests/test_lists.py::test_inline_ul PASSED [ 93%]
tests/test_lists.py::test_nested_uls PASSED [ 95%]
tests/test_lists.py::test_bullets PASSED [ 96%]
tests/test_lists.py::test_li_text PASSED [ 97%]
tests/test_tables.py::test_table PASSED [ 98%]
tests/test_tables.py::test_table_infer_header PASSED [100%]
====================================================== 83 passed in 0.10s ======================================================