
@ExcitingFrog

Add rowspan Support for HTML Tables

Problem Statement

Previously, python-markdownify ignored the rowspan attribute on table cells. A cell with rowspan > 1 also occupies the same column in the following rows, but the HTML for those rows contains no cell for that column. As a result, the generated Markdown rows came out one cell short and their remaining cells shifted left into the wrong columns.

Example of the problem:

<table>
    <tr>
        <th>Name</th>
        <th>Department</th>
        <th>Age</th>
    </tr>
    <tr>
        <td rowspan="2">John</td>
        <td>IT</td>
        <td>30</td>
    </tr>
    <tr>
        <td>Management</td>
        <td>31</td>
    </tr>
</table>

Previous (incorrect) output:

| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
| Management | 31 |  <!-- Missing cell, causing misalignment -->

Solution

This PR adds rowspan support through the following changes:

  1. Detection Logic: A new _table_has_rowspan() method checks whether a table contains any rowspan attributes
  2. Grid Algorithm: A new _build_rowspan_cells() method that:
    • Tracks which columns are occupied by rowspan cells from previous rows
    • Calculates where empty placeholder cells must be inserted
    • Handles complex scenarios with multiple rowspan cells and nested table structures
  3. Backward Compatibility: Tables without rowspan continue to use the original, optimized logic
  4. Empty Cell Generation: A properly formatted empty cell (|  |) is inserted in each later row that a rowspan cell covers
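The grid step can be illustrated with a small standalone sketch. It is not the code in this PR: build_rows() and to_markdown() are hypothetical names, cells are passed as (text, rowspan, colspan) tuples instead of BeautifulSoup tags, and colspan is assumed to render as the cell text followed by empty cells. A dict maps each column index to the number of further rows a rowspan still covers; whenever the next column is covered, an empty placeholder is emitted instead of consuming the next HTML cell.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: each cell is (text, rowspan, colspan)
Cell = Tuple[str, int, int]


def build_rows(table: List[List[Cell]]) -> List[List[str]]:
    """Place cells on a grid, emitting "" wherever a rowspan from an earlier row covers a slot."""
    pending: Dict[int, int] = {}  # column index -> further rows still covered
    grid: List[List[str]] = []
    for row in table:
        out: List[str] = []
        cells = iter(row)
        col = 0
        while True:
            if pending.get(col, 0) > 0:
                # Column is occupied by a rowspan cell from above: insert a placeholder
                out.append("")
                pending[col] -= 1
                col += 1
                continue
            cell: Optional[Cell] = next(cells, None)
            if cell is None:
                # Row exhausted; keep going only if a later column is still covered
                if any(c > col and n > 0 for c, n in pending.items()):
                    out.append("")
                    col += 1
                    continue
                break
            text, rowspan, colspan = cell
            for offset in range(colspan):
                # Assumption: colspan renders as the text followed by empty cells
                out.append(text if offset == 0 else "")
                if rowspan > 1:
                    # Later rows must skip this column rowspan - 1 more times
                    pending[col] = rowspan - 1
                col += 1
        grid.append(out)
    return grid


def to_markdown(grid: List[List[str]]) -> str:
    """Render rows as a pipe table, treating the first row as the header."""
    lines = []
    for i, row in enumerate(grid):
        lines.append("| " + " | ".join(row) + " |")
        if i == 0:
            lines.append("| " + " | ".join("---" for _ in row) + " |")
    return "\n".join(lines)
```

Fed the Name/Department/Age table from the problem statement as [[("Name", 1, 1), ("Department", 1, 1), ("Age", 1, 1)], [("John", 2, 1), ("IT", 1, 1), ("30", 1, 1)], [("Management", 1, 1), ("31", 1, 1)]], the two functions produce the same rows as the new output below (without the annotation comment), including the "|  |" placeholder.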

New (correct) output:

| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
|  | Management | 31 |  <!-- Proper empty cell for rowspan -->
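The difference between the two outputs comes down to cell counts: in a Markdown pipe table every row should have as many cells as the header. The hypothetical helpers below (not part of markdownify or this PR) count cells per row, assuming each row starts and ends with "|" and no cell contains an escaped pipe.

```python
from typing import List


def cell_counts(table: str) -> List[int]:
    """Count the cells in each row of a Markdown pipe table.

    Assumes every row starts and ends with "|" and that no cell contains an escaped "\\|".
    """
    counts = []
    for line in table.strip().splitlines():
        row = line.strip()
        # Drop the outer pipes, then split on the separators between cells
        counts.append(len(row[1:-1].split("|")))
    return counts


def is_aligned(table: str) -> bool:
    """True when every row has as many cells as the header row."""
    counts = cell_counts(table)
    return all(n == counts[0] for n in counts)


previous = """
| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
| Management | 31 |
"""

fixed = """
| Name | Department | Age |
| --- | --- | --- |
| John | IT | 30 |
|  | Management | 31 |
"""

print(cell_counts(previous), is_aligned(previous))  # [3, 3, 3, 2] False
print(cell_counts(fixed), is_aligned(fixed))        # [3, 3, 3, 3] True
```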

Implementation Details

Core Changes in markdownify/__init__.py:

  • convert_tr() method: Enhanced to detect and handle rowspan tables
  • _table_has_rowspan() method: Efficient detection of tables with rowspan attributes
  • _build_rowspan_cells() method: Algorithm to calculate empty cell placement for each row
  • Grid tracking: Maintains occupied column positions across table rows
  • Column counting: Accurate calculation of total columns including rowspan effects

Key Features:

  • Simple rowspan: Basic single-column row spanning
  • Complex rowspan: Cells spanning three or more consecutive rows (rowspan > 2)
  • Mixed scenarios: Rowspan combined with colspan attributes
  • Multiple rowspan: Multiple rowspan cells in the same row
  • Table headers: Proper handling of rowspan in <thead> sections
  • Backward compatibility: No impact on tables without rowspan
  • Performance: Rowspan processing only activates when needed

Testing

New Test Coverage in tests/test_tables.py:

Added comprehensive test cases covering various rowspan scenarios:

  1. Simple rowspan: Basic two-row spanning functionality
  2. Complex rowspan: Multi-row spanning (rowspan="3")
  3. Rowspan + colspan: Combined row and column spanning
  4. Multiple rowspan: Multiple rowspan cells in the same row
  5. Thead rowspan: Rowspan in table headers with colspan

Test Examples:

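from markdownify import markdownify as md

# Simple rowspan test
table_with_simple_rowspan = """<table>
    <tr><th>Name</th><th>Department</th><th>Age</th></tr>
    <tr><td rowspan="2">John</td><td>IT</td><td>30</td></tr>
    <tr><td>Management</td><td>31</td></tr>
</table>"""

# Expected output with proper empty cell placement
expected = '\n\n| Name | Department | Age |\n| --- | --- | --- |\n| John | IT | 30 |\n|  | Management | 31 |\n\n'

# strip_document=None keeps the leading/trailing blank lines that `expected` includes
assert md(table_with_simple_rowspan, strip_document=None) == expected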

Test Integration:

  • Integrated approach: Rowspan cases are added to the existing test_table() and test_table_infer_header() functions rather than to new test functions
  • Format consistency: New tests follow the same format and style as existing table tests
  • Full coverage: Tests both normal and table_infer_header=True modes
  • Regression prevention: The full suite (83 tests) continues to pass

Compatibility

  • Backward compatible: No breaking changes to existing functionality
  • API unchanged: No new parameters or configuration options required
  • Performance: Minimal overhead for tables without rowspan
  • Edge cases: Handles malformed HTML gracefully
  • Option support: Works correctly with all existing markdownify options

Files Changed

  • markdownify/__init__.py: Core rowspan implementation (+~100 lines)
  • tests/test_tables.py: Comprehensive test coverage (+~50 lines)

Testing Results

83 passed in 0.10s

All existing tests pass, so no regressions were detected. The new rowspan behavior is covered by the five test cases listed above.

-------------------------------test result----------------------------
===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 2 items

tests/test_tables.py::test_table PASSED [ 50%]
tests/test_tables.py::test_table_infer_header PASSED [100%]

====================================================== 2 passed in 0.06s =======================================================

===================================================== test session starts ======================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0 -- xxxx
cachedir: .pytest_cache
rootdir: xxxxx
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 83 items

tests/test_advanced.py::test_chomp PASSED [ 1%]
tests/test_advanced.py::test_nested PASSED [ 2%]
tests/test_advanced.py::test_ignore_comments PASSED [ 3%]
tests/test_advanced.py::test_ignore_comments_with_other_tags PASSED [ 4%]
tests/test_advanced.py::test_code_with_tricky_content PASSED [ 6%]
tests/test_advanced.py::test_special_tags PASSED [ 7%]
tests/test_args.py::test_strip PASSED [ 8%]
tests/test_args.py::test_do_not_strip PASSED [ 9%]
tests/test_args.py::test_convert PASSED [ 10%]
tests/test_args.py::test_do_not_convert PASSED [ 12%]
tests/test_args.py::test_strip_document PASSED [ 13%]
tests/test_args.py::test_strip_pre PASSED [ 14%]
tests/test_basic.py::test_single_tag PASSED [ 15%]
tests/test_basic.py::test_soup PASSED [ 16%]
tests/test_basic.py::test_whitespace PASSED [ 18%]
tests/test_conversions.py::test_a PASSED [ 19%]
tests/test_conversions.py::test_a_spaces PASSED [ 20%]
tests/test_conversions.py::test_a_with_title PASSED [ 21%]
tests/test_conversions.py::test_a_shortcut PASSED [ 22%]
tests/test_conversions.py::test_a_no_autolinks PASSED [ 24%]
tests/test_conversions.py::test_a_in_code PASSED [ 25%]
tests/test_conversions.py::test_b PASSED [ 26%]
tests/test_conversions.py::test_b_spaces PASSED [ 27%]
tests/test_conversions.py::test_blockquote PASSED [ 28%]
tests/test_conversions.py::test_blockquote_with_nested_paragraph PASSED [ 30%]
tests/test_conversions.py::test_blockquote_with_paragraph PASSED [ 31%]
tests/test_conversions.py::test_blockquote_nested PASSED [ 32%]
tests/test_conversions.py::test_br PASSED [ 33%]
tests/test_conversions.py::test_code PASSED [ 34%]
tests/test_conversions.py::test_dl PASSED [ 36%]
tests/test_conversions.py::test_del PASSED [ 37%]
tests/test_conversions.py::test_div_section_article PASSED [ 38%]
tests/test_conversions.py::test_em PASSED [ 39%]
tests/test_conversions.py::test_figcaption PASSED [ 40%]
tests/test_conversions.py::test_header_with_space PASSED [ 42%]
tests/test_conversions.py::test_h1 PASSED [ 43%]
tests/test_conversions.py::test_h2 PASSED [ 44%]
tests/test_conversions.py::test_hn PASSED [ 45%]
tests/test_conversions.py::test_hn_chained PASSED [ 46%]
tests/test_conversions.py::test_hn_nested_tag_heading_style PASSED [ 48%]
tests/test_conversions.py::test_hn_nested_simple_tag PASSED [ 49%]
tests/test_conversions.py::test_hn_nested_img PASSED [ 50%]
tests/test_conversions.py::test_hn_atx_headings PASSED [ 51%]
tests/test_conversions.py::test_hn_atx_closed_headings PASSED [ 53%]
tests/test_conversions.py::test_hn_newlines PASSED [ 54%]
tests/test_conversions.py::test_head PASSED [ 55%]
tests/test_conversions.py::test_hr PASSED [ 56%]
tests/test_conversions.py::test_i PASSED [ 57%]
tests/test_conversions.py::test_img PASSED [ 59%]
tests/test_conversions.py::test_video PASSED [ 60%]
tests/test_conversions.py::test_kbd PASSED [ 61%]
tests/test_conversions.py::test_p PASSED [ 62%]
tests/test_conversions.py::test_pre PASSED [ 63%]
tests/test_conversions.py::test_q PASSED [ 65%]
tests/test_conversions.py::test_script PASSED [ 66%]
tests/test_conversions.py::test_style PASSED [ 67%]
tests/test_conversions.py::test_s PASSED [ 68%]
tests/test_conversions.py::test_samp PASSED [ 69%]
tests/test_conversions.py::test_strong PASSED [ 71%]
tests/test_conversions.py::test_strong_em_symbol PASSED [ 72%]
tests/test_conversions.py::test_sub PASSED [ 73%]
tests/test_conversions.py::test_sup PASSED [ 74%]
tests/test_conversions.py::test_lang PASSED [ 75%]
tests/test_conversions.py::test_lang_callback PASSED [ 77%]
tests/test_conversions.py::test_spaces PASSED [ 78%]
tests/test_custom_converter.py::test_custom_conversion_functions PASSED [ 79%]
tests/test_custom_converter.py::test_soup PASSED [ 80%]
tests/test_escaping.py::test_asterisks PASSED [ 81%]
tests/test_escaping.py::test_underscore PASSED [ 83%]
tests/test_escaping.py::test_xml_entities PASSED [ 84%]
tests/test_escaping.py::test_named_entities PASSED [ 85%]
tests/test_escaping.py::test_hexadecimal_entities PASSED [ 86%]
tests/test_escaping.py::test_single_escaping_entities PASSED [ 87%]
tests/test_escaping.py::test_misc PASSED [ 89%]
tests/test_lists.py::test_ol PASSED [ 90%]
tests/test_lists.py::test_nested_ols PASSED [ 91%]
tests/test_lists.py::test_ul PASSED [ 92%]
tests/test_lists.py::test_inline_ul PASSED [ 93%]
tests/test_lists.py::test_nested_uls PASSED [ 95%]
tests/test_lists.py::test_bullets PASSED [ 96%]
tests/test_lists.py::test_li_text PASSED [ 97%]
tests/test_tables.py::test_table PASSED [ 98%]
tests/test_tables.py::test_table_infer_header PASSED [100%]

====================================================== 83 passed in 0.10s ======================================================
