Skip to content

Fix Bug 66263 — Add support for reading SDT row in tables#971

Open
hostedbygnome wants to merge 7 commits intoapache:trunkfrom
hostedbygnome:Bug-66263
Open

Fix Bug 66263 — Add support for reading SDT row in tables#971
hostedbygnome wants to merge 7 commits intoapache:trunkfrom
hostedbygnome:Bug-66263

Conversation

@hostedbygnome
Copy link
Copy Markdown

The issue is that there's an SDT at the same level as the table row.
Added support for this case.

CTR elements inside CTSdtRun and CTRow elements inside CTSdtRow are processed recursively.

Comment thread poi-ooxml/src/main/java/org/apache/poi/xwpf/usermodel/XWPFParagraph.java Outdated
@pjfanning
Copy link
Copy Markdown
Member

  • doesn't even compile
  • no tests

@pjfanning pjfanning changed the title Fix Bug 66263 — Add support for SDT row in tables Fix Bug 66263 — Add support for reading SDT row in tables Dec 18, 2025
@pjfanning
Copy link
Copy Markdown
Member

would it be possible to provide a test?

Comment thread poi-ooxml/src/main/java/org/apache/poi/xwpf/usermodel/XWPFParagraph.java Outdated
@hostedbygnome
Copy link
Copy Markdown
Author

hostedbygnome commented Apr 6, 2026

would it be possible to provide a test?

yes, sorry for the delay

@hostedbygnome
Copy link
Copy Markdown
Author

Is there anything else required in pr?
@pjfanning

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for handling Word SDT (content control) structures that appear at the same level as table rows and runs, so their inner content can be discovered during parsing and text extraction (Bug 66263).

Changes:

  • Update XWPFTable to iterate table child elements and recursively process CTSdtRow to extract inner CTRow content.
  • Update XWPFParagraph to additionally process CTR elements inside CTSdtBlock / CTSdtRun content.
  • Add regression tests and new sample .docx files for SDT rows in tables and SDT runs in paragraphs.

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
poi-ooxml/src/main/java/org/apache/poi/xwpf/usermodel/XWPFTable.java Cursor-based traversal of table children; recursive SDT-row handling
poi-ooxml/src/main/java/org/apache/poi/xwpf/usermodel/XWPFParagraph.java Processes CTRs nested inside SDT content when building run lists
poi-ooxml/src/test/java/org/apache/poi/xwpf/usermodel/TestXWPFTable.java New unit tests for SDT-row table behavior and sample-doc regression
poi-ooxml/src/test/java/org/apache/poi/xwpf/usermodel/TestXWPFParagraph.java New unit tests for SDT run handling + sample-doc regression
test-data/document/Bug66263-table.docx Added sample document covering SDT row in table
test-data/document/Bug66263-paragraph.docx Added sample document covering SDT runs in paragraphs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +175 to 190
try (XmlCursor cursor = table.newCursor()) {
cursor.selectPath("./*");
while (cursor.toNextSelection()) {
XmlObject xmlObject = cursor.getObject();
if (xmlObject instanceof CTRow) {
processCTRow((CTRow)xmlObject);
}
else if (xmlObject instanceof CTSdtRow) {
List<CTRow> rows = new ArrayList<>();
collectCTRowsInnerSdtRow((CTSdtRow)xmlObject, rows);
for (CTRow row : rows)
{
processCTRow(row);
}
rowText.append(p.getText());
}
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor now adds CTRow instances extracted from CTSdtRow into tableRows. Many mutation/access methods (e.g., removeRow, insertNewTableRow, addRow, getRow(int), getNumberOfRows) assume tableRows indexes map 1:1 to ctTbl’s top-level w:tr array. With SDT-derived rows included, removeRow(pos) will remove the wrong underlying w:tr (or throw) and getRow(int)/getNumberOfRows() become inconsistent with getRows(). Consider either (a) keeping tableRows aligned to top-level w:tr only and handling SDT rows separately for text extraction, or (b) refactoring row operations to remove/insert via the actual row element’s cursor (or SDT wrapper) rather than ctTbl.removeTr(pos) and updating getRow/getNumberOfRows semantics accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +219 to +226
List<CTRow> rowsInnerSdtContent = sdtContent.getTrList();
if (!rowsInnerSdtContent.isEmpty()) {
rows.addAll(rowsInnerSdtContent);
return;
}

for (CTSdtRow innerSdt : sdtContent.getSdtList()) {
collectCTRowsInnerSdtRow(innerSdt, rows);
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collectCTRowsInnerSdtRow returns immediately when sdtContent.getTrList() is non-empty, which skips any nested w:sdt elements that may appear alongside w:tr within the same CTSdtContentRow. This will miss rows in mixed-content SDTs and also loses document order between tr and nested sdt. Iterate through the SDT content children in order (e.g., via an XmlCursor over sdtContent) and collect both CTRow and nested CTSdtRow recursively without the early return.

Suggested change
List<CTRow> rowsInnerSdtContent = sdtContent.getTrList();
if (!rowsInnerSdtContent.isEmpty()) {
rows.addAll(rowsInnerSdtContent);
return;
}
for (CTSdtRow innerSdt : sdtContent.getSdtList()) {
collectCTRowsInnerSdtRow(innerSdt, rows);
XmlCursor cursor = sdtContent.newCursor();
try {
if (!cursor.toFirstChild()) {
return;
}
do {
XmlObject child = cursor.getObject();
if (child instanceof CTRow) {
rows.add((CTRow) child);
} else if (child instanceof CTSdtRow) {
collectCTRowsInnerSdtRow((CTSdtRow) child, rows);
}
} while (cursor.toNextSibling());
} finally {
cursor.dispose();

Copilot uses AI. Check for mistakes.
Comment on lines 170 to 173
// is an empty table: I add one row and one column as default
if (initRow && table.sizeOfTrArray() == 0) {
createEmptyTable(table);
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "empty table" check only looks at sizeOfTrArray() == 0. A table that contains only w:sdt row wrappers has no top-level w:tr, but it isn’t actually empty; createEmptyTable(table) will inject an extra blank row when using the default constructor (initRow=true). Consider extending the condition to also check for the presence of SDT rows (e.g., sizeOfSdtArray()==0/getSdtList().isEmpty()), or otherwise detect whether the table already contains any row-like content before creating a default row.

Copilot uses AI. Check for mistakes.
Comment on lines +179 to +184
private void processCTRs(List<CTR> ctrs) {
if (ctrs == null) {
return;
}
for (CTR ctr : ctrs) {
if (ctr.getRPr() != null) {
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processCTRs adds XWPFRun objects for CTRs inside SDT content into the paragraph’s runs list. Many editing APIs (insertNewRun, insertNewHyperlinkRun, etc.) assume every entry in runs corresponds to a CTR whose cursor is in the paragraph (or hyperlink/field) container; for CTRs nested under CTSdtContentRun, insertNewProvidedRun will fail the isCursorInParagraph check and return null for positions that point at these SDT-derived runs. Either avoid adding nested SDT CTRs into runs (keep SDT content accessible via XWPFSDT/iruns), or update the insertion/removal logic and cursor positioning to support runs whose parent is SDT content so that the runs list remains a valid edit surface.

Suggested change
private void processCTRs(List<CTR> ctrs) {
if (ctrs == null) {
return;
}
for (CTR ctr : ctrs) {
if (ctr.getRPr() != null) {
private boolean isRunInParagraphEditContainer(CTR ctr) {
try (XmlCursor cursor = ctr.newCursor()) {
if (!cursor.toParent()) {
return false;
}
XmlObject parent = cursor.getObject();
return parent instanceof CTP
|| parent instanceof CTHyperlink
|| parent instanceof CTSimpleField;
}
}
private void processCTRs(List<CTR> ctrs) {
if (ctrs == null) {
return;
}
for (CTR ctr : ctrs) {
if (ctr.getRPr() != null && isRunInParagraphEditContainer(ctr)) {

Copilot uses AI. Check for mistakes.
Comment on lines +47 to +48
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtBlock;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtContentBlock;
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These imports are unused in this test class (CTSdtBlock, CTSdtContentBlock) and will cause compilation to fail with "unused import" in Java. Remove them or add coverage that actually uses SDT blocks.

Suggested change
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtBlock;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtContentBlock;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants