Description
I would like to propose the following feature (needed for one of my work projects):
I need the ability to convert a document that has tracked changes turned on into HTML, so that all insertions are wrapped in <ins> tags and all deletions in <del> tags.
For example:
Word:
This is the house that John Jack built (where "John" is a tracked deletion and "Jack" is a tracked insertion)
Html:
<p>This is the house that <del>John</del><ins>Jack</ins> built</p>
It should be an optional feature that the client can control either through an additional parameter of the convert_to_html() function or through a specific style map, in the same way that python-mammoth can currently show or hide comments based on the style map.
Implementation details:
In the OpenXML format, tracked changes are represented as follows:
<w:del w:author="John Doe" w:date="2023-10-25T14:18:00Z" w:id="1">
    <w:r>
        <w:delText>Deleted text</w:delText>
    </w:r>
</w:del>
<w:ins w:author="John Doe" w:date="2023-10-25T14:18:10Z" w:id="2">
    <w:r>
        <w:t>Inserted text</w:t>
    </w:r>
</w:ins>
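To make the mapping concrete, here is a minimal standalone sketch that reads a paragraph containing <w:ins> and <w:del> and emits the HTML from the example above. It uses only the Python standard library and does not touch mammoth's internals; the function names `paragraph_to_html` and `_run_text` are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace that the "w:" prefix refers to.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _w(tag):
    """Return the namespace-qualified name ElementTree uses for w:<tag>."""
    return "{%s}%s" % (W_NS, tag)


def _run_text(run):
    """Collect the text of a run, from either w:t or w:delText children."""
    return "".join(
        escape(node.text or "")
        for node in run
        if node.tag in (_w("t"), _w("delText"))
    )


def paragraph_to_html(paragraph_xml):
    """Hypothetical helper: convert one w:p element into a <p> string."""
    paragraph = ET.fromstring(paragraph_xml)
    parts = []
    for child in paragraph:
        if child.tag == _w("r"):
            parts.append(_run_text(child))
        elif child.tag == _w("ins"):
            inner = "".join(_run_text(r) for r in child.iter(_w("r")))
            parts.append("<ins>%s</ins>" % inner)
        elif child.tag == _w("del"):
            inner = "".join(_run_text(r) for r in child.iter(_w("r")))
            parts.append("<del>%s</del>" % inner)
    return "<p>%s</p>" % "".join(parts)


SAMPLE = (
    '<w:p xmlns:w="%s">'
    '<w:r><w:t xml:space="preserve">This is the house that </w:t></w:r>'
    '<w:del w:id="1"><w:r><w:delText>John</w:delText></w:r></w:del>'
    '<w:ins w:id="2"><w:r><w:t>Jack</w:t></w:r></w:ins>'
    '<w:r><w:t xml:space="preserve"> built</w:t></w:r>'
    "</w:p>" % W_NS
)

print(paragraph_to_html(SAMPLE))
```

Note that deleted text lives in <w:delText> rather than <w:t>, so a reader that only looks for <w:t> will never see it even if it descends into <w:del>.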
The current version of mammoth ignores the <w:del> tag, and for the <w:ins> tag it takes all child nodes.
I propose introducing Insertion and Deletion elements in the document model to hold the data of these nodes.
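As a rough illustration of what such elements could look like, the sketch below models Insertion and Deletion as plain dataclasses and renders them to HTML. The class names, the `render` function, and the `ignore_tracked_changes` flag are assumptions made for this example; they are not mammoth's actual document model or API.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Insertion:
    # Hypothetical element wrapping content found under w:ins.
    children: List["Node"] = field(default_factory=list)


@dataclass
class Deletion:
    # Hypothetical element wrapping content found under w:del.
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Insertion, Deletion]


def render(nodes, ignore_tracked_changes=False):
    """Render nodes to HTML.

    With ignore_tracked_changes=True, inserted content is kept without a
    wrapper and deleted content is dropped. Otherwise insertions and
    deletions are wrapped in <ins> and <del>.
    """
    out = []
    for node in nodes:
        if isinstance(node, Text):
            out.append(escape(node.value))
        elif isinstance(node, Insertion):
            inner = render(node.children, ignore_tracked_changes)
            out.append(inner if ignore_tracked_changes else "<ins>%s</ins>" % inner)
        elif isinstance(node, Deletion):
            if not ignore_tracked_changes:
                inner = render(node.children, ignore_tracked_changes)
                out.append("<del>%s</del>" % inner)
    return "".join(out)


nodes = [
    Text("This is the house that "),
    Deletion([Text("John")]),
    Insertion([Text("Jack")]),
    Text(" built"),
]
print(render(nodes))
print(render(nodes, ignore_tracked_changes=True))
```

Keeping the elements in the document model and deciding at render time whether to emit them means the same parsed document can serve both outputs.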
P.S. In fact, I have this implemented in my local repo, and if the feature looks interesting, I can make a pull request.
However, I would leave it to the author of the library to define what the public interface for this option should look like. Would it be a parameter of convert_to_html:
mammoth.convert_to_html(fileobj=fileobj, ignore_tracked_changes=True)
or would it be a specific style in style_map?
Using style_map looks preferable, because this parameter is also passed from https://github.com/microsoft/markitdown into mammoth, so the change could be made in mammoth without requiring a change in markitdown.
Here are some unit tests that I used to verify my implementation
def _run_element_with_deleted_text(text):
    return xml_element("w:r", {}, [_deleted_text_element(text)])


def _deleted_text_element(value):
    return xml_element("w:delText", {}, [xml_text(value)])

def test_insertion_element():
    element = xml_element("w:p", {}, [
        _run_element_with_text("This is "),
        xml_element("w:ins", {}, [
            _run_element_with_text("inserted")
        ])
    ])
    # Tracked changes ignored: inserted runs are kept as ordinary runs.
    assert_equal(
        documents.paragraph([
            documents.run([documents.text("This is ")]),
            documents.run([documents.text("inserted")])]),
        _read_and_get_document_xml_element(element, ignore_tracked_changes=True)
    )
    # Tracked changes kept: inserted runs are wrapped in an insertion element.
    assert_equal(
        documents.paragraph([
            documents.run([documents.text("This is ")]),
            documents.insertion([documents.run([documents.text("inserted")])])]),
        _read_and_get_document_xml_element(element, ignore_tracked_changes=False)
    )

def test_deletion_element():
    element = xml_element("w:p", {}, [
        _run_element_with_text("This is "),
        xml_element("w:del", {}, [
            _run_element_with_deleted_text("deleted")
        ])
    ])
    # Tracked changes ignored: deleted runs are dropped.
    assert_equal(
        documents.paragraph([
            documents.run([documents.text("This is ")])]),
        _read_and_get_document_xml_element(element, ignore_tracked_changes=True)
    )
    # Tracked changes kept: deleted runs are wrapped in a deletion element.
    assert_equal(
        documents.paragraph([
            documents.run([documents.text("This is ")]),
            documents.deletion([documents.run([documents.text("deleted")])])]),
        _read_and_get_document_xml_element(element, ignore_tracked_changes=False)
    )