GroupDocs.Editor for Python via .NET -- AGENTS.md

Instructions for AI agents working with this package.

Load a document, convert it to editable HTML/CSS, edit the markup, then save it back to the original format or convert to another -- Word, Excel, PowerPoint, PDF, email, eBook, and text/markup formats, all without MS Office or OpenOffice installed.

Install

pip install groupdocs-editor-net

Python: 3.5 - 3.14 | Platforms: Windows, Linux, macOS

Resources

| Resource | URL |
|---|---|
| Documentation | https://docs.groupdocs.com/editor/python-net/ |
| LLM-optimized docs | https://docs.groupdocs.com/editor/python-net/llms-full.txt |
| API reference | https://reference.groupdocs.com/editor/python-net/ |
| Code examples | https://docs.groupdocs.com/editor/python-net/developer-guide/ |
| Release notes | https://releases.groupdocs.com/editor/python-net/release-notes/ |
| PyPI | https://pypi.org/project/groupdocs-editor-net/ |
| Free support forum | https://forum.groupdocs.com/c/editor/ |
| Temporary license | https://purchase.groupdocs.com/temporary-license |

MCP Server

If your environment has MCP configured, you can connect your AI tool to the GroupDocs documentation server for on-demand API lookups:

{
  "mcpServers": {
    "groupdocs-docs": {
      "url": "https://docs.groupdocs.com/mcp"
    }
  }
}

Works with Claude Code (~/.claude/settings.json), Cursor (.cursor/mcp.json), VS Code Copilot (.vscode/mcp.json), and any MCP-compatible client. If MCP is unavailable, fall back to the LLM-optimized docs URL above and this file -- both are shipped inside the wheel.
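For clients that read a JSON config file, the entry above can be merged into an existing file without overwriting other servers. The following is a minimal sketch using only the standard library. The helper name `add_groupdocs_mcp` is hypothetical and not part of the package, and it assumes the client expects the `mcpServers` shape shown above.

```python
import json
from pathlib import Path

GROUPDOCS_MCP_URL = "https://docs.groupdocs.com/mcp"  # URL from the config above


def add_groupdocs_mcp(config_path):
    """Merge the groupdocs-docs server into a JSON MCP config (hypothetical helper)."""
    path = Path(config_path)
    # Start from the existing config when there is one, so other servers survive.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["groupdocs-docs"] = {"url": GROUPDOCS_MCP_URL}
    # Create the parent folder (e.g. .cursor/) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_groupdocs_mcp(".cursor/mcp.json")` from a project root would create or update the Cursor config named above. An existing file that is empty or not valid JSON raises an error rather than being silently replaced.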

Imports

from groupdocs.editor import (
    License, Metered, Editor, EditableDocument, FormFieldManager,
    EncryptedException, IncorrectPasswordException, PasswordRequiredException, InvalidFormatException,
)
from groupdocs.editor.formats import (
    WordProcessingFormats, SpreadsheetFormats, PresentationFormats,
    FixedLayoutFormats, EBookFormats, EmailFormats, TextualFormats, FormatFamilies,
)
from groupdocs.editor.options import (
    # Load options
    WordProcessingLoadOptions, SpreadsheetLoadOptions, PresentationLoadOptions, PdfLoadOptions,
    # Edit options
    WordProcessingEditOptions, SpreadsheetEditOptions, PresentationEditOptions, PdfEditOptions,
    EbookEditOptions, EmailEditOptions, MarkdownEditOptions, TextEditOptions, XmlEditOptions, DelimitedTextEditOptions,
    # Save options
    WordProcessingSaveOptions, SpreadsheetSaveOptions, PresentationSaveOptions, PdfSaveOptions,
    HtmlSaveOptions, MhtmlSaveOptions, MarkdownSaveOptions, XpsSaveOptions, TextSaveOptions,
    EbookSaveOptions, EmailSaveOptions, DelimitedTextSaveOptions,
)
from groupdocs.editor.metadata import (
    IDocumentInfo, WordProcessingDocumentInfo, SpreadsheetDocumentInfo,
    PresentationDocumentInfo, FixedLayoutDocumentInfo, TextualDocumentInfo,
    EmailDocumentInfo, EbookDocumentInfo, MarkdownDocumentInfo,
)

Load + Edit + Save (the core workflow)

Editor is the entry point. The flow is always: open → edit() → manipulate HTML → save(). Use it as a context manager so the native document handle is released.

from groupdocs.editor import Editor, EditableDocument
from groupdocs.editor.formats import WordProcessingFormats
from groupdocs.editor.options import WordProcessingLoadOptions, WordProcessingSaveOptions

with Editor("input.docx", WordProcessingLoadOptions()) as editor:
    editable = editor.edit()                 # -> EditableDocument
    html = editable.get_embedded_html()      # full self-contained HTML

    edited_html = html.replace("Hello", "Goodbye")

    after_edit = EditableDocument.from_markup(edited_html)
    save_opts = WordProcessingSaveOptions(WordProcessingFormats.DOCX)
    editor.save(after_edit, "output.docx", save_opts)

Editor constructor. Editor(file_path), Editor(file_path, load_options), or Editor(stream, load_options). The load-options type should match the input family (WordProcessingLoadOptions for DOCX, SpreadsheetLoadOptions for XLSX, etc.). Omitting load options lets the engine auto-detect.
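When the input type is only known at run time, the matching load-options family can be chosen from the file extension. This is a rough sketch: the extension lists are copied from the recipe headings in this file and are deliberately incomplete, and `load_options_name` is a hypothetical helper, not a library function. It returns a class name rather than a class, so the sketch itself does not need the package installed.

```python
import os

# Extension -> name of a load-options class in groupdocs.editor.options.
# Only extensions named in this file are listed; everything else returns None,
# which means "omit load options and let the engine auto-detect".
LOAD_OPTIONS_BY_EXTENSION = {
    ".doc": "WordProcessingLoadOptions",
    ".docx": "WordProcessingLoadOptions",
    ".rtf": "WordProcessingLoadOptions",
    ".odt": "WordProcessingLoadOptions",
    ".xls": "SpreadsheetLoadOptions",
    ".xlsx": "SpreadsheetLoadOptions",
    ".ods": "SpreadsheetLoadOptions",
    ".ppt": "PresentationLoadOptions",
    ".pptx": "PresentationLoadOptions",
    ".odp": "PresentationLoadOptions",
    ".pdf": "PdfLoadOptions",
}


def load_options_name(file_path):
    """Return the load-options class name for a path, or None for auto-detect."""
    extension = os.path.splitext(file_path)[1].lower()
    return LOAD_OPTIONS_BY_EXTENSION.get(extension)


# Possible use with the real package (not executed here):
#   import groupdocs.editor.options as options
#   name = load_options_name(path)
#   editor = Editor(path, getattr(options, name)()) if name else Editor(path)
```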

editor.save(input_document, file_path, save_options) writes the (possibly modified) EditableDocument to disk. The save_options type — not the file extension — decides the output format.
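The `html.replace("Hello", "Goodbye")` call in the example above also rewrites matches inside tag names and attribute values, such as class names or image paths. A slightly safer sketch replaces only the text between tags. `replace_visible_text` is a hypothetical helper, and it assumes reasonably well-formed markup with no `>` inside attribute values.

```python
import re

# One capturing group: re.split keeps the tags in the result at odd indices.
_TAG = re.compile(r"(<[^>]*>)")


def replace_visible_text(html, old, new):
    """Replace `old` with `new` only in the text between tags."""
    parts = _TAG.split(html)
    # Even indices hold the text between tags; odd indices hold the tags.
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(old, new)
    return "".join(parts)
```

Two limitations: the contents of `<style>` and `<script>` elements count as text between tags and are rewritten too, and a word that the converter split across several inline elements will not match. For anything beyond simple substitutions, a real HTML parser is the more robust choice.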

Per-format quick recipes

Word Processing (DOC, DOCX, RTF, ODT, …)

from groupdocs.editor import Editor
from groupdocs.editor.options import WordProcessingLoadOptions, WordProcessingEditOptions

with Editor("input.docx", WordProcessingLoadOptions()) as editor:
    eo = WordProcessingEditOptions()
    eo.enable_pagination = False
    eo.enable_language_information = True
    editable = editor.edit(eo)
    body = editable.get_body_content()       # <body>…</body> only

Spreadsheet (XLS, XLSX, ODS, CSV, …) — edit one worksheet at a time

from groupdocs.editor import Editor
from groupdocs.editor.options import SpreadsheetLoadOptions, SpreadsheetEditOptions

with Editor("book.xlsx", SpreadsheetLoadOptions()) as editor:
    eo = SpreadsheetEditOptions()
    eo.worksheet_index = 0                   # 0-based
    eo.exclude_hidden_worksheets = True
    html = editor.edit(eo).get_content()
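Because each `edit()` call covers a single worksheet, processing a whole workbook means looping over the indices. The sketch below is duck-typed so that it runs without the package. `edit_each_worksheet` is a hypothetical helper, and the worksheet count is passed in by the caller because this file does not describe how to query it.

```python
def edit_each_worksheet(editor, make_edit_options, worksheet_count):
    """Return one HTML string per worksheet (hypothetical helper).

    editor            -- an open Editor
    make_edit_options -- zero-argument callable, e.g. SpreadsheetEditOptions
    worksheet_count   -- number of worksheets, supplied by the caller
    """
    pages = []
    for index in range(worksheet_count):
        eo = make_edit_options()
        eo.worksheet_index = index  # 0-based, as in the recipe above
        editable = editor.edit(eo)
        try:
            pages.append(editable.get_content())
        finally:
            editable.dispose()  # release native resources for this sheet
    return pages
```

Inside a `with Editor("book.xlsx", SpreadsheetLoadOptions()) as editor:` block, a call such as `edit_each_worksheet(editor, SpreadsheetEditOptions, 3)` would return three HTML strings.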

Presentation (PPT, PPTX, ODP, …) — edit one slide at a time

from groupdocs.editor import Editor
from groupdocs.editor.options import PresentationLoadOptions, PresentationEditOptions

with Editor("deck.pptx", PresentationLoadOptions()) as editor:
    eo = PresentationEditOptions()
    eo.slide_number = 0                      # 0-based
    eo.show_hidden_slides = True
    html = editor.edit(eo).get_content()

PDF input → HTML

from groupdocs.editor import Editor
from groupdocs.editor.options import PdfLoadOptions, PdfEditOptions

lo = PdfLoadOptions()                        # lo.password = "..." for encrypted PDFs
with Editor("input.pdf", lo) as editor:
    eo = PdfEditOptions()
    eo.enable_pagination = True
    html = editor.edit(eo).get_content()

Email (EML, MSG, MBOX, …)

from groupdocs.editor import Editor
from groupdocs.editor.options import EmailEditOptions, EmailSaveOptions, MailMessageOutput

with Editor("message.eml") as editor:
    eo = EmailEditOptions()
    eo.mail_message_output = MailMessageOutput.ALL   # body, subject, to/cc/bcc, attachments, …
    html = editor.edit(eo).get_content()

Round-trip vs. convert

There is no separate "convert" call — saving an EditableDocument with a different save-options family converts via the HTML intermediate. Same input, different save_options ⇒ different output format.

from groupdocs.editor import Editor, EditableDocument
from groupdocs.editor.options import (
    WordProcessingSaveOptions, PdfSaveOptions, MarkdownSaveOptions,
)
from groupdocs.editor.formats import WordProcessingFormats

with Editor("input.docx") as editor:
    editable = editor.edit()

    editor.save(editable, "same.docx", WordProcessingSaveOptions(WordProcessingFormats.DOCX))  # round-trip
    editor.save(editable, "out.pdf",  PdfSaveOptions())        # DOCX -> PDF
    editor.save(editable, "out.md",   MarkdownSaveOptions())   # DOCX -> Markdown

To feed modified markup back into a save call, wrap it with EditableDocument.from_markup(html) (or from_markup_and_resource_folder(html, folder) / from_file(html_path, folder) when the HTML references external images/fonts on disk).
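Since the save-options type rather than the file name decides the output format, a mismatched pair such as `"out.docx"` with `PdfSaveOptions()` would presumably write PDF bytes into a file named `.docx`. A small guard can catch that before saving. This is a sketch only: `extension_matches` and the extension table are hypothetical, and it assumes the binding's classes report their documented names through `type(obj).__name__`.

```python
import os

# Save-options class name -> extensions that family plausibly writes.
# Limited to the families used in the example above.
EXPECTED_EXTENSIONS = {
    "WordProcessingSaveOptions": {".doc", ".docx", ".rtf", ".odt"},
    "PdfSaveOptions": {".pdf"},
    "MarkdownSaveOptions": {".md"},
}


def extension_matches(file_path, save_options):
    """Return False when the path's extension contradicts the save-options family."""
    expected = EXPECTED_EXTENSIONS.get(type(save_options).__name__)
    if expected is None:
        return True  # unknown family: nothing to compare against
    return os.path.splitext(file_path)[1].lower() in expected
```

A caller might check `extension_matches(path, save_opts)` just before `editor.save(...)` and log a warning or raise when it returns False.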

EditableDocument resources

An EditableDocument exposes its extracted assets as collections you can iterate (for r in coll / len(coll)):

| Property | Contents |
|---|---|
| `images` | embedded raster/vector images |
| `fonts` | extracted fonts |
| `css` | stylesheets |
| `audio` | audio resources (e.g. from presentations) |
| `all_resources` | everything above, combined |

from groupdocs.editor import Editor

with Editor("input.docx") as editor:
    editable = editor.edit()
    print("images:", len(editable.images), "css:", len(editable.css))
    # write HTML + every resource (images/fonts/css) into a folder:
    editable.save("page.html", "page_resources")

Document info without editing

get_document_info() returns a lightweight StructView — a dict subclass exposing both snake_case attribute access and the raw PascalCase dict keys. Fields: format (nested: name, extension, mime, format_family), page_count, size, is_encrypted.

from groupdocs.editor import Editor

with Editor("input.docx") as editor:
    info = editor.get_document_info()        # password="..." for encrypted files
    print("pages:", info.page_count, "size:", info.size,
          "encrypted:", info.is_encrypted, "format:", info.format.name)
    # dict access still works for back-compat: info["PageCount"], info["Format"]["Name"]
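To make the dual access concrete, here is an illustrative stand-in showing how a dict subclass can offer snake_case attributes on top of PascalCase keys. It is not the library's implementation: `StructViewSketch` is a hypothetical class written only to show the idea, and the sample values are made up.

```python
class StructViewSketch(dict):
    """Illustrative dict subclass: info.page_count reads info["PageCount"]."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        # page_count -> PageCount, is_encrypted -> IsEncrypted
        key = "".join(part.capitalize() for part in name.split("_"))
        try:
            value = self[key]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access like info.format.name works.
        return StructViewSketch(value) if isinstance(value, dict) else value


info = StructViewSketch({
    "PageCount": 3,
    "Size": 1024,
    "IsEncrypted": False,
    "Format": {"Name": "DOCX", "Extension": ".docx"},
})
print(info.page_count, info.format.name, info["PageCount"])
```

Both access styles read the same underlying data, which is why the older `info["PageCount"]` form keeps working.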

Licensing

from groupdocs.editor import License

# From file
License().set_license("path/to/license.lic")

# From stream
with open("license.lic", "rb") as f:
    License().set_license(f)

Or auto-apply: export GROUPDOCS_LIC_PATH="path/to/license.lic"

Evaluation vs licensed. Without a license the library still runs, but output is restricted: PDF output carries an evaluation watermark, other formats show an equivalent evaluation mark, and there is a page/document-count cap. Set GROUPDOCS_LIC_PATH (or call License().set_license(...)) and re-run to clear it. A 30-day full license is free: https://purchase.groupdocs.com/temporary-license
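Before a long batch run it can help to check up front whether a license will be picked up, rather than discovering watermarks in the output afterwards. A minimal sketch using only the standard library follows. `find_license_path` is a hypothetical helper, and it only checks that the file named by `GROUPDOCS_LIC_PATH` exists, not that the license is valid.

```python
import os


def find_license_path(environ=None):
    """Return the license path from GROUPDOCS_LIC_PATH, or None if unusable."""
    environ = os.environ if environ is None else environ
    candidate = environ.get("GROUPDOCS_LIC_PATH", "").strip()
    if candidate and os.path.isfile(candidate):
        return candidate
    return None


# Possible use (not executed here):
#   if find_license_path() is None:
#       print("warning: no license file found, output will be in evaluation mode")
```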

API Reference

Editor

| Method | Returns | Description |
|---|---|---|
| `Editor(file_path / stream [, load_options])` | | Open by path or binary stream; optional `*LoadOptions` matching the input family. Use as a context manager. |
| `edit([edit_options])` | `EditableDocument` | Convert the document to editable HTML/CSS; optional `*EditOptions` (pagination, worksheet/slide selection, …). |
| `save(input_document, file_path, save_options)` | `None` | Write the `EditableDocument` out; the `*SaveOptions` type decides the output format. |
| `get_document_info([password])` | `StructView` | dict subclass with both snake_case attrs and PascalCase keys: `format` (nested), `page_count`, `size`, `is_encrypted`. No full edit pass needed. |
| `form_field_manager` | `FormFieldManager` | Read/update form fields (Word processing). |

EditableDocument

| Method | Returns | Description |
|---|---|---|
| `get_content()` | `str` | Full HTML document. |
| `get_body_content([external_images_template])` | `str` | `<body>` inner markup only. |
| `get_css_content([img_prefix, font_prefix])` | `list` | CSS stylesheet(s) as strings. |
| `get_embedded_html()` | `str` | Self-contained HTML with images/CSS inlined. |
| `save(html_file_path[, resources_folder_path])` | `None` | Persist HTML (+ resources) to disk. |
| `dispose()` | `None` | Release native resources (handled by `with`). |
| `images` / `fonts` / `css` / `audio` / `all_resources` | collection | Extracted resources. |
| `from_markup(html)` (classmethod) | `EditableDocument` | Build an editable doc from modified HTML. |
| `from_markup_and_resource_folder(html, folder)` (classmethod) | `EditableDocument` | …with on-disk resources. |
| `from_file(html_path, folder)` (classmethod) | `EditableDocument` | …from an HTML file + resource folder. |

License / Metered

License().set_license(path_or_stream) · Metered().set_metered_key(public, private) · Metered.get_consumption_quantity() · Metered.get_consumption_credit()

Key Patterns

  • Properties: use snake_case -- auto-mapped to .NET PascalCase
  • Context managers: with Editor(...) as e: ensures the document handle is released; EditableDocument is disposable too
  • Options families: pick the *LoadOptions / *EditOptions / *SaveOptions that matches the document family; *SaveOptions controls the output format
  • Modified markup: round-trip edited HTML through EditableDocument.from_markup(html) / from_file(...) before editor.save(...)
  • Streams: pass open("file", "rb") or io.BytesIO(data) where .NET expects a Stream; BytesIO is updated after save(stream)
  • Enums: case-insensitive, lazy-loaded (e.g., WordProcessingFormats.DOCX, MailMessageOutput.ALL)
  • Collections: for r in editable.images and len(editable.css) work on .NET collections
  • Callbacks: Python functions work for handler interfaces whose methods return None. Returning a .NET Stream from a Python callback is not supported by the binding -- use the file-path / resource-folder save overloads instead.

Platform Requirements

| Platform | Requirements |
|---|---|
| Windows | None |
| Linux | `apt install libgdiplus libfontconfig1 ttf-mscorefonts-installer` |
| macOS | `brew install mono-libgdiplus` |
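A setup script can pick the right command for the current machine from the table above. This is a small sketch: `setup_command` is a hypothetical helper, the commands are copied verbatim from the table, and it only returns the string without running anything.

```python
import sys

# sys.platform prefix -> setup command from the table above (None = nothing to install).
SETUP_COMMANDS = {
    "win32": None,
    "linux": "apt install libgdiplus libfontconfig1 ttf-mscorefonts-installer",
    "darwin": "brew install mono-libgdiplus",
}


def setup_command(platform_name=None):
    """Return the native-dependency install command for a platform, or None."""
    name = sys.platform if platform_name is None else platform_name
    for prefix, command in SETUP_COMMANDS.items():
        if name.startswith(prefix):
            return command
    return None  # platform not covered by the table
```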

Troubleshooting

Output is watermarked / a few pages only -- you are running unlicensed (evaluation mode). Apply a license / set GROUPDOCS_LIC_PATH.

PasswordRequiredException / IncorrectPasswordException -- the document is encrypted. Set the password on the load options: lo = WordProcessingLoadOptions(); lo.password = "..."; Editor(path, lo) (or pass password= to get_document_info).

System.Drawing.Common is not supported -- install libgdiplus: sudo apt install libgdiplus (Linux) / brew install mono-libgdiplus (macOS)

Gdip type initializer exception -- outdated libgdiplus: brew reinstall mono-libgdiplus (macOS)

Garbled text / missing fonts -- install fonts: sudo apt install ttf-mscorefonts-installer fontconfig && sudo fc-cache -f

DllNotFoundException: libSkiaSharp -- a stale system copy conflicts with the bundled version. Rename it: sudo mv /usr/local/lib/libSkiaSharp.dylib /usr/local/lib/libSkiaSharp.dylib.bak

DOTNET_SYSTEM_GLOBALIZATION_INVARIANT errors -- do NOT set this. Install ICU: sudo apt install libicu-dev

TypeLoadException -- reinstall: pip install --force-reinstall groupdocs-editor-net
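An agent that captures a traceback can map it to the entries above with a simple substring lookup. This is a sketch under assumptions: `suggest_fix` is a hypothetical helper, the search strings are taken from the headings in this section, and real error messages may be worded differently.

```python
# (substring to look for, suggested fix) pairs, copied from the entries above.
KNOWN_FIXES = [
    ("PasswordRequiredException", "set lo.password on the load options"),
    ("IncorrectPasswordException", "set lo.password on the load options"),
    ("System.Drawing.Common is not supported",
     "sudo apt install libgdiplus (Linux) / brew install mono-libgdiplus (macOS)"),
    ("Gdip", "brew reinstall mono-libgdiplus (macOS)"),
    ("libSkiaSharp", "rename the stale system copy of libSkiaSharp"),
    ("DOTNET_SYSTEM_GLOBALIZATION_INVARIANT",
     "do not set this variable; install ICU: sudo apt install libicu-dev"),
    ("TypeLoadException", "pip install --force-reinstall groupdocs-editor-net"),
]


def suggest_fix(error_text):
    """Return the first matching fix for an error message, or None."""
    for needle, fix in KNOWN_FIXES:
        if needle in error_text:
            return fix
    return None
```

A typical call would be `suggest_fix(traceback.format_exc())` inside an `except` block. A `None` result means the error is not covered by this section.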

Still stuck? Post your question at https://forum.groupdocs.com/c/editor/ -- the development team responds directly.