Skip to content

fix(get_addresses): accept email.header.Header inputs (no more 'Header' object has no attribute 'strip')#153

Open
rakurtz wants to merge 3 commits into
SpamScope:developfrom
rakurtz:develop
Open

fix(get_addresses): accept email.header.Header inputs (no more 'Header' object has no attribute 'strip')#153
rakurtz wants to merge 3 commits into
SpamScope:developfrom
rakurtz:develop

Conversation

@rakurtz
Copy link
Copy Markdown

@rakurtz rakurtz commented May 12, 2026

I ran into a .eml file validation error when an email contained certain strings / encodings in the header. I could be tracked down to a call of .strip() on a non-str object.

Summary

mailparser.utils.get_addresses(raw_header) calls raw_header.strip() and
passes raw_header to email.utils.getaddresses([raw_header], strict=True).
Both assume a plain str, but core.py feeds get_addresses whatever
email.message.Message.get(name) returns:

# src/mailparser/core.py
elif name_header in ADDRESSES_HEADERS:
    raw_header = self.message.get(name_header, "") if self.message else ""
    ...
    parsed_addresses = get_addresses(raw_header)

For any address header containing RFC 2047 encoded-words — typical for non-ASCII display names like =?utf-8?q?=C3=81rp=C3=A1d?= user@example.com — Message.get() may return an email.header.Header instance. That instance has no .strip(), so MailParser.from_bytes(...) (or parse_from_bytes) raises:

AttributeError: 'Header' object has no attribute 'strip'
…on perfectly valid mail and the entire parse fails.

This is the same shape of bug that #149 fixed in ported_string; that PR covered attachment header values but the address-parsing path was missed.

Fix

Normalise raw_header once at the top of get_addresses:

if raw_header is None:
        return []
    if isinstance(raw_header, email.header.Header):
        raw_header = decode_header_part(raw_header.encode())
    elif not isinstance(raw_header, str):
        raw_header = str(raw_header)

(edited to reflect the last commit that ensures correct encoding/decoding)

email.header.Header.__str__ already produces the encoded-word string form that the existing _ADDR_FALLBACK_RE and email.utils.getaddresses know how to handle, and core.py decodes the display name afterwards via decode_header_part. The strict-string happy path is unchanged.

Tests

Adds three focused unit tests in tests/test_utils.py::TestUtilsEdgeCases (matching the style of the existing test_ported_string_handles_header_object test from #149):

test_get_addresses_handles_header_object — direct regression. Constructs an email.header.Header from an encoded-word value and asserts get_addresses returns the parsed [(display_name, "user@example.com")] without raising.
test_get_addresses_handles_none — defensive: get_addresses(None) returns [] instead of crashing on attribute access.
test_get_addresses_plain_string_unchanged — guards the existing happy path so the new isinstance(raw_header, str) branch can't silently regress the common case.

Full local test run on Python 3.13:

============================= 190 passed in 1.00s ==============================
TOTAL: 99% coverage
Reproducer (without the fix)
from email.header import Header
from mailparser.utils import get_addresses
get_addresses(Header(
    "=?utf-8?q?=C3=81rp=C3=A1d?= <user@example.com>",
    charset="utf-8",
))
# AttributeError: 'Header' object has no attribute 'strip'

Notes

No behaviour change for str inputs.
Public docstring of get_addresses updated to document the accepted input types (str | email.header.Header | None).
No changes to core.py — the fix is local to the helper so any other callers (current or future) get the same robustness automatically.

rakurtz and others added 3 commits May 12, 2026 18:58
Message.get() returns email.header.Header for RFC 2047 encoded-word
headers (non-ASCII display names). The Header type has no .strip()
method and is not a valid input to email.utils.getaddresses, which
caused MailParser.from_bytes() to raise

    AttributeError: 'Header' object has no attribute 'strip'

for any mail with a non-ASCII display name in an address header.
Normalize input at the top of get_addresses() so callers can pass
either a str, a Header, or None.

Adds three regression tests in tests/test_utils.py:
- get_addresses with an email.header.Header instance
- get_addresses with None
- get_addresses with a plain str (guards the existing happy path)

Co-authored-by: Cursor <cursoragent@cursor.com>
Coercing email.header.Header via str() avoids the AttributeError crash,
but can be lossy for some real-world headers and surface replacement
characters (�) in rendered output.

Use Header.encode() in get_addresses() to preserve RFC 2047 encoded-words.
This keeps the header parseable by email.utils.getaddresses while allowing
core.py to decode display names later via decode_header_part() without
introducing replacement chars.

Adds a regression test that parses a full message via MailParser.from_bytes
and asserts Unicode display names are preserved on mail.to.

Co-authored-by: Cursor <cursoragent@cursor.com>
Some real-world mails produce To/From Header objects that render as
encoded-word tokens like =?unknown-8bit?...?=. Passing those directly to
email.utils.getaddresses(strict=True) can misclassify the token itself as
the address, leaking encoded text into downstream output (e.g. PDF To: line).

Decode Header values first via decode_header_part(raw_header.encode())
so getaddresses operates on a parseable Unicode string
(e.g. "Álpám Longsom <recipient@example.com>").

Adds regression tests for:
- get_addresses with unknown-8bit Header objects
- MailParser.from_bytes end-to-end unknown-8bit display-name decoding

Co-authored-by: Cursor <cursoragent@cursor.com>
@rakurtz
Copy link
Copy Markdown
Author

rakurtz commented May 12, 2026

after the first commit the PR is based on i realized that the string encoding was broken. the second and third commit fix this by decoding the Header values before address parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant