fix(get_addresses): accept email.header.Header inputs (no more 'Header' object has no attribute 'strip') by rakurtz · Pull Request #153 · SpamScope/mail-parser

rakurtz · 2026-05-12T17:09:12Z

I ran into a .eml file validation error when an email contained certain strings / encodings in the header. I could be tracked down to a call of .strip() on a non-str object.

Summary

mailparser.utils.get_addresses(raw_header) calls raw_header.strip() and
passes raw_header to email.utils.getaddresses([raw_header], strict=True).
Both assume a plain str, but core.py feeds get_addresses whatever
email.message.Message.get(name) returns:

# src/mailparser/core.py
elif name_header in ADDRESSES_HEADERS:
    raw_header = self.message.get(name_header, "") if self.message else ""
    ...
    parsed_addresses = get_addresses(raw_header)

For any address header containing RFC 2047 encoded-words — typical for non-ASCII display names like =?utf-8?q?=C3=81rp=C3=A1d?= user@example.com — Message.get() may return an email.header.Header instance. That instance has no .strip(), so MailParser.from_bytes(...) (or parse_from_bytes) raises:

AttributeError: 'Header' object has no attribute 'strip'
…on perfectly valid mail and the entire parse fails.

This is the same shape of bug that #149 fixed in ported_string; that PR covered attachment header values but the address-parsing path was missed.

Fix

Normalise raw_header once at the top of get_addresses:

if raw_header is None:
        return []
    if isinstance(raw_header, email.header.Header):
        raw_header = decode_header_part(raw_header.encode())
    elif not isinstance(raw_header, str):
        raw_header = str(raw_header)

(edited to reflect the last commit that ensures correct encoding/decoding)

email.header.Header.__str__ already produces the encoded-word string form that the existing _ADDR_FALLBACK_RE and email.utils.getaddresses know how to handle, and core.py decodes the display name afterwards via decode_header_part. The strict-string happy path is unchanged.

Tests

Adds three focused unit tests in tests/test_utils.py::TestUtilsEdgeCases (matching the style of the existing test_ported_string_handles_header_object test from #149):

test_get_addresses_handles_header_object — direct regression. Constructs an email.header.Header from an encoded-word value and asserts get_addresses returns the parsed [(display_name, "user@example.com")] without raising.
test_get_addresses_handles_none — defensive: get_addresses(None) returns [] instead of crashing on attribute access.
test_get_addresses_plain_string_unchanged — guards the existing happy path so the new isinstance(raw_header, str) branch can't silently regress the common case.

Full local test run on Python 3.13:

============================= 190 passed in 1.00s ==============================
TOTAL: 99% coverage
Reproducer (without the fix)
from email.header import Header
from mailparser.utils import get_addresses
get_addresses(Header(
    "=?utf-8?q?=C3=81rp=C3=A1d?= <user@example.com>",
    charset="utf-8",
))
# AttributeError: 'Header' object has no attribute 'strip'

Notes

No behaviour change for str inputs.
Public docstring of get_addresses updated to document the accepted input types (str | email.header.Header | None).
No changes to core.py — the fix is local to the helper so any other callers (current or future) get the same robustness automatically.

Message.get() returns email.header.Header for RFC 2047 encoded-word headers (non-ASCII display names). The Header type has no .strip() method and is not a valid input to email.utils.getaddresses, which caused MailParser.from_bytes() to raise AttributeError: 'Header' object has no attribute 'strip' for any mail with a non-ASCII display name in an address header. Normalize input at the top of get_addresses() so callers can pass either a str, a Header, or None. Adds three regression tests in tests/test_utils.py: - get_addresses with an email.header.Header instance - get_addresses with None - get_addresses with a plain str (guards the existing happy path) Co-authored-by: Cursor <cursoragent@cursor.com>

Coercing email.header.Header via str() avoids the AttributeError crash, but can be lossy for some real-world headers and surface replacement characters (�) in rendered output. Use Header.encode() in get_addresses() to preserve RFC 2047 encoded-words. This keeps the header parseable by email.utils.getaddresses while allowing core.py to decode display names later via decode_header_part() without introducing replacement chars. Adds a regression test that parses a full message via MailParser.from_bytes and asserts Unicode display names are preserved on mail.to. Co-authored-by: Cursor <cursoragent@cursor.com>

Some real-world mails produce To/From Header objects that render as encoded-word tokens like =?unknown-8bit?...?=. Passing those directly to email.utils.getaddresses(strict=True) can misclassify the token itself as the address, leaking encoded text into downstream output (e.g. PDF To: line). Decode Header values first via decode_header_part(raw_header.encode()) so getaddresses operates on a parseable Unicode string (e.g. "Álpám Longsom <recipient@example.com>"). Adds regression tests for: - get_addresses with unknown-8bit Header objects - MailParser.from_bytes end-to-end unknown-8bit display-name decoding Co-authored-by: Cursor <cursoragent@cursor.com>

rakurtz · 2026-05-12T17:37:11Z

after the first commit the PR is based on i realized that the string encoding was broken. the second and third commit fix this by decoding the Header values before address parsing.

rakurtz and others added 3 commits May 12, 2026 18:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(get_addresses): accept email.header.Header inputs (no more 'Header' object has no attribute 'strip')#153

fix(get_addresses): accept email.header.Header inputs (no more 'Header' object has no attribute 'strip')#153
rakurtz wants to merge 3 commits into
SpamScope:developfrom
rakurtz:develop

rakurtz commented May 12, 2026 •

edited

Loading

Uh oh!

rakurtz commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

rakurtz commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Tests

Notes

Uh oh!

rakurtz commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rakurtz commented May 12, 2026 •

edited

Loading