
Commit 671f6e8

Improve handling of non-ASCII characters in message headers
Improve Anymail's normalized EmailAddress object:

- Stop using Django's undocumented, deprecated sanitize_address() helper
- Add ANYMAIL["IDNA_ENCODER"] setting, defaulting to idna2008 [breaking]
- Implement a few useful IDNA_ENCODER presets
- Add idna package as direct dependency (already sub-dependency of 'requests')
- Add uts46 package optional dependency
- Add base backend idna_encode method for subclass use
- Subclass EmailAddress from Python's modern email.headerregistry.Address object
- Add EmailAddress.as_dict() and formatting options useful for various ESPs
- Add utils to apply/reverse RFC 2047 encoded-word and RFC 5322 quoted-string formatting

Update each ESP's EmailBackend to use the appropriate EmailAddress and header encodings, based on testing its API's behavior for Unicode characters.

Add some ESP-specific unsupported feature errors to prevent particularly problematic Unicode handling:

- Brevo: Prevent non-ASCII values in custom headers to avoid raw, 8-bit utf-8 (also affects metadata, which uses a custom header)
- Mailgun: Prevent EAI in from_email (API accepts EAI, but generated message is undeliverable)
- Scaleway: Prevent EAI in any address field (API accepts EAI, but generated message is undeliverable)

Documentation:

- Add a new "International email" page in the advanced section, with general details on Unicode handling and the new IDNA_ENCODER setting
- Update each ESP's page to indicate whether it handles EAI
- Document some other ESP-specific Unicode quirks uncovered during testing

Closes #444
1 parent 53ea757 commit 671f6e8


49 files changed: +2271 -261 lines changed

CHANGELOG.rst

Lines changed: 60 additions & 0 deletions
@@ -25,6 +25,66 @@ Release history
 ^^^^^^^^^^^^^^^
 .. This extra heading level keeps the ToC from becoming unmanageably long

+vNext
+-----
+
+*Unreleased changes*
+
+This release improves handling of non-ASCII characters everywhere email messages
+allow them, based on extensive testing of Unicode handling for all supported
+ESPs. There are several new workarounds for ESP bugs and a handful of new
+errors to help you avoid bugs that don't have workarounds. See
+`International email <https://anymail.dev/en/latest/tips/international_email/#idna>`_
+in the docs for more information.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+* **International domain names:** When sending email to a non-ASCII domain name,
+  use IDNA 2008 with UTS-46 pre-processing rather than obsolete IDNA 2003
+  encoding. This ensures email can be sent to newer domains enabled by IDNA 2008.
+
+  This change should make no difference for virtually all real-world email
+  addresses that worked with earlier Anymail releases. But trying to send to
+  emoji domains or others no longer allowed by IDNA 2008 will now raise an
+  ``AnymailInvalidAddress`` error.
+
+  To restore the old behavior or select a different encoding, use the new
+  ``IDNA_ENCODER`` setting. See
+  `Domains (IDNA) <https://anymail.dev/en/latest/tips/international_email/#idna>`_
+  in the docs.
+
+  As part of this change, Anymail now has a direct dependency on the ``idna``
+  package. (It was already being installed as a sub-dependency of ``requests``.)
+
+* **Brevo:** Raise an error if metadata or custom header values include non-ASCII
+  characters. This avoids a Brevo API bug that sends unencoded 8-bit headers,
+  which can cause bounces or dropped messages.
+
+* **Mailgun:** Raise an error if the ``from_email`` uses EAI (has a non-ASCII
+  local part). This avoids a Mailgun API bug that generates undeliverable
+  messages.
+
+* **Scaleway TEM:** Raise an error if any address field uses EAI (has a non-ASCII
+  local part). This avoids a Scaleway API bug that generates undeliverable messages.
+
+Fixes
+~~~~~
+
+* **Brevo:** Work around a Brevo API bug which loses non-ASCII display names
+  that also contain a comma or certain other punctuation.
+
+Other
+~~~~~
+
+* **Mandrill:** Document a Mandrill API bug that can cause an address with a
+  non-ASCII display name to display incorrectly in some email clients.
+
+* **Unisender Go:** Document a Unisender Go API bug that can cause a Reply-To
+  address (only) with a non-ASCII display name to display incorrectly in some
+  email clients.
+
+
 v13.1
 -----
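
For orientation, the ``IDNA_ENCODER`` setting described in this changelog entry might be used in a Django settings file roughly as sketched below. The preset names (idna2003, idna2008, uts46, none) come from anymail/_idna.py in this commit. The ESP-specific key ``MAILGUN_IDNA_ENCODER`` is an assumption inferred from the ``esp_name`` lookup in the base backend, and ``myproject.mail.encode_domain`` is a hypothetical dotted path, not something this commit defines.

```python
# settings.py (sketch; only the "IDNA_ENCODER" key is confirmed by this commit)
ANYMAIL = {
    # Restore the IDNA 2003 behavior of earlier Anymail releases for every ESP.
    "IDNA_ENCODER": "idna2003",
    # Assumed ESP-specific override, pointing at a hypothetical callable
    # that takes a domain string and returns its encoded form.
    "MAILGUN_IDNA_ENCODER": "myproject.mail.encode_domain",
}
```

A value containing a dot appears to be treated as a dotted import path, while a bare name is looked up among the built-in presets.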

anymail/_idna.py

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# Pre-packaged IDNA_ENCODER options
+import idna
+
+from anymail.exceptions import AnymailImproperlyInstalled, _LazyError
+
+try:
+    from uts46 import encode as uts46_encode
+except ImportError:
+    uts46_encode = _LazyError(
+        AnymailImproperlyInstalled("uts46", "<your-esp>,uts46", 'IDNA_ENCODER="uts46"')
+    )
+
+__all__ = ["idna2003", "idna2008", "uts46", "none"]
+
+
+def idna2003(domain: str) -> str:
+    """
+    Encode domain (if necessary) using IDNA 2003 standard.
+    This matches Django's own behavior and requires no extra libraries.
+    But it will fail to encode some newer IDNs that require IDNA 2008, and
+    it will use obsolete encoding for IDNs that contain deviation characters.
+    """
+    return domain.encode("idna").decode("ascii")
+
+
+def idna2008(domain: str) -> str:
+    """
+    Encode domain (if necessary) using the IDNA 2008 standard
+    with UTS46 preprocessing. (Preprocessing is required to handle
+    case insensitivity most users would expect.)
+
+    Will reject some domains (e.g., emojis) that browsers allow.
+    Relies on the third-party 'idna' package (installed with django-anymail).
+    """
+    return idna.encode(domain, uts46=True).decode("ascii")
+
+
+def uts46(domain: str) -> str:
+    """
+    Encode domain (if necessary) using the UTS46 standard.
+    This is the encoding used by all modern browsers.
+
+    Requires the 'uts46' package (installable via 'django-anymail[uts46]' extra).
+    """
+    return uts46_encode(domain).decode("ascii")
+
+
+def none(domain: str) -> str:
+    """
+    Leaves domain as unencoded Unicode characters.
+    Can be used with an ESP whose API correctly handles IDNA encoding.
+    """
+    return domain
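
To make the difference between the presets concrete, here is a small standalone sketch that repeats the one-line bodies of idna2003, idna2008 and none outside of Anymail. The example domains are illustrative choices, not taken from the commit, and the idna2008 call is guarded because the third-party idna package may not be installed where this runs.

```python
def idna2003(domain: str) -> str:
    # Same expression as the idna2003 preset: Python's built-in "idna" codec.
    return domain.encode("idna").decode("ascii")


def none(domain: str) -> str:
    # Same as the "none" preset: pass the Unicode domain through untouched.
    return domain


try:
    import idna  # third-party package; a direct Anymail dependency as of this commit
except ImportError:
    idna = None


def idna2008(domain: str) -> str:
    # Same expression as the idna2008 preset, guarded for environments without 'idna'.
    if idna is None:
        raise RuntimeError("the 'idna' package is not installed")
    return idna.encode(domain, uts46=True).decode("ascii")


print(idna2003("bücher.example"))  # xn--bcher-kva.example
print(idna2003("faß.example"))     # fass.example (IDNA 2003 maps ß to "ss")
print(none("bücher.example"))      # bücher.example
if idna is not None:
    # ß is one of the "deviation characters" the idna2003 docstring mentions,
    # so this result can differ from the IDNA 2003 one above.
    print(idna2008("faß.example"))
```

The ß case is an instance of the "obsolete encoding for IDNs that contain deviation characters" noted in the idna2003 docstring.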

anymail/backends/amazon_ses.py

Lines changed: 47 additions & 9 deletions
@@ -7,7 +7,7 @@
 from .. import __version__ as ANYMAIL_VERSION
 from ..exceptions import AnymailAPIError, AnymailImproperlyInstalled
 from ..message import AnymailRecipientStatus
-from ..utils import UNSET, get_anymail_setting
+from ..utils import UNSET, get_anymail_setting, parse_address_list
 from .base import AnymailBaseBackend, BasePayload

 try:
@@ -153,7 +153,30 @@ class AmazonSESV2SendEmailPayload(AmazonSESBasePayload):
     def init_payload(self):
         super().init_payload()
         self.all_recipients = []  # for parse_recipient_status
-        self.mime_message = self.message.message()
+
+        # Temporarily replace the address fields on self.message with
+        # pre-IDNA-encoded versions, only while we're converting it
+        # to a MIME message. (We don't own self.message, so should not
+        # permanently modify it.)
+        address_fields = {"from_email", "to", "cc", "bcc", "reply_to"}
+        original_values = {
+            field: getattr(self.message, field) for field in address_fields
+        }
+        try:
+            for field in address_fields:
+                addresses = getattr(self.message, field)
+                idna_encoded_addresses = [
+                    address.format(idna_encode=self.backend.idna_encode)
+                    for address in parse_address_list(addresses, field)
+                ]
+                if field == "from_email":
+                    # from_email is a single string; all others are lists
+                    idna_encoded_addresses = ", ".join(idna_encoded_addresses)
+                setattr(self.message, field, idna_encoded_addresses)
+            self.mime_message = self.message.message()
+        finally:
+            for field, original_value in original_values.items():
+                setattr(self.message, field, original_value)

     def finalize_payload(self):
         # (The boto3 SES client handles base64 encoding raw_message.)
@@ -232,7 +255,8 @@ def set_recipients(self, recipient_type, emails):
         assert recipient_type in ("to", "cc", "bcc")
         destination_key = f"{recipient_type.capitalize()}Addresses"
         self.params.setdefault("Destination", {})[destination_key] = [
-            email.address for email in emails
+            email.format_addr_spec(idna_encode=self.backend.idna_encode)
+            for email in emails
         ]

     def set_subject(self, subject):
@@ -355,18 +379,28 @@ def finalize_payload(self):
         cc_and_bcc_addresses = {}
         if self.recipients["cc"]:
             cc_and_bcc_addresses["CcAddresses"] = [
-                cc.address for cc in self.recipients["cc"]
+                cc.format(use_rfc2047=True, idna_encode=self.backend.idna_encode)
+                for cc in self.recipients["cc"]
             ]
         if self.recipients["bcc"]:
             cc_and_bcc_addresses["BccAddresses"] = [
-                bcc.address for bcc in self.recipients["bcc"]
+                # (display-name is not relevant for bcc recipients)
+                bcc.format_addr_spec(idna_encode=self.backend.idna_encode)
+                for bcc in self.recipients["bcc"]
             ]

         # Construct an entry with merge data for each "to" recipient:
         self.params["BulkEmailEntries"] = []
         for to in self.recipients["to"]:
             entry = {
-                "Destination": dict(ToAddresses=[to.address], **cc_and_bcc_addresses),
+                "Destination": dict(
+                    ToAddresses=[
+                        to.format(
+                            use_rfc2047=True, idna_encode=self.backend.idna_encode
+                        )
+                    ],
+                    **cc_and_bcc_addresses,
+                ),
                 "ReplacementEmailContent": {
                     "ReplacementTemplate": {
                         "ReplacementTemplateData": self.serialize_json(
@@ -447,8 +481,9 @@ def parse_recipient_status(self, response):
         return dict(zip(to_addrs, anymail_statuses))

     def set_from_email(self, email):
-        # this will RFC2047-encode display_name if needed:
-        self.params["FromEmailAddress"] = email.address
+        self.params["FromEmailAddress"] = email.format(
+            use_rfc2047=True, idna_encode=self.backend.idna_encode
+        )

     def set_recipients(self, recipient_type, emails):
         # late-bound in finalize_payload
@@ -462,7 +497,10 @@ def set_subject(self, subject):

     def set_reply_to(self, emails):
         if emails:
-            self.params["ReplyToAddresses"] = [email.address for email in emails]
+            self.params["ReplyToAddresses"] = [
+                email.format(use_rfc2047=True, idna_encode=self.backend.idna_encode)
+                for email in emails
+            ]

     def set_extra_headers(self, headers):
         self.headers = headers
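
The init_payload change above swaps the address fields on a message it does not own, then restores them in a finally block. The same pattern can be sketched in isolation as a context manager. The helper name temporarily_set and the SimpleNamespace stand-in for a Django message are hypothetical, used only for illustration; the commit itself inlines the try/finally.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, **overrides):
    """Set attributes on obj for the duration of the block, then restore them."""
    # Remember the current values before touching anything.
    originals = {name: getattr(obj, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(obj, name, value)
        yield obj
    finally:
        # Runs on normal exit and when the block raises, so the
        # caller's object is never left in its modified state.
        for name, value in originals.items():
            setattr(obj, name, value)


# Stand-in for a message object owned by the caller.
message = SimpleNamespace(from_email="sender@bücher.example")

with temporarily_set(message, from_email="sender@xn--bcher-kva.example"):
    inside = message.from_email  # encoded form, visible only inside the block

after = message.from_email  # original value is back
```

Restoring in finally matters here because building the MIME message can raise, and the caller may reuse the same message object afterwards.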

anymail/backends/base.py

Lines changed: 34 additions & 0 deletions
@@ -3,12 +3,15 @@

 from django.conf import settings
 from django.core.mail.backends.base import BaseEmailBackend
+from django.utils.module_loading import import_string
 from django.utils.timezone import get_current_timezone, is_naive, make_aware
 from requests.structures import CaseInsensitiveDict

 from ..exceptions import (
     AnymailCancelSend,
+    AnymailConfigurationError,
     AnymailError,
+    AnymailInvalidAddress,
     AnymailRecipientsRefused,
     AnymailSerializationError,
     AnymailUnsupportedFeature,
@@ -32,6 +35,8 @@
     parse_single_address,
 )

+DEFAULT_IDNA_ENCODER = "idna2008"
+

 class AnymailBaseBackend(BaseEmailBackend):
     """
@@ -63,6 +68,26 @@ def __init__(self, *args, **kwargs):
         send_defaults.update(esp_send_defaults)
         self.send_defaults = send_defaults

+        # Initialize self._idna_encoder from IDNA_ENCODER setting
+        # (optionally ESP-specific)
+        self.idna_encoder = get_anymail_setting(
+            "idna_encoder", esp_name=self.esp_name, kwargs=kwargs, default=None
+        ) or get_anymail_setting("idna_encoder", default=DEFAULT_IDNA_ENCODER)
+        if callable(self.idna_encoder):
+            self._idna_encoder = self.idna_encoder
+        else:
+            try:
+                dotted_path = (
+                    self.idna_encoder
+                    if "." in self.idna_encoder
+                    else f"anymail._idna.{self.idna_encoder}"  # built-ins
+                )
+                self._idna_encoder = import_string(dotted_path)
+            except (ImportError, TypeError) as error:
+                raise AnymailConfigurationError(
+                    f"cannot resolve IDNA_ENCODER={self.idna_encoder!r}"
+                ) from error
+
     def open(self):
         """
         Open and persist a connection to the ESP's API, and whether
@@ -157,6 +182,15 @@ def _send(self, message):

         return True

+    def idna_encode(self, domain):
+        try:
+            return self._idna_encoder(domain)
+        except ValueError as error:
+            # ValueError includes UnicodeError, idna.IDNAError, uts46.UTS46Error, etc.
+            raise AnymailInvalidAddress(
+                f"Cannot encode {domain!r} using IDNA_ENCODER={self.idna_encoder!r}"
+            ) from error
+
     def run_pre_send(self, message):
         """Send pre_send signal, and return True if message should still be sent"""
         try:
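
The encoder lookup added to the base backend accepts a callable, a bare preset name, or a dotted import path, and idna_encode turns any ValueError from the encoder into an address error. The sketch below mirrors that flow without Django or Anymail. The PRESETS dict, resolve_encoder, encode_domain and both exception classes are hypothetical stand-ins for anymail._idna, import_string, idna_encode, AnymailConfigurationError and AnymailInvalidAddress.

```python
from importlib import import_module

# Stand-in for the built-in presets module (stdlib-only presets shown).
PRESETS = {
    "idna2003": lambda domain: domain.encode("idna").decode("ascii"),
    "none": lambda domain: domain,
}


class ConfigurationError(Exception):
    """Hypothetical stand-in for AnymailConfigurationError."""


class InvalidAddress(Exception):
    """Hypothetical stand-in for AnymailInvalidAddress."""


def resolve_encoder(spec):
    """Return an encoder function for a callable, preset name, or dotted path."""
    if callable(spec):
        return spec
    try:
        if "." not in spec:  # bare name: look up a built-in preset
            return PRESETS[spec]
        module_path, _, attr = spec.rpartition(".")
        return getattr(import_module(module_path), attr)
    except (ImportError, AttributeError, KeyError, TypeError, ValueError) as error:
        raise ConfigurationError(f"cannot resolve encoder {spec!r}") from error


def encode_domain(encoder, domain):
    """Apply the encoder, converting encoding failures to InvalidAddress."""
    try:
        return encoder(domain)
    except ValueError as error:  # UnicodeError is a ValueError subclass
        raise InvalidAddress(f"cannot encode {domain!r}") from error


encoder = resolve_encoder("idna2003")
print(encode_domain(encoder, "bücher.example"))  # xn--bcher-kva.example
```

Catching ValueError in one place means callers see a single error type regardless of which encoder is configured, which is the point of the comment in idna_encode.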
