Provide text attachment charset to ESPs that support it; normalize to utf-8 otherwise #449

@medmunds

Email attachments with a text/* content-type need to identify the encoding used for non-ASCII characters in the charset MIME parameter. Anymail (v13.1 and earlier) is not currently passing charset to ESPs whose APIs support it, which can lead to mojibake if the recipient's email app guesses the wrong charset.
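To make the failure mode concrete, here is a minimal sketch of mojibake: the same attachment bytes decoded under the correct charset and under a wrong guess. The sample text and the latin-1 guess are illustrative assumptions, not taken from any particular email client.

```python
# Same attachment bytes, decoded under two different charset assumptions.
text = "café"
payload = text.encode("utf-8")        # bytes as they travel in the attachment

correct = payload.decode("utf-8")     # recipient was told charset=utf-8
guessed = payload.decode("latin-1")   # recipient had to guess, and guessed wrong

print(correct)  # café
print(guessed)  # cafÃ©
```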

Based on recent Unicode testing related to #448, here's each ESP's support for text attachment charsets and what Anymail needs to do:

  • ESP APIs that allow charset=... in their attachment type fields. Anymail needs to provide the charset for text attachments (it's a bug that we don't):
    • Postmark (ContentType field)
    • Mailgun (in the Content-Type header of the multipart/form-data field, which the requests library lets us set by passing a (name, content, content_type) tuple in the files list)
    • Mailtrap (type field)
    • Mandrill (type field)
    • Resend (content_type field, added to Resend's API at some point and not currently sent by Anymail)
    • Scaleway (type field)
    • SparkPost (type field)
    • Unisender Go (type field)
  • ESP APIs that don't have a way to specify the charset. Anymail should ensure any text attachment content is encoded as utf-8 before calling the API:
    • Brevo: API guesses content-type from filename extension; seems to unconditionally add charset=utf-8 to all text attachments (so Anymail should ensure utf-8 encoding)
    • MailerSend: guesses content-type from filename extension; no way to set a charset on a text attachment (⚠️ Anymail should also document that the missing charset can cause mojibake in some email clients)
    • Mailjet: accepts charset=... in attachment type field and includes that in the attachment headers, but also unconditionally adds charset=utf-8 for text attachments. To avoid duplicate, conflicting charset headers, Anymail should just ensure utf-8 encoding.
  • Already works correctly:
    • Amazon SES (Anymail uses Python's email package to build the raw MIME message, which handles attachment charset correctly)
  • Unknown:
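The two strategies in the list above could be sketched roughly as follows. This is an illustration only, not Anymail's implementation: the helper names `content_type_with_charset` and `normalize_text_to_utf8` are hypothetical, and the sketch assumes the attachment's declared charset is already known. The `files` entry only shows the tuple shape that requests accepts; no request is made.

```python
import codecs
from typing import Optional


def content_type_with_charset(mimetype: str, charset: Optional[str]) -> str:
    """Append a charset parameter to text/* types; leave other types alone."""
    if mimetype.startswith("text/") and charset:
        return f"{mimetype}; charset={charset}"
    return mimetype


def normalize_text_to_utf8(content: bytes, mimetype: str, charset: Optional[str]) -> bytes:
    """Re-encode text/* content as utf-8 when its declared charset differs."""
    if not mimetype.startswith("text/") or not charset:
        return content  # non-text, or charset unknown: pass through unchanged
    if codecs.lookup(charset).name == "utf-8":
        return content  # already utf-8 (under any alias such as "UTF8")
    return content.decode(charset).encode("utf-8")


raw = "Grüße".encode("iso-8859-1")

# ESP accepts a charset: declare it alongside the original bytes.
declared = content_type_with_charset("text/plain", "iso-8859-1")
# Shape of one entry in the `files` list passed to requests:
# (field name, (filename, content, content_type)).
mailgun_file = ("attachment", ("greeting.txt", raw, declared))

# ESP cannot carry a charset: send utf-8 bytes instead.
utf8_bytes = normalize_text_to_utf8(raw, "text/plain", "iso-8859-1")
```

Passing content through unchanged when no charset is declared is a design choice in this sketch; a real implementation would need to decide whether to assume a default in that case.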

While we're updating the docs, we should also note ESPs that incorrectly send Unicode attachment filenames as raw 8-bit utf-8 (in violation of rfc2231). This can lead to mojibake filenames:

Nearly all of the other ESPs incorrectly send attachment filenames using rfc2047. This is invalid, but a lot of email clients seem to allow it, and because rfc2047 includes the charset you at least won't get mojibake. (The two that handle Unicode attachment filenames correctly are Mailjet, which correctly uses rfc2231, and Amazon SES, because we let Python build the raw MIME message.) I'm inclined not to document this unless someone can identify an email app that displays the undecoded =?utf-8?...?= rfc2047 encoded-word rather than the decoded Unicode attachment filename.
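For reference, the two filename encodings discussed here can be reproduced with Python's standard `email` package. This is a small sketch with an arbitrary sample filename; it shows the encoded values only, not what any ESP actually emits.

```python
from email.header import Header, decode_header
from email.utils import encode_rfc2231

filename = "résumé.txt"

# rfc2231 form, used as the value of an extended parameter:
#   Content-Disposition: attachment; filename*=<value>
rfc2231_value = encode_rfc2231(filename, charset="utf-8")
print(rfc2231_value)  # utf-8''r%C3%A9sum%C3%A9.txt

# rfc2047 encoded-word form (intended for header text, not parameter values).
rfc2047_value = Header(filename, charset="utf-8").encode()
print(rfc2047_value)  # an encoded-word of the form =?utf-8?...?=

# Both forms carry the charset, so a client that accepts the encoded-word
# can still recover the original filename.
decoded_bytes, decoded_charset = decode_header(rfc2047_value)[0]
recovered = decoded_bytes.decode(decoded_charset)
```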
