develooper Front page | perl.pep | Postings from November 2014

Email::MIME::Kit v3

Ricardo Signes
November 21, 2014 02:45
Email::MIME::Kit v3
Message ID:
Ever since its early releases, Email::MIME::Kit had a big problem.  It screwed
up encodings.  Specifically, imagine his manifest (I'm kinda skipping some
required junk):

  # manifest.yaml
  renderer: TemplateToolkit
    - Subject: "Message for [% name %]"
    - type: text/plain
      path: body.txt
    - type: text/html
      path: body.html

The manifest turns into a data structure before it's used, and the subject
header is a text string that, later, will get encoded into MIME encoded-words
on the assumption that it's all Unicode text.

The files on disk are read with :raw, then filled in as-is, and trusted to
already be UTF-8.

If your customer's name is Распутин, strangely enough, you're okay.  The header
handling encodes it properly, and the wide characters (because Cyrillic
codepoints are all above U+00FF) turn into UTF-8 with a warning.  On the other
hand, for some trouble, consider Ævar Arnfjörð Bjarmason.  All those codepoints
are below U+0100, so the non-ASCII ones are encoded directly, and you end up
with =C6 (Æ) in your quoted-printable body instead of =C3=86 (Æ UTF-8 encoded).

Now, you're probably actually okay.  Your email is not correct, but email
clients are good at dealing with your (read: my) stupid mistakes.  If your
email part says it's UTF-8 but it's actually Latin-1, mail clients will usually
do the right thing.

The big problem is when you've got both Ævar Arnfjörð Bjarmason and Распутин
both in your email.  Your body is a mish mash of Latin-1 and UTF-8 data.

In Email::MIME::Kit v3, templates (or non-template bodies) loaded from disk are
— if and only if they're for text/* parts — decoded into text and then, when
the email is assembled, it's encoded by Email::MIME's usual header_str

There's a case where this can start making things worse, rather than better.
If you know that templates in files are treated as bytes, you might be passing
in strings pre-encoded into UTF-8.  If that was the case, it will now become

Finally, plugins that read kit contents for uses as text will need upgrading.
The only one I know of like this is my own
Email::MIME::Kit::Assembler::Markdown.  I will fix it.  The trick is: look at
what content-type is being built and consider using "get_decoded_kit_entry"
instead of "get_kit_entry."

I think this is an important change, and worth the breakage.  Please look at
your use of EMK and test with v3.

rjbs Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About