Skip to content

ensure a proper encoding set - fix parsing of some UTF-8 mails

ng requested to merge wip/fix-encoding-issues into master

This addresses issues with certain mails that are failing to be parsed with the error: invalid byte sequence in US-ASCII.

We are now setting the default_external encoding to ASCII-8BIT as this is what we can assume is mostly safe to parse emails.

MTAs are usually cleaning ENV so only trusted ENV variables are set and thus e.g. LANG is not set. (E.g. exim https://lists.exim.org/lurker/message/20160302.191005.a72d8433.en.html)

This makes Ruby falling back to US-ASCII and then reading mail from STDIN fails: https://ruby-doc.org/core-2.6.1/Encoding.html#class-Encoding-label-External+encoding See e.g. https://github.com/mikel/mail/issues/1348

We had flaky tests in the past (e.g. #409 (closed)) that were failing sometimes (because some LANG env was (not) set?), we fixed the failing tests by not passing the mail through the shell and thus completely hiding/working around the actual cause.

We now introduce a bunch of emails with different charsets and pass them through the shell without LANG being set. Thus, they are failing without the enforced encoding and pass with the enforced encoding.

Closes #409 (closed)

Edited by georg

Merge request reports