ng authored
This revisits the fixes for #409 (merged in !301): unencrypted but signed UTF-8 mails failed to parse and thus generated another error. More in-depth below.

With this fix we change the approach:

1. We use UTF-8 as the default input encoding.
2. If the input is not valid UTF-8, we try to convert it to UTF-8.
3. If the conversion fails, we scrub the input, so that at least the valid UTF-8 parts are passed on.

ASCII-8BIT is a BINARY encoding, and the mail library only forces CRLF line endings on such a mail if it contains nothing but ASCII content. UTF-8, on the other hand, is not a BINARY encoding, so the mail gets CRLF line endings as long as it is validly encoded.

CRLF line endings are important for mails such as the one in `signed_utf8.eml`: without them, part detection fails and the whole body ends up in the prologue, with no parts.

Enforcing UTF-8 alone would still make some of our charset mails fail, as they are, validly, not UTF-8. With a new dependency, 'charlock_holmes', we can detect their actual encoding and try to convert them to UTF-8. If everything fails, we just drop the invalid characters.
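The three-step fallback can be sketched in plain Ruby roughly as below. This is not the code from this commit: `to_utf8` and its `detector:` parameter are hypothetical names, and the detector is injected so the sketch runs without extra gems. With charlock_holmes installed, one would presumably plug in `CharlockHolmes::EncodingDetector.detect`, assuming it returns a hash with an `:encoding` key.

```ruby
# Hypothetical helper illustrating the fallback order described above.
# `detector` is any callable that returns a guessed encoding name (or nil)
# for the raw bytes. With charlock_holmes this could be (assumption):
#   ->(s) { CharlockHolmes::EncodingDetector.detect(s)&.fetch(:encoding, nil) }
def to_utf8(raw, detector: ->(_s) { nil })
  # 1. Treat the input as UTF-8 by default (relabels the bytes, changes none).
  str = raw.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding?

  # 2. Not valid UTF-8: guess the real encoding and transcode to UTF-8.
  guessed = detector.call(raw)
  if guessed
    begin
      converted = raw.dup.force_encoding(guessed).encode(Encoding::UTF_8)
      # Transcoding to the same encoding does not validate, so check again.
      return converted if converted.valid_encoding?
    rescue ArgumentError, EncodingError
      # Unknown encoding name or unconvertible bytes: fall through to step 3.
    end
  end

  # 3. Last resort: drop every byte sequence that is not valid UTF-8.
  str.scrub('')
end
```

For example, the Latin-1 bytes `"caf\xE9".b` become `"café"` when the detector guesses `"ISO-8859-1"`, and `"caf"` when no guess is available.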
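The CRLF rule described above can be illustrated with the minimal sketch below. It is a hypothetical restatement of that rule, not the mail library's implementation: `maybe_crlf` is an invented name. A BINARY (ASCII-8BIT) string is only normalized when it is pure ASCII, while a UTF-8 string is normalized whenever it is validly encoded.

```ruby
# Hypothetical illustration of the line-ending rule, NOT the mail gem's code.
def maybe_crlf(str)
  treat_as_text =
    if str.encoding == Encoding::BINARY
      str.ascii_only?      # BINARY: only trusted if every byte is ASCII
    else
      str.valid_encoding?  # e.g. UTF-8: trusted if validly encoded
    end
  return str unless treat_as_text

  # Turn bare LF (and existing CRLF) into CRLF.
  str.gsub(/\r?\n/, "\r\n")
end
```

Under this rule, the same UTF-8 bytes are left with bare LFs when labeled BINARY but get CRLF once relabeled as UTF-8, which is why switching the default input encoding fixes part detection for mails like `signed_utf8.eml`.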
39ca2227