    fix #458 - more proper encoding handling · 39ca2227
    ng authored
    This revisits the fixes for #409 (merged in !301): unencrypted
    but signed UTF-8 mails were failing to parse and thus generated
    another error. More details below.
    
    With this fix we are changing the approach:
    
      1. We treat the input as UTF-8 by default.
      2. If it is not valid UTF-8, we try to convert it to UTF-8.
      3. If the conversion fails, we scrub the input so that at
         least the valid UTF-8 is passed on.
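    The three steps above can be sketched in plain Ruby as follows.
    This is a minimal sketch, not the code from this commit: the
    method name `normalize_to_utf8` is hypothetical, and the source
    encoding for step 2 comes from a caller-supplied block, an
    assumed hook standing in for whatever detection is used.

```ruby
# Sketch of the three-step normalisation. The detector block is a
# hypothetical hook, not part of the original change.
def normalize_to_utf8(raw)
  # Step 1: reinterpret the bytes as UTF-8 without changing them.
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Step 2: try to convert from the encoding reported by the detector block.
  source = block_given? ? yield(raw) : nil
  if source
    begin
      return raw.dup.force_encoding(source).encode(Encoding::UTF_8)
    rescue ArgumentError, EncodingError
      # Unknown encoding name, or bytes that cannot be converted.
    end
  end

  # Step 3: drop every byte sequence that is not valid UTF-8.
  utf8.scrub('')
end
```

    For example, the Latin-1 bytes "caf\xE9" become "café" when the
    block returns "ISO-8859-1", and "caf" when no block is given.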
    
    ASCII-8BIT is a BINARY encoding, and the mail library only
    forces CRLF line endings on such a mail if it contains nothing
    but ASCII content. UTF-8, on the other hand, is not a BINARY
    encoding, so a UTF-8 mail gets CRLF as long as it is validly
    encoded. CRLF matters for mails such as `signed_utf8.eml`:
    without it, part detection fails and the whole body ends up in
    the prologue, with no parts.
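    To illustrate why the encoding tag decides whether the parts are
    found, here is a hypothetical, simplified model. It is not the
    mail gem's code: `crlf_allowed?` mirrors the policy described
    above (binary text gets CRLF only if ASCII-only, other text only
    if validly encoded), and `split_parts` is an assumed splitter
    that, like the parser described here, only recognises a boundary
    at the start of the body or right after a CRLF.

```ruby
# Hypothetical policy mirroring the behaviour described above.
def crlf_allowed?(str)
  if str.encoding == Encoding::ASCII_8BIT
    str.ascii_only? # binary: only pure ASCII gets CRLF
  else
    str.valid_encoding? # text: only valid bytes get CRLF
  end
end

# Turn bare LF (or existing CRLF) into CRLF when the policy allows it.
def normalize_line_endings(str)
  crlf_allowed?(str) ? str.gsub(/\r?\n/, "\r\n") : str
end

# Hypothetical simplified multipart splitter: a boundary line only counts
# at the start of the body or after CRLF. Returns [prologue, parts].
def split_parts(body, boundary)
  delimiter = /(?:\A|\r\n)--#{Regexp.escape(boundary)}(?:--)?(?:\r\n|\z)/
  prologue, *parts = body.split(delimiter)
  [prologue, parts]
end
```

    Tagged as ASCII-8BIT, a body with LF line endings and a non-ASCII
    "é" keeps its bare LFs, no boundary matches, and everything lands
    in the prologue. The same bytes tagged as valid UTF-8 get CRLF
    and split into their parts.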
    Enforcing UTF-8 alone still makes some of our charset mails
    fail, as they are, validly, not UTF-8. The new dependency
    'charlock_holmes' lets us detect their actual encoding and
    convert them to UTF-8. If everything fails, we drop the invalid
    characters.
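    charlock_holmes' `CharlockHolmes::EncodingDetector.detect` returns
    a Hash with keys such as `:type`, `:encoding` and `:confidence`.
    The sketch below shows one way to act on such a result; the
    helper names `ruby_encoding_for` and `transcode_or_scrub` are
    hypothetical, and the rules (ignore `:binary` results and charset
    names Ruby does not know) are assumptions, not this commit's code.
    It takes the detection Hash as a plain argument so it runs
    without the gem installed.

```ruby
# Hypothetical helper: map a charlock_holmes-style detection Hash to a
# Ruby Encoding, or nil when the result cannot be used.
def ruby_encoding_for(detection)
  return nil if detection.nil? || detection[:type] == :binary
  Encoding.find(detection[:encoding].to_s)
rescue ArgumentError
  nil # missing name, or a charset name Ruby does not know
end

# Convert with the detected encoding; if that fails, drop invalid characters.
def transcode_or_scrub(raw, detection)
  encoding = ruby_encoding_for(detection)
  if encoding
    begin
      return raw.dup.force_encoding(encoding).encode(Encoding::UTF_8)
    rescue EncodingError
      # Bytes are invalid or unmappable: fall through to scrubbing.
    end
  end
  raw.dup.force_encoding(Encoding::UTF_8).scrub('')
end
```

    In real use the second argument would be the result of
    `CharlockHolmes::EncodingDetector.detect(raw)`.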