[rt-users] Bug about subject in utf-8

Alex Vandiver alex at chmrr.net
Thu Sep 1 04:39:53 EDT 2016


On Thu, 1 Sep 2016 09:42:59 +0200
Albert Shih <Albert.Shih at obspm.fr> wrote:
> > First, https://tools.ietf.org/html/rfc2047#page-5
> >   unencoded white space characters (such as SPACE and HTAB) are
> >   FORBIDDEN within an 'encoded-word'
> >
> > As such, "=?utf-8?q? #NUMBER=5D?=" is not a valid encoded-word.  
> 
> Well, I think that's my bad: I changed the subject a little to fit my
> first email about the tag. The real subject is
> 
>   =?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous

OK, that's a little different.  Rather better.  It still violates:

>   However, an 'encoded-word' that appears in a header field defined as
>   '*text' MUST be separated from any adjacent 'encoded-word' or 'text'
>   by 'linear-white-space'.

But:

> I'm not very good with perl, but when I try using ruby to decode this
> line
> 
> irb(main):008:0> Mail::Encodings.unquote_and_convert_to('=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous','utf-8')
> => "Re: [Info Obspm #31684] Bonjour à vous"  
> 
> the result seems correct.

For decoders that tolerate encoded-words that aren't space-separated,
that's correct.  The difference from your earlier example is the
unencoded word between two encoded-words: whitespace between two
adjacent encoded-words is dropped when decoding, but whitespace next
to ordinary text is significant and must be kept.
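
To make the whitespace rule concrete, here is a minimal sketch in
Python of a lenient decoder.  It is not RT's code and not the Perl
Encode implementation; the names decode_word and decode_subject are
made up for illustration, and it assumes well-formed ASCII payloads.

```python
import base64
import re

# Strict encoded-word shape: =?charset?encoding?payload?= with no raw spaces.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")
HEX_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")

# The real Subject header quoted in this thread.
SUBJECT = ("=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm "
           "=?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous")


def decode_word(match):
    """Decode one encoded-word (hypothetical helper, Q and B encodings)."""
    charset, encoding, payload = match.groups()
    if encoding.lower() == "q":
        # In Q encoding "_" stands for a space and "=XX" is a hex-escaped byte.
        data = payload.replace("_", " ").encode("ascii")
        raw = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
    else:
        raw = base64.b64decode(payload)
    return raw.decode(charset, errors="replace")


def decode_subject(value):
    """Decode a header value, dropping whitespace only between encoded-words."""
    out = []
    pos = 0
    seen_encoded_word = False
    for match in ENCODED_WORD.finditer(value):
        gap = value[pos:match.start()]
        # A whitespace-only (or empty) gap between two encoded-words is
        # ignored; a gap containing ordinary text is kept with its spaces.
        if not (seen_encoded_word and gap.strip() == ""):
            out.append(gap)
        out.append(decode_word(match))
        pos = match.end()
        seen_encoded_word = True
    out.append(value[pos:])  # trailing ordinary text, if any
    return "".join(out)
```

With this sketch, decode_subject(SUBJECT) returns
"Re: [Info Obspm #31684] Bonjour à vous": the spaces around "Obspm"
and "Bonjour" survive because they belong to ordinary text.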

And indeed, this does point to an RT bug.  Namely, for historical and
bad reasons, RT doesn't use the standard MIME-words decoding library,
which would produce:

> perl -MEncode -lE 'print Encode::encode("utf8",
>                      Encode::decode("MIME-header",
>   "=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous"))'
>
> Re: [Info Obspm #31684] Bonjour à vous

Instead, it rolls its own, and gets it wrong:

> perl -Ilib -MRT=-init -le 'print RT::I18N::DecodeMIMEWordsToUTF8(
>   "=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous","Subject")'
>
> Re: [Info Obspm#31684] Bonjourà vous

Specifically, it removes spaces before the second and later
encoded-words, due to
https://github.com/bestpractical/rt/blob/stable/lib/RT/I18N.pm#L445
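
As a rough illustration of the symptom only (this is not the code at
that line, and both function names are hypothetical), the Python
sketch below joins already-decoded pieces in two ways: once keeping
the spaces that belong to ordinary text, and once stripping them
before every later encoded-word.

```python
# (text, was_encoded_word) pairs for the Subject above, as they would look
# after each encoded-word has been decoded. Hand-written for illustration.
TOKENS = [
    ("Re:", True),
    (" [Info", True),
    (" Obspm ", False),
    ("#31684]", True),
    (" Bonjour ", False),
    ("à", True),
    (" vous", False),
]


def join_keeping_text_spaces(tokens):
    """Drop a whitespace-only gap between two encoded-words; keep the rest."""
    out = []
    for i, (text, encoded) in enumerate(tokens):
        between_encoded = (
            0 < i < len(tokens) - 1
            and tokens[i - 1][1]
            and tokens[i + 1][1]
        )
        if not encoded and text.strip() == "" and between_encoded:
            continue
        out.append(text)
    return "".join(out)


def join_stripping_before_encoded(tokens):
    """Imitate the symptom: spaces vanish before later encoded-words."""
    out = []
    for i, (text, encoded) in enumerate(tokens):
        next_is_encoded = i + 1 < len(tokens) and tokens[i + 1][1]
        if not encoded and next_is_encoded and i > 0:
            text = text.rstrip()
        out.append(text)
    return "".join(out)
```

The first function yields "Re: [Info Obspm #31684] Bonjour à vous";
the second yields "Re: [Info Obspm#31684] Bonjourà vous", which is
the broken output shown above.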

This looks to be a bug.  I've pushed 4.2/encoded-word-spaces to
address it; if you'd like to test the fix locally, you can apply
https://github.com/bestpractical/rt/commit/bdd6bd96 .

Thanks for the more complete bug report.
 - Alex


