[rt-users] Bug about subject in utf-8
Alex Vandiver
alex at chmrr.net
Thu Sep 1 04:39:53 EDT 2016
On Thu, 1 Sep 2016 09:42:59 +0200
Albert Shih <Albert.Shih at obspm.fr> wrote:
> > First, https://tools.ietf.org/html/rfc2047#page-5
> > unencoded white space characters (such as SPACE and HTAB) are
> > FORBIDDEN within an 'encoded-word'
> >
> > As such, "=?utf-8?q? #NUMBER=5D?=" is not a valid encoded-word.
>
> Well, I think that's my mistake; I changed the subject a little to fit my first
> email about the tag. The real subject is
>
> =?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous
OK, that's a little different. Rather better. It still violates:
> However, an 'encoded-word' that appears in a header field defined as
> '*text' MUST be separated from any adjacent 'encoded-word' or 'text'
> by 'linear-white-space'.
But:
> I'm not very good with perl, but when I try using ruby to decode this
> line
>
> irb(main):008:0> Mail::Encodings.unquote_and_convert_to('=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous','utf-8')
> => "Re: [Info Obspm #31684] Bonjour à vous"
>
> the result seems correct.
For decoders that are lenient to encoded-words that aren't
space-separated, that's correct. The difference between this and what
you had previously is the non-encoded word between the two
encoded-words, which makes the space significant.
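To make the whitespace rule concrete, here is a minimal sketch of a lenient decoder in plain Ruby. It is neither the Mail gem's code nor RT's; the names ENCODED_WORD, decode_word and decode_header are made up for illustration, and it assumes the charset is one Ruby's String#force_encoding knows about. Whitespace is dropped only when it is the sole thing between two encoded-words; whitespace next to ordinary text is kept.

```ruby
# Hypothetical helpers for illustration; not taken from RT or the Mail gem.
ENCODED_WORD = /=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/

# Decode the payload of a single encoded-word to a UTF-8 string.
def decode_word(charset, encoding, payload)
  bytes =
    if encoding.downcase == "q"
      # Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
      payload.tr("_", " ").unpack1("M")
    else
      # B encoding is plain base64.
      payload.unpack1("m")
    end
  bytes.force_encoding(charset).encode("UTF-8")
end

def decode_header(raw)
  out = +""
  pos = 0
  seen_word = false
  raw.scan(ENCODED_WORD) do
    m = Regexp.last_match
    gap = raw[pos...m.begin(0)]
    # A gap that is empty or pure whitespace between two encoded-words is
    # dropped; a gap containing ordinary text is kept, spaces included.
    out << gap unless seen_word && gap.match?(/\A\s*\z/)
    out << decode_word(m[1], m[2], m[3])
    pos = m.end(0)
    seen_word = true
  end
  out << raw[pos..-1]
end

subject = '=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= ' \
          'Bonjour =?utf-8?q?=C3=A0?= vous'
puts decode_header(subject)
```

Accepting the empty gap between the first two encoded-words is the lenient part; a strict RFC 2047 parser could reject that input instead.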
And indeed, this does point to an RT bug. Namely, for historical and
bad reasons, RT doesn't use the standard MIME-words decoding library,
which would produce:
> perl -MEncode -lE 'print Encode::encode("utf8",
> Encode::decode("MIME-header",
> "=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous"))'
>
> Re: [Info Obspm #31684] Bonjour à vous
Instead, it rolls its own, and gets it wrong:
> perl -Ilib -MRT=-init -le 'print RT::I18N::DecodeMIMEWordsToUTF8(
> "=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous","Subject")'
>
> Re: [Info Obspm#31684] Bonjourà vous
Specifically, it removes spaces before the second and later
encoded-words, due to
https://github.com/bestpractical/rt/blob/stable/lib/RT/I18N.pm#L445
This looks to be a bug. I've pushed 4.2/encoded-word-spaces to
address it; if you'd like to test the fix locally, you can apply
https://github.com/bestpractical/rt/commit/bdd6bd96 .
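For anyone who wants to see the failure mode without an RT checkout, the following is a rough Ruby sketch of it. It is an assumption-laden illustration, not the code in RT::I18N and not the fix in the commit above: EW and buggy_decode are hypothetical names, and only Q encoding is handled. It strips the whitespace in front of every encoded-word after the first, whether or not the preceding token was itself an encoded-word.

```ruby
# Hypothetical illustration only; this is not the code in RT::I18N.
EW = /=\?([^?\s]+)\?[qQ]\?([^?\s]*)\?=/

def buggy_decode(raw)
  out = +""
  pos = 0
  raw.scan(EW) do
    m = Regexp.last_match
    gap = raw[pos...m.begin(0)]
    # The mistake: for the second and later encoded-words, whitespace in
    # front of the word is discarded even when it follows ordinary text.
    gap = gap.rstrip if pos > 0
    word = m[2].tr("_", " ").unpack1("M").force_encoding(m[1]).encode("UTF-8")
    out << gap << word
    pos = m.end(0)
  end
  out << raw[pos..-1]
end

subject = '=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= ' \
          'Bonjour =?utf-8?q?=C3=A0?= vous'
puts buggy_decode(subject)
```

With the subject from this thread, the spaces before "#31684]" and before "à" are lost, which matches the symptom reported above.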
Thanks for the more complete bug report.
- Alex