[rt-users] Bug about subject in utf-8

Thu Sep 1 03:42:59 EDT 2016

On 31/08/2016 at 23:41:07-0700, Alex Vandiver wrote:
> On Wed, 31 Aug 2016 23:12:28 +0200
> Albert Shih <Albert.Shih at obspm.fr> wrote:
> > So up to now everything is correct. The problem arises when the person who
> > answers this ticket encodes the subject like this
> >
> >   =?utf-8?q?Re=3A?==?utf-8?q?_=5BRTTAG =?utf-8?q? #NUMBER=5D?= Bonjour =?utf-8?q?=C3=A0?= vous
> >
> > because in that case RT drops the space between the RTTAG and the #NUMBER.
>
> What mail client is generating that?  Whatever it is, it is violating

SOGo.

> the RFC 2047 spec in _multiple_ ways.

And yes, I haven't found any other client that does that.

>
> First, https://tools.ietf.org/html/rfc2047#page-5
>   unencoded white space characters (such as SPACE and HTAB) are
>   FORBIDDEN within an 'encoded-word'
>
> As such, "=?utf-8?q? #NUMBER=5D?=" is not a valid encoded-word.
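
To make that first rule concrete, here is a minimal Ruby sketch that checks
whether a single token has the shape of an RFC 2047 encoded-word. The names
ENCODED_WORD and encoded_word? are hypothetical (they come from neither RT nor
the Mail gem), and the charset part is matched more loosely than the RFC's
"token" grammar, so treat it as an approximation rather than a full validator.

```ruby
# Simplified shape of an encoded-word: =?charset?encoding?encoded-text?=
# Assumption: the charset is matched loosely as "anything but white space or '?'".
# The encoded-text class is printable ASCII except '?' and SPACE.
ENCODED_WORD = /\A=\?[^\s?]+\?[QqBb]\?[!->@-~]+\?=\z/

# Returns true when the token looks like one well-formed encoded-word.
# RFC 2047 also limits an encoded-word to 75 characters.
def encoded_word?(token)
  token.length <= 75 && ENCODED_WORD.match?(token)
end

valid   = encoded_word?("=?utf-8?q?Re=3A?=")        # => true
spaced  = encoded_word?("=?utf-8?q? #NUMBER=5D?=")  # => false (unencoded space inside)
unended = encoded_word?("=?utf-8?q?_=5BRTTAG")      # => false (no closing "?=")
```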

Well, I think that's my mistake: I altered the subject slightly to match my
first email about the tag. The real subject is

  =?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous

> Secondly, https://tools.ietf.org/html/rfc2047#page-7
>   However, an 'encoded-word' that appears in a header field defined as
>   '*text' MUST be separated from any adjacent 'encoded-word' or 'text'
>   by 'linear-white-space'.
>
> As such, "=?utf-8?q?Re=3A?==?utf-8?" is not valid, as the two
> "encoded-word"s are not separated by spaces.
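
The separation rule can be checked mechanically too. The sketch below is an
illustration only: EW_PATTERN and unseparated_encoded_words are hypothetical
names, and the pattern is the same simplified encoded-word shape, not the full
RFC grammar. It lists every encoded-word in a header value that touches a
neighbouring character without white space in between.

```ruby
# Simplified encoded-word shape (assumption: loose charset matching).
EW_PATTERN = /=\?[^\s?]+\?[QqBb]\?[!->@-~]+\?=/
WHITE_SPACE = /[ \t\r\n]/

# Returns the encoded-words that are not delimited by white space
# (or by the start/end of the header value) on both sides.
def unseparated_encoded_words(header)
  offenders = []
  pos = 0
  while (m = EW_PATTERN.match(header, pos))
    before = m.begin(0).zero? ? nil : header[m.begin(0) - 1]
    after  = header[m.end(0)] # nil when the match ends the string
    ok = (before.nil? || WHITE_SPACE.match?(before)) &&
         (after.nil? || WHITE_SPACE.match?(after))
    offenders << m[0] unless ok
    pos = m.end(0)
  end
  offenders
end

subject = "=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm " \
          "=?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous"
bad = unseparated_encoded_words(subject)
# => ["=?utf-8?q?Re=3A?=", "=?utf-8?q?_=5BInfo?="]
```

With this check, only the first two encoded-words of the real subject are
flagged; the later ones are surrounded by spaces.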

So can you just confirm that

  =?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous

is still not valid (so I can file a bug report against the mail client)?

I'm not very good with Perl, but when I use Ruby to decode this
line

irb(main):008:0> Mail::Encodings.unquote_and_convert_to('=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous','utf-8')
=> "Re: [Info Obspm #31684] Bonjour à vous"

the result seems correct. And if I try the other way around, I get

irb(main):009:0> Mail::Encodings.q_value_encode('Re: [Info Obspm #31684] Bonjour à vous','UTF-8')
=> "=?UTF-8?Q?Re:_[Info_Obspm_#31684]_Bonjour_=C3=A0_vous?="

>
> Even ignoring those errors, the example you gave still isn't parsable.
> My best attempt splits it into the following tokens:
>
>  =?utf-8?q?Re=3A?=         # "Re:"
>  =?utf-8?q?_=5BRTTAG       # " [RTTAG", but no closing "?=" ?!
>  =?utf-8?q?#NUMBER=5D?=    # "#NUMBER]"
>  Bonjour                   # "Bonjour"
>  =?utf-8?q?=C3=A0?=        # "à"
>  vous                      # "vous"
>
> Were it somehow parsed as the above, RT would _still_ be correct in
> omitting the space before the number, because space between
> encoded-words is removed, https://tools.ietf.org/html/rfc2047#page-10 :
>
>   When displaying a particular header field that contains multiple
>   'encoded-word's, any 'linear-white-space' that separates a pair of
>   adjacent 'encoded-word's is ignored.
>
>
> In short, fix the mail client.  Failing that, set
> $ExtractSubjectTagMatch, as this is not a bug in RT.
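
The point about white space between adjacent encoded-words being ignored can
be reproduced with a small decoder. This is a rough Ruby sketch, not RT's
implementation and not the Mail gem's: EW_RE, decode_word and decode_header
are hypothetical names, the encoded-word pattern is simplified, and error
handling (unknown charsets, invalid bytes) is left out.

```ruby
# Simplified encoded-word shape with captures: charset, encoding, encoded-text.
EW_RE = /=\?([^\s?]+)\?([QqBb])\?([!->@-~]+)\?=/

# Decode the payload of one encoded-word into a UTF-8 string.
def decode_word(charset, encoding, text)
  bytes =
    if encoding.downcase == "q"
      # Q encoding: "_" stands for a space, "=XX" for the byte with hex value XX.
      text.b.tr("_", " ").gsub(/=([0-9A-Fa-f]{2})/) { $1.hex.chr }
    else
      text.unpack1("m") # B encoding is Base64
    end
  bytes.force_encoding(charset).encode("UTF-8")
end

# Decode a header value, dropping white space that separates two
# adjacent encoded-words; all other text is copied through unchanged.
def decode_header(header)
  out = +""
  pos = 0
  first = true
  while (m = EW_RE.match(header, pos))
    gap = header[pos...m.begin(0)]
    out << gap unless !first && gap.match?(/\A[ \t\r\n]+\z/)
    out << decode_word(m[1], m[2], m[3])
    pos = m.end(0)
    first = false
  end
  out << header[pos..-1]
end

# A repaired form of the first example (closing "?=" added, inner space removed):
tag = decode_header("=?utf-8?q?_=5BRTTAG?= =?utf-8?q?#NUMBER=5D?=")
# => " [RTTAG#NUMBER]"

# The real subject: "Obspm" is plain text, so the spaces around it survive.
real = decode_header("=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm " \
                     "=?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous")
# => "Re: [Info Obspm #31684] Bonjour à vous"
```

In the first call the space disappears because it sits between two
encoded-words; in the second, every space sits next to ordinary text, so
nothing is dropped.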

Thanks a lot for your help.

Regards.

--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
5 Place Jules Janssen
92195 Meudon Cedex
France
Telephone: +33 1 45 07 76 26/+33 6 86 69 95 71
xmpp: jas at obspm.fr
Local time:
Thu 1 Sep 2016 09:21:31 CEST