[rt-users] Bug about subject in utf-8

Thu Sep 1 02:41:07 EDT 2016

On Wed, 31 Aug 2016 23:12:28 +0200
Albert Shih <Albert.Shih at obspm.fr> wrote:
> So far, everything is correct. The problem is when the person who
> answers this ticket encodes the subject like this:
> 
>   =?utf-8?q?Re=3A?==?utf-8?q?_=5BRTTAG =?utf-8?q? #NUMBER=5D?= Bonjour =?utf-8?q?=C3=A0?= vous
> 
> because in that case RT drops the space between the RTTAG and the #NUMBER.

What mail client is generating that?  Whatever it is, it is violating
the RFC 2047 spec in _multiple_ ways.

First, https://tools.ietf.org/html/rfc2047#page-5
  unencoded white space characters (such as SPACE and HTAB) are
  FORBIDDEN within an 'encoded-word'

As such, "=?utf-8?q? #NUMBER=5D?=" is not a valid encoded-word.
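To make the first rule concrete, here is a minimal Python sketch of the general encoded-word grammar from section 2 of RFC 2047. The name `is_encoded_word` is hypothetical, and the check covers only the section 2 syntax (plus its 75-character limit), not the extra per-context restrictions of section 5.

```python
import re

# RFC 2047 'token': printable ASCII except SPACE and the 'especials'
# ( ) < > @ , ; : \ " / [ ] ? . =
TOKEN = r"[!#$%&'*+\-0-9A-Z^_`a-z{|}~]+"
# RFC 2047 'encoded-text': printable ASCII other than "?" and SPACE,
# i.e. 0x21-0x3E and 0x40-0x7E.
ENCODED_TEXT = r"[\x21-\x3e\x40-\x7e]+"
# Only the "Q" and "B" encodings are defined by RFC 2047.
ENCODED_WORD = re.compile(r"=\?" + TOKEN + r"\?[QqBb]\?" + ENCODED_TEXT + r"\?=")


def is_encoded_word(s: str) -> bool:
    """Return True if s is exactly one syntactically valid Q- or B-encoded word."""
    # Section 2 also caps an encoded-word at 75 characters.
    return len(s) <= 75 and ENCODED_WORD.fullmatch(s) is not None


if __name__ == "__main__":
    print(is_encoded_word("=?utf-8?q?Re=3A?="))        # True
    print(is_encoded_word("=?utf-8?q? #NUMBER=5D?="))  # False: bare SPACE inside
    print(is_encoded_word("=?utf-8?q?_#NUMBER=5D?="))  # True: SPACE written as "_"
```

The only difference between the rejected and the accepted #NUMBER word is that the space is written as "_", which is how the Q encoding represents a SPACE.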

Secondly, https://tools.ietf.org/html/rfc2047#page-7
  However, an 'encoded-word' that appears in a header field defined as
  '*text' MUST be separated from any adjacent 'encoded-word' or 'text'
  by 'linear-white-space'.

As such, "=?utf-8?q?Re=3A?==?utf-8?" is not valid, as the two
"encoded-word"s are not separated by spaces.
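The second rule can be checked mechanically as well. The Python sketch below (`unseparated_encoded_words` is a hypothetical helper) spots candidate encoded-words with a deliberately loose pattern and reports any that touch a neighbouring character with no white space in between.

```python
import re
from typing import List

# Loose pattern for spotting candidate encoded-words: =?charset?Q|B?text?=
# (strict syntax checking is a separate concern).
CANDIDATE = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?]*\?=")


def unseparated_encoded_words(header: str) -> List[str]:
    """Return encoded-words not separated from their neighbours by white space."""
    offenders = []
    for m in CANDIDATE.finditer(header):
        start, end = m.span()
        space_before = start == 0 or header[start - 1].isspace()
        space_after = end == len(header) or header[end].isspace()
        if not (space_before and space_after):
            offenders.append(m.group())
    return offenders


if __name__ == "__main__":
    # Two encoded-words glued together: both are reported.
    print(unseparated_encoded_words("=?utf-8?q?Re=3A?==?utf-8?q?_=5BRTTAG?="))
    # The same two words separated by a SPACE: nothing is reported.
    print(unseparated_encoded_words("=?utf-8?q?Re=3A?= =?utf-8?q?_=5BRTTAG?="))
```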

Even ignoring those errors, the example you gave still isn't parsable.
My best attempt splits it into the following tokens:

 =?utf-8?q?Re=3A?=         # "Re:"
 =?utf-8?q?_=5BRTTAG       # " [RTTAG", but no closing "?=" ?!
 =?utf-8?q?#NUMBER=5D?=    # "#NUMBER]"
 Bonjour                   # "Bonjour"
 =?utf-8?q?=C3=A0?=        # "à"
 vous                      # "vous"
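Each of those tokens uses the "Q" encoding from section 4.2 of RFC 2047, in which "_" stands for a SPACE and "=XX" is one octet written in hex. A minimal, hand-rolled Python decoder (the name `decode_q_word` is hypothetical, and it accepts only a single, complete Q-encoded word) shows what the tokens mean; the unterminated " [RTTAG" token is given a closing "?=" purely for illustration.

```python
import re

# One complete Q-encoded word: =?charset?Q?encoded-text?=
Q_WORD = re.compile(r"=\?([^?]+)\?[Qq]\?([^?]*)\?=")


def decode_q_word(word: str) -> str:
    """Decode one complete RFC 2047 Q-encoded word into text."""
    m = Q_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not a complete Q-encoded word: {word!r}")
    charset, text = m.groups()
    raw = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "_":        # "_" always means SPACE (0x20) in Q encoding
            raw.append(0x20)
            i += 1
        elif ch == "=":      # "=XX" is one octet written in hex
            raw.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:                # other printable ASCII stands for itself
            raw.append(ord(ch))
            i += 1
    return raw.decode(charset)


if __name__ == "__main__":
    for w in ("=?utf-8?q?Re=3A?=",
              "=?utf-8?q?_=5BRTTAG?=",   # closing "?=" added for illustration
              "=?utf-8?q?#NUMBER=5D?=",
              "=?utf-8?q?=C3=A0?="):
        print(repr(decode_q_word(w)))
```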

Even if it were somehow parsed as above, RT would _still_ be correct in
omitting the space before the number, because white space between
adjacent encoded-words is ignored, https://tools.ietf.org/html/rfc2047#page-10 :

  When displaying a particular header field that contains multiple
  'encoded-word's, any 'linear-white-space' that separates a pair of
  adjacent 'encoded-word's is ignored.
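Python's standard `email.header` module applies this display rule, which makes the effect easy to try. In the sketch below, `display_subject` is a hypothetical helper and the ticket number 42 is made up; the point is that a space survives only when it is encoded inside a word (as "_"), not when it sits between two encoded-words.

```python
from email.header import decode_header, make_header


def display_subject(raw: str) -> str:
    """Decode an RFC 2047 header value into the text a reader would see."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    # The SPACE between the two encoded-words is ignored on display.
    print(display_subject("=?utf-8?q?=5BRTTAG?= =?utf-8?q?#42=5D?="))   # [RTTAG#42]
    # A SPACE encoded inside the first word (as "_") is kept.
    print(display_subject("=?utf-8?q?=5BRTTAG_?= =?utf-8?q?#42=5D?="))  # [RTTAG #42]
```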

In short, this is not a bug in RT: fix the mail client.  Failing that,
set $ExtractSubjectTagMatch.
 - Alex