[rt-users] Email Subject Header creating fragmented strings when decoded

Thomas Sibley trs at bestpractical.com
Fri Mar 18 10:38:56 EDT 2011


On 18 Mar 2011 10:14, Lars Reimann wrote:
> Hi all,
> 
> the following problem is very annoying:
> 
> RT Encodes Subject lines using the following concept:
> 
> Original example Header
> 
> Subject:
> =?UTF-8?B?W3NlcnZpY2UubWV0YXdheXMubmV0ICM2NzAyOF0gU3BlaWNoZXJwbGF0eiBF?=
>  =?UTF-8?B?cmjDtmh1bmcgd2FzbWFpbjogNTAwIEdC?=
> 
> The header is split into 2 parts:
> 
> 1st part decoded: "[Queue Name #Ticket number] First part of subject line"
> 2nd part decoded: "Second part of subject line"
> 
> Completely decoded string: "[Queue Name #Ticket number] First part of
> subject line"_"Second part of subject line"
>
> The underscore (_) marks an additional space character which is
> introduced into ALL emails when the two UTF-8 parts are decoded.

I think this is actually a bug in Encode::MIME::Header's
parsing/generation of the encoded header lines.  I tracked it down when
it broke a test in other code.  I believe it was introduced with the fix
for https://rt.cpan.org/Public/Bug/Display.html?id=40027.

I've copied this mail to the bug tracker for Encode.
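
For reference, RFC 2047 says that whitespace separating two adjacent
encoded-words is ignored when the header is decoded, so the two parts
should join with no space in between. A minimal sketch in Python (not
the Perl code RT uses), feeding the header quoted above to the standard
library's email.header.decode_header; the helper name decode_subject is
made up for illustration:

```python
from email.header import decode_header

# The folded Subject header from the original report.
RAW_SUBJECT = (
    "=?UTF-8?B?W3NlcnZpY2UubWV0YXdheXMubmV0ICM2NzAyOF0gU3BlaWNoZXJwbGF0eiBF?=\r\n"
    " =?UTF-8?B?cmjDtmh1bmcgd2FzbWFpbjogNTAwIEdC?="
)


def decode_subject(raw):
    """Decode an RFC 2047 header value into a single str (illustrative helper)."""
    pieces = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Encoded-words come back as bytes plus the charset they declared.
            pieces.append(data.decode(charset or "ascii"))
        else:
            # A header with no encoded-words at all is returned as str.
            pieces.append(data)
    return "".join(pieces)


subject = decode_subject(RAW_SUBJECT)
print(subject)
# [service.metaways.net #67028] Speicherplatz Erhöhung wasmain: 500 GB
```

Note that the split falls in the middle of the word "Erhöhung", which
is why a decoder that keeps the folding whitespace shows "E rhöhung".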

> I double-checked by decoding the header in Python. Results: When using 2
> UTF-8 parts, a decode introduces an additional space. When using only ONE
> UTF-8 string (the above subject w/o padding and UTF header) the decode is
> done correctly!
> 
> I would be very glad to resolve this problem. If RT could use only one
> UTF-8 string, the problem would go away.
> How can we do that?

If you're really, really annoyed by it, I believe you can downgrade to
an older Encode.  But you'd also bring back other bugs that have since
been fixed, so I can't recommend it.

> And: does anyone have the same problem with email clients (we use
> evolution and thunderbird, but most likely other clients are also
> affected).
> 
> p.s. It's unclear to me when UTF encoding is used. Sometimes the Subject
> line is not UTF encoded and uses ASCII. Perhaps it depends on non-ASCII
> characters within the subject.

It's used whenever a mail header contains non-ASCII characters.
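
As a rough illustration of that rule in Python (an assumption for
demonstration only; RT does this in Perl via Encode), the standard
library's email.header.Header leaves a pure-ASCII subject alone and only
produces an encoded-word when non-ASCII characters are present. The
helper name encode_subject and the sample subjects are made up:

```python
from email.header import Header


def encode_subject(text):
    """Return the header form of a subject line (illustrative helper)."""
    # With no explicit charset, Header tries us-ascii first and falls
    # back to UTF-8 only when the text cannot be represented in ASCII.
    return Header(text).encode()


ascii_subject = encode_subject("Disk space increase: 500 GB")
utf8_subject = encode_subject("Speicherplatz Erhöhung: 500 GB")

print(ascii_subject)  # printed unchanged, no =?...?= wrapper
print(utf8_subject)   # printed as an =?utf-8?...?= encoded-word
```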

Thomas