[Rt-devel] [BUG] OriginalContent can return string with utf8 flag on

Ruslan U. Zakirov Ruslan.Zakirov at acronis.com
Tue Oct 12 06:24:44 EDT 2004


RT should not use Encode::from_to the way it does now, that is, without a 
CHECK argument. When CHECK is omitted, Encode uses the default, 0.

perldoc Encode:
If CHECK is 0, (en|de)code will put a substitution character in place of 
a malformed character.

code example:
perl -e 'my $str = "\x{442}\x{435}\x{441}\x{442}"; require Encode; 
Encode::_utf8_off($str); Encode::from_to($str, "utf8"=> "us-ascii"); 
print $str;'
The output is '????' instead of a croak. RT doesn't like to die, but the 
current behaviour hides bugs. Now we have corrupted data in the DB.
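
For comparison, here is a minimal sketch of the same conversion with and
without a CHECK value. It assumes a reasonably recent Encode, where from_to
accepts CHECK as an optional fourth argument; the variable names are
illustrative only.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 octets for the Cyrillic word used in the one-liner above.
my $octets = Encode::encode('UTF-8', "\x{442}\x{435}\x{441}\x{442}");

# CHECK omitted (defaults to 0): each unmappable character is silently
# replaced with the substitution character of the target encoding.
my $lossy = $octets;
Encode::from_to($lossy, 'utf8' => 'us-ascii');

# CHECK = FB_CROAK: the conversion dies instead, so eval can catch it.
my $strict = $octets;
my $ok = eval {
    Encode::from_to($strict, 'utf8' => 'us-ascii', Encode::FB_CROAK);
    1;
};
my $error = $ok ? '' : $@;

print "default CHECK: $lossy\n";
print "FB_CROAK:      ", ($ok ? "converted" : "died: $error"), "\n";
```

With FB_CROAK the failure is visible to the caller, who can then decide to
keep the original octets rather than store the substituted ones.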

Code path in RT that triggers this bug:
sub RT::I18N::SetMIMEEntityToEncoding {
...
         eval {
             $RT::Logger->debug("Converting '$charset' to '$enc' for ".
                 $head->mime_type . " - ". $head->get('subject'));

             # NOTE:: see the comments at the end of the sub.
             Encode::_utf8_off( $lines[$_] ) foreach ( 0 .. $#lines );
             Encode::from_to( $lines[$_], $charset => $enc ) for ( 0 .. $#lines );
         };
...
}
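
One possible shape for a stricter loop is sketched below. This is not RT
code: convert_lines is a hypothetical helper, and the all-or-nothing policy
(convert copies, return undef on any failure) is an assumption about what
the caller would want.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of RT: convert every line or none.
# Returns an array ref of converted lines, or undef if any line fails.
sub convert_lines {
    my ($lines, $charset, $enc) = @_;

    my @converted = @$lines;    # work on copies, leave the input untouched
    my $ok = eval {
        for my $line (@converted) {
            Encode::_utf8_off($line);
            # FB_CROAK makes an unmappable character fatal instead of '?'.
            Encode::from_to($line, $charset => $enc, Encode::FB_CROAK);
        }
        1;
    };
    return $ok ? \@converted : undef;
}

my @lines  = ("plain line\n", Encode::encode('UTF-8', "\x{442}\x{435}\n"));
my $result = convert_lines(\@lines, 'utf8', 'us-ascii');
print defined $result ? "converted\n" : "conversion failed, originals kept\n";
```

Because the conversion runs on copies, a failure halfway through does not
leave some lines converted and others not.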

Ruslan U. Zakirov wrote:
>     Hello.
> sub Attachment::OriginalContent {
> ...
>   if (!$enc || $enc eq '' ||  $enc eq 'utf8' || $enc eq 'utf-8') {
>     # If we somehow fail to do the decode, at least push out the raw bits
>     eval {return( Encode::decode_utf8($content))} || return ($content);
>   }
> ...
> }
> 
> decode_utf8 returns string instead of octets.
> 
> perldoc Encode:
>  $string = decode_utf8($octets [, CHECK]);
> 
> 
>                     Best regards. Ruslan.
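
The difference between octets and the decoded string in the quoted message
can be seen with a short sketch like the following. It only uses
Encode::encode, Encode::decode_utf8, Encode::encode_utf8 and utf8::is_utf8;
the variable names are illustrative, and whether OriginalContent should
encode back to octets is left open here.

```perl
use strict;
use warnings;
use Encode ();

# Start from UTF-8 octets, as they would be stored for an attachment.
my $octets = Encode::encode('UTF-8', "\x{442}\x{435}\x{441}\x{442}");

# decode_utf8 turns octets into a character string with the utf8 flag on.
my $string = Encode::decode_utf8($octets);

# encode_utf8 goes the other way and yields octets again.
my $round_trip = Encode::encode_utf8($string);

printf "octets:     length %d, utf8 flag %s\n",
    length($octets), utf8::is_utf8($octets) ? 'on' : 'off';
printf "string:     length %d, utf8 flag %s\n",
    length($string), utf8::is_utf8($string) ? 'on' : 'off';
printf "round trip: %s\n",
    $round_trip eq $octets ? 'same octets' : 'different';
```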

