[Rt-commit] rt branch, 4.0/strict-decodelob-decoding, created. rt-4.0.18-120-g82403d5

Mon Dec 16 17:38:05 EST 2013

The branch, 4.0/strict-decodelob-decoding has been created
        at  82403d5aabad7bcc155fa89db301cd9bd1fa552d (commit)

- Log -----------------------------------------------------------------
commit 82403d5aabad7bcc155fa89db301cd9bd1fa552d
Author: Kevin Falcone <falcone at bestpractical.com>
Date:   Mon Dec 16 16:14:22 2013 -0500

    Instead of the flimsy utf8 encoding, use UTF-8 and fix bogus data.
    
    Old versions of RT (especially those running on MySQL) were happy to pass
    garbage into MySQL and it was stored there, lurking, waiting for you to
    retrieve it.  If you do retrieve it and then try to treat it like UTF-8
    data (say by passing it to another system that strictly handles UTF-8
    such as PostgreSQL) it will be rejected vigorously.
    
    This converts from
    
    Encode::decode('utf8','string');
    which doesn't validate the content and decodes using perl's lax internal
    utf8 encoding, to
    
    Encode::decode('UTF-8','string',Encode::FB_PERLQQ);
    which decodes strictly as UTF-8 and applies the PERLQQ fallback
    documented in the Encode docs under Handling Malformed Data.
    
    This is similar to what we now do for all Web UI input in
    RT::Interface::Web::DecodeArgs.
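
    As a minimal sketch of the strict decoding described above (not RT's
    actual code): the helper name decode_strict and the sample bytes are
    hypothetical, chosen only for illustration. It assumes the Encode
    module that ships with perl.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode octets as strict UTF-8. Any octet that is
# not part of a valid UTF-8 sequence is replaced with a visible \xHH
# escape (the PERLQQ fallback) instead of being passed through.
sub decode_strict {
    my ($octets) = @_;
    return Encode::decode('UTF-8', $octets, Encode::FB_PERLQQ);
}

# "caf" followed by a lone 0xE9 byte (Latin-1 e-acute): not valid UTF-8.
my $bogus = "caf\xE9";
print decode_strict($bogus), "\n";    # prints: caf\xE9
```

    Well-formed input such as "caf\xC3\xA9" decodes to the expected
    four-character string, so only the bogus bytes are rewritten.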

diff --git a/lib/RT/Record.pm b/lib/RT/Record.pm
index 66a6d65..b498459 100644
--- a/lib/RT/Record.pm
+++ b/lib/RT/Record.pm
@@ -820,7 +820,7 @@ sub _DecodeLOB {
         return ( $self->loc( "Unknown ContentEncoding [_1]", $ContentEncoding ) );
     }
     if ( RT::I18N::IsTextualContentType($ContentType) ) {
-       $Content = Encode::decode_utf8($Content) unless Encode::is_utf8($Content);
+       $Content = Encode::decode('UTF-8',$Content,Encode::FB_PERLQQ) unless Encode::is_utf8($Content);
     }
         return ($Content);
 }

-----------------------------------------------------------------------