[Rt-commit] rt branch, 4.4/fix-importer-encoding-for-pg, created. rt-4.4.3-17-g7e756b830

sunnavy sunnavy at bestpractical.com
Tue Jul 17 15:02:09 EDT 2018


The branch, 4.4/fix-importer-encoding-for-pg has been created
        at  7e756b830c18ac0514fb10ddefa612dae337a9d6 (commit)

- Log -----------------------------------------------------------------
commit 7e756b830c18ac0514fb10ddefa612dae337a9d6
Author: sunnavy <sunnavy at bestpractical.com>
Date:   Tue Jul 3 03:26:36 2018 +0800

    Pass UTF-8 decoded data to Create method for rt-importer on Pg
    
    MySQL and Pg differ: the serializer generates UTF-8 encoded bytes for
    MySQL and UTF-8 decoded strings for Pg. This is not a problem if you
    stay on the same database, but when you migrate an RT db from MySQL
    to Pg, encoding issues can occur because Pg expects UTF-8 decoded
    values instead of encoded ones. MySQL, on the other hand, isn't that
    picky.
    
    This commit fixes this particular issue by passing UTF-8 decoded
    strings to Create on Pg. In the (quite rare) cases where the data is
    not valid UTF-8, the original values are stored.

diff --git a/lib/RT/Migrate/Importer.pm b/lib/RT/Migrate/Importer.pm
index 6eef04532..6ed3c9739 100644
--- a/lib/RT/Migrate/Importer.pm
+++ b/lib/RT/Migrate/Importer.pm
@@ -316,6 +316,30 @@ sub Create {
     }
 
     my $obj = $class->new( RT->SystemUser );
+
+    # Unlike MySQL and Oracle, Pg stores UTF-8 strings; without this, data
+    # could be wrongly encoded on Pg.
+    if ( RT->Config->Get( 'DatabaseType' ) eq 'Pg' ) {
+        for my $field ( keys %$data ) {
+            if ( $data->{$field} && !utf8::is_utf8( $data->{$field} ) ) {
+
+                # Make sure decoded data is valid UTF-8, otherwise Pg won't insert
+                my $decoded;
+                eval {
+                    local $SIG{__DIE__};    # don't exit the importer on errors that happen here
+                    $decoded = Encode::decode( 'UTF-8', $data->{$field}, Encode::FB_CROAK );
+                };
+                if ( $@ ) {
+                    warn "$uid contains invalid UTF-8 data in $field: $@, will store encoded string instead\n"
+                      . Data::Dumper::Dumper( $data ) . "\n";
+                }
+                else {
+                    $data->{$field} = $decoded;
+                }
+            }
+        }
+    }
+
     my ($id, $msg) = eval {
         # catch and rethrow on the outside so we can provide more info
         local $SIG{__DIE__};

-----------------------------------------------------------------------
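
The decode-or-keep behaviour described in the commit message can be tried outside RT with a small standalone sketch. The helper name `decode_or_keep` and the sample strings are hypothetical and are not part of RT. Unlike the patch, the sketch checks `defined`/`length` rather than truthiness, skips references, and passes `Encode::LEAVE_SRC`; these are illustrative assumptions, not the committed code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of RT: return the decoded character string
# when the value is valid UTF-8 bytes, otherwise return the value unchanged.
sub decode_or_keep {
    my ($value) = @_;

    # Leave undef, empty strings, references, and already-decoded strings alone.
    return $value
      if !defined $value || !length $value || ref $value || utf8::is_utf8($value);

    my $decoded = eval {
        local $SIG{__DIE__};    # keep outer die handlers out of this probe
        # FB_CROAK dies on malformed input; LEAVE_SRC keeps $value intact.
        Encode::decode( 'UTF-8', $value, Encode::FB_CROAK | Encode::LEAVE_SRC );
    };

    # On failure $decoded is undef, so fall back to the original bytes.
    return defined $decoded ? $decoded : $value;
}

my $valid   = "caf\xC3\xA9";    # UTF-8 bytes for "cafe" with an acute accent
my $invalid = "caf\xE9!";       # lone Latin-1 byte, not valid UTF-8

my $fixed = decode_or_keep($valid);
my $kept  = decode_or_keep($invalid);

printf "decoded length: %d, kept length: %d\n", length($fixed), length($kept);
```

The valid input comes back as a character string (one character per code point), while the invalid input is returned byte for byte, which mirrors the "store the original value" fallback in the patch.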

