[rt-users] RT 3.4.5: UTF-8 problems in the web interface

Niko Tyni ntyni+rt-users at mappi.helsinki.fi
Tue Jun 27 08:47:24 EDT 2006


Hi rt-users,

I'm trying to get non-ASCII (mostly latin1) characters to work with RT
3.4.5, and I have problems with UTF-8 encoding in the web interface. It
looks like the characters come out in ISO-8859-1 encoding, while the
HTTP headers call it UTF-8.

I'm using PostgreSQL as the database, and its encoding is set to 'UNICODE'
(or 'UTF8', as it's called in PostgreSQL 8.1) by rt-setup-database. When
I look at the database contents with the 'psql' command-line tool,
they look UTF8-encoded, as expected. However, in the web interface the
non-ASCII characters don't show properly. A dump with 'curl' shows that
while the HTTP headers claim that the encoding is utf-8, the characters
are actually in ISO-8859-1.
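
One way to double-check what curl shows is to test whether the dumped
bytes are well-formed UTF-8. The following is a minimal Perl sketch, not
part of RT; the helper name is_valid_utf8 and the sample bytes are made
up for illustration. A lone 0xE4 byte ("ä" in ISO-8859-1) fails the
check, while the two-byte sequence 0xC3 0xA4 passes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # decode() may modify its input when a CHECK value is given,
    # so work on a copy.
    my $copy = $bytes;
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $latin1 = "\xE4";        # "ä" as a single ISO-8859-1 byte
my $utf8   = "\xC3\xA4";    # "ä" as two UTF-8 bytes

printf "latin1 byte: %s\n", is_valid_utf8($latin1) ? "valid UTF-8" : "not UTF-8";
printf "utf8 bytes:  %s\n", is_valid_utf8($utf8)   ? "valid UTF-8" : "not UTF-8";
```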

This is RT 3.4.5, Perl 5.8.8 and PostgreSQL 8.1.4, on Debian. I can also
reproduce it with MySQL 5.0.22 and PostgreSQL 7.4.7, and with Perl 5.8.4.

The encoding settings are untouched defaults; from RT_Config.pm:

 @LexiconLanguages = qw(*) unless (@LexiconLanguages);
 @EmailInputEncodings = qw(utf-8 iso-8859-1 us-ascii) unless (@EmailInputEncodings);
 Set($EmailOutputEncoding , 'utf-8');

The non-ASCII characters get into the database from ISO-8859-1-encoded
emails. They are correctly UTF-8-encoded in outgoing emails, such as
the AutoReply sent at ticket creation. Only the web interface seems to
handle them incorrectly.

After much fiddling, I found that this patch modifying
RT::Interface::Web::EscapeUTF8() fixes the behaviour completely for me:

--- lib/RT/Interface/Web.pm	2006/06/27 10:55:43	1.1
+++ lib/RT/Interface/Web.pm	2006/06/27 10:55:52
@@ -88,7 +88,7 @@
         $val =~ s/"/&#34;/g;
         $val =~ s/'/&#39;/g;
         $$ref = $val;
-        Encode::_utf8_on($$ref);
+        Encode::_utf8_off($$ref);
 
 
 }

This doesn't feel like the right solution, however, as there's probably
a reason for the _utf8_on() call. Or is there?
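
For reference, here is a small standalone Perl sketch of what
Encode::_utf8_on() does to a string that already holds UTF-8 bytes. It
is not RT code; the helper written_bytes and the sample string are
assumptions made for illustration, and whether RT's output path behaves
the same way is not confirmed here. The call leaves the bytes alone but
makes Perl treat them as characters, and a handle without an encoding
layer then writes characters below 256 as single Latin-1 bytes.

```perl
use strict;
use warnings;
use Encode ();

my $bytes   = "\xC3\xA4";       # UTF-8 bytes for "ä", UTF-8 flag off
my $flagged = $bytes;
Encode::_utf8_on($flagged);     # same bytes, now seen as one character U+00E4

printf "length without flag: %d\n", length($bytes);     # 2
printf "length with flag:    %d\n", length($flagged);   # 1

# Hypothetical helper: print a string to an in-memory handle that has
# no encoding layer and return the bytes that were actually written.
sub written_bytes {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close $fh;
    return $buffer;
}

printf "written without flag: %s\n", unpack("H*", written_bytes($bytes));
printf "written with flag:    %s\n", unpack("H*", written_bytes($flagged));
```

If the same thing happens on RT's output handle, that would match the
symptom described above, but that is only a guess.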

It looks like the charset info in the HTTP headers comes from
'html/autohandler', so the Apache configuration isn't involved, as far
as I understand. Indeed, setting 'AddDefaultCharset' to any value in the
Apache config doesn't seem to have any effect.

Can anybody tell me what I'm doing wrong, please? I haven't found anything
in the wiki or the mailing list archives, which is a bit surprising
because I'd expect this to hit other people too.

Thanks,
-- 
Niko Tyni		ntyni at iki.fi
