[rt-users] RT 3.4.5: UTF-8 problems in the web interface
Niko Tyni
ntyni+rt-users at mappi.helsinki.fi
Tue Jun 27 08:47:24 EDT 2006
Hi rt-users,
I'm trying to get non-ASCII (mostly Latin-1) characters to work with RT
3.4.5, and I have problems with UTF-8 encoding in the web interface. It
looks like the characters are sent in ISO-8859-1 encoding, while the
HTTP headers declare UTF-8.
I'm using PostgreSQL as the database, and its encoding is set to 'UNICODE'
(or 'UTF8', as it's called in PostgreSQL 8.1) by rt-setup-database. When
I look at the database contents with the 'psql' command-line tool,
they look UTF8-encoded, as expected. However, in the web interface the
non-ASCII characters don't show properly. A dump with 'curl' shows that
while the HTTP headers claim that the encoding is utf-8, the characters
are actually in ISO-8859-1.
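The check on the dumped body can be sketched in Perl roughly as follows. This is a minimal illustration, not RT code: looks_like_utf8() is a hypothetical helper name, and the two sample byte strings are made up for the example rather than taken from the actual response.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $bytes is well-formed UTF-8, false otherwise.
sub looks_like_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops
    # decode() from modifying $bytes in place.
    return eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    } ? 1 : 0;
}

# Made-up sample data: the same word as ISO-8859-1 bytes and as UTF-8 bytes.
my $latin1_body = "p\xE4iv\xE4\xE4";
my $utf8_body   = "p\xC3\xA4iv\xC3\xA4\xC3\xA4";

printf "latin1 sample valid UTF-8? %s\n", looks_like_utf8($latin1_body) ? 'yes' : 'no';
printf "utf8 sample valid UTF-8?   %s\n", looks_like_utf8($utf8_body)   ? 'yes' : 'no';
```

A body that fails this check while the headers say utf-8 is the mismatch described above.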
This is RT 3.4.5, Perl 5.8.8 and PostgreSQL 8.1.4, on Debian. I can also
reproduce it with MySQL 5.0.22 and PostgreSQL 7.4.7, and with Perl 5.8.4.
The encoding settings are untouched defaults; from RT_Config.pm:
@LexiconLanguages = qw(*) unless (@LexiconLanguages);
@EmailInputEncodings = qw(utf-8 iso-8859-1 us-ascii) unless (@EmailInputEncodings);
Set($EmailOutputEncoding , 'utf-8');
The non-ASCII characters get into the database from iso-8859-1-encoded
emails. They are correctly utf-8-encoded in outgoing emails, like in
an AutoReply at creation time. Only the web interface seems to work
incorrectly.
After much fiddling, I found that this patch modifying
RT::Interface::Web::EscapeUTF8() fixes the behaviour completely for me:
--- lib/RT/Interface/Web.pm 2006/06/27 10:55:43 1.1
+++ lib/RT/Interface/Web.pm 2006/06/27 10:55:52
@@ -88,7 +88,7 @@
$val =~ s/"/&#34;/g;
$val =~ s/'/&#39;/g;
$$ref = $val;
- Encode::_utf8_on($$ref);
+ Encode::_utf8_off($$ref);
}
This doesn't feel like the right solution, however, as there's probably
a reason for the _utf8_on() call. Or is there?
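For reference, here is a standalone sketch of what the two flag calls do to a string, and of how Perl prints a flagged string to a handle that has no encoding layer. It is not RT code; written_bytes() is a hypothetical helper, and the in-memory filehandle only stands in for whatever handle RT really writes to, which is an assumption.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: the bytes Perl emits when printing $str to a
# handle with no encoding layer (an in-memory file stands in for it).
sub written_bytes {
    my ($str) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";
    print {$fh} $str;
    close $fh;
    return $buf;
}

sub hex_dump { return join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# One character, U+00E4, as a Perl character string (UTF8 flag on).
my $chars = Encode::decode('iso-8859-1', "\xE4");
utf8::upgrade($chars);    # make sure the internal representation is UTF-8

# _utf8_off() only clears the flag, so the internal bytes C3 A4 are now
# seen as two octets: the string has become UTF-8 encoded bytes.
my $off = $chars;
Encode::_utf8_off($off);

# _utf8_on() on those octets turns them back into one character.
my $on = $off;
Encode::_utf8_on($on);

printf "flag on:  length %d, printed as %s\n", length($on),  hex_dump(written_bytes($on));
printf "flag off: length %d, printed as %s\n", length($off), hex_dump(written_bytes($off));
```

With the flag on and all characters below 0x100, Perl downgrades the string on output and writes the single Latin-1 byte E4; with the flag off it writes C3 A4. If RT's output handle really has no UTF-8 layer, that would match what I see, but I have not verified this.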
It looks like the charset info in the HTTP headers comes from
'html/autohandler', so the Apache configuration is not involved, as far
as I understand. Indeed, setting 'AddDefaultCharset' to any value in the
Apache config doesn't seem to have any effect.
Can anybody tell me what I'm doing wrong, please? I haven't found anything
in the wiki or the mailing list archives, which is a bit surprising
because I'd expect this to hit other people too.
Thanks,
--
Niko Tyni ntyni at iki.fi