[rt-users] Problems with rt 3.8.7/postgresql 8.4.2 encoding

Ernesto Hernández-Novich emhnemhn at gmail.com
Mon Jan 11 11:25:23 EST 2010


On Mon, 2010-01-11 at 08:07 -0430, Eliezer E Chávez wrote:
> Sorry for the misunderstanding, but I'm a support consultant too, so
> I dislike others selling to me... :-)
> 
> OK, as a clarification, and in Spanish:

I believe this list asks that messages be written in English.

> Creé una plantilla de autorespuesta en español, pero cuando intento
> crear un nuevo ticket y RT intenta guardar el mensaje en la base de
> datos se queja de los caracteres latinos (á, é, ñ, etc...)

Translation and editing by me, for readers who don't read Spanish:

"I created an autoresponse template in Spanish. When I try to create a
new ticket and RT tries to store the message in the database, it
complains about the Latin characters (a acute, e acute, n tilde, etc.).

How do I fix that? Shall I define the database as ISO-8859-1 (LATIN1)?
How do I get RT to tell PostgreSQL the encoding?

Regards and apologies."

Looking at the couple of traces you've sent, you have a typical
re-encoding problem, as others have hinted. The offending sequence
0xc361 starts with byte 0xC3 (Ã in ISO-8859-1), which is the lead byte
of the two-byte UTF-8 sequences for most accented Latin letters ('á'
itself is 0xC3 0xA1), so it's easy to guess that 0xC361 was meant to be
an 'á' (a acute). PostgreSQL rejects it because 0x61 ('a') is not a
valid UTF-8 continuation byte. Therefore, it's clear to me that
something went from proper UTF-8 into ISO-8859-X but was then
incorrectly interpreted back as UTF-8. And by "incorrectly" I don't mean
the software made a mistake, but that it's improperly configured
(encoding detection isn't automatic, and it's hard even for the most
alert humans).
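A quick way to check the byte-level reasoning is to decode the bytes
from the trace yourself. This is a minimal sketch in Python;
first_invalid_sequence is a hypothetical helper that only loosely
mimics the check PostgreSQL performs on a UTF8 database (it is not Pg's
code), and the byte string b"Ma\xc3a" is made up for illustration.

```python
def first_invalid_sequence(data: bytes):
    """Return the hex of the first two bytes where UTF-8 decoding fails, or None.

    Hypothetical helper: it loosely mimics PostgreSQL's
    'invalid byte sequence for encoding "UTF8"' check; it is not Pg's code.
    """
    try:
        data.decode("utf-8")  # strict decoding, like a UTF8 database
        return None
    except UnicodeDecodeError as err:
        # err.start is the offset of the offending lead byte
        return data[err.start:err.start + 2].hex()


# 'á' (U+00E1) in UTF-8 is the two bytes 0xC3 0xA1.
print("á".encode("utf-8").hex())           # c3a1
# 0xC3 must be followed by a continuation byte (0x80-0xBF); 0x61 ('a') is not one.
print(first_invalid_sequence(b"Ma\xc3a"))  # c361
# Read as ISO-8859-1, the same two bytes are just two ordinary characters.
print(b"\xc3\x61".decode("latin-1"))       # Ãa
```

The same bytes are perfectly legal ISO-8859-1 and illegal UTF-8, which
is why the encoding declared at each hop matters.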

You should verify that you're working with UTF-8 end-to-end. This means
checking that Apache2 is serving and accepting UTF-8, and that
PostgreSQL uses UTF-8 as the database encoding. It also means that the
data YOU input is in UTF-8, meaning your browser is sanely configured
and the operating system it runs on can handle UTF-8.
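For the Apache part, one thing you can check from a script is the
charset Apache declares in the Content-Type header of RT's pages. This
is a minimal Python sketch using only the standard library;
charset_from_content_type and served_charset are hypothetical helper
names, and the RT URL in the comment is a placeholder. On the
PostgreSQL side, running "SHOW server_encoding;" and
"SHOW client_encoding;" in psql reports the two relevant settings.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(content_type: str):
    """Return the charset parameter of a Content-Type value (lowercased), or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def served_charset(url: str):
    """Fetch url and return the charset the server declares, or None if it declares none."""
    with urlopen(url) as resp:
        # resp.headers is an http.client.HTTPMessage, a subclass of email.message.Message
        return resp.headers.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))       # utf-8
print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # None
# Hypothetical URL for your RT instance:
# print(served_charset("http://rt.example.com/"))
```

If the answer is iso-8859-1 even though RT emits UTF-8, a forced default
in the Apache configuration (for example an AddDefaultCharset directive)
is a likely suspect.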

I'm guessing you wrote the template using a browser that was working in
UTF-8, but Apache was expecting ISO-8859, either because the browser
said it was going to send ISO-8859 or because Apache has a (wrongly)
forced default charset. That caused the properly formed 'á' (one char,
two bytes in UTF-8) coming from the browser to be transformed by Apache
into 'Ãa' (two chars, two bytes in ISO-8859), and when that was fed to
PostgreSQL it was rejected because it's not valid UTF-8.
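The general failure mode, bytes written under one encoding and read
under another, can be reproduced in a few lines. This Python sketch
does not model Apache or RT; strict_utf8_store is a hypothetical
stand-in for a database that, like PostgreSQL with UTF8 encoding,
refuses bytes that are not valid UTF-8.

```python
def strict_utf8_store(data: bytes) -> str:
    """Hypothetical stand-in for a UTF8 database: reject anything that is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        bad = data[err.start:err.end].hex()
        raise ValueError(f"invalid byte sequence for UTF-8: {bad}") from err


text = "á"

# 1. UTF-8 end-to-end: the store accepts the bytes and the text survives intact.
print(strict_utf8_store(text.encode("utf-8")))  # á

# 2. A layer sends ISO-8859-1 bytes (0xE1) but they are stored as UTF-8: rejected.
try:
    strict_utf8_store(text.encode("latin-1"))
except ValueError as e:
    print(e)

# 3. UTF-8 bytes read as ISO-8859-1, then re-encoded as UTF-8: valid bytes, wrong text.
mojibake = text.encode("utf-8").decode("latin-1").encode("utf-8")
print(strict_utf8_store(mojibake))  # Ã¡
```

Case 3 is the quiet one: nothing fails, but the stored text is already
wrong. A store that accepts anything without checking hides case 2 in
the same way.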

BTW, you mentioned that Oracle did not complain. It doesn't complain
because it's dangerously permissive: it just gobbles whatever you give
it without checking. Been there, done that; it's very, very sad.

So, don't change Pg to ISO-8859-1. Make sure the browser, the OS it's
running on and Apache are working in UTF-8 all the way.
-- 
Ernesto Hernández-Novich - Linux 2.6.28 i686 - Unix: Live free or die!
Geek by nature, Linux by choice, Debian of course.
If you can't aptitude it, it isn't useful or doesn't exist.
GPG Key Fingerprint = 438C 49A2 A8C7 E7D7 1500 C507 96D6 A3D6 2F4C 85E3




