[rt-users] Different charsets problem

Bruce Campbell bruce_campbell at ripe.net
Tue Feb 19 17:48:38 EST 2002


On Tue, 19 Feb 2002, Jan Okrouhly wrote:

> I've looked at this (man MIME::Head /decode). That is just another
> (maybe also important) problem with To, From, Subject etc. The current
> behavior is just fine for me. The main problem is that the charset
> information from Content-Type is not stored or used.
> Example:
> Content-Type: text/plain;
>         charset="iso-8859-2"

If you put extra stuff in Attachments.Content{Type,Encoding} at the
present time, you will cause random breakages down the track, as various
regexes on those fields will suddenly stop matching.
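
To illustrate the kind of breakage I mean, here is a minimal sketch in
Python (not RT's own Perl). The anchored pattern is hypothetical, an
assumption about what code written against the current column contents
might look like, not a regex taken from RT:

```python
import re

# Hypothetical pattern: anchored on the bare MIME type, as code written
# against the current (parameter-free) column contents might be.
PLAIN_TEXT = re.compile(r"^text/plain$", re.IGNORECASE)


def is_plain_text(content_type: str) -> bool:
    """Return True if the stored ContentType matches the anchored pattern."""
    return PLAIN_TEXT.match(content_type) is not None


bare = "text/plain"
with_charset = 'text/plain; charset="iso-8859-2"'

print(is_plain_text(bare))
print(is_plain_text(with_charset))
```

The second value stops matching as soon as the charset parameter is
stored in the same column, which is why it would need its own place.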

> I suppose the right behavior would be to re-encode all incoming plain
> text into one internal encoding (UTF-8 would be best). The
> Attachments.ContentEncoding column could be used for that, but it needs
> BIG work around it ;-( (I think).

yup.
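
For what it's worth, the re-encoding step itself is small; the work is in
everything around it. A rough sketch in Python (again, not RT's Perl),
assuming a single-part text message and a hypothetical helper name
body_as_utf8; the us-ascii fallback for a missing or unknown charset is
my assumption too:

```python
from email import message_from_bytes


def body_as_utf8(raw_message: bytes, fallback: str = "us-ascii") -> bytes:
    """Decode a single-part text body using the charset declared in
    Content-Type and return it re-encoded as UTF-8."""
    msg = message_from_bytes(raw_message)
    # Charset parameter from Content-Type, lower-cased, or None if absent.
    charset = msg.get_content_charset() or fallback
    # decode=True undoes base64 / quoted-printable transfer encoding.
    payload = msg.get_payload(decode=True)
    if payload is None:
        raise ValueError("multipart messages are not handled in this sketch")
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back rather than guess.
        text = payload.decode(fallback, errors="replace")
    return text.encode("utf-8")


raw = (
    b'Content-Type: text/plain;\n\tcharset="iso-8859-2"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"\xbe\n"  # 0xBE is LATIN SMALL LETTER Z WITH CARON in iso-8859-2
)
utf8_body = body_as_utf8(raw)
```

Multipart messages, header decoding and the existing rows in Attachments
are all left out here.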

> > planned for it (see the SQL Users.{Lang,EmailEncoding,WebEncoding}
> > columns).
>
> Yes, I know that schema, but not the details of Jesse's plans (are they somewhere on

Telepathy seems to work for some.

> the web?). In my opinion .Lang will be usable, but one user often has
> several different email and/or web encodings (in a heterogeneous/open
> environment).

..

We had a discussion today about Google's default behaviour when you don't
have a cookie saying 'I want this language'.  It ignores what your browser
supplies and uses the dominant language of the region that it thinks your
IP address is in.  (At a rough guess, they're using the country of
registration of the ASN that originates the route to that IP.)  This is
good because natives of the country are set.  This is bad because
non-natives have to look for 'Ik wil dat Google het Engels'.

With RT storing the language on a per-user basis (and eventually using
it), that's cool irrespective of the browser the user happens to be using
at the time.  The encoding sent to the user's browser should always be one
that the browser says it can handle, and the WebEncoding in the Users
table should be the preferred one of that set.  If the user uses a browser
without that encoding, fall back to us-ascii.  Email encoding is something
that RT shouldn't guess at.  If the user has said that they want encoding
foo, then foo they will get, irrespective of their (unknown to RT) email
client.
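
The web side of that rule could be sketched roughly like this, in Python
for illustration only.  The function names are hypothetical, and reading
the browser's capabilities from the Accept-Charset request header is my
assumption about where "what the browser says it can handle" comes from:

```python
from typing import Optional, Set


def parse_accept_charset(header: str) -> Set[str]:
    """Return the lower-cased charsets a browser lists with q > 0."""
    accepted = set()
    for item in header.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0  # no q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # malformed weight: treat as not acceptable
        if q > 0:
            accepted.add(name)
    return accepted


def choose_web_encoding(accept_charset: str,
                        user_web_encoding: Optional[str]) -> str:
    """Use the user's stored WebEncoding if the browser accepts it,
    otherwise fall back to us-ascii."""
    accepted = parse_accept_charset(accept_charset)
    if user_web_encoding and user_web_encoding.lower() in accepted:
        return user_web_encoding.lower()
    return "us-ascii"


print(choose_web_encoding("iso-8859-2, utf-8;q=0.7", "ISO-8859-2"))
print(choose_web_encoding("iso-8859-1", "iso-8859-2"))
```

The email side needs no such negotiation: whatever the user stored in
EmailEncoding is used as-is.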

Regards,

-- 
                             Bruce Campbell                            RIPE
                   Systems/Network Engineer                             NCC
                 www.ripe.net - PGP562C8B1B                      Operations




