[rt-devel] Re: RT 2.1.56 (wrong charset)
Stanislav Sinyagin
ssinyagin at yahoo.com
Sat Jan 4 19:52:08 EST 2003
In addition, in Stefan's message each Latin1 letter
was replaced with 4 characters, not 2. This means that
HTML::Entities had already received a 4-byte sequence for
each symbol. That points to either a double Latin1->UTF-8
conversion or, surprisingly, UTF-16 appearing somewhere.
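
To make the double-conversion case concrete, here is a minimal sketch in
Python (used only as an illustration; RT itself is Perl, and none of these
variable names come from RT). It assumes the second conversion mistakes
already-encoded UTF-8 bytes for Latin1 text and encodes them again:

```python
# Hypothetical illustration: one Latin1 letter pushed through a
# Latin1 -> UTF-8 conversion twice.
letter = "\u00e4"  # a-umlaut, U+00E4

latin1_bytes = letter.encode("latin-1")   # 1 byte:  E4
utf8_once = letter.encode("utf-8")        # 2 bytes: C3 A4

# Second conversion: the UTF-8 bytes are misread as Latin1 characters
# and converted to UTF-8 again.
utf8_twice = utf8_once.decode("latin-1").encode("utf-8")  # 4 bytes: C3 83 C2 A4

def hexdump(data):
    """Return the bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

for label, data in [("latin1", latin1_bytes),
                    ("utf8 once", utf8_once),
                    ("utf8 twice", utf8_twice)]:
    print(f"{label:10} {len(data)} byte(s): {hexdump(data)}")
```

If an escaping step then works byte by byte, those 4 bytes would come out
as 4 entities, which would match what Stefan saw.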
--- Stanislav Sinyagin <ssinyagin at yahoo.com> wrote:
> 1)
> lib/RT/I18N/de.po is encoded in Latin1.
>
> 2)
> Then it goes through lib/RT/I18N.pm and is presented as wanna-be Unicode.
> I'm not sure at this stage whether it really produces Unicode.
>
> 3)
> Then it goes through HTML::Entities (as told by default_escape_flags => 'h'),
> and all non-ASCII characters are replaced with entities:
> &Auml; for a-umlaut etc.
> At this stage, what HTML::Entities does depends on the Perl version (Stefan, what's yours?).
>
> If it's 5.6, it treats each non-ASCII byte (remember, Unicode
> symbols come as two-byte symbols?) as a non-ASCII character, and
> produces two HTML entities for each Unicode symbol.
>
> In 5.8, each non-ASCII Unicode symbol (two bytes) is
> replaced with one HTML entity. In HTML::Entities, named entities are
> defined for Latin1 symbols only. It means Cyrillic (Russian) symbols would
> be replaced with (one or two?) numeric entities.
> Some browsers will survive that (if it's still one entity),
> but it's definitely the wrong way.
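
The two behaviours described in the quoted message can be sketched as
follows. This is a Python stand-in, not the HTML::Entities implementation:
the function names are hypothetical, Python's html.entities table is used
in place of the Perl one, and only non-ASCII input is handled (the escaping
of &, < and > is left out).

```python
from html.entities import codepoint2name

def escape_units(units):
    """Replace every non-ASCII unit (byte value or code point) with an entity."""
    out = []
    for u in units:
        if u < 128:
            out.append(chr(u))                   # ASCII passes through unchanged
        elif u in codepoint2name:
            out.append(f"&{codepoint2name[u]};")  # named entity, e.g. &auml;
        else:
            out.append(f"&#{u};")                # numeric fallback
    return "".join(out)

def escape_bytewise(text):
    # Byte semantics (the 5.6-style case in the quoted message):
    # every UTF-8 byte is treated as if it were a Latin1 character.
    return escape_units(text.encode("utf-8"))

def escape_charwise(text):
    # Character semantics (the 5.8-style case): each code point is escaped once.
    return escape_units(ord(c) for c in text)

for sample in ["\u00e4", "\u0416"]:  # a-umlaut, Cyrillic Zhe
    print(repr(sample), "bytewise:", escape_bytewise(sample),
          "charwise:", escape_charwise(sample))
```

Under these assumptions, a character-wise pass gives one entity per symbol
(named for Latin1, numeric for Cyrillic), while a byte-wise pass gives two
entities for every two-byte UTF-8 symbol.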