[rt-users] Re: [rt-devel] Patch for RT 3.0.3 attachment conversion problem (2)
Autrijus Tang
autrijus at autrijus.org
Thu Jun 26 12:27:31 EDT 2003
On Thu, Jun 26, 2003 at 08:09:10PM +0900, Dan Kogai wrote:
> But one thing you should be careful about is that a guessed encoding
> is, after all, just a guess. You should not rely on it too much. If
> you have an alternate way to determine the encoding explicitly, use
> that instead.
Advice very well taken. Since these are MIME entities we're talking
about, RT will use every available hint (the Content-Type charset
parameter, etc.) before falling back to Encode::Guess.
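A minimal sketch of that "hints first, guess last" order, in Perl. The helper name decode_with_hints and the ISO-8859-1 last resort are illustrative assumptions, not RT's actual code; only find_encoding (from Encode) and guess_encoding (from Encode::Guess) are real library calls.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Encode::Guess;    # exports guess_encoding(); default suspects: ascii, utf8

# Hypothetical helper (not RT's actual code): decode a MIME body, trusting an
# explicit charset hint before falling back to Encode::Guess.
sub decode_with_hints {
    my ($octets, $charset, @suspects) = @_;

    # 1. A charset hint that Encode recognises always wins.
    my $enc = defined $charset ? find_encoding($charset) : undef;
    return $enc->decode($octets) if $enc;

    # 2. No usable hint: ask Encode::Guess, adding any extra suspects.
    my $guess = guess_encoding($octets, @suspects);
    return $guess->decode($octets) if ref $guess;

    # 3. Guessing failed ($guess is an error string). As an assumed last
    #    resort, decode as ISO-8859-1, which maps every byte to a character.
    return find_encoding('iso-8859-1')->decode($octets);
}

# Usage: a body whose Content-Type declared charset=UTF-8.
my $text = decode_with_hints("caf\xc3\xa9", 'UTF-8');
printf "%d characters\n", length $text;
```

Putting the explicit hint first means a wrong guess can only happen when the message itself gave no usable charset.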
> >Cc'ing Kogai-san to try finding a solution. Kogai-san, can we
> >somehow disable this helpful guessing of "\x00", via a
> >$Encode::Guess::NoUTF32Guessing control variable or something?
>
> That's possible, though the name should be something like NoUTF1632
> (horrible but more accurate), because it guesses not only UTF-32
> (which is hardly ever used at present) but also UTF-16.
I'd say $NoUTFAutoGuess is the right name, since setting it should
disable all unrequested guessing of this kind.
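To see the heuristics under discussion, the short script below (a sketch; the helper guessed_name is hypothetical) feeds two byte strings to guess_encoding with only the default suspects. Bytes containing \x00 are assumed to be UTF-16 or UTF-32, with the byte order inferred from where the zeros fall, and a leading FE FF pair is taken as a UTF-16 byte-order mark.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding(); default suspects: ascii, utf8

# Hypothetical helper: return the name of the guessed encoding, or the
# error message when guess_encoding() cannot decide.
sub guessed_name {
    my ($octets) = @_;
    my $enc = guess_encoding($octets);
    # On success guess_encoding() returns an encoding object; on failure
    # it returns a plain string describing why.
    return ref $enc ? $enc->name : $enc;
}

# Not intended as UTF-16 at all, but the \x00 bytes trigger the
# UTF-16/32 heuristic, which infers the byte order from the zero positions.
print guessed_name("a\x00b\x00"), "\n";

# A leading FE FF pair is treated as a UTF-16 byte-order mark.
print guessed_name("\xFE\xFF\x00a"), "\n";
```

Neither input matches the suspects the caller asked for; the answer comes entirely from the built-in UTF heuristics.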
Code and POD patch as below, against 1.08. :-)
Thanks,
/Autrijus/
--- Guess.pm.orig Fri Jun 27 00:17:48 2003
+++ Guess.pm Fri Jun 27 00:25:33 2003
@@ -18,6 +18,7 @@
sub perlio_ok { 0 }
our @EXPORT = qw(guess_encoding);
+our $NoUTFAutoGuess = 0;
sub import { # Exporter not used so we do it on our own
my $callpkg = caller;
@@ -70,22 +71,27 @@
return unless defined $octet and length $octet;
# cheat 0: utf8 flag;
- Encode::is_utf8($octet) and return find_encoding('utf8');
+ if ( Encode::is_utf8($octet) ) {
+ return find_encoding('utf8') if !$NoUTFAutoGuess;
+ Encode::_utf8_off($octet);
+ }
# cheat 1: BOM
use Encode::Unicode;
- my $BOM = unpack('n', $octet);
- return find_encoding('UTF-16')
- if (defined $BOM and ($BOM == 0xFeFF or $BOM == 0xFFFe));
- $BOM = unpack('N', $octet);
- return find_encoding('UTF-32')
- if (defined $BOM and ($BOM == 0xFeFF or $BOM == 0xFFFe0000));
+ if (!$NoUTFAutoGuess) {
+ my $BOM = unpack('n', $octet);
+ return find_encoding('UTF-16')
+ if (defined $BOM and ($BOM == 0xFeFF or $BOM == 0xFFFe));
+ $BOM = unpack('N', $octet);
+ return find_encoding('UTF-32')
+ if (defined $BOM and ($BOM == 0xFeFF or $BOM == 0xFFFe0000));
+ }
my %try = %{$obj->{Suspects}};
for my $c (@_){
my $e = find_encoding($c) or die "Unknown encoding: $c";
$try{$e->name} = $e;
$DEBUG and warn "Added: ", $e->name;
}
- if ($octet =~ /\x00/o){ # if \x00 found, we assume UTF-(16|32)(BE|LE)
+ if (!$NoUTFAutoGuess and $octet =~ /\x00/o){ # if \x00 found, we assume UTF-(16|32)(BE|LE)
my $utf;
my ($be, $le) = (0, 0);
if ($octet =~ /\x00\x00/o){ # UTF-32(BE|LE) assumed
@@ -188,6 +194,10 @@
# tries all major Japanese Encodings as well
use Encode::Guess qw/euc-jp shiftjis 7bit-jis/;
+
+If the C<$Encode::Guess::NoUTFAutoGuess> variable is set to a true
+value, no heuristics will be applied to UTF8/16/32, and the result
+will be limited to the suspects and C<ascii>.
=over 4
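Assuming an Encode::Guess that includes this patch (or a later release that honours $Encode::Guess::NoUTFAutoGuess), a caller could switch the UTF heuristics off for a single call with local. The helper name below is hypothetical.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper: guess with the UTF-8/16/32 heuristics disabled for
# this call only. `local` restores the previous value when the sub returns.
# Assumes an Encode::Guess that honours $Encode::Guess::NoUTFAutoGuess.
sub guess_without_utf_heuristics {
    my ($octets, @suspects) = @_;
    local $Encode::Guess::NoUTFAutoGuess = 1;
    my $enc = guess_encoding($octets, @suspects);
    # undef means guess_encoding() returned an error string instead of
    # an encoding object.
    return ref $enc ? $enc->name : undef;
}

my $name = guess_without_utf_heuristics("a\x00b\x00");
print defined $name ? $name : 'no guess', "\n";
```

Using local rather than assigning the global keeps the change from leaking into unrelated callers in the same process; per the POD above, the result is then limited to the suspects and ascii.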
More information about the Rt-devel mailing list