[rt-users] Re: UTF-8 problems

Palle Girgensohn girgen at pingpong.net
Wed Jan 14 18:16:47 EST 2004


Hi,

I can also reproduce this, but not every time. Jan 7, there was a mail from 
Jesse on the list about a problem in perl. Can this have anything to do 
with it? Check it out here:

--On onsdag, januari 07, 2004 13.18.00 -0500 Jesse Vincent 
<jesse at bestpractical.com> wrote:

> Nicholas has tracked the intermittent bug that causes attachment
> corruption for some users to a bug in perl's "join" method.  There is a
> potential fix that doesn't involve directly modifying perl's source code,
> but we don't have that available just yet.
>
>
> On Mon, Jan 05, 2004 at 10:24:27PM -0800, Nicholas Adrian Vinen wrote:
>>
>>    Hello,
>>       I am a consultant for a company which uses RT for their internal
>>       support. They asked me to fix a problem they were having where
>> attaching binary files to a ticket caused the file to become corrupt
>> sometimes. They tracked it down to the case where the mod_perl session
>> which serves the request to add the attachment to the ticket has
>> previously been used to perform some ticket-related operation. I finally
>> tracked down this problem to a bug in perl. Here is a detailed
>> description of the problem:
>>
>>       When you attach a file to a ticket using RT it saves the file you
>>       attach into a file into /tmp. It then adds a MIME::Body::File
>> record to the MIME::Entity which represents the ticket. Later, it calls
>> make_singlepart() on the MIME::Entity, which converts the entity into a
>> string. During this process, it calls as_string() on the
>> MIME::Body::File. This causes the file to be read in and printed into a
>> string using the IO::Scalar object. IO::Scalar's print() function calls
>> the function join() on the data as it is read in, before that data is
>> appended onto the destination string.
>>
>>       The problem occurs inside join(). join() recycles string objects
>>       into which it does the joining, which it later returns. It never
>> touches the UTF8 flag on these strings. So, on the initial run, it has
>> no strings to recycle (or few), and when they are created they are set
>> to ASCII. So all the results of join() are ASCII, which is what MIME and
>> RT wants, as ASCII is also what is used for processing binary data. The
>> problem is, on the second and subsequent executions of RT within the
>> perl system, the recycled strings often have the UTF8 flag set. So, join
>> ('', $string), where $string is ASCII, will often return a UTF8 string.
>> When this UTF8 string is later converted into ASCII it is modified, and
>> so the binary data is corrupted.
>>
>>       The solution is to apply the following patch to perl (tested with
>>       perl 5.8.2), which sets the UTF8 flag on the returned string to
>> something sensible.
>>
>> diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
>> --- perl-5.8.2/doop.c   2003-09-30 10:09:51.000000000 -0700
>> +++ perl-5.8.2-patched/doop.c   2004-01-05 23:23:13.000000000 -0800
>> @@ -647,6 +647,9 @@
>>      register STRLEN len;
>>      STRLEN delimlen;
>>      STRLEN tmplen;
>> +    int utf8;
>> +
>> +    utf8 = (SvUTF8(del)!=0);
>>
>>      (void) SvPV(del, delimlen); /* stringify and get the delimlen */
>>      /* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
>> @@ -674,22 +677,37 @@
>>         SvTAINTED_off(sv);
>>
>>      if (items-- > 0) {
>> -       if (*mark)
>> +       if (*mark) {
>> +           utf8 += (SvUTF8(*mark)!=0);
>>             sv_catsv(sv, *mark);
>> +       }
>>         mark++;
>>      }
>>
>>      if (delimlen) {
>>         for (; items > 0; items--,mark++) {
>>             sv_catsv(sv,del);
>> +           utf8 += (SvUTF8(*mark)!=0);
>>             sv_catsv(sv,*mark);
>>         }
>>      }
>>      else {
>> -       for (; items > 0; items--,mark++)
>> +       for (; items > 0; items--,mark++) {
>> +           utf8 += (SvUTF8(*mark)!=0);
>>             sv_catsv(sv,*mark);
>> +       }
>>      }
>>      SvSETMAGIC(sv);
>> +    if( utf8 )
>> +    {
>> +        if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
>> +       {
>> +           Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and
>> ASCII strings"); +       }
>> +        SvUTF8_on(sv);
>> +    } else {
>> +        SvUTF8_off(sv);
>> +    }
>>  }
>>
>>  void
>>
>>       There may be other perl functions with similar problems; this is
>>       beyond the scope of my job, however I hope that the maintainers of
>> perl will be proactive in attempting to find and fix any similar
>> problems, as the way they have added UTF8 support to perl doesn't make
>> it obvious when such bugs exist. I'd say that any built-in function that
>> returns a string should be checked for (a) setting the UTF8 flag at all
>> and (b) whether the value it sets it to is sensible. Also I think
>> warnings when mixed types of strings are passed into functions are
>> sensible as this can be dangerous, and as we don't know what character
>> set the ASCII strings are in, the routines themselves can't really
>> handle this case properly if any extended characters are present.
>>
>>       I hope this helps.
>>
>>             Nicholas
>>
>
> --
> http://www.bestpractical.com/rt  -- Trouble Ticketing. Free.
> _______________________________________________
> rt-devel mailing list
> rt-devel at lists.bestpractical.com
> http://lists.bestpractical.com/mailman/listinfo/rt-devel



--On onsdag, januari 14, 2004 18.01.41 +0100 Dirk Pape 
<pape-rt at inf.fu-berlin.de> wrote:

> Hello,
>
> --Am Mittwoch, 14. Januar 2004 15:51 Uhr +0100 schrieb Ond?ej Sur?
> <sury.ondrej at globe.cz>:
>
>> We have same problems here.  Installing hacked IO::Stringy fixed
>> corrupted attachments problem, but double encoding problem still
>> persist.
>
> same here (even after I upgraded from perl 5.8.0 to 5.8.2)
>
> Dirk.
> _______________________________________________
> rt-users mailing list
> rt-users at lists.bestpractical.com
> http://lists.bestpractical.com/mailman/listinfo/rt-users
>
> Have you read the FAQ? The RT FAQ Manager lives at http://fsck.com/rtfm







More information about the rt-users mailing list