[rt-users] Re: UTF-8 problems
Palle Girgensohn
girgen at pingpong.net
Wed Jan 14 18:16:47 EST 2004
Hi,
I can also reproduce this, but not every time. On Jan 7 there was a mail from
Jesse on the list about a bug in perl. Could this have anything to do
with it? Check it out here:
--On Wednesday, January 07, 2004 13.18.00 -0500 Jesse Vincent
<jesse at bestpractical.com> wrote:
> Nicholas has tracked the intermittent bug that causes attachment
> corruption for some users to a bug in perl's "join" function. There is a
> potential fix that doesn't involve directly modifying perl's source code,
> but we don't have that available just yet.
>
>
> On Mon, Jan 05, 2004 at 10:24:27PM -0800, Nicholas Adrian Vinen wrote:
>>
>> Hello,
>> I am a consultant for a company which uses RT for their internal
>> support. They asked me to fix a problem they were having where
>> attaching binary files to a ticket sometimes caused the file to become
>> corrupt. They tracked it down to the case where the mod_perl session
>> which serves the request to add the attachment to the ticket has
>> previously been used to perform some ticket-related operation. I finally
>> tracked this problem down to a bug in perl. Here is a detailed
>> description of the problem:
>>
>> When you attach a file to a ticket, RT saves the attached file into
>> /tmp. It then adds a MIME::Body::File record to the MIME::Entity
>> which represents the ticket. Later, it calls make_singlepart() on the
>> MIME::Entity, which converts the entity into a string. During this
>> process, it calls as_string() on the MIME::Body::File. This causes the
>> file to be read in and printed into a string using an IO::Scalar
>> object. IO::Scalar's print() method calls join() on the data as it is
>> read in, before that data is appended onto the destination string.
>>
>> The problem occurs inside join(). join() reuses the string objects it
>> joins into and later returns, and it never touches the UTF8 flag on
>> these strings. On the initial run it has no (or few) strings to reuse,
>> and newly created strings are set to ASCII. So all the results of
>> join() are ASCII, which is what MIME and RT want, as ASCII is also what
>> is used for processing binary data. The problem is that on the second
>> and subsequent executions of RT within the same perl interpreter, the
>> reused strings often have the UTF8 flag set. So join('', $string),
>> where $string is ASCII, will often return a UTF8 string. When this
>> UTF8 string is later converted into ASCII it is modified, and so the
>> binary data is corrupted.
>>
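>> The solution is to apply the following patch to perl (tested with
>> perl 5.8.2). It sets the UTF8 flag on the returned string if any of the
>> joined strings (or the delimiter) is UTF8, clears it otherwise, and
>> warns when UTF8 and ASCII strings are mixed.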
>>
>> diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
>> --- perl-5.8.2/doop.c 2003-09-30 10:09:51.000000000 -0700
>> +++ perl-5.8.2-patched/doop.c 2004-01-05 23:23:13.000000000 -0800
>> @@ -647,6 +647,9 @@
>> register STRLEN len;
>> STRLEN delimlen;
>> STRLEN tmplen;
>> + int utf8;
>> +
>> + utf8 = (SvUTF8(del)!=0);
>>
>> (void) SvPV(del, delimlen); /* stringify and get the delimlen */
>> /* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
>> @@ -674,22 +677,37 @@
>> SvTAINTED_off(sv);
>>
>> if (items-- > 0) {
>> - if (*mark)
>> + if (*mark) {
>> + utf8 += (SvUTF8(*mark)!=0);
>> sv_catsv(sv, *mark);
>> + }
>> mark++;
>> }
>>
>> if (delimlen) {
>> for (; items > 0; items--,mark++) {
>> sv_catsv(sv,del);
>> + utf8 += (SvUTF8(*mark)!=0);
>> sv_catsv(sv,*mark);
>> }
>> }
>> else {
>> - for (; items > 0; items--,mark++)
>> + for (; items > 0; items--,mark++) {
>> + utf8 += (SvUTF8(*mark)!=0);
>> sv_catsv(sv,*mark);
>> + }
>> }
>> SvSETMAGIC(sv);
>> + if( utf8 )
>> + {
>> + if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
>> + {
>> + Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and ASCII strings");
>> + }
>> + SvUTF8_on(sv);
>> + } else {
>> + SvUTF8_off(sv);
>> + }
>> }
>>
>> void
>>
>> There may be other perl functions with similar problems; finding them
>> is beyond the scope of my job. However, I hope that the maintainers of
>> perl will be proactive in attempting to find and fix any similar
>> problems, as the way UTF8 support has been added to perl doesn't make
>> it obvious when such bugs exist. I'd say that any built-in function
>> that returns a string should be checked for (a) whether it sets the
>> UTF8 flag at all and (b) whether the value it sets is sensible. I also
>> think warnings are sensible when strings of mixed types are passed into
>> a function, as this can be dangerous: since we don't know what
>> character set the ASCII strings are in, the routines themselves can't
>> really handle this case properly if any extended characters are present.
>>
>> I hope this helps.
>>
>> Nicholas
>>
>
> --
> http://www.bestpractical.com/rt -- Trouble Ticketing. Free.
> _______________________________________________
> rt-devel mailing list
> rt-devel at lists.bestpractical.com
> http://lists.bestpractical.com/mailman/listinfo/rt-devel
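The recycled-string problem described above can be sketched as a small C model. This is an illustration under stated assumptions, not perl's implementation: `model_sv`, `put_byte`, `model_upgrade`, `model_catsv`, `join_unpatched` and `join_fixed` are hypothetical names, and the rule that appending a byte string to a UTF-8 string re-encodes each byte >= 0x80 as two UTF-8 bytes is an assumption modelled on perl's `sv_catsv`. `join_fixed` clears the stale flag before appending, so the result is UTF-8 only when some input was; it applies the same flag rule as the quoted patch but is not that patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MODEL_CAP 64

/* Hypothetical stand-in for a perl scalar: bytes plus a UTF-8 flag.
 * Not perl's real SV layout. */
typedef struct {
    unsigned char buf[MODEL_CAP];
    size_t len;
    int utf8; /* 1: buf holds UTF-8 encoded characters; 0: raw bytes */
} model_sv;

/* Write one byte; if encode is set and the byte is >= 0x80, treat it as a
 * Latin-1 character and write its two-byte UTF-8 encoding instead. */
static void put_byte(unsigned char *out, size_t *n, unsigned char c, int encode) {
    assert(*n + 2 <= MODEL_CAP); /* keep the toy buffer in bounds */
    if (encode && c >= 0x80) {
        out[(*n)++] = (unsigned char)(0xC0 | (c >> 6));
        out[(*n)++] = (unsigned char)(0x80 | (c & 0x3F));
    } else {
        out[(*n)++] = c;
    }
}

/* Re-encode a byte string as UTF-8 and set its flag. */
static void model_upgrade(model_sv *sv) {
    unsigned char tmp[MODEL_CAP];
    size_t n = 0;
    for (size_t i = 0; i < sv->len; i++)
        put_byte(tmp, &n, sv->buf[i], 1);
    memcpy(sv->buf, tmp, n);
    sv->len = n;
    sv->utf8 = 1;
}

/* Append src to dst. Assumed rule (modelled on perl's sv_catsv): when
 * exactly one side is UTF-8, the byte-string side is upgraded first. */
static void model_catsv(model_sv *dst, const model_sv *src) {
    if (src->utf8 && !dst->utf8)
        model_upgrade(dst);
    for (size_t i = 0; i < src->len; i++)
        put_byte(dst->buf, &dst->len, src->buf[i], dst->utf8 && !src->utf8);
}

/* join('', ...) as described in the post: the reused target is emptied,
 * but its UTF-8 flag from an earlier request is left as it was. */
static void join_unpatched(model_sv *target, const model_sv *items, size_t n) {
    target->len = 0; /* target->utf8 deliberately not reset */
    for (size_t i = 0; i < n; i++)
        model_catsv(target, &items[i]);
}

/* Fixed variant: clear the stale flag first, so the result ends up UTF-8
 * only if some input was, and warn when UTF-8 and byte inputs are mixed
 * (same warning text as the quoted patch). */
static void join_fixed(model_sv *target, const model_sv *items, size_t n) {
    size_t utf8_inputs = 0;
    target->len = 0;
    target->utf8 = 0;
    for (size_t i = 0; i < n; i++) {
        utf8_inputs += (items[i].utf8 != 0);
        model_catsv(target, &items[i]);
    }
    if (utf8_inputs != 0 && utf8_inputs != n)
        fprintf(stderr, "Joining UTF8 and ASCII strings\n");
}
```

With a target whose flag is still set from an earlier request, `join_unpatched` turns the PNG signature byte 0x89 into 0xC2 0x89, so four bytes come back as five; `join_fixed` returns the same four bytes with the flag cleared.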
--On Wednesday, January 14, 2004 18.01.41 +0100 Dirk Pape
<pape-rt at inf.fu-berlin.de> wrote:
> Hello,
>
> --On Wednesday, January 14, 2004 15:51 +0100 Ondřej Surý
> <sury.ondrej at globe.cz> wrote:
>
>> We have the same problems here. Installing a hacked IO::Stringy fixed
>> the corrupted-attachment problem, but the double-encoding problem
>> still persists.
>
> Same here (even after upgrading from perl 5.8.0 to 5.8.2).
>
> Dirk.
> _______________________________________________
> rt-users mailing list
> rt-users at lists.bestpractical.com
> http://lists.bestpractical.com/mailman/listinfo/rt-users