[rt-devel] Re: Solution to corrupt attachments problem with RT3 and perl

Jesse Vincent jesse at bestpractical.com
Wed Jan 7 13:18:00 EST 2004


Nicholas has tracked the intermittent bug that causes attachment
corruption for some users to a bug in perl's "join" method.  There is a
potential fix that doesn't involve directly modifying perl's source code,
but we don't have that available just yet.


On Mon, Jan 05, 2004 at 10:24:27PM -0800, Nicholas Adrian Vinen wrote:
> 
>    Hello,
>       I am a consultant for a company which uses RT for their internal support. They asked me to fix a problem they were having where
> attaching binary files to a ticket caused the file to become corrupt sometimes. They tracked it down to the case where the mod_perl
> session which serves the request to add the attachment to the ticket has previously been used to perform some ticket-related operation. I
> finally tracked down this problem to a bug in perl. Here is a detailed description of the problem:
> 
>       When you attach a file to a ticket using RT it saves the file you attach into a file into /tmp. It then adds a MIME::Body::File
> record to the MIME::Entity which represents the ticket. Later, it calls make_singlepart() on the MIME::Entity, which converts the entity
> into a string. During this process, it calls as_string() on the MIME::Body::File. This causes the file to be read in and printed into a
> string using the IO::Scalar object. IO::Scalar's print() function calls the function join() on the data as it is read in, before that
> data is appended onto the destination string.
> 
>       The problem occurs inside join(). join() recycles string objects into which it does the joining, which it later returns. It never
> touches the UTF8 flag on these strings. So, on the initial run, it has no strings to recycle (or few), and when they are created they are
> set to ASCII. So all the results of join() are ASCII, which is what MIME and RT wants, as ASCII is also what is used for processing
> binary data. The problem is, on the second and subsequent executions of RT within the perl system, the recycled strings often have the
> UTF8 flag set. So, join ('', $string), where $string is ASCII, will often return a UTF8 string. When this UTF8 string is later converted
> into ASCII it is modified, and so the binary data is corrupted.
> 
>       The solution is to apply the following patch to perl (tested with perl 5.8.2), which sets the UTF8 flag on the returned string to
> something sensible.
> 
> diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
> --- perl-5.8.2/doop.c   2003-09-30 10:09:51.000000000 -0700
> +++ perl-5.8.2-patched/doop.c   2004-01-05 23:23:13.000000000 -0800
> @@ -647,6 +647,9 @@
>      register STRLEN len;
>      STRLEN delimlen;
>      STRLEN tmplen;
> +    int utf8;
> +
> +    utf8 = (SvUTF8(del)!=0);
>  
>      (void) SvPV(del, delimlen); /* stringify and get the delimlen */
>      /* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
> @@ -674,22 +677,37 @@
>         SvTAINTED_off(sv);
>  
>      if (items-- > 0) {
> -       if (*mark)
> +       if (*mark) {
> +           utf8 += (SvUTF8(*mark)!=0);
>             sv_catsv(sv, *mark);
> +       }
>         mark++;
>      }
>  
>      if (delimlen) {
>         for (; items > 0; items--,mark++) {
>             sv_catsv(sv,del);
> +           utf8 += (SvUTF8(*mark)!=0);
>             sv_catsv(sv,*mark);
>         }
>      }
>      else {
> -       for (; items > 0; items--,mark++)
> +       for (; items > 0; items--,mark++) {
> +           utf8 += (SvUTF8(*mark)!=0);
>             sv_catsv(sv,*mark);
> +       }
>      }
>      SvSETMAGIC(sv);
> +    if( utf8 )
> +    {
> +        if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
> +       {
> +           Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and ASCII strings");
> +       }
> +        SvUTF8_on(sv);
> +    } else {
> +        SvUTF8_off(sv);
> +    }
>  }
>  
>  void
> 
>       There may be other perl functions with similar problems; this is beyond the scope of my job, however I hope that the maintainers of
> perl will be proactive in attempting to find and fix any similar problems, as the way they have added UTF8 support to perl doesn't make
> it obvious when such bugs exist. I'd say that any built-in function that returns a string should be checked for (a) setting the UTF8 flag
> at all and (b) whether the value it sets it to is sensible. Also I think warnings when mixed types of strings are passed into functions
> are sensible as this can be dangerous, and as we don't know what character set the ASCII strings are in, the routines themselves can't
> really handle this case properly if any extended characters are present.
> 
>       I hope this helps.
> 
>             Nicholas
> 

-- 
http://www.bestpractical.com/rt  -- Trouble Ticketing. Free.



More information about the Rt-devel mailing list