[rt-users] Re: Solution to corrupt attachments problem with RT3 and perl
Jesse Vincent
jesse at bestpractical.com
Wed Jan 7 13:18:00 EST 2004
Nicholas has traced the intermittent bug that corrupts attachments for
some users to a bug in perl's "join" function. There is a potential fix
that doesn't involve directly modifying perl's source code, but we don't
have it available just yet.
On Mon, Jan 05, 2004 at 10:24:27PM -0800, Nicholas Adrian Vinen wrote:
>
> Hello,
> I am a consultant for a company which uses RT for their internal support. They asked me to fix a problem where
> attaching binary files to a ticket sometimes corrupted the file. They narrowed it down to the case where the mod_perl
> session which serves the request to add the attachment has previously been used to perform some ticket-related operation. I
> finally traced the problem to a bug in perl. Here is a detailed description:
>
> When you attach a file to a ticket, RT saves the uploaded data to a file in /tmp. It then adds a MIME::Body::File
> record to the MIME::Entity which represents the ticket. Later, it calls make_singlepart() on the MIME::Entity, which converts the entity
> into a string. During this process, it calls as_string() on the MIME::Body::File. This causes the file to be read in and printed into a
> string using the IO::Scalar object. IO::Scalar's print() function calls the function join() on the data as it is read in, before that
> data is appended onto the destination string.
>
> The problem occurs inside join(). join() recycles the string objects into which it does the joining and which it later returns, and it never
> touches the UTF8 flag on these strings. On the initial run it has few or no strings to recycle, and newly created strings are
> ASCII (UTF8 flag off). So all the results of join() are ASCII, which is what MIME and RT want, as ASCII is also what is used for processing
> binary data. The problem is that, on the second and subsequent executions of RT within the same perl interpreter, the recycled strings often have the
> UTF8 flag set. So join('', $string), where $string is ASCII, will often return a UTF8 string. When this UTF8 string is later converted
> into ASCII it is modified, and so the binary data is corrupted.
>
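To make the stale-flag behaviour described above concrete, here is a
minimal C model of it. This is only a sketch: ModelStr, model_str,
model_join_buggy and model_join_fixed are hypothetical names made up for
illustration, and none of this is perl's real SV code. It only shows how a
recycled result buffer keeps a flag from an earlier call when nothing
resets it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a perl scalar: some bytes plus a UTF8 flag. */
typedef struct {
    char   buf[64];
    size_t len;
    bool   utf8;
} ModelStr;

/* Build a model string; input longer than the buffer is truncated. */
ModelStr model_str(const char *bytes, size_t len, bool utf8)
{
    ModelStr s;
    if (len > sizeof s.buf)
        len = sizeof s.buf;
    memcpy(s.buf, bytes, len);
    s.len = len;
    s.utf8 = utf8;
    return s;
}

/* Buggy model: the recycled target gets new contents, but its flag is
 * only ever switched on, never cleared, so an old value survives. */
void model_join_buggy(ModelStr *target, const ModelStr *item)
{
    memcpy(target->buf, item->buf, item->len);
    target->len = item->len;
    if (item->utf8)
        target->utf8 = true;
}

/* Fixed model: the flag is recomputed from the input on every call. */
void model_join_fixed(ModelStr *target, const ModelStr *item)
{
    memcpy(target->buf, item->buf, item->len);
    target->len = item->len;
    target->utf8 = item->utf8;
}
```

In this model, joining a UTF8 string into a target and then joining plain
binary data into the same target leaves target.utf8 set with
model_join_buggy, while model_join_fixed clears it again.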
> The solution is to apply the following patch to perl (tested with perl 5.8.2), which sets the UTF8 flag on the returned string to
> something sensible.
>
> diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
> --- perl-5.8.2/doop.c 2003-09-30 10:09:51.000000000 -0700
> +++ perl-5.8.2-patched/doop.c 2004-01-05 23:23:13.000000000 -0800
> @@ -647,6 +647,9 @@
> register STRLEN len;
> STRLEN delimlen;
> STRLEN tmplen;
> + int utf8;
> +
> + utf8 = (SvUTF8(del)!=0);
>
> (void) SvPV(del, delimlen); /* stringify and get the delimlen */
> /* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
> @@ -674,22 +677,37 @@
> SvTAINTED_off(sv);
>
> if (items-- > 0) {
> - if (*mark)
> + if (*mark) {
> + utf8 += (SvUTF8(*mark)!=0);
> sv_catsv(sv, *mark);
> + }
> mark++;
> }
>
> if (delimlen) {
> for (; items > 0; items--,mark++) {
> sv_catsv(sv,del);
> + utf8 += (SvUTF8(*mark)!=0);
> sv_catsv(sv,*mark);
> }
> }
> else {
> - for (; items > 0; items--,mark++)
> + for (; items > 0; items--,mark++) {
> + utf8 += (SvUTF8(*mark)!=0);
> sv_catsv(sv,*mark);
> + }
> }
> SvSETMAGIC(sv);
> + if( utf8 )
> + {
> + if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
> + {
> + Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and ASCII strings");
> + }
> + SvUTF8_on(sv);
> + } else {
> + SvUTF8_off(sv);
> + }
> }
>
> void
>
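The decision the patch above makes at the end of the join can be sketched
on its own, outside perl. The following is a simplified model, not the
patched doop.c: JoinFlags and model_join_flags are hypothetical names, and
each input string is reduced to a single boolean saying whether its UTF8
flag is set. As in the patch, the delimiter is counted together with the
items, the result is flagged UTF8 if any input is, and a "mixed" condition
(where the patch would warn) is reported when some inputs are flagged but
not all of them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Outcome of the flag decision for one join call (hypothetical type). */
typedef struct {
    bool result_utf8;  /* should the joined string carry the UTF8 flag? */
    bool mixed;        /* were flagged and unflagged inputs combined?   */
} JoinFlags;

/* Count the flagged inputs, delimiter included, and derive the result
 * flag from that count instead of from whatever the target had before. */
JoinFlags model_join_flags(bool delim_utf8, const bool *item_utf8,
                           size_t n_items)
{
    size_t flagged = delim_utf8 ? 1 : 0;
    size_t i;
    JoinFlags out;

    for (i = 0; i < n_items; i++) {
        if (item_utf8[i])
            flagged++;
    }

    out.result_utf8 = (flagged > 0);
    /* n_items + 1 is the number of inputs once the delimiter is counted */
    out.mixed = (flagged > 0 && flagged != n_items + 1);
    return out;
}
```

With an unflagged delimiter and only unflagged items, both fields come back
false, which is the case that matters for binary attachments.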
> There may be other perl functions with similar problems. Finding them is beyond the scope of my job, but I hope that the maintainers of
> perl will be proactive in finding and fixing any similar problems, as the way UTF8 support was added to perl doesn't make
> it obvious when such bugs exist. I'd say that any built-in function that returns a string should be checked for (a) whether it sets the UTF8 flag
> at all and (b) whether the value it sets is sensible. I also think warnings are sensible when mixed types of strings are passed into functions,
> as this can be dangerous: since we don't know what character set the ASCII strings are in, the routines themselves can't
> really handle this case properly if any extended characters are present.
>
> I hope this helps.
>
> Nicholas
>
--
http://www.bestpractical.com/rt -- Trouble Ticketing. Free.