perl.i18n | Postings from January 2004

Solution to corrupt attachments problem with RT3 and perl

From: Nicholas Adrian Vinen
Date: January 6, 2004 03:19
Subject: Solution to corrupt attachments problem with RT3 and perl
Message ID: 20040106062427.GG1768@pandora.x256.com

   Hello,
      I am a consultant for a company that uses RT for its internal support. They asked me to fix a problem where attaching a binary
file to a ticket sometimes corrupted the file. They had traced it to the case where the mod_perl process serving the request to add the
attachment had previously been used for some other ticket-related operation. I eventually tracked the problem down to a bug in perl. Here
is a detailed description:

      When you attach a file to a ticket, RT saves the attached file into /tmp. It then adds a MIME::Body::File record to the
MIME::Entity that represents the ticket. Later, it calls make_singlepart() on the MIME::Entity, which converts the entity into a string.
During this process it calls as_string() on the MIME::Body::File, which reads the file in and prints it into a string through an
IO::Scalar object. IO::Scalar's print() method calls join() on the data as it is read in, before appending that data to the destination
string.

      The problem occurs inside join(). join() reuses the string object into which it does the joining and which it then returns, and it
never touches the UTF8 flag on that string. On the initial run there are no (or few) strings to reuse, and newly created strings start out
as ASCII, i.e. with the UTF8 flag off. So all the results of join() are ASCII, which is what MIME and RT want, since ASCII strings are also
what perl uses for binary data. The problem is that on the second and subsequent executions of RT within the same perl interpreter, the
reused strings often have the UTF8 flag set. So join('', $string), where $string is ASCII, will often return a UTF8 string. When this
UTF8 string is later converted back into ASCII it is modified, and so the binary data is corrupted.

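      The solution is to apply the following patch to perl (tested with perl 5.8.2). It sets the UTF8 flag on the returned string only
if the delimiter or at least one of the joined strings has it set, clears it otherwise, and warns when UTF8 and ASCII strings are mixed.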

diff -u perl-5.8.2/doop.c perl-5.8.2-patched/doop.c
--- perl-5.8.2/doop.c   2003-09-30 10:09:51.000000000 -0700
+++ perl-5.8.2-patched/doop.c   2004-01-05 23:23:13.000000000 -0800
@@ -647,6 +647,9 @@
     register STRLEN len;
     STRLEN delimlen;
     STRLEN tmplen;
+    int utf8;
+
+    utf8 = (SvUTF8(del)!=0);
 
     (void) SvPV(del, delimlen); /* stringify and get the delimlen */
     /* SvCUR assumes it's SvPOK() and woe betide you if it's not. */
@@ -674,22 +677,37 @@
        SvTAINTED_off(sv);
 
     if (items-- > 0) {
-       if (*mark)
+       if (*mark) {
+           utf8 += (SvUTF8(*mark)!=0);
            sv_catsv(sv, *mark);
+       }
        mark++;
     }
 
     if (delimlen) {
        for (; items > 0; items--,mark++) {
            sv_catsv(sv,del);
+           utf8 += (SvUTF8(*mark)!=0);
            sv_catsv(sv,*mark);
        }
     }
     else {
-       for (; items > 0; items--,mark++)
+       for (; items > 0; items--,mark++) {
+           utf8 += (SvUTF8(*mark)!=0);
            sv_catsv(sv,*mark);
+       }
     }
     SvSETMAGIC(sv);
+    if( utf8 )
+    {
+        if( utf8 != sp-oldmark+1 && ckWARN_d(WARN_UTF8) )
+       {
+           Perl_warner(aTHX_ packWARN(WARN_UTF8), "Joining UTF8 and ASCII strings");
+       }
+        SvUTF8_on(sv);
+    } else {
+        SvUTF8_off(sv);
+    }
 }
 
 void

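      The flag rule the patch adds can be looked at apart from perl's internals. The C sketch below is a hypothetical helper, not part of
perl: given the UTF8 flags of the delimiter and of each joined item, it returns the flag the patched do_join() would set, and reports the
mixed case in which the patch emits its "Joining UTF8 and ASCII strings" warning. As in the patch, the delimiter is counted once, so the
total number of inputs is the number of items plus one (the patch's sp-oldmark+1); the ckWARN_d(WARN_UTF8) check is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of perl): the flag decision the patch adds
   to do_join(), separated from the string handling.
   delim_utf8 - UTF8 flag of the delimiter
   item_utf8  - UTF8 flags of the nitems joined items
   mixed      - out: 1 if some, but not all, inputs were UTF8
   Returns the UTF8 flag for the result: 1 if any input was UTF8, else 0. */
static int toy_join_flag(int delim_utf8, const int *item_utf8, size_t nitems,
                         int *mixed)
{
    size_t utf8 = delim_utf8 ? 1 : 0;   /* like: utf8 = (SvUTF8(del)!=0);       */
    size_t inputs = nitems + 1;         /* like: sp-oldmark+1 (items + delimiter) */

    assert(mixed != NULL);
    assert(nitems == 0 || item_utf8 != NULL);

    for (size_t i = 0; i < nitems; i++)
        utf8 += item_utf8[i] ? 1 : 0;   /* like: utf8 += (SvUTF8(*mark)!=0);    */

    /* The patch warns in exactly this case: some inputs UTF8, some not. */
    *mixed = (utf8 != 0 && utf8 != inputs);

    /* SvUTF8_on(sv) if any input was UTF8, SvUTF8_off(sv) otherwise. */
    return utf8 != 0;
}
```

For example, joining two ASCII items with an ASCII delimiter gives flag 0 and no warning, while a UTF8 delimiter with two ASCII items gives
flag 1 and counts as mixed. The sketch only decides the flag; in the patch itself the concatenation is still done by sv_catsv().
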
      There may be other perl functions with similar problems. Finding them is beyond the scope of my job, but I hope that the perl
maintainers will proactively look for and fix any similar problems, since the way UTF8 support has been added to perl doesn't make it
obvious when such bugs exist. I'd say that every built-in function that returns a string should be checked for (a) whether it sets the
UTF8 flag at all and (b) whether the value it sets is sensible. I also think it is sensible to warn when a mix of UTF8 and ASCII strings
is passed to a function, as this can be dangerous: since we don't know what character set the ASCII strings are in, the routines
themselves can't really handle this case properly if any extended characters are present.

      I hope this helps.

            Nicholas




nntp.perl.org: Perl Programming lists via nntp and http.