
Re: Unicode regex negated case-insensitivity in 5.14.0-RC1

From: Tom Christiansen
Date: May 1, 2011 10:21
Subject: Re: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: 29746.1304270463@chthon

> The previous email had a file-encoding issue on my system.  The 
> attachment was saved in Latin1.  

It worked fine for me.

> Here is the same email, with the attachment in utf8

Um, how so?  Here are the MIME instructions from the first piece:

    Content-Type: text/plain;
     name="options"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
     filename="options"

whereas here they are from the second piece:

    Content-Type: text/plain;
     name="options"
     Content-Transfer-Encoding: base64
     Content-Disposition: attachment;
      filename="options"

I see no difference at all.

When I send things like that, I send them this way:

    Content-Description: the greek_mfold.t file (in UTF-8)
    Content-Disposition: inline
    Content-Type: text/plain; charset="UTF-8"; filename="greek_mfold.t"; name="greek_mfold.t"
    Content-ID: <31292.1304206558.2@chthon.perl.com>
    Content-Transfer-Encoding: quoted-printable

That way the charset is plainly [:)] specified.  Your anonymous 
base64 binaries don't specify one, although I don't know why they
got double-encoded into Latin1 for you.  You're using:

    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8

So that must be doing something sneaky.
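
For anyone who wants to generate headers like the ones I showed above
from a script, here is a minimal sketch.  It assumes Python's standard
email package rather than any particular mail client, and the helper
name build_inline_utf8_part is made up for illustration.  It builds a
single text part with an explicit UTF-8 charset, quoted-printable
transfer encoding, and an inline disposition carrying the filename:

```python
from email.message import EmailMessage


def build_inline_utf8_part(text, filename):
    """Build a text/plain part whose charset is stated explicitly.

    Hypothetical helper for illustration; not taken from the thread.
    """
    part = EmailMessage()
    part.set_content(
        text,
        subtype="plain",
        charset="utf-8",             # declare the encoding in Content-Type
        cte="quoted-printable",      # readable on the wire, unlike base64
        disposition="inline",        # show in the message body
        filename=filename,           # recorded on Content-Disposition
    )
    return part


part = build_inline_utf8_part("σίγμα ς\n", "greek_mfold.t")

# Print the headers a receiving client would use to decode the part.
for name in ("Content-Type", "Content-Transfer-Encoding", "Content-Disposition"):
    print(f"{name}: {part[name]}")
```

The point is only that the charset travels with the part, so the
receiving end does not have to guess.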

> I am not comfortable shipping 5.14 as-is.  I think we need to do 
> something.  If it is deemed ok for the churn, then I think option 4 is 
> the best; otherwise options 3b or 3c would be my recommendation.

You've done a customarily thorough job in delineating the problem-space
for us, Karl.  Thank you very very much for that.

As for anything else, I would like to take some time this morning to 
weigh the various pros and cons of each individual option.

Central to all this are the surprises that can occur when patterns
implicitly written for byte data are extended to Unicode character data.
I have a hunch this is only one clear example of this issue, and that
there are others, too.  
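
To make that kind of surprise concrete, here is a small sketch in
Python's re module.  It is an analogy only, not the Perl 5.14 behavior
under discussion in this thread.  A class written with only ASCII
letters in mind starts matching U+212A KELVIN SIGN once
case-insensitive matching is applied to Unicode strings, and stops
again when matching is restricted to ASCII:

```python
import re

KELVIN = "\u212a"  # KELVIN SIGN; it case-folds to ASCII "k"

# The same pattern, written by someone thinking only of ASCII letters.
unicode_pat = re.compile(r"[a-z]", re.IGNORECASE)
ascii_pat = re.compile(r"[a-z]", re.IGNORECASE | re.ASCII)

# Under Unicode case-insensitive rules the class picks up a non-ASCII
# character; with re.ASCII it does not.
unicode_hit = unicode_pat.fullmatch(KELVIN) is not None
ascii_hit = ascii_pat.fullmatch(KELVIN) is not None

print("Unicode rules match KELVIN SIGN:", unicode_hit)
print("ASCII rules match KELVIN SIGN:  ", ascii_hit)
```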

I'll get back to you.

thanks again,

--tom
