perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

From:
Tels
Date:
March 30, 2007 17:01
Subject:
Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)
Message ID:
200703310146.54592@bloodgate.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Moin,

On Friday 30 March 2007 23:06:47 Marvin Humphrey wrote:
> On Mar 30, 2007, at 2:25 PM, Juerd Waalboer wrote:
> >> That so many users, including those as expert as Marc, possess a
> >> "broken" understanding of Perl's Unicode model suggests a flawed
> >> design.
> > I think the design is solid, but the implementation (see regex)
> > slightly
> > broken and documentation wildly misleading.
>
> I strongly disagree with this assessment.  In particular, I think
> insisting that the user be responsible for manually segregating
> character and byte-oriented data without any help from Perl is
> totally unreasonable.
>
> Look at how easily Marc made the "mistake" of commingling the two
> types of data.  It's debatable whether the fact that Perl allowed him
> to do that without complaint is a flaw with the design or the
> implementation, but it's one or the other and it's serious.
>
> Additionally, as Marc points out, there are lots of broken XS modules
> out there -- including one of mine. (KinoSearch 0.15 -- Unicode
> support is fixed as of 0.20_01, which breaks backwards
> compatibility.)  Few or none of them would be broken if Perl made it
> more difficult to move between character data and byte-oriented data
> -- errors would be flying right and left and the broken modules would
> get fixed right away.
>
> Of course I understand why that cannot be the case, but it's
> astonishing to me that you see this as a problem which can be solved
> via documentation.

I don't think documentation alone is enough. We already have pragmas like
"strict", so if the current Perl model doesn't even let you detect when you
mix the wrong kinds of data, then we need a module or pragma that catches
these errors.

Of course encoding::warnings exists, but it does not seem able to
distinguish between "untagged" data and real ISO-8859-1 strings, as Perl
itself doesn't make this distinction.
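
To make the missing distinction concrete, here is a minimal sketch. The
variable names are made up for illustration, and it assumes only the core
Encode module. It shows Perl silently treating an untagged byte as
ISO-8859-1 when it is concatenated with decoded character data:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# One untagged byte: binary data, or Latin-1 "e acute"? Perl cannot tell.
my $bytes = "\xE9";

# Decoded character data: the euro sign, U+20AC, as a single character.
my $chars = decode_utf8("\xE2\x82\xAC");

# Mixing the two raises no error and no warning; the byte is upgraded
# as if it had been ISO-8859-1 all along.
my $mixed = $bytes . $chars;

printf "utf8 flag on \$bytes: %s\n", utf8::is_utf8($bytes) ? "on" : "off";
printf "length of \$mixed:    %d\n", length $mixed;
printf "first code point:    U+%04X\n", ord $mixed;
```

If $bytes was really binary data, the result is now quietly wrong, and
nothing in the string itself records which of the two cases applied.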

> How about encouraging the use of encoding::warnings in perlunitut?
>
> How about adding it to core and having 'use 5.10;' turn it on?

If I understand correctly, that would not be enough because of the "is this
binary or really ISO-8859-1 encoded data" problem mentioned above.

all the best,

tels

- -- 
 Signed on Sat Mar 31 01:42:47 2007 with key 0x93B84C15.
 View my photo gallery: http://bloodgate.com/photos
 PGP key on http://bloodgate.com/tels.asc or per email.

 "In 1988, Jack Thompson ran against Janet Reno for DA of Dade County:
 Thompson's unique campaign message was that Reno was unfit for the job
 because, as a closeted lesbian with a drinking problem, she was a great
 candidate for blackmail by the criminal element. Jack never explained
 why this remained a threat even after he exposed her 'secret'. Reno
 cruised at the polls."

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iQEVAwUBRg29jncLPEOTuEwVAQJALAf/SsSjz5VB4l3Zcggd18SNmdTq8DpBLUtP
pxiPCs0fYrEtDny/HvDCbQss/nEaGmFwPaVpAA+kFp8jss3h3xzklW6MwAm7Aisy
+EiZO0JEcADXRWr9CChJpWfMr0qllmzsUUKHa6wc9iXagD6kPoiL49Ay5bkqPBDT
OKOfcJIRDqk12VKATpdQlBIHR3cEpnUMdh8QKhmAArkXAsV5cZGBC9EGm8l+dgeK
Uc2k7pxvLXdjCZu6YbJfPwwdiLlugL23Bci7sZrCO/JyboBOK3ch5dWYohZ8QoMw
SahL/axgJ1DeFTP2ryL6wvnM1djF+HSbzoaLD1E+d7XJqB700Qxdfg==
=eI9w
-----END PGP SIGNATURE-----
