Eric Brine wrote:
> On Wed, Feb 24, 2010 at 11:00 PM, Ben Morrow <ben@morrow.me.uk> wrote:
>
>> Quoth pagaltzis@gmx.de (Aristotle Pagaltzis):
>>> * Jesse Vincent <jesse@fsck.com> [2010-02-25 01:50]:
>>>
>>>> Or do we actually have a better way to do _everything_ bytes
>>>> does?
>>> Well, no. We do have correct ways for every misguided use of
>>> bytes.pm and I believe we have better ways for all its correct
>>> uses, but I don’t think we have anything to offer to people with
>>> insane uses. :-)
>> OK, so how do I ensure that certain strings will never be upgraded?
>
>
> bytes doesn't affect "certain strings", and it doesn't prevent upgrading. It
> just downgrades or encodes everything automatically, and you can do that
> (with more predictable results) using utf8::downgrade or utf8::encode.
>
> use strict;
> use warnings;
> use Test::More tests => 2;
> use bytes;
> my $x = "abcd\x{E9}fghij";
> utf8::upgrade($x);
> is($x, "abcd\x{E9}fghij", "prevents upgrade 1"); # pass
> $x .= chr(0x2660);
> chop($x) for 1..3;
> is($x, "abcd\x{E9}fghij", "prevents upgrade 2"); # fail
>
> You lose the automatic aspect by using utf8::downgrade or utf8::encode, though.
> I don't know of anything that does it automatically and predictably.
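Here is a minimal sketch of doing the conversion explicitly instead of relying on use bytes. utf8::encode turns characters into their UTF-8 bytes, and utf8::downgrade returns an upgraded string to one-byte-per-character storage, dying if it cannot. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Illustrative data: e-acute (0xE9) plus BLACK SPADE SUIT (U+2660).
my $text = "abcd\x{E9}fghij\x{2660}";

# Explicit encoding: copy the string, then convert the copy to UTF-8 bytes.
my $bytes = $text;
utf8::encode($bytes);

print length($text), "\n";    # 11 characters
print length($bytes), "\n";   # 14 bytes (0xE9 -> 2 bytes, U+2660 -> 3 bytes)

# Explicit downgrade: only possible when every character is <= 0xFF.
my $latin1 = "abcd\x{E9}fghij";
utf8::upgrade($latin1);       # force the internal UTF-8 representation
utf8::downgrade($latin1);     # back to one byte per character (dies on failure)

print utf8::is_utf8($latin1) ? "still upgraded\n" : "8-bit storage\n";
print $latin1 eq "abcd\x{E9}fghij" ? "same string\n" : "changed\n";
```

Unlike use bytes, each conversion happens exactly where it is written. The result therefore does not depend on how Perl happened to store the string internally.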
>
>
>> How do I make /\d\s\w/ match ASCII-only?
>
We were a little too late for 5.12 to get an /a regex modifier in that
does just that.
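Until such a modifier exists, one workaround is to spell out the ASCII ranges explicitly instead of using \d, \s and \w. This is a sketch and not the only option. The helper name ascii_dsw is hypothetical, and the whitespace class is an approximation chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: ASCII-only stand-ins for \d, \s and \w.
# The whitespace set is an approximation (space, \t, \n, \f, \r).
my $ascii_digit = qr/[0-9]/;
my $ascii_space = qr/[ \t\n\f\r]/;
my $ascii_word  = qr/[A-Za-z0-9_]/;

sub ascii_dsw {
    my ($s) = @_;
    return $s =~ /$ascii_digit$ascii_space$ascii_word/ ? 1 : 0;
}

# ARABIC-INDIC DIGIT THREE (U+0663), a space, then 'a'.
my $arabic = "\x{663} a";

print(($arabic =~ /\d\s\w/) ? "match\n" : "no match\n");   # match (Unicode \d)
print(ascii_dsw($arabic)    ? "match\n" : "no match\n");   # no match
print(ascii_dsw("7 a")      ? "match\n" : "no match\n");   # match
```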
>
> use bytes only does that if the input string doesn't contain any values
> wider than 8 bits. utf8::downgrade does the same.
>
> use strict;
> use warnings;
> use utf8;
> use Test::More tests => 2;
> my $eacute = "\xE9";
> utf8::upgrade( $eacute );
> ok($eacute =~ /♠|\w/, "Control");
> utf8::downgrade( $eacute );
> ok($eacute !~ /♠|\w/, "Test");
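The limitation above is that downgrading only works when every character fits in 8 bits. You can check for it without dying by passing a true second argument (FAIL_OK) to utf8::downgrade, which then returns false instead of throwing. This is a small sketch, and can_be_bytes is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string can be stored as 8-bit data.
# Works on a copy, so the caller's string is never modified.
sub can_be_bytes {
    my ($s) = @_;
    return utf8::downgrade($s, 1) ? 1 : 0;   # FAIL_OK: return false, don't die
}

my $eacute = "\x{E9}";
utf8::upgrade($eacute);          # upgraded, but every character is <= 0xFF
my $wide = "\x{E9}\x{2660}";     # contains a character above 0xFF

print can_be_bytes($eacute) ? "ok\n" : "too wide\n";   # ok
print can_be_bytes($wide)   ? "ok\n" : "too wide\n";   # too wide
```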
>
>
>
>> How, in general terms, do I say 'This is 8-bit data so
>> keep your grubby Unicode hands off it'?
>>
>
> use bytes is lexically scoped, so it will be completely ineffectual
> at keeping grubby hands away. It just pretends everything is 8-bit data,
> corrupting it if it isn't.
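The pragma only changes how code inside its lexical scope sees a string, and what that code sees depends on the string's internal representation. That is why it cannot protect the data. Here is a minimal sketch (variable names are illustrative):

```perl
use strict;
use warnings;

my $e = "\xE9";                             # one character, stored as one byte

my $before = do { use bytes; length $e };   # 1: internal storage is 8-bit
utf8::upgrade($e);                          # same character, UTF-8 storage
my $after  = do { use bytes; length $e };   # 2: use bytes now sees UTF-8 bytes

# Outside the use bytes blocks, ordinary character semantics apply again.
my $chars = length $e;                      # 1

print "$before $after $chars\n";            # 1 2 1
```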
>
Once "use feature 'unicode_strings'" is fully implemented, Perl will have
much less need to upgrade strings internally, so many of those upgrades
can be removed for efficiency's sake, and that will improve this
situation as a side effect.
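Here is a sketch of the effect on the earlier \w example. It assumes Perl 5.14 or later, where the unicode_strings feature covers regex matching. Under that assumption, the upgraded and non-upgraded forms of the same character match identically.

```perl
use strict;
use warnings;
use 5.014;                         # unicode_strings covers regexes from 5.14 on
use feature 'unicode_strings';

my $native   = "\xE9";             # e-acute stored as one byte
my $upgraded = "\xE9";
utf8::upgrade($upgraded);          # same character, UTF-8 storage

# With unicode_strings in effect, \w uses Unicode rules for both strings,
# so the internal representation no longer changes the result.
my $native_match   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_match = $upgraded =~ /\w/ ? 1 : 0;

print "$native_match $upgraded_match\n";   # 1 1
```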