On Wed, Feb 24, 2010 at 11:00 PM, Ben Morrow <ben@morrow.me.uk> wrote:
> Quoth pagaltzis@gmx.de (Aristotle Pagaltzis):
> > * Jesse Vincent <jesse@fsck.com> [2010-02-25 01:50]:
> >
> > > Or do we actually have a better way to do _everything_ bytes
> > > does?
> >
> > Well, no. We do have correct ways for every misguided use of
> > bytes.pm and I believe we have better ways for all its correct
> > uses, but I don’t think we have anything to offer to people with
> > insane uses. :-)
>
> OK, so how do I ensure that certain strings will never be upgraded
bytes doesn't affect "certain strings", and it doesn't prevent upgrading. It
just downgrades or encodes everything automatically, and you can do that
(with more predictable results) using utf8::downgrade or utf8::encode.
use strict;
use warnings;
use Test::More tests => 2;
use bytes;
my $x = "abcd\x{E9}fghij";
utf8::upgrade($x);
is($x, "abcd\x{E9}fghij", "prevents upgrade 1"); # pass
$x .= chr(0x2660);
chop($x) for 1..3;
is($x, "abcd\x{E9}fghij", "prevents upgrade 2"); # fail
You lose the automatic aspect by using utf8::encode or utf8::decode, though.
I don't know of anything that does it automatically and predictably.
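For comparison, here is a minimal sketch of doing the conversion explicitly with utf8::downgrade and utf8::encode, as suggested above. It assumes Perl 5.8.1 or later, where utf8::upgrade, utf8::downgrade, utf8::encode and utf8::is_utf8 are built in (no "use utf8" needed). The variable names are just for illustration.

```perl
use strict;
use warnings;

# A character string containing e-acute (U+00E9), forced into the internal
# UTF-8 representation, as some other code might have done.
my $text = "abcd\x{E9}fghij";
utf8::upgrade($text);

# utf8::downgrade switches the internal representation back to one byte
# per character. The characters themselves do not change.
my $latin1 = $text;
utf8::downgrade($latin1);
printf "downgraded: %d chars, UTF8 flag %s\n",
    length($latin1), utf8::is_utf8($latin1) ? 'on' : 'off';

# utf8::encode replaces the characters with their UTF-8 encoded bytes:
# e-acute becomes 2 bytes and U+2660 becomes 3 bytes.
my $encoded = $text . "\x{2660}";
utf8::encode($encoded);
printf "encoded: %d bytes, UTF8 flag %s\n",
    length($encoded), utf8::is_utf8($encoded) ? 'on' : 'off';
```

Both calls act once, on one string, at a point you choose. Nothing stops later code from upgrading the result again.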
> How do I make /\d\s\w/ match ASCII-only?

use bytes only does that if the input string doesn't contain any characters
wider than 8 bits. utf8::downgrade does the same.
use strict;
use warnings;
use utf8;
use Test::More tests => 2;
my $eacute = "\xE9";
utf8::upgrade( $eacute );
ok($eacute =~ /♠|\w/, "Control");
utf8::downgrade( $eacute );
ok($eacute !~ /♠|\w/, "Test");
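You can check the 8-bit limitation directly. This is a minimal sketch that assumes utf8::downgrade's optional second (FAIL_OK) argument, which makes it return false instead of dying when a character won't fit in a single byte. The strings are made up for illustration.

```perl
use strict;
use warnings;

# One string that fits in 8 bits and one that does not.
my $latin1 = "caf\x{E9}";
my $wide   = "caf\x{E9} \x{2660}";
utf8::upgrade($latin1);

# With a true FAIL_OK argument, utf8::downgrade returns false (leaving the
# string alone) when it cannot represent a character in one byte.
my $latin1_ok = utf8::downgrade($latin1, 1) ? 1 : 0;
my $wide_ok   = utf8::downgrade($wide, 1)   ? 1 : 0;

print "latin1 downgraded: $latin1_ok, wide downgraded: $wide_ok\n";
```

Only the first string can be downgraded, so the downgrade trick for ASCII-only \w is unavailable for the second.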
> How, in general terms, do I say 'This is 8-bit data so
> keep your grubby Unicode hands off it'?
>
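use bytes is lexically scoped: it changes how the code inside its scope
treats strings, not the strings themselves, so it is completely ineffectual
at keeping grubby hands away. Any code outside that scope still sees
characters. Within the scope, it just pretends everything is 8-bit data,
corrupting anything that isn't.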
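A small illustration of the lexical scoping, written as a sketch: the same string gives a different length() only inside the block where "use bytes" is in effect. The variable names are illustrative.

```perl
use strict;
use warnings;

# A character string holding U+2660 BLACK SPADE SUIT (3 bytes in UTF-8).
my $spade = "\x{2660}";

# Outside any "use bytes" scope, length() counts characters.
my $chars = length $spade;

my $bytes;
{
    # Inside this block, length() sees the internal UTF-8 bytes instead.
    use bytes;
    $bytes = length $spade;
}

# The string itself was never marked: after the block, length() (and any
# module the string is handed to) sees characters again.
my $chars_again = length $spade;

print "characters: $chars, bytes: $bytes, after block: $chars_again\n";
```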