
Re: warding against bytes.pm

From: Eric Brine
Date: February 24, 2010 23:40
Subject: Re: warding against bytes.pm
Message ID: f86994701002242339t6a62bf84n835de277eda69573@mail.gmail.com

On Wed, Feb 24, 2010 at 11:00 PM, Ben Morrow <ben@morrow.me.uk> wrote:

> Quoth pagaltzis@gmx.de (Aristotle Pagaltzis):
> > * Jesse Vincent <jesse@fsck.com> [2010-02-25 01:50]:
> >
> > > Or do we actually have a better way to do _everything_ bytes
> > > does?
> >
> > Well, no. We do have correct ways for every misguided use of
> > bytes.pm and I believe we have better ways for all its correct
> > uses, but I don’t think we have anything to offer to people with
> > insane uses. :-)
>
> OK, so how do I ensure that certain strings will never be upgraded


bytes doesn't affect "certain strings", and it doesn't prevent upgrading. It
just downgrades or encodes everything automatically, and you can do that
(with more predictable results) using utf8::downgrade or utf8::encode.

use strict;
use warnings;
use Test::More tests => 2;
use bytes;
my $x = "abcd\x{E9}fghij";
utf8::upgrade($x);
is($x, "abcd\x{E9}fghij", "prevents upgrade 1");  # pass
$x .= chr(0x2660);
chop($x) for 1..3;
is($x, "abcd\x{E9}fghij", "prevents upgrade 2");  # fail

You lose the automatic aspect by using utf8::downgrade or utf8::encode, though.
I don't know of anything that does it automatically and predictably.
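
As a rough illustration of the explicit route, here is a minimal sketch of calling utf8::downgrade and utf8::encode by hand. The sample strings and variable names are made up for the example and are not from the thread; the second argument to utf8::downgrade asks it to return false instead of dying when the string cannot fit in bytes.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original post.
my $latin1 = "abcd\x{E9}";
utf8::upgrade($latin1);      # internal format changes, the value does not

# With a true second argument, utf8::downgrade returns false rather than
# dying when the string holds a character above 0xFF.
my $downgraded_ok = utf8::downgrade($latin1, 1);

my $wide    = "abcd\x{2660}";
my $wide_ok = utf8::downgrade($wide, 1);   # U+2660 cannot fit in one byte

# utf8::encode replaces the characters with their UTF-8 encoded bytes.
my $encoded = $wide;
utf8::encode($encoded);

printf "latin1 downgraded: %s\n", ($downgraded_ok ? "yes" : "no");
printf "wide downgraded:   %s\n", ($wide_ok       ? "yes" : "no");
printf "encoded length:    %d\n", length($encoded);
```

The point of the sketch is that each call is a visible, checkable step: the caller decides per string whether a failed downgrade is an error or a cue to encode instead.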


> How do I make /\d\s\w/ match ASCII-only?


use bytes only does that if the input string doesn't contain any values
greater than 8 bits. utf8::downgrade does the same.

use strict;
use warnings;
use utf8;
use Test::More tests => 2;
my $eacute = "\xE9";
utf8::upgrade( $eacute );
ok($eacute =~ /♠|\w/, "Control");
utf8::downgrade( $eacute );
ok($eacute !~ /♠|\w/, "Test");
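
One way to sidestep the dependence on internal format, offered only as a sketch and not something proposed in the thread, is to spell out the ASCII ranges in a bracketed class. The variable names below are invented for the example.

```perl
use strict;
use warnings;

my $eacute     = "\xE9";
my $ascii_word = qr/[A-Za-z0-9_]/;   # explicit ASCII stand-in for \w

# An explicit range gives the same answer whether or not the string
# has been upgraded internally.
utf8::upgrade($eacute);
my $upgraded_match = $eacute =~ $ascii_word ? 1 : 0;

utf8::downgrade($eacute);
my $downgraded_match = $eacute =~ $ascii_word ? 1 : 0;

my $plain_match = "e" =~ $ascii_word ? 1 : 0;

print "upgraded: $upgraded_match, downgraded: $downgraded_match, plain e: $plain_match\n";
```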



> How, in general terms, do I say 'This is 8-bit data so
> keep your grubby Unicode hands off it'?
>

use bytes is lexically scoped, so it is completely ineffectual at keeping
grubby hands away: code compiled outside that scope still treats the string
as characters. It just pretends everything is 8-bit data, corrupting it if
it isn't.
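
A small sketch of what the lexical scoping means in practice. The helper length_outside and the variable names are hypothetical, made up for this example: the same string is measured inside the bytes scope, by a sub compiled outside it, and again after the scope ends.

```perl
use strict;
use warnings;

my $spade = chr(0x2660);    # one character, stored as three bytes internally

# Compiled outside any "use bytes" scope, so it keeps character semantics
# even when called from inside one.
sub length_outside { return length $_[0] }

my ($inside, $outside_call);
{
    use bytes;
    $inside       = length $spade;            # byte semantics apply here
    $outside_call = length_outside($spade);   # callee is unaffected
}
my $after = length $spade;                    # scope has ended

print "inside: $inside, via outside sub: $outside_call, after: $after\n";
```

Only the code textually inside the block sees bytes; anything the string is handed to carries on with character semantics.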

