On Fri, Nov 16, 2001 at 11:39:27PM -0500, Keith C. Ivey wrote:
> Your character class works only in ASCII. So for correctness
> it's better to use the POSIX character class syntax.
I don't know if POSIX takes anything but ASCII into account.
I don't have the POSIX definition of isprint handy, but I do happen to
have ANSI/ISO/IEC 9899-1999 (i.e. ANSI C, 1999 edition), and its
definition of isprint is the unhelpful:
"The isprint function tests for any printing character including space (' ')"
But they do clarify what "printing character" means:
    7.4 Character handling <ctype.h>
    The term 'printing character' refers to a member of a
    locale-specific set of characters, each of which occupies one
    printing position on a display device; the term 'control
    character' refers to a member of a locale-specific set of
    characters that are not printing characters. All letters and
    digits are printing characters.
So you have to worry about locales, and that can get very tricky once
you get into high ASCII (bytes 0x80 through 0xFF).
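Perl lets you choose which rules a POSIX class follows, which makes the difference above 0x7F easy to see. A minimal sketch, assuming Perl 5.14 or later (which added the /a, /u and /l regex modifiers); the helper names print_ascii and print_unicode are hypothetical:

```perl
use strict;
use warnings;

# The same POSIX class judged under two of Perl's regex rule sets.
# /a restricts [[:print:]] to ASCII; /u applies Unicode properties.
# (/l, not used here, would consult the current LC_CTYPE locale.)
sub print_ascii   { return $_[0] =~ /\A[[:print:]]\z/a ? 1 : 0 }
sub print_unicode { return $_[0] =~ /\A[[:print:]]\z/u ? 1 : 0 }

for my $c ("A", " ", "\t", "\x{E9}", "\x{263A}") {
    printf "U+%04X ascii=%d unicode=%d\n",
        ord $c, print_ascii($c), print_unicode($c);
}
```

Only the U+00E9 and U+263A rows differ between the two columns. Under /l the answer for characters between 0x80 and 0xFF would follow whatever locale the program runs in, which is exactly the locale dependence the standard describes.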
If you can come up with something that avoids the join/map/split combo
*and* uses isprint(), you're good.
use POSIX qw(isprint);   # POSIX::isprint() was removed in Perl 5.24
$str =~ s/([^\x20-\x7E])/isprint($1) ? $1 : $U2P{$1}/ge;
We cheat a little by assuming that anything between space and ~ is
printable, leaving the rest up to isprint(). You might have to do
some research to see if there are any locales with unprintable
characters in that range.
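Here is a self-contained sketch of the same substitution for newer Perls. Since POSIX::isprint() is gone from Perl 5.24 on, it swaps in the [[:print:]] class under `use locale` as a stand-in; the %U2P table, the u2p() name and the \x{..} fallback for unmapped characters are all hypothetical, not the table used earlier in the thread:

```perl
use strict;
use warnings;
use locale;   # for characters below 256, [[:print:]] consults LC_CTYPE

# Hypothetical %U2P table (unprintable => printable stand-in).
my %U2P = (
    "\0" => '\0',
    "\t" => '\t',
    "\n" => '\n',
    "\r" => '\r',
    "\e" => '\e',
);

# Anything in space..~ is assumed printable and left alone; the rest is
# checked with [[:print:]], standing in for the removed POSIX::isprint().
sub u2p {
    my ($str) = @_;
    $str =~ s{([^\x20-\x7E])}{
        my $c = $1;   # copy first: the match below would clobber $1
        $c =~ /[[:print:]]/ ? $c
          : exists $U2P{$c} ? $U2P{$c}
          : sprintf('\\x{%X}', ord $c);   # fallback for unmapped characters
    }ge;
    return $str;
}

print u2p("Quick\tbrown\nfox"), "\n";   # prints Quick\tbrown\nfox
```

The cheap range check still does most of the work; the locale-aware test only runs on the characters that fall outside it.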
Here are the numbers for the "Quick brown fox" string with only two control characters:
u2p_dw_cached: 1 wallclock secs ( 1.24 usr + 0.00 sys = 1.24 CPU) @ 16129.03/s (n=20000)
u2p_dw_cached_isprint: 1 wallclock secs ( 1.77 usr + 0.00 sys = 1.77 CPU) @ 11299.44/s (n=20000)
u2p_pg: 15 wallclock secs (13.91 usr + 0.05 sys = 13.96 CPU) @ 1432.66/s (n=20000)
And the one with lots of control characters:
u2p_dw_cached: 2 wallclock secs ( 2.85 usr + 0.00 sys = 2.85 CPU) @ 7017.54/s (n=20000)
u2p_dw_cached_isprint: 5 wallclock secs ( 5.59 usr + 0.00 sys = 5.59 CPU) @ 3577.82/s (n=20000)
u2p_pg: 18 wallclock secs (17.49 usr + 0.05 sys = 17.54 CPU) @ 1140.25/s (n=20000)
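So you lose some speed (about 30% on the first string, about half on the second), but you gain correctness across locales, and both cached versions still run well ahead of u2p_pg.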
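For anyone wanting to rerun numbers like these, a rough sketch using the core Benchmark module's timethese() and cmpthese() follows. The two subs are hypothetical stand-ins, not the u2p_dw_cached, u2p_dw_cached_isprint or u2p_pg code measured above; the %U2P table and the input string are made up, and [[:print:]] stands in for isprint():

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical %U2P: each C0 control and DEL maps to a \xNN escape.
my %U2P = map { (chr($_) => sprintf('\\x%02X', $_)) } 0x00 .. 0x1F, 0x7F;

# Made-up input with two control characters.
my $str = "The quick brown fox\tjumps over the lazy dog\n";

my $results = timethese(5_000, {
    # Replace everything outside space..~ straight from the table.
    range_only => sub {
        (my $s = $str) =~ s/([^\x20-\x7E])/$U2P{$1}/g;
    },
    # Same, but ask [[:print:]] (standing in for isprint) first.
    range_then_print => sub {
        (my $s = $str) =~ s{([^\x20-\x7E])}{
            my $c = $1;
            $c =~ /[[:print:]]/ ? $c : $U2P{$c};
        }ge;
    },
});

# Print a table of relative speeds.
cmpthese($results);
```

timethese() returns a hash of Benchmark objects keyed by name, which cmpthese() turns into a rate comparison chart; raise the iteration count if Benchmark warns that it is too low for a reliable measurement.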
--
Michael G. Schwern <schwern@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <perl-qa@perl.org> Kwalitee Is Job One
We're talkin' to you, weaselnuts.
http://www.goats.com/archive/000831.html