Re: isprint Golf Challenge

From: Michael G Schwern
Date: November 16, 2001 21:08
Subject: Re: isprint Golf Challenge
Message ID: 20011117000632.D784@blackrider

On Fri, Nov 16, 2001 at 11:39:27PM -0500, Keith C. Ivey wrote:
> Your character class works only in ASCII.  So for correctness 
> it's better to use the POSIX character class syntax.

I don't know if POSIX takes anything but ASCII into account.

I don't have the POSIX definition of isprint handy, but I do happen to
have ANSI/ISO/IEC 9899-1999 (i.e. ANSI C, 1999 edition) and its
definition of isprint is the unhelpful:

"The isprint function tests for any printing character including space (' ')"

But they do clarify what "printing character" means:

    7.4 Character handling <ctype.h>

    The term 'printing character' refers to a member of a
    locale-specific set of characters, each of which occupies one
    printing position on a display device; the term 'control
    character' refers to a member of a locale-specific set of
    characters that are not printing characters.  All letters and
    digits are printing characters.

So you have to worry about locales.  That can get very tricky once you
get into "high ASCII" (the 8-bit characters above 0x7F).

If you can come up with something that avoids the join/map/split combo
*and* uses isprint(), you're good.  

    $str =~ s/([^\x20-\x7E])/isprint($1) ? $1 : $U2P{$1}/ge;

We cheat a little by assuming that anything between space and ~ is
printable, leaving the rest up to isprint().  You might have to do
some research to see if there are any locales with unprintable
characters in that range.
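
For anyone who wants to run this: here's a minimal, self-contained
sketch of the same idea.  Some assumptions to flag.  The %U2P table
below is a hypothetical stand-in (every byte mapped to a \xNN escape),
since the real table isn't shown in this message, and the sub name is
made up.  It also swaps in the POSIX character class [[:print:]] for
isprint(), because newer perls (5.24 and later) no longer provide
POSIX::isprint.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %U2P: every byte value mapped to a \xNN
# escape.  The real table's contents are not shown in this message.
my %U2P = map { chr($_) => sprintf('\\x%02X', $_) } 0 .. 255;

# Assumed helper name.  Same shape as the substitution above, but using
# the [[:print:]] class in place of POSIX::isprint().
sub u2p_print_class {
    my ($str) = @_;
    $str =~ s{([^\x20-\x7E])}{
        my $c = $1;    # copy $1 before running the inner match
        $c =~ /\A[[:print:]]\z/ ? $c : $U2P{$c}
    }ge;
    return $str;
}

print u2p_print_class("Quick\tbrown\x00fox\n"), "\n";
```

As written, without "use locale", the class follows perl's default
rules rather than the current locale.  Adding "use locale;" should
make [[:print:]] consult LC_CTYPE, which is closer to what isprint()
does, but I haven't tested that against any particular locale.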

Here's the "Quick brown fox" with only two control characters:

        u2p_dw_cached:  1 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU) @ 16129.03/s (n=20000)
u2p_dw_cached_isprint:  1 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 11299.44/s (n=20000)
               u2p_pg: 15 wallclock secs (13.91 usr +  0.05 sys = 13.96 CPU) @ 1432.66/s (n=20000)

And the one with lots:

        u2p_dw_cached:  2 wallclock secs ( 2.85 usr +  0.00 sys =  2.85 CPU) @ 7017.54/s (n=20000)
u2p_dw_cached_isprint:  5 wallclock secs ( 5.59 usr +  0.00 sys =  5.59 CPU) @ 3577.82/s (n=20000)
               u2p_pg: 18 wallclock secs (17.49 usr +  0.05 sys = 17.54 CPU) @ 1140.25/s (n=20000)


So you lose some speed, but you gain correctness across locales.
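
If you want to try a comparison like this yourself, here's a rough
sketch using the Benchmark module.  The two subs, the %U2P table and
the test strings are all hypothetical stand-ins, not the u2p_* routines
timed above, and [[:print:]] again stands in for isprint().  Your
numbers will differ from mine.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-in table: every byte mapped to a \xNN escape.
my %U2P = map { chr($_) => sprintf('\\x%02X', $_) } 0 .. 255;

# Assumed candidates: a plain range substitution, and one that asks the
# [[:print:]] class about each character outside the range.
my %candidates = (
    range_only => sub {
        my $s = shift;
        $s =~ s/([^\x20-\x7E])/$U2P{$1}/g;
        return $s;
    },
    print_class => sub {
        my $s = shift;
        $s =~ s{([^\x20-\x7E])}{
            my $c = $1;    # copy $1 before running the inner match
            $c =~ /\A[[:print:]]\z/ ? $c : $U2P{$c}
        }ge;
        return $s;
    },
);

# Made-up inputs: one with two control characters, one with lots.
my $few  = "The quick brown fox\tjumps over the lazy dog\n";
my $lots = join '', map { chr } 0 .. 127;

for my $input ($few, $lots) {
    my %tests;
    for my $name (keys %candidates) {
        my $f = $candidates{$name};
        $tests{$name} = sub { $f->($input) };
    }
    timethese(20_000, \%tests);    # same n as the runs above
}
```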


-- 

Michael G. Schwern   <schwern@pobox.com>    http://www.pobox.com/~schwern/
Perl Quality Assurance	    <perl-qa@perl.org>	       Kwalitee Is Job One
We're talkin' to you, weaselnuts.
	http://www.goats.com/archive/000831.html
