Re: The future of POSIX in core

Nicholas Clark
September 2, 2011 12:22
On Fri, Sep 02, 2011 at 04:08:08PM +0200, Abigail wrote:
> On Fri, Sep 02, 2011 at 08:06:16AM -0400, David Golden wrote:
> > On Fri, Sep 2, 2011 at 7:47 AM, Mark Overmeer wrote:

> > >   tolower This is identical to the C function, except that it can apply
> > >           to a single character or to a whole string.  Consider using the
> > >           "lc()" function, see "lc" in perlfunc, or the equivalent "\L"
> > >           operator inside doublequotish strings.
> > >
> > > "This is identical, *but*".  The ctype/tolower *macro* can only handle
> > > native characters.  POSIX::tolower takes any string containing scalar,
> > > ignores whether it is utf8 or not, and then treats all bytes in it as
> > > 7-bit characters (ascii or ebcdic), returning a new scalar with the result.
> > > We do not have a single character data-type in Perl.
> > >
> > > Let's assume that we are not implementing the programming language C. Then,
> > > the documentation should, IMO, be reduced into:
> > >
> > >  Legacy functions:
> > >     tolower     use built-in function lc()
> > 
> > This is a better example of where the existing POSIX functions should
> > probably warn and the documentation is best revised to say "don't use
> > this, use this other thing instead".   I'm supportive of this kind of
> > change.

tolower and toupper are actually implemented as wrappers around lc and uc.
(I hadn't realised this at all. I only found out because I was searching
POSIX.xs for them, and they're not there.)

The description above of what happens is accurate for the is*() functions,
and Mark's criticism is valid. For example, islower() is

	SV *	charstring
	STRLEN	len;
	unsigned char *s = (unsigned char *) SvPV(charstring, len);
	unsigned char *e = s + len;
	for (RETVAL = 1; RETVAL && s < e; s++)
	    if (!islower(*s))
		RETVAL = 0;
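
To experiment with this outside of XS, the loop can be lifted into plain C.
This is a minimal sketch, not the code in POSIX.xs: the function name
all_islower and the explicit length parameter are hypothetical, standing in
for what SvPV() supplies.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical standalone equivalent of the XS loop above: returns 1 if
 * every byte of the buffer satisfies islower(), 0 otherwise. Like the XS
 * code, an empty buffer yields 1, because the loop body never runs. */
static int all_islower(const char *buf, size_t len)
{
    const unsigned char *s = (const unsigned char *) buf;
    const unsigned char *e = s + len;
    int ret = 1;

    for (; ret && s < e; s++)
        if (!islower(*s))
            ret = 0;
    return ret;
}
```

Note that the test is per octet, so the result depends only on the current
C locale and never on whether the scalar was flagged as UTF-8.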

> I'm not. There will be two kinds of people who use that function: those
> for whom it returns the wrong result, and those for whom it just works.

I'm also suspicious that it works for more people than it "should", at
first glance. There's a subtlety. The code actually smashes everything
down to 8 bits, not 7. Because UTF-8 encodes Unicode code points > 127
as sequences of non-ASCII octets, and because for a C (or POSIX)
locale all non-ASCII octets (IIRC) return false for islower(), the
above code will also be correct for any Unicode string.
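
One way to check the claim about non-ASCII octets, as a sketch: walk every
octet value from 0x80 to 0xFF under the "C" locale and count how many
islower() accepts. The helper name is made up for illustration; the only
library calls are setlocale() and islower(). ISO C says that in the "C"
locale islower() is true only for the basic lowercase letters, so the count
should be zero on a conforming libc.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: counts the octets in 0x80..0xFF that islower()
 * accepts under the "C" locale. Every octet of a multi-byte UTF-8
 * sequence falls in this range. Note that this switches the process
 * locale to "C" as a side effect. */
static int high_octets_classed_lower(void)
{
    int c, count = 0;

    setlocale(LC_ALL, "C");
    for (c = 0x80; c <= 0xFF; c++)
        if (islower(c))
            count++;
    return count;
}
```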

I believe UTF-EBCDIC maps onto non-printing EBCDIC code points analogous
to how UTF-8 maps outside of ASCII. Hence the code won't be wrong under a
C (or POSIX) locale.

Also, I think that for many cases of locales, if the intent of the
programmer was only to match the relevant characters in the ASCII range,
the code above will *seem* to work most of the time, particularly for such
tests as isdigit() and isxdigit(). For example, UTF-8 octet sequences
misinterpreted as ISO-8859-1 are (mostly) a letter followed by 1 or more
symbols or C1 control codes, so islower() will fail on at least one
misinterpreted octet, and iscntrl() on at least one other. Most of the other
ISO-8859 variants seem to mix up letters and symbols more. KOI8-R has all the
Cyrillic letters in the range of UTF-8 start bytes, but Ё and ё in the range
of continuation bytes. Shift-JIS is less clear-cut, etc. But the chance of
false positives is low. Bad buggy code has a fair chance of working.
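
The ISO-8859-1 case can be sketched without depending on which locales are
installed, by hand-writing the classification. Everything here is an
assumption made for illustration: the function names are hypothetical, and
the ranges are a simplified reading of the ISO-8859-1 table (the micro sign
at 0xB5 and the ordinal indicators are deliberately left out, which is part
of why the claim above is only "mostly").

```c
#include <stddef.h>

/* Simplified stand-in for islower() under ISO-8859-1. Assumption:
 * a-z, 0xDF-0xF6 and 0xF8-0xFF are lower case; 0xF7 is the division
 * sign and is excluded. */
static int latin1_islower(unsigned char c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 0xDF && c <= 0xF6)
        || (c >= 0xF8);
}

/* Simplified stand-in for iscntrl(): ASCII controls, DEL, and the C1
 * control range 0x80-0x9F. */
static int latin1_iscntrl(unsigned char c)
{
    return c < 0x20 || (c >= 0x7F && c <= 0x9F);
}

/* Same shape as the XS loop, using the table above. */
static int all_latin1_islower(const unsigned char *s, size_t len)
{
    const unsigned char *e = s + len;
    int ret = 1;

    for (; ret && s < e; s++)
        if (!latin1_islower(*s))
            ret = 0;
    return ret;
}
```

For instance, U+00E9 (é) is the octets 0xC3 0xA9 in UTF-8. Read as
ISO-8859-1 those are an upper case letter and the copyright sign, so the
loop rejects the string, which is what a programmer thinking only of ASCII
would have expected.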

> The first group is likely to find out something is wrong, even without
> a warning, and for the second group, additional warnings are just a PITA.

My hunch is that this is how it would pan out.

> In many aspects, adding a warning *IS* breakage.

I always viewed adding a warning in a maint release as potential for
unacceptable breakage, because (even without turning on fatal warnings),
it's possible to denial-of-service a system by filling logs, logs which were
previously clean. (You don't even have to fill the disk. On at least some
versions of AIX, for some ABIs, pp_fork can fail if any of the open files is
larger than 2GB. This is a really strange error condition to have to track
down, particularly as IIRC the process can still write to the files in
question.)

My assumption being that multiple someones are going to be, um, crazy enough
to upgrade to a new maint release without reading the documentation or
testing thoroughly. So it really needs to be idiot-proof, else rightly or
wrongly they will be sending bug reports (or worse).

Whereas for a new major release, if anyone upgrades without testing, and has
the chutzpah to send a bug report about something, my opinion is that most
likely it should be rejected on the basis of "you get to keep both pieces",
particularly if it was a documented change. Hence in a major release new
warnings are as tolerable as any other breakage (i.e. not very tolerable).

> I don't think "needs a scalar" actually means "will not accept an array as
> argument".
> In fact, I consider the bug here to be:
>     $ perl -wE '@a = 2; exit @a'; echo $?
>     1

If so, that's a pretty old bug:

$ ~/Sandpit/5000/bin/perl -v
This is perl, version 5.000

Copyright 1987-1994, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.
$ ~/Sandpit/5000/bin/perl -le '@a = (2, 3, 4); exit @a'
$ echo $?

$ /usr/local/perl4/bin/perl4.036 -v

This is perl, version 4.0

$RCSfile: perl.c,v $$Revision: $$Date: 1993/02/05 19:39:30 $
Patch level: 36

Copyright (c) 1989, 1990, 1991, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 4.0 source kit.
$ /usr/local/perl4/bin/perl4.036  -le '@a = (2, 3, 4); exit @a'
$ echo $?

I don't have anything older than that to test with.
I don't have the documentation for either available to check.

> I actually expect this to print 2, and checking the manual page for exit,
> there's actually nothing in it that suggests it's evaluating its argument
> list in scalar context. Not only doesn't it say "needs a scalar", it doesn't
> even mention the word "scalar":
>   =item exit EXPR
>   X<exit> X<terminate> X<abort>
>   =item exit
>   Evaluates EXPR and exits immediately with that value.    Example:   

Yes, agree. It doesn't make any mention of forcing a scalar context.

I don't know how many other built-ins in perlfunc.pod are also guilty
of this.

I think "Evaluates EXPR in scalar context and exits ..." would be enough to
be clear without being confusing, providing "in scalar context" is the
consistent term throughout the documentation, and is easily looked up itself.

Nicholas Clark
