Front page | perl.perl5.porters |
Postings from August 2001
This week on Perl5-porters
From: Simon Cozens
August 15, 2001 10:28
This week on Perl5-porters
Message ID: 20010815182330.A13848@deep-dark-truthful-mirror
[Apologies for the unannounced leave of absence: the conference had
me very busy. - SC]
The HTML version will be up in a few hours (if it isn't already) at
The current report is always available at
An RSS file of this week's report will be available at
The current RSS file is always available at
The archive of past reports:
To subscribe to this digest, send an empty message to
This week on perl5-porters (8 -- 15 August 2001)
* POD Specification
* Unicode Normalisation
* Threading Semantics
* Shrinking down the Perl install
Please send corrections and additions to
perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year
and month. Changes and additions to the perl5-porters biographies are
I'm back. This week saw the usual just-over-400 messages.
Sean Burke attempted to save the world:
Over the past several years, I have heard nothing but griping from
all quarters about the perpetually underspecified state of perlpod,
when considered as a pod specification. A markup language without a
clear specification simply invites everyone, including
implementors, to have their own shaky idea of what the language
means. And that, along with the general tendency for markup
language parsing to produce write-only code, explains much of the
upsetting current state of Pod::* modules.
And so he did something about it. He wrote two documents, a completely
rewritten perlpod and a new perlpodspec, which clarified the
sense of POD without adding anything much in the way of new features.
The spec in particular is extremely useful for those contemplating
writing POD formatters. Jarkko reported that he'd be including them in
5.8.0, but Russ complained that the translators didn't fully match the
specification. Jarkko's primary concern was cleaning up the horrendous
state of the L<..> tag, which is often abused. Russ picked up
primarily on Sean's request that translators assume the input text to
be UTF-8, although what that actually means is not specified. Sean
clarified what he meant in quite a long message:
First off, my intent is to declare Unicode to be POD's "the
reference character set" (ugh, and I thought I could get out of
this without using SGML jargon), for purposes of resolving
So, for example, if I say E<233>, that is to mean the e-acute
character, because in Unicode, code point 233 is e-acute.
Whether "print ord 233" on your terminal prints an e-acute, an
ess-tsett, a gimel, a "tsu" katakana, a double-dagger, or whether
it lasers a hole thru your monitor's glass, is a whole different
E<233> does NOT mean "simply pass a literal 233 blindly thru to the
formatter". E<233> and its exact synonym E<eacute> both merely mean
"make a reasonable attempt to make an e-acute".
This wandered off into the usual Unicode semantics debate. Philip
Newton threw a brilliantly unexpected spanner in the works:
(Oh, by the way, if someone writes POD on an EBCDIC machine, *all*
of the bytes will have code points > 127 AFAIK if the author sticks
only to letters and numbers.)
Peter Prymmer also chastised the use of the word "ASCII" in the
Sarathy suggested we mandate POD to be in Unicode, and move towards
assuming all Perl code to be in UTF8, and mandating unicode semantics
on things like chr $xwhen [$x] is less than 255. There did not appear
to be anyone forging his posts.
And while we're on the subject of Unicode, Sadahiro Tomoyuki rocks my
world. Oh my. Not only has he produced an alternative and much cleaner
(and more complete) module to handle normalization of Unicode data, he
achieved the near-impossible and implemented the Unicode collation
algorithm detailed in Unicode Technical Report 10. This is a major
bonus, since it allows us to correctly compare and store Unicode
(Incidentally, Nick Ing-Simmons pointed out that "normalize" is
incorrect, according to OED and Fowler. Surprisingly for p5p, this
generated less comment than the technical content of the thread.)
Artur reported on the remaining problems with iThreads:
* Request for threads->kill, to kill a given thread, is this
something we should have? Should we try to do what POSIX does with
all these cancelation points and cancelation callbacks? We know
what mutexes we own so we can make sure they are all canceled. My
belief [is] that this is needed.
On unix this can be done by a signal(?), I don't know about Win32.
* Right now, it seems like the parent interpreter must still stick
around for its children. I think it will have to stay like this
for 5.8 and something that can be fixed for 5.10 with a global SV
arena; right now we use the first interpreter as the store.
* Quitting the main thread; now, this kills all threads (as normal
threading). However this usually comes at bad times. And we don't
get proper cleanup (and segfaults). I think we should wait for all
threads to finish, letting the user kill them.
* lock, unlock, share, are now implmented in threads::shared, You
have to manually unlock(), it is not scope based. I think the
implementation of lock() unlock() should perhaps move into pp_lock
and pp_unlock, share() cond_*() needs new reference prototype.
* Sharing probably needs a new kind of magic, but I think that can
wait until we see sharing working as we want.
* A shared blessed object: should the destructor be called once for
each thread, or in the thread where it actually is destroyed? Is
there a way to catch blessings to magic/tied variables? If not, I
guess we can not rebless shared data structures.
Dan Sugalski suggested that people ought to write shutdown
synchronisation code themselves, to ensure that the shared data
structures are in a sane state. Benjamin Stuhl asked for a detach
method to disown other threads, so that the main interpreter can
shutdown while blissfully ignorant of what's going on elsewhere. Dan
made the very good point that we can provide surprising behaviour
When you come to threaded programming without any experience,
everything is surprising, and lots of things don't make sense.
Inexperienced thread programmers *will* screw themselves up.
Repeatedly. Threads are a very powerful tool, but one with no
guards to speak of.
The discussion got heated soon afterwards, with Artur trying to avoid
coredumps because coredumps are bad, but Dan maintaining that most
things that happen with threads are bad, and coredumps are
occasionally unavoidable - if you exit your interpreter without
shutting down all your threads, that's an abnormal exit and anything
Artur did, however, go ahead with the new shared_sv implementation,
adding two files to core and core functionality for sharing, locking
and correctly refcounting shared SVs. Wow.
Kirrily Robert (Skud) stormed onto the scene with a fantastic first
contribution: she submitted a document on Perl module style
conventions, which was generally pretty well received, with a few
The ensuing thread had some very interesting thoughts on various style
considerations. Benjamin Frantz did a benchmark of different parameter
passing conventions, finding that passing arrays is 200,000ths of a
second slower than passing hashes. So if you need a bit more speed
from your Perl code, turn your arrays into hashes. This massive
efficiency gain didn't impress Michael Schwern, though:
Speed isn't really the issue, clarity of interface is. You don't
want to confuse things in first-time module author docs by bringing
up efficiency arguments. It just confuses things.
This then turned into Yet Another CPAN Argument. Elaine Ashton did
plea for help on behalf of the CPAN testers; if you want to be a hero,
Skud also hinted she's working on a perltesttut. She also tried to
herd date and time module authors onto the datetime list. Not a bad
start. Think you can do better?
Shrinking down the Perl install
Alan Burlison gave out the now-common howl: Perl has doubled in size
from 5.005_03 to 5.6.1. However, he was trying to cram it onto the
Solaris miniroot CD so that people can write Jumpstart scripts in
Perl. Jarkko suggested throwing out pod/, (which Alan had already
done) unicode/ (with a caution that attempting to use Unicode stuff
would make demons fly out of your nose.) the headers from CORE/, plus
a bunch of the libraries.
Dan Sugalski suggested stuffing everything into miniperl, statically
linking in all the extensions.
Dave Mitchell asked what the big Unicode files were; Jarkko explained
that they could be stored compressed or left out completely, since the
information in them is stripped out into the various unicode/*.pl
files. (By the way, unicode/ has been renamed unicore/ so that we can
merge in Unicode::* modules without having to worry about case-folding
filesystems.) The only casualty would be UnicodeCD.
Andy Dougherty pointed to the Debian perl-base package, which is a
meagre 1.2Mb. This is possible by turning off the autoloader in some
of the packages, and chopping down lib/ to the absolute bare
essentials. The really serious cut, of course, would be to just ship
perl, libperl.so and Config.pm.
There was a reasonably big thrad on wild-card expansion on Windows
that I just couldn't follow at all.
Artur asked why perl_run and not perl_destruct called the END blocks;
Sarathy said it was a bug and wanted a patch, which Artur provided.
Jerrad Pierce asked how to get modules that exist in the core and not
on CPAN, such as the fixed-up version of English. Schwern encouraged
him to take the module back to CPAN.
I discovered two old, dust-covered opcodes; Abhijit Menon-Sen breathed
live into one of them, rcatline, which optimizes $a .= . Actually,
Abhijit did all sorts this week, including: fixing the calling of
FIRSTKEY on tied hashes, allowing localised tying on filehandles,
stopping FETCH being called twice on tainted data, and, to defend
accusations that he only fixes bugs without adding new functionality,
added the ever-useful panic operator.
Tony Bowden asked what would happen to the my Dog $sam declaration now
pseudohashes are dead. The responses were: i) pseudohashes aren't dead
yet, they just smell that way, and ii) nothing, since the fields
pragma would still do the trick.
The old more-than-256-files-open-on-Solaris question came up again:
this time, though, we had a sensible answer. PerlIO can be used to get
around the Solaris limitation. Alan explained that there was a section
in perlsolaris about the limitation, which was due to the extreme
backwards compatibility of Solaris harking back to a time before it
could conceive of more than 256 files.
Someone discovered that you can tie a variable with an object. The
utility of this was debated, and Schwern conceded that there'd be no
harm in documenting it. That'd be a nice little small task for
Andy Dougherty dropped in a couple of BSD patches, to the Makefile
generation and the hints file. Paul Johnson made B::Concise recognise
James Duncan allowed BEGIN blocks to be more visible from B. Robin
Houston asked whether or not we should move PL_minus_c from a bool to
a U8 to give us more flexibility; apparently we currently to
(PL_minus_c & 0x10) which, as Robin points out, is a "rather wrong
thing to do to a bool". I wonder when the Perl core was formally
forbidden from doing wrong things.
Until next week I remain, your humble and obedient servant,
This week on Perl5-porters
by Simon Cozens