develooper Front page | perl.perl5.porters | Postings from August 2001

This week on Perl5-porters

From:
Simon Cozens
Date:
August 15, 2001 10:28
Subject:
This week on Perl5-porters
Message ID:
20010815182330.A13848@deep-dark-truthful-mirror
[Apologies for the unannounced leave of absence: the conference had
 me very busy. - SC]

The HTML version will be up in a few hours (if it isn't already) at
        http://www.perl.com/pub/2001/08/p5pdigest/THISWEEK-20010815.html
 
The current report is always available at
        http://www.perl.com/p5pdigest.cgi  
 
An RSS file of this week's report will be available at
        http://www.perl.com/pub/2001/08/p5pdigest/THISWEEK-20010815.rss

The current RSS file is always available at
        http://www.perl.com/p5pdigest.cgi?s=rdf

The archive of past reports:
        http://www.perl.com/pub/q/archivep5p

To subscribe to this digest, send an empty message to
        perl5-porters-digest-subscribe@netthink.co.uk

To unsubscribe:
        perl5-porters-digest-unsubscribe@netthink.co.uk

For help:
        simon@brecon.co.uk

----------------------------------------------------------------

  This week on perl5-porters (8 -- 15 August 2001)
  
     * Notes
     * POD Specification
     * Unicode Normalisation
     * Threading Semantics
     * perlmodstyle
     * Shrinking down the Perl install
     * Various

  Notes
  
   Please send corrections and additions to
   perl-thisweek-YYYYMM@simon-cozens.org where YYYYMM is the current year
   and month. Changes and additions to the perl5-porters biographies are
   particularly welcome.
   
   I'm back. This week saw the usual just-over-400 messages.
   
  POD Specification
  
   Sean Burke attempted to save the world:
   
     Over the past several years, I have heard nothing but griping from
     all quarters about the perpetually underspecified state of perlpod,
     when considered as a pod specification. A markup language without a
     clear specification simply invites everyone, including
     implementors, to have their own shaky idea of what the language
     means. And that, along with the general tendency for markup
     language parsing to produce write-only code, explains much of the
     upsetting current state of Pod::* modules.
     
   And so he did something about it. He wrote two documents, a completely
   rewritten [1]perlpod and a new [2]perlpodspec, which clarified the
   sense of POD without adding anything much in the way of new features.
   The spec in particular is extremely useful for those contemplating
   writing POD formatters. Jarkko reported that he'd be including them in
   5.8.0, but Russ complained that the translators didn't fully match the
   specification. Jarkko's primary concern was cleaning up the horrendous
   state of the L<..> tag, which is often abused. Russ picked up
   primarily on Sean's request that translators assume the input text to
   be UTF-8, although what that actually means is not specified. Sean
   clarified what he meant in quite a [3]long message:
   
     First off, my intent is to declare Unicode to be POD's "the
     reference character set" (ugh, and I thought I could get out of
     this without using SGML jargon), for purposes of resolving
     E<number> sequences.
     
     So, for example, if I say E<233>, that is to mean the e-acute
     character, because in Unicode, code point 233 is e-acute.
     
     Whether "print ord 233" on your terminal prints an e-acute, an
     ess-tsett, a gimel, a "tsu" katakana, a double-dagger, or whether
     it lasers a hole thru your monitor's glass, is a whole different
     problem.
     
     E<233> does NOT mean "simply pass a literal 233 blindly thru to the
     formatter". E<233> and its exact synonym E<eacute> both merely mean
     "make a reasonable attempt to make an e-acute".
     
   This wandered off into the usual Unicode semantics debate. Philip
   Newton threw a brilliantly unexpected spanner in the works:
   
     (Oh, by the way, if someone writes POD on an EBCDIC machine, *all*
     of the bytes will have code points > 127 AFAIK if the author sticks
     only to letters and numbers.)
     
   Peter Prymmer also chastised the use of the word "ASCII" in the
   specification.
   
   Sarathy suggested we mandate POD to be in Unicode, and move towards
   assuming all Perl code to be in UTF8, and mandating unicode semantics
   on things like chr $xwhen [$x] is less than 255. There did not appear
   to be anyone forging his posts.
   
  Unicode Normalisation
  
   And while we're on the subject of Unicode, Sadahiro Tomoyuki rocks my
   world. Oh my. Not only has he produced an alternative and much cleaner
   (and more complete) module to handle normalization of Unicode data, he
   achieved the near-impossible and implemented the Unicode collation
   algorithm detailed in [4]Unicode Technical Report 10. This is a major
   bonus, since it allows us to correctly compare and store Unicode
   strings.
   
   (Incidentally, Nick Ing-Simmons pointed out that "normalize" is
   incorrect, according to OED and Fowler. Surprisingly for p5p, this
   generated less comment than the technical content of the thread.)
   
  Threading Semantics
  
   Artur reported on the remaining problems with iThreads:
   
     * Request for threads->kill, to kill a given thread, is this
       something we should have? Should we try to do what POSIX does with
       all these cancelation points and cancelation callbacks? We know
       what mutexes we own so we can make sure they are all canceled. My
       belief [is] that this is needed.
       On unix this can be done by a signal(?), I don't know about Win32.
     * Right now, it seems like the parent interpreter must still stick
       around for its children. I think it will have to stay like this
       for 5.8 and something that can be fixed for 5.10 with a global SV
       arena; right now we use the first interpreter as the store.
     * Quitting the main thread; now, this kills all threads (as normal
       threading). However this usually comes at bad times. And we don't
       get proper cleanup (and segfaults). I think we should wait for all
       threads to finish, letting the user kill them.
     * lock, unlock, share, are now implmented in threads::shared, You
       have to manually unlock(), it is not scope based. I think the
       implementation of lock() unlock() should perhaps move into pp_lock
       and pp_unlock, share() cond_*() needs new reference prototype.
     * Sharing probably needs a new kind of magic, but I think that can
       wait until we see sharing working as we want.
     * A shared blessed object: should the destructor be called once for
       each thread, or in the thread where it actually is destroyed? Is
       there a way to catch blessings to magic/tied variables? If not, I
       guess we can not rebless shared data structures.
       
   Dan Sugalski suggested that people ought to write shutdown
   synchronisation code themselves, to ensure that the shared data
   structures are in a sane state. Benjamin Stuhl asked for a detach
   method to disown other threads, so that the main interpreter can
   shutdown while blissfully ignorant of what's going on elsewhere. Dan
   made the very good point that we can provide surprising behaviour
   because:
   
     When you come to threaded programming without any experience,
     everything is surprising, and lots of things don't make sense.
     Inexperienced thread programmers *will* screw themselves up.
     Repeatedly. Threads are a very powerful tool, but one with no
     guards to speak of.
     
   The discussion got heated soon afterwards, with Artur trying to avoid
   coredumps because coredumps are bad, but Dan maintaining that most
   things that happen with threads are bad, and coredumps are
   occasionally unavoidable - if you exit your interpreter without
   shutting down all your threads, that's an abnormal exit and anything
   could happen.
   
   Artur did, however, go ahead with the new shared_sv implementation,
   adding two files to core and core functionality for sharing, locking
   and correctly refcounting shared SVs. Wow.
   
  perlmodstyle
  
   Kirrily Robert (Skud) stormed onto the scene with a fantastic first
   contribution: she submitted a document on [5]Perl module style
   conventions, which was generally pretty well received, with a few
   minor nits.
   
   The ensuing thread had some very interesting thoughts on various style
   considerations. Benjamin Frantz did a benchmark of different parameter
   passing conventions, finding that passing arrays is 200,000ths of a
   second slower than passing hashes. So if you need a bit more speed
   from your Perl code, turn your arrays into hashes. This massive
   efficiency gain didn't impress Michael Schwern, though:
   
     Speed isn't really the issue, clarity of interface is. You don't
     want to confuse things in first-time module author docs by bringing
     up efficiency arguments. It just confuses things.
     
   This then turned into Yet Another CPAN Argument. Elaine Ashton did
   plea for help on behalf of the CPAN testers; if you want to be a hero,
   see [6]testers.cpan.org.
   
   Skud also hinted she's working on a perltesttut. She also tried to
   herd date and time module authors onto the [7]datetime list. Not a bad
   start. Think you can do better?
   
  Shrinking down the Perl install
  
   Alan Burlison gave out the now-common howl: Perl has doubled in size
   from 5.005_03 to 5.6.1. However, he was trying to cram it onto the
   Solaris miniroot CD so that people can write Jumpstart scripts in
   Perl. Jarkko suggested throwing out pod/, (which Alan had already
   done) unicode/ (with a caution that attempting to use Unicode stuff
   would make demons fly out of your nose.) the headers from CORE/, plus
   a bunch of the libraries.
   
   Dan Sugalski suggested stuffing everything into miniperl, statically
   linking in all the extensions.
   
   Dave Mitchell asked what the big Unicode files were; Jarkko explained
   that they could be stored compressed or left out completely, since the
   information in them is stripped out into the various unicode/*.pl
   files. (By the way, unicode/ has been renamed unicore/ so that we can
   merge in Unicode::* modules without having to worry about case-folding
   filesystems.) The only casualty would be UnicodeCD.
   
   Andy Dougherty pointed to the Debian perl-base package, which is a
   meagre 1.2Mb. This is possible by turning off the autoloader in some
   of the packages, and chopping down lib/ to the absolute bare
   essentials. The really serious cut, of course, would be to just ship
   perl, libperl.so and Config.pm.
   
  Various
  
   There was a reasonably big thrad on wild-card expansion on Windows
   that I just couldn't follow at all.
   
   Artur asked why perl_run and not perl_destruct called the END blocks;
   Sarathy said it was a bug and wanted a patch, which Artur provided.
   
   Jerrad Pierce asked how to get modules that exist in the core and not
   on CPAN, such as the fixed-up version of English. Schwern encouraged
   him to take the module back to CPAN.
   
   I discovered two old, dust-covered opcodes; Abhijit Menon-Sen breathed
   live into one of them, rcatline, which optimizes $a .= . Actually,
   Abhijit did all sorts this week, including: fixing the calling of
   FIRSTKEY on tied hashes, allowing localised tying on filehandles,
   stopping FETCH being called twice on tainted data, and, to defend
   accusations that he only fixes bugs without adding new functionality,
   added the ever-useful panic operator.
   
   Tony Bowden asked what would happen to the my Dog $sam declaration now
   pseudohashes are dead. The responses were: i) pseudohashes aren't dead
   yet, they just smell that way, and ii) nothing, since the fields
   pragma would still do the trick.
   
   The old more-than-256-files-open-on-Solaris question came up again:
   this time, though, we had a sensible answer. PerlIO can be used to get
   around the Solaris limitation. Alan explained that there was a section
   in perlsolaris about the limitation, which was due to the extreme
   backwards compatibility of Solaris harking back to a time before it
   could conceive of more than 256 files.
   
   Someone discovered that you can tie a variable with an object. The
   utility of this was debated, and Schwern conceded that there'd be no
   harm in documenting it. That'd be a nice little small task for
   someone.
   
   Andy Dougherty dropped in a couple of BSD patches, to the Makefile
   generation and the hints file. Paul Johnson made B::Concise recognise
   padops.
   
   James Duncan allowed BEGIN blocks to be more visible from B. Robin
   Houston asked whether or not we should move PL_minus_c from a bool to
   a U8 to give us more flexibility; apparently we currently to
   (PL_minus_c & 0x10) which, as Robin points out, is a "rather wrong
   thing to do to a bool". I wonder when the Perl core was formally
   forbidden from doing wrong things.
   
   Until next week I remain, your humble and obedient servant,
     _________________________________________________________________
   
   Simon Cozens

References

   1. http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-08/msg00614.html
   2. http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-08/msg00615.html
   3. http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-08/msg00668.html
   4. http://www.unicode.org/unicode/reports/tr10/
   5. http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-08/msg00530.html
   6. http://testers.cpan.org/
   7. mailto:datetime-subscribe@perl.org



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About