[PATCH] code FIXES ++ docs ++ NEW test cases for Text::{Tabs,Wrap}

From: Tom Christiansen
Date: April 17, 2009 16:58
Subject: [PATCH] code FIXES ++ docs ++ NEW test cases for Text::{Tabs,Wrap}
Message ID: 18532.1240012586@chthon

Please find attached below a tarball containing:

    lib/Text/TabsWrap/CHANGELOG
    lib/Text/Tabs.pm
    lib/Text/Wrap.pm
    lib/Text/TabsWrap/t/Tabs-ElCid.t [4801 bytes]
    lib/Text/TabsWrap/t/Wrap-JLB.t   [5553 bytes]

Here's what I did:

 0. started from a baseline of this morning's git pull and build,
    which gave v5.11.0 (GitLive-blead-887-g0a8c518*)

 1. fixed them so they do the right thing with combining characters

 2. documented these changes

 3. made sure they ran correctly on both v5.10 and v5.11

 4. wrote two new and substantive test suites for the new features
    (and just because I get bored with random data in test suites,
     the two test sets I wrote use famous passages from literature
     to work on)

 5. verified that these as well as all previous test units for those
    modules still ran correctly after my updates
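
To make step 1 concrete, here is a minimal sketch of tab expansion that measures the text before each tab in grapheme clusters (\X) instead of code points. It is not the code from the patch: expand_graphemes is a hypothetical helper, and it assumes every grapheme cluster occupies exactly one column, which is not true of wide characters.

```perl
use strict;
use warnings;

my $tabstop = 8;    # same default as $Text::Tabs::tabstop

# Hypothetical helper, NOT the code from the patch: expand the tabs in
# one line, measuring the text already emitted in grapheme clusters
# (\X) rather than in code points.
sub expand_graphemes {
    my ($line) = @_;
    my $out = '';
    for my $piece (split /(\t)/, $line) {
        if ($piece eq "\t") {
            my $width = () = $out =~ /\X/g;    # columns used so far
            $out .= ' ' x ($tabstop - $width % $tabstop);
        }
        else {
            $out .= $piece;
        }
    }
    return $out;
}

# "résumé" with combining acutes: 8 code points but only 6 columns.
my $expanded = expand_graphemes("re\x{301}sume\x{301}\tx");
my $cols     = () = $expanded =~ /\X/g;
print "x lands after ", $cols - 1, " columns\n";
```

Counting with length() instead would see 8 code points, decide the text already sits on a tab stop, and pad with 8 more spaces; counting with \X pads with 2, which is what the eye expects.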

I also discovered that iTerm on a Mac is hopeless for combining
characters, but that their own Terminal is fine with these.

Most curiously, xterm(1) may--or may not--work.

For example, under "XTerm/OpenBSD(234)", it does the right thing with
combining characters.  However, MacOS 10.5.6's xterm, which identifies
itself as "X.Org 6.8.99.903(235)", fails when fed identical arguments,
font included, and I've no idea why.

I did this all so I could fix Damian's Text::Autoformat accordingly.

But because Text::Autoformat uses Text::Tabs, I had to do that first.

Then I figured I shouldn't leave the TabsWrap pair that's included
in the standard perl distribution only half-done.

So I didn't.

I do *not* accept the argument that CORE utility modules are allowed to and
even expected to malfunction most incompetently when fed Unicode data, that
some specialized off-CPAN module need be called instead--if it even exists.

That's a lame cop-out from yesteryear, and it's just not right.

Too many programs out there say /./ or /[^\n]/ when they need to be saying
\X.  Then there's the nasty length() problem, not to mention pos() etc.
This is not altogether easy to deal with, and it *ought* to be.  Right now,
it's rather hard for trivial yet essential operations.  Easy things should
be easy: and I'm telling you that this one isn't--yet.
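
A minimal sketch of the mismatch, using an illustrative string that is not taken from the patch or its test suites: "café" spelled with a combining acute accent is five code points but only four user-visible characters, and length(), /./, and \X do not agree about it.

```perl
use strict;
use warnings;

# "café" with a combining acute accent (U+0301) after the "e"
my $word = "cafe\x{301}";

my $codepoints = length $word;           # counts code points
my $dots       = () = $word =~ /./g;     # /./ also steps one code point at a time
my $graphemes  = () = $word =~ /\X/g;    # \X steps one grapheme cluster at a time

print "length: $codepoints, /./: $dots, /\\X/: $graphemes\n";
```

Any column arithmetic built on the first two numbers is off by one for every combining mark in the line.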

I didn't add \p{HYPHEN} or \p{DASH} etc to Text::Wrap, because I'm saving
the real work for Damian's module.  But those should be there, you know.
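
As a rough illustration of what the dash property could buy a wrapper, the sketch below splits a string at the zero-width positions that follow any character with the Unicode Dash property, when a letter comes next. This is an assumption about how such a break rule might look, not anything Text::Wrap does, and only \p{Dash} is shown.

```perl
use strict;
use warnings;

# U+2010 HYPHEN and U+2013 EN DASH both carry the Dash property,
# as does the plain ASCII hyphen-minus.
my $text = "a well\x{2010}known long\x{2013}term fix";

# Candidate break points: just after a dash, just before a letter.
my @pieces = split /(?<=\p{Dash})(?=\pL)/, $text;

print scalar(@pieces), " pieces\n";
```

A pattern written as /-/ would find neither of those break points.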

We're behind, but now a little bit less so with this update.  It reminds
me of when I had to fix all the filename modules that were misusing /./ and 
/$/.  This is something of the same class of trouble, and so I'm sure many 
other Text modules need similar modification to bring them to the Millennium.
Probably regexp ones, too.

Please see both the code and also the test suites enclosed below to better
understand what I'm talking about.  I didn't use /(?=\PC)\X/ even though
I wanted to.  BTW, running /(?=\pL)\X/ (if appropriate) is about a 10x
speed-up over large corpora compared with /\X/.  Isn't that interesting?
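
For anyone who wants to check that figure, here is a minimal sketch using the core Benchmark module. The corpus is made up for illustration, and the ratio will depend heavily on the perl version and the data, so no particular result is promised. Note that the two patterns do not match the same things: the guarded one skips every cluster that does not begin with a letter, which is why it only applies "if appropriate".

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up corpus: "résumé naïve" with combining marks, repeated.
my $corpus = join ' ', ("re\x{301}sume\x{301} nai\x{308}ve") x 5_000;

# The guarded pattern only matches clusters that start with a letter,
# so it finds fewer matches than bare \X (no spaces, for one thing).
my $plain   = () = $corpus =~ /\X/g;
my $guarded = () = $corpus =~ /(?=\pL)\X/g;
print "plain: $plain matches, guarded: $guarded matches\n";

# Run each pattern for at least one CPU second and compare rates.
cmpthese(-1, {
    plain   => sub { my $n = () = $corpus =~ /\X/g },
    guarded => sub { my $n = () = $corpus =~ /(?=\pL)\X/g },
});
```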

Enjoy.

--tom
-- 

		    "Patches speak louder than words."




