perl.perl5.porters | Postings from July 2011

Re: Solving the *real* Dot Problem (was: Is 5.16 the time to remove \N, the complement of \n, from being experimental?)

From: Abigail
Date: July 7, 2011 02:50
Subject: Re: Solving the *real* Dot Problem (was: Is 5.16 the time to remove \N, the complement of \n, from being experimental?)
Message ID: 20110707095158.GB4957@almanda
On Wed, Jul 06, 2011 at 07:33:17PM -0600, Tom Christiansen wrote:
> THESIS:  Perl’s /./ is fundamentally broken, be it (?s:.) or (?-s:.).
>          It’s long past time we fast‐forward a few decades in how we 
>          think about all this.
> 
> SUMMARY: Perl needs to stop making it so easy to do the wrong thing here,
>          and instead start making it easy to do the right thing.  Let’s 
>          stop wasting time/brain/etc diddling around with a 1980s‐style
>          ASCII solution in our Unicode world of the 2010s and beyond!


While Unicode data is possible, almost all the data I apply regexes to
is ASCII. I use /./ all the time, and for me, it just works. Where it
doesn't, /(?s:.)/ does. /./ and /(?s:.)/ even work fine if I have mostly
ASCII data with some Unicode characters or words thrown in.

As for full-blown Unicode, the kind where /./ or /(?s:.)/ won't work,
I have yet to need to parse it.

IMO, having /./ match anything but a newline (the default value of $/)
is extremely handy and useful.

> 
> Zsbán Ambrus <ambrus@math.bme.hu> wrote on Wed, 06 Jul 2011 20:58:57 +0200: 
> 
> > On Wed, Jul 6, 2011 at 1:48 AM, Jesse Vincent <jesse@fsck.com> wrote:
> 
> > [On the new \N regex escape that matches any one character except \n.]
> 
> >> Is it being used? (Are folks cpanning modules that use it?)
> 
> > It may get more use once perl 5.14 spreads, because there you can 'use
> > re "/s";' to make the dot have the more useful meaning and then \N has
> > the occasionally useful meaning.  Further, if 'use 5.016;' enabled
> > 'use re "/s";' by default, it would see even more use.
> 
> Upgrading the status of \N from experimental to something more solid is
> a timely and necessary, but sadly insufficient, step toward solving the
> Dot Problem.  Diddling around with . and \N and such ignores the *real*
> issue: that those are ASCII thingamaboogers — but Perl needs Unicode ones.
> 
> 
> By the Dot Problem, I mean a regex metacharacter matching just “one” of 
> “anything”, for a broad sense of anything but a narrow sense of one.
> 
>  {  NB: I am not referring to a literal FULL STOP nor its 3 other NFKD or
>         \p{SB=AT} aliases, let alone the \p{SB=ST} stuff.   Use NFKD eq “.”
>         or /\p{SB=AT}/ if that’s the sort of literal dot you want.  }
> 
> Here are 5 possible meanings for dot.  I start with the original and *LEAST
> USEFUL OF ALL POSSIBLE MEANINGS*, and progress to the most useful ones, the
> ones that I think people should usually be using these days:
> 
>     1 = no  re /s       (traditional and annoying)
>     2 = use re /s       (necessary but insufficient)
>     3 = \V              (improved #1)
>     4 = \X              (improved #2)
>     5 = \X unless \R    (improved #2, #3)
> 
> See?  How often do you guys write the *wrong* one of those?  

Never.
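For readers following the thread, here is a minimal Perl sketch of how the five candidate meanings in the quoted list differ on a single string. It is not code from either poster. It assumes a perl of 5.14 or later, and it reads meaning #5 ("\X unless \R") as the lookahead (?!\R)\X, which is an interpretation rather than something the thread spells out. The sample string and the helper count_matches are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: "e" + U+0301 COMBINING ACUTE ACCENT, then CR LF, then "x".
# That is 5 code points but only 3 user-perceived units (e-acute, CRLF, x).
my $s = "e\x{301}\r\nx";

# Hypothetical helper: count how many times a pattern matches across a string.
sub count_matches {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;    # list-context match, counted in scalar context
    return $n;
}

my @meanings = (
    [ '1 = no re /s'     => qr/./        ],  # any code point except \n
    [ '2 = use re /s'    => qr/./s       ],  # any code point at all
    [ '3 = \V'           => qr/\V/       ],  # any code point that is not vertical whitespace
    [ '4 = \X'           => qr/\X/       ],  # one extended grapheme cluster; CRLF is one cluster
    [ '5 = \X unless \R' => qr/(?!\R)\X/ ],  # assumed reading: a cluster that is not a linebreak
);

for my $m (@meanings) {
    my ($label, $re) = @$m;
    printf "%-18s %d\n", $label, count_matches($s, $re);
}
```

On this string the five patterns should report 4, 5, 3, 3 and 2 matches respectively: only \X keeps the accented "e" together as one unit, and only the fifth form also refuses to treat the CRLF linebreak as a "character".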



Abigail



nntp.perl.org: Perl Programming lists via nntp and http.