perl.perl5.porters | Postings from July 2011

Re: Solving the *real* Dot Problem

From: Johan Vromans
Date: July 7, 2011 00:19
Subject: Re: Solving the *real* Dot Problem
Message ID: m2y60ay2yi.fsf@phoenix.squirrel.nl

Tom Christiansen <tchrist@perl.com> writes:

> The Dot Problem will never be solved until people start thinking in
> Unicode not ASCII. Otherwise you’ll “solve” the “wrong” “problem”.

Not quite. I think you had the tiger by the tail one sentence earlier:

> […] let’s please step back and evaluate the original sense of “.” […]

This is what matters. What is the intended purpose of “.”?

Originally, the intention was to be able to match ‘lines’ in a blob of
data slurped from a disk file. Files at the time were newline-separated
streams of single-byte characters, so “.” matched any byte except \x0a
(newline). That this assumption would not hold in the longer term became
apparent when Windows, Mac, VMS and EBCDIC files came into
consideration.
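
To make the breakage concrete, here is a minimal sketch. It uses Python's
re module purely as a stand-in, since its default “.” follows the same
classical convention of matching anything except \x0a; the helper name
classical_lines is made up for this illustration.

```python
import re

def classical_lines(blob):
    """Return the non-empty 'lines' that the classical "." can see.

    Without the DOTALL flag, "." matches any character except "\n",
    so ".+" stops only at a line feed. Empty lines are skipped.
    """
    return re.findall(r".+", blob)

unix_text = "one\ntwo\n"
windows_text = "one\r\ntwo\r\n"
classic_mac_text = "one\rtwo\r"

print(classical_lines(unix_text))         # ['one', 'two']
print(classical_lines(windows_text))      # ['one\r', 'two\r']
print(classical_lines(classic_mac_text))  # ['one\rtwo\r']
```

Only the Unix-style input yields the intended lines. The CRLF input keeps
a stray \r on every line, and the CR-only input is seen as one long line.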

Should we want our new “.” to acquire the originally intended meaning,
we first must decide what makes up lines in files in the modern Unicode
world.
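
As a rough illustration of how open that question is, the sketch below
compares two answers that already coexist in one language. Python is
used only as an example here: its regex “.” excludes just \n, while
str.splitlines() also treats characters such as U+0085 (NEL), U+2028
(LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) as line boundaries.
Neither choice is claimed to be the right one; the sample string is
invented for the demonstration.

```python
import re

# Sample text whose "lines" are separated by Unicode-specific characters
# only: LINE SEPARATOR, PARAGRAPH SEPARATOR and NEXT LINE (NEL).
text = "alpha\u2028beta\u2029gamma\x85delta"

# Answer 1: the classical "." (anything except "\n") sees a single line.
dot_lines = re.findall(r".+", text)

# Answer 2: str.splitlines() uses a wider set of line boundaries.
split_lines = text.splitlines()

print(len(dot_lines))   # 1
print(split_lines)      # ['alpha', 'beta', 'gamma', 'delta']
```

The same data is one line under the first definition and four under the
second, so a “.” meant to match "within a line" depends entirely on
which definition of a line is chosen.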

The same applies to many other ‘classical’ regex symbols.

However, in my personal opinion, Perl should not try to shoehorn Unicode
matching into the classical, too primitive regular expression patterns,
but should instead provide a revolutionary new, sane matching facility.

Perl6 rules/patterns may be the solution, but I fear even they are too
strongly founded on ancient roots.[1]

-- Johan

[1] See, e.g., the Perl6 Regex FAQ
(https://github.com/perlpilot/perl6-docs/blob/master/intro/p6-regex-intro.pod)

Within the first page it describes “.” as matching any character (well,
this could still be OK). Then \t matches “a tab character” (a tab? any
tab?) and, shudder, [ a..d ], which “matches one of "a", "b", "c", or "d".”
