Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400

From: Tom Christiansen
Date: November 12, 2008 17:44
Subject: Re: PATCH [perl #59342] chr(0400) =~ /\400/ fails for >= 400
Message ID: 12210.1226540605@chthon

Glenn Linderman <perl@NevCal.com> wrote:

| On approximately 11/12/2008 7:03 AM, came the following characters
| from the keyboard of Rafael Garcia-Suarez:

 ««  However, I see some value in still allowing [\000-\377] character  »» 
 ««  ranges, for example. Do we really want to deprecate that as well?  »» 
 ««  This doesn't seem necessary.                                       »» 

| The [below] items could be added to the language immediately, during the
| deprecation cycle for \nnn octal notation, giving people an extremely
| simple way to convert their octal constants: inside of strings/regices,
| insert o after \ and wrap the digits with {}; outside of strings/regices,
| insert o after leading 0.

I have been obliged to change my Perl code at three and *only* three 
instances over the last TWENTY-ONE YEARS:

    1) When log() became a keyword, the inverse of exp().  If parens 
       had been mandatory on function calls of no arguments, a very wise
       practice, this wouldn't have been a problem.  This was in 1988,
       or perhaps 1989.

    2) When perl5 made arrays interpolate in "@strings" unconditionally.
       This was in 1994.  This was the right thing to do.

    3) When perl5.010 finally blew away $* (and $#). This too was the
       right thing to do.  This was early this year, 2008, and it
       was in the following singleton program, written before /m existed:

        #!/usr/local/bin/perl
        $/ = '';
        while (<>) {
            #$* = 1;
            s/^-- ?$//m if eof;
            s/^[-+]{2}\w+$//m if eof;
            next unless split(/\n/);
            $max = 0;
            #$* = 0;
            for (@_) {
                1 while s/\t+/' 'x (length($&) * 8 - length($`) % 8)/e;
                $max = ($max > length) ? $max : length;
            }
            $edge = "+" . "-" x ($max+2) . "+\n";
            print $edge;
            for (@_) { printf "| %-${max}s |\n", $_; }
            print $edge, "\n";
        }

I find the notion of rendering illegal the existing octal syntax of "\33"
to be an *EXTRAÖRDINARILY* bad idea, a position I am prepared to defend at
laborious length--and, if necessary, appeal to the Decider-in-Chief, who's
always done everything possible to *NOT* break others' code without *VERY*
*STRONG* reason.  I submit that that very high bar has *NOT* been met; far
from it.  I'm rather hoping I shan't have to do any of that, but I certainly
shall if I must.

There's no reason at all to delete it: because regexes have \g{1} now, and
strings need never be written "\333" if you mean "\33" . "3".
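
A minimal sketch of both points, assuming perl 5.10 or later for \g{1}; the
variable names are illustrative only and are not from the original post:

```perl
use strict;
use warnings;

# In a pattern, \g{1} is always a backreference to group 1, so a digit
# that follows the closing brace cannot be absorbed into the escape.
my $doubled_then_zero = qr/(a)\g{1}0/;
my $matches = "aa0" =~ $doubled_then_zero ? 1 : 0;

# In a double-quoted string, "\333" is the single character chr(0333).
# ESC followed by a literal "3" can be spelled by splitting the string.
my $one_char  = "\333";
my $two_chars = "\33" . "3";

printf "pattern matched: %d\n", $matches;
printf "length of one-escape string: %d\n", length $one_char;
printf "length of split string:      %d\n", length $two_chars;
```

The braces in the pattern and the concatenation in the string each make the
boundary of the escape explicit, which is all the disambiguation needs.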

There is GREAT reason *not* to delete it, as the quantity of code you would
see casually rendered illegal is incomprehensibly large, with the work
involved in updating code, databases, config files, and educating
programmers and users incalculably great.  To add insult to injury, this 
work you would see thrust upon others, not taken on yourself.

There is nothing fundamentally broken here, as there was for $*. This is
trying to create a language where it is impossible to "think bad thoughts".
One cannot succeed at that.

| I personally see no value in octal notation now that Unicode uses hex,
    ^^^^^^^^^^                                                 ^^^^^^^^
Good to see the prefatory warning that this is your *personal* view. :-)
                                                               vvvvvvvv
As for "Unicode using hex", me, I've always thought of it as using bits.
Rather, I think of the various standards specifying code points in the
U+XXXXXX notation to mean code point at that hexadecimal number.  Not
the same thing at all.  That's why I always write

    sub uchar(_) { pack( "U*", shift() ) }

because that way all of these 

    say "chr $_ is " => uchar for 181,  223, 231, 240, 241, 254;
    say "chr $_ is " => uchar for 0265,0337,0347,0360,0361,0376;
    say "chr $_ is " => uchar for 0xb5,0xdf,0xe7,0xf0,0xf1,0xfe;
    say "chr $_ is " => uchar for 0b10110101,0b11011111,0b11100111,
                                  0b11110000,0b11110001,0b11111110;

correctly say:

    chr 181 is µ
    chr 223 is ß
    chr 231 is ç
    chr 240 is ð
    chr 241 is ñ
    chr 254 is þ

and similarly

    say "uc ", uchar, " is ", uc uchar 
        for 181, 0xDF, 0347, 3*2**4*5, 0361, 0b11111110;

says 

    uc µ is Μ
    uc ß is SS
    uc ç is Ç
    uc ð is Ð
    uc ñ is Ñ
    uc þ is Þ

Because I'd be really annoyed if 

    sub uchar(_) { pack( "U*", hex shift() ) }
    say "chr $_ has ord " => ord uchar for 181, 223, 231, 240, 241, 254;

were giving me answers like:

    chr 181 has ord 385
    chr 223 has ord 547
    chr 231 has ord 561
    chr 240 has ord 576
    chr 241 has ord 577
    chr 254 has ord 596
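
The distinction can be sketched as follows: the hex conversion belongs only
where a U+XXXX label arrives as text, while a numeric literal is already the
integer, whatever base spelled it.  The helper codepoint_from_label is
hypothetical, introduced here purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): turn a "U+XXXX"
# text label into the integer code point it names.
sub codepoint_from_label {
    my ($label) = @_;
    $label =~ /\AU\+([0-9A-Fa-f]{4,6})\z/
        or die "not a U+XXXX label: $label";
    return hex $1;
}

my $cp = codepoint_from_label("U+00B5");

# The same integer, whichever literal notation spells it.
my @same = (181, 0265, 0xb5, 0b10110101);

printf "U+00B5 names code point %d\n", $cp;
printf "%d %d %d %d\n", @same;
```

Every value printed is 181; only the label string ever needed hex().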

| and most programmers are familiar with it.   [···] I daresay that hex
| is about the second thing most programmers learn, these days.  "This
| is a computer... this is hexadecimal numbering system... there are
| lots of computer languages..."

Hm, ok. If you say so.  Hadn't noticed it myself.

| Another approach would be to change the escape from \nnn to
| \o{nnnnn...} [···] The {} provide explicit delimiters, so octal
| numbers could then achieve parity with hex in the range of numbers
| available. If people think octal is still worth supporting, this looks
| like a better syntax to support it wholeheartedly.

That's not needed, unless you really want to promote octal for 
Unicode strings.  In a pattern, \g{1} now handles the situation
you're talking about.  For DQ-strings, one can always avoid it.

Type "man ascii"; note that the table given first is octal.

| Python 3.0 has moved to 0onnnnn for its octal integers (zero oh digit-
| sequence) after concluding that leading zeros alone are just too
| problematical, so the "o" indicator has a precedent (albeit recent) in
| addition to reasonably intuitively meaning octal to anyone that
| understands the hexadecimal notation and has ever heard of octal. The
| 0o syntax could also be added to Perl integer constants outside of
| strings/regices.

My only trouble with the 0o notation is on fonts without cross 0's,
and its gratuitous superfluousness.

--tom

--

    +------------------------------------------------------------+
    | SINGULAR                                PLURAL             |
    +-------------+----------------------------------------------+
    | NOMINATIVE  |   magnus rex              magni    reges     |
    | VOCATIVE    |   magne  rex              magni    reges     |
    | GENITIVE    |   magni  regis            magnorum regum     |
    | ACCUSATIVE  |   magnum regem            magnos   reges     |
    | DATIVE      |   magno  regi             magnis   regibus   |
    | ABLATIVE    |   magno  rege             magnis   regibus   |
    | LOCATIVE    |   magni  regi (or rege)   magnis   regibus   |
    +-------------+----------------------------------------------+

% man ascii

ASCII(7)                   OpenBSD Reference Manual                   ASCII(7)

NAME
     ascii - octal, hexadecimal and decimal ASCII character sets

DESCRIPTION
     The octal set:

     000 nul  001 soh  002 stx  003 etx  004 eot  005 enq  006 ack  007 bel
     010 bs   011 ht   012 nl   013 vt   014 np   015 cr   016 so   017 si
     020 dle  021 dc1  022 dc2  023 dc3  024 dc4  025 nak  026 syn  027 etb
     030 can  031 em   032 sub  033 esc  034 fs   035 gs   036 rs   037 us
     040 sp   041  !   042  "   043  #   044  $   045  %   046  &   047  '
     050  (   051  )   052  *   053  +   054  ,   055  -   056  .   057  /
     060  0   061  1   062  2   063  3   064  4   065  5   066  6   067  7
     070  8   071  9   072  :   073  ;   074  <   075  =   076  >   077  ?
     100  @   101  A   102  B   103  C   104  D   105  E   106  F   107  G
     110  H   111  I   112  J   113  K   114  L   115  M   116  N   117  O
     120  P   121  Q   122  R   123  S   124  T   125  U   126  V   127  W
     130  X   131  Y   132  Z   133  [   134  \   135  ]   136  ^   137  _
     140  `   141  a   142  b   143  c   144  d   145  e   146  f   147  g
     150  h   151  i   152  j   153  k   154  l   155  m   156  n   157  o
     160  p   161  q   162  r   163  s   164  t   165  u   166  v   167  w
     170  x   171  y   172  z   173  {   174  |   175  }   176  ~   177 del

     The hexadecimal set:

     00 nul   01 soh   02 stx   03 etx   04 eot   05 enq   06 ack   07 bel
     08 bs    09 ht    0a nl    0b vt    0c np    0d cr    0e so    0f si
     10 dle   11 dc1   12 dc2   13 dc3   14 dc4   15 nak   16 syn   17 etb
     18 can   19 em    1a sub   1b esc   1c fs    1d gs    1e rs    1f us
     20 sp    21  !    22  "    23  #    24  $    25  %    26  &    27  '
     28  (    29  )    2a  *    2b  +    2c  ,    2d  -    2e  .    2f  /
     30  0    31  1    32  2    33  3    34  4    35  5    36  6    37  7
     38  8    39  9    3a  :    3b  ;    3c  <    3d  =    3e  >    3f  ?
     40  @    41  A    42  B    43  C    44  D    45  E    46  F    47  G
     48  H    49  I    4a  J    4b  K    4c  L    4d  M    4e  N    4f  O
     50  P    51  Q    52  R    53  S    54  T    55  U    56  V    57  W
     58  X    59  Y    5a  Z    5b  [    5c  \    5d  ]    5e  ^    5f  _
     60  `    61  a    62  b    63  c    64  d    65  e    66  f    67  g
     68  h    69  i    6a  j    6b  k    6c  l    6d  m    6e  n    6f  o
     70  p    71  q    72  r    73  s    74  t    75  u    76  v    77  w
     78  x    79  y    7a  z    7b  {    7c  |    7d  }    7e  ~    7f del

     The decimal set:

       0 nul    1 soh    2 stx    3 etx    4 eot    5 enq    6 ack    7 bel
       8 bs     9 ht    10 nl    11 vt    12 np    13 cr    14 so    15 si
      16 dle   17 dc1   18 dc2   19 dc3   20 dc4   21 nak   22 syn   23 etb
      24 can   25 em    26 sub   27 esc   28 fs    29 gs    30 rs    31 us
      32 sp    33  !    34  "    35  #    36  $    37  %    38  &    39  '
      40  (    41  )    42  *    43  +    44  ,    45  -    46  .    47  /
      48  0    49  1    50  2    51  3    52  4    53  5    54  6    55  7
      56  8    57  9    58  :    59  ;    60  <    61  =    62  >    63  ?
      64  @    65  A    66  B    67  C    68  D    69  E    70  F    71  G
      72  H    73  I    74  J    75  K    76  L    77  M    78  N    79  O
      80  P    81  Q    82  R    83  S    84  T    85  U    86  V    87  W
      88  X    89  Y    90  Z    91  [    92  \    93  ]    94  ^    95  _
      96  `    97  a    98  b    99  c   100  d   101  e   102  f   103  g
     104  h   105  i   106  j   107  k   108  l   109  m   110  n   111  o
     112  p   113  q   114  r   115  s   116  t   117  u   118  v   119  w
     120  x   121  y   122  z   123  {   124  |   125  }   126  ~   127 del

FILES
     /usr/share/misc/ascii

HISTORY
     An ascii manual page appeared in Version 2 AT&T UNIX.

OpenBSD 4.4                      May 31, 2007                                2
