
Re: [PATCH] Improved hibit text literals

From: Gisle Aas
Date: February 11, 2000 03:49
Subject: Re: [PATCH] Improved hibit text literals
Message ID: m3emak9dls.fsf@eik.g.aas.no

Ilya Zakharevich <ilya@math.ohio-state.edu> writes:

> Gisle Aas writes:
> >     - under 'use utf8', hibit chars that are illegal utf8 are encoded
> >       using utf8; basically automatically turns latin1 into utf8.
> >       This ensure that there will never be illegal UTF8 sequences in
> >       a literal string that has the UTF8 flag set.
> 
> Hmm???  Please no DWIM here.  Programmers would like to know what
> their string literals mean.  Or did I misunderstand you?

We have two options here: either croak or convert.  With my patch,
the latter happens.

$ ./perl -MDevel::Peek -e 'Dump("å")'
SV = PV(0x817de08) at 0x8156028
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK)
  PV = 0x815c088 "\345"\0
  CUR = 1
  LEN = 2

$ ./perl -MDevel::Peek -e 'use utf8; Dump("å")'
Malformed UTF-8 character at -e line 1.
SV = PV(0x81563b4) at 0x816009c
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x817b480 "\303\245"\0
  CUR = 2
  LEN = 3

This ensures that you can assume PV points to a valid UTF8 string if
the UTF8 flag is set.  As you can see, a warning is also generated.
When real line disciplines are in place I guess these illegal
sequences will never reach S_scan_const().
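
The conversion visible in the second dump ("\345" becoming "\303\245")
can be reproduced by hand.  This is only a minimal sketch of the
Latin-1 to UTF-8 byte arithmetic, not the code in the patch; the helper
name latin1_byte_to_utf8 is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): encode one Latin-1 byte
# as UTF-8.  Bytes below 0x80 are unchanged; bytes 0x80..0xFF become
# a two-byte sequence 110000xx 10xxxxxx.
sub latin1_byte_to_utf8 {
    my ($byte) = @_;
    my $c = ord $byte;
    return $byte if $c < 0x80;
    return chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
}

my $out = latin1_byte_to_utf8("\xE5");    # "\xE5" is a-ring, octal \345

# Print each resulting byte as an octal escape, as Devel::Peek does.
my $octal = join '', map { sprintf '\\%03o', ord } split //, $out;
print "$octal\n";    # prints \303\245
```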


> >     - Octal escapes like \400 and \777 will actually do the right thing now.
> >       Previously you only got the low 8-bits.
> 
> This always bothered me.  perl -0777?

Currently the -0 option is special-cased so that any value greater
than 0377 (for example -0777) sets $/ to undef.  Since this is
documented and relied on, it might be a good reason to make string
literals like "\777" croak instead of setting up incompatible
expectations for the -0 option.  IMHO, the old "truncate to 8 bits"
behaviour for octal escapes must go anyway.
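
To make the truncation concrete, here is a small sketch of the
arithmetic only.  It does not exercise the tokenizer; it just shows
what "the low 8 bits" of \400 and \777 are, compared with the full
values.

```perl
use strict;
use warnings;

# For each octal escape, compute the full value and the value left
# after masking to the low 8 bits (the old behaviour described above).
my %low8;
for my $esc (qw(400 777)) {
    my $full = oct $esc;
    $low8{$esc} = $full & 0xFF;
    printf "\\%s: full value %d, low 8 bits %d\n",
        $esc, $full, $low8{$esc};
}
# prints:
# \400: full value 256, low 8 bits 0
# \777: full value 511, low 8 bits 255
```

So under truncation "\400" would silently become "\0" and "\777"
would become "\377".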

But, it would be cool to be able to process a file with Unicode line
separators using:

   perl -020050 -pe '...'
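
Octal 20050 is U+2028 (LINE SEPARATOR).  What such an option would
amount to can be sketched in plain Perl by setting $/ by hand.  This
assumes a perl with PerlIO layers and in-memory filehandles, and the
sample data is invented.

```perl
use strict;
use warnings;

# Invented sample input: two records, each terminated by U+2028,
# stored as raw UTF-8 bytes (E2 80 A8).
my $bytes = "first\xE2\x80\xA8second\xE2\x80\xA8";

# Read the bytes back through a UTF-8 decoding layer.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";

# Use U+2028 as the input record separator, as -020050 would.
local $/ = chr(oct('20050'));

my @records;
while (my $rec = <$fh>) {
    chomp $rec;              # chomp removes the current value of $/
    push @records, $rec;
}
close $fh;

print scalar(@records), " records: @records\n";
```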

Regards,
Gisle


