
Re: [PATCH] Improved hibit text literals

Larry Wall
February 10, 2000 18:35
Gisle Aas writes:
: This patch relative to 5.5.650 makes perl do the right thing for
: literals containing hibit characters.  The following behaviour will
: change if you apply this patch:
:     - a \x{} escape will not force the UTF8 flag on, unless the value
:       is actually higher than \xFF.


:     - the "\xff will produce malformed UTF-8 character; use \x{ff}"
:       warning is gone, since we now always do the right thing :-)


:     - under 'use utf8', hibit chars that are illegal utf8 are encoded
:       using utf8; basically automatically turns latin1 into utf8.
:       This ensures that there will never be illegal UTF8 sequences in
:       a literal string that has the UTF8 flag set.

I know I originally put in the comment, "could cvt latin-1 to utf8
here", but I'm currently thinking that if a file has utf8 mixed with
latin-1, it's probably already in serious trouble by the time it gets
to the latin-1, so it had probably better croak.  Especially if the
filehandle was implicitly put into utf8 mode by thinking it saw utf8
earlier, when in fact it only saw bizarre latin-1.  The better approach
is to make them go back and insert "use charset 'latin-1'" or some such
at the beginning.
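
The difference between croaking and silently converting can be sketched
as follows. This is only an illustration, not code from the patch: the
helper names decode_utf8_or_croak and decode_latin1 are hypothetical,
and the sketch assumes the utf8::decode and utf8::upgrade functions of
released Perls.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: treat $bytes as UTF-8 and croak if it is not
# well-formed, instead of silently reinterpreting it as Latin-1.
sub decode_utf8_or_croak {
    my ($bytes) = @_;
    my $chars = $bytes;    # utf8::decode works in place, so copy first
    utf8::decode($chars)
        or croak "input is not well-formed UTF-8; declare its real charset";
    return $chars;
}

# Hypothetical helper: the explicit alternative, for input the author
# has declared to be Latin-1. Each byte maps to exactly one character.
sub decode_latin1 {
    my ($bytes) = @_;
    my $chars = $bytes;
    utf8::upgrade($chars); # same characters, UTF-8 internal representation
    return $chars;
}

my $utf8_bytes   = "caf\xC3\xA9";    # "cafe" with e-acute, encoded as UTF-8
my $latin1_bytes = "caf\xE9";        # the same word, encoded as Latin-1

print length(decode_utf8_or_croak($utf8_bytes)), "\n";    # 4 characters
print length(decode_latin1($latin1_bytes)), "\n";         # 4 characters

# Latin-1 bytes fed to the strict UTF-8 path are rejected, not guessed at.
my $accepted = eval { decode_utf8_or_croak($latin1_bytes); 1 };
print $accepted ? "accepted\n" : "croaked\n";
```

The design choice here is that the caller states the charset up front,
so a stray Latin-1 byte in supposedly UTF-8 input shows up as an error
rather than as a quiet reinterpretation.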

:     - Octal escapes like \400 and \777 will actually do the right thing now.
:       Previously you only got the low 8-bits.

Hmm.  An argument could be made that those should be illegal, though
I don't know that I want to make it.
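
To make the "low 8-bits" remark concrete, here is a small arithmetic
sketch. It does not depend on how any particular perl parses the
escapes: it only computes the numeric value of the octal digits and
what is left after truncation to 8 bits. The function name
octal_escape_values is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given the digits of an octal escape, return the
# full value and the value that survives truncation to the low 8 bits.
sub octal_escape_values {
    my ($digits) = @_;
    my $full = oct($digits);    # numeric value of the octal digits
    my $low8 = $full & 0xFF;    # what an 8-bit-only implementation keeps
    return ($full, $low8);
}

for my $digits (qw(377 400 777)) {
    my ($full, $low8) = octal_escape_values($digits);
    printf "\\%s  full = 0x%03X  low 8 bits = 0x%02X\n",
        $digits, $full, $low8;
}
```

So \400 is 0x100 in full but collapses to 0x00 when truncated, and \777
is 0x1FF in full but collapses to 0xFF.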

: But, it still looks like the \N{} support will not work as it is
: now. It never sets the UTF8 flag on the string by itself.

Well, it should resolve to a character that's either above \xFF or not,
so it seems conceptually simple.  But I have to confess to not
understanding the \N code at all:

    print "\N{WHITE SMILING FACE}";


    constant(\N{...}): %^H is not localized at - line 2, within string

Talk about obscure error messages!  I think it means that \N will need
to be taught about pulling in the Unicode names by default.  Previously,
I think it assumed the Unicode names would come in with a "use utf8",
but that's going away, so we need to make it the default if \N doesn't
otherwise recognize its name, I imagine.
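
For comparison, a minimal sketch of asking for the Unicode names
explicitly rather than by default. It assumes the charnames pragma and
its ':full' import as found in released Perls, which may not match the
state of the 5.5.650 snapshot under discussion.

```perl
use strict;
use warnings;
use charnames ':full';    # explicitly pull in the full Unicode name table

# With the names loaded, \N{...} resolves to a single character.
my $face = "\N{WHITE SMILING FACE}";
printf "U+%04X\n", ord($face);

# The character is above \xFF, so declare an output encoding before
# printing it to avoid a "Wide character" warning.
binmode(STDOUT, ':encoding(UTF-8)');
print $face, "\n";
```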

But thanks!  It's easy to sit on the sidelines and carp, but we need more
real code whackers like you.

Larry