Re: [PATCH] Improved hibit text literals

From: Gisle Aas
Date: February 11, 2000 03:49
Subject: Re: [PATCH] Improved hibit text literals
Message ID: m3ael89cyu.fsf@eik.g.aas.no

Larry Wall <larry@wall.org> writes:

> :     - under 'use utf8', hibit chars that are illegal utf8 are encoded
> :       using utf8; basically automatically turns latin1 into utf8.
> :       This ensure that there will never be illegal UTF8 sequences in
> :       a literal string that has the UTF8 flag set.
> 
> I know I originally put in the comment, "could cvt latin-1 to utf8
> here", but I'm currently thinking that if a file has utf8 mixed with
> latin-1, it's probably already in serious trouble by the time it gets
> to the latin-1, so it probably better croak.  Especially if the
> filehandle was implicitly put into utf8 mode by thinking it saw utf8
> earlier, when in fact it only saw bizarre latin-1.  The better approach
> is to make them go back and insert "use charset 'latin-1'" or some such
> at the beginning.

I think we should revisit this once line disciplines are up and working.
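
To make the two options concrete, here is a rough sketch of "croak on
bad utf8" versus "treat the bytes as latin-1 and convert". It uses the
Encode module rather than anything in the tokenizer, and the sample
string is made up for illustration, so take it as a picture of the
trade-off and not of what the patch does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input: "caf" followed by a latin-1 e-acute. The lone
# byte \xE9 is not a valid UTF-8 sequence.
my $bytes = "caf\xE9";

# Option 1: strict decoding croaks on the malformed sequence.
# LEAVE_SRC keeps decode() from modifying $bytes.
my $strict = eval {
    decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
print "strict decode failed: $@" unless defined $strict;

# Option 2: assume the bytes are latin-1 and convert them.
my $converted = decode('ISO-8859-1', $bytes);
printf "%d characters, last is U+%04X\n",
    length($converted), ord(substr($converted, -1));
```

Option 2 never fails, which is exactly why it can hide a file that
mixes the two encodings.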

> :     - Octal escapes like \400 and \777 will actually do the right thing now.
> :       Previously you only got the low 8-bits.
> 
> Hmm.  An argument could be made that those should be illegal, though
> I don't know that I want to make it.

It is at least trivial to make the new code croak on values > 0377
when/if we decide that is the best thing to do.  Ilya's perl -0777
argument is a good enough reason to make me favour croaking.
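
A small sketch of what croaking on values > 0377 could look like,
written in Perl and not as the actual C in the tokenizer. The helper
octal_escape_value() is hypothetical and exists only for this example.
The last line shows what a string literal containing \400 evaluates to
on a perl that keeps the full value.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of perl): take the digits of an octal
# escape and croak if the value does not fit in a single byte.
sub octal_escape_value {
    my ($digits) = @_;
    croak "not an octal escape: '$digits'"
        unless $digits =~ /\A[0-7]{1,3}\z/;
    my $value = oct($digits);
    croak sprintf("octal escape \\%s (0x%X) exceeds 0377", $digits, $value)
        if $value > 0377;
    return $value;
}

print octal_escape_value("377"), "\n";

my $ok = eval { octal_escape_value("400"); 1 };
print "croaked: $@" unless $ok;

# Without such a check, the escape keeps its full value.
printf "ord of \\400 is %d\n", ord("\400");
```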

> : But, it still looks like the \N{} support will not work as it is
> : now. It never sets the UTF8 flag on the string by itself.
> 
> Well, it should resolve to a character that's either above \xFF or not,
> so it seems conceptually simple.

This would have been simple if we made charnames::charnames() return a
number instead of a string.  The string return is more general as it
allows \N{} to expand into longer sequences.  I guess scan_const()
should just look for the UTF8 flag on the string returned by
charnames() and then propagate it.
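
Seen from the Perl side, the propagation I have in mind looks roughly
like the sketch below. It uses utf8::is_utf8() to inspect the flag and
plain concatenation to stand in for what scan_const() would do
internally; the variable names are only for illustration.

```perl
use strict;
use warnings;
use charnames ':full';

# A named character above 0xFF must carry the UTF8 flag.
my $alpha = "\N{GREEK SMALL LETTER ALPHA}";

# A plain ASCII literal does not carry it.
my $plain = "abc";

# Joining the two must yield a flagged string; otherwise the bytes of
# the expansion would be misread as separate characters.
my $joined = $plain . $alpha;

printf "alpha flagged:  %s\n", utf8::is_utf8($alpha)  ? "yes" : "no";
printf "plain flagged:  %s\n", utf8::is_utf8($plain)  ? "yes" : "no";
printf "joined flagged: %s\n", utf8::is_utf8($joined) ? "yes" : "no";
printf "joined length:  %d\n", length($joined);
```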

> But thanks!  It's easy to sit on the sidelines and carp, but we need more
> real code whackers like you.

Thank you!

Regards,
Gisle
