Larry Wall <larry@wall.org> writes:

> : - under 'use utf8', hibit chars that are illegal utf8 are encoded
> :   using utf8; basically automatically turns latin1 into utf8.
> :   This ensures that there will never be illegal UTF8 sequences in
> :   a literal string that has the UTF8 flag set.
>
> I know I originally put in the comment, "could cvt latin-1 to utf8
> here", but I'm currently thinking that if a file has utf8 mixed with
> latin-1, it's probably already in serious trouble by the time it gets
> to the latin-1, so it probably better croak.  Especially if the
> filehandle was implicitly put into utf8 mode by thinking it saw utf8
> earlier, when in fact it only saw bizarre latin-1.  The better approach
> is to make them go back and insert "use charset 'latin-1'" or some such
> at the beginning.

I think we should revisit this when line disciplines are up and working.

> : - Octal escapes like \400 and \777 will actually do the right thing now.
> :   Previously you only got the low 8 bits.
>
> Hmm.  An argument could be made that those should be illegal, though
> I don't know that I want to make it.

It is at least trivial to make the new code croak on values > 0377
when/if we decide that is the best thing to do.  Ilya's perl -0777
argument is a good enough reason to make me favour croaking.

> : But, it still looks like the \N{} support will not work as it is
> : now.  It never sets the UTF8 flag on the string by itself.
>
> Well, it should resolve to a character that's either above \xFF or not,
> so it seems conceptually simple.

This would have been simple if we had made charnames::charnames() return
a number instead of a string.  The string return is more general, as it
allows \N{} to expand into longer sequences.  I guess scan_const() should
just look for the UTF8 flag on the string returned by charnames() and
then propagate it to the literal being built.

> But thanks!  It's easy to sit on the sidelines and carp, but we need more
> real code whackers like you.

Thank you!

Regards,
Gisle
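
To make the octal-escape point above concrete, here is a minimal C sketch of the difference between keeping only the low 8 bits of \400 or \777 and encoding the full value as UTF-8. It is not perl's actual tokenizer code: the names parse_octal_escape, encode_utf8_small and low_8_bits are hypothetical, and the encoder only handles values up to 0x7FF, which is enough for any three-digit octal escape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from perl's source): read up to three octal
 * digits that follow a backslash and return their value (0..0777).
 * *consumed receives the number of digits read. */
static unsigned parse_octal_escape(const char *s, size_t *consumed)
{
    unsigned value = 0;
    size_t i = 0;

    while (i < 3 && s[i] >= '0' && s[i] <= '7') {
        value = value * 8 + (unsigned)(s[i] - '0');
        i++;
    }
    *consumed = i;
    return value;
}

/* Hypothetical helper: encode a code point in 0..0x7FF as UTF-8.
 * Returns the number of bytes written to out (1 or 2). */
static size_t encode_utf8_small(unsigned cp, unsigned char *out)
{
    assert(cp <= 0x7FF);            /* three octal digits never exceed 0777 */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp; /* ASCII is unchanged in UTF-8 */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* lead byte: top 5 bits */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* continuation: low 6 bits */
    return 2;
}

/* The earlier behaviour described in the thread: only the low 8 bits of
 * the escape's value survive. */
static unsigned char low_8_bits(unsigned value)
{
    return (unsigned char)(value & 0xFF);
}
```

Under these assumptions, \400 (value 0x100) truncates to byte 0x00 but encodes as the two bytes 0xC4 0x80, and \777 (value 0x1FF) truncates to 0xFF but encodes as 0xC7 0xBF. The same encoder applied to a Latin-1 byte such as 0xE9 yields 0xC3 0xA9, which is the latin1-to-utf8 conversion mentioned in the first quoted item. A croak on values > 0377 would simply be a range check on the result of parse_octal_escape before encoding.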