Ilya Zakharevich <ilya@math.ohio-state.edu> writes:

> Gisle Aas writes:
> >  - under 'use utf8', hibit chars that are illegal utf8 are encoded
> >    using utf8; basically automatically turns latin1 into utf8.
> >    This ensures that there will never be illegal UTF8 sequences in
> >    a literal string that has the UTF8 flag set.
>
> Hmm???  Please no DWIM here.  Programmers would like to know what
> their string literals mean.  Or did I misunderstand you?

We have two options here: either croak or convert.  After my patch,
the latter happens.

  $ ./perl -MDevel::Peek -e 'Dump("å")'
  SV = PV(0x817de08) at 0x8156028
    REFCNT = 1
    FLAGS = (POK,READONLY,pPOK)
    PV = 0x815c088 "\345"\0
    CUR = 1
    LEN = 2

  $ ./perl -MDevel::Peek -e 'use utf8; Dump("å")'
  Malformed UTF-8 character at -e line 1.
  SV = PV(0x81563b4) at 0x816009c
    REFCNT = 1
    FLAGS = (POK,READONLY,pPOK,UTF8)
    PV = 0x817b480 "\303\245"\0
    CUR = 2
    LEN = 3

This makes sure that you can assume PV points to a valid UTF8 string
if the UTF8 flag is set.  As you can see, a warning is also generated.
When real line disciplines are in place, I guess these illegal
sequences will never occur in S_scan_const().

> >  - Octal escapes like \400 and \777 will actually do the right thing now.
> >    Previously you only got the low 8-bits.
>
> This always bothered me.  perl -0777?

Currently the -0 option is special-cased so that any number greater
than -0377 will set $/ to undef.  Since this is documented and relied
on, it might be a good reason to make string literals like "\777"
croak instead of setting up incompatible expectations for the -0
option.  IMHO, the old "truncate to 8-bit" behaviour for octal escapes
must go anyway.

But it would be cool to be able to process a file with Unicode line
separators using:

  perl -020050 -pe '...'

Regards,
Gisle
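
For illustration only, the "convert" option can be sketched in plain Perl: a lone Latin-1 high-bit byte is replaced by its two-byte UTF-8 encoding. The helper name latin1_to_utf8 is hypothetical and is not part of the patch. The sketch also assumes that every high-bit byte is converted, whereas the patch only converts bytes that do not already form legal UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the patch: re-encode each byte
# >= 0x80 as the two-byte UTF-8 sequence for the same code point.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    my $out = '';
    for my $c (unpack 'C*', $bytes) {
        if ($c < 0x80) {
            $out .= chr($c);                  # ASCII passes through unchanged
        }
        else {
            $out .= chr(0xC0 | ($c >> 6));    # lead byte: 110xxxxx
            $out .= chr(0x80 | ($c & 0x3F));  # continuation byte: 10xxxxxx
        }
    }
    return $out;
}

# "\345" is the Latin-1 byte for a-ring, as in the first Dump above.
my $converted = latin1_to_utf8("\345");
print join('', map { sprintf '\\%o', $_ } unpack 'C*', $converted), "\n";
# prints \303\245
```

The two bytes printed match the PV shown in the second Dump output.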