Tom Christiansen writes:
: [Very long explanation of prospective parsing approach for 5.6 elided]
:
: > 3) Perl runs into a high bit in your script.  At that point it
: >    takes a look at what it has in its buffer.  If it looks like
: >    utf8, mark the script filehandle as utf8 and continue.  If not,
: >    mark the script filehandle as binary (equivalent to latin-1)
: >    and continue.
:
: Does this mean that we'll be able to use, for example, %déjà_vu
: now, without any other special indications?

I think so.  At this point in my existence I don't think we need to
distinguish variable names from string literals, as far as recognition
of the binary/utf8 distinction goes.

: Or will some LC_* envariable like to be set?  Or a pragma?

You can use the bytes or charset pragmas I mentioned to force the issue
in the binary direction.  I believe the linux-utf8 mailing list folks
would assume that if LC_CTYPE is set to UTF-8, Perl should assume its
script is in UTF-8, though I don't know how universal that sentiment
will become.  The Linux folks are assuming they can just cut everything
over to UTF-8 at some point, and life tends to be a little more
complicated than that.

Larry