On 8/6/21 5:34 PM, Aaron Priven wrote:
>> On Aug 4, 2021, at 4:33 PM, Dan Book <grinnz@gmail.com> wrote:
>> They are "text by default" in the ASCII sense, not the Unicode sense.
>> The :crlf layer is enabled by default on Windows and translates CR LF
>> to LF, but there is no default translation of bytes to characters. So
>> you need to use binmode or :raw to make a filehandle binary-compatible
>> on Windows, but you also need to apply an :encoding layer if you want
>> to read/write characters instead of bytes.
>
> I don’t think it’s true that there’s no default treatment of bytes as
> characters. By default, perl treats bytes as Latin-1. So if you open a
> file without an encoding layer, read some data, and then output it to a
> file opened with an encoding layer, that encoding layer will assume that
> the data being output is in Latin-1, and convert that to characters
> accordingly.

Perl doesn't treat bytes as Latin-1 by default.  It treats
non-ASCII-range bytes as not being in any character set.  All such bytes
match \W in patterns, for example, and uc etc. return the input
unchanged.  The unicode_strings feature is necessary to get a Latin-1
treatment, or else converting the string to UTF-8.

>
> So an open filehandle is, in perl, a text filehandle using encoding
> Latin-1, unless a layer or binmode is used. It wouldn’t be unreasonable
> to decide that, in some future version of perl, an open filehandle would
> be treated as a text filehandle using encoding UTF-8 instead.
>
> The problem, of course, is that on some but not all operating systems
> the text filehandle returned by open can be used as a binary filehandle
> without loss. /Conceptually/ it’s a text filehandle, the meaning in the
> perl language is that it’s a text filehandle, but people misuse it as a
> binary one because there’s no actual breakage, as long as it’s only
> running on those operating systems.
>
> So I understand that in practice, it might well be more trouble than
> it’s worth to have “use 5.036” or even “use v7” make perl default to
> text filehandles using the UTF-8 encoding, instead of defaulting to text
> filehandles using the Latin-1 encoding. But I think it’s worth considering.
>
> --
> Aaron Priven, aaron@priven.com,
> www.priven.com/aaron <http://www.priven.com/aaron>
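
For anyone who wants to check the behaviour described in the reply above, here is a minimal sketch. It is only an illustration, not part of the original exchange: the variable names ($byte, %native, %feature, %upgraded) are made up for the example, and it assumes an ASCII platform with no locale pragma in effect. It takes the single byte 0xE9 and looks at \w matching and uc in three situations: default semantics, with the unicode_strings feature enabled, and after utf8::upgrade.

```perl
use strict;
use warnings;

# One byte, 0xE9: LATIN SMALL LETTER E WITH ACUTE if read as Latin-1.
my $byte = "\xE9";

my (%native, %feature, %upgraded);

{
    # Default semantics: feature off, string stored as native bytes.
    no feature 'unicode_strings';
    %native = (
        word => ($byte =~ /\w/ ? 1 : 0),
        uc   => ord(uc($byte)),
    );
}

{
    # Same bytes, but with Unicode string semantics requested.
    use feature 'unicode_strings';
    %feature = (
        word => ($byte =~ /\w/ ? 1 : 0),
        uc   => ord(uc($byte)),
    );
}

{
    # Feature off again, but the string is upgraded to the internal
    # UTF-8 representation ("converting to UTF-8" in the reply).
    no feature 'unicode_strings';
    my $copy = $byte;
    utf8::upgrade($copy);
    %upgraded = (
        word => ($copy =~ /\w/ ? 1 : 0),
        uc   => ord(uc($copy)),
    );
}

for my $case ([native => \%native], [feature => \%feature], [upgraded => \%upgraded]) {
    my ($name, $r) = @$case;
    printf "%-9s \\w match: %d  uc: 0x%02X\n", $name, $r->{word}, $r->{uc};
}
```

If the documented rules hold, the first case should report no \w match and an unchanged 0xE9 from uc, while the other two should report a match and 0xC9.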