On 2021-08-06 8:22 a.m., Ricardo Signes wrote:
> At the PSC, we had a long talk about this, and another proposal was made:
>
> We introduce a new stricture, which I'll call "source_encoding". Under "use
> strict 'source_encoding'", the compiler will raise an exception when the source
> contains non-ASCII content unless the utf8 pragma is in effect. The error
> raised can drive the programmer to documentation explaining the various
> trade-offs. That is: you can turn on utf8 and deal with how this affects your
> I/O, or you can disable the stricture, or you can restate your non-ASCII content
> as ASCII by using escaping constructs.
>
> I'm not /sure/ this is an improvement, but I think it is. This prevents the "I
> forgot to add utf8 and so only discovered after runtime that I have
> doubly-encoded my output" bug.

+1

Personally I feel that this change is a great improvement, assuming I understand it right.

So just to be clear, when you say ASCII, you mean pure 7-bit ASCII, which is a proper subset of both UTF-8 and all the Latin encodings, and thus any source files written in that will "just work" in both the most common Unicode AND non-Unicode environments.

Would your new stricture, turned on as part of "use 5.36", then fail every source file that has any octet with a 1 in the 8th bit when that file doesn't also have an explicit declaration of source encoding? Because that is what I would expect given what you said.
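To make the "any octet with a 1 in the 8th bit" reading concrete, here is a minimal sketch of that check. It is written in Python purely for illustration; it is not how perl implements anything, and the function names and the `utf8_declared` flag are hypothetical stand-ins for "the utf8 pragma is in effect".

```python
def first_non_ascii(data: bytes):
    """Return (line, column, octet) for the first octet with the high bit set.

    Works on raw bytes, so no knowledge of the file's encoding is needed.
    Returns None when the input is pure 7-bit ASCII.
    """
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, octet in enumerate(line, start=1):
            if octet & 0x80:  # 8th bit set: outside 7-bit ASCII
                return lineno, col, octet
    return None


def passes_source_encoding_check(data: bytes, utf8_declared: bool) -> bool:
    """Hypothetical rule: accept the source if an encoding was declared,
    or if it contains nothing but 7-bit ASCII."""
    return utf8_declared or first_non_ascii(data) is None


if __name__ == "__main__":
    source = 'my $s = "é";\n'.encode("utf-8")
    print(first_non_ascii(source))
    print(passes_source_encoding_check(source, utf8_declared=False))
```

Because the test looks only at the high bit of each octet, it gives the same verdict whether the offending bytes are UTF-8 or one of the Latin encodings, which is the property the question above relies on.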
For my part, I expressly designed my portable data format MUON

https://github.com/muldis/Muldis_Object_Notation/blob/master/spec/Muldis_Object_Notation_Syntax_Plain_Text.md

so that characters outside the 7-bit ASCII repertoire are forbidden from appearing literally in a file except within quoted character string literals. That way one can parse everything outside the quoted strings, the actual document structure, completely without even having to know what the encoding is (it can be done in binary mode), at least between UTF-8 vs Latin etc (and even for encodings that aren't), and decoding the inside of strings is deferrable.

-- Darren Duncan