On Sun, Oct 3, 2021, at 2:56 PM, Ricardo Signes wrote:
> *ONE:* What's the end state we'd like to get to?
>
> *TWO:* What's a good next step, keeping in mind that we might not ever get past that next step?
>
> My take is this: The end state I'd like is that strings are in one of three states: declared text, declared bytes, unknown. Semantics exist for how to combine these and deal with I/O discipline. The source code is Unicode and string literals are assumed to be text. A new string literal syntax exists for byte strings, like `qb"..."`.
>
> For my money, a useful next step is that we encourage people to opt in to "source code is Unicode and string literals are text." This means that the programmer is then responsible for thinking about how this will affect their I/O. That concern is already there; we're just pushing the complexity around like a lump under the rug.

I think this push is a good one. It lets us enable non-ASCII syntax, and it's pretty well understood. Also, we already have something for qb"..." in the form of "do { use bytes; qq{...} }", but we could probably add a qb, too, if we needed it.

I want to bump this thread, noting: I filed a draft RFC <https://github.com/Perl/RFCs/pull/5> on this, and think it's good to move forward. (I think we can set aside the question of "what utf8 do you get with *use utf8*" for future consideration, and make that consistent then. I don't think there's a practical argument to be made that we should keep its current weirdness.)

I do think that creating improvements for non-ASCII syntax is a compelling step we can take in the near future, but for now, I would like to still have source encoding as a pragma like this, which can be made ASCII by default under use vX.

-- 
rjbs
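The sketch below makes "source code is Unicode and string literals are text" concrete in terms of today's `use utf8` pragma. It is a minimal illustration, not part of the RFC.

To keep it independent of how the example file itself is saved, it compiles the same UTF-8 bytes twice with string `eval`: once without the pragma and once with it. It assumes the `unicode_eval` feature is off. That feature is enabled by `use v5.16` and later, and it makes `use utf8` inside an `eval` a no-op.

```perl
use strict;
use warnings;

# The literal "café" as it would appear in a UTF-8-encoded source file:
# the "é" is the two bytes 0xC3 0xA9.
my $literal = qq{"caf\xC3\xA9"};

# Without `use utf8`, the source is read as bytes, so the literal
# is a 5-element byte string.
my $as_bytes = eval $literal;
die $@ if $@;

# With `use utf8`, the same source bytes are decoded as UTF-8, so the
# literal is the 4-character text string "café".
my $as_text = eval "use utf8; $literal";
die $@ if $@;

printf "without use utf8: length %d\n", length $as_bytes;
printf "with use utf8:    length %d\n", length $as_text;
```

Under the opt-in described above, the second behaviour is what a programmer would get. Byte strings would then need to be asked for explicitly.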