Front page | perl.perl5.porters |
Postings from October 2017
Re: source encoding
From: Father Chrysostomos
October 25, 2017 21:09
Message ID: email@example.com
> Smylers wrote:
> >Neither of these are helpful to a user who was happy with the way their
> >non-Ascii characters were working until they were told to declare that
> >they are using UTF-8,
> Ah, here's your error. The deprecation message should absolutely not
> advise people to declare that they are using UTF-8, because *they're not*.
> It needs to advise that the program be converted to a supported encoding,
> either ASCII using \x escapes or to UTF-8. Only after converting the
> source to UTF-8 is adding the "use utf8" declaration correct. This is
> something we'd have to be clear about in documentation: conversion to
> UTF-8 is not just a matter of adding the pragma.
Before perl supported Unicode, that was the most obvious way to handle
utf8 correctly in perl: write your code (your literal strings) in
utf8, and make sure that the input is also in utf8. I have code
written that way that predates good Unicode support in perl. I also
continue to write code in that style, because the code runs faster and
avoids buggy edge cases that often crop up.
I have always considered it a feature that perl just passes the data
through. And it is never a problem if the source code and the input
data are in the same encoding (something I can guarantee in my case).
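A minimal sketch of that pass-through style, under stated assumptions:
the literal is spelled with \x escapes so the example does not depend
on how this message is encoded, but the bytes are the ones perl would
see for a UTF-8 literal in a source file without "use utf8". The
variable names and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# The UTF-8 bytes of "café". Without "use utf8", perl treats a UTF-8
# literal in the source as exactly this sequence of bytes.
my $literal = "caf\xc3\xa9";

# Hypothetical input: raw UTF-8 bytes, as read from a filehandle that
# has no decoding layer.
my $input = "Un caf\xc3\xa9, s'il vous pla\xc3\xaet\n";

# Byte-for-byte matching works because both sides are in the same encoding.
my $found = index($input, $literal) >= 0 ? 1 : 0;

# The price of this style: length counts bytes, not characters.
my $byte_length = length $literal;

# Decoding a copy gives character semantics where they are needed.
my $decoded = $literal;
utf8::decode($decoded);
my $char_length = length $decoded;

print "found=$found bytes=$byte_length chars=$char_length\n";
# prints "found=1 bytes=5 chars=4"
```

Nothing here is decoded on the way in or encoded on the way out, which
is the whole point of the style; it only holds together as long as
source and input really do share one encoding.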
If this stops working, I might find it annoying enough to stick with
an old perl for 'real' work.
> No, the handling of non-ASCII filenames is another aspect of program
> behaviour, not an aspect of source encoding. It's a thing that we
> definitely need to make some changes about, but it's separate from this,
> and a big topic in its own right. We also need to be careful to avoid
> churn in filename semantics: when we eventually change them we need to
> be pretty sure that we're changing them to the right permanent semantics.
I would hope that we could solve this before we make any significant
changes in the way source encoding is handled, precisely because
changes in source encoding might change the internal representation of
strings, which currently (buggily) affects system calls.
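To illustrate what "internal representation" means here, a small
sketch: the same one-character string can be stored in two different
ways and still compare equal at the Perl level. The variable names are
made up, and the example deliberately does not touch the filesystem;
it only shows the difference in storage that the paragraph above is
worried about.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling byte semantics

my $native = "\xe9";           # U+00E9, stored as a single byte

my $upgraded = $native;
utf8::upgrade($upgraded);      # same character, now stored as UTF-8

# At the Perl level the two strings are identical.
my $same = $native eq $upgraded ? 1 : 0;

# bytes::length reports the size of the internal buffer.
my $native_size   = bytes::length($native);
my $upgraded_size = bytes::length($upgraded);

print "same=$same native=$native_size upgraded=$upgraded_size\n";
# prints "same=1 native=1 upgraded=2"
```

Any interface that looks at the internal buffer rather than at the
characters can tell these two strings apart, even though "eq" cannot.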
> With "=encoding"
> being more workable in practice
Maybe this is an alternative solution to enforcing a uniform encoding
on all source code: Make =encoding affect the source code. (Just an
idea; maybe not a good one.)
Also, have you considered the existing support perl has for UTF-16?