
Re: source encoding

From: Father Chrysostomos
Date: October 25, 2017 21:23
Subject: Re: source encoding
Message ID: 20171025212343.28494.qmail@lists-nntp.develooper.com

Zefram wrote:
> But inside the text of the code file is also the wrong
> place: in principle that's read too late,

In practice, it only matters if perl supports both EBCDIC-based and
ASCII-based encodings on the same OS.

Well, there is also the issue of UTF-16 and UTF-32, but there is
already code for handling UTF-16 in toke.c, though I have never
tested it.

> and the mechanisms we've tried
> with pragmata yield the wrong scope.  Switching encoding on lexical
> boundaries could in theory be made to work, but that's not how files
> are written, and it sits uneasily with having a buffer of supposedly
> decoded text waiting to be parsed.

Currently source filters are applied based on line boundaries, not on
lexical scope (unless they implement the latter themselves, which is
somewhat irrelevant here).

However, they look the same as pragmata: use foo;

> So the only viable approach that's not incredibly difficult

I think my suggestion above is also viable.

> is to
> remove the need for an encoding declaration, by making the encoding
> the same for all code files.

I suspect that changing the default will cause too much breakage and
be deemed unacceptable.  If so, then we could make 'use utf8' act like
a source filter.  It does not need to be implemented as one, but it
could apply to the end of the file, regardless of how we implement it.
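
For comparison, here is a minimal sketch of the current behaviour, in which
'use utf8' is lexically scoped and stops at the end of the enclosing block
rather than at the end of the file. It assumes the file is saved as UTF-8;
the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so the literal below is
# stored as the two bytes 0xC3 0xA9.
my ($inside, $outside);

{
    use utf8;               # in effect only until the closing brace
    $inside = length "é";   # source bytes decoded as one character
}

$outside = length "é";      # pragma no longer in effect: two bytes

print "inside block: $inside, after block: $outside\n";
```

Under that assumption, the two lengths differ (1 inside the block, 2 after
it). If 'use utf8' applied to the end of the file, as suggested above, the
second literal would presumably be decoded as well.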

> The fixed encoding would, of course,
> be UTF-8.

What about on EBCDIC systems?
