[perl #109828] PerlIO::scalar does not handle UTF-8

From: Father Chrysostomos via RT
Date: February 12, 2012 14:02
Subject: [perl #109828] PerlIO::scalar does not handle UTF-8
Message ID: rt-3.6.HEAD-14510-1329084169-1968.109828-15-0@perl.org

On Mon Feb 06 07:19:37 2012, xdaveg@gmail.com wrote:
> Then when something wants to use that string as a source of bytes,
> should Perl (a) just dump out whatever bytes it uses internally for
> its implementation?  Or (b) should it convert the internal
> representation to some standard representation?  Or (c) should it blow
> up?

(a) is what Perl currently does, as Leon Timmermans said.

By (b) I presume you mean to treat \xff as \xff regardless of how it is
stored internally, which makes sense.
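
To make "regardless of how it is stored internally" concrete, here is a minimal sketch. It relies only on the built-in utf8::upgrade and utf8::is_utf8 functions; the variable names are just for illustration. It shows two scalars that hold the same one-character string but differ in internal storage:

```perl
use strict;
use warnings;

# Two copies of the same one-character string.
my $native   = "\xff";
my $upgraded = "\xff";

# Switch one copy to the internal UTF-8 representation.
# The logical string value does not change.
utf8::upgrade($upgraded);

print "equal as strings: ", ($native eq $upgraded ? "yes" : "no"), "\n";
print "lengths: ", length($native), " and ", length($upgraded), "\n";
printf "internal UTF8 flag: %d and %d\n",
    (utf8::is_utf8($native)   ? 1 : 0),
    (utf8::is_utf8($upgraded) ? 1 : 0);
```

Under option (b), a handle opened on either scalar would yield the same single \xff.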

But what happens if I open a reading handle to a scalar containing
\x{100}?  Here we have a choice between (b) and (c).

An in-memory scalar could be considered a byte stream.  Or it could just
be considered a string of characters.

The latter does make some sense.  If I print \xff to an in-memory file
with no layers applied, I simply get \xff in my scalar.  So if I print
\x{100}, it would make sense to get \x{100} in my scalar, no?  But if
the scalar is considered byte-sized, I should get \x{100} utf8-encoded,
accompanied by a wide character warning; and reading a scalar with
\x{100} would croak.
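
The cases above can be tried with a short script. This is only a sketch: hex_dump is a hypothetical helper, and what is printed for \x{100} (and whether the read-side open succeeds) depends on the Perl version in use, so no output is claimed here.

```perl
use strict;
use warnings;

# Hypothetical helper: show each character of a string as a hex code point.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# Print a character below 0x100 to an in-memory file with no layers.
my $narrow_buffer = '';
open(my $narrow_fh, '>', \$narrow_buffer) or die "open failed: $!";
print {$narrow_fh} "\xff";
close($narrow_fh);
print "after printing \\xff:    ", hex_dump($narrow_buffer), "\n";

# Print a character above 0xFF; a "Wide character" warning may appear on STDERR.
my $wide_buffer = '';
open(my $wide_fh, '>', \$wide_buffer) or die "open failed: $!";
print {$wide_fh} "\x{100}";
close($wide_fh);
print "after printing \\x{100}: ", hex_dump($wide_buffer), "\n";

# Try to read from a scalar that already holds a wide character.
my $source = "\x{100}";
if (open(my $in, '<', \$source)) {
    my $data = <$in>;
    print "read back: ", (defined $data ? hex_dump($data) : '(nothing)'), "\n";
    close($in);
} else {
    print "open for reading failed: $!\n";
}
```

Under the character-string model the second dump would show a single code point 100; under the byte-sized model it would show the UTF-8 encoding of that character, and the read-side open (or the read itself) would be refused.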

That it is currently buggy is not being questioned.  But which model
should be followed in fixing it is debatable.  Would it be reasonable to
implement the byte-sized version for now and upgrade it later?

-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=109828
