develooper Front page | perl.perl5.porters | Postings from October 2016

[perl #129298] [PATCH] Update documentation about UTF-8

Tony Cook via RT
October 11, 2016 05:02
On Mon Sep 19 09:27:54 2016, wrote:
> On Sunday 18 September 2016 16:27:48 James E Keenan via RT wrote:
> > 1. The Encode library is "cpan upstream," i.e., it is primarily
> > maintained on CPAN.  Hence, requests for changes in its documentation
> > -- your patches 0008, 0009, 0010 -- should be filed via bug-
> > or via the web interface at
> >
> >
> > 2. Because at least 7 different files are touched by the patches
> > attached to this ticket, I think we should get multiple eyeballs on
> > them.  Paging our experts on Unicode and IO layers!
> Ok! Anyway, all changes are only to documentation sections so other
> people could look at it too. There is no code change.
> And Encode patches are there too as they are referenced by core perl pod
> files. So before sending them to cpan upstream it could be great if you
> can review them too...


@@ -280,7 +280,7 @@ Files opened without an encoding argument will be in UTF-8:
      $ export PERL_UNICODE=D
-     use open qw(:utf8);
+     use open qw(:encoding(UTF-8));
 =head2 ℞ 18: Make all I/O and args default to utf8

Unfortunately this makes the examples no longer equivalent: PERL_UNICODE=D applies the :utf8 layer, not :encoding(UTF-8).
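
One way to see the difference between the two forms is to compare the layer stacks they produce. The following is a minimal sketch, not part of the patch under review: it opens two in-memory handles on the same bytes and prints the result of the core PerlIO::get_layers function. The variable names are illustrative, and the exact layer names printed may vary between perl versions.

```perl
use strict;
use warnings;

# Illustrative sketch: compare the layer stacks produced by the two forms.
my $buffer = "caf\xC3\xA9\n";    # UTF-8 encoded bytes for "cafe" with an acute accent

# Lax form: only marks the data as UTF-8.
open my $lax_fh, '<:utf8', \$buffer
    or die "cannot open in-memory handle: $!";

# Strict form: pushes an Encode-based layer on the handle.
open my $strict_fh, '<:encoding(UTF-8)', \$buffer
    or die "cannot open in-memory handle: $!";

my @lax_layers    = PerlIO::get_layers($lax_fh);
my @strict_layers = PerlIO::get_layers($strict_fh);

print "lax:    @lax_layers\n";
print "strict: @strict_layers\n";

close $lax_fh;
close $strict_fh;
```

The strict stack should contain an additional encoding(...) layer that the lax stack lacks, which is why replacing one form with the other changes what the example demonstrates.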


@@ -3764,8 +3764,8 @@ many elements these have.  For that, use C<scalar @array> and C<scalar keys
 Like all Perl character operations, L<C<length>|/length EXPR> normally
 deals in logical
 characters, not physical bytes.  For how many bytes a string encoded as
-UTF-8 would take up, use C<length(Encode::encode_utf8(EXPR))> (you'll have
-to C<use Encode> first).  See L<Encode> and L<perlunicode>.
+UTF-8 would take up, use C<bytes::length(EXPR)> (you'll have to
+C<use bytes ()> first).  See L<C<use bytes>|bytes> pragma and L<perlunicode>.
 =item __LINE__

This is just plain incorrect.  Whether the length returned by bytes::length() is the UTF-8 encoded length depends on the internal encoding of the string:

$ perl -Mbytes -MEncode -le '$x = "\xA0"; print bytes::length $x; print length Encode::encode("UTF-8", $x)'
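
The same point can be spelled out as a short script. This is a sketch for illustration only: it takes one character, U+00A0, stores it once in perl's native single-byte form and once in the upgraded (internally UTF-8) form, and compares bytes::length() with the length of the explicitly encoded string. The variable names are made up for this example.

```perl
use strict;
use warnings;
use bytes ();                 # load bytes::length without enabling byte semantics
use Encode qw(encode);

my $native   = "\xA0";        # one character, held internally as a single byte
my $upgraded = $native;
utf8::upgrade($upgraded);     # same character, now held internally as UTF-8

# bytes::length reports the size of the internal representation.
my $bytes_native   = bytes::length($native);
my $bytes_upgraded = bytes::length($upgraded);

# Encoding explicitly gives the UTF-8 length regardless of internal form.
my $utf8_length = length encode("UTF-8", $native);

print "strings compare equal:    ", ($native eq $upgraded ? "yes" : "no"), "\n";
print "bytes::length (native):   $bytes_native\n";
print "bytes::length (upgraded): $bytes_upgraded\n";
print "encoded UTF-8 length:     $utf8_length\n";
```

The two strings compare equal, yet bytes::length() returns 1 for the native copy and 2 for the upgraded one, while the encoded length is 2 in both cases. Only the encode-based expression answers "how many bytes would this take as UTF-8" reliably.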


+C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)>; see
+L</What's the difference between UTF-8 and utf8?> under.

"under" what?  This would normally be "below" instead, I think.

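For readers unfamiliar with the distinction that the referenced section covers, Encode treats the names "UTF-8" and "utf8" as two different encodings. A minimal sketch, using only Encode's documented find_encoding function, shows what each name resolves to; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Look up the canonical encoding name behind each spelling.
my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;

print "UTF-8 resolves to: $strict_name\n";
print "utf8 resolves to:  $lax_name\n";
```

The hyphenated name maps to the strict encoding and the unhyphenated one to perl's lax variant, so decode('UTF-8', ...) and decode('utf8', ...) are not interchangeable.
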

The issue is: a string of what?  Maybe C< $characters > instead of
C< $string >, but that's more Dan's decision.


via perlbug:  queue: perl5 status: open
