develooper Front page | perl.perl5.porters | Postings from October 2016

[perl #129298] [PATCH] Update documentation about UTF-8

From: Tony Cook via RT
Date: October 11, 2016 05:02
Subject: [perl #129298] [PATCH] Update documentation about UTF-8
Message ID: rt-4.0.24-22532-1476162171-722.129298-15-0@perl.org

On Mon Sep 19 09:27:54 2016, pali@cpan.org wrote:
> On Sunday 18 September 2016 16:27:48 James E Keenan via RT wrote:
> > 1. The Encode library is "cpan upstream," i.e., it is primarily
> > maintained on CPAN.  Hence, requests for changes in its documentation
> > -- your patches 0008, 0009, 0010 -- should be filed via
> > bug-Encode@rt.cpan.org or via the web interface at
> > https://rt.cpan.org/Dist/Display.html?Name=Encode.
> >
> > 2. Because at least 7 different files are touched by the patches
> > attached to this ticket, I think we should get multiple eyeballs on
> > them.  Paging our experts on Unicode and IO layers!
> 
> Ok! Anyway, all changes are only to documentation sections so other
> people could look at it too. There is no code change.
> 
> And Encode patches are there too as they are referenced by core perl pod
> files. So before sending them to cpan upstream it could be great if you
> can review them too...

0001:

@@ -280,7 +280,7 @@ Files opened without an encoding argument will be in UTF-8:
  or
      $ export PERL_UNICODE=D
  or
-     use open qw(:utf8);
+     use open qw(:encoding(UTF-8));
 
 =head2 ℞ 18: Make all I/O and args default to utf8
 
Unfortunately this makes the examples no longer equivalent: PERL_UNICODE=D applies the :utf8 layer, which is not the same layer as :encoding(UTF-8).
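
As a rough illustration of why the two spellings are not interchangeable, the sketch below opens the same in-memory buffer once with each layer and prints the resulting layer stacks. The buffer contents and variable names are made up for the example, and the exact layer names printed may vary between perl versions, so no output is shown.

```perl
use strict;
use warnings;

# Hypothetical sample data: the UTF-8 bytes for "caf\x{E9}" plus a newline.
my $bytes = "caf\xC3\xA9\n";

# Open the same buffer twice, once per layer spelling.
open(my $lax, '<:utf8', \$bytes)
    or die "open with :utf8 failed: $!";
open(my $strict, '<:encoding(UTF-8)', \$bytes)
    or die "open with :encoding(UTF-8) failed: $!";

# PerlIO::get_layers is built in; it lists the layers on a handle.
my @lax_layers    = PerlIO::get_layers($lax);
my @strict_layers = PerlIO::get_layers($strict);

print "utf8 layers:     @lax_layers\n";
print "encoding layers: @strict_layers\n";
```

Only the second handle should have an encoding(...) layer on its stack; the first just marks the data as UTF-8.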

0003:

@@ -3764,8 +3764,8 @@ many elements these have.  For that, use C<scalar @array> and C<scalar keys
 Like all Perl character operations, L<C<length>|/length EXPR> normally
 deals in logical
 characters, not physical bytes.  For how many bytes a string encoded as
-UTF-8 would take up, use C<length(Encode::encode_utf8(EXPR))> (you'll have
-to C<use Encode> first).  See L<Encode> and L<perlunicode>.
+UTF-8 would take up, use C<bytes::length(EXPR)> (you'll have to
+C<use bytes ()> first).  See L<C<use bytes>|bytes> pragma and L<perlunicode>.
 
 =item __LINE__
 X<__LINE__>

This is just plain incorrect.  Whether the length returned by bytes::length() is the UTF-8 encoded length depends on the internal encoding of the string:

$ perl -Mbytes -MEncode -le '$x = "\xA0"; print bytes::length $x; print length Encode::encode("UTF-8", $x)'
1
2
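
A slightly longer sketch of the same point, assuming a perl on an ASCII platform with Encode installed: the character content of the string never changes, but bytes::length() follows whatever the internal representation happens to be, while the encoded length does not. The variable names are only for illustration.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the pragma
use Encode ();

my $x = "\xA0";  # one character, stored internally as a single byte

my $before = bytes::length($x);   # length of the internal representation

utf8::upgrade($x);                # same character, now stored as UTF-8

my $after   = bytes::length($x);  # internal representation has grown
my $chars   = length($x);         # still one character
my $encoded = length(Encode::encode('UTF-8', $x));  # true UTF-8 byte count

print "bytes::length before upgrade: $before\n";
print "bytes::length after upgrade:  $after\n";
print "length in characters:         $chars\n";
print "length of UTF-8 encoding:     $encoded\n";
```

So bytes::length() only agrees with the UTF-8 encoded length when the string already happens to be upgraded, which the caller generally cannot rely on.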

0004:

+C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)>; see
+L</What's the difference between UTF-8 and utf8?> under.

"under" what?  This would normally be "below" instead, I think.

0009:

The issue is: a string of what?  Maybe C< $characters > instead of
C< $string >, but that's more Dan's decision.

Tony

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=129298
