[perl #129298] [PATCH] Update documentation about UTF-8
From: Tony Cook via RT
Date: October 11, 2016 05:02
Subject: [perl #129298] [PATCH] Update documentation about UTF-8
Message ID: rt-4.0.24-22532-1476162171-722.129298-15-0@perl.org
On Mon Sep 19 09:27:54 2016, pali@cpan.org wrote:
> On Sunday 18 September 2016 16:27:48 James E Keenan via RT wrote:
> > 1. The Encode library is "cpan upstream," i.e., it is primarily
> > maintained on CPAN. Hence, requests for changes in its documentation
> > -- your patches 0008, 0009, 0010 -- should be filed via bug-
> > Encode@rt.cpan.org or via the web interface at
> > https://rt.cpan.org/Dist/Display.html?Name=Encode.
> >
> > 2. Because at least 7 different files are touched by the patches
> > attached to this ticket, I think we should get multiple eyeballs on
> > them. Paging our experts on Unicode and IO layers!
>
> Ok! Anyway, all changes are only to documentation sections so other
> people could look at it too. There is no code change.
>
> And Encode patches are there too as they are referenced by core perl
> pod files. So before sending them to cpan upstream it could be great
> if you can review them too...
0001:
@@ -280,7 +280,7 @@ Files opened without an encoding argument will be in UTF-8:
or
$ export PERL_UNICODE=D
or
- use open qw(:utf8);
+ use open qw(:encoding(UTF-8));
=head2 ℞ 18: Make all I/O and args default to utf8
Unfortunately this makes the examples no longer equivalent: PERL_UNICODE=D gives the :utf8 layer, not :encoding(UTF-8).
0003:
@@ -3764,8 +3764,8 @@ many elements these have. For that, use C<scalar @array> and C<scalar keys
Like all Perl character operations, L<C<length>|/length EXPR> normally deals in logical
characters, not physical bytes. For how many bytes a string encoded as
-UTF-8 would take up, use C<length(Encode::encode_utf8(EXPR))> (you'll have
-to C<use Encode> first). See L<Encode> and L<perlunicode>.
+UTF-8 would take up, use C<bytes::length(EXPR)> (you'll have to
+C<use bytes ()> first). See L<C<use bytes>|bytes> pragma and L<perlunicode>.
=item __LINE__
X<__LINE__>
This is just plain incorrect. Whether the length returned by bytes::length() is the UTF-8 encoded length depends on the internal encoding of the string:
$ perl -Mbytes -MEncode -le '$x = "\xA0"; print bytes::length $x; print length Encode::encode("UTF-8", $x)'
1
2
0004:
+C<decode('UTF-8', ...)> and C<encode('UTF-8', ...)>; see
+L</What's the difference between UTF-8 and utf8?> under.
"under" what? This would normally be "below" instead, I think.
0009:
The issue is: a string of what? Maybe C< $characters > instead of
C< $string >, but that's more Dan's decision.
Tony
---
via perlbug: queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=129298