
Re: Pod to Assume CP-1252

David E. Wheeler
February 24, 2015 22:53
On Feb 13, 2015, at 2:44 PM, David E. Wheeler wrote:

> FYI, this is now on CPAN in a test release, 3.29_3. Details here:
> Please test!

And as of yesterday, 3.30 is released. I therefore recommend applying the patch against perlpodspec. Here it is again.



diff --git a/pod/perlpod.pod b/pod/perlpod.pod
index 12b156b..d675de7 100644
--- a/pod/perlpod.pod
+++ b/pod/perlpod.pod
@@ -286,7 +286,7 @@ users won't need this; but if your encoding isn't US-ASCII,
then put a C<=encoding I<encodingname>> command very early in the document so
that pod formatters will know how to decode the document.  For
I<encodingname>, use a name recognized by the L<Encode::Supported>
-module.  Some pod formatters may try to guess between a Latin-1 versus
+module.  Some pod formatters may try to guess between a CP-1252 versus
UTF-8 encoding, but they may guess wrong.  It's best to be explicit if
you use anything besides strict ASCII.  Examples:

@@ -496,7 +496,7 @@ e with an acute (/-shaped) accent.


-The ASCII/Latin-1/Unicode character with that number.  A
+The ASCII/CP-1252/Unicode character with that number.  A
leading "0x" means that I<number> is hex, as in
C<EE<lt>0x201EE<gt>>.  A leading "0" means that I<number> is octal,
as in C<EE<lt>075E<gt>>.  Otherwise I<number> is interpreted as being
@@ -505,7 +505,7 @@ in decimal, as in C<EE<lt>181E<gt>>.
Note that older Pod formatters might not recognize octal or
hex numeric escapes, and that many formatters cannot reliably
render characters above 255.  (Some formatters may even have
-to use compromised renderings of Latin-1 characters, like
+to use compromised renderings of CP-1252 characters, like
rendering C<EE<lt>eacuteE<gt>> as just a plain "e".)

diff --git a/pod/perlpodspec.pod b/pod/perlpodspec.pod
index f2af63e..a2a4f8f 100644
--- a/pod/perlpodspec.pod
+++ b/pod/perlpodspec.pod
@@ -607,7 +607,7 @@ as signaling that the file is Unicode encoded as in UTF-16 (whether
big-endian or little-endian) or UTF-8, Pod parsers should do the
same.  Otherwise, the character encoding should be understood as
being UTF-8 if the first highbit byte sequence in the file seems
-valid as a UTF-8 sequence, or otherwise as Latin-1.
+valid as a UTF-8 sequence, or otherwise as CP-1252.

Future versions of this specification may specify
how Pod can accept other encodings.  Presumably treatment of other
@@ -641,7 +641,7 @@ I<and> whether the next byte is in the range
0x80 - 0xBF.  If so, the parser may conclude that this file is in
UTF-8, and all highbit sequences in the file should be assumed to
be UTF-8.  Otherwise the parser should treat the file as being
-in Latin-1.  (A better check is to pass a copy of the sequence to
+in CP-1252.  (A better check is to pass a copy of the sequence to
L<utf8::decode()|utf8> which performs a full validity check on the
sequence and returns TRUE if it is valid UTF-8, FALSE otherwise.  This
function is always pre-loaded, is fast because it is written in C, and
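
For anyone who wants to try the detection rule from the last perlpodspec hunk, here is a minimal sketch of it. It assumes the input is a raw byte string with no BOM and no =encoding command. The name guess_pod_encoding is hypothetical and is not taken from any Pod parser; only utf8::decode() and Encode::decode() are real calls.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of any Pod parser): guess the encoding of
# raw Pod bytes that carry no BOM and no =encoding command.
sub guess_pod_encoding {
    my ($bytes) = @_;

    # No highbit bytes at all: the document is plain ASCII.
    return 'ascii' unless $bytes =~ /([\x80-\xFF]+)/;

    # Work on a copy of the first highbit byte sequence, because
    # utf8::decode() modifies its argument in place.
    my $copy = $1;

    # utf8::decode() returns true only if the bytes are well-formed UTF-8.
    return utf8::decode($copy) ? 'UTF-8' : 'cp1252';
}

my $utf8_pod   = "=pod\n\nCaf\xC3\xA9\n";          # e-acute as UTF-8 bytes
my $cp1252_pod = "=pod\n\n\x93quoted\x94\n";       # CP-1252 curly quotes

print guess_pod_encoding($utf8_pod),   "\n";
print guess_pod_encoding($cp1252_pod), "\n";

# Decode the bytes to characters using the guessed encoding name.
my $text = Encode::decode(guess_pod_encoding($cp1252_pod), $cp1252_pod);
```

The curly-quote case is where the change matters: bytes 0x93 and 0x94 are C1 control characters in Latin-1 but quotation marks in CP-1252.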
