
Pod to Assume CP-1252

David E. Wheeler
February 9, 2015 20:09
Hello Porters,

Just a quick announcement that the next release of Pod::Simple will change its default encoding from Latin-1 to CP-1252. CP-1252 is a superset of the printable characters of Latin-1, so it is probably a more useful default. This should be especially useful for Pod created on Windows platforms, where the characters that CP-1252 adds beyond Latin-1 (curly quotes, dashes, the euro sign, and so on) are more likely to occur.
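
To make the difference concrete, here is a minimal sketch using the core Encode module. The sample bytes are made up for illustration; the point is only that bytes in the 0x80-0x9F range decode to C1 control characters under Latin-1 but to printable characters under CP-1252.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative bytes: 0x93 and 0x94 are curly double quotes in CP-1252,
# and 0x80 is the euro sign. In Latin-1 all three are C1 controls.
my $bytes = "\x93\x94\x80";

my $as_latin1 = decode('iso-8859-1', $bytes);
my $as_cp1252 = decode('cp1252',     $bytes);

# Render a decoded string as a list of Unicode code points.
sub codepoints {
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $_[0];
}

print "Latin-1: ", codepoints($as_latin1), "\n";  # Latin-1: U+0093 U+0094 U+0080
print "CP-1252: ", codepoints($as_cp1252), "\n";  # CP-1252: U+201C U+201D U+20AC
```

Bytes from 0xA0 upward (for example 0xE9, e-acute) decode identically under both encodings, which is why the new default should not disturb existing Latin-1 Pod.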

This change has come about after a fair bit of discussion on pod-people, with input from Pod::Simple’s original author, Sean Burke. The change is being made by Karl Williamson as an extension of his EBCDIC fixes to the module.

A test release of Pod-Simple will likely go out this week. After a couple of weeks of soaking on cpan-testers, and assuming there are few or only minor issues, a final release will follow.

The Pod spec, however, is part of core, not Pod::Simple, so here is a patch to reflect the change to the specification. I have not changed references to E<> entities, though, as I expect those are still just Latin-1. Karl, can you confirm?
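
For anyone weighing the E<> question, here is a rough sketch of the numeric-escape rules as perlpod words them in the patch below (leading "0x" is hex, leading "0" is octal, otherwise decimal). The helper name e_escape_codepoint is hypothetical and is not how Pod::Simple implements it; the sketch assumes the number is taken directly as a Unicode code point, which is exactly the assumption I am asking Karl to confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: map the number inside E<number> to a code point,
# assuming numbers are read as Unicode (and hence Latin-1 below 256).
sub e_escape_codepoint {
    my ($number) = @_;
    return hex($number) if $number =~ /\A0[xX][0-9A-Fa-f]+\z/;  # leading "0x": hex
    return oct($number) if $number =~ /\A0[0-7]+\z/;            # leading "0": octal
    return $number + 0  if $number =~ /\A[0-9]+\z/;             # otherwise decimal
    return;                                                     # not a numeric escape
}

printf "E<%s> -> U+%04X\n", $_, e_escape_codepoint($_)
    for qw(0x201E 075 181 147);
# E<0x201E> -> U+201E
# E<075> -> U+003D
# E<181> -> U+00B5
# E<147> -> U+0093
```

Under this reading E<147> is the C1 control U+0093; a CP-1252 reading would make it U+201C instead, which is why the distinction matters for numbers from 128 through 159.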

Related: There’s an out-of-date copy of the spec in ext/Pod-Html/testdir/perlpodspec-copy.pod; should that be changed to point to the original?



diff --git a/pod/perlpod.pod b/pod/perlpod.pod
index 12b156b..d675de7 100644
--- a/pod/perlpod.pod
+++ b/pod/perlpod.pod
@@ -286,7 +286,7 @@ users won't need this; but if your encoding isn't US-ASCII,
 then put a C<=encoding I<encodingname>> command very early in the document so
 that pod formatters will know how to decode the document.  For
 I<encodingname>, use a name recognized by the L<Encode::Supported>
-module.  Some pod formatters may try to guess between a Latin-1 versus
+module.  Some pod formatters may try to guess between a CP-1252 versus
 UTF-8 encoding, but they may guess wrong.  It's best to be explicit if
 you use anything besides strict ASCII.  Examples:
@@ -496,7 +496,7 @@ e with an acute (/-shaped) accent.
-The ASCII/Latin-1/Unicode character with that number.  A
+The ASCII/CP-1252/Unicode character with that number.  A
 leading "0x" means that I<number> is hex, as in
 C<EE<lt>0x201EE<gt>>.  A leading "0" means that I<number> is octal,
 as in C<EE<lt>075E<gt>>.  Otherwise I<number> is interpreted as being
@@ -505,7 +505,7 @@ in decimal, as in C<EE<lt>181E<gt>>.
 Note that older Pod formatters might not recognize octal or
 hex numeric escapes, and that many formatters cannot reliably
 render characters above 255.  (Some formatters may even have
-to use compromised renderings of Latin-1 characters, like
+to use compromised renderings of CP-1252 characters, like
 rendering C<EE<lt>eacuteE<gt>> as just a plain "e".)
diff --git a/pod/perlpodspec.pod b/pod/perlpodspec.pod
index f2af63e..a2a4f8f 100644
--- a/pod/perlpodspec.pod
+++ b/pod/perlpodspec.pod
@@ -607,7 +607,7 @@ as signaling that the file is Unicode encoded as in UTF-16 (whether
 big-endian or little-endian) or UTF-8, Pod parsers should do the
 same.  Otherwise, the character encoding should be understood as
 being UTF-8 if the first highbit byte sequence in the file seems
-valid as a UTF-8 sequence, or otherwise as Latin-1.
+valid as a UTF-8 sequence, or otherwise as CP-1252.
 Future versions of this specification may specify
 how Pod can accept other encodings.  Presumably treatment of other
@@ -641,7 +641,7 @@ I<and> whether the next byte is in the range
 0x80 - 0xBF.  If so, the parser may conclude that this file is in
 UTF-8, and all highbit sequences in the file should be assumed to
 be UTF-8.  Otherwise the parser should treat the file as being
-in Latin-1.  (A better check is to pass a copy of the sequence to
+in CP-1252.  (A better check is to pass a copy of the sequence to
 L<utf8::decode()|utf8> which performs a full validity check on the
 sequence and returns TRUE if it is valid UTF-8, FALSE otherwise.  This
 function is always pre-loaded, is fast because it is written in C, and
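
(End of patch.) As a rough illustration of the detection rule that the perlpodspec hunks describe, here is a minimal sketch for input with no BOM and no =encoding line. The helper names guess_pod_encoding and decode_pod are hypothetical and this is not Pod::Simple's actual code; it simply applies the "better check" from the spec by passing a copy of the first high-bit byte sequence to utf8::decode().

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return an Encode name for raw Pod bytes.
sub guess_pod_encoding {
    my ($bytes) = @_;

    # No high-bit bytes at all: plain ASCII, nothing to guess.
    return 'ascii' unless $bytes =~ /([\x80-\xFF]+)/;

    # utf8::decode() works in place, so give it a copy of the first
    # high-bit sequence. It returns false if the bytes are not valid UTF-8.
    my $sequence = $1;
    return utf8::decode($sequence) ? 'UTF-8' : 'cp1252';
}

# Hypothetical helper: decode raw Pod bytes using the guessed encoding.
sub decode_pod {
    my ($bytes) = @_;
    return decode(guess_pod_encoding($bytes), $bytes);
}

print guess_pod_encoding("caf\xC3\xA9"), "\n";   # UTF-8
print guess_pod_encoding("\x93quoted\x94"), "\n"; # cp1252
```

The guess can still be wrong: CP-1252 text whose first high-bit bytes happen to form a valid UTF-8 sequence will be treated as UTF-8, which is why an explicit =encoding line remains the safer choice.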
