[perl.git] branch blead updated. v5.27.9-86-g578a6a873a

From: Karl Williamson
Date: March 2, 2018 19:37
Subject: [perl.git] branch blead updated. v5.27.9-86-g578a6a873a
Message ID: E1erqUb-0000PP-Ir@git.dc.perl.space

In perl.git, the branch blead has been updated

<https://perl5.git.perl.org/perl.git/commitdiff/578a6a873a320fe64743b060dbd467f1865d205c?hp=d8255c827dc80db97c8439ea38afc130902a7c1e>

- Log -----------------------------------------------------------------
commit 578a6a873a320fe64743b060dbd467f1865d205c
Author: Karl Williamson <khw@cpan.org>
Date:   Fri Mar 2 12:13:55 2018 -0700

    Reword warning for deviations from UTF-8 locales
    
    Some locales are UTF-8, but not exactly what Perl is expecting.  Revise
    the message raised in this circumstance.
    
    Originally I thought these were violations of Unicode, but based on
    feedback from Craig Berry, I came to realize that these are legitimate
    interpretations of the Unicode standard.  But perl persists with its own
    interpretation that differs from these, hence the warning.

-----------------------------------------------------------------------

Summary of changes:
 README.hpux                    |  9 --------
 locale.c                       |  5 ++---
 pod/perldelta.pod              |  8 +++++++
 pod/perldiag.pod               | 47 ++++++++++++++++++++++++++++--------------
 t/porting/known_pod_issues.dat |  1 +
 5 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/README.hpux b/README.hpux
index ce000dd887..e1857e08dc 100644
--- a/README.hpux
+++ b/README.hpux
@@ -563,15 +563,6 @@ questions about 64-bit numbers when Configure asks you, you may get a
 configuration that cannot be compiled, or that does not function as
 expected.
 
-=head2 Locales on HP-UX
-
-HP-UX installs the locale C<univ.utf8>  and C<en_US.utf8> on all systems.
-Up to and including HP-UX 11.23, this locale is defective in that it
-does not think that the characters C<< $ + < = > ^ ` | >> and C<~> are
-punctuation, which they are according to the Unicode standards.
-
-This appears to be fixed on HP-UX 11.31.
-
 =head2 Oracle on HP-UX
 
 Using perl to connect to Oracle databases through DBI and DBD::Oracle
diff --git a/locale.c b/locale.c
index ead73e5554..d6d91ea2b9 100644
--- a/locale.c
+++ b/locale.c
@@ -1656,10 +1656,9 @@ S_new_ctype(pTHX_ const char *newctype)
             if (UNLIKELY(bad_count) && PL_in_utf8_CTYPE_locale) {
                 PL_warn_locale = Perl_newSVpvf(aTHX_
                      "Locale '%s' contains (at least) the following characters"
-                     " which have\nnon-standard meanings: %s\nThe Perl program"
-                     " will use the standard meanings",
+                     " which have\nunexpected meanings: %s\nThe Perl program"
+                     " will use the expected meanings",
                       newctype, bad_chars_list);
-
             }
             else {
                 PL_warn_locale = Perl_newSVpvf(aTHX_
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index fb56b110e9..58781e37ee 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -227,6 +227,14 @@ allow entering the I<first> argument of an operator that takes a fixed
 number of arguments, since this is a case that will not cause stack
 corruption.  [perl #132854]
 
+=item *
+
+The warning added in 5.27.8 concerning UTF-8 locale compatibility was
+misleading.  The new wording and explanation are at
+L<perldiag/Locale '%s' contains (at least) the following characters which
+have unexpected meanings: %s  The Perl program will use the expected
+meanings>
+
 =back
 
 =head1 Utility Changes
diff --git a/pod/perldiag.pod b/pod/perldiag.pod
index c24be8a334..3abc301f7a 100644
--- a/pod/perldiag.pod
+++ b/pod/perldiag.pod
@@ -3371,28 +3371,43 @@ said library was compiled against.  Reinstalling the XS module will
 likely fix this error.
 
 =item Locale '%s' contains (at least) the following characters which
-have non-standard meanings: %s  The Perl program will use the standard
+have unexpected meanings: %s  The Perl program will use the expected
 meanings
 
 (W locale) You are using the named UTF-8 locale.  UTF-8 locales are
-expected to adhere to the Unicode standard.  This message arises when
-perl found some anomalies in the locale, and is notifying you that there
-are potential problems.
-
-The most common cause of this warning is that, contrary to the claims,
-Unicode is not completely locale insensitive.  Turkish and some related
-languages have two types of C<"I"> characters.  One is dotted in both
-upper- and lowercase, and the other is dotless in both cases.  Unicode
-allows a locale to use either these rules, or the rules used in all
-other instances, where there is only one type of C<"I">, which is
-dotless in the uppercase, and dotted in the lower.  The perl core does
-not (yet) handle the Turkish case, and this warns you of that.  Instead,
+expected to have very particular behavior, which most do.  This message
+arises when perl found some departures from the expectations, and is
+notifying you that the expected behavior overrides these differences.
+In some cases the differences are caused by the locale definition being
+defective, but the most common causes of this warning are when there are
+ambiguities and conflicts in following the Standard, and the locale has
+chosen an approach that differs from Perl's.
+
+One of these arises because, contrary to the claims, Unicode is not
+completely locale insensitive.  Turkish and some related languages have
+two types of C<"I"> characters.  One is dotted in both upper- and
+lowercase, and the other is dotless in both cases.  Unicode allows a
+locale to use either the Turkish rules, or the rules used in all other
+instances, where there is only one type of C<"I">, which is dotless in
+the uppercase, and dotted in the lower.  The perl core does not (yet)
+handle the Turkish case, and this message warns you of that.  Instead,
 the L<Unicode::Casing> module allows you to mostly implement the Turkish
 casing rules.
 
-But there are other locales which are defective in not following the
-Unicode standard, and this message is raised if one of these is
-detected.
+The other common cause is for the characters
+
+ $ + < = > ^ ` | ~
+
+These are problematic.  The C standard says that these should be
+considered punctuation in the C locale (and the POSIX standard defers to
+the C standard), and Unicode is generally considered a superset of the C
+locale.  But Unicode has added an extra category, "Symbol", and
+classifies these particular characters as being symbols.  Most UTF-8
+locales have them treated as punctuation, so that L<ispunct(2)> returns
+non-zero for them.  But a few locales have it return 0.   Perl takes the
+first approach, not using C<ispunct()> at all (see L<Note [5] in
+perlrecharclass|perlrecharclass/[5]>), and this message is raised to
+notify you that you are getting Perl's approach, not the locale's.
 
 =item Locale '%s' may not work well.%s
 
diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat
index 78e0ec659d..5856f805f3 100644
--- a/t/porting/known_pod_issues.dat
+++ b/t/porting/known_pod_issues.dat
@@ -147,6 +147,7 @@ ioctl(2)
 IPC::Run
 IPC::Shareable
 IPC::Signal
+ispunct(2)
 kill(3)
 langinfo(3)
 LaTeX::Encode
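
The punctuation case described in the new perldiag text can be sketched in
standalone C.  This is only an illustration under stated assumptions, not the
code in locale.c: the helper names collect_unexpected() and format_warning()
are hypothetical, snprintf() stands in for Perl_newSVpvf(), and only the nine
characters listed in the perldiag entry are examined.  The message text is
copied from the new wording in the locale.c hunk.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* The nine ASCII characters named in the perldiag entry. */
static const char symbol_chars[] = "$+<=>^`|~";

/* Hypothetical helper: list the characters from symbol_chars that
 * ispunct() rejects under the current LC_CTYPE locale.  They are written
 * space-separated into buf; the return value is how many were found. */
static int collect_unexpected(char *buf, size_t buflen)
{
    const size_t n = strlen(symbol_chars);
    size_t used = 0;
    int bad_count = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        if (!ispunct((unsigned char) symbol_chars[i])) {
            bad_count++;
            /* Need room for a separator, the character, and the NUL. */
            if (used + 3 <= buflen) {
                if (used > 0)
                    buf[used++] = ' ';
                buf[used++] = symbol_chars[i];
                buf[used] = '\0';
            }
        }
    }
    return bad_count;
}

/* Hypothetical helper: format the reworded warning.  snprintf() is a
 * stand-in for the Perl_newSVpvf() call shown in the diff. */
static int format_warning(char *out, size_t outlen,
                          const char *locale_name,
                          const char *bad_chars_list)
{
    return snprintf(out, outlen,
                    "Locale '%s' contains (at least) the following characters"
                    " which have\nunexpected meanings: %s\nThe Perl program"
                    " will use the expected meanings",
                    locale_name, bad_chars_list);
}

#ifdef LOCALE_CHECK_MAIN
/* Optional driver: build with -DLOCALE_CHECK_MAIN to test the locale
 * selected by the environment (LC_ALL, LC_CTYPE, LANG). */
int main(void)
{
    const char *name = setlocale(LC_CTYPE, "");
    char bad[32];
    char msg[256];

    if (name == NULL)
        name = "C";   /* setlocale() failed; the C locale stays in effect */

    if (collect_unexpected(bad, sizeof bad) > 0) {
        format_warning(msg, sizeof msg, name, bad);
        puts(msg);
    }
    return 0;
}
#endif
```

In the C locale all nine characters satisfy ispunct(), so nothing is reported;
a report is only expected under a locale that classifies some of them
differently, which is the situation the reworded warning describes.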

-- 
Perl5 Master Repository


