
[perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback

From: Tony Cook via RT
Date: August 28, 2017 01:38
Subject: [perl #131683] Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback
Message ID: rt-4.0.24-32174-1503884295-73.131683-15-0@perl.org

On Mon, 21 Aug 2017 06:25:01 -0700, pali@cpan.org wrote:
> And the goal of those changes is how Encode handle warnings. PerlIO is
> related just because $PerlIO::encoding::fallback is affected by Encode
> changes.
> 
> I would like to move forward and would like to hear if those Encode
> changes together with extending Encode flags and default value for
> $PerlIO::encoding::fallback are OK, or if changes needs to be reworked
> ... or if whole idea for fixing those problems is wrong.

Most of this could be fixed by PerlIO::encoding being a bit smarter with the check value - only setting WARN_ON_ERR when ckWARN(WARN_UTF8) is true.
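
To make the suggestion above concrete, here is a minimal standalone sketch of the gating logic. It is not PerlIO::encoding's actual code. The CHECK_* constants and both function names are hypothetical stand-ins for the Encode check bits, and the boolean parameter stands in for the result of ckWARN(WARN_UTF8) in the relevant scope. It also assumes one reading of "only setting": the bit is masked out when the warning category is disabled and left as requested otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Encode check bits; values are illustrative only. */
enum {
    CHECK_WARN_ON_ERR     = 0x0002,
    CHECK_PERLQQ          = 0x0100,
    CHECK_STOP_AT_PARTIAL = 0x0800
};

/*
 * Compute the check value to hand to the encoding object.
 * `fallback` models the value of $PerlIO::encoding::fallback;
 * `utf8_warnings_enabled` models ckWARN(WARN_UTF8).
 * The warn bit survives only when the warning category is enabled.
 */
unsigned effective_check(unsigned fallback, bool utf8_warnings_enabled)
{
    unsigned check = fallback;
    if (!utf8_warnings_enabled)
        check &= ~(unsigned)CHECK_WARN_ON_ERR;
    return check;
}

/* Small self-check covering both the enabled and the disabled case. */
void check_demo(void)
{
    unsigned fb = CHECK_PERLQQ | CHECK_WARN_ON_ERR | CHECK_STOP_AT_PARTIAL;

    /* Warnings enabled: the fallback is passed through unchanged. */
    assert(effective_check(fb, true) == fb);

    /* Warnings disabled: only the warn bit is dropped. */
    assert(effective_check(fb, false) ==
           (unsigned)(CHECK_PERLQQ | CHECK_STOP_AT_PARTIAL));
}
```

With this shape the other fallback bits are never touched, so a caller that turns utf8 warnings off loses only the warning, not the substitution or partial-character handling.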

The only issue would be the utf8 subcategory warnings, like for surrogates, which your Encode patch goes to a lot of effort to pass through.

But a lot of that effort is wasted, for example:

@@ -407,23 +439,29 @@ CODE:
     }
     while (s < e && s+UTF8SKIP(s) <= e) {
 	STRLEN len;
-	UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
-                                               |UTF8_WARN_SURROGATE
-                                               |UTF8_DISALLOW_FE_FF
-                                               |UTF8_WARN_FE_FF
-                                               |UTF8_WARN_NONCHAR));
-	s += len;
-	if (size != 4 && invalid_ucs2(ord)) {
+	U32 flags = UTF8_DISALLOW_ILLEGAL_INTERCHANGE;
+	if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UTF8_WARN_SUPER;
+	if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UTF8_WARN_SURROGATE;
+	if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UTF8_WARN_NONCHAR;
+	UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
+	if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {

utf8n_to_uvuni() will only warn if those warnings are lexically enabled, so here you're adding extra checks for each category that aren't needed.

The same is true for the calls to uvuni_to_utf8_flags().

In another case you're adding a completely new warning:

+	    if (encode_ckWARN(check, WARN_NONCHAR)) {
+	        warner(packWARN(WARN_NONCHAR),
+		      "%" SVf ":Unicode character %" UVxf " is illegal",
+		      *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+		      ord);
+	    }
+	    ord = FBCHAR;
 	}

which could probably just be made lexically scoped whether the new flag is set or not, since some of the others will be made so due to the changes to decode() and encode().

Of course, that change might be considered a backward incompatibility, since some warnings that were previously produced (because Encode does C<use warnings;>) might no longer be (since the new scope might not).

Tony

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=131683
