[ID 20000323.059] mistruths in perlunicode.pod

March 23, 2000 20:56

This is a bug report for perl from,
generated with the help of perlbug 1.27 running under perl v5.5.670.

       o   Strings and patterns may contain characters that have
           an ordinal value larger than 255.  In Perl v5.6, this
           is only enabled if the lexical scope has a use utf8
           declaration (due to compatibility needs) but future
           versions may enable this by default.

This is apparently untrue.

  print ord(v2000), "\n";
  print ord("\x{7d0}"), "\n";

Both print "2000" regardless of whether use utf8 is in effect.  

I'm really not sure HOW this is supposed to work.  If this is correct
behavior, the explanation would have to say that every string carries a
flag indicating whether it is stored as bytes or as UTF-8, and that some
constructs (v-strings and \x{...} escapes among them, as shown above)
generate UTF-8-encoded strings whether or not use utf8 is in effect.  If
that is undesirable (personally, I think being able to generate
UTF-8-encoded strings without use utf8 *is* desirable), then the holes
that let this version of perl generate UTF-8-encoded strings without
use utf8 should be closed.

Also, it seems like there ought to be a built-in function (available
WITHOUT use utf8) that reports whether a string is UTF-8-encoded.  Failing
that, it could live in a separate pragma module, or be made available in
some other way that doesn't inflict the semantics of utf8 on programs:
something like isutf8($str), returning true if $str is UTF-8-encoded, or
perhaps encoding($str), returning 'UTF-8'.  Maybe such a built-in or
pragma already exists, but I haven't found it in the docs.  (Yet.)
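For comparison, here is a sketch that assumes Perl 5.8.1 or later, not the v5.5.670 build this report was filed against.  Later releases define utf8::is_utf8 in the core, callable without use utf8, which is close to the isutf8($str) idea above.  The helper name describe is hypothetical and exists only for this illustration.

```perl
#!/usr/bin/perl
# Sketch only: assumes Perl 5.8.1 or later, where the core defines
# utf8::is_utf8 and it can be called without "use utf8".
use strict;
use warnings;

# Hypothetical helper: print a string's first code point, its length in
# characters, and whether perl stores it internally as UTF-8 or as bytes.
sub describe {
    my ($label, $str) = @_;
    my $storage = utf8::is_utf8($str) ? "UTF-8" : "bytes";
    printf "%-8s ord=%d length=%d storage=%s\n",
        $label, ord($str), length($str), $storage;
}

describe("v2000",    v2000);     # v-string literal, code point 2000
describe("\\x{7d0}", "\x{7d0}"); # \x{...} escape, code point 2000
describe("\\xe9",    "\xe9");    # single-byte escape, code point 233
```

Under those assumptions, the v2000 and \x{7d0} lines should both report ord=2000 and storage=UTF-8, while the \xe9 line reports storage=bytes.  The flag describes internal storage only; ord and length work in characters either way.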

