develooper Front page | perl.perl5.porters | Postings from February 2017

Proposal: Rename utf8::is_utf8() to utf8::is_upgraded()

From: pali
Date: February 17, 2017 10:37
Subject: Proposal: Rename utf8::is_utf8() to utf8::is_upgraded()
Message ID: 20170217102916.GA19728@pali

Hi!

In many Perl modules and in Perl code generally, I see incorrect usage of
utf8::is_utf8(). The most common incorrect pattern found in modules is:

  use utf8;

  my $value = func();
  if (utf8::is_utf8($value)) {
    utf8::encode($value);
  }

Because utf8::is_utf8() does not tell whether a value is already encoded
in UTF-8 (and in Perl it is not possible to detect that), such code is
wrong. If func() returns a string that is internally stored as Latin-1,
nothing happens. But if the string is internally stored as UTF-8, it is
converted to UTF-8 octets. In other words, this pattern decides whether
to encode the string to UTF-8 octets based on an internal Perl flag,
which makes no sense as a condition.
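To make the problem concrete, here is a small sketch that runs the first pattern on two strings that are eq to each other and differ only in their internal representation. The sub name encode_if_flagged and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# The questionable pattern from above, wrapped in a sub for demonstration.
sub encode_if_flagged {
    my ($value) = @_;              # the copy keeps the internal UTF8 flag
    if (utf8::is_utf8($value)) {
        utf8::encode($value);      # characters -> UTF-8 octets
    }
    return $value;
}

my $latin1   = "caf\xe9";          # 4 characters, stored as Latin-1
my $upgraded = $latin1;
utf8::upgrade($upgraded);          # same 4 characters, stored as UTF-8

print "inputs eq: ", ($latin1 eq $upgraded ? "yes" : "no"), "\n";  # inputs eq: yes

my $r1 = encode_if_flagged($latin1);
my $r2 = encode_if_flagged($upgraded);
print "lengths: ", length($r1), " ", length($r2), "\n";            # lengths: 4 5
print "results eq: ", ($r1 eq $r2 ? "yes" : "no"), "\n";           # results eq: no
```

Both inputs hold the same four characters, yet the output differs depending only on the internal flag.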

A corrected pattern could perhaps be (probably wrapped in eval to handle
errors):

  my $value = func();
  if (utf8::is_utf8($value)) {
    # dies if $value contains a character above 0xFF
    utf8::downgrade($value);
  }

This at least does not modify the content of $value: comparing $value
with 'eq' gives the same result whether the condition was true or false.
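A sketch of the corrected pattern wrapped in eval, as suggested above. downgrade_if_flagged is a hypothetical helper name, not an existing API:

```perl
use strict;
use warnings;

# Corrected pattern: change only the internal storage, never the characters.
# $_[0] aliases the caller's variable, so it is modified in place.
# Returns 1 on success (or if nothing needed doing), 0 if downgrade failed.
sub downgrade_if_flagged {
    if (utf8::is_utf8($_[0])) {
        # utf8::downgrade dies if the string holds a character above 0xFF;
        # the eval turns that into a false return value.
        return eval { utf8::downgrade($_[0]); 1 } ? 1 : 0;
    }
    return 1;
}

my $orig  = "caf\xe9";
my $value = $orig;
utf8::upgrade($value);             # flag on, same characters

print "downgrade ok: ", (downgrade_if_flagged($value) ? "yes" : "no"), "\n";  # downgrade ok: yes
print "flag on: ", (utf8::is_utf8($value) ? "yes" : "no"), "\n";              # flag on: no
print "content eq: ", ($value eq $orig ? "yes" : "no"), "\n";                 # content eq: yes

my $wide = "\x{2603}";             # SNOWMAN, cannot be stored as Latin-1
print "wide downgrade ok: ", (downgrade_if_flagged($wide) ? "yes" : "no"), "\n";  # wide downgrade ok: no
```

utf8::downgrade() also accepts an optional second FAIL_OK argument; when it is true, the function returns false instead of dying, so utf8::downgrade($_[0], 1) could replace the eval.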

As the first pattern is more common, I propose renaming utf8::is_utf8()
to a better name, e.g. utf8::is_upgraded(), which does not suggest
anything about UTF-8 encoding.

Ideally, utf8::is_utf8() would then be deprecated, or would at least
start emitting a warning when used, since most usages of utf8::is_utf8()
are incorrect.
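A rough user-space sketch of the rename plus deprecation idea. My::Utf8, its is_upgraded() and the warning text are all hypothetical; core Perl has no utf8::is_upgraded(), and the real change would live in the utf8:: namespace itself:

```perl
use strict;
use warnings;

# Hypothetical module; nothing here exists in core Perl.
package My::Utf8;

# New name: reports only the internal storage format, nothing about encoding.
sub is_upgraded {
    return utf8::is_utf8($_[0]) ? 1 : 0;
}

# Old name kept for compatibility, but warns when 'deprecated' warnings are on.
sub is_utf8 {
    warnings::warnif('deprecated',
        'My::Utf8::is_utf8() is deprecated, use My::Utf8::is_upgraded()');
    return is_upgraded($_[0]);
}

package main;

my $s = "caf\xe9";
print "upgraded: ", (My::Utf8::is_upgraded($s) ? "yes" : "no"), "\n";  # upgraded: no
utf8::upgrade($s);
print "upgraded: ", (My::Utf8::is_upgraded($s) ? "yes" : "no"), "\n";  # upgraded: yes
```

Using warnings::warnif() with the 'deprecated' category means the warning appears only where the caller has that category enabled, e.g. under use warnings.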

What do you think about it?
