Re: Proposal: Rename utf8::is_utf8() to utf8::is_upgraded()

From: pali
Date: February 17, 2017 14:52
Subject: Re: Proposal: Rename utf8::is_utf8() to utf8::is_upgraded()
Message ID: 20170217145157.GA7797@pali

On Friday 17 February 2017 15:36:09 H.Merijn Brand wrote:
> On Fri, 17 Feb 2017 11:29:16 +0100, pali@cpan.org wrote:
> 
> > Hi!
> > 
> > In many Perl modules and scripts I see incorrect usage of
> > utf8::is_utf8(). The most common incorrect pattern found in modules is:
> > 
> >   use utf8;
> > 
> >   my $value = func();
> >   if (utf8::is_utf8($value)) {
> >     utf8::encode($value);
> >   }
> > 
> > utf8::is_utf8() does not tell whether a value is already encoded in
> > UTF-8 (and in Perl it is not possible to detect that), so such code is
> > wrong. If func() returns a string that is internally stored as Latin-1,
> > nothing happens. But if it is internally stored as UTF-8, the string is
> > converted to UTF-8 octets. In other words, this pattern encodes a string
> > to UTF-8 octets based on an internal Perl flag, which does not make any
> > sense as a condition.
> > 
> > Maybe a corrected pattern could be (probably under eval to handle errors):
> > 
> >   my $value = func();
> >   if (utf8::is_utf8($value)) {
> >     utf8::downgrade($value);
> >   }
> > 
> > This at least does not modify the content of $value: the 'eq' operator
> > gives the same result on $value whether the condition was true or false.
> > 
> > As the first pattern is more common, I would propose renaming
> > utf8::is_utf8() to some better name, e.g. utf8::is_upgraded(), which does
> > not say anything about UTF-8 encoding.
> 
> That *will* break a lot of code that uses the function as it is
> supposed to be used.

The only reason to use utf8::is_utf8() is probably when you need to
deal with a broken XS module. In pure Perl code the internal storage of
a string is irrelevant and fully invisible. The strings "\x{A0}" and
"\N{U+A0}" are the same.
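
A minimal sketch of that point, assuming a stock perl: the same
one-character string is held once in the native single-byte form and once
in the upgraded (internal UTF-8) form. The variable names are only
illustrative, and utf8::upgrade() is used here to force the internal form
explicitly.

```perl
use strict;
use warnings;

# Two copies of the same one-character string (code point 0xA0).
my $native   = "\x{A0}";
my $upgraded = "\x{A0}";
utf8::upgrade($upgraded);    # change internal storage only, not the content

# Only the internal flag differs ...
my $flag_native   = utf8::is_utf8($native)   ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

# ... while everything visible from pure Perl is identical.
my $same       = ($native eq $upgraded) ? 1 : 0;
my $len_native = length $native;
my $len_up     = length $upgraded;

print "flag: $flag_native vs $flag_upgraded\n";
print "eq: $same, length: $len_native vs $len_up\n";
```

Only utf8::is_utf8() distinguishes the two copies; 'eq' and length() do not.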

But utf8::is_utf8() is used in other places too, which means that such
code is with high probability not correct.
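
To illustrate why such code is suspect, here is a small sketch of the
"encode only if flagged" pattern. The helper name encode_if_flagged() is
made up for this example. It is fed two strings that compare equal but
differ in internal storage.

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the flawed pattern.
sub encode_if_flagged {
    my ($value) = @_;                  # copy keeps the internal storage form
    if (utf8::is_utf8($value)) {
        utf8::encode($value);          # characters -> UTF-8 octets, in place
    }
    return $value;
}

my $native   = "\x{E9}";               # e-acute, native single-byte storage
my $upgraded = "\x{E9}";
utf8::upgrade($upgraded);              # same content, upgraded storage

my $inputs_equal = ($native eq $upgraded) ? 1 : 0;

my $out_native   = encode_if_flagged($native);
my $out_upgraded = encode_if_flagged($upgraded);

# Show each result as hex code points.
for my $out ($out_native, $out_upgraded) {
    print join(' ', map { sprintf '%02X', ord } split //, $out), "\n";
}
```

Equal inputs give different outputs: one string is returned untouched, the
other comes back as two UTF-8 octets.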

> You cannot see from the parser if the code that uses this function is
> using it right or wrong.

Yes, I know.

I did not mean to introduce some "heuristic" which would "rewrite" one
code pattern into another. Such a thing will never work...

> > And ideally deprecate utf8::is_utf8(), or at least start
> > throwing a warning when it is used, as most usage of utf8::is_utf8() is
> > incorrect.
> > 
> > What do you think about it?
> 
> Don't rename this function, even if the purpose is debatable

Maybe a better description of what I mean:

Introduce a new function utf8::is_upgraded() which will be a copy of
utf8::is_utf8().
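
As a rough user-level sketch only: utf8::is_upgraded() is just the name
proposed here, not something assumed to exist in perl, so the example
installs it by hand as an alias. A real implementation would live in the
core.

```perl
use strict;
use warnings;

# ASSUMPTION: utf8::is_upgraded() is only the proposed name; it is
# installed here manually as an alias for the existing function.
BEGIN {
    no warnings 'redefine';            # in case some perl already defines it
    *utf8::is_upgraded = \&utf8::is_utf8;
}

my $s = "\x{A0}";
my $before = utf8::is_upgraded($s) ? 1 : 0;   # native storage

utf8::upgrade($s);
my $after  = utf8::is_upgraded($s) ? 1 : 0;   # upgraded storage

print "before=$before after=$after\n";
```

Both names share one implementation, so existing callers of
utf8::is_utf8() are unaffected.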

And then start a discussion about either deprecating utf8::is_utf8()
or throwing a warning when utf8::is_utf8() is used.

That should not break any existing code. And newcomers who do not know
what utf8::is_utf8() actually does will stop using it in new code.

The idea is that people will stop using the utf8::is_utf8() function,
which has a really bad name.
