On Fri, 17 Feb 2017 11:29:16 +0100, pali@cpan.org wrote:

> Hi!
>
> In many perl modules and in perl code I see incorrect usage of
> utf8::is_utf8(). The most common incorrect pattern found in modules is:
>
>     use utf8;
>
>     my $value = func();
>     if (utf8::is_utf8($value)) {
>         utf8::encode($value);
>     }
>
> utf8::is_utf8() does not tell whether a value is already encoded in UTF-8
> (and in perl it is not possible to detect that), so such code is wrong. If
> func() returns a string that is stored internally as Latin-1, nothing
> happens. But when it is stored internally as UTF-8, the string is converted
> to UTF-8 octets. This means the pattern encodes the string to UTF-8
> octets based on an internal perl flag, which makes no sense as a
> condition.
>
> Maybe a corrected pattern could be (probably under eval to handle errors):
>
>     my $value = func();
>     if (utf8::is_utf8($value)) {
>         utf8::downgrade($value);
>     }
>
> This at least does not modify the content of $value: 'eq' on $value
> gives the same result whether the condition was true or false.
>
> As the first pattern is more common, I would propose renaming
> utf8::is_utf8() to a better name, e.g. utf8::is_upgraded(), which does
> not suggest anything about UTF-8 encoding.

That *will* break a lot of code that uses the function as it is supposed
to be used. You cannot tell from the parser whether code calling this
function uses it correctly or not.

> And ideally deprecate utf8::is_utf8(), or at least start emitting a
> warning when it is used, as most usage of utf8::is_utf8() is incorrect.
>
> What do you think about it?

Don't rename this function, even if its purpose is debatable.

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.25   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/
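A minimal sketch of the point argued in the quoted message, assuming a perl with the built-in utf8:: functions (no `use utf8` is needed for them): two strings that compare equal with 'eq' can still differ in utf8::is_utf8(), so the flag-based encode pattern yields different octets for the same text, while the downgrade variant leaves the content unchanged. The helpers encode_if_flagged() and downgrade_if_flagged() are hypothetical names made up for this illustration.

```perl
use strict;
use warnings;

# Two strings with identical content (U+00E9), differing only in
# perl's internal representation.
my $latin1   = "\x{e9}";    # code point < 0x100: stored as one Latin-1 byte
my $upgraded = "\x{e9}";
utf8::upgrade($upgraded);   # same character, now stored internally as UTF-8

print "equal: ", ($latin1 eq $upgraded ? "yes" : "no"), "\n";   # equal: yes
print "is_utf8: ", (utf8::is_utf8($latin1) ? 1 : 0), " vs ",
    (utf8::is_utf8($upgraded) ? 1 : 0), "\n";                    # is_utf8: 0 vs 1

# Hypothetical helper: the pattern criticised in the thread,
# encoding only when the internal flag happens to be on.
sub encode_if_flagged {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    return $s;
}

# Same text in, different results out: 1 character vs 2 octets.
print length(encode_if_flagged($latin1)), "\n";     # 1
print length(encode_if_flagged($upgraded)), "\n";   # 2

# Hypothetical helper: the downgrade variant from the thread.
# utf8::downgrade() dies on characters above 0xFF unless its
# second (FAIL_OK) argument is true, hence the suggested eval.
sub downgrade_if_flagged {
    my ($s) = @_;
    utf8::downgrade($s) if utf8::is_utf8($s);
    return $s;
}

my $same = downgrade_if_flagged($latin1) eq downgrade_if_flagged($upgraded);
print "downgrade variant equal: ", ($same ? "yes" : "no"), "\n";   # yes
```

The encode variant's output depends on how perl happened to store the string, which is exactly the hidden state utf8::is_utf8() reports; the downgrade variant changes only that internal storage, so 'eq' comparisons are unaffected.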