On Tue, 18 Jul 2017 10:53:53 +1000, Tony Cook <tony@develop-help.com> wrote:

> On Mon, Jul 17, 2017 at 10:46:59AM +0200, Sawyer X wrote:
> > [Top-posted]
> >
> > I have mixed thoughts about this.
> >
> > I'm sympathetic to both considerations: Having properly-named functions
> > to reduce confusion for future developers (we hope to have some, right?)
> > but not introduce additional cognitive load for existing developers.
> >
> > A few ways to make such a situation easier:
> >
> > * Document utf8::is_utf8() to prevent this confusion: This is by far the
> > first thing that should be done. I have double checked the wording for
> > utf8::is_utf8() from my blead (978b185):
> >
> >     (Since Perl 5.8.1) Test whether $string is marked internally as
> >     encoded in UTF-8. Functionally the same as "Encode::is_utf8()".
> >
> > This is confusing, to say the least. "Marked internally" is the words
> > core hackers are looking for and recognize, but "UTF-8" is what non-core
> > hackers (those without the cognitive bias in core terms) see and
> > understand. If we head over to Encode::is_utf8() we see:
> >
> >     [INTERNAL] Tests whether the UTF8 flag is turned on in the /STRING/.
> >     If /CHECK/ is true, also checks whether /STRING/ contains
> >     well-formed UTF-8. Returns true if successful, false otherwise.
> >
> >     As of Perl 5.8.1, utf8 <https://metacpan.org/pod/utf8> also has the
> >     |utf8::is_utf8| function.
> >
> > I like this wording better for several reasons: It is under the title
> > "Messing with Perl's Internals"; it notes the "UTF8" flag, and it adds
> > that it checks for well-formed UTF-8 only if that flag is true. There
> > are improvements to be made here too. We can note what the flag means
> > (subtle, complicated, bike-shed-able) or at the very least add a nice
> > "this isn't the flag you're looking for" warning.
> > We can also suggest
> > when to use and when not to use the function (otherwise it's left to the
> > reader, who can easily get it wrong, which is why we're here).
>
> utf8::is_utf8() doesn't accept the second parameter and does no
> validity checks (we have utf8::valid() for that), despite the note in
> utf8.pm.
>
> > If the document on both was better, then we could have possibly left
> > this as unfortunate naming errors we're carrying with us (along with
> > "wantarray" for noting whether the context is scalar, list, or void).
> ...
> > Overall, I'm still undecided. Maybe we could start with improving the
> > existing documentation?
>
> Perhaps something like:
>
> >>
>
> =item * C<$flag = utf8::is_utf8($string)>
>
> (Since Perl 5.8.1) Test whether I<$string> is marked internally as
> encoded in UTF-8. Functionally the same as C<Encode::is_utf8($string)>.
> Typically only necessary for debugging.
>
> If you need to force Unicode semantics for code that needs to be
> compatible with perls older than 5.12, call C<utf8::upgrade($string)>
> unconditionally.
>
> Using this flag to decide whether a string should be treated as
> already encoded bytes or characters is wrong; this should be decided
> as part of the interface of your function.
>
> If you're accepting bytes:
>
>   utf8::downgrade($string); # throws an exception if code point over 0xFF
>
>   utf8::downgrade($string, 1) # our own error handling
>     or die "\$string must be representable as bytes";
>
> or if you're accepting characters and need encoded bytes:
>
>   utf8::encode($string); # unconditionally
>
> The only exception is if you're dealing with filenames, since perl
> uses the internal representation of the string for system calls.
>
> <<
>
> Are there any other cases someone might be tempted to call
> utf8::is_utf8()?
>
> Tony

I like this. What I miss here is a small example of how to guarantee
preventing double encoding/decoding, as I think that is what this
function is most often (erroneously) used for.
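
A minimal sketch of what such an example could look like, assuming the
usual rule of decoding exactly once on input and encoding exactly once on
output. The helper names from_wire() and to_wire() are made up for
illustration and are not part of Encode or utf8; the strict FB_CROAK
handling is likewise an assumption about what a caller would want.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical boundary helpers: the interface, not the internal flag,
# says whether a value is bytes or characters.
sub from_wire {    # UTF-8 encoded bytes in, characters out
    my ($bytes) = @_;
    # FB_CROAK dies on malformed input; LEAVE_SRC leaves $bytes untouched.
    return Encode::decode('UTF-8', $bytes,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

sub to_wire {      # characters in, UTF-8 encoded bytes out
    my ($chars) = @_;
    return Encode::encode('UTF-8', $chars,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

my $bytes = "caf\xC3\xA9";        # five UTF-8 encoded bytes
my $chars = from_wire($bytes);    # four characters, the last is U+00E9

# The same four characters, once with the internal flag off, once on.
my $flag_off = "caf\xE9";
my $flag_on  = $flag_off;
utf8::upgrade($flag_on);

print "same characters\n" if $flag_off eq $flag_on;
print "same bytes out\n"  if to_wire($flag_off) eq to_wire($flag_on);
print "round trip ok\n"   if to_wire($chars) eq $bytes;
```

The two copies compare equal and encode to the same bytes although
utf8::is_utf8() differs between them, so a test on the flag cannot tell
"already encoded" from "still needs encoding". Double encoding is avoided
by calling to_wire() only at the output boundary, never by inspecting the
string.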
-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/