On 07/19/2017 08:58 AM, Tony Cook wrote:
> On Tue, Jul 18, 2017 at 10:53:53AM +1000, Tony Cook wrote:
>> On Mon, Jul 17, 2017 at 10:46:59AM +0200, Sawyer X wrote:
>>> [Top-posted]
>>>
>>> I have mixed thoughts about this.
>>>
>>> I'm sympathetic to both considerations: Having properly-named functions
>>> to reduce confusion for future developers (we hope to have some, right?)
>>> but not introduce additional cognitive load for existing developers.
>>>
>>> A few ways to make such a situation easier:
>>>
>>> * Document utf8::is_utf8() to prevent this confusion: This is by far the
>>> first thing that should be done. I have double checked the wording for
>>> utf8::is_utf8() from my blead (978b185):
>>>
>>>     (Since Perl 5.8.1) Test whether $string is marked internally as
>>>     encoded in UTF-8. Functionally the same as "Encode::is_utf8()".
>>>
>>> This is confusing, to say the least. "Marked internally" are the words
>>> core hackers are looking for and recognize, but "UTF-8" is what non-core
>>> hackers (those without the cognitive bias in core terms) see and
>>> understand. If we head over to Encode::is_utf8() we see:
>>>
>>>     [INTERNAL] Tests whether the UTF8 flag is turned on in the /STRING/.
>>>     If /CHECK/ is true, also checks whether /STRING/ contains
>>>     well-formed UTF-8. Returns true if successful, false otherwise.
>>>
>>>     As of Perl 5.8.1, utf8 <https://metacpan.org/pod/utf8> also has the
>>>     |utf8::is_utf8| function.
>>>
>>> I like this wording better for several reasons: It is under the title
>>> "Messing with Perl's Internals"; it notes the "UTF8" flag, and it adds
>>> that it checks for well-formed UTF-8 only if that flag is true. There
>>> are improvements to be made here too. We can note what the flag means
>>> (subtle, complicated, bike-shed-able) or at the very least add a nice
>>> "this isn't the flag you're looking for" warning.
>>> We can also suggest when to use and when not to use the function
>>> (otherwise it's left to the reader, who can easily get it wrong, which
>>> is why we're here).
>> utf8::is_utf8() doesn't accept the second parameter and does no
>> validity checks (we have utf8::valid() for that), despite the note in
>> utf8.pm.
>>
>>> If the documentation on both were better, then we could have possibly
>>> left this as unfortunate naming errors we're carrying with us (along
>>> with "wantarray" for noting whether the context is scalar, list, or
>>> void).
>> ...
>>> Overall, I'm still undecided. Maybe we could start with improving the
>>> existing documentation?
>> Perhaps something like:
>>
>> =item * C<$flag = utf8::is_utf8($string)>
>>
>> (Since Perl 5.8.1) Test whether I<$string> is marked internally as
>> encoded in UTF-8. Functionally the same as C<Encode::is_utf8($string)>.
>> Typically only necessary for debugging.
>>
>> If you need to force Unicode semantics for code that needs to be
>> compatible with perls older than 5.12, call C<utf8::upgrade($string)>
>> unconditionally.
>>
>> Using this flag to decide whether a string should be treated as
>> already encoded bytes or characters is wrong; this should be decided
>> as part of the interface of your function.
>>
>> If you're accepting bytes:
>>
>>   utf8::downgrade($string); # throws an exception if any code point is over 0xFF
>>
>>   utf8::downgrade($string, 1) # our own error handling
>>     or die "\$string must be representable as bytes";
>>
>> or if you're accepting characters and need encoded bytes:
>>
>>   utf8::encode($string); # unconditionally
>>
>> The only exception is if you're dealing with filenames, since perl
>> uses the internal representation of the string for system calls.
>>
>> <<
>>
>> Are there any other cases someone might be tempted to call
>> utf8::is_utf8()?
> Thinking about it further, I'm pretty sure this doesn't all belong
> here.
>
> L<perlunifaq/What is "the UTF8 flag"?> provides a good description of
> the flag is_utf8() returns, and the whole of perlunifaq covers some of
> the things the above tries to cover.
>
> perlunicook largely works at a higher level than the functions in
> utf8::* do.

+1 on the suggested text.

I think this addition is useful, even if the same ground is also covered
in other documents. We could also link to those documents for further
learning.
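The short script below illustrates the point in the suggested text. It is a sketch that assumes perl 5.12 or later, and the variable names are made up for illustration. Two strings that compare equal can still differ in the flag utf8::is_utf8() reports. utf8::downgrade() and utf8::encode(), by contrast, behave according to the characters the string holds, not according to that flag.

```perl
use strict;
use warnings;

# Same character (U+00E9), two internal representations.
my $native   = "\xE9";          # stored as a single byte, flag off
my $upgraded = "\xE9";
utf8::upgrade($upgraded);       # same character, now stored as UTF-8, flag on

# Perl compares characters, not representations.
print "eq: ",            ($native eq $upgraded     ? "yes" : "no"), "\n";
print "native flag: ",   (utf8::is_utf8($native)   ? 1 : 0), "\n";
print "upgraded flag: ", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";

# Accepting bytes: downgrade fails only when a code point is above 0xFF,
# regardless of the flag. With a true second argument it returns false
# instead of dying, and leaves the string unchanged.
my $wide = "\x{263A}";
my $downgraded_ok = utf8::downgrade($wide, 1);
print "downgrade of U+263A: ", ($downgraded_ok ? "ok" : "failed"), "\n";

utf8::downgrade($upgraded);     # succeeds: U+00E9 fits in one byte
print "after downgrade, flag: ", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";

# Accepting characters and needing encoded bytes: encode unconditionally.
my $encoded = "\x{263A}";
utf8::encode($encoded);
print "encoded length: ", length($encoded), "\n";   # 3 bytes: E2 98 BA
```

The flag changes when utf8::upgrade() and utf8::downgrade() run, but the string's content never does. This is why the interface, not is_utf8(), should decide whether a function takes bytes or characters.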