On Tuesday 21 February 2017 09:46:06 Kent Fredric wrote:
> On 21 February 2017 at 01:55, Leon Timmermans <fawaka@gmail.com> wrote:
> > which doesn't fit in a function description.
>
> I'd start by saying that this function has no bearing on whether the
> *data* in the scalar is actually utf8 encoded or not.
>
> That's what most people are thinking, I think: that this is a query
> about the *content* of the string, when that state
> is independent of the state of this flag.
>
> As an analogy, it's about as useful as poking in perl internals to see
> if a scalar is a PVIV or not and assuming, because it's the string "0"
> and the IV slot hasn't been filled yet, that it's "not a number" ...
> which is useful, but not to people who simply want to see if a
> value is safe for math or not.
>
> As another analogy, utf8ness of strings is like signedness of ints in C.
>
> If somebody unpacked 4 bytes of data into an unsigned int when they
> should have unpacked it into a signed int, the language will treat the
> data wrong. *Asking* "is it a signed int" doesn't really tell us
> anything except about the container. However, if you know there are 4
> bytes of data sitting around in an unsigned int which is really a
> signed int, you can locally say "ok, use signed int logic here".
>
> So with that said:
>
>
> * "$flag = utf8::is_upgraded($string)"
>
> (Since Perl 5.28) Test whether $string's internal bytes are marked
> for interpretation via utf8 semantics or not. Note this has
> no bearing on whether that
> data is actually utf8, only on how perl functions such as
> "length" should treat its bytes.

What about this?

Test whether $string can internally store wide characters (Unicode
code points above U+0000FF). It does not say anything about whether
$string already contains such wide characters or not. You should not
use this function unless you are dealing with broken XS modules.

>
> * "$flag = utf8::is_utf8($string)"
> (Since Perl 5.8.1) Compatibility-supporting (but poorly named) alias of
> utf8::is_upgraded
>
>
> It's a bit wordy, but probably progress.
>
> Though that said, I think we can find a clearer name than "is_upgraded"

Yes, if you (or anybody else) find a better name, let us know. A clear
name for such a test function is really needed.

Anyway, what about having this function in Internals:: instead of in
utf8::?
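
To make the distinction in this thread concrete, here is a minimal sketch. It uses only the existing core functions utf8::is_utf8, utf8::upgrade and utf8::downgrade; utf8::is_upgraded is just the name proposed above and is not assumed to exist. The helper flag_state is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): report the internal flag as text.
sub flag_state {
    return utf8::is_utf8($_[0]) ? "on" : "off";
}

# The same content twice: a single character, U+00E9.
my $native   = "\xE9";
my $upgraded = "\xE9";
utf8::upgrade($upgraded);    # changes the internal storage, not the content

printf "native:   flag %s, length %d\n", flag_state($native),   length($native);
printf "upgraded: flag %s, length %d\n", flag_state($upgraded), length($upgraded);
print  "same content: ", ($native eq $upgraded ? "yes" : "no"), "\n";

# The flag can be on without any wide character in the string ...
my $ascii = "abc";
utf8::upgrade($ascii);
print "ascii after upgrade: flag ", flag_state($ascii), "\n";

# ... and it can be switched off again when every character fits in one byte.
utf8::downgrade($upgraded);
print "after downgrade: flag ", flag_state($upgraded), "\n";
```

The two copies of "\xE9" compare equal and have the same length whatever the flag says, which is why the flag describes the container rather than the data.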