perl.perl5.porters | Postings from July 2017

Re: [perl #131685] Rename utf8::is_utf8() (and other functions)

From: pali
Date: July 4, 2017 09:03
Subject: Re: [perl #131685] Rename utf8::is_utf8() (and other functions)
Message ID: 20170704090331.GA13115@pali

On Tuesday 04 July 2017 01:52:29 yves orton via RT wrote:
> On 4 July 2017 at 09:19,  <pali@cpan.org> wrote:
> > On Tuesday 04 July 2017 10:38:26 Tony Cook wrote:
> >> But it does deprecate the old names, which is an issue, I can't
> >> imagine us removing these functions.
> >
> > The warning can be removed from the patch; that is simply a matter of
> > what you decide. The functions stay where they are, but we can tell
> > people via the documentation to use the new functions in new code.
> > Whether you call that deprecation or aliasing is again up to you. In
> > any case the functions are not going to be deleted, so in the end it
> > makes no difference to old code.
> >
> > And for old code this function can easily be defined:
> >
> >   *new_name = *old_name;
> >
> > The reasons for this patch series are:
> > * document those utf8:: functions
> > * allow developers to call those functions via non-cryptic names
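
The glob assignment quoted above can be sketched as follows. This is only an illustration: the package My::Compat and the name has_utf8_storage are hypothetical and are not a proposed API.

```perl
use strict;
use warnings;
no warnings 'once';   # silence "used only once" warnings for the glob names

# Hypothetical new name in a hypothetical package, for illustration only.
# The old name utf8::is_utf8 keeps working; this only adds an alias.
*My::Compat::has_utf8_storage = *utf8::is_utf8;

my $s = "caf\xE9";
my $before = My::Compat::has_utf8_storage($s) ? 1 : 0;
utf8::upgrade($s);    # switch internal storage to UTF-8
my $after  = My::Compat::has_utf8_storage($s) ? 1 : 0;

print "before=$before after=$after\n";
```

Old code that calls utf8::is_utf8 is unaffected, because the alias does not remove or change the original function.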
> 
> I don't mind adding new aliases for these functions, I object to your
> proposal to put them in Internals however; I think that they should go
> in 'scalar', which we decided at the last PerlQA is the designated
> place for functions that operate on scalars.

I proposed Internals because that flag is internal to perl and
invisible to pure-Perl code. But if more people are happy with the scalar
namespace, I'm fine with it.

> scalar::is_unicode_string()
> scalar::is_binary_string()

But this is wrong! SVf_UTF8 does not tell you whether a scalar string is
Unicode or binary. It only tells you the type of internal storage.

The name is_binary_string is misleading in the same way as the current
name is_utf8.

If you say that a binary string is one whose codes are all in the range
0x00-0xFF, then such a binary string can also carry the SVf_UTF8 flag, and
your function "is_binary_string" would return false for your binary
string. Such a name would lead to further problems.
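
A minimal sketch of that case, using only the existing utf8::upgrade and utf8::is_utf8 functions. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $native   = "\xE9\xFF";   # two code points, both in 0x00-0xFF
my $upgraded = $native;
utf8::upgrade($upgraded);    # same code points, UTF-8 internal storage

my $native_flag   = utf8::is_utf8($native)   ? 1 : 0;
my $upgraded_flag = utf8::is_utf8($upgraded) ? 1 : 0;

printf "native:   flag=%d length=%d\n", $native_flag,   length $native;
printf "upgraded: flag=%d length=%d\n", $upgraded_flag, length $upgraded;
print  "equal:    ", ($native eq $upgraded ? "yes" : "no"), "\n";
```

Both scalars hold the same two-character string and compare equal; only the flag differs. A function named "is_binary_string" that is built on the flag would give different answers for the two.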

> I don't like the wide-storage thing (although I admit I think it is
> better than "is_utf8"); a latin1 string in utf8 does not use
> wide-storage,

Of course it can. Unicode code points 0x80 .. 0xFF (the Latin-1
extension of ASCII) take two bytes each when encoded in UTF-8 and
are therefore wide in UTF-8 too.
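
The byte counts can be checked with a short sketch. It uses utf8::encode to obtain the UTF-8 octets; the sample code points are chosen only for illustration.

```perl
use strict;
use warnings;

my %utf8_bytes;   # code point => number of bytes in its UTF-8 encoding

for my $cp (0x41, 0x7F, 0x80, 0xE9, 0xFF) {
    my $octets = chr($cp);
    utf8::encode($octets);           # in place: characters -> UTF-8 octets
    $utf8_bytes{$cp} = length $octets;
    printf "U+%04X takes %d byte(s) in UTF-8\n", $cp, $utf8_bytes{$cp};
}
```

Code points up to 0x7F take one byte, and everything from 0x80 to 0xFF takes two.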

> and the unicode flag has significance beyond the storage
> format; utf8-on strings get unicode semantics in case insensitive
> operations.
> 
> cheers,
> Yves
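
For reference, the case-insensitive behaviour mentioned above can be sketched like this. It assumes that neither "use feature 'unicode_strings'" nor a locale is in effect, so that the flag decides which case rules apply.

```perl
use strict;
use warnings;
# Deliberately no "use feature 'unicode_strings'" and no "use utf8".

my $native   = "\xE9";       # LATIN SMALL LETTER E WITH ACUTE, native storage
my $upgraded = $native;
utf8::upgrade($upgraded);    # same character, UTF-8 internal storage

# \xC9 is the upper-case counterpart of \xE9 under Unicode rules.
my $native_match   = $native   =~ /\xC9/i ? 1 : 0;
my $upgraded_match = $upgraded =~ /\xC9/i ? 1 : 0;

print "native matches:   $native_match\n";
print "upgraded matches: $upgraded_match\n";
```

Under those assumptions the native string does not match and the upgraded one does, even though both compare equal with eq.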
