Front page | perl.perl5.porters |
Postings from October 2017
Re: RFC: Add new string comparison macros in handy.h
From: Karl Williamson
Date: October 26, 2017 02:27
Subject: Re: RFC: Add new string comparison macros in handy.h
Message ID: 3d67bcd9-4108-e4af-8182-157a79118bf9@khwilliamson.com

On 06/01/2017 02:53 PM, demerphq wrote:
> On 11 May 2017 at 17:22, Karl Williamson <public@khwilliamson.com> wrote:
>> I would like to add the macros given below to handy.h. The situations they
>> handle occur reasonably frequently in the core, and these can save
>> developers from thinking they have to manually count the characters in a
>> string.
>>
>> I am not confident at all about the names, and would like to see if people
>> have better ones.
>
> I think creating a new set of macros with clearer names is a good
> idea, but how easy is it for us to deprecate the old ones?
>
> I wanted to give a summary of the history at stake here:
>
> We have had the following macros since the history of perl:
>
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 478)
> #define strNE(s1,s2) (strcmp(s1,s
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 479)
> #define strEQ(s1,s2) (!strcmp(s1,
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 480)
> #define strLT(s1,s2) (strcmp(s1,s
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 481)
> #define strLE(s1,s2) (strcmp(s1,s
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 482)
> #define strGT(s1,s2) (strcmp(s1,s
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 483)
> #define strGE(s1,s2) (strcmp(s1,s
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 485)
> #define strnNE(s1,s2,l) (strncmp(
> ^8d063cd (Larry Wall 1987-12-18 00:00:00 +0000 486)
> #define strnEQ(s1,s2,l) (!strncmp
>
> We have had these since 1996:
>
> 36477c24 (Perl 5 Porters 1996-12-06 18:56:00 +1200 497)
> #define memNE(s1,s2,l) (memcmp(
> 36477c24 (Perl 5 Porters 1996-12-06 18:56:00 +1200 498)
> #define memEQ(s1,s2,l)
>
> We have had these since 2007:
>
> 568a785a (Nicholas Clark 2007-03-23 16:55:13 +0000 505)
> #define memEQs(s1, l, s2) \
> 777fa2cb (Yves Orton 2016-10-19 10:32:29 +0200 506)
> (((sizeof(s2)-1) == (l))
> 568a785a (Nicholas Clark 2007-03-23 16:55:13 +0000 507)
> #define memNEs(s1, l, s2) !memEQs
>
> You added these in September 2016:
>
> 062b6850 (Karl Williamson 2016-09-10 08:54:36 -0600 515)
> #define memLT(s1,s2,l) (memcmp(s1
> 062b6850 (Karl Williamson 2016-09-10 08:54:36 -0600 516)
> #define memLE(s1,s2,l) (memcmp(s1
> 062b6850 (Karl Williamson 2016-09-10 08:54:36 -0600 517)
> #define memGT(s1,s2,l) (memcmp(s1
> 062b6850 (Karl Williamson 2016-09-10 08:54:36 -0600 518)
> #define memGE(s1,s2,l) (memcmp(s1
>
> I added these in October 2016 (in a post I just sent I realize they
> were misnamed and should have been called strnNEs(), note the missing
> 'n' to comply with strnNE(). )
>
> 62946e08 (Yves Orton 2016-10-19 10:30:44 +0200 492)
> #define strNEs(s1,s2) (strncmp(s1
> 62946e08 (Yves Orton 2016-10-19 10:30:44 +0200 493)
> #define strEQs(s1,s2) (!strncmp(s
>
> and these:
>
> 777fa2cb (Yves Orton 2016-10-19 10:32:29 +0200 511)
> #define _memEQs(s1, s2) \
> 777fa2cb (Yves Orton 2016-10-19 10:32:29 +0200 512)
> (memEQ((s1), ("" s2 ""),
> 777fa2cb (Yves Orton 2016-10-19 10:32:29 +0200 513)
> #define _memNEs(s1, s2) (memNE((s
>
>
>> I also would like to document memEQs, memLE, memLT, memGE, and memGT. And
>> move all similar macros to a new section, "String comparison functions",
>> from the current "Miscellaneous".
>>
>> strSTARTS_WITHs
>> Test if the "NUL"-terminated string "s1" begins with the
>> substring
>> given by the string literal "s2", returning non-zero if so
>> (including if the two are identical); zero otherwise.
>>
>> bool strSTARTS_WITHs(char* s1, char* s2)
>
> So this is equivalent to the current strEQs().
>
> To comply with existing convention strEQs() should be renamed strnEQs().
>
> I think adding a long form equivalent is ok, but I think the old
> naming convention (assuming the name is corrected to include the 'n')
> makes sense too.
>
>> memSTARTS_WITHs
>> Test if the string buffer "s1" with length "l1" begins with the
>> substring given by the string literal "s2", returning non-zero
>> if
>> so (including if the two are identical); zero otherwise. The
>> comparison does not include the final "NUL" of "s2". "s1" does
>> not
>> have to be "NUL"-terminated,
>
> So the difference with the 'str' version is that str() considers a
> null byte to be end of string, and mem() does not. Is there any case
> where using memcmp() instead of str[n]cmp() is wrong for this type of
> macro? If not maybe we should just have one (using memcmp).
>
>
>> bool memSTARTS_WITHs(char* s1, STRLEN l1, char* s2)
>>
>> memENDS_WITHs
>> Test if the string buffer "s1" with length "l1" ends with the
>> substring given by the string literal "s2", returning non-zero
>> if
>> so (including if the two are identical); zero otherwise. The
>> comparison does not include the final "NUL" of "s2". "s1" does
>> not
>> have to be "NUL"-terminated,
>>
>> bool memENDS_WITHs(char* s1, STRLEN l1, char* s2)
>
> Do we actually have/use this? Beyond the comments above about "mem" vs
> "str" I dont have any problem with this.
>
>>
>> memFOO_STARTING_WITHs
>> Test if the string buffer "s1" with length "l1" begins with the
>> substring given by the string literal "s2", and that "s1" is
>> longer than "s2", returning non-zero if so; zero otherwise. In
>> other words, "s2" begins "s1" but is not all of "s1". The
>> comparison does not include the final "NUL" of "s2". "s1" does
>> not
>> have to be "NUL"-terminated,
>>
>> bool memFOO_STARTING_WITHs(char* s1, STRLEN l1,
>> char* s2)
>>
>> memFOO_ENDING_WITHs
>> Test if the string buffer "s1" with length "l1" ends with the
>> substring given by the string literal "s2", and that "s1" is
>> longer than "s2", returning non-zero if so; zero otherwise. In
>> other words, "s2" ends "s1" but is not all of "s1". The
>> comparison
>> does not include the final "NUL" of "s2". "s1" does not have to
>> be
>> "NUL"-terminated,
>>
>> bool memFOO_ENDING_WITHs(char* s1, STRLEN l1,
>> char* s2)
>
> So we need something better than FOO.
>
> Personally i would prefer to see a convention more like:
>
> (mem|str)IS_(PREFIX|SUFFIX|EQ|NE|LT|GT|GE|LE)[ls]*
>
> With the appropriate mix of arguments specified by the suffix.
>
> That would mean all of the macros of the form strIS() and memIS() come
> from the new convention, and everything else is historical.
>
> So i could imagine a macro
>
> as well as
>
> strIS_EQ(s1,s2)
> strIS_EQs(s1,s2)
> strIS_EQls(s1,l1,s2)
> strIS_EQl(s1,l1,s2)
> strIS_EQll(s1,l1,s2,l2)
>
> and possibly a few other permutations.
>
> I like the idea of standardizing this stuff with conventions that are
> well described and predictable, so if we have to add a new variant it
> is well defined what it should be called.
>
> cheers,
> Yves
>

I have finally looked at Yves' proposal, and I'm not convinced we
currently have an inconsistent interface that needs to be fixed, so
I think we should stay with what we have as a template. That interface
closely follows the C library's, which is a good thing since those
using it are programming in C.

That interface, I believe, is

(mem|strn?)(EQ|NE|LT|GT|GE|LE)s?

The optional 'n' in the 'str' calls follows the C convention of meaning
"there is a third parameter at the end, giving the maximum number of
bytes to use in the comparison".

The trailing 's' means the 2nd parameter is a C literal double-quoted
string. Its length is known at compile time, and is not explicitly
specified.

The mem functions all require an explicit length parameter. If the same
length applies to both of the buffer parameters, it is the third
parameter. If it applies to just the first buffer parameter, it
immediately follows that one. (All the cases so far where there might
be a different length for the second parameter use the trailing 's' form.)
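
To make the parameter conventions above concrete, here is a minimal sketch in C. The sketch_ names and LIT_LEN are invented for illustration and are not the handy.h definitions (the blame output quoted earlier is truncated, so the real bodies are not reproduced here). It only shows how a trailing 's' form can take the literal's length from the compiler, and where the explicit length goes in each mem form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustration only: these are stand-ins, not the handy.h definitions. */

/* Pasting "" on both sides compiles only when lit is a string literal;
 * sizeof then yields its length plus the trailing NUL. */
#define LIT_LEN(lit) (sizeof("" lit "") - 1)

/* Plain 'mem' form: one length, shared by both buffers, comes last. */
#define sketch_memEQ(s1, s2, l) (memcmp((s1), (s2), (l)) == 0)

/* 'mem' + trailing 's': l belongs to s1 only, so it follows s1;
 * the literal's length is worked out by the compiler. */
#define sketch_memEQs(s1, l, lit)                                   \
    ((size_t)(l) == LIT_LEN(lit)                                    \
     && memcmp((s1), ("" lit ""), LIT_LEN(lit)) == 0)

static void demo_literal_length(void)
{
    /* Deliberately not NUL-terminated. */
    const char buf[5] = { 'u', 't', 'f', '8', '!' };

    assert(LIT_LEN("utf8") == 4);
    assert(sketch_memEQs(buf, 4, "utf8"));   /* first 4 bytes match */
    assert(!sketch_memEQs(buf, 5, "utf8"));  /* lengths differ */
    assert(sketch_memEQ(buf, "utf8!", 5));   /* shared length of 5 */
}
```

Nobody has to count the characters in "utf8" by hand in the 's' form, which is the point of the suffix.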

That's the existing convention. I don't see any inconsistencies with
the existing macros, except in the macros Yves added: strNEs and strEQs.
And those macros are protected from non-core usage in 5.26 by
restricting them to PERL_CORE.

If we added a mem macro where the 2nd buffer needed a different explicit
length parameter to be supplied, then I'm fine with using 'l' as a
suffix in the macro name to mean that, and documenting that in handy.h
as our plan.

The strEQs that is confined to core really is looking to see if the
second string is an initial substring of the first. What I would have
thought it meant instead is: are the two strings the same, given that
the second is a compile-time constant?
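
The two readings can be put side by side. This is a sketch under assumptions: both macro names are invented here, and neither body is taken from the core (the quoted definition of strEQs is cut off mid-line). The first follows the behaviour described above, the second follows what the name suggests.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; neither macro is the core definition. */

/* The behaviour described for the core-only macro: compare only as
 * many bytes as the literal has, i.e. an initial-substring test. */
#define sketch_prefix_EQs(s1, lit) \
    (strncmp((s1), ("" lit ""), sizeof("" lit "") - 1) == 0)

/* What the name suggests: whole-string equality against a literal. */
#define sketch_whole_EQs(s1, lit) (strcmp((s1), ("" lit "")) == 0)

static void demo_two_readings(void)
{
    const char *name = "utf8_strict";

    assert(sketch_prefix_EQs(name, "utf8"));   /* "utf8" starts it */
    assert(!sketch_whole_EQs(name, "utf8"));   /* but is not all of it */
    assert(sketch_whole_EQs("utf8", "utf8"));  /* identical strings */
}
```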

My original point is that it would be clearer to have a name that
indicates the initial substring check. I'm not happy with what I
proposed, and would be OK with using strPREFIX and strPREFIXs, 'PREFIX'
being a term that Yves proposed.

But there is an occasional need for checking that the initial substring
is not the complete main string. In mathematical terms, that it is a
"proper" substring. We could say strPROPER_PREFIX, but that's getting a
bit long. We could say strPPREFIX, but I worry that the doubled-P
wouldn't stand out enough as different from a single 'P'. I'm now
leaning to strBEGIN and strPBEGIN, or just strBEG and strPBEG.

Similarly there are cases where we are looking for the final substring,
both proper and not. I now think strEND and strPEND are ok for this.
strINITIAL_SUBSTRING is accurate but I think too long.
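
For the four concepts (initial, proper initial, final, proper final), a rough sketch of the mem forms with a literal second argument might look like this. The sketch_ names and LIT_LEN are placeholders made up for the example, since no names have been settled, and the bodies are only one possible way to write such tests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder names only; nothing here is an agreed handy.h macro. */
#define LIT_LEN(lit) (sizeof("" lit "") - 1)

/* lit begins the buffer (the two may be identical). */
#define sketch_memBEGINs(s1, l, lit)                                \
    ((size_t)(l) >= LIT_LEN(lit)                                    \
     && memcmp((s1), ("" lit ""), LIT_LEN(lit)) == 0)

/* Proper form: lit begins the buffer but is not all of it. */
#define sketch_memPBEGINs(s1, l, lit)                               \
    ((size_t)(l) > LIT_LEN(lit)                                     \
     && memcmp((s1), ("" lit ""), LIT_LEN(lit)) == 0)

/* lit ends the buffer; the length check runs first, so the pointer
 * arithmetic never goes before the start of s1. */
#define sketch_memENDs(s1, l, lit)                                  \
    ((size_t)(l) >= LIT_LEN(lit)                                    \
     && memcmp((s1) + (size_t)(l) - LIT_LEN(lit), ("" lit ""),      \
               LIT_LEN(lit)) == 0)

/* Proper form: lit ends the buffer but is not all of it. */
#define sketch_memPENDs(s1, l, lit)                                 \
    ((size_t)(l) > LIT_LEN(lit)                                     \
     && memcmp((s1) + (size_t)(l) - LIT_LEN(lit), ("" lit ""),      \
               LIT_LEN(lit)) == 0)

static void demo_begin_end(void)
{
    const char file[] = "Foo.pm";
    const size_t len = sizeof(file) - 1;            /* 6 */

    assert(sketch_memBEGINs(file, len, "Foo"));
    assert(sketch_memPBEGINs(file, len, "Foo"));
    assert(sketch_memBEGINs(file, len, "Foo.pm"));   /* identical is ok */
    assert(!sketch_memPBEGINs(file, len, "Foo.pm")); /* but not proper */
    assert(sketch_memENDs(file, len, ".pm"));
    assert(sketch_memPENDs(file, len, ".pm"));
    assert(sketch_memENDs(".pm", 3, ".pm"));
    assert(!sketch_memPENDs(".pm", 3, ".pm"));
}
```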

Whatever name we choose to signify the concepts 'initial' and 'final',
it could and should also be applied to the mem functions if the need arose.

I suspect that a bunch of strnEQ calls are really looking for an initial
substring, and changing such to say so means the code reader doesn't
have to do any counting to see what the real effect is.
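
As a small illustration of the counting problem, the sketch below compares a hand-counted strncmp call with a macro that derives the count from the literal. sketch_strBEGINs is a made-up name and the "PERLIO_" strings are arbitrary sample data, not taken from the core.

```c
#include <assert.h>
#include <string.h>

/* Made-up stand-in for an initial-substring macro taking a literal. */
#define sketch_strBEGINs(s1, lit) \
    (strncmp((s1), ("" lit ""), sizeof("" lit "") - 1) == 0)

static void demo_counting(void)
{
    const char *var = "PERLIOX";

    /* "PERLIO_" has 7 characters. A miscounted 6 silently drops the
     * '_' from the comparison, so an unrelated name matches. */
    assert(strncmp(var, "PERLIO_", 6) == 0);   /* unintended match */
    assert(strncmp(var, "PERLIO_", 7) != 0);   /* correct count */

    /* With the count derived from the literal there is nothing to get
     * wrong, and the intent is visible in the name. */
    assert(!sketch_strBEGINs(var, "PERLIO_"));
    assert(sketch_strBEGINs("PERLIO_DEBUG", "PERLIO_"));
}
```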

To summarize, I think that there are no inconsistencies that aren't
already the same as C library calls, in the macros usable outside of
core, so I don't believe we need to come up with a consistent set.
Doing so might actually confuse C programmers. I do want new macros
that test if the second parameter is an initial or final substring of
the first parameter. Such additions should be consistent, across str
and mem forms, and could be documented in handy.h.

One version of what I'm thinking is

(mem|strn?)(EQ|NE|LT|GT|GE|LE|P?(BEG|END))s?