develooper Front page | perl.perl5.porters | Postings from June 2017

Re: RFC: Add new string comparison macros in handy.h

June 3, 2017 00:47
Reposting, as I replied to Karl directly and forgot the list.

On 2 June 2017 at 18:16, Karl Williamson wrote:
> On 06/02/2017 06:46 AM, demerphq wrote:
>> On 2 June 2017 at 13:30, Ævar Arnfjörð Bjarmason wrote:
>>> On Thu, May 11, 2017 at 5:22 PM, Karl Williamson wrote:
>>>>      memSTARTS_WITHs
>>>>              Test if the string buffer "s1" with length "l1" begins with the
>>>>              substring given by the string literal "s2", returning non-zero if
>>>>              so (including if the two are identical); zero otherwise. The
>>>>              comparison does not include the final "NUL" of "s2". "s1" does not
>>>>              have to be "NUL"-terminated.
>>>>                      bool    memSTARTS_WITHs(char* s1, STRLEN l1, char* s2)
>>> I don't have to use these and don't really care, but just a question:
>>> Is there a reason why the prototype for the mem* functions
>>> doesn't also pass the STRLEN for the needle as well as the haystack?
>> The whole point of the 's' family macros is to handle cases where one
>> of the arguments is a constant string in the C code, and therefore the
>> length can be computed by the macro. In other words cases like this:
>> STRLEN len;
>> char *pv= SvPV(thing,len);
>> if (memSTARTS_WITHs(pv,len,"someprefix")) { ... }
>> That is why I mentioned the variants I did, which I will relist with
>> better arguments:
>> strIS_EQ(pv,pv)
>> strIS_EQs(pv,"string")
>> strIS_EQls(pv,len,"string")
>> strIS_EQl(pv,len,pv)
>> strIS_EQll(pv,len,pv,len)
>>> Right now the interface only allows the haystack not the needle to
>>> contain \0, which seems like a needless arbitrary limitation for
>>> something that's essentially a fancy strstr() & memmem(). I.e. you
>>> have feature-parity with strstr() (and extra features like "begins
>>> with?"), but not with memmem().
>> With the 's' macros we know the length of the string by using
>> sizeof(). The 's' macros are built on the STR_WITH_LEN() macro
>> trick:
>> #define STR_WITH_LEN(s) "" s "", sizeof(s)-1
>> The "" s "" concatenation guarantees the argument is a string literal,
>> not a pointer, and the sizeof(s)-1 tells us its length.
>> With the api I proposed in a reply to Karl the 'll' variants would
>> cover the cases you are thinking of.
>> To recap and refine that proposal:
>> (mem|str)IS_(PREFIX_|SUFFIX_)?(EQ|NE|LT|GT|GE|LE)[ls]*
>> More specifically the suffixes would be:
>> ''  : none, both arguments are pv's without a length.
>> 's' : second argument is a constant string
>> 'l' : first argument has a length, second argument is a pointer
>> 'ls': first argument has a length, second argument is a constant string
>> 'll': both arguments are char *'s and have lengths.
>> Not all suffixes would apply to 'mem', but I think they all apply to
>> 'str'.
>> Whether we should have 'str' at all is a different question.
>> cheers,
>> Yves
> I haven't had a chance to fully evaluate this, but a couple of quick things.
> Yes, we do need 'str'.  There are a bunch of places where the length is not
> known, and one of the arguments is a C string (and so for most purposes the
> other argument not being a C string turns out to not affect the result).

Ok. Thanks!
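
For anyone following along, here is a rough sketch of how the STR_WITH_LEN() trick quoted above can drive a "starts with" test. This is not the handy.h implementation: the MY_ and my_ names are hypothetical, and I use size_t where the core would use STRLEN so it compiles outside of perl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same idea as the STR_WITH_LEN() quoted above: the "" s "" pasting only
 * compiles when s is a string literal, and sizeof(s)-1 is its length
 * without the trailing NUL. (Hypothetical name.) */
#define MY_STR_WITH_LEN(s) ("" s ""), (sizeof(s) - 1)

/* Hypothetical helper: does the buffer s1/l1 begin with s2/l2? */
static int my_mem_starts_with(const char *s1, size_t l1,
                              const char *s2, size_t l2)
{
    return l1 >= l2 && memcmp(s1, s2, l2) == 0;
}

/* Hypothetical stand-in for the proposed memSTARTS_WITHs(): the caller
 * passes only the literal, the macro supplies its length. */
#define MY_memSTARTS_WITHs(s1, l1, s2) \
    my_mem_starts_with((s1), (l1), MY_STR_WITH_LEN(s2))

static void my_starts_with_self_check(void)
{
    /* Deliberately not NUL-terminated. */
    const char buf[] = { 's','o','m','e','p','r','e','f','i','x','!' };

    assert(MY_memSTARTS_WITHs(buf, sizeof buf, "someprefix"));
    assert(MY_memSTARTS_WITHs(buf, sizeof buf, "some"));
    assert(!MY_memSTARTS_WITHs(buf, sizeof buf, "prefix"));
    assert(!MY_memSTARTS_WITHs(buf, 4, "someprefix")); /* haystack too short */
}
```

Passing a char * instead of a literal as the last argument fails to compile, which is the point of the "" s "" pasting.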

> I went through the core, and the existing macros plus the ones I proposed
> are sufficient to handle the existing cases and make the code much easier to
> grok without detailed examination.
> They are also easier to program right, as the coder doesn't have to count
> the length manually.

I think there is broad consensus on this.

> The ENDS with is used in a few places, like seeing if a path ends in '.pm'.

Fair enough.
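
As an illustration of the '.pm' case, a suffix test with a literal needle might look roughly like this. The names are hypothetical (the thread does not spell out the exact macro), and size_t stands in for STRLEN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the buffer s1/l1 end with s2/l2? */
static int my_mem_ends_with(const char *s1, size_t l1,
                            const char *s2, size_t l2)
{
    return l1 >= l2 && memcmp(s1 + (l1 - l2), s2, l2) == 0;
}

/* Hypothetical literal-needle variant: "" lit "" rejects non-literals and
 * sizeof(lit)-1 is the literal's length without its NUL. */
#define MY_memENDS_WITHs(s1, l1, lit) \
    my_mem_ends_with((s1), (l1), ("" lit ""), sizeof(lit) - 1)

/* Example use: does this path name a .pm file? */
static int my_path_is_pm(const char *path, size_t len)
{
    return MY_memENDS_WITHs(path, len, ".pm");
}
```

Nobody has to count the three characters of ".pm" by hand, which is the "easier to program right" argument from the quoted text.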

> I'm not a fan of the trailing 's' in the name meaning a literal string. Both
> arguments are always strings.  I had thought 'l' for 'literal', but you have
> used that one up.  Maybe 'q' for 'quoted'.
> I also have never liked a prefix IS (or 'is').  It's just extra typing that
> doesn't really help readability.

I just proposed that because we have the existing macros which have
inconsistent interfaces, particularly the mem() ones, and I would like
us to have a nice clean and *complete* set of rules to use so it's
easy to remember which one does what. (That is a key part of my

I don't have much of an axe to grind about the 's', but I do think
brevity is useful in this context.
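
To make the suffix scheme recapped in the quoted text concrete, here is one possible reading of it for the EQ case only. The exact semantics were not settled in this thread, so treat every name and definition below as an assumption for illustration, with size_t standing in for STRLEN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical worker: both sides carry an explicit length, so either
 * side may contain embedded NULs. */
static int my_eq_ll(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* ''  : both arguments are NUL-terminated, no lengths */
#define MY_strIS_EQ(a, b)               (strcmp((a), (b)) == 0)
/* 's' : second argument is a string literal */
#define MY_strIS_EQs(a, lit)            (strcmp((a), "" lit "") == 0)
/* 'l' : first has a length, second is a NUL-terminated pointer */
#define MY_strIS_EQl(a, alen, b)        my_eq_ll((a), (alen), (b), strlen(b))
/* 'ls': first has a length, second is a string literal */
#define MY_strIS_EQls(a, alen, lit) \
    my_eq_ll((a), (alen), ("" lit ""), sizeof(lit) - 1)
/* 'll': both arguments have lengths */
#define MY_strIS_EQll(a, alen, b, blen) my_eq_ll((a), (alen), (b), (blen))
```

Under this reading only the 'll' form can compare two buffers that both contain "\0", which is the memmem()-style case Ævar asked about.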

