develooper Front page | perl.perl5.porters | Postings from June 2017

Re: RFC: Add new string comparison macros in handy.h

From: Karl Williamson
Date: June 2, 2017 16:22
Subject: Re: RFC: Add new string comparison macros in handy.h
Message ID: 2861fa97-3633-6add-939b-a798c294b2fc@khwilliamson.com
On 06/02/2017 10:16 AM, Karl Williamson wrote:
> On 06/02/2017 06:46 AM, demerphq wrote:
>> On 2 June 2017 at 13:30, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>> On Thu, May 11, 2017 at 5:22 PM, Karl Williamson
>>> <public@khwilliamson.com> wrote:
>>>
>>>>      memSTARTS_WITHs
>>>>              Test if the string buffer "s1" with length "l1" begins
>>>>              with the substring given by the string literal "s2",
>>>>              returning non-zero if so (including if the two are
>>>>              identical); zero otherwise. The comparison does not
>>>>              include the final "NUL" of "s2". "s1" does not have to
>>>>              be "NUL"-terminated,
>>>>
>>>>                      bool    memSTARTS_WITHs(char* s1, STRLEN l1, char* s2)
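A minimal sketch of the behaviour documented above, assuming only standard C. The name MY_memSTARTS_WITHs is hypothetical (this is not the handy.h definition), and size_t stands in for perl's STRLEN. The length test comes first and short-circuits, so memcmp() never reads past the l1 bytes of s1; that is why s1 need not be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the handy.h definition.
 * Non-zero if the l1-byte buffer s1 begins with the string literal s2,
 * including when the two are identical.  The "" s2 "" concatenation only
 * compiles when s2 is a string literal; sizeof(s2) - 1 is its length
 * without the final NUL, so that NUL is never compared.  The length test
 * short-circuits before memcmp(), so s1 is never read past l1 bytes. */
#define MY_memSTARTS_WITHs(s1, l1, s2)                 \
    ((size_t)(l1) >= sizeof("" s2 "") - 1              \
     && memcmp((s1), "" s2 "", sizeof(s2) - 1) == 0)
```

For example, with a 6-byte buffer holding foo.pm and no terminating NUL, both "foo" and "foo.pm" match, while "foo.pm.x" is rejected by the length test alone.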
>>>
>>> I don't have to use these and don't really care, but just a question:
>>> Is there a reason why the prototype for the mem* functions
>>> doesn't also pass the STRLEN for the needle as well as the haystack?
>>
>> The whole point of the 's' family macros is to handle cases where one
>> of the arguments is a constant string in the C code, and therefore the
>> length can be computed by the macro. In other words cases like this:
>>
>> STRLEN len;
>> char *pv= SvPV(thing,len);
>>
>> if (memSTARTS_WITHs(pv,len,"someprefix")) { ... }
>>
>> That is why I mentioned the variants I did, which I will relist with
>> better arguments:
>>
>> strIS_EQ(pv,pv)
>> strIS_EQs(pv,"string")
>> strIS_EQls(pv,len,"string")
>> strIS_EQl(pv,len,pv)
>> strIS_EQll(pv,len,pv,len)
>>
>>> Right now the interface only allows the haystack, not the needle, to
>>> contain \0, which seems like a needless arbitrary limitation for
>>> something that's essentially a fancy strstr() & memmem(). I.e. you
>>> have feature-parity with strstr() (and extra features like "begins
>>> with?"), but not with memmem().
>>
>> With the 's' macros we know the length of the string from sizeof().
>> The 's' macros are built on the STR_WITH_LEN() macro trick:
>>
>> #define STR_WITH_LEN(s) "" s "", sizeof(s)-1
>>
>> The "" s "" part guarantees the argument is a string literal, not a
>> pointer (adjacent-literal concatenation will not compile with a pointer
>> variable), and sizeof(s)-1 gives its length without the final NUL.
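To illustrate how this interacts with embedded NULs, here is a small sketch assuming the STR_WITH_LEN() definition quoted above. my_mem_eq_ll() is a hypothetical function (not a perl API) taking a pointer and a length for both sides, in the spirit of the 'll' variants. Because STR_WITH_LEN() expands to two arguments, it can fill the second pair directly, and since sizeof() counts every byte of the literal, a literal needle such as "a\0b" keeps its embedded NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* As quoted above: the literal itself, then its length minus the final NUL. */
#define STR_WITH_LEN(s) "" s "", sizeof(s)-1

/* Hypothetical helper, not a perl API: equality of two (pointer, length)
 * pairs, so embedded NULs on either side are compared like any other byte. */
static int my_mem_eq_ll(const char *s1, size_t l1, const char *s2, size_t l2)
{
    return l1 == l2 && memcmp(s1, s2, l1) == 0;
}

/* my_mem_eq_ll(buf, len, STR_WITH_LEN("a\0b")) expands to
 * my_mem_eq_ll(buf, len, "" "a\0b" "", sizeof("a\0b")-1), i.e. length 3.
 * STR_WITH_LEN(ptr) with a char * variable fails to compile. */
```

Note that the comma produced by STR_WITH_LEN() only separates arguments of a function call. A function-like macro counts its arguments before expanding them, so a macro invocation such as M(p, l, STR_WITH_LEN("x")) is seen as three arguments, not four.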
>>
>> With the API I proposed in a reply to Karl, the 'll' variants would
>> cover the cases you are thinking of.
>>
>> To recap and refine that proposal:
>>
>> (mem|str)IS_(PREFIX_|SUFFIX_)?(EQ|NE|LT|GT|GE|LE)[ls]*
>>
>> More specifically the suffixes would be:
>>
>> ''  : none; both arguments are pv's without a length
>> 's' : second argument is a constant string
>> 'l' : first argument has a length, second argument is a pointer
>> 'ls': first argument has a length, second argument is a constant string
>> 'll': both arguments are char *'s and have lengths
>>
>> Not all suffixes would apply to 'mem', but I think they all apply to
>> 'str'.
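One possible reading of the suffix scheme, written as hypothetical definitions of four of the names listed earlier (strIS_EQ, strIS_EQs, strIS_EQls, strIS_EQll). These are not existing perl macros and the thread gives no definitions; the sketch only shows what each suffix lets the macro know. With no lengths it must rely on strcmp() and NUL terminators; with 's' the "" s2 "" trick restricts the argument to a literal; once a length is known, sizeof() and memcmp() take over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical definitions of the proposed names; not perl's API. */

/* '' : both arguments are NUL-terminated, no lengths. */
#define strIS_EQ(s1, s2)    (strcmp((s1), (s2)) == 0)

/* 's' : second argument must be a string literal. */
#define strIS_EQs(s1, s2)   (strcmp((s1), "" s2 "") == 0)

/* 'ls': first argument has a length, second is a string literal whose
 * length comes from sizeof(), so only memcmp() is needed. */
#define strIS_EQls(s1, l1, s2)                      \
    ((size_t)(l1) == sizeof("" s2 "") - 1           \
     && memcmp((s1), "" s2 "", sizeof(s2) - 1) == 0)

/* 'll': both arguments have lengths (note: l1 is evaluated twice). */
#define strIS_EQll(s1, l1, s2, l2)                  \
    ((size_t)(l1) == (size_t)(l2)                   \
     && memcmp((s1), (s2), (l1)) == 0)
```

The 'l' variant (first argument with a length, second a bare pointer) is left out, since its exact semantics are not spelled out in the thread.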
>>
>> Whether we should have 'str' at all is a different question.
>>
>> cheers,
>> Yves
>>
>>
> 
> I haven't had a chance to evaluate this fully, but here are a couple of
> quick points.
> 
> Yes, we do need 'str'.  There are a bunch of places where the length is
> not known and one of the arguments is a C string (and so, for most
> purposes, it turns out not to matter that the other argument is not a
> C string).
> 
> I went through the core, and the existing macros plus the ones I 
> proposed are sufficient to handle the existing cases and make the code 
> much easier to grok without detailed examination.
> 
> They are also easier to program right, as the coder doesn't have to 
> count the length manually.
> 
> The ENDS_WITH form is used in a few places, such as checking whether a
> path ends in '.pm'.
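A sketch of an "ends with" check in the same style, for the '.pm' case. MY_memENDS_WITHs is a hypothetical name, not the macro actually added to handy.h. It first makes sure the buffer is at least as long as the literal, then compares the buffer's last sizeof(s2) - 1 bytes against it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the handy.h definition.
 * Non-zero if the l1-byte buffer s1 ends with the string literal s2.
 * The length test short-circuits first, so the pointer arithmetic below
 * never points before the start of s1.  l1 is evaluated more than once. */
#define MY_memENDS_WITHs(s1, l1, s2)                                  \
    ((size_t)(l1) >= sizeof("" s2 "") - 1                             \
     && memcmp((s1) + (size_t)(l1) - (sizeof(s2) - 1), "" s2 "",      \
               sizeof(s2) - 1) == 0)
```

With path = "lib/Foo.pm", MY_memENDS_WITHs(path, strlen(path), ".pm") compares the last three bytes, while a two-byte buffer can never end with ".pm".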
> 
> I'm not a fan of a trailing 's' in the name meaning a literal string;
> both arguments are always strings.  I had thought of 'l' for 'literal',
> but you have used that one up.  Maybe 'q' for 'quoted'.
> 
> I have also never liked an IS (or 'is') prefix.  It's just extra typing
> that doesn't really help readability.
> 

And, I would oppose deprecating any existing macros.  There's not much 
upside and plenty of downside in doing so.

And we do need to be able to have a proper substring concept.  There are 
places that use that.


