Re: RFC: Add new string comparison macros in handy.h
From: demerphq
Date: June 3, 2017 00:47
Subject: Re: RFC: Add new string comparison macros in handy.h
Message ID: CANgJU+Xy25BLsJq2xGiOPrCVodUP_8ZZSf2dFYGOx8j1axMshA@mail.gmail.com
Reposting, as I replied to Karl directly and forgot the list.
On 2 June 2017 at 18:16, Karl Williamson <public@khwilliamson.com> wrote:
> On 06/02/2017 06:46 AM, demerphq wrote:
>>
>> On 2 June 2017 at 13:30, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>>
>>> On Thu, May 11, 2017 at 5:22 PM, Karl Williamson
>>> <public@khwilliamson.com> wrote:
>>>
>>>> memSTARTS_WITHs
>>>> Test if the string buffer "s1" with length "l1" begins with the
>>>> substring given by the string literal "s2", returning non-zero if
>>>> so (including if the two are identical); zero otherwise. The
>>>> comparison does not include the final "NUL" of "s2". "s1" does not
>>>> have to be "NUL"-terminated.
>>>>
>>>> bool memSTARTS_WITHs(char* s1, STRLEN l1, char* s2)
>>>
>>>
>>> I don't have to use these and don't really care, but just a question:
>>> Is there a reason why the prototype for the mem* functions
>>> doesn't also pass the STRLEN for the needle as well as the haystack?
>>
>>
>> The whole point of the 's' family macros is to handle cases where one
>> of the arguments is a constant string in the C code, and therefore the
>> length can be computed by the macro. In other words cases like this:
>>
>> STRLEN len;
>> char *pv= SvPV(thing,len);
>>
>> if (memSTARTS_WITHs(pv,len,"someprefix")) { ... }
>>
>> That is why I mentioned the variants I did, which I will relist with
>> better arguments:
>>
>> strIS_EQ(pv,pv)
>> strIS_EQs(pv,"string")
>> strIS_EQls(pv,len,"string")
>> strIS_EQl(pv,len,pv)
>> strIS_EQll(pv,len,pv,len)
>>
>>> Right now the interface only allows the haystack not the needle to
>>> contain \0, which seems like a needless arbitrary limitation for
>>> something that's essentially a fancy strstr() & memmem(). I.e. you
>>> have feature-parity with strstr() (and extra features like "begins
>>> with?"), but not with memmem().
>>
>>
>> With the 's' macros we know the length of the string by using
>> sizeof(). The 's' macros are composed of the STR_WITH_LEN() macro
>> trick:
>>
>> #define STR_WITH_LEN(s) "" s "", sizeof(s)-1
>>
>> the "" s "" thing guarantees the argument is a C string, not a
>> pointer, and the sizeof(s)-1 tells us its length.
>>
>> With the api I proposed in a reply to Karl the 'll' variants would
>> cover the cases you are thinking of.
>>
>> To recap and refine that proposal:
>>
>> (mem|str)IS_(PREFIX_|SUFFIX_)?(EQ|NE|LT|GT|GE|LE)[ls]*
>>
>> More specifically the suffixes would be:
>>
>> '' : none, both arguments are pv's without a length.
>> 's': second argument is a constant string
>> 'l' : first argument has a length, second argument is a pointer
>> 'ls': first argument has a length, second argument is a constant string
>> 'll': both arguments are char *'s and have lengths.
>>
>> Not all suffixes would apply to 'mem', but I think they all apply to
>> 'str'.
>>
>> Whether we should have 'str' at all is a different question.
>>
>> cheers,
>> Yves
>>
>>
>
> I haven't had a chance to fully evaluate this, but a couple of quick things.
>
> Yes, we do need 'str'. There are a bunch of places where the length is not
> known, and one of the arguments is a C string (and so for most purposes the
> other argument not being a C string turns out to not affect the result).
Ok. Thanks!
> I went through the core, and the existing macros plus the ones I proposed
> are sufficient to handle the existing cases and make the code much easier to
> grok without detailed examination.
>
> They are also easier to program right, as the coder doesn't have to count
> the length manually.
I think there is broad consensus on this.
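
To illustrate the point about not counting lengths by hand, here is a minimal sketch built on the STR_WITH_LEN() trick quoted above. The names sketch_starts_with, SKETCH_STARTS_WITHs, by_hand, with_macro and demo_starts_with are made up for this example and are not the handy.h definitions; it assumes nothing beyond a standard C compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The macro quoted above: "" s "" only compiles when s is a string
 * literal, and sizeof(s)-1 is its length without the trailing NUL.
 * It expands to two comma-separated arguments. */
#define STR_WITH_LEN(s) "" s "", sizeof(s)-1

/* Hypothetical helper (not the handy.h code): does the l1-byte buffer
 * s1 begin with the l2 bytes at s2?  Neither needs a trailing NUL. */
int sketch_starts_with(const char *s1, size_t l1, const char *s2, size_t l2)
{
    return l1 >= l2 && memcmp(s1, s2, l2) == 0;
}

/* Hypothetical stand-in for memSTARTS_WITHs(): the literal's length
 * comes from the compiler, so nobody counts characters by hand. */
#define SKETCH_STARTS_WITHs(s1, l1, lit) \
    sketch_starts_with((s1), (l1), STR_WITH_LEN(lit))

/* What callers tend to write without the macro; the 10 is hand-counted
 * and silently goes stale if the literal is ever edited. */
int by_hand(const char *pv, size_t len)
{
    return len >= 10 && memcmp(pv, "someprefix", 10) == 0;
}

/* The same test with the length derived from the literal. */
int with_macro(const char *pv, size_t len)
{
    return SKETCH_STARTS_WITHs(pv, len, "someprefix");
}

/* Small self-check; call it from main() to try the sketch. */
void demo_starts_with(void)
{
    const char *pv = "someprefix_and_more";
    size_t len = strlen(pv);

    assert(by_hand(pv, len) == with_macro(pv, len));
    assert(with_macro(pv, len));
    assert(!with_macro("some", 4));   /* buffer shorter than the prefix */
}
```

Because the length comes from sizeof() rather than strlen(), a literal with an embedded NUL such as "a\0b" is still compared over all three bytes in this sketch.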
> The ENDS_WITH variant is used in a few places, like seeing if a path ends in '.pm'.
Fair enough.
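
For the '.pm' case, a rough sketch of what an ends-with test could look like follows. sketch_ends_with, SKETCH_ENDS_WITHs, looks_like_module_path and demo_ends_with are hypothetical names for illustration only, not the proposed handy.h macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: do the last l2 bytes of the l1-byte buffer s1
 * match the l2 bytes at s2?  s1 need not be NUL-terminated. */
int sketch_ends_with(const char *s1, size_t l1, const char *s2, size_t l2)
{
    return l1 >= l2 && memcmp(s1 + (l1 - l2), s2, l2) == 0;
}

/* Hypothetical 's'-style wrapper: the second argument must be a string
 * literal, so its length can be taken with sizeof(). */
#define SKETCH_ENDS_WITHs(s1, l1, lit) \
    sketch_ends_with((s1), (l1), "" lit "", sizeof(lit) - 1)

/* Example use: does a path end in '.pm'? */
int looks_like_module_path(const char *path, size_t len)
{
    return SKETCH_ENDS_WITHs(path, len, ".pm");
}

/* Small self-check; call it from main() to try the sketch. */
void demo_ends_with(void)
{
    assert(looks_like_module_path("Foo/Bar.pm", 10));
    assert(!looks_like_module_path("Foo/Bar.pl", 10));
    assert(!looks_like_module_path("pm", 2));  /* shorter than the suffix */
}
```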
> I'm not a fan of the trailing 's' in the name meaning a literal string. Both
> arguments are always strings. I had thought 'l' for 'literal', but you have
> used that one up. Maybe 'q' for 'quoted'.
>
> I also have never liked a prefix IS (or 'is'). It's just extra typing that
> doesn't really help readability.
I just proposed that because we have the existing macros which have
inconsistent interfaces, particularly the mem() ones, and I would like
us to have a nice clean and *complete* set of rules to use so it's
easy to remember which one does what. (That is a key part of my
proposal.)

I don't have much of an axe to grind about the 's', but I do think
brevity is useful in this context.

cheers,
Yves
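
As a rough illustration of the suffix scheme listed in the quoted proposal ('', 's', 'l', 'ls', 'll'), the equality variants could be sketched along these lines. All SKETCH_* names, sketch_eq_ll and demo_eq_family are hypothetical; the real macros may differ, and treating the second argument of the 'l' form as NUL-terminated is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 'll': both buffers carry explicit lengths, so embedded NULs are fine. */
int sketch_eq_ll(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/* '': two NUL-terminated strings, no lengths. */
#define SKETCH_EQ(a, b)      (strcmp((a), (b)) == 0)

/* 's': NUL-terminated string against a string literal. */
#define SKETCH_EQs(a, lit)   (strcmp((a), "" lit "") == 0)

/* 'l': (pointer, length) against a pointer that is assumed to be
 * NUL-terminated.  Note that b is evaluated twice. */
#define SKETCH_EQl(a, alen, b) \
    sketch_eq_ll((a), (alen), (b), strlen(b))

/* 'ls': (pointer, length) against a literal measured with sizeof(). */
#define SKETCH_EQls(a, alen, lit) \
    sketch_eq_ll((a), (alen), "" lit "", sizeof(lit) - 1)

/* 'll': (pointer, length) against (pointer, length). */
#define SKETCH_EQll(a, alen, b, blen) \
    sketch_eq_ll((a), (alen), (b), (blen))

/* Small self-check; call it from main() to try the sketch. */
void demo_eq_family(void)
{
    const char *pv = "stringXX";     /* only the first 6 bytes count below */

    assert(SKETCH_EQ("string", "string"));
    assert(SKETCH_EQs("string", "string"));
    assert(SKETCH_EQl(pv, 6, "string"));
    assert(SKETCH_EQls(pv, 6, "string"));
    assert(!SKETCH_EQls(pv, 8, "string"));      /* lengths differ */
    assert(SKETCH_EQll("a\0b", 3, "a\0b", 3));  /* embedded NUL compared */
}
```

Only the 'll' form lets both sides carry embedded NULs through runtime lengths, which is the memmem()-style case raised earlier in the thread.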