[perl.git] branch smoke-me/khw-lexact created.v5.31.4-268-g3ea97a0606

Karl Williamson
September 28, 2019 20:07
In perl.git, the branch smoke-me/khw-lexact has been created


        at  3ea97a06065e9a67bee5b625f93c7f09cae956c0 (commit)

- Log -----------------------------------------------------------------
commit 3ea97a06065e9a67bee5b625f93c7f09cae956c0
Author: Karl Williamson <>
Date:   Sat Sep 28 14:01:41 2019 -0600

    perl.h: Silence warning when compiled with C++
    This silences a warning that the pragma it surrounds is not valid in
    C++.  We don't need to know that, and it clutters the compilation.

commit 0684e139ecbc8d90db49227b68f3d38150688fdc
Author: Karl Williamson <>
Date:   Sat Sep 28 11:58:59 2019 -0600

    regex: Add LEXACT_ONLY8 node type
    This is like LEXACT, but it is known that only strings encoded in UTF-8
    can match it, so we don't even have to try if that condition isn't met.
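
The shortcut described above can be sketched in C as follows. This is a minimal illustration, not the regex engine's code: the function `sketch_match_start`, its parameters, and the counter `sketch_compare_calls` are all hypothetical names introduced here. The idea is that a node flagged as matchable only by UTF-8 targets lets the matcher return before doing any byte comparison when the target is not UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Counts byte comparisons actually attempted; for illustration only. */
int sketch_compare_calls = 0;

/* Hypothetical helper: does pattern match at the start of target?
 * pattern_only_utf8 plays the role of the "only UTF-8 can match" flag. */
bool sketch_match_start(const char *pattern, bool pattern_only_utf8,
                        const char *target, bool target_is_utf8)
{
    size_t plen = strlen(pattern);

    /* The shortcut: a non-UTF-8 target can never match, so skip the
     * comparison entirely. */
    if (pattern_only_utf8 && !target_is_utf8)
        return false;

    /* Never compare past the end of the target. */
    if (plen > strlen(target))
        return false;

    sketch_compare_calls++;
    return memcmp(target, pattern, plen) == 0;
}
```

In this sketch, calling `sketch_match_start("\xe2\x82\xac", true, target, false)` returns without incrementing the counter, whatever bytes `target` holds.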

commit 98939af56c3520f3536f528dc2f2d27ea4e55904
Author: Karl Williamson <>
Date:   Thu Sep 26 21:38:46 2019 -0600

    regex: Create and handle LEXACT nodes
    See the previous commit for info on these.
    I am not changing trie code to recognize these at this time.

commit d86cba52cd569d4d1710ff7f88f6ec5aabbb8073
Author: Karl Williamson <>
Date:   Wed Sep 25 10:12:32 2019 -0600

    Add regnode LEXACT, for long strings
    This commit adds a new regnode for strings that don't fit in a regular
    one, and adds a structure for that regnode to use.  Actually using them
    is deferred to the next commit.
    This new regnode structure is needed because the previous structure only
    allows for an 8-bit length field, so at most 255 bytes.  This commit puts
    the length instead in a new field, in the same place where single-argument
    regnodes put their argument.  Hence a long-string node has an extra 32
    bits of overhead, but at no string length is this node ever bigger than
    the combination of the smaller nodes it replaces.
    I also considered simply combining the original 8 bit length field
    (which is now unused) with the first byte of the string field to get a
    16 bit length, and have the actual string be offset by 1.  But I
    rejected that because it would mean the string would usually not be
    aligned, slowing down memory accesses.
    This new LEXACT regnode can hold as much as 1024 regular EXACT nodes
    can, using 4K fewer overhead bytes to do so.  That means it can handle
    strings containing 262000 bytes.  The comments give ideas for expanding
    that should it become necessary or desirable.
    Besides the space advantage, any hardware acceleration in memcmp can
    work on much bigger chunks.  Without such acceleration, the memcmp inner
    loop (often written in assembly) will run many more times in a row, and
    our outer loop that calls it correspondingly fewer.
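
The layout change and the overhead arithmetic in this message can be sketched in C as follows. This is a simplified illustration, not the definitions from regcomp.h: the struct names, field names, and helper functions are hypothetical, and the sketch counts only per-node header bytes, ignoring any padding after the string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts for illustration only. */
struct sketch_exact {
    uint8_t  str_len;    /* 8-bit length, so at most 255 string bytes */
    uint8_t  type;
    uint16_t next_off;
    char     string[1];  /* string bytes start right after the header */
};

struct sketch_lexact {
    uint8_t  unused;     /* the old 8-bit length slot, no longer used */
    uint8_t  type;
    uint16_t next_off;
    uint32_t str_len;    /* length kept in a separate 32-bit field */
    char     string[1];  /* still starts on an aligned boundary */
};

#define SKETCH_EXACT_MAX 255u

/* Number of short nodes needed to hold len bytes (len > 0). */
size_t sketch_exact_nodes_needed(size_t len)
{
    return (len + SKETCH_EXACT_MAX - 1) / SKETCH_EXACT_MAX;
}

/* Header bytes spent when len bytes are split across short nodes. */
size_t sketch_exact_header_bytes(size_t len)
{
    return sketch_exact_nodes_needed(len)
           * offsetof(struct sketch_exact, string);
}

/* Header bytes spent by a single long node, whatever the length. */
size_t sketch_lexact_header_bytes(void)
{
    return offsetof(struct sketch_lexact, string);
}
```

Under these assumptions, and on a typical platform where the two headers occupy 4 and 8 bytes, a 256-byte string costs the same header space either way, while a string that would fill 1024 short nodes saves 4088 header bytes, which is the roughly 4K saving the message mentions.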

commit ccb136a78cfe06305afe9e44433a55b7e8ec5d51
Author: Karl Williamson <>
Date:   Sun Sep 22 16:12:07 2019 -0600

    regcomp.c: Change handling of filled EXACT nodes
    This changes the detection mechanism: just before we would otherwise
    write to the node, we check whether that write would be out of bounds,
    and if so, break out of the loop to handle a full node.
    This improves the packing of nodes, especially under /i, compared with
    the previous mechanism.  But more importantly, it sets things up so that
    we can potentially increase the node size as we go along.
    This also changes the handling of XXX
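
The check-before-write approach described in this message can be sketched in C as follows. This is a minimal illustration under assumptions, not the code in regcomp.c: the function `sketch_fill_node` and its parameters are hypothetical, and each "piece" stands for whatever indivisible run of bytes the compiler wants to append next (for example, a character whose representation takes several bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append pieces to node until the next one would
 * not fit.  Returns the number of pieces consumed and stores the number
 * of bytes written in *used. */
size_t sketch_fill_node(char *node, size_t capacity,
                        const char *const *pieces, size_t n_pieces,
                        size_t *used)
{
    size_t filled = 0;
    size_t i;

    for (i = 0; i < n_pieces; i++) {
        size_t len = strlen(pieces[i]);

        /* Check just before writing: would this piece go out of bounds?
         * If so, stop here so the caller can handle the full node. */
        if (len > capacity - filled)
            break;

        memcpy(node + filled, pieces[i], len);
        filled += len;
    }

    *used = filled;
    return i;
}
```

With a 4-byte node and the pieces "ab", "c", "de", "f", this sketch consumes the first two pieces and stops with one byte still free, since "de" would overflow; the remaining pieces would go into the next node.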

commit 0c04670963db3decdfba978014ae409b2b95ccd5
Author: Karl Williamson <>
Date:   Sun Sep 22 15:26:03 2019 -0600

    regcomp.h: Add comments

commit e36b75d9b0818975f71be775d71b228b003fae45
Author: Karl Williamson <>
Date:   Sun Sep 22 15:25:23 2019 -0600

    regcomp.h: Remove obsolete macro
    This is no longer used.

commit 25c8cb721d3579b8ee4e99284655740c5d631116
Author: Karl Williamson <>
Date:   Sat Sep 21 14:34:20 2019 -0600

    regcomp.c: Rename three variables
    One of the variables is misnamed: upper_fill indicates that the
    node has to be left not completely filled.  Comments will be added in a
    later commit.
    The other two are renamed in preparation for future changes to more
    accurately describe their new purposes.

commit 54c5756b9c03baca809c54f0a9863439a2a3345f
Author: Karl Williamson <>
Date:   Sat Sep 21 13:31:37 2019 -0600

    regcomp.c: White-space only, comments
    Outdent a block that was doubly indented.  Change some other white space
    and fix grammar in a comment.

commit 268c999f85630ac8630fd698a59b4448e1c4838d
Author: Karl Williamson <>
Date:   Sat Sep 21 13:24:33 2019 -0600

    regcomp: Use new set macro to store a value
    This is in preparation for a later commit, in which the current
    mechanism will no longer be a legal lhs.

commit 0ca2ddc66c78d79e6059006b3b23851785c1e233
Author: Karl Williamson <>
Date:   Wed Jun 26 13:02:35 2019 -0600

    XXX Configure

