
[Perl/perl5] 539ba0: regen/mk_invlists.pl - under DEBUG=1 show somepro...

From: Yves Orton via perl5-changes
Date: August 1, 2022 12:09
Subject: [Perl/perl5] 539ba0: regen/mk_invlists.pl - under DEBUG=1 show somepro...
Message ID: Perl/perl5/push/refs/heads/yves/regnode_typedefs/20c016-b8c2ff@github.com
  Branch: refs/heads/yves/regnode_typedefs
  Home:   https://github.com/Perl/perl5
  Commit: 539ba001547fc7bbd57700657dece45ae8996799
      https://github.com/Perl/perl5/commit/539ba001547fc7bbd57700657dece45ae8996799
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - under DEBUG=1 show some progress output


  Commit: 16ef862cf922219aa5121facc5d8d87d51674e9e
      https://github.com/Perl/perl5/commit/16ef862cf922219aa5121facc5d8d87d51674e9e
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c - replace repeated OP(n) with variable 'op'.

Declutter code.


  Commit: 115ea8c00f7859b47a9f69bba13405717b95fdee
      https://github.com/Perl/perl5/commit/115ea8c00f7859b47a9f69bba13405717b95fdee
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c - replace OP(n) macro with variable 'op' in S_dumpuntil()

This declutters the code and allows us to remove the casting as well.

As a byproduct the loop control logic is a bit simplified.


  Commit: bb7fb5fa08a0b0716ed8c3d512f1e7733875a171
      https://github.com/Perl/perl5/commit/bb7fb5fa08a0b0716ed8c3d512f1e7733875a171
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M pod/perlreguts.pod
    M regcomp.c
    M regcomp.h
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - Make regarglen available as PL_regarglen in regexec.c

In a follow-up patch we will use this data from regexec.c, which
currently cannot see the variable.

This changes a comment in regen/mk_invlists.pl, which necessitated
rebuilding several files related to Unicode. Only the hashes associated
with mk_invlists.pl were changed.


  Commit: e2e04fd93133a445881d8b97f2e35ad6d152d069
      https://github.com/Perl/perl5/commit/e2e04fd93133a445881d8b97f2e35ad6d152d069
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - create typedefs for all regnode types

Currently we hard code the struct used by the different regop types.
This makes it awkward to change the structure used by a specific regop,
as the struct it uses might be used in many contexts, and each case
of a regop using that structure must be reviewed to see if it needs
to be changed.

This patch adds a typedef for each regnode. The typedefs are named
'tregnode_OP', for instance 'tregnode_TRIE' is typedefed to 'struct
charclass' (at the time of this commit). This allows the code to do
things like 'sizeof(tregnode_TRIE)' and should the exact struct used
for TRIE regops change in the future then no code need be reviewed
or changed.
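
A minimal sketch of the typedef idea in C, assuming simplified node
layouts. Every toy_ name and struct field here is hypothetical and is not
what regen/regcomp.pl generates into regnodes.h; only the pattern (one
typedef per opcode, sizes taken through the typedef) comes from the log
message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node layouts (assumption: not the real structs). */
struct toy_regnode   { uint8_t flags; uint8_t type; uint16_t next_off; };
struct toy_regnode_1 { uint8_t flags; uint8_t type; uint16_t next_off;
                       uint32_t arg1; };

/* One typedef per opcode, in the spirit of the tregnode_OP names. */
typedef struct toy_regnode   toy_tregnode_STAR;
typedef struct toy_regnode_1 toy_tregnode_OPEN;
typedef struct toy_regnode_1 toy_tregnode_CLOSE;

/* Callers ask for a size by opcode name, never by struct name, so
 * repointing one typedef at another struct needs no change here. */
static size_t toy_size_of_open(void)
{
    return sizeof(toy_tregnode_OPEN);
}
```

If toy_tregnode_OPEN were later pointed at a different struct, only its
typedef line would change; toy_size_of_open() and any similar caller
would pick up the new size automatically.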


  Commit: e82c63d11ac162d34dafc62ae95af43e4c70b3dd
      https://github.com/Perl/perl5/commit/e82c63d11ac162d34dafc62ae95af43e4c70b3dd
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M dist/Devel-PPPort/parts/base/5003007
    M embed.fnc
    M embed.h
    M pod/perlreguts.pod
    M proto.h
    M regcomp.c
    M regcomp.h
    M regexec.c

  Log Message:
  -----------
  regcomp.c - rename NEXTOPER to REGNODE_AFTER and related logic

It is really easy to get confused about the difference between
NEXTOPER() and regnext() of a regnode. The two concepts are related and
similar, but importantly distinct. NEXTOPER() is also defined in a way
that is easy to abuse and misunderstand. It encourages code that is
fragile under larger changes, effectively "baking in" assumptions that
are difficult to discover by searching. Changing the type and storage
requirements of a regnode may then break things in subtle and
hard-to-debug ways.

An example of how NEXTOPER() is problematic is that this:
NEXTOPER(NEXTOPER(branch)) does not mean "find the second node after the
branch node", it means "jump forward by a regnode which happens to be
two regnodes large". In other words NEXTOPER is just a fancy way of
writing "node+1".

This patch replaces NEXTOPER() with three new macros:

    REGNODE_AFTER_dynamic(node)
    REGNODE_AFTER_opcode(node,op)
    REGNODE_AFTER_type(node,tregnode_OPNAME)

The first is the most generic case: it jumps forward by the size of the
node, and determines that size by consulting OP(node). The second is
where you have already extracted OP(node), and the third is where you
know the actual structure that you want to jump forward by. Every
regnode type has a corresponding type, which is known at compile time,
so using the third will produce the most efficient code. However in many
cases the code operates on one of several types, whose size may be the
same now, but may change in the future, in which case one of the other
forms is preferred. The run time logic in regexec.c should probably
only use the REGNODE_AFTER_type() interface.
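
The difference between "node+1" and the three size-aware forms can be
sketched with a toy model. This is an illustration under stated
assumptions, not the macros from regcomp.h: the toy_/TOY_ names, the
4-byte node layout and the size table are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte node header (assumption: not the real regnode). */
typedef struct { uint8_t flags; uint8_t type; uint16_t next_off; } toy_node;

enum { TOY_STAR = 0, TOY_OPEN = 1, TOY_OP_COUNT };

/* Extra argument slots per opcode, counted in units of toy_node. */
static const uint8_t toy_arg_len[TOY_OP_COUNT] = { 0, 1 };

/* Per-opcode types: STAR is one slot wide, OPEN is two slots wide. */
typedef struct { toy_node base; } toy_tregnode_STAR;
typedef struct { toy_node base; uint32_t arg1; } toy_tregnode_OPEN;

#define TOY_OP(p)                ((p)->type)
/* Old style: always exactly one slot, whatever the node is. */
#define TOY_NEXTOPER(p)          ((p) + 1)
/* Size looked up from an opcode the caller has already extracted. */
#define TOY_AFTER_opcode(p, op)  ((p) + 1 + toy_arg_len[(op)])
/* Size looked up by reading the opcode out of the node itself. */
#define TOY_AFTER_dynamic(p)     TOY_AFTER_opcode((p), TOY_OP(p))
/* Size known at compile time from the per-opcode type. */
#define TOY_AFTER_type(p, t)     ((p) + sizeof(t) / sizeof(toy_node))

/* Sample program: OPEN (two slots: header + argument), then STAR. */
static const toy_node toy_prog[3] = {
    { 0, TOY_OPEN, 0 },   /* slot 0: OPEN header            */
    { 0, 0, 0 },          /* slot 1: OPEN's argument slot   */
    { 0, TOY_STAR, 0 },   /* slot 2: the node after OPEN    */
};
```

With this layout, TOY_NEXTOPER(&toy_prog[0]) lands on the argument slot
of the OPEN node, while all three TOY_AFTER_ forms land on the STAR node
in slot 2.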

Note that there is also a REGNODE_BEFORE() which replaces PREVOPER(),
which is used in a specific piece of legacy logic but should not be
used otherwise. It is not safe to go backwards from an arbitrary node:
we simply have no way to know how large the previous node is and thus
where it starts.

This patch includes some logic that validates assumptions during DEBUG
mode which should catch errors from resizing regnodes.

After this patch, changing the size of an existing regnode should be
relatively safe, and errors related to sizing should trigger assertion
failures.
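
One way such a debug-time check could look is sketched below. It is an
assumption about the general technique (cross-checking a compile-time
size against a size table), not the validation code added by this patch;
all toy_/TOY_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte node header (assumption: not the real regnode). */
typedef struct { uint8_t flags; uint8_t type; uint16_t next_off; } toy_node;

enum { TOY_STAR = 0, TOY_OPEN = 1, TOY_OP_COUNT };

/* Extra argument slots per opcode, counted in units of toy_node. */
static const uint8_t toy_arg_len[TOY_OP_COUNT] = { 0, 1 };

typedef struct { toy_node base; } toy_tregnode_STAR;
typedef struct { toy_node base; uint32_t arg1; } toy_tregnode_OPEN;

/* True when a compile-time size agrees with the size table for op. */
static bool toy_size_matches(uint8_t op, size_t type_bytes)
{
    return type_bytes == (1u + toy_arg_len[op]) * sizeof(toy_node);
}

/* Type-based jump; the assert is compiled out when NDEBUG is defined,
 * so the check costs nothing in a non-debug build. */
#define TOY_AFTER_type(p, t) \
    (assert(toy_size_matches((p)->type, sizeof(t))), \
     (p) + sizeof(t) / sizeof(toy_node))

/* Sample program: OPEN (two slots), then STAR. */
static const toy_node toy_prog[3] = {
    { 0, TOY_OPEN, 0 },
    { 0, 0, 0 },
    { 0, TOY_STAR, 0 },
};
```

If someone resized toy_tregnode_OPEN without updating toy_arg_len, or
passed the wrong type for a node, the assert inside TOY_AFTER_type()
would fire in a debug build instead of silently mis-stepping.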

This patch includes changes to perlreguts.pod to explain this stuff
better.


  Commit: 1dec042f18a654b2d979b52fff423770600fc703
      https://github.com/Perl/perl5/commit/1dec042f18a654b2d979b52fff423770600fc703
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regen/regcomp.pl

  Log Message:
  -----------
  regex engine rename -> reg_off_by_arg


  Commit: f7945ff85b10fde536b96801497d87116b115069
      https://github.com/Perl/perl5/commit/f7945ff85b10fde536b96801497d87116b115069
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename reg_off_by_arg to PL_reg_off_by_arg

This is in preparation for a future patch, so we can access
PL_reg_off_by_arg from an inline function in regexec.c


  Commit: 7380aceea95c355a53c029f80bf77650b898dc42
      https://github.com/Perl/perl5/commit/7380aceea95c355a53c029f80bf77650b898dc42
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regcomp.pl - use the regnode typedefs in EXTRA_SIZE calculations


  Commit: 0249e0e8e5688f8f7fea1f140ed3ce3656c5231a
      https://github.com/Perl/perl5/commit/0249e0e8e5688f8f7fea1f140ed3ce3656c5231a
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - fix documentation (add missing PL_ prefix)


  Commit: c4ea44c2f1b94d491ecd32dec360a199dfb01abe
      https://github.com/Perl/perl5/commit/c4ea44c2f1b94d491ecd32dec360a199dfb01abe
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regen/regcomp.pl

  Log Message:
  -----------
  regen/regcomp.pl - add a way to dump the node/state table

For debugging and enhancements, etc.


  Commit: 5c45ba54fd68d333375e64be8bc4e62ae178336b
      https://github.com/Perl/perl5/commit/5c45ba54fd68d333375e64be8bc4e62ae178336b
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M pod/perldebguts.pod
    M regcomp.sym
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - add PL_regargvaries


  Commit: e4e5c46ae87bf1024ca3210a58d4ffb8bc8bcf0d
      https://github.com/Perl/perl5/commit/e4e5c46ae87bf1024ca3210a58d4ffb8bc8bcf0d
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M pod/perlreguts.pod
    M regcomp.c
    M regcomp.h
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_regarglen to PL_regnode_arg_len


  Commit: a04fb410a01a5841c6f2d29fbc4984a9d5d6268d
      https://github.com/Perl/perl5/commit/a04fb410a01a5841c6f2d29fbc4984a9d5d6268d
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_regargvaries to PL_regnode_arg_len_varies


  Commit: 9d6aa64760226a1ef94774829c48af24772f0871
      https://github.com/Perl/perl5/commit/9d6aa64760226a1ef94774829c48af24772f0871
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regcomp.h
    M regen/regcomp.pl
    M regexec.c
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_regkind to PL_regnode_kind


  Commit: 5ef5497303b17b6d8d75dc89ff768aacc416a48e
      https://github.com/Perl/perl5/commit/5ef5497303b17b6d8d75dc89ff768aacc416a48e
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_reg_off_by_arg to PL_regnode_off_by_arg


  Commit: 5cf44af180c69a87be22be918496eff634158470
      https://github.com/Perl/perl5/commit/5cf44af180c69a87be22be918496eff634158470
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regen/regcomp.pl
    M regexec.c
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_reg_name to PL_regnode_name


  Commit: 11b7e546d087bc980ddf7f9523643369b0de2ec6
      https://github.com/Perl/perl5/commit/11b7e546d087bc980ddf7f9523643369b0de2ec6
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M globvar.sym

  Log Message:
  -----------
  globvar.sym - sort PL_reg*


  Commit: 28cf12226be0d74fc245dda4bd72ce55cbfdd9c8
      https://github.com/Perl/perl5/commit/28cf12226be0d74fc245dda4bd72ce55cbfdd9c8
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c - initial support for EXACTish nodes in regnode_after()

This is a first step: we add the support for EXACTish nodes
here, but we do not use it yet. In a following commit we will move
it to a new file. This patch is just to keep the move clean.


  Commit: 4967dd778e3286ca60699c4c3bfc1c13d608022b
      https://github.com/Perl/perl5/commit/4967dd778e3286ca60699c4c3bfc1c13d608022b
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M regcomp.c
    M regcomp.h
    M regexec.c

  Log Message:
  -----------
  regex engine - rename REGNODE_AFTER_dynamic() REGNODE_AFTER()

Now that REGNODE_AFTER() can handle all cases it makes sense
to remove the _dynamic suffix.


  Commit: 632971a7db4fd9bc7eb2e9b9974c1b9ecf814bb3
      https://github.com/Perl/perl5/commit/632971a7db4fd9bc7eb2e9b9974c1b9ecf814bb3
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M MANIFEST
    M embed.fnc
    M embed.h
    M pod/perlreguts.pod
    M proto.h
    M regcomp.c
    M regcomp.h
    A reginline.h

  Log Message:
  -----------
  regex engine - integrate regnode_after() support for EXACTish nodes

This adds REGNODE_AFTER_varies() which is used when the caller *knows*
that the current regnode is variable length. We then use it to handle
EXACTish style nodes as determined by PL_regnode_arg_len_varies.
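
A rough sketch of how a "varies" table can route variable-length nodes
to a separate size computation. The layout is an assumption made for
illustration: a toy node stores its string length in the header and packs
the string bytes into the slots that follow. None of the toy_/TOY_ names
exist in the Perl source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte header; str_len is only meaningful for TOY_EXACT. */
typedef struct { uint8_t str_len; uint8_t type; uint16_t next_off; } toy_node;

enum { TOY_STAR = 0, TOY_EXACT = 1, TOY_OP_COUNT };

/* Fixed extra slots per opcode, and a flag for "size depends on content". */
static const uint8_t toy_arg_len[TOY_OP_COUNT]        = { 0, 0 };
static const uint8_t toy_arg_len_varies[TOY_OP_COUNT] = { 0, 1 };

/* Number of node-sized slots needed to hold len string bytes (rounds up). */
static size_t toy_str_nodes(size_t len)
{
    return (len + sizeof(toy_node) - 1) / sizeof(toy_node);
}

/* For callers that already know the node is variable length. */
static const toy_node *toy_after_varies(const toy_node *p)
{
    return p + 1 + toy_str_nodes(p->str_len);
}

/* General form: consult the varies table first, then the fixed sizes. */
static const toy_node *toy_after(const toy_node *p)
{
    if (toy_arg_len_varies[p->type])
        return toy_after_varies(p);
    return p + 1 + toy_arg_len[p->type];
}

/* Sample program: EXACT with a 5-byte string (two payload slots), STAR. */
static const toy_node toy_prog[4] = {
    { 5, TOY_EXACT, 0 },  /* slot 0: header, string length 5        */
    { 0, 0, 0 },          /* slot 1: payload bytes (contents unused) */
    { 0, 0, 0 },          /* slot 2: payload bytes (contents unused) */
    { 0, TOY_STAR, 0 },   /* slot 3: the node after the EXACT node   */
};
```

Here a 5-byte string needs two 4-byte slots, so both toy_after() and
toy_after_varies() step from slot 0 to slot 3.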

As part of this patch Perl_regnext(), Perl_regnode_after() and
Perl_check_regnode_after() are moved to reginline.h, which is loaded via
regcomp.c only when we are compiling the regex engine.


  Commit: b8c2ff3e6526cf627e770b0d3955deacd9745982
      https://github.com/Perl/perl5/commit/b8c2ff3e6526cf627e770b0d3955deacd9745982
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-01 (Mon, 01 Aug 2022)

  Changed paths:
    M pod/perlreguts.pod
    M regcomp.h

  Log Message:
  -----------
  regex engine - improved comments explaining REGNODE_AFTER()

This rewrites one comment to include more explanation of the difference
between Perl_regnext() and REGNODE_AFTER().
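
The distinction can be illustrated with a toy program for an alternation
of two single-node branches. This is a simplified model under stated
assumptions: every toy node is one slot wide, next_off is a forward
offset in slots, and an offset of 0 means "no next node". The toy_/TOY_
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte node; next_off is a forward offset in slots. */
typedef struct { uint8_t flags; uint8_t type; uint16_t next_off; } toy_node;

enum { TOY_END = 0, TOY_BRANCH, TOY_ANY };

/* Logical successor: follow the stored offset; 0 means there is none. */
static const toy_node *toy_regnext(const toy_node *p)
{
    return p->next_off ? p + p->next_off : NULL;
}

/* Physical successor: the node stored immediately after this one.
 * Every node in this toy model is exactly one slot wide. */
static const toy_node *toy_after(const toy_node *p)
{
    return p + 1;
}

/* Toy layout for two alternatives, each matching a single ANY node. */
static const toy_node toy_prog[5] = {
    { 0, TOY_BRANCH, 2 },  /* slot 0: next is the second BRANCH (slot 2) */
    { 0, TOY_ANY,    3 },  /* slot 1: body of branch 1, next is END      */
    { 0, TOY_BRANCH, 2 },  /* slot 2: next is END (slot 4)               */
    { 0, TOY_ANY,    1 },  /* slot 3: body of branch 2, next is END      */
    { 0, TOY_END,    0 },  /* slot 4: no next node                       */
};
```

For the first BRANCH, toy_regnext() yields the second BRANCH (what to
try if this alternative fails), while toy_after() yields the first node
inside the branch body. The two answers coincide only for nodes whose
stored offset happens to point at the adjacent slot, such as slot 3.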


Compare: https://github.com/Perl/perl5/compare/20c0168cd6ea...b8c2ff3e6526


