
[Perl/perl5] d1907b: regen/mk_invlists.pl - under DEBUG=1 show somepro...

From: Yves Orton via perl5-changes
Date: August 3, 2022 09:07
Subject: [Perl/perl5] d1907b: regen/mk_invlists.pl - under DEBUG=1 show somepro...
Message ID: Perl/perl5/push/refs/heads/blead/4dd482-182f0b@github.com
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: d1907b9404696dcfd0b4dbd7fe1b07f9beff8585
      https://github.com/Perl/perl5/commit/d1907b9404696dcfd0b4dbd7fe1b07f9beff8585
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - under DEBUG=1 show some progress output


  Commit: 19adc068606205f0a7cb0c4ebcc9bf5d9b153772
      https://github.com/Perl/perl5/commit/19adc068606205f0a7cb0c4ebcc9bf5d9b153772
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c - replace repeated OP(n) with variable 'op'.

Declutter code.


  Commit: c0a7907be0f2bff18c64a6b2f3d09cb88c192dc3
      https://github.com/Perl/perl5/commit/c0a7907be0f2bff18c64a6b2f3d09cb88c192dc3
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c - replace OP(n) macro with variable 'op' in S_dumpuntil()

This declutters the code and allows us to remove the casting as well.

As a byproduct the loop control logic is a bit simplified.


  Commit: f946e55ad047822276d1420651e4dc2d9caf3fce
      https://github.com/Perl/perl5/commit/f946e55ad047822276d1420651e4dc2d9caf3fce
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M pod/perlreguts.pod
    M regcomp.c
    M regcomp.h
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - Make regarglen available as PL_regarglen in regexec.c

In a follow-up patch we will use this data from regexec.c, which
currently cannot see the variable.

This changes a comment in regen/mk_invlists.pl, which necessitated
rebuilding several files related to Unicode. Only the hashes associated
with mk_invlists.pl were changed.


  Commit: ec5e6b1346dbbfc24682d87768357653663ef1eb
      https://github.com/Perl/perl5/commit/ec5e6b1346dbbfc24682d87768357653663ef1eb
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - create typedefs for all regnode types

Currently we hard code the struct used by the different regop types.
This makes it awkward to change the structure used by a specific regop,
as the struct it uses might be used in many contexts, and each case
of a regop using that structure must be reviewed to see if it needs
to be changed.

This patch adds a typedef for each regnode. The typedefs are named
'tregnode_OP', for instance 'tregnode_TRIE' is typedefed to 'struct
charclass' (at the time of this commit). This allows the code to do
things like 'sizeof(tregnode_TRIE)' and should the exact struct used
for TRIE regops change in the future then no code need be reviewed
or changed.
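
A minimal sketch of the per-opcode typedef idea follows. The struct
layouts and the tnode_* names are hypothetical and simplified for
illustration; they are not the generated contents of regnodes.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node layouts; not Perl's real definitions. */
struct node_base     { uint8_t flags; uint8_t type; uint16_t next_off; };
struct node_with_arg { uint8_t flags; uint8_t type; uint16_t next_off;
                       uint32_t arg1; };

/* One typedef per opcode, in the spirit of the tregnode_OP names.
 * Callers name the opcode's typedef, never the underlying struct. */
typedef struct node_base     tnode_NOTHING;
typedef struct node_with_arg tnode_OPEN;
typedef struct node_with_arg tnode_CLOSE;

/* Every node must occupy a whole number of base units. */
static_assert(sizeof(tnode_OPEN) % sizeof(struct node_base) == 0,
              "node sizes must be multiples of the base node");

/* Size of an OPEN node, measured in base-node units. */
static size_t open_node_units(void) {
    return sizeof(tnode_OPEN) / sizeof(struct node_base);
}
```

If OPEN later needed a different struct, only its typedef line would
change; open_node_units() and similar callers would pick up the new
size without being edited.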


  Commit: 0e48b698ea391da32a38a514b2361250e0bc2201
      https://github.com/Perl/perl5/commit/0e48b698ea391da32a38a514b2361250e0bc2201
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M dist/Devel-PPPort/parts/base/5003007
    M embed.fnc
    M embed.h
    M pod/perlreguts.pod
    M proto.h
    M regcomp.c
    M regcomp.h
    M regexec.c

  Log Message:
  -----------
  regcomp.c - rename NEXTOPER to REGNODE_AFTER and related logic

It is really easy to get confused about the difference between
NEXTOPER() and regnext() of a regnode. The two concepts are related and
similar, but importantly distinct. NEXTOPER() is also defined in such a
way that it is easy to abuse and misunderstand. It encourages code that
is fragile under larger changes by "baking in" assumptions that are
difficult to discover by searching. Changing the type and storage
requirements of a regnode may break things in subtle and hard-to-debug
ways.

An example of how NEXTOPER() is problematic is that this:
NEXTOPER(NEXTOPER(branch)) does not mean "find the second node after the
branch node", it means "jump forward by a regnode which happens to be
two regnodes large". In other words NEXTOPER is just a fancy way of
writing "node+1".
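
The point can be illustrated with a small sketch. The node type and the
OLD_NEXTOPER name below are hypothetical stand-ins, not the definitions
from regcomp.h; the only property carried over is that the macro is
plain pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte base node; larger nodes span several of these. */
typedef struct node { uint8_t flags; uint8_t type; uint16_t next_off; } node;

/* Old-style step: always advances by exactly one base unit,
 * regardless of what kind of node p actually points at. */
#define OLD_NEXTOPER(p) ((p) + 1)

/* Number of base units skipped by applying the step twice. */
static ptrdiff_t units_skipped_by_double_step(const node *p) {
    return OLD_NEXTOPER(OLD_NEXTOPER(p)) - p;
}
```

The double step always moves two base units. Whether that lands on the
node after p, or on the second node after p, depends entirely on how
large the node at p happens to be, which the expression never states.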

This patch replaces NEXTOPER() with three new macros:

    REGNODE_AFTER_dynamic(node)
    REGNODE_AFTER_opcode(node,op)
    REGNODE_AFTER_type(node,tregnode_OPNAME)

The first is the most generic case: it jumps forward by the size of the
node, and determines that size by consulting OP(node). The second is
for when you have already extracted OP(node), and the third is for when
you know the actual structure that you want to jump forward by. Every
regnode type has a corresponding struct type, which is known at compile
time, so using the third will produce the most efficient code. However,
in many cases the code operates on one of several types, whose size may
be the same now but may change in the future, in which case one of the
other forms is preferred. The run time logic in regexec.c should
probably only use the REGNODE_AFTER_type() interface.
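
The three forms can be sketched roughly as below. All names here
(NODE_AFTER_*, extra_units, the opcodes and the tnode_* typedefs) are
hypothetical and only mirror the shape of the real macros; the actual
definitions live in regcomp.h and regnodes.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct node { uint8_t flags; uint8_t type; uint16_t next_off; } node;
typedef struct node_arg { uint8_t flags; uint8_t type; uint16_t next_off;
                          uint32_t arg1; } node_arg;

enum { OP_PLAIN = 0, OP_WITH_ARG = 1, OP_COUNT };

typedef node     tnode_PLAIN;
typedef node_arg tnode_WITH_ARG;

/* Extra base units each opcode needs beyond its first one. */
static const uint8_t extra_units[OP_COUNT] = { 0, 1 };

/* The table and the typedefs must describe the same sizes. */
static_assert(sizeof(tnode_WITH_ARG) == 2 * sizeof(node),
              "extra_units[OP_WITH_ARG] must match tnode_WITH_ARG");

/* Opcode already known: one table lookup. */
#define NODE_AFTER_opcode(p, op) ((p) + 1 + extra_units[(op)])
/* Most generic: read the opcode from the node (evaluates p twice). */
#define NODE_AFTER_dynamic(p)    NODE_AFTER_opcode((p), (p)->type)
/* Struct known at compile time: the offset is a constant. */
#define NODE_AFTER_type(p, t)    ((p) + sizeof(t) / sizeof(node))
```

For a node whose opcode is OP_WITH_ARG, all three forms yield the same
address; they differ only in how much is known at compile time.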

Note that there is also a REGNODE_BEFORE() which replaces PREVOPER().
It is used in a specific piece of legacy logic but should not be used
otherwise. It is not safe to go backwards from an arbitrary node: we
simply have no way to know how large the previous node is and thus
where it starts.

This patch includes some logic that validates assumptions during DEBUG
mode which should catch errors from resizing regnodes.

After this patch changing the size of an existing regnode should be
relatively safe, and errors related to sizing should trigger assertion
failures.
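
One way such a debug-time check could look is sketched below. This is
an assumption about the general technique, not the patch's actual
validation code; node_units and node_after_checked are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct node { uint8_t flags; uint8_t type; uint16_t next_off; } node;

enum { OP_PLAIN = 0, OP_WITH_ARG = 1, OP_COUNT };

/* Hypothetical table: total size of each opcode's node in base units. */
static const uint8_t node_units[OP_COUNT] = { 1, 2 };

/* Step forward by a caller-supplied size. When assertions are enabled,
 * verify that the size agrees with the table entry for the node's own
 * opcode, so a resized node trips the assert instead of misparsing. */
static const node *node_after_checked(const node *p, size_t claimed_units) {
    assert(p->type < OP_COUNT);
    assert(claimed_units == node_units[p->type]);
    return p + claimed_units;
}
```

With NDEBUG defined the asserts compile away and the function reduces
to plain pointer arithmetic.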

This patch includes changes to perlreguts.pod to explain this stuff
better.


  Commit: e794fa41fb8d8ec5b090155365d648e1d3e9daf5
      https://github.com/Perl/perl5/commit/e794fa41fb8d8ec5b090155365d648e1d3e9daf5
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regen/regcomp.pl

  Log Message:
  -----------
  regex engine rename -> reg_off_by_arg


  Commit: 609756f2528a92f7d426129a05d53878dcf547d5
      https://github.com/Perl/perl5/commit/609756f2528a92f7d426129a05d53878dcf547d5
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename reg_off_by_arg to PL_reg_off_by_arg

This is in preparation for a future patch, so we can access
PL_reg_off_by_arg() from an inline function in regexec.c


  Commit: 23dfdc3a51cb5d66ec97245090f7f4fdb61d7334
      https://github.com/Perl/perl5/commit/23dfdc3a51cb5d66ec97245090f7f4fdb61d7334
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regcomp.pl - use the regnode typedefs in EXTRA_SIZE calculations


  Commit: fc1edcc94f2b7d7ea60575eb55b860c759f9cee9
      https://github.com/Perl/perl5/commit/fc1edcc94f2b7d7ea60575eb55b860c759f9cee9
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - fix documentation (add missing PL_ prefix)


  Commit: e33d308b9d9b39f8fb1bf04e82e9d280ca2d11fa
      https://github.com/Perl/perl5/commit/e33d308b9d9b39f8fb1bf04e82e9d280ca2d11fa
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regen/regcomp.pl

  Log Message:
  -----------
  regen/regcomp.pl - add a way to dump the node/state table

For debugging and enhancements, etc.


  Commit: 689eab88ca31da31ff61f38d1aecdd06b9adcaea
      https://github.com/Perl/perl5/commit/689eab88ca31da31ff61f38d1aecdd06b9adcaea
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M pod/perldebguts.pod
    M regcomp.sym
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl - add PL_regargvaries


  Commit: 83ca6c9dc53fc3a29f6f59f535adb99e2d036b6d
      https://github.com/Perl/perl5/commit/83ca6c9dc53fc3a29f6f59f535adb99e2d036b6d
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M pod/perlreguts.pod
    M regcomp.c
    M regcomp.h
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_regarglen to PL_regnode_arg_len


  Commit: 0e4dd64743700131e6045389cf3e22aeaac7871c
      https://github.com/Perl/perl5/commit/0e4dd64743700131e6045389cf3e22aeaac7871c
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_regargvaries to PL_regnode_arg_len_varies


  Commit: 3bfb2e3bfb2415adc56f1e24f6a3b96d49b575b0
      https://github.com/Perl/perl5/commit/3bfb2e3bfb2415adc56f1e24f6a3b96d49b575b0
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regcomp.h
    M regen/regcomp.pl
    M regexec.c
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_regkind to PL_regnode_kind


  Commit: 366fc8089b728e2f722dab2da854445f0a5e1d69
      https://github.com/Perl/perl5/commit/366fc8089b728e2f722dab2da854445f0a5e1d69
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_reg_off_by_arg to PL_regnode_off_by_arg


  Commit: 673824149c1532385e1cf97c397996187265e3fe
      https://github.com/Perl/perl5/commit/673824149c1532385e1cf97c397996187265e3fe
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym
    M regcomp.c
    M regen/regcomp.pl
    M regexec.c
    M regnodes.h

  Log Message:
  -----------
  regex engine - Rename PL_reg_name to PL_regnode_name


  Commit: f37e1724fa45c62f80fd0d1f084a4678740627ed
      https://github.com/Perl/perl5/commit/f37e1724fa45c62f80fd0d1f084a4678740627ed
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M globvar.sym

  Log Message:
  -----------
  globvar.sym - sort PL_reg*


  Commit: 6f83e0eceec58363a6a1027ef9a408c9cf4f28b9
      https://github.com/Perl/perl5/commit/6f83e0eceec58363a6a1027ef9a408c9cf4f28b9
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regcomp.c

  Log Message:
  -----------
  regcomp.c - initial support for EXACTish nodes in regnode_after()

This is a first step: we add the support for EXACTish nodes
here, but we do not use it yet. In a following commit we will move
it to a new file. This patch is just to keep the move clean.


  Commit: 19a5f8d316d541b9a9adb54c5a918787760100e1
      https://github.com/Perl/perl5/commit/19a5f8d316d541b9a9adb54c5a918787760100e1
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M regcomp.c
    M regcomp.h
    M regexec.c

  Log Message:
  -----------
  regex engine - rename REGNODE_AFTER_dynamic() REGNODE_AFTER()

Now that REGNODE_AFTER() can handle all cases it makes sense
to remove the _dynamic suffix.


  Commit: 1db310d0443729d6e2d04e8cb339e9b79089b2ad
      https://github.com/Perl/perl5/commit/1db310d0443729d6e2d04e8cb339e9b79089b2ad
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M MANIFEST
    M embed.fnc
    M embed.h
    M pod/perlreguts.pod
    M proto.h
    M regcomp.c
    M regcomp.h
    A reginline.h

  Log Message:
  -----------
  regex engine - integrate regnode_after() support for EXACTish nodes

This adds REGNODE_AFTER_varies() which is used when the caller *knows*
that the current regnode is variable length. We then use it to handle
EXACTish style nodes as determined by PL_regnode_arg_len_varies.

As part of this patch Perl_regnext(), Perl_regnode_after() and
Perl_check_regnode_after() are moved to reginline.h, which is loaded via
regcomp.c only when we are compiling the regex engine.
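
A rough sketch of stepping past a variable-length, string-carrying node
follows. It assumes the payload length is stored in the node header and
that the payload is padded to whole base units; the names str_units and
node_after_varies are hypothetical, not the functions in reginline.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header for a string node; payload bytes follow it. */
typedef struct node { uint8_t str_len; uint8_t type; uint16_t next_off; } node;

/* Base units needed to hold len payload bytes, rounded up. */
static size_t str_units(size_t len) {
    return (len + sizeof(node) - 1) / sizeof(node);
}

/* Step past a string node: one header unit plus its padded payload.
 * The size comes from the node itself, not from a per-opcode table. */
static const node *node_after_varies(const node *p) {
    return p + 1 + str_units(p->str_len);
}
```

A fixed per-opcode size table cannot describe such nodes, which is why
a separate flag for "size varies" is needed to select this path.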


  Commit: 182f0ba91d539c8b2bea45168fd6e72f9f0073bd
      https://github.com/Perl/perl5/commit/182f0ba91d539c8b2bea45168fd6e72f9f0073bd
  Author: Yves Orton <demerphq@gmail.com>
  Date:   2022-08-03 (Wed, 03 Aug 2022)

  Changed paths:
    M pod/perlreguts.pod
    M regcomp.h

  Log Message:
  -----------
  regex engine - improved comments explaining REGNODE_AFTER()

This rewrites one comment to include more explanation of the difference
between Perl_regnext() and REGNODE_AFTER().


Compare: https://github.com/Perl/perl5/compare/4dd48237e573...182f0ba91d53


