
[Perl/perl5] c2bb0b: regen_lib: Output blanks; not tabs

From: Karl Williamson via perl5-changes
Date: October 14, 2020 14:46
Subject: [Perl/perl5] c2bb0b: regen_lib: Output blanks; not tabs
Message ID: Perl/perl5/push/refs/heads/blead/206c20-526e4b@github.com
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: c2bb0b9930e2c73d35f279e89dd2a241de96e887
      https://github.com/Perl/perl5/commit/c2bb0b9930e2c73d35f279e89dd2a241de96e887
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regen/regen_lib.pl

  Log Message:
  -----------
  regen_lib: Output blanks; not tabs

This makes it easier to calculate widths; and our policy is to not use
tabs anyway.


  Commit: 519d76f5929997820de5bb942a6e6be7f1bf60bd
      https://github.com/Perl/perl5/commit/519d76f5929997820de5bb942a6e6be7f1bf60bd
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regen/regcomp.pl

  Log Message:
  -----------
  regen/regcomp.pl: Change variable name

The more specific name this is changed to will make code clearer in
future commits.


  Commit: ce553cf576f837c8b843c307e2e1c957d8bab24d
      https://github.com/Perl/perl5/commit/ce553cf576f837c8b843c307e2e1c957d8bab24d
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regen/regcomp.pl
    M regnodes.h

  Log Message:
  -----------
  regen/regcomp.pl: Generate #defines for UTF8ness

This causes #defines to be generated for regexec.c to use in switch
statements, so that each opcode that is a case: there actually gets 4
cases, one for each combination of the target being UTF-8 or not and
the pattern being UTF-8 or not.

This will be used in future commits to simplify things.
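
As a rough illustration of the idea (not the contents of regnodes.h), the
sketch below packs an opcode and two UTF8ness bits into one integer and
defines four constants per opcode.  The opcode values, the WITH_UTF8NESS
macro, and the _tb/_t8/_pb/_p8 suffixes are all assumptions made for this
example.

```c
/* Hypothetical opcode numbers; the real ones are generated into regnodes.h. */
enum { OP_EXACT = 0, OP_ANYOF = 1 };

/* Pack an opcode with two UTF8ness bits: bit 1 = target, bit 0 = pattern.
 * With constant arguments this is an integer constant expression, so the
 * result can be used as a case label. */
#define WITH_UTF8NESS(op, t_utf8, p_utf8) \
    (((op) << 2) | ((!!(t_utf8)) << 1) | (!!(p_utf8)))

/* The kind of constants a generator could emit, four per opcode.
 * "tb"/"t8": target is bytes/UTF-8; "pb"/"p8": pattern is bytes/UTF-8. */
#define OP_EXACT_tb_pb  WITH_UTF8NESS(OP_EXACT, 0, 0)
#define OP_EXACT_tb_p8  WITH_UTF8NESS(OP_EXACT, 0, 1)
#define OP_EXACT_t8_pb  WITH_UTF8NESS(OP_EXACT, 1, 0)
#define OP_EXACT_t8_p8  WITH_UTF8NESS(OP_EXACT, 1, 1)

#define OP_ANYOF_tb_pb  WITH_UTF8NESS(OP_ANYOF, 0, 0)
#define OP_ANYOF_tb_p8  WITH_UTF8NESS(OP_ANYOF, 0, 1)
#define OP_ANYOF_t8_pb  WITH_UTF8NESS(OP_ANYOF, 1, 0)
#define OP_ANYOF_t8_p8  WITH_UTF8NESS(OP_ANYOF, 1, 1)
```

Because the opcode is shifted left by two bits, the four constants for one
opcode never collide with those of another.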


  Commit: dd8dc88c6c318c49836493d65c4faf0e5ede57b2
      https://github.com/Perl/perl5/commit/dd8dc88c6c318c49836493d65c4faf0e5ede57b2
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: S_find_byclass(): utf8ness in switch()

This uses the #defines created in the previous commit to make the switch
statement in this function incorporate the UTF8ness of both the pattern
and the target string.

The reason for this is that the first statement in nearly every case of
the switch is to test if the target string being matched is UTF-8 or
not.  By putting that information into the case number, those
conditionals can be eliminated, leading to cleaner, more modular code.
I had hoped that this would also improve performance since there are
fewer conditionals, but Sergey Aleynikov did performance testing of this
change for me, and found no noticeable gain or loss.

Further, the cases involving matching EXACTish nodes have to also test
if the pattern is UTF-8 or not before doing anything else.  I added that
information as well to the case number, so that those conditionals can
be eliminated.  For the non-EXACTish nodes, it simply means that two
case statements execute the same code.

This is an intermediate commit, which only does the expansion of the
current cases into four for each.  The refactoring that takes advantage
of this is in the following commit.
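
A minimal sketch of what such an intermediate expansion might look like,
assuming a hypothetical WITH_UTF8NESS packing macro and made-up opcode
numbers (none of these names come from regexec.c): each opcode gets four
case labels, but they still share one body that performs the old test.

```c
#include <stdbool.h>

enum { OP_EXACT = 0, OP_POSIXD = 1 };            /* hypothetical opcodes */
enum path { PATH_BYTES, PATH_UTF8, PATH_UNKNOWN };

/* Hypothetical packing: opcode in the high bits, then target and
 * pattern UTF8ness in the two low bits. */
#define WITH_UTF8NESS(op, t_utf8, p_utf8) \
    (((op) << 2) | ((!!(t_utf8)) << 1) | (!!(p_utf8)))

/* Intermediate shape: four labels per opcode, one shared body that
 * still checks the target's UTF8ness itself. */
enum path scan_path_expanded(int op, bool t_utf8, bool p_utf8)
{
    switch (WITH_UTF8NESS(op, t_utf8, p_utf8)) {
      case WITH_UTF8NESS(OP_POSIXD, 0, 0):
      case WITH_UTF8NESS(OP_POSIXD, 0, 1):
      case WITH_UTF8NESS(OP_POSIXD, 1, 0):
      case WITH_UTF8NESS(OP_POSIXD, 1, 1):
        return t_utf8 ? PATH_UTF8 : PATH_BYTES;  /* old conditional kept */
      default:
        return PATH_UNKNOWN;
    }
}
```

Behavior is unchanged at this stage; only the set of case labels has grown.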


  Commit: 56ff0609361466f7eb706d56bdaf69e44342c2e1
      https://github.com/Perl/perl5/commit/56ff0609361466f7eb706d56bdaf69e44342c2e1
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: find_byclass(): Restructure

This is a follow-on to the previous commit.  The case number of the main
switch statement now includes three things: the regnode op, the UTF8ness
of the target, and the UTF8ness of the pattern.

This allows the conditionals within the previous cases (which only
encoded the op) to be removed, and things to be moved around so that
there are more fall-throughs and fewer gotos.  The macros that are
called no longer have to test for UTF8ness, so I teased the UTF8 ones
apart from the non-UTF8 ones.
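
The restructured shape could be sketched as follows.  This is an
illustration under assumptions, not the code in regexec.c: the opcode
numbers, the WITH_UTF8NESS macro, and the PATH_* results are hypothetical
stand-ins for the real per-case work.

```c
#include <stdbool.h>

enum { OP_EXACT = 0, OP_POSIXD = 1 };            /* hypothetical opcodes */
enum path { PATH_BYTES, PATH_UTF8, PATH_TRANSCODE, PATH_UNKNOWN };

/* Hypothetical packing: opcode, then target bit, then pattern bit. */
#define WITH_UTF8NESS(op, t_utf8, p_utf8) \
    (((op) << 2) | ((!!(t_utf8)) << 1) | (!!(p_utf8)))

/* The case label alone selects the path; no body tests UTF8ness. */
enum path scan_path(int op, bool t_utf8, bool p_utf8)
{
    switch (WITH_UTF8NESS(op, t_utf8, p_utf8)) {
      /* EXACTish: both bits matter. */
      case WITH_UTF8NESS(OP_EXACT, 0, 0):
        return PATH_BYTES;
      case WITH_UTF8NESS(OP_EXACT, 1, 1):
        return PATH_UTF8;
      case WITH_UTF8NESS(OP_EXACT, 0, 1):        /* encodings differ */
      case WITH_UTF8NESS(OP_EXACT, 1, 0):
        return PATH_TRANSCODE;

      /* Non-EXACTish: the pattern bit is irrelevant, so each pair of
       * labels falls through to the same code. */
      case WITH_UTF8NESS(OP_POSIXD, 0, 0):
      case WITH_UTF8NESS(OP_POSIXD, 0, 1):
        return PATH_BYTES;
      case WITH_UTF8NESS(OP_POSIXD, 1, 0):
      case WITH_UTF8NESS(OP_POSIXD, 1, 1):
        return PATH_UTF8;

      default:
        return PATH_UNKNOWN;
    }
}
```

A call such as scan_path(OP_POSIXD, true, false) picks the UTF-8 path
directly from the label, with no conditional inside the case body.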


  Commit: 25f81fd589673867331d0217a5c6ef17ed4d2e70
      https://github.com/Perl/perl5/commit/25f81fd589673867331d0217a5c6ef17ed4d2e70
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Rename a static variable

This is to distinguish it from a similar variable being added in a
future commit.


  Commit: b272adb45fa3fca1b787d7ff479196523e7d6336
      https://github.com/Perl/perl5/commit/b272adb45fa3fca1b787d7ff479196523e7d6336
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Macroize a common paradigm


  Commit: 526e4b9dadeed7e61d0154aceb7927fb02dd85bb
      https://github.com/Perl/perl5/commit/526e4b9dadeed7e61d0154aceb7927fb02dd85bb
  Author: Karl Williamson <khw@cpan.org>
  Date:   2020-10-14 (Wed, 14 Oct 2020)

  Changed paths:
    M regexec.c

  Log Message:
  -----------
  regexec.c: Macroize another common paradigm


Compare: https://github.com/Perl/perl5/compare/206c207c12a8...526e4b9dadee


