
[perl.git] branch blead, updated. v5.17.1-258-g61984ee

From: Karl Williamson
Date: June 29, 2012 21:23
Subject: [perl.git] branch blead, updated. v5.17.1-258-g61984ee
Message ID: E1SkpDD-00069t-TU@camel.ams6.corp.booking.com

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/61984ee1c56aaa8a989b7eed4cbc2effd74177c5?hp=94b67eb26513907eccd2427a005de4d512e8a127>

- Log -----------------------------------------------------------------
commit 61984ee1c56aaa8a989b7eed4cbc2effd74177c5
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 13:30:36 2012 -0600

    perlguts: Document that PV can point to non-string

M	pod/perlguts.pod

commit 3a64b5154fffec75126d34d25954f0aef30d9f8a
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Jun 27 16:24:43 2012 -0600

    regcomp.c: Optimize /[0-9]/ into /\d/a
    
    The commonly used [0-9] can be optimized into a smaller, faster node
    that means the same thing.

M	regcomp.c

commit 3172e3fd885a9c54105d3b6156f18dc761fe29e5
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Jun 27 14:43:41 2012 -0600

    regcomp.c: Optimize e.g., /[^\w]/, /[[:^word:]]/ into /\W/
    
    This optimizes character classes that have a single element that is one
    of the ops that have the same meaning outside (namely \d, \h, \s, \w,
    \v, :word:, :digit: and their complements) to that op.  Those
    ops take less space than a character class and run faster.   An initial
    '^' for complementing the class is also handled.

M	regcomp.c
M	regcomp.sym

commit 693fefec6759ebf0a9ec40a0f59346d86831349c
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Jun 27 13:48:16 2012 -0600

    regcomp.c: Simplify some node calculations
    
    For the node types that have differing versions depending on the
    character set regex modifiers, /d, /l, /u, /a, and /aa, we can use the
    enum values as offsets from the base node number to derive the correct
    one.  This eliminates a number of tests.
    
    Because there is no DIGITU node type, I added placeholders for it (and
    NDIGITU) to avoid some special casing of it (more important in future
    commits).  We currently have many available node types, so can afford to
    waste these two.
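
A minimal sketch of the offset idea described above, not the code from regcomp.c.  The enum and node names below (charset_t, NODE_ALNUM, node_for_charset, and so on) are hypothetical stand-ins; the only property assumed is that the node variants are numbered in the same order as the character set enum.

```c
#include <assert.h>

/* Hypothetical stand-in for the character set enum; the order is what
 * matters, since the values are used as offsets. */
typedef enum {
    CHARSET_DEPENDS = 0,            /* /d  */
    CHARSET_LOCALE,                 /* /l  */
    CHARSET_UNICODE,                /* /u  */
    CHARSET_ASCII_RESTRICTED,       /* /a  */
    CHARSET_ASCII_MORE_RESTRICTED   /* /aa */
} charset_t;

/* Hypothetical node numbers; only their relative order is significant. */
enum {
    NODE_ALNUM = 20,    /* /d variant        */
    NODE_ALNUML,        /* /l variant        */
    NODE_ALNUMU,        /* /u variant        */
    NODE_ALNUMA         /* /a and /aa variant */
};

/* Derive the node by adding the charset offset to the base node.  There is
 * no separate /aa node, so anything past the last variant is clamped. */
static int node_for_charset(int base, int last, charset_t cs)
{
    int op;

    assert(base <= last);   /* sketch-only sanity check */
    op = base + (int) cs;
    return op > last ? last : op;
}
```

The clamp replaces a five-way switch: one addition and one comparison per node type, provided the ordering invariant is kept whenever the enum or the node list changes.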

M	op_reg_common.h
M	regcomp.c
M	regcomp.sym
M	regnodes.h

commit 8c1182fda8158a86281b1ea6464176d1c68f2f18
Author: Karl Williamson <public@khwilliamson.com>
Date:   Wed Jun 27 13:28:13 2012 -0600

    regcomp.sym: Reorder a couple of nodes
    
    This causes all the nodes that depend on the regex modifier, BOUND,
    BOUNDL, etc. to have the same relative ordering.  This will enable a
    future commit to simplify generation of the correct node.

M	regcomp.sym
M	regnodes.h

commit 31ae3604e91b534f99f9dd92647e555601952cf2
Author: Karl Williamson <public@khwilliamson.com>
Date:   Tue Jun 26 18:14:23 2012 -0600

    reg_fold.t: Make test cases non-optimizable away
    
    This commit changes the bracketed character classes to include an
    unrelated character.  This is in preparation for a future commit which
    would cause the current character classes to be optimized into EXACTish
    nodes.  The TODO tests would then start passing, even though the
    underlying problem with character classes would not be fixed.  That bug
    is that you can't split a multi-char fold across nodes. It probably is
    not fixable in Perl without
    a total restructuring of the regular expression mechanism.  For example,
    "\N{LATIN SMALL LIGATURE FFI}" doesn't match /[f][f][i]/i.  But it would
    if those got optimized into a single EXACTF node.  (The problem is not
    limited to character classes, /(f)(f)(i)/i also doesn't match, and
    can't, as $1, $2, and $3 are not well-defined.)

M	t/re/reg_fold.t

commit ea364ff596d82b2599af75ca11c936a786c68ea9
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sun Jun 24 14:16:44 2012 -0600

    regcomp.c: Simplify compile time [^..] complement
    
    This simply moves the code that populates the bitmap and combines the
    two inversion lists to after the inversion (the diff looks much larger
    than the actual change, since the code was moved).  This greatly
    simplifies complementing the character class.
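
As background on why complementing is cheap at this stage: an inversion list is a sorted array of code points in which element 0 starts a range that is in the set, element 1 starts the next range that is not, and so on.  Under that representation, complementing only toggles a leading 0 boundary.  The sketch below illustrates the general technique; invlist_complement is a hypothetical name and this is not the regcomp.c implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement the inversion list in buf[0..len).  'cap' is the number of
 * elements buf can hold.  Returns the new length. */
static size_t invlist_complement(unsigned long *buf, size_t len, size_t cap)
{
    if (len > 0 && buf[0] == 0) {
        /* The set began at code point 0: drop that boundary, so every
         * remaining boundary flips between "in" and "out". */
        memmove(buf, buf + 1, (len - 1) * sizeof buf[0]);
        return len - 1;
    }

    /* Otherwise insert a boundary at 0, which flips every range. */
    assert(len < cap);      /* caller must leave room for one element */
    memmove(buf + 1, buf, len * sizeof buf[0]);
    buf[0] = 0;
    return len + 1;
}
```

For [0-9], stored as {0x30, 0x3A}, the complement is {0, 0x30, 0x3A}: everything below '0', nothing in '0'..'9', and everything from ':' upward.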

M	regcomp.c

commit cfbb2758d67aedfe8cfc4682385ae11a84a7a7c4
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sun Jun 24 14:02:48 2012 -0600

    regcomp.c: Rename variable to reflect new purpose
    
    This variable really holds the list of all code points the bracketed
    character class matches; it's not just the ones not in the bitmap.

M	regcomp.c

commit c2df36c4545a01ce4682675cf3feb5a42463b03f
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 21:25:36 2012 -0600

    regcomp.c: Have a subroutine do the work
    
    Since this code was originally written, the fold function has added
    input flags that cause it to do the same thing this code does.  So do it
    in the subroutine.

M	regcomp.c

commit 3e89468b103b7ba52e5b0b098b16444b3f3c9fc5
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 15:48:42 2012 -0600

    regcomp.c: Remove obsolete code
    
    A previous commit removed all calls to these two functions (moving a
    large portion of the bit_fold() one to another place) and no longer
    sets the variable.

M	embed.fnc
M	embed.h
M	proto.h
M	regcomp.c

commit 8f850557b51d83272e1afa15860f3f043b36e3c7
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 14:19:02 2012 -0600

    regcomp.c: White-space, comments only
    
    This indents and outdents existing code to reflect new or removed outer
    blocks.  It reflows comments and code to fit into 80 columns, adds and
    removes blank lines, and makes minor comment rewordings.

M	regcomp.c

commit a30585c71699142f26d0acd91456fddcae948304
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 15:24:38 2012 -0600

    regcomp.c: Remove unnecessary 'if' test
    
    A previous commit has refactored things, so this test is always true

M	regcomp.c

commit 68823f48ffedb1e9641d519d6045b2c0a9fc80ce
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 15:00:26 2012 -0600

    regcomp.c: Use more inversion lists in [] char classes
    
    This changes the building of bracketed character classes to use
    inversion lists instead of a bitmap/inversion list combination.
    
    This will lead in later commits to simplification and to extending
    optimizations beyond the Latin1 range.
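
For readers unfamiliar with the data structure, here is a minimal sketch of an inversion list lookup.  It assumes the usual representation (a sorted array where even-indexed elements start ranges that are in the set and odd-indexed elements start ranges that are not); invlist_contains is a hypothetical name, not a function from the Perl source.

```c
#include <assert.h>
#include <stddef.h>

/* Return nonzero if 'cp' is in the set described by the inversion list
 * list[0..len).  A final "in" range with no terminating element extends
 * to the largest code point. */
static int invlist_contains(const unsigned long *list, size_t len,
                            unsigned long cp)
{
    size_t lo = 0, hi = len;

    assert(list != NULL || len == 0);

    /* Binary search for the number of boundaries that are <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* An odd count means the last boundary crossed opened an "in" range. */
    return (lo & 1) != 0;
}
```

With this layout [0-9A-Z] is just {0x30, 0x3A, 0x41, 0x5B}, and the same four-element form works for ranges anywhere in Unicode, which a 256-bit bitmap cannot express.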

M	regcomp.c

commit bdd8600f35ec7851722b0fe8b4902e0e04ab2800
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 12:57:54 2012 -0600

    handy.h: Fix isBLANK_uni and isBLANK_utf8
    
    These macros have never worked outside the Latin1 range, so this extends
    them to work.
    
    There are no tests I could find for things in handy.h, except that many
    of them are called all over the place during the normal course of
    events.  This commit adds a new file for such testing, containing for
    now only a few tests for the isBLANK macros.
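
To illustrate what "working outside the Latin1 range" means here, the sketch below tests a code point against a hard-coded list of horizontal-space characters.  This is an assumption-laden stand-in: is_blank_cp is a hypothetical helper, the list reflects the blank code points as commonly listed for Unicode and may differ by Unicode version, and the real macros consult Perl's own Unicode tables instead.

```c
#include <assert.h>

/* Hypothetical helper, not Perl's is_uni_blank().  Returns nonzero if
 * 'cp' is a horizontal-space ("blank") code point. */
static int is_blank_cp(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);   /* sketch-only precondition */

    if (cp < 256) {
        /* Latin-1 range: TAB, SPACE, NO-BREAK SPACE */
        return cp == 0x09 || cp == 0x20 || cp == 0xA0;
    }

    /* Above Latin-1; the old macros always answered "no" here. */
    return cp == 0x1680                     /* OGHAM SPACE MARK */
        || (cp >= 0x2000 && cp <= 0x200A)   /* EN QUAD .. HAIR SPACE */
        || cp == 0x202F                     /* NARROW NO-BREAK SPACE */
        || cp == 0x205F                     /* MEDIUM MATHEMATICAL SPACE */
        || cp == 0x3000;                    /* IDEOGRAPHIC SPACE */
}
```

The two code points exercised by the new handy.t, EM SPACE (U+2003) and GREEK DASIA (U+1FFE), fall on opposite sides of this test.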

M	MANIFEST
M	embed.fnc
M	embed.h
M	embedvar.h
M	ext/XS-APItest/APItest.pm
M	ext/XS-APItest/APItest.xs
A	ext/XS-APItest/t/handy.t
M	handy.h
M	intrpvar.h
M	perl.c
M	proto.h
M	sv.c
M	utf8.c

commit f74da94c18a7b3cbdb577015ae60665509e912e8
Author: Karl Williamson <public@khwilliamson.com>
Date:   Sat Jun 23 12:03:42 2012 -0600

    no_utf8_pm.t: Add blank between 'not' and 'ok' in .t

M	t/re/no_utf8_pm.t
-----------------------------------------------------------------------

Summary of changes:
 MANIFEST                  |    1 +
 embed.fnc                 |    4 +-
 embed.h                   |    4 +-
 embedvar.h                |    1 +
 ext/XS-APItest/APItest.pm |    2 +-
 ext/XS-APItest/APItest.xs |   14 +
 ext/XS-APItest/t/handy.t  |   14 +
 handy.h                   |    7 +-
 intrpvar.h                |    1 +
 op_reg_common.h           |    4 +-
 perl.c                    |    2 +
 pod/perlguts.pod          |   11 +-
 proto.h                   |   26 +-
 regcomp.c                 | 1212 +++++++++++++++++++++------------------------
 regcomp.sym               |   22 +-
 regnodes.h                |  308 ++++++------
 sv.c                      |    1 +
 t/re/no_utf8_pm.t         |    2 +-
 t/re/reg_fold.t           |    6 +-
 utf8.c                    |   24 +
 20 files changed, 831 insertions(+), 835 deletions(-)
 create mode 100644 ext/XS-APItest/t/handy.t

diff --git a/MANIFEST b/MANIFEST
index 38f5da2..6f0e95e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -3972,6 +3972,7 @@ ext/XS-APItest/t/gv_fetchmeth_autoload.t	XS::APItest: tests for gv_fetchmeth_aut
 ext/XS-APItest/t/gv_fetchmethod_flags.t	XS::APItest: tests for gv_fetchmethod_flags() and variants
 ext/XS-APItest/t/gv_fetchmeth.t		XS::APItest: tests for gv_fetchmeth() and variants
 ext/XS-APItest/t/gv_init.t	XS::APItest: tests for gv_init and variants
+ext/XS-APItest/t/handy.t	XS::APItest: tests for handy.h
 ext/XS-APItest/t/hash.t		XS::APItest: tests for hash related APIs
 ext/XS-APItest/t/keyword_multiline.t	test keyword plugin parsing across lines
 ext/XS-APItest/t/keyword_plugin.t	test keyword plugin mechanism
diff --git a/embed.fnc b/embed.fnc
index c16dde8..6976ab6 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -594,6 +594,7 @@ ApPR	|bool	|is_uni_alnum	|UV c
 ApPR	|bool	|is_uni_idfirst	|UV c
 ApPR	|bool	|is_uni_alpha	|UV c
 ApPR	|bool	|is_uni_ascii	|UV c
+ApPR	|bool	|is_uni_blank	|UV c
 ApPR	|bool	|is_uni_space	|UV c
 ApPR	|bool	|is_uni_cntrl	|UV c
 ApPR	|bool	|is_uni_graph	|UV c
@@ -645,6 +646,7 @@ ApR	|bool	|is_utf8_idcont	|NN const U8 *p
 ApR	|bool	|is_utf8_xidcont	|NN const U8 *p
 ApR	|bool	|is_utf8_alpha	|NN const U8 *p
 ApR	|bool	|is_utf8_ascii	|NN const U8 *p
+ApR	|bool	|is_utf8_blank	|NN const U8 *p
 ApR	|bool	|is_utf8_space	|NN const U8 *p
 ApR	|bool	|is_utf8_perl_space	|NN const U8 *p
 ApR	|bool	|is_utf8_perl_word	|NN const U8 *p
@@ -1050,8 +1052,6 @@ Ap	|SV*	|regclass_swash	|NULLOK const regexp *prog \
 				|NN const struct regnode *node|bool doinit \
 				|NULLOK SV **listsvp|NULLOK SV **altsvp
 #ifdef PERL_IN_REGCOMP_C
-EMi	|U8	|set_regclass_bit|NN struct RExC_state_t* pRExC_state|NN regnode* node|const U8 value|NN SV** invlist_ptr|NN AV** alternate_ptr
-EMs	|U8	|set_regclass_bit_fold|NN struct RExC_state_t *pRExC_state|NN regnode* node|const U8 value|NN SV** invlist_ptr|NN AV** alternate_ptr
 EMs	|void	|add_alternate	|NN AV** alternate_ptr|NN U8* string|STRLEN len
 EMsR	|SV*	|_new_invlist_C_array|NN UV* list
 #endif
diff --git a/embed.h b/embed.h
index 720e253..0a4d76a 100644
--- a/embed.h
+++ b/embed.h
@@ -226,6 +226,7 @@
 #define is_uni_alpha_lc(a)	Perl_is_uni_alpha_lc(aTHX_ a)
 #define is_uni_ascii(a)		Perl_is_uni_ascii(aTHX_ a)
 #define is_uni_ascii_lc(a)	Perl_is_uni_ascii_lc(aTHX_ a)
+#define is_uni_blank(a)		Perl_is_uni_blank(aTHX_ a)
 #define is_uni_cntrl(a)		Perl_is_uni_cntrl(aTHX_ a)
 #define is_uni_cntrl_lc(a)	Perl_is_uni_cntrl_lc(aTHX_ a)
 #define is_uni_digit(a)		Perl_is_uni_digit(aTHX_ a)
@@ -249,6 +250,7 @@
 #define is_utf8_alnum(a)	Perl_is_utf8_alnum(aTHX_ a)
 #define is_utf8_alpha(a)	Perl_is_utf8_alpha(aTHX_ a)
 #define is_utf8_ascii(a)	Perl_is_utf8_ascii(aTHX_ a)
+#define is_utf8_blank(a)	Perl_is_utf8_blank(aTHX_ a)
 #define is_utf8_char		Perl_is_utf8_char
 #define is_utf8_char_buf	Perl_is_utf8_char_buf
 #define is_utf8_cntrl(a)	Perl_is_utf8_cntrl(aTHX_ a)
@@ -945,8 +947,6 @@
 #define reguni(a,b,c)		S_reguni(aTHX_ a,b,c)
 #define regwhite		S_regwhite
 #define scan_commit(a,b,c,d)	S_scan_commit(aTHX_ a,b,c,d)
-#define set_regclass_bit(a,b,c,d,e)	S_set_regclass_bit(aTHX_ a,b,c,d,e)
-#define set_regclass_bit_fold(a,b,c,d,e)	S_set_regclass_bit_fold(aTHX_ a,b,c,d,e)
 #define study_chunk(a,b,c,d,e,f,g,h,i,j,k)	S_study_chunk(aTHX_ a,b,c,d,e,f,g,h,i,j,k)
 #  endif
 #  if defined(PERL_IN_REGCOMP_C) || defined(PERL_IN_REGEXEC_C) || defined(PERL_IN_UTF8_C)
diff --git a/embedvar.h b/embedvar.h
index 3922855..98efa6f 100644
--- a/embedvar.h
+++ b/embedvar.h
@@ -368,6 +368,7 @@
 #define PL_utf8_X_prepend	(vTHX->Iutf8_X_prepend)
 #define PL_utf8_alnum		(vTHX->Iutf8_alnum)
 #define PL_utf8_alpha		(vTHX->Iutf8_alpha)
+#define PL_utf8_blank		(vTHX->Iutf8_blank)
 #define PL_utf8_digit		(vTHX->Iutf8_digit)
 #define PL_utf8_foldable	(vTHX->Iutf8_foldable)
 #define PL_utf8_foldclosures	(vTHX->Iutf8_foldclosures)
diff --git a/ext/XS-APItest/APItest.pm b/ext/XS-APItest/APItest.pm
index 0eff22e..929bf49 100644
--- a/ext/XS-APItest/APItest.pm
+++ b/ext/XS-APItest/APItest.pm
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use Carp;
 
-our $VERSION = '0.40';
+our $VERSION = '0.41';
 
 require XSLoader;
 
diff --git a/ext/XS-APItest/APItest.xs b/ext/XS-APItest/APItest.xs
index 69b7066..8138ad5 100644
--- a/ext/XS-APItest/APItest.xs
+++ b/ext/XS-APItest/APItest.xs
@@ -3456,3 +3456,17 @@ test_get_vtbl()
 	RETVAL = PTR2UV(get_vtbl(-1));
     OUTPUT:
 	RETVAL
+
+bool
+test_isBLANK_uni(UV ord)
+    CODE:
+        RETVAL = isBLANK_uni(ord);
+    OUTPUT:
+        RETVAL
+
+bool
+test_isBLANK_utf8(char * p)
+    CODE:
+        RETVAL = isBLANK_utf8((U8 *) p);
+    OUTPUT:
+        RETVAL
diff --git a/ext/XS-APItest/t/handy.t b/ext/XS-APItest/t/handy.t
new file mode 100644
index 0000000..48eb5b9
--- /dev/null
+++ b/ext/XS-APItest/t/handy.t
@@ -0,0 +1,14 @@
+#!perl -w
+
+use strict;
+use Test::More;
+
+use XS::APItest;
+
+ok(test_isBLANK_uni(ord("\N{EM SPACE}")), "EM SPACE is blank in isBLANK_uni()");
+ok(test_isBLANK_utf8("\N{EM SPACE}"), "EM SPACE is blank in isBLANK_utf8()");
+
+ok(! test_isBLANK_uni(ord("\N{GREEK DASIA}")), "GREEK DASIA is not a blank in isBLANK_uni()");
+ok(! test_isBLANK_utf8("\N{GREEK DASIA}"), "GREEK DASIA is not a blank in isBLANK_utf8()");
+
+done_testing;
diff --git a/handy.h b/handy.h
index abfc2c2..198ea0c 100644
--- a/handy.h
+++ b/handy.h
@@ -912,6 +912,7 @@ EXTCONST U32 PL_charclass[];
 /* Note that all ignore 'use bytes' */
 
 #define isALNUM_uni(c)		generic_uni(isWORDCHAR, is_uni_alnum, c)
+#define isBLANK_uni(c)		generic_uni(isBLANK, is_uni_blank, c)
 #define isIDFIRST_uni(c)        generic_uni(isIDFIRST, is_uni_idfirst, c)
 #define isALPHA_uni(c)		generic_uni(isALPHA, is_uni_alpha, c)
 #define isSPACE_uni(c)		generic_uni(isSPACE, is_uni_space, c)
@@ -932,7 +933,6 @@ EXTCONST U32 PL_charclass[];
 
 /* Posix and regular space differ only in U+000B, which is in Latin1 */
 #define isPSXSPC_uni(c)		((c) < 256 ? isPSXSPC_L1(c) : isSPACE_uni(c))
-#define isBLANK_uni(c)		isBLANK(c) /* could be wrong */
 
 #define isALNUM_LC_uvchr(c)	(c < 256 ? isALNUM_LC(c) : is_uni_alnum_lc(c))
 #define isIDFIRST_LC_uvchr(c)	(c < 256 ? isIDFIRST_LC(c) : is_uni_idfirst_lc(c))
@@ -981,6 +981,7 @@ EXTCONST U32 PL_charclass[];
                                   : Perl__is_utf8__perl_idstart(aTHX_ p))
 #define isIDCONT_utf8(p)	generic_utf8(isWORDCHAR, is_utf8_xidcont, p)
 #define isALPHA_utf8(p)		generic_utf8(isALPHA, is_utf8_alpha, p)
+#define isBLANK_utf8(p)		generic_utf8(isBLANK, is_utf8_blank, p)
 #define isSPACE_utf8(p)		generic_utf8(isSPACE, is_utf8_space, p)
 #define isDIGIT_utf8(p)		generic_utf8(isDIGIT, is_utf8_digit, p)
 #define isUPPER_utf8(p)		generic_utf8(isUPPER, is_utf8_upper, p)
@@ -1004,11 +1005,10 @@ EXTCONST U32 PL_charclass[];
 				  ? isPSXSPC_L1(TWO_BYTE_UTF8_TO_UNI(*(p),     \
                                                                      *((p)+1)))\
                                   : isSPACE_utf8(p)))
-#define isBLANK_utf8(c)		isBLANK(c) /* could be wrong */
-
 #define isALNUM_LC_utf8(p)	isALNUM_LC_uvchr(valid_utf8_to_uvchr(p,  0))
 #define isIDFIRST_LC_utf8(p)	isIDFIRST_LC_uvchr(valid_utf8_to_uvchr(p,  0))
 #define isALPHA_LC_utf8(p)	isALPHA_LC_uvchr(valid_utf8_to_uvchr(p,  0))
+#define isBLANK_LC_utf8(p)	isBLANK_LC_uvchr(valid_utf8_to_uvchr(p,  0))
 #define isSPACE_LC_utf8(p)	isSPACE_LC_uvchr(valid_utf8_to_uvchr(p,  0))
 #define isDIGIT_LC_utf8(p)	isDIGIT_LC_uvchr(valid_utf8_to_uvchr(p,  0))
 #define isUPPER_LC_utf8(p)	isUPPER_LC_uvchr(valid_utf8_to_uvchr(p,  0))
@@ -1020,7 +1020,6 @@ EXTCONST U32 PL_charclass[];
 #define isPUNCT_LC_utf8(p)	isPUNCT_LC_uvchr(valid_utf8_to_uvchr(p,  0))
 
 #define isPSXSPC_LC_utf8(c)	(isSPACE_LC_utf8(c) ||(c) == '\f')
-#define isBLANK_LC_utf8(c)	isBLANK(c) /* could be wrong */
 
 /* This conversion works both ways, strangely enough. On EBCDIC platforms,
  * CTRL-@ is 0, CTRL-A is 1, etc, just like on ASCII */
diff --git a/intrpvar.h b/intrpvar.h
index ffcac08..3e9600f 100644
--- a/intrpvar.h
+++ b/intrpvar.h
@@ -614,6 +614,7 @@ PERLVAR(I, VertSpace,   SV *)
 /* utf8 character class swashes */
 PERLVAR(I, utf8_alnum,	SV *)
 PERLVAR(I, utf8_alpha,	SV *)
+PERLVAR(I, utf8_blank,	SV *)
 PERLVAR(I, utf8_space,	SV *)
 PERLVAR(I, utf8_graph,	SV *)
 PERLVAR(I, utf8_digit,	SV *)
diff --git a/op_reg_common.h b/op_reg_common.h
index f35cb7d..8a45b20 100644
--- a/op_reg_common.h
+++ b/op_reg_common.h
@@ -36,7 +36,9 @@
 /* The character set for the regex is stored in a field of more than one bit
  * using an enum, for reasons of compactness and to ensure that the options are
  * mutually exclusive */
-/* Make sure to update ext/re/re.pm when changing this! */
+/* Make sure to update ext/re/re.pm and regcomp.sym (as these are used as
+ * offsets for various node types, like SPACE vs SPACEL, etc) when changing
+ * this! */
 typedef enum {
     REGEX_DEPENDS_CHARSET = 0,
     REGEX_LOCALE_CHARSET,
diff --git a/perl.c b/perl.c
index 4348954..71e958a 100644
--- a/perl.c
+++ b/perl.c
@@ -991,6 +991,7 @@ perl_destruct(pTHXx)
     /* clear utf8 character classes */
     SvREFCNT_dec(PL_utf8_alnum);
     SvREFCNT_dec(PL_utf8_alpha);
+    SvREFCNT_dec(PL_utf8_blank);
     SvREFCNT_dec(PL_utf8_space);
     SvREFCNT_dec(PL_utf8_graph);
     SvREFCNT_dec(PL_utf8_digit);
@@ -1009,6 +1010,7 @@ perl_destruct(pTHXx)
     SvREFCNT_dec(PL_utf8_foldclosures);
     PL_utf8_alnum	= NULL;
     PL_utf8_alpha	= NULL;
+    PL_utf8_blank	= NULL;
     PL_utf8_space	= NULL;
     PL_utf8_graph	= NULL;
     PL_utf8_digit	= NULL;
diff --git a/pod/perlguts.pod b/pod/perlguts.pod
index b9f2ed3..fcc9811 100644
--- a/pod/perlguts.pod
+++ b/pod/perlguts.pod
@@ -37,6 +37,15 @@ they will both be 64 bits.
 An SV can be created and loaded with one command.  There are five types of
 values that can be loaded: an integer value (IV), an unsigned integer
 value (UV), a double (NV), a string (PV), and another scalar (SV).
+("PV" stands for "Pointer Value".  You might think that it is misnamed
+because it is described as pointing only to strings.  However, it is
+possible to have it point to other things.  For example, inversion
+lists, used in regular expression data structures, are scalars, each
+consisting of an array of UVs which are accessed through PVs.  But,
+using it for non-strings requires care, as the underlying assumption of
+much of the internals is that PVs are just for strings.  Often, for
+example, a trailing NUL is tacked on automatically.  The non-string use
+is documented only in this paragraph.)
 
 The seven routines are:
 
@@ -2633,7 +2642,7 @@ is what makes Unicode input an interesting problem.
 In general, you either have to know what you're dealing with, or you
 have to guess.  The API function C<is_utf8_string> can help; it'll tell
 you if a string contains only valid UTF-8 characters. However, it can't
-do the work for you. On a character-by-character basis, C<is_utf8_char>
+do the work for you. On a character-by-character basis, XXX C<is_utf8_char>
 will tell you whether the current character in a string is valid UTF-8. 
 
 =head2 How does UTF-8 represent Unicode characters?
diff --git a/proto.h b/proto.h
index 272f486..b456442 100644
--- a/proto.h
+++ b/proto.h
@@ -1673,6 +1673,10 @@ PERL_CALLCONV bool	Perl_is_uni_ascii_lc(pTHX_ UV c)
 			__attribute__warn_unused_result__
 			__attribute__pure__;
 
+PERL_CALLCONV bool	Perl_is_uni_blank(pTHX_ UV c)
+			__attribute__warn_unused_result__
+			__attribute__pure__;
+
 PERL_CALLCONV bool	Perl_is_uni_cntrl(pTHX_ UV c)
 			__attribute__warn_unused_result__
 			__attribute__pure__;
@@ -1831,6 +1835,12 @@ PERL_CALLCONV bool	Perl_is_utf8_ascii(pTHX_ const U8 *p)
 #define PERL_ARGS_ASSERT_IS_UTF8_ASCII	\
 	assert(p)
 
+PERL_CALLCONV bool	Perl_is_utf8_blank(pTHX_ const U8 *p)
+			__attribute__warn_unused_result__
+			__attribute__nonnull__(pTHX_1);
+#define PERL_ARGS_ASSERT_IS_UTF8_BLANK	\
+	assert(p)
+
 PERL_CALLCONV STRLEN	Perl_is_utf8_char(const U8 *s)
 			__attribute__deprecated__
 			__attribute__nonnull__(1);
@@ -6623,22 +6633,6 @@ STATIC void	S_scan_commit(pTHX_ const struct RExC_state_t *pRExC_state, struct s
 #define PERL_ARGS_ASSERT_SCAN_COMMIT	\
 	assert(pRExC_state); assert(data); assert(minlenp)
 
-PERL_STATIC_INLINE U8	S_set_regclass_bit(pTHX_ struct RExC_state_t* pRExC_state, regnode* node, const U8 value, SV** invlist_ptr, AV** alternate_ptr)
-			__attribute__nonnull__(pTHX_1)
-			__attribute__nonnull__(pTHX_2)
-			__attribute__nonnull__(pTHX_4)
-			__attribute__nonnull__(pTHX_5);
-#define PERL_ARGS_ASSERT_SET_REGCLASS_BIT	\
-	assert(pRExC_state); assert(node); assert(invlist_ptr); assert(alternate_ptr)
-
-STATIC U8	S_set_regclass_bit_fold(pTHX_ struct RExC_state_t *pRExC_state, regnode* node, const U8 value, SV** invlist_ptr, AV** alternate_ptr)
-			__attribute__nonnull__(pTHX_1)
-			__attribute__nonnull__(pTHX_2)
-			__attribute__nonnull__(pTHX_4)
-			__attribute__nonnull__(pTHX_5);
-#define PERL_ARGS_ASSERT_SET_REGCLASS_BIT_FOLD	\
-	assert(pRExC_state); assert(node); assert(invlist_ptr); assert(alternate_ptr)
-
 STATIC I32	S_study_chunk(pTHX_ struct RExC_state_t *pRExC_state, regnode **scanp, I32 *minlenp, I32 *deltap, regnode *last, struct scan_data_t *data, I32 stopparen, U8* recursed, struct regnode_charc ... [44 chars truncated]
 			__attribute__nonnull__(pTHX_1)
 			__attribute__nonnull__(pTHX_2)
diff --git a/regcomp.c b/regcomp.c
index 6d29905..cfed452 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -9912,43 +9912,17 @@ tryagain:
 	    *flagp |= HASWIDTH;
 	    goto finish_meta_pat;
 	case 'w':
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = ALNUML;
-		    break;
-		case REGEX_UNICODE_CHARSET:
-		    op = ALNUMU;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = ALNUMA;
-		    break;
-		case REGEX_DEPENDS_CHARSET:
-		    op = ALNUM;
-		    break;
-		default:
-		    goto bad_charset;
+	    op = ALNUM + get_regex_charset(RExC_flags);
+            if (op > ALNUMA) {  /* /aa is same as /a */
+                op = ALNUMA;
             }
 	    ret = reg_node(pRExC_state, op);
 	    *flagp |= HASWIDTH|SIMPLE;
 	    goto finish_meta_pat;
 	case 'W':
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = NALNUML;
-		    break;
-		case REGEX_UNICODE_CHARSET:
-		    op = NALNUMU;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = NALNUMA;
-		    break;
-		case REGEX_DEPENDS_CHARSET:
-		    op = NALNUM;
-		    break;
-		default:
-		    goto bad_charset;
+	    op = NALNUM + get_regex_charset(RExC_flags);
+            if (op > NALNUMA) { /* /aa is same as /a */
+                op = NALNUMA;
             }
 	    ret = reg_node(pRExC_state, op);
 	    *flagp |= HASWIDTH|SIMPLE;
@@ -9956,22 +9930,9 @@ tryagain:
 	case 'b':
 	    RExC_seen_zerolen++;
 	    RExC_seen |= REG_SEEN_LOOKBEHIND;
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = BOUNDL;
-		    break;
-		case REGEX_UNICODE_CHARSET:
-		    op = BOUNDU;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = BOUNDA;
-		    break;
-		case REGEX_DEPENDS_CHARSET:
-		    op = BOUND;
-		    break;
-		default:
-		    goto bad_charset;
+	    op = BOUND + get_regex_charset(RExC_flags);
+            if (op > BOUNDA) {  /* /aa is same as /a */
+                op = BOUNDA;
             }
 	    ret = reg_node(pRExC_state, op);
 	    FLAGS(ret) = get_regex_charset(RExC_flags);
@@ -9980,103 +9941,45 @@ tryagain:
 	case 'B':
 	    RExC_seen_zerolen++;
 	    RExC_seen |= REG_SEEN_LOOKBEHIND;
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = NBOUNDL;
-		    break;
-		case REGEX_UNICODE_CHARSET:
-		    op = NBOUNDU;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = NBOUNDA;
-		    break;
-		case REGEX_DEPENDS_CHARSET:
-		    op = NBOUND;
-		    break;
-		default:
-		    goto bad_charset;
+	    op = NBOUND + get_regex_charset(RExC_flags);
+            if (op > NBOUNDA) { /* /aa is same as /a */
+                op = NBOUNDA;
             }
 	    ret = reg_node(pRExC_state, op);
 	    FLAGS(ret) = get_regex_charset(RExC_flags);
 	    *flagp |= SIMPLE;
 	    goto finish_meta_pat;
 	case 's':
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = SPACEL;
-		    break;
-		case REGEX_UNICODE_CHARSET:
-		    op = SPACEU;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = SPACEA;
-		    break;
-		case REGEX_DEPENDS_CHARSET:
-		    op = SPACE;
-		    break;
-		default:
-		    goto bad_charset;
+	    op = SPACE + get_regex_charset(RExC_flags);
+            if (op > SPACEA) {  /* /aa is same as /a */
+                op = SPACEA;
             }
 	    ret = reg_node(pRExC_state, op);
 	    *flagp |= HASWIDTH|SIMPLE;
 	    goto finish_meta_pat;
 	case 'S':
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = NSPACEL;
-		    break;
-		case REGEX_UNICODE_CHARSET:
-		    op = NSPACEU;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = NSPACEA;
-		    break;
-		case REGEX_DEPENDS_CHARSET:
-		    op = NSPACE;
-		    break;
-		default:
-		    goto bad_charset;
-            }
-	    ret = reg_node(pRExC_state, op);
-	    *flagp |= HASWIDTH|SIMPLE;
-	    goto finish_meta_pat;
-	case 'd':
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = DIGITL;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = DIGITA;
-		    break;
-		case REGEX_DEPENDS_CHARSET: /* No difference between these */
-		case REGEX_UNICODE_CHARSET:
-		    op = DIGIT;
-		    break;
-		default:
-		    goto bad_charset;
+	    op = NSPACE + get_regex_charset(RExC_flags);
+            if (op > NSPACEA) { /* /aa is same as /a */
+                op = NSPACEA;
             }
 	    ret = reg_node(pRExC_state, op);
 	    *flagp |= HASWIDTH|SIMPLE;
 	    goto finish_meta_pat;
 	case 'D':
-	    switch (get_regex_charset(RExC_flags)) {
-		case REGEX_LOCALE_CHARSET:
-		    op = NDIGITL;
-		    break;
-		case REGEX_ASCII_RESTRICTED_CHARSET:
-		case REGEX_ASCII_MORE_RESTRICTED_CHARSET:
-		    op = NDIGITA;
-		    break;
-		case REGEX_DEPENDS_CHARSET: /* No difference between these */
-		case REGEX_UNICODE_CHARSET:
-		    op = NDIGIT;
-		    break;
-		default:
-		    goto bad_charset;
+            op = NDIGIT;
+            goto join_D_and_d;
+	case 'd':
+            op = DIGIT;
+        join_D_and_d:
+            {
+                U8 offset = get_regex_charset(RExC_flags);
+                if (offset == REGEX_UNICODE_CHARSET) {
+                    offset = REGEX_DEPENDS_CHARSET;
+                }
+                else if (offset == REGEX_ASCII_MORE_RESTRICTED_CHARSET) {
+                    offset = REGEX_ASCII_RESTRICTED_CHARSET;
+                }
+                op += offset;
             }
 	    ret = reg_node(pRExC_state, op);
 	    *flagp |= HASWIDTH|SIMPLE;
@@ -10305,14 +10208,18 @@ tryagain:
 	    bool is_exactfu_sharp_s;
 
 	    ender = 0;
-            node_type = ((! FOLD) ? EXACT
-		        : (LOC)
-			  ? EXACTFL
-			  : (MORE_ASCII_RESTRICTED)
-			    ? EXACTFA
-			    : (AT_LEAST_UNI_SEMANTICS)
-			      ? EXACTFU
-			      : EXACTF);
+            if (! FOLD) {
+                node_type = EXACT;
+            }
+            else {
+                node_type = get_regex_charset(RExC_flags);
+                if (node_type >= REGEX_ASCII_RESTRICTED_CHARSET) {
+                    node_type--; /* /a is same as /u, and map /aa's offset to
+                                    what /a's would have been, so there is no
+                                    hole */
+                }
+                node_type += EXACTF;
+            }
 	    ret = reg_node(pRExC_state, node_type);
 	    s = STRING(ret);
 
@@ -10706,11 +10613,6 @@ tryagain:
     }
 
     return(ret);
-
-/* Jumped to when an unrecognized character set is encountered */
-bad_charset:
-    Perl_croak(aTHX_ "panic: Unknown regex character set encoding: %u", get_regex_charset(RExC_flags));
-    return(NULL);
 }
 
 STATIC char *
@@ -11029,171 +10931,6 @@ S_checkposixcc(pTHX_ RExC_state_t *pRExC_state)
 	}                                                                  \
     }
 
-STATIC U8
-S_set_regclass_bit_fold(pTHX_ RExC_state_t *pRExC_state, regnode* node, const U8 value, SV** invlist_ptr, AV** alternate_ptr)
-{
-
-    /* Handle the setting of folds in the bitmap for non-locale ANYOF nodes.
-     * Locale folding is done at run-time, so this function should not be
-     * called for nodes that are for locales.
-     *
-     * This function sets the bit corresponding to the fold of the input
-     * 'value', if not already set.  The fold of 'f' is 'F', and the fold of
-     * 'F' is 'f'.
-     *
-     * It also knows about the characters that are in the bitmap that have
-     * folds that are matchable only outside it, and sets the appropriate lists
-     * and flags.
-     *
-     * It returns the number of bits that actually changed from 0 to 1 */
-
-    U8 stored = 0;
-    U8 fold;
-
-    PERL_ARGS_ASSERT_SET_REGCLASS_BIT_FOLD;
-
-    fold = (AT_LEAST_UNI_SEMANTICS) ? PL_fold_latin1[value]
-                                    : PL_fold[value];
-
-    /* It assumes the bit for 'value' has already been set */
-    if (fold != value && ! ANYOF_BITMAP_TEST(node, fold)) {
-        ANYOF_BITMAP_SET(node, fold);
-        stored++;
-    }
-    if (_HAS_NONLATIN1_FOLD_CLOSURE_ONLY_FOR_USE_BY_REGCOMP_DOT_C_AND_REGEXEC_DOT_C(value) && (! isASCII(value) || ! MORE_ASCII_RESTRICTED)) {
-	/* Certain Latin1 characters have matches outside the bitmap.  To get
-	 * here, 'value' is one of those characters.   None of these matches is
-	 * valid for ASCII characters under /aa, which have been excluded by
-	 * the 'if' above.  The matches fall into three categories:
-	 * 1) They are singly folded-to or -from an above 255 character, as
-	 *    LATIN SMALL LETTER Y WITH DIAERESIS and LATIN CAPITAL LETTER Y
-	 *    WITH DIAERESIS;
-	 * 2) They are part of a multi-char fold with another character in the
-	 *    bitmap, only LATIN SMALL LETTER SHARP S => "ss" fits that bill;
-	 * 3) They are part of a multi-char fold with a character not in the
-	 *    bitmap, such as various ligatures.
-	 * We aren't dealing fully with multi-char folds, except we do deal
-	 * with the pattern containing a character that has a multi-char fold
-	 * (not so much the inverse).
-	 * For types 1) and 3), the matches only happen when the target string
-	 * is utf8; that's not true for 2), and we set a flag for it.
-	 *
-	 * The code below adds to the passed in inversion list the single fold
-	 * closures for 'value'.  The values are hard-coded here so that an
-	 * innocent-looking character class, like /[ks]/i won't have to go out
-	 * to disk to find the possible matches.  XXX It would be better to
-	 * generate these via regen, in case a new version of the Unicode
-	 * standard adds new mappings, though that is not really likely. */
-	switch (value) {
-	    case 'k':
-	    case 'K':
-		/* KELVIN SIGN */
-		*invlist_ptr = add_cp_to_invlist(*invlist_ptr, 0x212A);
-		break;
-	    case 's':
-	    case 'S':
-		/* LATIN SMALL LETTER LONG S */
-		*invlist_ptr = add_cp_to_invlist(*invlist_ptr, 0x017F);
-		break;
-	    case MICRO_SIGN:
-		*invlist_ptr = add_cp_to_invlist(*invlist_ptr,
-						 GREEK_SMALL_LETTER_MU);
-		*invlist_ptr = add_cp_to_invlist(*invlist_ptr,
-						 GREEK_CAPITAL_LETTER_MU);
-		break;
-	    case LATIN_CAPITAL_LETTER_A_WITH_RING_ABOVE:
-	    case LATIN_SMALL_LETTER_A_WITH_RING_ABOVE:
-		/* ANGSTROM SIGN */
-		*invlist_ptr = add_cp_to_invlist(*invlist_ptr, 0x212B);
-		if (DEPENDS_SEMANTICS) {    /* See DEPENDS comment below */
-		    *invlist_ptr = add_cp_to_invlist(*invlist_ptr,
-						     PL_fold_latin1[value]);
-		}
-		break;
-	    case LATIN_SMALL_LETTER_Y_WITH_DIAERESIS:
-		*invlist_ptr = add_cp_to_invlist(*invlist_ptr,
-					LATIN_CAPITAL_LETTER_Y_WITH_DIAERESIS);
-		break;
-	    case LATIN_SMALL_LETTER_SHARP_S:
-		*invlist_ptr = add_cp_to_invlist(*invlist_ptr,
-					LATIN_CAPITAL_LETTER_SHARP_S);
-
-		/* Under /a, /d, and /u, this can match the two chars "ss" */
-		if (! MORE_ASCII_RESTRICTED) {
-		    add_alternate(alternate_ptr, (U8 *) "ss", 2);
-
-		    /* And under /u or /a, it can match even if the target is
-		     * not utf8 */
-		    if (AT_LEAST_UNI_SEMANTICS) {
-			ANYOF_FLAGS(node) |= ANYOF_NONBITMAP_NON_UTF8;
-		    }
-		}
-		break;
-	    case 'F': case 'f':
-	    case 'I': case 'i':
-	    case 'L': case 'l':
-	    case 'T': case 't':
-	    case 'A': case 'a':
-	    case 'H': case 'h':
-	    case 'J': case 'j':
-	    case 'N': case 'n':
-	    case 'W': case 'w':
-	    case 'Y': case 'y':
-                /* These all are targets of multi-character folds from code
-                 * points that require UTF8 to express, so they can't match
-                 * unless the target string is in UTF-8, so no action here is
-                 * necessary, as regexec.c properly handles the general case
-                 * for UTF-8 matching */
-		break;
-	    default:
-		/* Use deprecated warning to increase the chances of this
-		 * being output */
-		ckWARN2regdep(RExC_parse, "Perl folding rules are not up-to-date for 0x%x; please use the perlbug utility to report;", value);
-		break;
-	}
-    }
-    else if (DEPENDS_SEMANTICS
-	    && ! isASCII(value)
-	    && PL_fold_latin1[value] != value)
-    {
-	   /* Under DEPENDS rules, non-ASCII Latin1 characters match their
-	    * folds only when the target string is in UTF-8.  We add the fold
-	    * here to the list of things to match outside the bitmap, which
-	    * won't be looked at unless it is UTF8 (or else if something else
-	    * says to look even if not utf8, but those things better not happen
-	    * under DEPENDS semantics. */
-	*invlist_ptr = add_cp_to_invlist(*invlist_ptr, PL_fold_latin1[value]);
-    }
-
-    return stored;
-}
-
-
-PERL_STATIC_INLINE U8
-S_set_regclass_bit(pTHX_ RExC_state_t *pRExC_state, regnode* node, const U8 value, SV** invlist_ptr, AV** alternate_ptr)
-{
-    /* This inline function sets a bit in the bitmap if not already set, and if
-     * appropriate, its fold, returning the number of bits that actually
-     * changed from 0 to 1 */
-
-    U8 stored;
-
-    PERL_ARGS_ASSERT_SET_REGCLASS_BIT;
-
-    if (ANYOF_BITMAP_TEST(node, value)) {   /* Already set */
-	return 0;
-    }
-
-    ANYOF_BITMAP_SET(node, value);
-    stored = 1;
-
-    if (FOLD && ! LOC) {	/* Locale folds aren't known until runtime */
-	stored += set_regclass_bit_fold(pRExC_state, node, value, invlist_ptr, alternate_ptr);
-    }
-
-    return stored;
-}
-
 STATIC void
 S_add_alternate(pTHX_ AV** alternate_ptr, U8* string, STRLEN len)
 {
@@ -11241,6 +10978,13 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, U32 depth)
 			       Optimizations may be possible if this is tiny */
     UV n;
 
+    /* Certain named classes have equivalents that can appear outside a
+     * character class, e.g. \w.  These flags are set for these classes.  The
+     * first flag indicates the op depends on the character set modifier, like
+     * /d, /u....  The second is for those that don't have this dependency. */
+    bool has_special_charset_op = FALSE;
+    bool has_special_non_charset_op = FALSE;
+
     /* Unicode properties are stored in a swash; this holds the current one
      * being parsed.  If this swash is the only above-latin1 component of the
      * character class, an optimization is to pass it directly on to the
@@ -11253,28 +10997,20 @@ S_regclass(pTHX_ RExC_state_t *pRExC_state, U32 depth)
      * on to the engine */
     UV has_user_defined_property = 0;
 
-    /* code points this node matches that can't be stored in the bitmap */
-    SV* nonbitmap = NULL;
-
-    /* The items that are to match that aren't stored in the bitmap, but are a
-     * result of things that are stored there.  This is the fold closure of
-     * such a character, either because it has DEPENDS semantics and shouldn't
-     * be matched unless the target string is utf8, or is a code point that is
-     * too large for the bit map, as for example, the fold of the MICRO SIGN is
-     * above 255.  This all is solely for performance reasons.  By having this
-     * code know the outside-the-bitmap folds that the bitmapped characters are
-     * involved with, we don't have to go out to disk to find the list of
-     * matches, unless the character class includes code points that aren't
-     * storable in the bit map.  That means that a character class with an 's'
-     * in it, for example, doesn't need to go out to disk to find everything
-     * that matches.  A 2nd list is used so that the 'nonbitmap' list is kept
-     * empty unless there is something whose fold we don't know about, and will
-     * have to go out to the disk to find. */
-    SV* l1_fold_invlist = NULL;
+    /* inversion list of code points this node matches only when the target
+     * string is in UTF-8.  (Because it is under /d) */
+    SV* depends_list = NULL;
+
+    /* inversion list of code points this node matches.  For much of the
+     * function, it includes only those that match regardless of the utf8ness
+     * of the target string */
+    SV* cp_list = NULL;
 
     /* List of multi-character folds that are matched by this node */
     AV* unicode_alternate  = NULL;
 #ifdef EBCDIC
+    /* In a range, counts how many (0-2) of its endpoints came from literals,
+     * not escapes.  Thus we can tell if 'A' was input vs \x{C1} */
     UV literal_endpoint = 0;
 #endif
     UV stored = 0;  /* how many chars stored in the bitmap */
@@ -11637,27 +11373,49 @@ parseit:
 		    ckWARN4reg(RExC_parse,
 			       "False [] range \"%*.*s\"",
 			       w, w, rangebegin);
-
-		    stored +=
-                         set_regclass_bit(pRExC_state, ret, '-', &l1_fold_invlist, &unicode_alternate);
-		    if (prevvalue < 256) {
-			stored +=
-                         set_regclass_bit(pRExC_state, ret, (U8) prevvalue, &l1_fold_invlist, &unicode_alternate);
-		    }
-		    else {
-			nonbitmap = add_cp_to_invlist(nonbitmap, prevvalue);
-		    }
+                    cp_list = add_cp_to_invlist(cp_list, '-');
+                    cp_list = add_cp_to_invlist(cp_list, prevvalue);
 		}
 
 		range = 0; /* this was not a true range */
+                element_count += 2; /* So counts for three values */
 	    }
 
-	    if (!SIZE_ONLY) {
+	    if (SIZE_ONLY) {
+
+                /* In the first pass, do a little extra work so below can
+                 * possibly optimize the whole node to one of the nodes that
+                 * correspond to the classes given below */
+
+                /* The optimization will only take place if there is a single
+                 * element in the class, so can skip if there is more than one
+                 */
+                if (element_count == 1) {
 
 		/* Possible truncation here but in some 64-bit environments
 		 * the compiler gets heartburn about switch on 64-bit values.
 		 * A similar issue a little earlier when switching on value.
 		 * --jhi */
+                    switch ((I32)namedclass) {
+                        case ANYOF_ALNUM:
+                        case ANYOF_NALNUM:
+                        case ANYOF_DIGIT:
+                        case ANYOF_NDIGIT:
+                        case ANYOF_SPACE:
+                        case ANYOF_NSPACE:
+                            has_special_charset_op = TRUE;
+                            break;
+
+                        case ANYOF_HORIZWS:
+                        case ANYOF_NHORIZWS:
+                        case ANYOF_VERTWS:
+                        case ANYOF_NVERTWS:
+                            has_special_non_charset_op = TRUE;
+                            break;
+                    }
+                }
+            }
+            else {
 		switch ((I32)namedclass) {
 
 		case ANYOF_ALNUMC: /* C's alnum, in contrast to \w */
@@ -11718,10 +11476,12 @@ parseit:
 		     * them */
 		    DO_POSIX_LATIN1_ONLY_KNOWN_L1_RESOLVED(ret, namedclass, properties,
                         PL_PosixDigit, "XPosixDigit", listsv);
+                    has_special_charset_op = TRUE;
 		    break;
 		case ANYOF_NDIGIT:
 		    DO_N_POSIX_LATIN1_ONLY_KNOWN(ret, namedclass, properties,
                         PL_PosixDigit, PL_PosixDigit, "XPosixDigit", listsv);
+                    has_special_charset_op = TRUE;
 		    break;
 		case ANYOF_GRAPH:
 		    DO_POSIX_LATIN1_ONLY_KNOWN(ret, namedclass, properties,
@@ -11732,16 +11492,18 @@ parseit:
                         PL_PosixGraph, PL_L1PosixGraph, "XPosixGraph", listsv);
 		    break;
 		case ANYOF_HORIZWS:
-		    /* For these, we use the nonbitmap, as /d doesn't make a
+		    /* For these, we use the cp_list, as /d doesn't make a
 		     * difference in what these match.  There would be problems
 		     * if these characters had folds other than themselves, as
-		     * nonbitmap is subject to folding.  It turns out that \h
+		     * cp_list is subject to folding.  It turns out that \h
 		     * is just a synonym for XPosixBlank */
-		    _invlist_union(nonbitmap, PL_XPosixBlank, &nonbitmap);
+		    _invlist_union(cp_list, PL_XPosixBlank, &cp_list);
+                    has_special_non_charset_op = TRUE;
 		    break;
 		case ANYOF_NHORIZWS:
-                    _invlist_union_complement_2nd(nonbitmap,
-                                                 PL_XPosixBlank, &nonbitmap);
+                    _invlist_union_complement_2nd(cp_list,
+                                                 PL_XPosixBlank, &cp_list);
+                    has_special_non_charset_op = TRUE;
 		    break;
 		case ANYOF_LOWER:
 		case ANYOF_NLOWER:
@@ -11800,10 +11562,12 @@ parseit:
 		case ANYOF_SPACE:
                     DO_POSIX(ret, namedclass, properties,
                                             PL_PerlSpace, PL_XPerlSpace);
+                    has_special_charset_op = TRUE;
 		    break;
 		case ANYOF_NSPACE:
                     DO_N_POSIX(ret, namedclass, properties,
                                             PL_PerlSpace, PL_XPerlSpace);
+                    has_special_charset_op = TRUE;
 		    break;
 		case ANYOF_UPPER:   /* Same as LOWER, above */
 		case ANYOF_NUPPER:
@@ -11835,21 +11599,25 @@ parseit:
 		case ANYOF_ALNUM:   /* Really is 'Word' */
 		    DO_POSIX_LATIN1_ONLY_KNOWN(ret, namedclass, properties,
                             PL_PosixWord, PL_L1PosixWord, "XPosixWord", listsv);
+                    has_special_charset_op = TRUE;
 		    break;
 		case ANYOF_NALNUM:
 		    DO_N_POSIX_LATIN1_ONLY_KNOWN(ret, namedclass, properties,
                             PL_PosixWord, PL_L1PosixWord, "XPosixWord", listsv);
+                    has_special_charset_op = TRUE;
 		    break;
 		case ANYOF_VERTWS:
-		    /* For these, we use the nonbitmap, as /d doesn't make a
+		    /* For these, we use the cp_list, as /d doesn't make a
 		     * difference in what these match.  There would be problems
 		     * if these characters had folds other than themselves, as
-		     * nonbitmap is subject to folding */
-		    _invlist_union(nonbitmap, PL_VertSpace, &nonbitmap);
+		     * cp_list is subject to folding */
+		    _invlist_union(cp_list, PL_VertSpace, &cp_list);
+                    has_special_non_charset_op = TRUE;
 		    break;
 		case ANYOF_NVERTWS:
-                    _invlist_union_complement_2nd(nonbitmap,
-                                                    PL_VertSpace, &nonbitmap);
+                    _invlist_union_complement_2nd(cp_list,
+                                                    PL_VertSpace, &cp_list);
+                    has_special_non_charset_op = TRUE;
 		    break;
 		case ANYOF_XDIGIT:
                     DO_POSIX(ret, namedclass, properties,
@@ -11896,9 +11664,8 @@ parseit:
 			       "False [] range \"%*.*s\"",
 			       w, w, rangebegin);
 		    }
-		    if (!SIZE_ONLY)
-			stored +=
-                            set_regclass_bit(pRExC_state, ret, '-', &l1_fold_invlist, &unicode_alternate);
+                    if (!SIZE_ONLY)
+                        cp_list = add_cp_to_invlist(cp_list, '-');
 		} else
 		    range = 1;	/* yeah, it's a range! */
 		continue;	/* but do it the next time */
@@ -11913,118 +11680,356 @@ parseit:
 
 	/* now is the next time */
 	if (!SIZE_ONLY) {
-	    if (prevvalue < 256) {
-	        const IV ceilvalue = value < 256 ? value : 255;
-		IV i;
-#ifdef EBCDIC
-		/* In EBCDIC [\x89-\x91] should include
-		 * the \x8e but [i-j] should not. */
-		if (literal_endpoint == 2 &&
-		    ((isLOWER(prevvalue) && isLOWER(ceilvalue)) ||
-		     (isUPPER(prevvalue) && isUPPER(ceilvalue))))
-		{
-		    if (isLOWER(prevvalue)) {
-			for (i = prevvalue; i <= ceilvalue; i++)
-			    if (isLOWER(i) && !ANYOF_BITMAP_TEST(ret,i)) {
-				stored +=
-                                  set_regclass_bit(pRExC_state, ret, (U8) i, &l1_fold_invlist, &unicode_alternate);
-			    }
-		    } else {
-			for (i = prevvalue; i <= ceilvalue; i++)
-			    if (isUPPER(i) && !ANYOF_BITMAP_TEST(ret,i)) {
-				stored +=
-                                  set_regclass_bit(pRExC_state, ret, (U8) i, &l1_fold_invlist, &unicode_alternate);
-			    }
-		    }
-		}
-		else
-#endif
-		      for (i = prevvalue; i <= ceilvalue; i++) {
-			stored += set_regclass_bit(pRExC_state, ret, (U8) i, &l1_fold_invlist, &unicode_alternate);
-	              }
-	  }
-	  if (value > 255) {
-	    const UV prevnatvalue  = NATIVE_TO_UNI(prevvalue);
-	    const UV natvalue      = NATIVE_TO_UNI(value);
-	    nonbitmap = _add_range_to_invlist(nonbitmap, prevnatvalue, natvalue);
-	}
-#ifdef EBCDIC
-	    literal_endpoint = 0;
+#ifndef EBCDIC
+            cp_list = _add_range_to_invlist(cp_list, prevvalue, value);
+#else
+            SV* this_range = _new_invlist(1);
+            _append_range_to_invlist(this_range, prevvalue, value);
+
+            /* In EBCDIC, the ranges 'A-Z' and 'a-z' are each not contiguous.
+             * If this range was specified using something like 'i-j', we want
+             * to include only the 'i' and the 'j', and not anything in
+             * between, so exclude non-ASCII, non-alphabetics from it.
+             * However, if the range was specified with something like
+             * [\x89-\x91] or [\x89-j], all code points within it should be
+             * included.  literal_endpoint==2 means both ends of the range used
+             * a literal character, not \x{foo} */
+	    if (literal_endpoint == 2
+                && ((prevvalue >= 'a' && value <= 'z')
+                    || (prevvalue >= 'A' && value <= 'Z')))
+            {
+                _invlist_intersection(this_range, PL_ASCII, &this_range);
+                _invlist_intersection(this_range, PL_Alpha, &this_range);
+            }
+            _invlist_union(cp_list, this_range, &cp_list);
+            literal_endpoint = 0;
 #endif
         }
 
 	range = 0; /* this range (if it was one) is done now */
     }
 
+    /* [\w] can be optimized into \w, but not if there is anything else in the
+     * brackets (except for an initial '^' which indicates complementing).  We
+     * also can optimize the common special case /[0-9]/ into /\d/a */
+    if (element_count == 1 &&
+        (has_special_charset_op
+         || has_special_non_charset_op
+         || (prevvalue == '0' && value == '9')))
+    {
+        U8 op;
+        bool invert = ANYOF_FLAGS(ret) & ANYOF_INVERT;
+        const char * cur_parse = RExC_parse;
+
+        if (has_special_charset_op) {
+            U8 offset = get_regex_charset(RExC_flags);
+
+            /* /aa is the same as /a for these */
+            if (offset == REGEX_ASCII_MORE_RESTRICTED_CHARSET) {
+                offset = REGEX_ASCII_RESTRICTED_CHARSET;
+            }
+            switch ((I32)namedclass) {
+                case ANYOF_NALNUM:
+                    invert = ! invert;
+                    /* FALLTHROUGH */
+                case ANYOF_ALNUM:
+                    op = ALNUM;
+                    break;
+                case ANYOF_NSPACE:
+                    invert = ! invert;
+                    /* FALLTHROUGH */
+                case ANYOF_SPACE:
+                    op = SPACE;
+                    break;
+                case ANYOF_NDIGIT:
+                    invert = ! invert;
+                    /* FALLTHROUGH */
+                case ANYOF_DIGIT:
+                    op = DIGIT;
+
+                    /* There is no DIGITU */
+                    if (offset == REGEX_UNICODE_CHARSET) {
+                        offset = REGEX_DEPENDS_CHARSET;
+                    }
+                    break;
+                default:
+                    Perl_croak(aTHX_ "panic: Named character class %"IVdf" is not expected to have a non-[...] version", namedclass);
+            }
+
+            /* The number of varieties of each of these is the same, hence, so
+             * is the delta between the normal and complemented nodes */
+            if (invert) {
+                offset += NALNUM - ALNUM;
+            }
+
+            op += offset;
+        }
+        else if (has_special_non_charset_op) {
+            switch ((I32)namedclass) {
+                case ANYOF_NHORIZWS:
+                    invert = ! invert;
+                    /* FALLTHROUGH */
+                case ANYOF_HORIZWS:
+                    op = HORIZWS;
+                    break;
+                case ANYOF_NVERTWS:
+                    invert = ! invert;
+                    /* FALLTHROUGH */
+                case ANYOF_VERTWS:
+                    op = VERTWS;
+                    break;
+                default:
+                    Perl_croak(aTHX_ "panic: Named character class %"IVdf" is not expected to have a non-[...] version", namedclass);
+            }
+
+            /* The complement of each of these nodes is the node immediately
+             * following it */
+            if (invert) {
+                op++;
+            }
+        }
+        else {  /* The remaining possibility is [0-9] */
+            op = (invert) ? NDIGITA : DIGITA;
+        }
+
+        /* Throw away this ANYOF regnode, and emit the calculated one, which
+         * should correspond to the beginning, not current, state of the parse
+         */
+        RExC_parse = (char *)orig_parse;
+        RExC_emit = (regnode *)orig_emit;
+        ret = reg_node(pRExC_state, op);
+        RExC_parse = (char *) cur_parse;
 
+        SvREFCNT_dec(listsv);
+        return ret;
+    }
 
     if (SIZE_ONLY)
         return ret;
     /****** !SIZE_ONLY AFTER HERE *********/
 
-    /* If folding and there are code points above 255, we calculate all
-     * characters that could fold to or from the ones already on the list */
-    if (FOLD && nonbitmap) {
+    /* If folding, we calculate all characters that could fold to or from the
+     * ones already on the list */
+    if (FOLD && cp_list) {
 	UV start, end;	/* End points of code point ranges */
 
 	SV* fold_intersection = NULL;
 
-	/* This is a list of all the characters that participate in folds
-	    * (except marks, etc in multi-char folds */
-	if (! PL_utf8_foldable) {
-	    SV* swash = swash_init("utf8", "Cased", &PL_sv_undef, 1, 0);
-	    PL_utf8_foldable = _swash_to_invlist(swash);
-            SvREFCNT_dec(swash);
-	}
+        const UV highest_index = invlist_len(cp_list) - 1;
+
+        /* In the Latin1 range, the characters that can be folded-to or -from
+         * are precisely the alphabetic characters.  If the highest code point
+         * is within Latin1, we can use the compiled-in list, and not have to
+         * go out to disk.  If the last element in the array is in the
+         * inversion list set, it starts a range that goes to infinity, so the
+         * maximum of the inversion list is definitely above Latin1.
+         * Otherwise, it starts a range that isn't in the set, so the max is
+         * one less than it */
+        if (! ELEMENT_RANGE_MATCHES_INVLIST(highest_index)
+            && invlist_array(cp_list)[highest_index] <= 256)
+        {
+            _invlist_intersection(PL_L1PosixAlpha, cp_list, &fold_intersection);
+        }
+        else {
 
-	/* This is a hash that for a particular fold gives all characters
-	    * that are involved in it */
-	if (! PL_utf8_foldclosures) {
+            /* This is a list of all the characters that participate in folds
+             * (except marks, etc. in multi-char folds) */
+            if (! PL_utf8_foldable) {
+                SV* swash = swash_init("utf8", "Cased", &PL_sv_undef, 1, 0);
+                PL_utf8_foldable = _swash_to_invlist(swash);
+                SvREFCNT_dec(swash);
+            }
 
-	    /* If we were unable to find any folds, then we likely won't be
-	     * able to find the closures.  So just create an empty list.
-	     * Folding will effectively be restricted to the non-Unicode rules
-	     * hard-coded into Perl.  (This case happens legitimately during
-	     * compilation of Perl itself before the Unicode tables are
-	     * generated) */
-	    if (invlist_len(PL_utf8_foldable) == 0) {
-		PL_utf8_foldclosures = newHV();
-	    } else {
-		/* If the folds haven't been read in, call a fold function
-		    * to force that */
-		if (! PL_utf8_tofold) {
-		    U8 dummy[UTF8_MAXBYTES+1];
-		    STRLEN dummy_len;
-
-		    /* This particular string is above \xff in both UTF-8 and
-		     * UTFEBCDIC */
-		    to_utf8_fold((U8*) "\xC8\x80", dummy, &dummy_len);
-		    assert(PL_utf8_tofold); /* Verify that worked */
-		}
-		PL_utf8_foldclosures = _swash_inversion_hash(PL_utf8_tofold);
-	    }
-	}
+            /* This is a hash that for a particular fold gives all characters
+             * that are involved in it */
+            if (! PL_utf8_foldclosures) {
+
+                /* If we were unable to find any folds, then we likely won't be
+                 * able to find the closures.  So just create an empty list.
+                 * Folding will effectively be restricted to the non-Unicode
+                 * rules hard-coded into Perl.  (This case happens legitimately
+                 * during compilation of Perl itself before the Unicode tables
+                 * are generated) */
+                if (invlist_len(PL_utf8_foldable) == 0) {
+                    PL_utf8_foldclosures = newHV();
+                }
+                else {
+                    /* If the folds haven't been read in, call a fold function
+                     * to force that */
+                    if (! PL_utf8_tofold) {
+                        U8 dummy[UTF8_MAXBYTES+1];
+                        STRLEN dummy_len;
+
+                        /* This particular string is above \xff in both UTF-8
+                         * and UTFEBCDIC */
+                        to_utf8_fold((U8*) "\xC8\x80", dummy, &dummy_len);
+                        assert(PL_utf8_tofold); /* Verify that worked */
+                    }
+                    PL_utf8_foldclosures =
+                                        _swash_inversion_hash(PL_utf8_tofold);
+                }
+            }
 
-	/* Only the characters in this class that participate in folds need be
-	 * checked.  Get the intersection of this class and all the possible
-	 * characters that are foldable.  This can quickly narrow down a large
-	 * class */
-	_invlist_intersection(PL_utf8_foldable, nonbitmap, &fold_intersection);
+            /* Only the characters in this class that participate in folds need
+             * be checked.  Get the intersection of this class and all the
+             * possible characters that are foldable.  This can quickly narrow
+             * down a large class */
+            _invlist_intersection(PL_utf8_foldable, cp_list,
+                                  &fold_intersection);
+        }
 
 	/* Now look at the foldable characters in this class individually */
 	invlist_iterinit(fold_intersection);
 	while (invlist_iternext(fold_intersection, &start, &end)) {
 	    UV j;
 
+            /* Locale folding for Latin1 characters is deferred until runtime */
+            if (LOC && start < 256) {
+                start = 256;
+            }
+
 	    /* Look at every character in the range */
 	    for (j = start; j <= end; j++) {
 
-		/* Get its fold */
 		U8 foldbuf[UTF8_MAXBYTES_CASE+1];
 		STRLEN foldlen;
-		const UV f =
-                    _to_uni_fold_flags(j, foldbuf, &foldlen,
-                                       (allow_full_fold) ? FOLD_FLAGS_FULL : 0);
+                UV f;
+
+                if (j < 256) {
+
+                    /* We have the latin1 folding rules hard-coded here so that
+                     * an innocent-looking character class, like /[ks]/i won't
+                     * have to go out to disk to find the possible matches.
+                     * XXX It would be better to generate these via regen, in
+                     * case a new version of the Unicode standard adds new
+                     * mappings, though that is not really likely, and may be
+                     * caught by the default: case of the switch below. */
+
+                    if (PL_fold_latin1[j] != j) {
+
+                        /* ASCII is always matched; non-ASCII is matched only
+                         * under Unicode rules */
+                        if (isASCII(j) || AT_LEAST_UNI_SEMANTICS) {
+                            cp_list =
+                                add_cp_to_invlist(cp_list, PL_fold_latin1[j]);
+                        }
+                        else {
+                            depends_list =
+                                add_cp_to_invlist(depends_list, PL_fold_latin1[j]);
+                        }
+                    }
+
+                    if (HAS_NONLATIN1_FOLD_CLOSURE(j)
+                        && (! isASCII(j) || ! MORE_ASCII_RESTRICTED))
+                    {
+                        /* Certain Latin1 characters have matches outside
+                         * Latin1, or are multi-character.  To get here, 'j' is
+                         * one of those characters.   None of these matches is
+                         * valid for ASCII characters under /aa, which is why
+                         * the 'if' just above excludes those.  The matches
+                         * fall into three categories:
+                         * 1) They are singly folded-to or -from an above 255
+                         *    character, e.g., LATIN SMALL LETTER Y WITH
+                         *    DIAERESIS and LATIN CAPITAL LETTER Y WITH
+                         *    DIAERESIS;
+                         * 2) They are part of a multi-char fold with another
+                         *    latin1 character; only LATIN SMALL LETTER
+                         *    SHARP S => "ss" fits this;
+                         * 3) They are part of a multi-char fold with a
+                         *    character outside of Latin1, such as various
+                         *    ligatures.
+                         * We aren't dealing fully with multi-char folds, except
+                         * we do deal with the pattern containing a character
+                         * that has a multi-char fold (not so much the inverse).
+                         * For types 1) and 3), the matches only happen when the
+                         * target string is utf8; that's not true for 2), and we
+                         * set a flag for it.
+                         *
+                         * The code below adds the single fold closures for 'j'
+                         * to the inversion list. */
+                        switch (j) {
+                            case 'k':
+                            case 'K':
+                                /* KELVIN SIGN */
+                                cp_list =
+                                    add_cp_to_invlist(cp_list, 0x212A);
+                                break;
+                            case 's':
+                            case 'S':
+                                /* LATIN SMALL LETTER LONG S */
+                                cp_list =
+                                    add_cp_to_invlist(cp_list, 0x017F);
+                                break;
+                            case MICRO_SIGN:
+                                cp_list = add_cp_to_invlist(cp_list,
+                                                    GREEK_SMALL_LETTER_MU);
+                                cp_list = add_cp_to_invlist(cp_list,
+                                                    GREEK_CAPITAL_LETTER_MU);
+                                break;
+                            case LATIN_CAPITAL_LETTER_A_WITH_RING_ABOVE:
+                            case LATIN_SMALL_LETTER_A_WITH_RING_ABOVE:
+                                /* ANGSTROM SIGN */
+                                cp_list =
+                                        add_cp_to_invlist(cp_list, 0x212B);
+                                break;
+                            case LATIN_SMALL_LETTER_Y_WITH_DIAERESIS:
+                                cp_list = add_cp_to_invlist(cp_list,
+                                        LATIN_CAPITAL_LETTER_Y_WITH_DIAERESIS);
+                                break;
+                            case LATIN_SMALL_LETTER_SHARP_S:
+                                cp_list = add_cp_to_invlist(cp_list,
+                                                LATIN_CAPITAL_LETTER_SHARP_S);
+
+                                /* Under /a, /d, and /u, this can match the two
+                                 * chars "ss" */
+                                if (! MORE_ASCII_RESTRICTED) {
+                                    add_alternate(&unicode_alternate,
+                                                  (U8 *) "ss", 2);
+
+                                    /* And under /u or /a, it can match even if
+                                     * the target is not utf8 */
+                                    if (AT_LEAST_UNI_SEMANTICS) {
+                                        ANYOF_FLAGS(ret) |=
+                                                    ANYOF_NONBITMAP_NON_UTF8;
+                                    }
+                                }
+                                break;
+                            case 'F': case 'f':
+                            case 'I': case 'i':
+                            case 'L': case 'l':
+                            case 'T': case 't':
+                            case 'A': case 'a':
+                            case 'H': case 'h':
+                            case 'J': case 'j':
+                            case 'N': case 'n':
+                            case 'W': case 'w':
+                            case 'Y': case 'y':
+                                /* These all are targets of multi-character
+                                 * folds from code points that require UTF8 to
+                                 * express, so they can't match unless the
+                                 * target string is in UTF-8, so no action here
+                                 * is necessary, as regexec.c properly handles
+                                 * the general case for UTF-8 matching */
+                                break;
+                            default:
+                                /* Use deprecated warning to increase the
+                                 * chances of this being output */
+                                ckWARN2regdep(RExC_parse, "Perl folding rules are not up-to-date for 0x%"UVXf"; please use the perlbug utility to report;", j);
+                                break;
+                        }
+                    }
+                    continue;
+                }
+
+                /* Here is an above Latin1 character.  We don't have the rules
+                 * hard-coded for it.  First, get its fold */
+		f = _to_uni_fold_flags(j, foldbuf, &foldlen,
+                                    ((allow_full_fold) ? FOLD_FLAGS_FULL : 0)
+                                    | ((LOC)
+                                        ? FOLD_FLAGS_LOCALE
+                                        : (MORE_ASCII_RESTRICTED)
+                                            ? FOLD_FLAGS_NOMIX_ASCII
+                                            : 0));
 
 		if (foldlen > (STRLEN)UNISKIP(f)) {
 
@@ -12040,54 +12045,25 @@ parseit:
 
 			/* If any of the folded characters of this are in the
 			 * Latin1 range, tell the regex engine that this can
-			 * match a non-utf8 target string.  The only multi-byte
-			 * fold whose source is in the Latin1 range (U+00DF)
-			 * applies only when the target string is utf8, or
-			 * under unicode rules */
-			if (j > 255 || AT_LEAST_UNI_SEMANTICS) {
-			    while (loc < e) {
-
-				/* Can't mix ascii with non- under /aa */
-				if (MORE_ASCII_RESTRICTED
-				    && (isASCII(*loc) != isASCII(j)))
-				{
-				    goto end_multi_fold;
-				}
-				if (UTF8_IS_INVARIANT(*loc)
-				    || UTF8_IS_DOWNGRADEABLE_START(*loc))
-				{
-                                    /* Can't mix above and below 256 under LOC
-                                     */
-				    if (LOC) {
-					goto end_multi_fold;
-				    }
-				    ANYOF_FLAGS(ret)
-					    |= ANYOF_NONBITMAP_NON_UTF8;
-				    break;
-				}
-				loc += UTF8SKIP(loc);
-			    }
-			}
+			 * match a non-utf8 target string.  */
+                        while (loc < e) {
+                            if (UTF8_IS_INVARIANT(*loc)
+                                || UTF8_IS_DOWNGRADEABLE_START(*loc))
+                            {
+                                ANYOF_FLAGS(ret)
+                                        |= ANYOF_NONBITMAP_NON_UTF8;
+                                break;
+                            }
+                            loc += UTF8SKIP(loc);
+                        }
 
 			add_alternate(&unicode_alternate, foldbuf, foldlen);
-		    end_multi_fold: ;
-		    }
-
-		    /* This is special-cased, as it is the only letter which
-		     * has both a multi-fold and single-fold in Latin1.  All
-		     * the other chars that have single and multi-folds are
-		     * always in utf8, and the utf8 folding algorithm catches
-		     * them */
-		    if (! LOC && j == LATIN_CAPITAL_LETTER_SHARP_S) {
-			stored += set_regclass_bit(pRExC_state,
-					ret,
-					LATIN_SMALL_LETTER_SHARP_S,
-					&l1_fold_invlist, &unicode_alternate);
 		    }
 		}
-		else {
-		    /* Single character fold.  Add everything in its fold
-		     * closure to the list that this node should match */
+                else {
+                    /* Single character fold of above Latin1.  Add everything
+                     * in its fold closure to the list that this node should
+                     * match */
 		    SV** listp;
 
 		    /* The fold closures data structure is a hash with the keys
@@ -12110,90 +12086,129 @@ parseit:
 			    /* /aa doesn't allow folds between ASCII and non-;
 			     * /l doesn't allow them between above and below
 			     * 256 */
-			    if ((MORE_ASCII_RESTRICTED
-				 && (isASCII(c) != isASCII(j)))
-				    || (LOC && ((c < 256) != (j < 256))))
+			    if ((MORE_ASCII_RESTRICTED && (isASCII(c) != isASCII(j)))
+				|| (LOC && ((c < 256) != (j < 256))))
 			    {
 				continue;
 			    }
 
-			    if (c < 256 && AT_LEAST_UNI_SEMANTICS) {
-				stored += set_regclass_bit(pRExC_state,
-					ret,
-					(U8) c,
-					&l1_fold_invlist, &unicode_alternate);
-			    }
-				/* It may be that the code point is already in
-				 * this range or already in the bitmap, in
-				 * which case we need do nothing */
-			    else if ((c < start || c > end)
-					&& (c > 255
-					    || ! ANYOF_BITMAP_TEST(ret, c)))
-			    {
-				nonbitmap = add_cp_to_invlist(nonbitmap, c);
+                            /* Folds involving non-ascii Latin1 characters
+                             * under /d are added to a separate list */
+			    if (isASCII(c) || c > 255 || AT_LEAST_UNI_SEMANTICS)
+                            {
+				cp_list = add_cp_to_invlist(cp_list, c);
+                            }
+                            else {
+                                depends_list = add_cp_to_invlist(depends_list, c);
 			    }
 			}
 		    }
 		}
-	    }
+            }
 	}
 	SvREFCNT_dec(fold_intersection);
     }
 
-    /* Combine the two lists into one. */
-    if (l1_fold_invlist) {
-	if (nonbitmap) {
-	    _invlist_union(nonbitmap, l1_fold_invlist, &nonbitmap);
-	    SvREFCNT_dec(l1_fold_invlist);
-	}
-	else {
-	    nonbitmap = l1_fold_invlist;
-	}
-    }
-
     /* And combine the result (if any) with any inversion list from properties.
      * The lists are kept separate up to now because we don't want to fold the
      * properties */
     if (properties) {
-	if (nonbitmap) {
-	    _invlist_union(nonbitmap, properties, &nonbitmap);
-	    SvREFCNT_dec(properties);
-	}
-	else {
-	    nonbitmap = properties;
-	}
+        if (AT_LEAST_UNI_SEMANTICS) {
+            if (cp_list) {
+                _invlist_union(cp_list, properties, &cp_list);
+                SvREFCNT_dec(properties);
+            }
+            else {
+                cp_list = properties;
+            }
+        }
+        else {
+
+            /* Under /d, we put the things that match only when the target
+             * string is utf8, into a separate list */
+            SV* nonascii_but_latin1_properties = NULL;
+            _invlist_intersection(properties, PL_Latin1,
+                                  &nonascii_but_latin1_properties);
+            _invlist_subtract(nonascii_but_latin1_properties, PL_ASCII,
+                              &nonascii_but_latin1_properties);
+            _invlist_subtract(properties, nonascii_but_latin1_properties,
+                              &properties);
+            if (cp_list) {
+                _invlist_union(cp_list, properties, &cp_list);
+                SvREFCNT_dec(properties);
+            }
+            else {
+                cp_list = properties;
+            }
+
+            if (depends_list) {
+                _invlist_union(depends_list, nonascii_but_latin1_properties,
+                               &depends_list);
+                SvREFCNT_dec(nonascii_but_latin1_properties);
+            }
+            else {
+                depends_list = nonascii_but_latin1_properties;
+            }
+        }
     }
 
-    /* Here, <nonbitmap> contains all the code points we can determine at
-     * compile time that we haven't put into the bitmap.  Go through it, and
-     * for things that belong in the bitmap, put them there, and delete from
-     * <nonbitmap> */
-    if (nonbitmap) {
+    /* Here, we have calculated what code points should be in the character
+     * class.
+     *
+     * Now we can see about various optimizations.  Fold calculation (which we
+     * did above) needs to take place before inversion.  Otherwise /[^k]/i
+     * would invert to include K, which under /i would match k, which it
+     * shouldn't. */
+
+    /* Optimize inverted simple patterns (e.g. [^a-z]).  Note that we haven't
+     * set the FOLD flag yet, so this does optimize those.  It doesn't
+     * optimize locale.  Doing so perhaps could be done as long as there is
+     * nothing like \w in it; some thought also would have to be given to the
+     * interaction with above 0x100 chars */
+    if ((ANYOF_FLAGS(ret) & ANYOF_INVERT)
+        && ! LOC
+	&& ! depends_list
+	&& ! unicode_alternate
+	&& SvCUR(listsv) == initial_listsv_len)
+    {
+        _invlist_invert(cp_list);
+
+        /* Any swash can't be used as-is, because we've inverted things */
+        if (swash) {
+            SvREFCNT_dec(swash);
+            swash = NULL;
+        }
 
-	/* Above-ASCII code points in /d have to stay in <nonbitmap>, as they
-	 * possibly only should match when the target string is UTF-8 */
-	UV max_cp_to_set = (DEPENDS_SEMANTICS) ? 127 : 255;
+	/* Clear the invert flag since we have just done it here */
+	ANYOF_FLAGS(ret) &= ~ANYOF_INVERT;
+    }
+
+    /* Here, <cp_list> contains all the code points we can determine at
+     * compile time that match under all conditions.  Go through it, and
+     * for things that belong in the bitmap, put them there, and delete from
+     * <cp_list> */
+    if (cp_list) {
 
 	/* This gets set if we actually need to modify things */
 	bool change_invlist = FALSE;
 
 	UV start, end;
 
-	/* Start looking through <nonbitmap> */
-	invlist_iterinit(nonbitmap);
-	while (invlist_iternext(nonbitmap, &start, &end)) {
+	/* Start looking through <cp_list> */
+	invlist_iterinit(cp_list);
+	while (invlist_iternext(cp_list, &start, &end)) {
 	    UV high;
 	    int i;
 
 	    /* Quit if are above what we should change */
-	    if (start > max_cp_to_set) {
+	    if (start > 255) {
 		break;
 	    }
 
 	    change_invlist = TRUE;
 
 	    /* Set all the bits in the range, up to the max that we are doing */
-	    high = (end < max_cp_to_set) ? end : max_cp_to_set;
+	    high = (end < 255) ? end : 255;
 	    for (i = start; i <= (int) high; i++) {
 		if (! ANYOF_BITMAP_TEST(ret, i)) {
 		    ANYOF_BITMAP_SET(ret, i);
@@ -12205,140 +12220,27 @@ parseit:
 	}
 
         /* Done with loop; remove any code points that are in the bitmap from
-         * <nonbitmap> */
+         * <cp_list> */
 	if (change_invlist) {
-	    _invlist_subtract(nonbitmap,
-		              (DEPENDS_SEMANTICS)
-			        ? PL_ASCII
-			        : PL_Latin1,
-                              &nonbitmap);
+	    _invlist_subtract(cp_list, PL_Latin1, &cp_list);
 	}
 
 	/* If have completely emptied it, remove it completely */
-	if (invlist_len(nonbitmap) == 0) {
-	    SvREFCNT_dec(nonbitmap);
-	    nonbitmap = NULL;
+	if (invlist_len(cp_list) == 0) {
+	    SvREFCNT_dec(cp_list);
+	    cp_list = NULL;
 	}
     }
 
-    /* Here, we have calculated what code points should be in the character
-     * class.  <nonbitmap> does not overlap the bitmap except possibly in the
-     * case of DEPENDS rules.
-     *
-     * Now we can see about various optimizations.  Fold calculation (which we
-     * did above) needs to take place before inversion.  Otherwise /[^k]/i
-     * would invert to include K, which under /i would match k, which it
-     * shouldn't. */
-
-    /* Optimize inverted simple patterns (e.g. [^a-z]).  Note that we haven't
-     * set the FOLD flag yet, so this does optimize those.  It doesn't
-     * optimize locale.  Doing so perhaps could be done as long as there is
-     * nothing like \w in it; some thought also would have to be given to the
-     * interaction with above 0x100 chars */
-    if ((ANYOF_FLAGS(ret) & ANYOF_INVERT)
-        && ! LOC
-	&& ! unicode_alternate
-	/* In case of /d, there are some things that should match only when in
-	 * not in the bitmap, i.e., they require UTF8 to match.  These are
-	 * listed in nonbitmap, but if ANYOF_NONBITMAP_NON_UTF8 is set in this
-	 * case, they don't require UTF8, so can invert here */
-	&& (! nonbitmap
-	    || ! DEPENDS_SEMANTICS
-	    || (ANYOF_FLAGS(ret) & ANYOF_NONBITMAP_NON_UTF8))
-	&& SvCUR(listsv) == initial_listsv_len)
-    {
-	int i;
-	if (! nonbitmap) {
-	    for (i = 0; i < 256; ++i) {
-		if (ANYOF_BITMAP_TEST(ret, i)) {
-		    ANYOF_BITMAP_CLEAR(ret, i);
-		}
-		else {
-		    ANYOF_BITMAP_SET(ret, i);
-		    prevvalue = value;
-		    value = i;
-		}
-	    }
-	    /* The inversion means that everything above 255 is matched */
-	    ANYOF_FLAGS(ret) |= ANYOF_UNICODE_ALL;
+    /* Combine the two lists into one. */
+    if (depends_list) {
+	if (cp_list) {
+	    _invlist_union(cp_list, depends_list, &cp_list);
+	    SvREFCNT_dec(depends_list);
 	}
 	else {
-	    /* Here, also has things outside the bitmap that may overlap with
-	     * the bitmap.  We have to sync them up, so that they get inverted
-	     * in both places.  Earlier, we removed all overlaps except in the
-	     * case of /d rules, so no syncing is needed except for this case
-	     */
-	    SV *remove_list = NULL;
-
-	    if (DEPENDS_SEMANTICS) {
-		UV start, end;
-
-		/* Set the bits that correspond to the ones that aren't in the
-		 * bitmap.  Otherwise, when we invert, we'll miss these.
-		 * Earlier, we removed from the nonbitmap all code points
-		 * < 128, so there is no extra work here */
-		invlist_iterinit(nonbitmap);
-		while (invlist_iternext(nonbitmap, &start, &end)) {
-		    if (start > 255) {  /* The bit map goes to 255 */
-			break;
-		    }
-		    if (end > 255) {
-			end = 255;
-		    }
-		    for (i = start; i <= (int) end; ++i) {
-			ANYOF_BITMAP_SET(ret, i);
-			prevvalue = value;
-			value = i;
-		    }
-		}
-	    }
-
-	    /* Now invert both the bitmap and the nonbitmap.  Anything in the
-	     * bitmap has to also be removed from the non-bitmap, but again,
-	     * there should not be overlap unless is /d rules. */
-	    _invlist_invert(nonbitmap);
-
-	    /* Any swash can't be used as-is, because we've inverted things */
-	    if (swash) {
-		SvREFCNT_dec(swash);
-		swash = NULL;
-	    }
-
-	    for (i = 0; i < 256; ++i) {
-		if (ANYOF_BITMAP_TEST(ret, i)) {
-		    ANYOF_BITMAP_CLEAR(ret, i);
-		    if (DEPENDS_SEMANTICS) {
-			if (! remove_list) {
-			    remove_list = _new_invlist(2);
-			}
-			remove_list = add_cp_to_invlist(remove_list, i);
-		    }
-		}
-		else {
-		    ANYOF_BITMAP_SET(ret, i);
-		    prevvalue = value;
-		    value = i;
-		}
-	    }
-
-	    /* And do the removal */
-	    if (DEPENDS_SEMANTICS) {
-		if (remove_list) {
-		    _invlist_subtract(nonbitmap, remove_list, &nonbitmap);
-		    SvREFCNT_dec(remove_list);
-		}
-	    }
-	    else {
-		/* There is no overlap for non-/d, so just delete anything
-		 * below 256 */
-		_invlist_intersection(nonbitmap, PL_AboveLatin1, &nonbitmap);
-	    }
+	    cp_list = depends_list;
 	}
-
-	stored = 256 - stored;
-
-	/* Clear the invert flag since have just done it here */
-	ANYOF_FLAGS(ret) &= ~ANYOF_INVERT;
     }
 
     /* Folding in the bitmap is taken care of above, but not for locale (for
@@ -12348,7 +12250,7 @@ parseit:
      * run-time fold flag for these */
     if (FOLD && (LOC
 		|| (DEPENDS_SEMANTICS
-		    && nonbitmap
+		    && cp_list
 		    && ! (ANYOF_FLAGS(ret) & ANYOF_NONBITMAP_NON_UTF8))
 		|| unicode_alternate))
     {
@@ -12369,7 +12271,7 @@ parseit:
      * characters which only have the two folds; so things like 'fF' and 'Ii'
      * wouldn't work because they are part of the fold of 'LATIN SMALL LIGATURE
      * FI'. */
-    if (! nonbitmap
+    if (! cp_list
 	&& ! unicode_alternate
 	&& SvCUR(listsv) == initial_listsv_len
 	&& ! (ANYOF_FLAGS(ret) & (ANYOF_INVERT|ANYOF_UNICODE_ALL))
@@ -12456,7 +12358,7 @@ parseit:
 	SvREFCNT_dec(swash);
 	swash = NULL;
     }
-    if (! nonbitmap
+    if (! cp_list
 	&& SvCUR(listsv) == initial_listsv_len
 	&& ! unicode_alternate)
     {
@@ -12473,7 +12375,7 @@ parseit:
 	 *       swash is stored there now.
 	 * av[2] stores the multicharacter foldings, used later in
 	 *       regexec.c:S_reginclass().
-	 * av[3] stores the nonbitmap inversion list for use in addition or
+	 * av[3] stores the cp_list inversion list for use in addition or
 	 *       instead of av[0]; not used if av[1] isn't NULL
 	 * av[4] is set if any component of the class is from a user-defined
 	 *       property; not used if av[1] isn't NULL */
@@ -12485,12 +12387,12 @@ parseit:
 			: listsv);
 	if (swash) {
 	    av_store(av, 1, swash);
-	    SvREFCNT_dec(nonbitmap);
+	    SvREFCNT_dec(cp_list);
 	}
 	else {
 	    av_store(av, 1, NULL);
-	    if (nonbitmap) {
-		av_store(av, 3, nonbitmap);
+	    if (cp_list) {
+		av_store(av, 3, cp_list);
 		av_store(av, 4, newSVuv(has_user_defined_property));
 	    }
 	}
diff --git a/regcomp.sym b/regcomp.sym
index a1eec5b..0865a73 100644
--- a/regcomp.sym
+++ b/regcomp.sym
@@ -31,11 +31,17 @@ EOS         EOL,        no        ; Match "" at end of string.
 EOL         EOL,        no        ; Match "" at end of line.
 MEOL        EOL,        no        ; Same, assuming multiline.
 SEOL        EOL,        no        ; Same, assuming singleline.
+# The regops that have varieties that vary depending on the character set regex
+# modifiers have to be ordered thusly: /d, /l, /u, /a, /aa.  This is because code
+# in regcomp.c uses the enum value of the modifier as an offset from the /d
+# version.  The complements must come after the non-complements.
+# BOUND, ALNUM, SPACE, DIGIT, and their complements are affected, as well as
+# EXACTF.
 BOUND       BOUND,      no        ; Match "" at any word boundary using native charset semantics for non-utf8
 BOUNDL      BOUND,      no        ; Match "" at any locale word boundary
 BOUNDU      BOUND,      no        ; Match "" at any word boundary using Unicode semantics
 BOUNDA      BOUND,      no         ; Match "" at any word boundary using ASCII semantics
-# All NBOUND nodes are required by a line regexec.c to be greater than all BOUND ones
+# All NBOUND nodes are required by code in regexec.c to be greater than all BOUND ones
 NBOUND      NBOUND,     no        ; Match "" at any word non-boundary using native charset semantics for non-utf8
 NBOUNDL     NBOUND,     no        ; Match "" at any locale word non-boundary
 NBOUNDU     NBOUND,     no        ; Match "" at any word non-boundary using Unicode semantics
@@ -49,6 +55,11 @@ SANY        REG_ANY,    no 0 S    ; Match any one character.
 CANY        REG_ANY,    no 0 S    ; Match any one byte.
 ANYOF       ANYOF,      sv 0 S    ; Match character in (or not in) this class, single char match only
 ANYOFV      ANYOF,      sv 0 V    ; Match character in (or not in) this class, can match-multiple chars
+
+# Order (within each group) of the below is important.  See ordering comment
+# above.  The PLACEHOLDERn ones are wasting a value.  Right now, we have plenty
+# to spare, but these would be obvious candidates if ever we ran out of node
+# types in a U8.
 ALNUM       ALNUM,      no 0 S    ; Match any alphanumeric character using native charset semantics for non-utf8
 ALNUML      ALNUM,      no 0 S    ; Match any alphanumeric char in locale
 ALNUMU      ALNUM,      no 0 S    ; Match any alphanumeric char using Unicode semantics
@@ -67,10 +78,14 @@ NSPACEU     NSPACE,     no 0 S    ; Match any non-whitespace char using Unicode
 NSPACEA     NSPACE,     no 0 S    ; Match [^ \t\n\f\r]
 DIGIT       DIGIT,      no 0 S    ; Match any numeric character using native charset semantics for non-utf8
 DIGITL      DIGIT,      no 0 S    ; Match any numeric character in locale
+PLACEHOLDER1 NOTHING,   no        ; placeholder for missing DIGITU
 DIGITA      DIGIT,      no 0 S    ; Match [0-9]
 NDIGIT      NDIGIT,     no 0 S    ; Match any non-numeric character using native charset semantics for non-utf8
 NDIGITL     NDIGIT,     no 0 S    ; Match any non-numeric character in locale
+PLACEHOLDER2 NOTHING,   no        ; placeholder for missing NDIGITU
 NDIGITA     NDIGIT,     no 0 S    ; Match [^0-9]
+# End of order is important (within groups)
+
 CLUMP       CLUMP,      no 0 V    ; Match any extended grapheme cluster sequence
 
 #* Alternation
@@ -98,9 +113,9 @@ EXACT       EXACT,      str       ; Match this string (preceded by length).
 EXACTF      EXACT,      str       ; Match this non-UTF-8 string (not guaranteed to be folded) using /id rules (w/len).
 EXACTFL     EXACT,      str       ; Match this string (not guaranteed to be folded) using /il rules (w/len).
 EXACTFU     EXACT,      str	  ; Match this string (folded iff in UTF-8, length in folding doesn't change if not in UTF-8) using /iu rules (w/len).
+EXACTFA     EXACT,      str	  ; Match this string (not guaranteed to be folded) using /iaa rules (w/len).
 EXACTFU_SS  EXACT,      str	  ; Match this string (folded iff in UTF-8, length in folding may change even if not in UTF-8) using /iu rules (w/len).
 EXACTFU_TRICKYFOLD EXACT,  str	  ; Match this folded UTF-8 string using /iu rules
-EXACTFA     EXACT,      str	  ; Match this string (not guaranteed to be folded) using /iaa rules (w/len).
 
 #*Do nothing types
 
@@ -212,6 +227,9 @@ KEEPS       KEEPS,      no        ; $& begins here.
 
 #*New charclass like patterns
 LNBREAK     LNBREAK,    none      ; generic newline pattern
+
+# regcomp.c expects the node number of the complement to be one greater than
+# the non-complement
 VERTWS      VERTWS,     none 0 S  ; vertical whitespace         (Perl 6)
 NVERTWS     NVERTWS,    none 0 S  ; not vertical whitespace     (Perl 6)
 HORIZWS     HORIZWS,    none 0 S  ; horizontal whitespace       (Perl 6)
diff --git a/regnodes.h b/regnodes.h
index 14bac24..84096d6 100644
--- a/regnodes.h
+++ b/regnodes.h
@@ -6,8 +6,8 @@
 
 /* Regops and State definitions */
 
-#define REGNODE_MAX           	112
-#define REGMATCH_STATE_MAX    	152
+#define REGNODE_MAX           	114
+#define REGMATCH_STATE_MAX    	154
 
 #define	END                   	0	/* 0000 End of program. */
 #define	SUCCEED               	1	/* 0x01 Return from a subroutine, basically. */
@@ -50,78 +50,80 @@
 #define	NSPACEA               	38	/* 0x26 Match [^ \t\n\f\r] */
 #define	DIGIT                 	39	/* 0x27 Match any numeric character using native charset semantics for non-utf8 */
 #define	DIGITL                	40	/* 0x28 Match any numeric character in locale */
-#define	DIGITA                	41	/* 0x29 Match [0-9] */
-#define	NDIGIT                	42	/* 0x2a Match any non-numeric character using native charset semantics for non-utf8 */
-#define	NDIGITL               	43	/* 0x2b Match any non-numeric character in locale */
-#define	NDIGITA               	44	/* 0x2c Match [^0-9] */
-#define	CLUMP                 	45	/* 0x2d Match any extended grapheme cluster sequence */
-#define	BRANCH                	46	/* 0x2e Match this alternative, or the next... */
-#define	BACK                  	47	/* 0x2f Match "", "next" ptr points backward. */
-#define	EXACT                 	48	/* 0x30 Match this string (preceded by length). */
-#define	EXACTF                	49	/* 0x31 Match this non-UTF-8 string (not guaranteed to be folded) using /id rules (w/len). */
-#define	EXACTFL               	50	/* 0x32 Match this string (not guaranteed to be folded) using /il rules (w/len). */
-#define	EXACTFU               	51	/* 0x33 Match this string (folded iff in UTF-8, length in folding doesn't change if not in UTF-8) using /iu rules (w/len). */
-#define	EXACTFU_SS            	52	/* 0x34 Match this string (folded iff in UTF-8, length in folding may change even if not in UTF-8) using /iu rules (w/len). */
-#define	EXACTFU_TRICKYFOLD    	53	/* 0x35 Match this folded UTF-8 string using /iu rules */
+#define	PLACEHOLDER1          	41	/* 0x29 placeholder for missing DIGITU */
+#define	DIGITA                	42	/* 0x2a Match [0-9] */
+#define	NDIGIT                	43	/* 0x2b Match any non-numeric character using native charset semantics for non-utf8 */
+#define	NDIGITL               	44	/* 0x2c Match any non-numeric character in locale */
+#define	PLACEHOLDER2          	45	/* 0x2d placeholder for missing NDIGITU */
+#define	NDIGITA               	46	/* 0x2e Match [^0-9] */
+#define	CLUMP                 	47	/* 0x2f Match any extended grapheme cluster sequence */
+#define	BRANCH                	48	/* 0x30 Match this alternative, or the next... */
+#define	BACK                  	49	/* 0x31 Match "", "next" ptr points backward. */
+#define	EXACT                 	50	/* 0x32 Match this string (preceded by length). */
+#define	EXACTF                	51	/* 0x33 Match this non-UTF-8 string (not guaranteed to be folded) using /id rules (w/len). */
+#define	EXACTFL               	52	/* 0x34 Match this string (not guaranteed to be folded) using /il rules (w/len). */
+#define	EXACTFU               	53	/* 0x35 Match this string (folded iff in UTF-8, length in folding doesn't change if not in UTF-8) using /iu rules (w/len). */
 #define	EXACTFA               	54	/* 0x36 Match this string (not guaranteed to be folded) using /iaa rules (w/len). */
-#define	NOTHING               	55	/* 0x37 Match empty string. */
-#define	TAIL                  	56	/* 0x38 Match empty string. Can jump here from outside. */
-#define	STAR                  	57	/* 0x39 Match this (simple) thing 0 or more times. */
-#define	PLUS                  	58	/* 0x3a Match this (simple) thing 1 or more times. */
-#define	CURLY                 	59	/* 0x3b Match this simple thing {n,m} times. */
-#define	CURLYN                	60	/* 0x3c Capture next-after-this simple thing */
-#define	CURLYM                	61	/* 0x3d Capture this medium-complex thing {n,m} times. */
-#define	CURLYX                	62	/* 0x3e Match this complex thing {n,m} times. */
-#define	WHILEM                	63	/* 0x3f Do curly processing and see if rest matches. */
-#define	OPEN                  	64	/* 0x40 Mark this point in input as start of */
-#define	CLOSE                 	65	/* 0x41 Analogous to OPEN. */
-#define	REF                   	66	/* 0x42 Match some already matched string */
-#define	REFF                  	67	/* 0x43 Match already matched string, folded using native charset semantics for non-utf8 */
-#define	REFFL                 	68	/* 0x44 Match already matched string, folded in loc. */
-#define	REFFU                 	69	/* 0x45 Match already matched string, folded using unicode semantics for non-utf8 */
-#define	REFFA                 	70	/* 0x46 Match already matched string, folded using unicode semantics for non-utf8, no mixing ASCII, non-ASCII */
-#define	NREF                  	71	/* 0x47 Match some already matched string */
-#define	NREFF                 	72	/* 0x48 Match already matched string, folded using native charset semantics for non-utf8 */
-#define	NREFFL                	73	/* 0x49 Match already matched string, folded in loc. */
-#define	NREFFU                	74	/* 0x4a Match already matched string, folded using unicode semantics for non-utf8 */
-#define	NREFFA                	75	/* 0x4b Match already matched string, folded using unicode semantics for non-utf8, no mixing ASCII, non-ASCII */
-#define	IFMATCH               	76	/* 0x4c Succeeds if the following matches. */
-#define	UNLESSM               	77	/* 0x4d Fails if the following matches. */
-#define	SUSPEND               	78	/* 0x4e "Independent" sub-RE. */
-#define	IFTHEN                	79	/* 0x4f Switch, should be preceded by switcher . */
-#define	GROUPP                	80	/* 0x50 Whether the group matched. */
-#define	LONGJMP               	81	/* 0x51 Jump far away. */
-#define	BRANCHJ               	82	/* 0x52 BRANCH with long offset. */
-#define	EVAL                  	83	/* 0x53 Execute some Perl code. */
-#define	MINMOD                	84	/* 0x54 Next operator is not greedy. */
-#define	LOGICAL               	85	/* 0x55 Next opcode should set the flag only. */
-#define	RENUM                 	86	/* 0x56 Group with independently numbered parens. */
-#define	TRIE                  	87	/* 0x57 Match many EXACT(F[ALU]?)? at once. flags==type */
-#define	TRIEC                 	88	/* 0x58 Same as TRIE, but with embedded charclass data */
-#define	AHOCORASICK           	89	/* 0x59 Aho Corasick stclass. flags==type */
-#define	AHOCORASICKC          	90	/* 0x5a Same as AHOCORASICK, but with embedded charclass data */
-#define	GOSUB                 	91	/* 0x5b recurse to paren arg1 at (signed) ofs arg2 */
-#define	GOSTART               	92	/* 0x5c recurse to start of pattern */
-#define	NGROUPP               	93	/* 0x5d Whether the group matched. */
-#define	INSUBP                	94	/* 0x5e Whether we are in a specific recurse. */
-#define	DEFINEP               	95	/* 0x5f Never execute directly. */
-#define	ENDLIKE               	96	/* 0x60 Used only for the type field of verbs */
-#define	OPFAIL                	97	/* 0x61 Same as (?!) */
-#define	ACCEPT                	98	/* 0x62 Accepts the current matched string. */
-#define	VERB                  	99	/* 0x63 Used only for the type field of verbs */
-#define	PRUNE                 	100	/* 0x64 Pattern fails at this startpoint if no-backtracking through this */
-#define	MARKPOINT             	101	/* 0x65 Push the current location for rollback by cut. */
-#define	SKIP                  	102	/* 0x66 On failure skip forward (to the mark) before retrying */
-#define	COMMIT                	103	/* 0x67 Pattern fails outright if backtracking through this */
-#define	CUTGROUP              	104	/* 0x68 On failure go to the next alternation in the group */
-#define	KEEPS                 	105	/* 0x69 $& begins here. */
-#define	LNBREAK               	106	/* 0x6a generic newline pattern */
-#define	VERTWS                	107	/* 0x6b vertical whitespace         (Perl 6) */
**** PATCH TRUNCATED AT 2000 LINES -- 396 NOT SHOWN ****
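
The ordering comment added to regcomp.sym says that regcomp.c uses the enum value of the character-set modifier as an offset from the /d node. Below is a minimal sketch of that idea, not the code from regcomp.c. The node numbers are taken from the regnodes.h hunk above. The demo_* names are hypothetical, and clamping /aa to the /a node is an assumption made here because the DIGIT group shows no separate /aa entry.

```c
/* Hypothetical stand-ins for the character-set modifiers, listed in the
 * order the regcomp.sym comment requires: /d, /l, /u, /a, /aa. */
typedef enum {
    DEMO_CHARSET_D = 0,
    DEMO_CHARSET_L,
    DEMO_CHARSET_U,
    DEMO_CHARSET_A,
    DEMO_CHARSET_AA
} demo_charset;

/* Node numbers as they appear in the regnodes.h hunk of this patch. */
enum {
    DEMO_DIGIT        = 39,
    DEMO_DIGITL       = 40,
    DEMO_PLACEHOLDER1 = 41,   /* slot reserved for the missing DIGITU */
    DEMO_DIGITA       = 42,
    DEMO_NDIGIT       = 43,
    DEMO_NDIGITL      = 44,
    DEMO_PLACEHOLDER2 = 45,   /* slot reserved for the missing NDIGITU */
    DEMO_NDIGITA      = 46
};

/* Derive the node by adding the modifier's enum value to the /d base node.
 * Assumption: /aa has no node of its own in this group, so it is clamped
 * to the /a offset. */
static int demo_digit_node(demo_charset cs, int complement)
{
    int offset = (int) cs;

    if (offset > DEMO_CHARSET_A) {
        offset = DEMO_CHARSET_A;
    }
    return (complement ? DEMO_NDIGIT : DEMO_DIGIT) + offset;
}
```

With these numbers, the /u offset lands on the PLACEHOLDER slots. That is presumably why the patch reserves them: without them the /a offset would not reach DIGITA or NDIGITA.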

--
Perl5 Master Repository


