* Karl Williamson <public@khwilliamson.com> [2016-07-06 21:48]:
> I'm looking for some more ideas.

It seems to me you framed the problem as being about runs, and then got
stuck in that frame for its design, which would seem to imply that this
has something to do with quantifiers.

Really though, what you want is to match another character that is like
some other already-matched character. That doesn’t need to involve any
sort of quantifier. The pattern matching language already has a way of
saying “give me more of the same as this”: you capture the thing you
want to match, then you use a backref to match more of it.

    (?x) ( [=,.-] {2} ) \1+

So the most generic primitive for matching script runs would be one
which takes a template string from either a literal or a backreference,
looks at the script of each character, then matches a sequence of
characters that has the same sequence of scripts. (Very vaguely this is
akin to the samecase/samemark methods in Perl 6.) Something like

    (.)(*script:\1)*

(No idea what the spelling of the primitive ought to look like. I keep
thinking this belongs in the \p{} syntax, but it fits oddly there for
multiple reasons.)

That’s obviously too clunky in daily use, so it’s clearly not the end
point of the design. Some flag may turn out to be the right answer. But
I think this is the right place to start, or at least the right initial
direction. Once we have a properly composable primitive we can look at
what people need/use it for most (or how it trips them up most often!),
and iterate in that direction.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
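
A rough illustration of the semantics proposed above, written in Python rather than Perl so that it can be run standalone. Everything here is an assumption made for illustration: the helper names (script_of, match_same_scripts, script_run) are hypothetical, and the script lookup is only an approximation. Python's standard library does not expose the Unicode Script property, so the sketch takes the first word of the character's Unicode name as a stand-in. It is not how any Perl implementation works.

```python
import unicodedata


def script_of(ch):
    # Approximation only: the first word of the Unicode character name
    # ("LATIN", "GREEK", "CYRILLIC", ...) stands in for the real Script
    # property, which the standard library does not provide.
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def match_same_scripts(template, text, pos=0):
    # The hypothetical primitive: succeed if text[pos:] begins with
    # len(template) characters whose scripts equal, position by position,
    # the scripts of the template's characters.
    # Returns the end offset on success, None on failure.
    end = pos + len(template)
    if end > len(text):
        return None
    for t, c in zip(template, text[pos:end]):
        if script_of(t) != script_of(c):
            return None
    return end


def script_run(text, pos=0):
    # Emulates the pattern  (.)(*script:\1)*  anchored at pos:
    # capture one character, then greedily repeat the primitive with
    # that captured character as the template.
    if pos >= len(text):
        return None
    template = text[pos]
    end = pos + 1
    while True:
        nxt = match_same_scripts(template, text, end)
        if nxt is None:
            return text[pos:end]
        end = nxt
```

Under the name-based approximation, script_run("abcδεζ") returns "abc" and script_run("δεζabc") returns "δεζ". The template can also be longer than one character, which is the composability the proposal is after: match_same_scripts("aδ", "bγ") succeeds because both strings are a Latin letter followed by a Greek one.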