optimizing /\s*,\s*/

Jeff 'japhy' Pinyan
September 8, 2003 10:37
Often, when people want to split a string into comma-separated fields,
they use something like:

  @fields = split /\s*,\s*/, $string;

Yes, naive, whatever, that's not the point.  The point is that, at each
starting position, the regex engine first matches \s* and only then checks
whether a comma follows.  Could the engine be optimized to search FIRST for
the NON-OPTIONAL comma, and then match all immediately preceding whitespace?
That is, on a string like "abc  def , ghi,...", the engine would first find
the comma, and then move the start of the match back one character at a time
for as long as the preceding character is whitespace.
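To make the idea concrete, here is a user-level Perl sketch of the same search order: find the mandatory comma with index(), widen the match leftward over preceding whitespace, then rightward over the trailing \s*. It is only an illustration, not a patch to the regex engine, and a character-by-character Perl loop like this should not be assumed to be faster than split. The helper name comma_first_split is hypothetical; it is meant to return the same list as split /\s*,\s*/ with no LIMIT, including dropping trailing empty fields.

```perl
use strict;
use warnings;

# Hypothetical helper (not an existing function): "comma first" search.
# Locate the non-optional ',' with index(), then extend the match
# backward and forward over whitespace. Intended to return the same
# list as: split /\s*,\s*/, $string   (no LIMIT argument)
sub comma_first_split {
    my ($s) = @_;
    my @fields;
    my $pos = 0;    # start of the current field
    while ((my $comma = index($s, ',', $pos)) >= 0) {
        # Back up over whitespace before the comma, but never past the
        # start of the current field: whitespace before that point was
        # already consumed by the previous separator's trailing \s*.
        my $start = $comma;
        $start-- while $start > $pos && substr($s, $start - 1, 1) =~ /\s/;

        # Extend forward over whitespace after the comma (trailing \s*).
        my $end = $comma + 1;
        $end++ while $end < length($s) && substr($s, $end, 1) =~ /\s/;

        push @fields, substr($s, $pos, $start - $pos);
        $pos = $end;
    }
    push @fields, substr($s, $pos);

    # split() without LIMIT drops trailing empty fields; mimic that.
    pop @fields while @fields && $fields[-1] eq '';
    return @fields;
}

my $string = "abc  def , ghi,jkl ,  mno";
print join('|', comma_first_split($string)), "\n";   # abc  def|ghi|jkl|mno
```

Bounding the backward scan by the start of the current field is what keeps this equivalent to the leftmost match: on "a, ,b" it yields ("a", "", "b"), just as split does.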

I'm not sure I know enough to implement this, but I'd expect an
improvement, especially in cases where the optional piece (\s*) matches
frequently in the string, e.g. text with lots of whitespace but few commas.

Jeff "japhy" Pinyan
