* George Greer <perl@greerga.m-l.org> [2011-04-29 15:45]:
> Correct. Going back to the original (somewhat nonsensical[1])
> regex that triggered this problem:
>
>   /[^\x00-\x1f\x7f-\xff :]+:/i
>
> So "s" is an acceptable part of the regex but due to
> multi-character case folding "ss" is not. So you have the
> peculiar case that:
>
>   "s s" =~ /^[^\xDF]+$/i  => Y
>   "ss"  =~ /^[^\xDF]+$/i  => N
>
> which can end up very surprising when your word isn't German
> and the only reason \xDF is in the list is because it was
> caught in a range.

It’s surprising even when your word is German.

I think the orthography reform has made it so you can always
substitute a double s for a sharp s. (If memory serves, this was
not always the case before. I’m unsure on both counts.) But you
definitely cannot replace just any double s with a sharp s. The
canonical example is “Wasser”: spelling it “Waßer” has always
been an error and it still is.

This means the regex engine cannot make *any* reasonable guess
whatsoever at which match is desired or even acceptable in any
particular case without the user indicating it explicitly.

I’m iffy about the entire notion of multi-character case folds
(for regex matching), outside of designated pure ligatures.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
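The ambiguity above can be illustrated outside Perl. This is a minimal sketch in Python, chosen only because its behaviour is easy to check: `str.casefold()` applies Unicode full case folding, which maps "ß" to "ss". Nothing here shows Perl's own API or what any particular Perl version does with these patterns. Under full folding, the correct "Wasser" and the misspelt "Waßer" collapse to the same key, so the folded form alone cannot say which spelling was meant. For contrast, Python's `re` module uses only simple (one-to-one) folding under `IGNORECASE`, so its results differ from the Perl behaviour quoted above.

```python
import re

# Full Unicode case folding maps the single character "ß" (U+00DF)
# to the two-character sequence "ss".
assert "ß".casefold() == "ss"

# A correct spelling and a misspelling fold to the same string:
# "Wasser" is right, "Waßer" is wrong, yet both become "wasser".
for word in ["Wasser", "Waßer"]:
    print(word, "->", word.casefold())

# The fold loses information: from "wasser" alone there is no way
# to recover which spelling (if either) was intended.
print("Wasser".casefold() == "Waßer".casefold())  # True

# Python's re module applies only simple (one-to-one) case folding
# under IGNORECASE, so the single-character pattern "ß" never matches
# the two-character string "ss"...
print(re.fullmatch("ß", "ss", re.IGNORECASE))  # None

# ...and a negated class containing "ß" happily matches "ss", unlike
# the "ss" => N result quoted in the thread.
print(re.fullmatch("[^ß]+", "ss", re.IGNORECASE) is not None)  # True
```

Python's choice of simple folding avoids the surprise in the negated class, at the cost of never treating "ß" and "ss" as equal in a pattern. That trade-off is exactly what has to be decided somewhere, whether by the engine or by the user.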