On Wed, Nov 24, 2010 at 8:22 PM, karl williamson <public@khwilliamson.com> wrote:
>
> Chip Salzenberg wrote:
>>
>> I'm experimenting with some text scanning code against a very large corpus,
>> and I've got Perl 5.12.1 dying on this assertion failure:
>>
>> perl: utf8.c:1997: Perl_swash_fetch: Assertion `klen <= sizeof(PL_last_swash_key)' failed.
>>
>
> Here's a relevant comment:
>     /* Given a UTF-X encoded char 0xAA..0xYY,0xZZ
>      * then the "swatch" is a vec() for all the chars which start
>      * with 0xAA..0xYY
>      * So the key in the hash (klen) is length of encoded char -1
>      */
>

I've uncovered the string that's causing this problem. When the attached
string has the utf8 bit enabled and a regex is applied, Perl dies with the
above exception.

Fortunately, utf8::valid() returns false, so I have an easy way of avoiding
this particular crash. But it's still a Perl bug that should be fixed -
assertion failures should not result from applying a regex, even to an
invalid utf8 string.

PS: The proximate source of the invalid string is Encode::Guess. But, well,
its name does say "guess" after all.
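
For anyone hitting the same assertion, the utf8::valid() guard mentioned above might look roughly like the sketch below. It is only an illustration, not the code from my scanner: safe_match() is a hypothetical helper, and the malformed test string is built by hand with Encode::_utf8_on() rather than taken from the attached string or from Encode::Guess.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not from the original post): apply a regex only when
# the scalar's internal representation is well-formed UTF-8.
sub safe_match {
    my ($text, $re) = @_;

    # utf8::valid() is false when the UTF-8 flag is set but the bytes are
    # malformed; it is true for unflagged strings and well-formed ones.
    return unless utf8::valid($text);

    return $text =~ $re ? 1 : 0;
}

# Assumption for illustration: a lone continuation byte (0x80) with the
# UTF-8 flag forced on stands in for the string that triggered the crash.
my $bad = "abc\x80";
Encode::_utf8_on($bad);

# A well-formed string; the wide character forces the UTF-8 flag on.
my $good = "caf\x{e9} \x{263A}";

for my $case ([bad => $bad], [good => $good]) {
    my ($name, $str) = @$case;
    my $result = safe_match($str, qr/\w+/);
    print "$name: ", (defined $result ? "matched=$result" : "skipped (invalid utf8)"), "\n";
}
```

This only sidesteps the crash by refusing to run the regex on a malformed string; it does nothing about the underlying bug in Perl_swash_fetch.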