Since this causes Perl to hang, I think it should be addressed somehow in 5.12.1. It may be that the thing to do is just document it. It's been around since 2007.

I'm still looking at how things are done currently, and a number of things appear wrong to me, but that's an initial take, subject to further consideration.

Father Chrysostomos wrote:
>
> On Apr 27, 2010, at 6:56 PM, karl williamson wrote:
>
>> To summarize, I propose that we use Unicode's XID_Start and
>> XID_Continue properties in 5.14, even though that breaks one of our
>> tests, and possibly existing code.
>
> Would we change the meanings of is_utf8_idcont and is_utf8_idfirst, or
> introduce new functions?

My first take is that I think we would just change the meanings. The differences are quite minimal. ID_Start contains 23 more characters than XID_Start:

037A  GREEK YPOGEGRAMMENI
0E33  THAI CHARACTER SARA AM
0EB3  LAO VOWEL SIGN AM
309B  KATAKANA-HIRAGANA VOICED SOUND MARK
309C  KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
FC5E  ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
FC5F  ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
FC60  ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
FC61  ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
FC62  ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
FC63  ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
FDFA  ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
FDFB  ARABIC LIGATURE JALLAJALALOUHOU
FE70  ARABIC FATHATAN ISOLATED FORM
FE72  ARABIC DAMMATAN ISOLATED FORM
FE74  ARABIC KASRATAN ISOLATED FORM
FE76  ARABIC FATHA ISOLATED FORM
FE78  ARABIC DAMMA ISOLATED FORM
FE7A  ARABIC KASRA ISOLATED FORM
FE7C  ARABIC SHADDA ISOLATED FORM
FE7E  ARABIC SUKUN ISOLATED FORM
FF9E  HALFWIDTH KATAKANA VOICED SOUND MARK
FF9F  HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

And ID_Continue contains 19 more characters than XID_Continue:

037A  GREEK YPOGEGRAMMENI
309B  KATAKANA-HIRAGANA VOICED SOUND MARK
309C  KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
FC5E  ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
FC5F  ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
FC60  ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
FC61  ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
FC62  ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
FC63  ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
FDFA  ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
FDFB  ARABIC LIGATURE JALLAJALALOUHOU
FE70  ARABIC FATHATAN ISOLATED FORM
FE72  ARABIC DAMMATAN ISOLATED FORM
FE74  ARABIC KASRATAN ISOLATED FORM
FE76  ARABIC FATHA ISOLATED FORM
FE78  ARABIC DAMMA ISOLATED FORM
FE7A  ARABIC KASRA ISOLATED FORM
FE7C  ARABIC SHADDA ISOLATED FORM
FE7E  ARABIC SUKUN ISOLATED FORM

So the differences are minimal; we would be recognizing 23 or 19 fewer characters by going with the X versions. You can tell from some of the names why it was wrong to put them in the original versions. But I need to further study things to come up with a recommendation.

>
> In anticipation of this change, I’ve attached a patch that corrects the
> test in utf8.t to use ¡ instead of ·. I’ve also moved the test outside
> of the eval, so it will still run (and fail) if the compilation fails,
> instead of causing an invalid test count.
>

Thanks. Have you considered adding a timeout? test.pl has one that will kill the test script if Perl hangs.
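
For anyone who wants to spot-check the ID_Start versus XID_Start difference listed above, here is a minimal sketch. It assumes a perl recent enough to know both \p{ID_Start} and \p{XID_Start}, and start_props is a hypothetical helper named only for this illustration, not anything in the core. It just asks Perl's regex engine which properties a few of the listed code points match, with plain "a" as a control.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl's API): report whether a code point
# matches ID_Start and XID_Start, using Perl's \p{} property lookups.
sub start_props {
    my ($cp) = @_;
    my $ch = chr $cp;
    return (
        ($ch =~ /\p{ID_Start}/  ? 1 : 0),
        ($ch =~ /\p{XID_Start}/ ? 1 : 0),
    );
}

# A few code points from the ID_Start-only list, plus "a" as a control.
for my $cp (0x0061, 0x037A, 0x0E33, 0x309B, 0xFF9E) {
    my ($id, $xid) = start_props($cp);
    printf "U+%04X  ID_Start=%d  XID_Start=%d\n", $cp, $id, $xid;
}
```

The same approach with \p{ID_Continue} and \p{XID_Continue} should cover the second list; the exact results depend on the Unicode version the perl in question was built with.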