Here's a little something that someone might consider doing over the holiday season, a nice way to get to know UTF-8 if you feel the urge or interest to do so.

There are (*still*, after much cleaning) various spots all over the code that do UTF-8 parsing 'from scratch' without using the utilities provided by utf8.c. For example, there are three spots that output "Malformed UTF-8 character" which are not utf8.c:utf8_to_uv(). Another good way to find such places is to grep for '0x[8c]0' (0x80 and 0xC0 are the masks that hand-rolled code uses to test for continuation bytes).

If someone would clean up all such naughty spots using either the functions (utf8_to_uv(), uv_to_utf8(), utf8_length(), utf8_distance(), utf8_hop(), etc.) of utf8.c, or the macros (UTF8_XXX, UNIYYY) of utf8.h, I would be very grateful.

(For a quick brush-up on the UTF-8 encoding and other character code issues, surf over to http://www.czyborra.com/)

(It's fine for a function, say, pp_foo(), to do foo on UTF-8 strings; there is no need to migrate that specific code to utf8.c. It's just the repeated (and therefore error-prone) UTF-8 code that worries me.)

(One detail comes to mind: I would like to see a safer utf8_hop(), one that takes a 'start' pointer and a 'length', so that the pointer can't wander off the buffer boundaries.)

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen
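
For illustration only, here is a minimal sketch of what a bounded utf8_hop() along the lines suggested above might look like. It is standalone C and does not use anything from utf8.c or utf8.h; the name bounded_utf8_hop(), its parameters, and the choice to return NULL when the hop would leave the buffer are all assumptions made for this example, not an existing interface. It also assumes the buffer holds well-formed UTF-8 and that 'pos' points at the start of a character.

```c
#include <stddef.h>

/* Hypothetical helper: true if c is a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical bounded hop: move 'off' characters forward (off > 0) or
 * backward (off < 0) from 'pos', never leaving [start, start + len].
 * Returns the new position, or NULL if the hop would cross a boundary.
 */
static const unsigned char *
bounded_utf8_hop(const unsigned char *start, size_t len,
                 const unsigned char *pos, long off)
{
    const unsigned char *end = start + len;

    if (pos < start || pos > end)
        return NULL;

    while (off > 0) {
        if (pos >= end)
            return NULL;            /* nothing left to hop over */
        pos++;                      /* step past the lead byte */
        while (pos < end && is_continuation(*pos))
            pos++;                  /* skip its continuation bytes */
        off--;
    }

    while (off < 0) {
        if (pos <= start)
            return NULL;            /* already at the beginning */
        pos--;                      /* step onto the previous byte */
        while (pos > start && is_continuation(*pos))
            pos--;                  /* back up to the lead byte */
        off++;
    }

    return pos;
}
```

With the four-byte buffer "a\xC3\xA9z" (a, e-acute, z), hopping two characters forward from the start lands on the 'z' at offset 3, while hopping four characters forward returns NULL instead of running past the end.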