perl.perl5.porters | Postings from December 2000

nice little task for someone

From: Jarkko Hietaniemi
Date: December 21, 2000 16:31
Subject: nice little task for someone
Message ID: 20001221183143.J29816@chaos.wustl.edu

Here's a little something that someone might consider doing over
the holiday season, a nice way to get to know UTF-8 if someone
feels the urge or interest to do so.

There are (*still*, after much cleaning) various spots all over the
code that do UTF-8 parsing 'from scratch' without using the utilities
provided by utf8.c.  There are, for example, three spots that output
"Malformed UTF-8 character" and that are not utf8.c:utf8_to_uv().
Another good way to find such places is to grep for '0x[8c]0'.
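
The grep pattern works because open-coded UTF-8 handling almost always
masks bytes against 0x80 and 0xc0.  A minimal sketch of what such a
spot tends to look like follows; the function name is hypothetical and
the code is an illustration, not something taken from the perl source.

```c
#include <stddef.h>

/* Hypothetical example of open-coded UTF-8 handling (not from perl).
 * Counts characters by counting every byte that is NOT a
 * continuation byte (continuation bytes look like 10xxxxxx). */
static size_t count_chars_by_hand(const unsigned char *s, size_t len)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        /* (byte & 0xc0) == 0x80 marks a continuation byte */
        if ((s[i] & 0xc0) != 0x80)
            n++;
    }
    return n;
}
```

Both constants in the mask test match '0x[8c]0', which is why the grep
turns up code of this shape.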

If someone would clean up all such naughty spots using either the
functions (utf8_to_uv(), uv_to_utf8(), utf8_length(), utf8_distance(),
utf8_hop(), etc.) of utf8.c, or the macros (UTF8_XXX, UNIYYY) of
utf8.h, I would be very grateful.
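
To illustrate why one shared routine beats repeated copies, here is a
rough standalone sketch of a single decoder that every caller could
use.  The name sketch_decode() and its signature are assumptions made
for this example; this is NOT the interface of utf8.c:utf8_to_uv().
It only handles 1 to 4 byte sequences and does not reject overlong
forms.

```c
#include <stddef.h>

/* Hypothetical helper, not the real utf8_to_uv().
 * Decodes one character from s, reading at most len bytes.
 * On success returns the code point and stores the number of bytes
 * consumed in *used.  Returns -1 for malformed or truncated input. */
static long sketch_decode(const unsigned char *s, size_t len, size_t *used)
{
    size_t need, i;
    long cp;

    if (len == 0)
        return -1;
    if (s[0] < 0x80) {                      /* plain ASCII */
        *used = 1;
        return s[0];
    }
    else if ((s[0] & 0xe0) == 0xc0) { need = 2; cp = s[0] & 0x1f; }
    else if ((s[0] & 0xf0) == 0xe0) { need = 3; cp = s[0] & 0x0f; }
    else if ((s[0] & 0xf8) == 0xf0) { need = 4; cp = s[0] & 0x07; }
    else
        return -1;          /* stray continuation or unsupported lead byte */

    if (need > len)
        return -1;          /* sequence runs past the end of the buffer */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return -1;      /* expected a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    *used = need;
    return cp;
}
```

With something like this in one place, the "malformed" check lives in
exactly one function instead of in every caller.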

(For a quick brush-up on the UTF-8 encoding and other character code
issues, surf over to http://www.czyborra.com/)

(If the code is there for a function, say pp_foo(), to do foo on UTF-8
strings, there is no need to migrate that specific code to utf8.c; it's
just the repeated (and therefore error-prone) UTF-8 code that worries me.)

(One detail comes to my mind: I would like to see a safer utf8_hop(),
 one that takes a 'start' pointer and a 'length', so that the pointer
 can't wander off the buffer boundaries.)
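
A bounded hop along those lines might look roughly like the sketch
below.  The name sketch_hop_safe(), its parameters, and the choice to
clamp at the buffer edges are all assumptions for illustration, not an
existing utf8.c function.  The caller is assumed to pass a p that
already lies within [start, start + len].

```c
#include <stddef.h>

/* Hypothetical bounded hop; name and signature are assumptions.
 * Moves p by off characters (forward if off > 0, backward if off < 0)
 * and never leaves [start, start + len]; it clamps at either edge. */
static const unsigned char *
sketch_hop_safe(const unsigned char *start, size_t len,
                const unsigned char *p, long off)
{
    const unsigned char *end = start + len;

    while (off > 0 && p < end) {
        p++;                                    /* step past the lead byte */
        while (p < end && (*p & 0xc0) == 0x80)  /* and its continuation bytes */
            p++;
        off--;
    }
    while (off < 0 && p > start) {
        p--;                                    /* step back one byte */
        while (p > start && (*p & 0xc0) == 0x80)
            p--;                                /* back up to the lead byte */
        off++;
    }
    return p;
}
```

Clamping is only one possible policy; returning NULL when the hop
cannot be completed would be another reasonable choice.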

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
