develooper Front page | perl.i18n | Postings from May 2008

Re: Stripping out Unicode combining characters (diacritics)

Mike Rylander
May 6, 2008 04:56
On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D wrote:
>  I'm pulling my hair out on this... so any help would be appreciated.  If there's any other info I can provide, let me know.

You'll want to normalize the text to NFD (canonical decomposition:
base characters followed by separate combining marks) rather than NFC
(precomposed characters) using Unicode::Normalize, and then delete
the combining marks:

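 use Unicode::Normalize;

 # Decompose first, then delete every combining mark (\pM = category Mark).
 # The /o flag is dropped: it has no effect on a pattern without variables.
 my $text = NFD($original);
 $text =~ s/\pM+//g;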
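
For a quick sanity check, here is a minimal self-contained sketch of the same idea. The sub name strip_marks and the sample string are made up for illustration and are not from your code:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (name is an assumption): decompose to NFD,
# then delete every character in general category M (Mark).
sub strip_marks {
    my ($original) = @_;
    my $text = NFD($original);   # e.g. U+00E9 becomes "e" + U+0301
    $text =~ s/\pM+//g;          # drop the combining marks
    return $text;
}

binmode(STDOUT, ':encoding(UTF-8)');

# "Café Zürich", written with escapes so the source file stays ASCII.
print strip_marks("Caf\N{U+00E9} Z\N{U+00FC}rich"), "\n";
```

Note that this only affects characters that have a canonical decomposition. Letters such as U+00F8 (o with stroke) do not decompose into a base letter plus a mark, so they pass through unchanged and would need a separate mapping if you want them reduced to plain ASCII.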

Hope that helps.

Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
