perl.i18n | Postings from May 2008

Re: Stripping out Unicode combining characters (diacritics)

From: Mike Rylander
Date: May 6, 2008 04:56
Subject: Re: Stripping out Unicode combining characters (diacritics)
Message ID: b918cf3d0805051852n6d5df98hfb82ecaf408dc349@mail.gmail.com

On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <doran@uta.edu> wrote:
[snip]
>
>  I'm pulling my hair out on this... so any help would be appreciated.  If there's any other info I can provide, let me know.
>

You'll want to transform the text to NFD (canonical decomposition:
base characters followed by separate combining marks) instead of NFC
(precomposed characters) using Unicode::Normalize, and then delete
the combining marks:

 use Unicode::Normalize;

 my $text = NFD($original);   # decompose into base characters + marks
 $text =~ s/\pM+//g;          # remove every combining mark (\pM)
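
As a self-contained sketch of the same idea (the strip_marks name and
the sample strings are only for illustration, and it assumes the input
is already a decoded character string rather than raw UTF-8 bytes):

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# strip_marks is a hypothetical helper name, not part of any module.
sub strip_marks {
    my ($original) = @_;
    my $text = NFD($original);   # e.g. U+00E9 becomes "e" + U+0301
    $text =~ s/\pM+//g;          # delete all combining marks
    return $text;
}

# \x{..} escapes keep this file pure ASCII:
# "creme brulee" and "Angstrom" written with their accented letters.
my @samples = (
    "cr\x{E8}me br\x{FB}l\x{E9}e",
    "\x{C5}ngstr\x{F6}m",
);

binmode STDOUT, ':encoding(UTF-8)';
print strip_marks($_), "\n" for @samples;
```

Note that this only affects characters that have a canonical
decomposition. Letters such as U+00F8 (o with stroke) do not decompose
into a base letter plus a mark, so they pass through unchanged.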

Hope that helps.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: miker@esilibrary.com
 | web: http://www.esilibrary.com
