perl.i18n | Postings from May 2008

Re: Stripping out Unicode combining characters (diacritics)

From: David Kaufman
Date: May 7, 2008 03:34
Subject: Re: Stripping out Unicode combining characters (diacritics)
Message ID: 20080506235300.2034.qmail@lists.develooper.com

Hi Michael,

"Doran, Michael D" <doran@uta.edu> wrote:

> I'm trying to strip out combining diacritics from some form input using 
> this code:
> [...]
> $sans_diacritics  =~ s/\p{M}*//g;

I do it like this:

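use Encode qw(encode);
use Unicode::Normalize qw(normalize);

# $utf8 must hold decoded characters, not raw UTF-8 bytes.
# NFKD splits accented letters into base letter + combining marks; the
# callback then replaces every non-ASCII character with the empty string.
my $ascii = encode('ascii', normalize('KD', $utf8), sub { '' });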
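One caveat about the approach above: it drops every character that has no ASCII form after decomposition, not only the combining marks. Here is a minimal sketch comparing the two behaviours. The helper names strip_to_ascii and strip_marks and the sample string are made up for illustration, and the input is assumed to be an already-decoded character string:

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(normalize NFD);

# Hypothetical helper: NFKD-decompose, then drop everything outside ASCII.
sub strip_to_ascii {
    my ($text) = @_;    # decoded character string, not UTF-8 bytes
    return encode('ascii', normalize('KD', $text), sub { '' });
}

# Hypothetical helper: NFD-decompose, then remove only the combining marks,
# leaving any other non-ASCII characters in place.
sub strip_marks {
    my ($text) = @_;
    (my $out = NFD($text)) =~ s/\p{M}+//g;
    return $out;
}

# "Café naïve Łódź" written with escapes so the source file stays ASCII.
my $input = "Caf\x{e9} na\x{ef}ve \x{141}\x{f3}d\x{17a}";

binmode(STDOUT, ':encoding(UTF-8)');
print strip_to_ascii($input), "\n";    # Cafe naive odz
print strip_marks($input), "\n";       # Cafe naive Łodz
```

The L-with-stroke (U+0141) has no canonical or compatibility decomposition, so the ASCII route discards it entirely while the \p{M} route keeps it. Which result you want depends on whether the output has to be pure ASCII.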



