develooper Front page | perl.perl4lib | Postings from May 2008

Re: Stripping out Unicode combining characters (diacritics)

From: David Kaufman
Date: May 7, 2008 03:33
Subject: Re: Stripping out Unicode combining characters (diacritics)

Hi Michael,

"Doran, Michael D" <doran@uta.edu> wrote:

> I'm trying to strip out combining diacritics from some form input using 
> this code:
> [...]
> $sans_diacritics  =~ s/\p{M}*//g;

I do it like this:

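use Encode;
use Unicode::Normalize qw(normalize);

# $utf8 must be a decoded character string, not raw UTF-8 bytes.
# NFKD splits each accented letter into base letter + combining marks;
# the coderef then replaces every character ASCII cannot represent with ''.
my $ascii = encode('ascii', normalize('KD', $utf8), sub { '' });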
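
Here is a slightly fuller sketch of the same idea, wrapped in a sub so it can be run on its own. The name strip_to_ascii and the sample strings are mine, purely for illustration. It assumes the input has already been decoded into Perl characters (for example with Encode::decode) and that your Encode is recent enough to accept a coderef as the CHECK argument.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFKD);

# Hypothetical helper name, not part of any module.
sub strip_to_ascii {
    my ($text) = @_;               # expects decoded characters, not bytes

    # Compatibility decomposition: "e with acute" becomes "e" + U+0301.
    my $decomposed = NFKD($text);

    # Encode calls the coderef for each character that has no ASCII
    # mapping (here, the combining marks); returning '' drops it.
    return encode('ascii', $decomposed, sub { '' });
}

# "Dvo\x{159}\x{e1}k" is "Dvorak" spelled with r-caron and a-acute.
print strip_to_ascii("Dvo\x{159}\x{e1}k, r\x{e9}sum\x{e9}"), "\n";
# prints "Dvorak, resume"
```

One caveat: this drops every non-ASCII character, not only combining marks. A letter with no Unicode decomposition, such as the o-with-stroke (U+00F8) in "Bjørn", is removed outright rather than turned into "o", so that name comes out as "Bjrn". If you need to keep non-ASCII base letters, normalizing and then removing \p{M} with a regex, as in your original code, is the better fit.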





