develooper Front page | perl.perl5.porters | Postings from August 2001

Re: Unicode Normalization Forms

From: SADAHIRO Tomoyuki
Date: August 10, 2001 19:36
Subject: Re: Unicode Normalization Forms
Message ID: 20010811112922.DDB2.BQW10602@nifty.com

Hello, everyone.

Unicode::Normalize 0.03 has been uploaded.

NAME
Unicode::Normalize - normalized forms of Unicode text

SYNOPSIS

  use Unicode::Normalize;

  $string_NFD  = NFD($raw_string);  # Normalization Form D
  $string_NFC  = NFC($raw_string);  # Normalization Form C
  $string_NFKD = NFKD($raw_string); # Normalization Form KD
  $string_NFKC = NFKC($raw_string); # Normalization Form KC

   or

  use Unicode::Normalize 'normalize';

  $string_NFD  = normalize('D',  $raw_string); # Normalization Form D
  $string_NFC  = normalize('C',  $raw_string); # Normalization Form C
  $string_NFKD = normalize('KD', $raw_string); # Normalization Form KD
  $string_NFKC = normalize('KC', $raw_string); # Normalization Form KC
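
To make the difference between the forms concrete, here is a minimal sketch. It assumes a Perl where Unicode::Normalize is installed and exports NFD, NFC and normalize as in the synopsis above. The helper code_points is hypothetical, written only for this illustration, and is not part of the module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC normalize);

# Hypothetical helper (not part of Unicode::Normalize):
# render a string as a space-separated list of code points.
sub code_points {
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $_[0];
}

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, one precomposed character.
my $precomposed = "\x{E9}";

# Form D splits it into "e" plus U+0301 COMBINING ACUTE ACCENT.
my $decomposed = NFD($precomposed);

# Form C composes the pair back into the single code point.
my $recomposed = NFC($decomposed);

print 'NFD: ', code_points($decomposed), "\n";   # NFD: U+0065 U+0301
print 'NFC: ', code_points($recomposed), "\n";   # NFC: U+00E9

# normalize('D', ...) performs the same operation as NFD(...).
print "normalize('D') agrees with NFD\n"
    if normalize('D', $precomposed) eq $decomposed;
```

The two strings look identical when displayed, so comparing them with eq only works reliably after both sides have been brought to the same form.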

For example, you can write

    use Encode;
    use Encode::Tcl;
    use Unicode::Normalize qw(normalize);

    print encode 'shiftjis',
          normalize 'KC',
          decode 'shiftjis', $_ while <$SJIS>;

and get normalized *shiftjis* text.

That is, full-width digits and Latin letters, half-width kana, and so on
are replaced by their compatibility equivalents under Normalization Form KC.
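
The effect can be checked without any Shift_JIS input by feeding the characters to NFKC directly. This is only a sketch: it assumes NFKC is exported as in the synopsis, the sample strings are my own, and code_points is a hypothetical helper rather than anything the module provides.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

# Hypothetical helper (not part of Unicode::Normalize):
# render a string as a space-separated list of code points.
sub code_points {
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $_[0];
}

# Full-width digits "12" and full-width Latin letters "AB".
my $fullwidth = "\x{FF11}\x{FF12}\x{FF21}\x{FF22}";

# Half-width katakana KA followed by the half-width voiced sound mark.
my $halfwidth = "\x{FF76}\x{FF9E}";

# Form KC maps the full-width characters to plain ASCII.
print code_points(NFKC($fullwidth)), "\n";   # U+0031 U+0032 U+0041 U+0042

# The half-width pair becomes one full-width katakana GA.
print code_points(NFKC($halfwidth)), "\n";   # U+30AC
```

Note that the half-width pair is not only widened but also composed into a single character, because Form KC applies canonical composition after the compatibility decomposition.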

Regards, SADAHIRO Tomoyuki


