charnames and CJK
From: Tom Christiansen
Date: April 26, 2011 15:49
Subject: charnames and CJK
Message ID: 6799.1303858127@chthon
Cool, we get CJK (quasi-)names now.
% perl5.12.3 -Mcharnames=:full -E 'say charnames::viacode(0x547c) || "<missing>"'
<missing>
% blead -Mcharnames=:full -E 'say charnames::viacode(0x547c) || "<missing>"'
CJK UNIFIED IDEOGRAPH-547C
% perl -Mcharnames=:full -wE 'printf "U+%X\n", ord "\N{CJK UNIFIED IDEOGRAPH-547C}"'
Unknown charname 'CJK UNIFIED IDEOGRAPH-547C' at /usr/local/lib/perl5/5.12.3/unicore/Name.pl line 1
U+FFFD
% blead -Mcharnames=:full -wE 'printf "U+%X\n", ord "\N{CJK UNIFIED IDEOGRAPH-547C}"'
U+547C
It's not a new code point at all:
% blead uniprops -a 'CJK UNIFIED IDEOGRAPH-547C'
U+547C ‹呼› \N{CJK UNIFIED IDEOGRAPH-547C}
\w \pL \p{L_} \p{Lo}
All Any Alnum Alpha Alphabetic Assigned InCJK_UnifiedIdeographs CJK_Unified_Ideographs L Lo Gr_Base Grapheme_Base Graph
GrBase Han Hani ID_Continue IDC ID_Start IDS Ideo Ideographic Letter L_ Other_Letter Print UIdeo Unified_Ideograph
Word XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Word
Age=1.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=CJK_Unified_Ideographs Canonical_Combining_Class=0
Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None
East_Asian_Width=W East_Asian_Width=Wide EA=W Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Script=Han
Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup
Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=ID Line_Break=Ideographic LB=ID Numeric_Type=None NT=None
Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0
Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0
Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Han Script=Hani Sentence_Break=LE
Sentence_Break=OLetter SB=LE Word_Break=Other WB=XX Word_Break=XX _X_Begin
Since the code point has been there since Unicode 1.1 (Age=1.1 above),
I figure charnames just acts differently now. Is that right?
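
For what it's worth, the Unicode standard derives the names of CJK unified ideographs by rule from the code point rather than listing each one, so the output above is consistent with blead applying that rule in both directions. That mechanism is a guess and is not confirmed here. A minimal sketch of the rule follows. The helper names are hypothetical and not part of charnames, and the range is an assumption that covers only the main CJK Unified Ideographs block.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; this is not how charnames is
# implemented. The range is an assumption: it covers only the main
# CJK Unified Ideographs block, not the extension blocks.
my ($first, $last) = (0x4E00, 0x9FFF);
my $prefix = 'CJK UNIFIED IDEOGRAPH-';

# Code point to derived name: the prefix plus the code point in
# uppercase hex, padded to at least four digits.
sub cjk_name_from_code {
    my ($code) = @_;
    return undef unless $code >= $first && $code <= $last;
    return sprintf '%s%04X', $prefix, $code;
}

# Derived name back to code point: strip the prefix, parse the hex,
# and check that the result falls inside the assumed range.
sub cjk_code_from_name {
    my ($name) = @_;
    return undef unless $name =~ /\A\Q$prefix\E([0-9A-F]{4,6})\z/;
    my $code = hex $1;
    return ($code >= $first && $code <= $last) ? $code : undef;
}

printf "%s\n",   cjk_name_from_code(0x547C);
printf "U+%X\n", cjk_code_from_name('CJK UNIFIED IDEOGRAPH-547C');
```

Both functions return undef for anything outside the assumed range, so a caller can fall back to an ordinary table lookup for every other character.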
--tom