develooper Front page | perl.i18n | Postings from May 2006

Re: ordering Japanese

Barry Caplan
May 5, 2006 01:07
Ah! A non-Japanese speaker :)

Japanese is probably the world's most complex writing system.

It is a pretty safe bet that almost all kanji have more than one
pronunciation - the actual reading is very context dependent.

The only algorithm that is going to work is "brute force", and even then,
especially with names, I bet you won't cover everything. I recall
meeting a Japanese translator whose thesis showed that even Japanese
phone operators have something like (IIRC) a 70% chance of finding
a given phone number from the pronunciation alone. That was in the
early 90s, so maybe things have improved with increased automation, but
I am certain any code to do what you ask in the general case would be
far from trivial and would consist of maybe tens of thousands of special cases.
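
As a rough illustration of what "brute force" means here, the sketch below
sorts names by a hand-maintained table of readings. It is written in Python
only to keep it short and self-contained. The table READINGS, the helper
names, and the decision to compare hiragana by code point (which only
approximates the a-i-u-e-o order) are all assumptions made for illustration,
not an existing module:

```python
# Hypothetical, hand-maintained table: written form -> hiragana reading.
# In practice this is where the thousands of special cases would live.
READINGS = {
    "伊勢丹": "いせたん",
}

def kata_to_hira(text):
    """Fold katakana (U+30A1-U+30F6) onto hiragana by a fixed offset."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def reading_key(name):
    """Sort key: the listed reading if there is one, else the kana-folded name."""
    return kata_to_hira(READINGS.get(name, name))

names = ["伊勢丹", "そごう", "アペックス", "オクノ", "アミュプラザ", "スタンス"]
ordered = sorted(names, key=reading_key)
print(ordered)
```

Any name containing kanji that is missing from the table falls back to its
raw characters and ends up after all the kana, which is exactly the gap
that every new special case has to fill.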


Barry Caplan

Mike Barborak wrote:
> Thanks for the reply. And yes, that's the explanation that I got from my
> client - that the ordering should be based on the pronunciation of the
> Japanese and then based on this ordering:
> A I U E O
> Ka Ki Ku/Qu Ke Ko (Ca Ci Cu Ce Co if the sound is "K")
> Ga Gi Gu Ge Go
> Sa Shi Su Se So (Ca Ci Cu Ce Co if the sound is "S")
> Ja Ji Ju Je Jo
> Za Zi Zu Ze Zo
> Ta Ti Tu Tsu Te To
> Da Di Du De Do
> Na Ni Nu Ne No
> Ha Hi Hu He Ho
> Va Vi Vu Ve Vo
> Pa Pi Pu Pe Po
> Fa Fi Fu Fe Fo
> Ba Bi Bu Be Bo
> Ma Mi Mu Me Mo
> Ya Yi Yu Ye Yo
> Ra/La Ri/Li Ru/Lu Re/Le Ro/Lo
> Wa Wi Wu We Wo
> Nb
> So that makes sense to me. The problem is that I haven't found a
> programmatic way to do this. I've tried the Lingua::JA::Sort::JIS perl
> module which does a localized ordering but seemingly only with respect to
> katakana and not kanji. So I also tried the Unicode::Collate module and
> while that seems to support a great deal of localization, I can't get it to
> produce the desired ordering. So I'm not really sure if I should pursue
> those modules or do something else? My latest thought is to try to use a
> module like Lingua::JA::Romanize::Japanese which will convert the Japanese
> glyphs to a romanized pronunciation that I could then try to sort on but I
> kind of feel like I would be inventing something that someone else has
> probably already built. Any thoughts?
> Thanks,
> Mike
> -----Original Message-----
> From: Dr Bean
> Sent: Thursday, May 04, 2006 11:28 PM
> Subject: Re: ordering Japanese
> On Wed, 03 May 2006, Mike Barborak wrote:
>> Hi,
>> 1. 伊勢丹 JR京都店
>> 2. アペックス 福山
>> 3. アミュプラザ 鹿児島
>> 4. オクノ 旭川
>> 5. さくら野百貨店 仙台
>> 6. さつま屋 鹿児島
>> 7. スタンス 米子
>> 8. そごう 神戸店
>> 9. そごう 千葉店
>> 10. そごう 大宮店
>> 11. そごう 横浜店
>> 12. ダイアモンドシティアルル 橿原
>> 13. ニューズ 熊本
> Oops, I was reading this list in Big5. I guess iconv was doing its best.
> Looked at as UTF-8, 1. comes after 3. because the 'i' in 'Isetan' comes
> after the 'a' in 'Amyuplaza'.
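
For comparison with the list quoted above, here is a small Python sketch of
what a plain code point sort does with names like these. The sample list is
taken from the quoted names (store names only); nothing here uses a
collation library, which is an intentional simplification:

```python
names = ["伊勢丹", "アペックス", "さつま屋", "そごう", "ニューズ"]

# The default string sort compares Unicode code points: hiragana (U+3041...)
# come first, then katakana (U+30A1...), then kanji (U+4E00...).
by_code_point = sorted(names)
print(by_code_point)
```

The hiragana names come out first, the katakana names next, and 伊勢丹 last,
regardless of how any of them is pronounced.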
