perl.i18n | Postings from May 2006

Re: ordering Japanese

From: Kino
Date: May 4, 2006 21:17
Subject: Re: ordering Japanese
Message ID: C0198BD1-DC14-4067-8BFA-22FC2FDB4E59@rio.odn.ne.jp

On 4 May 2006, at 8:12, Mike Barborak wrote:

> Here is an example of 13 store names ordered with
> Lingua::JA::Sort::JIS::msort:
>
> 1. 伊勢丹 JR京都店
> 2. アペックス 福山
> 3. アミュプラザ 鹿児島
> 4. オクノ 旭川
> 5. さくら野百貨店 仙台
> 6. さつま屋 鹿児島
> 7. スタンス 米子
> 8. そごう 神戸店
> 9. そごう 千葉店
> 10. そごう 大宮店
> 11. そごう 横浜店
> 12. ダイアモンドシティアルル 橿原
> 13. ニューズ 熊本
>
> My client tells me that entry 1 should actually come after the 3rd
> entry and before the fourth.

He is right. Japanese words are usually sorted by their pronunciation
(reading), which is shown between [...] below.

1. [あぺっくす]	アペックス 福山*
2. [あみゅぷらざ]	アミュプラザ 鹿児島
3. [いせたんじぇいあーるきょうとてん]	伊勢丹 JR京都店
4. [おくの]	オクノ 旭川
5. [さくらのひゃっかてん]	さくら野百貨店 仙台
6. [さつまや]	さつま屋 鹿児島
7. [すたんす]	スタンス 米子
8. [そごうおおみやてん]	そごう 大宮店
9. [そごうこうべてん]	そごう 神戸店
10. [そごうちばてん]	そごう 千葉店
11. [そごうよこはまてん]	そごう 横浜店
12. [だいやもんどしてぃあるる]	ダイアモンドシティアルル 橿原
13. [にゅーず]	ニューズ 熊本

* If there is another アペックス, e.g. アペックス広島, you have to
use the reading of the full name, [あぺっくすふくやま], so that the two
branches are also ordered by pronunciation.
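A minimal sketch of sorting on the reading instead of the written form. It is in Python rather than Perl, and it assumes the readings are already available (here they are simply typed in from the list above, with the store names in the order msort produced). Plain Unicode code-point order on hiragana stands in for a proper Japanese collation such as the JIS X 4061 ordering that Lingua::JA::Sort::JIS implements; it happens to reproduce the order above, but it is only an approximation.

```python
# Sketch: order store names by a hand-supplied kana reading.
# Assumption: code-point order on hiragana approximates a real Japanese
# collation (e.g. JIS X 4061); it is NOT a full implementation and, for
# example, does not treat the long-vowel mark as a vowel.

stores = [
    ("いせたんじぇいあーるきょうとてん", "伊勢丹 JR京都店"),
    ("あぺっくす", "アペックス 福山"),
    ("あみゅぷらざ", "アミュプラザ 鹿児島"),
    ("おくの", "オクノ 旭川"),
    ("さくらのひゃっかてん", "さくら野百貨店 仙台"),
    ("さつまや", "さつま屋 鹿児島"),
    ("すたんす", "スタンス 米子"),
    ("そごうこうべてん", "そごう 神戸店"),
    ("そごうちばてん", "そごう 千葉店"),
    ("そごうおおみやてん", "そごう 大宮店"),
    ("そごうよこはまてん", "そごう 横浜店"),
    ("だいやもんどしてぃあるる", "ダイアモンドシティアルル 橿原"),
    ("にゅーず", "ニューズ 熊本"),
]


def sort_by_reading(entries):
    """Return display names ordered by reading, then by written name."""
    return [name for reading, name in sorted(entries)]


for i, name in enumerate(sort_by_reading(stores), 1):
    print(f"{i}. {name}")
```

Sorting (reading, name) pairs means two entries with an identical reading fall back to the written name; as the footnote says, it is better to make the reading itself include the branch name so such ties are settled by pronunciation.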

> From this description on manyogana, I'm thinking they're saying  
> that collation of the glyph 伊 should be based on its katakana  
> adaptation イ which makes sense:
>
> http://en.wikipedia.org/wiki/Manyogana

I'm not an expert in Japanese language and literature. But as far as
modern Japanese is concerned, I think it is inappropriate to
associate the pronunciation of a kanji (a Chinese character, or a
pseudo-Chinese character coined in Japan) with a man'yōgana. 伊勢 is a
common proper name and is pronounced いせ.

> 3. Is the solution to first convert the manyogana characters to  
> katakana and then do the msort?

Yes.
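Once every name has been turned into kana, katakana and hiragana still have to be folded into one script so they compare alike. A minimal sketch in Python, folding katakana into hiragana because the readings above are written in hiragana (the opposite direction works equally well if applied consistently). It relies on the main katakana block, U+30A1 to U+30F6, sitting exactly 0x60 code points above the matching hiragana; kanji still need a reading from elsewhere.

```python
# Sketch: fold katakana into hiragana so both scripts sort together.
# Only U+30A1..U+30F6 is shifted; the long-vowel mark (U+30FC),
# kanji and Latin letters are passed through unchanged.

KATAKANA_FIRST = 0x30A1  # small a in katakana
KATAKANA_LAST = 0x30F6   # small ke in katakana
OFFSET = 0x60            # distance from katakana to hiragana


def katakana_to_hiragana(text: str) -> str:
    return "".join(
        chr(ord(ch) - OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


print(katakana_to_hiragana("アペックス"))
print(katakana_to_hiragana("ニューズ"))
```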

> If so does anyone know of a Perl module to do this or a nice  
> reference that I could use more programmatically than the image on  
> the link above?

I don't know of one, and I'm afraid there is no such module. Giving a
pronunciation to every common kanji word would require a large
dictionary...
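To illustrate why a dictionary is needed, here is a minimal sketch in Python of dictionary-based reading lookup using greedy longest match. READINGS is a tiny, hypothetical dictionary covering only a few words from the example; a usable one would need entries for every common kanji word, and greedy longest match is only a simple stand-in, since a kanji string can have more than one reading depending on context.

```python
# Sketch: greedy longest-match lookup of kanji readings.
# READINGS is a hypothetical toy dictionary; unknown characters
# (e.g. kana already in the name) are copied through unchanged.

READINGS = {
    "伊勢丹": "いせたん",
    "伊勢": "いせ",
    "京都": "きょうと",
    "店": "てん",
    "百貨店": "ひゃっかてん",
    "野": "の",
}


def to_reading(text, dictionary=READINGS):
    """Replace the longest dictionary match at each position."""
    longest = max(map(len, dictionary))
    out = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in dictionary:
                out.append(dictionary[chunk])
                i += size
                break
        else:
            # No entry starts here: keep the character as it is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_reading("伊勢丹京都店"))
print(to_reading("さくら野百貨店"))
```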

> 4. Can anyone think of any other glyphs or classes of Japanese  
> glyphs similar to manyogana that I should be worried about?

Romaji: the JR in "JR京都店" in your example. It also has to be sorted
by its pronunciation (じぇいあーる), as in entry 3 above.
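A minimal sketch in Python of spelling out Latin letters as their kana letter names before sorting, so that "JR" is compared as じぇいあーる. LETTER_NAMES is a hypothetical, partial table covering only a few letters; a complete table, and the decision whether a given romaji string is spelled out letter by letter or read as a word, is up to the application.

```python
# Sketch: replace Latin letters with kana letter names.
# LETTER_NAMES is a hypothetical partial table for illustration only.

LETTER_NAMES = {
    "B": "びー",
    "C": "しー",
    "J": "じぇい",
    "R": "あーる",
}


def spell_out_latin(text):
    """Spell out known Latin letters; leave everything else unchanged."""
    return "".join(LETTER_NAMES.get(ch.upper(), ch) for ch in text)


print(spell_out_latin("JR京都店"))
```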



Kino

☯