develooper Front page | perl.macosx | Postings from April 2005

Re: character encoding on file upload name

From: Andrew Mace
Date: April 7, 2005 11:40
Subject: Re: character encoding on file upload name

With Randy's tip and my discovery of the Unicode::Normalize module, 
I've gotten things worked out.

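use Unicode::Normalize qw(compose);
use Encode qw(decode_utf8);
# ... (rest of the script; param() is presumably CGI.pm's)
# Decode the uploaded filename from UTF-8 bytes into a character string.
my $f = decode_utf8(param('file'));
# ... write out the file itself, with its name in decomposed UTF-8
# Recombine base letters and combining marks into precomposed characters.
$f = compose($f);
# ... now do something with the filename in composed UTF-8
# etc.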
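
For anyone who wants to try this outside a CGI script, here is a minimal, self-contained sketch of the same round trip. The filename and its raw bytes are hypothetical, made up for illustration, and the comparison with NFC() is an addition that is not part of the fix above; for a simple name like this one the two calls should give the same result.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);
use Unicode::Normalize qw(compose NFC);

# Hypothetical upload name: "cafe" + COMBINING ACUTE ACCENT (U+0301) + ".txt",
# written as the raw UTF-8 bytes a client might send for a decomposed name.
my $raw_bytes = "cafe\xCC\x81.txt";

my $decomposed = decode_utf8($raw_bytes);   # character string, still decomposed
my $composed   = compose($decomposed);      # canonical composition only
my $nfc        = NFC($decomposed);          # decompose, reorder, then compose

printf "decomposed: %d characters\n", length $decomposed;
printf "composed:   %d characters\n", length $composed;
print  "compose() and NFC() agree on this input\n" if $composed eq $nfc;

# Encode back to bytes before handing the name to anything byte-oriented.
my $composed_bytes = encode_utf8($composed);
```

If the incoming name might not already be fully decomposed and ordered, NFC() is probably the safer call, since compose() skips the decomposition and reordering steps.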

Thanks to everyone who helped out.  I'm not sure what to do with my day 
now.

Andrew



On Apr 7, 2005, at 1:57 PM, Randy Boring wrote:

>> I've noticed that the non-ASCII characters are getting split into
>> their base code points.  For example, U+00E9, Latin small letter E
>> with acute, becomes U+0065 U+0301 (unicode.org/charts/PDF/U0080.pdf).
>> Is there a way to easily recombine the code points to get the
>> original value?  It's strange to me that Encode::decode_utf8 doesn't
>> do this.  I thought diacritical marks were always combined with their
>> preceding letter, if possible.
>>
>> Andrew
>
> You've run into the particular format of HFS+ filenames.  It's not just
> any UTF-8 encoding: nearly all of the Unicode characters that are
> decomposable are stored decomposed, and must be so!
>
> In Apple's header files (CoreFoundation/CFStringEncodingExt.h), it's
> referred to as kUnicodeCanonicalDecompVariant.
> In NSString.h there are methods for
> decomposedStringWithCanonicalMapping (and precomposed- and
> -CompatibilityMapping).  How you get to them from Perl, though... maybe
> CamelBones?
>
> A description of this text encoding (and the reason for it) is found at
>   http://developer.apple.com/technotes/tn/tn1150.html
>
> see especially
>   http://developer.apple.com/technotes/tn/tn1150.html#HFSPlusNames
> and
>   http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties
>
>
> Hope that helps a little,
>
>  -Randy
>
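
The decomposition described in the quoted message can be checked from Perl without going through Cocoa. This is only a sketch using Unicode::Normalize's NFD() and NFC(); the code_points() helper is a hypothetical name introduced here for display. Note that TN1150 describes HFS+ decomposition as not quite identical to standard NFD, so take this as an illustration of the U+00E9 example only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical helper: list a string's code points in U+XXXX notation.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

my $composed   = "\x{00E9}";          # LATIN SMALL LETTER E WITH ACUTE
my $decomposed = NFD($composed);      # canonical decomposition
my $recomposed = NFC($decomposed);    # canonical composition

print 'original:   ', code_points($composed),   "\n";
print 'decomposed: ', code_points($decomposed), "\n";
print 'recomposed: ', code_points($recomposed), "\n";
```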

