perl.perl5.porters | Postings from May 2011

RE: Unicode Advice needed

From: Paul Marquess
Date: May 19, 2011 14:53
Subject: RE: Unicode Advice needed
Message ID: 006b01cc166f$24e4ff30$6eaefd90$@btinternet.com
From: Zefram [mailto:zefram@fysh.org] 
 
> >Am I correct in assuming that I can't automatically (and safely) 
> >determine whether the $name parameter is a candidate for storing as UTF8?
> 
> In principle anything is a candidate for storing as UTF-8.  If your choice
> is between storing as Latin-1 and storing as UTF-8, a more sensible question
> is whether you *need* to store as UTF-8 (because the string can't be
> represented as Latin-1).  You can test this with /[^\x00-\xff]/.

Correct - I only want to bother with encoding if it is actually needed.
UTF-8 support in Zip files is a relatively recent addition, and using it
adds extra bloat to the file created, so I don't want to do it if it isn't
necessary.

> You might alternatively decide to UTF-8-encode anything that's not pure
> ASCII, which you can similarly test with /[^\x00-\x7f]/.

The /[^\x00-\xff]/ suggestion sounds like it should be a sure-fire way to
say that the string needs to be stored as UTF-8, but the opposite obviously
isn't going to be true: a string that doesn't match isn't necessarily one
that shouldn't be stored as UTF-8.
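
To make the two tests concrete, here is a minimal sketch. The helper names
needs_utf8 and is_non_ascii and the sample filenames are made up for
illustration; they are not from any existing module. It also shows one case
where the first test can't help: a name that has already been encoded to
UTF-8 octets contains nothing above \xff, so it doesn't match.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if the string has a character above U+00FF,
# i.e. it cannot be represented as Latin-1 and must be stored as UTF-8.
sub needs_utf8 {
    my ($name) = @_;
    return $name =~ /[^\x00-\xff]/ ? 1 : 0;
}

# Hypothetical helper: true if the string is anything other than pure ASCII.
sub is_non_ascii {
    my ($name) = @_;
    return $name =~ /[^\x00-\x7f]/ ? 1 : 0;
}

my $wide  = "\x{263A}.txt";          # contains U+263A, outside Latin-1
my $latin = "caf\xe9.txt";           # e-acute fits in Latin-1
my $bytes = encode('UTF-8', $wide);  # same name, already encoded to octets

printf "%d %d %d\n", needs_utf8($wide), needs_utf8($latin), needs_utf8($bytes);
# prints "1 0 0"
printf "%d %d\n", is_non_ascii($latin), is_non_ascii("readme.txt");
# prints "1 0"
```

The third result is the awkward one: the octets in $bytes are UTF-8, but
nothing in the string itself says so.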

That means I'm going to have to add an option to allow the user to
explicitly flag that the filename should be encoded in UTF-8 before it gets
written to the file.
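
A rough sketch of what such an option could look like. Everything here is
hypothetical: the function name prepare_member_name, the utf8 option, and the
choice to fall back to UTF-8 automatically when the name can't be represented
as Latin-1 are assumptions for illustration, not the interface of any
existing module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the octets to write for the filename and a
# flag saying whether those octets are UTF-8.
#   utf8 => 1   caller explicitly asks for UTF-8 (assumed option name)
sub prepare_member_name {
    my ($name, %opt) = @_;

    # Characters above U+00FF cannot be stored as Latin-1 at all, so this
    # sketch assumes UTF-8 is forced for them even without the option.
    my $forced = $name =~ /[^\x00-\xff]/ ? 1 : 0;

    if ($opt{utf8} || $forced) {
        return (encode('UTF-8', $name), 1);
    }

    # Otherwise keep the compact one-octet-per-character form.
    return (encode('ISO-8859-1', $name), 0);
}

my ($octets, $is_utf8) = prepare_member_name("caf\xe9.txt", utf8 => 1);
printf "%d octets, utf8 flag %d\n", length($octets), $is_utf8;
# prints "9 octets, utf8 flag 1"
```

Without utf8 => 1 the same name comes back as 8 octets with the flag clear,
so nothing extra is written unless the caller asks for it or the name
requires it.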

Paul

