perl.perl5.porters | Postings from October 2011

Re: [perl #95160] Unicode readdir bugs

From: Aristotle Pagaltzis
Date: October 4, 2011 09:01
Subject: Re: [perl #95160] Unicode readdir bugs
Message ID: 20111004160139.GA16868@klangraum.plasmasturm.org

* Eric Brine <ikegami@adaelis.com> [2011-09-19 03:20]:
> File names are meant to be read as text, so one can't really claim
> they're just octet sequences. So the real question is what should we
> do when readdir encounters a file name that doesn't cleanly decode
> using the encoding it's expected to be encoded with (e.g. a file name
> that's not valid UTF-8 on a box with a UTF-8 locale).

One could take a page from Python here and use its surrogateescape
error handler (PEP 383). There was a subthread about it a while ago:
http://www.nntp.perl.org/group/perl.perl5.porters/;msgid=A8767ACF-E6A0-498A-B402-54A12D26523B@activestate.com

What this approach effectively does is let a string unambiguously
represent a mixture of bytes and characters, which in a roundabout way
solves the problem that Perl has only a single string type.
But do note the later message about the security implications. It will
take some thought to get this clean, but there is a lot of potential in
it.
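
For illustration, here is a minimal sketch of how the Python handler
behaves, written in Python 3 since Encode has no equivalent yet. The
file name and variable names are made up for the example; only the
"surrogateescape" handler itself comes from Python.

```python
# A hypothetical file name as raw bytes: valid UTF-8 for "café-" followed
# by a stray 0xFF byte, which can never appear in well-formed UTF-8.
raw_name = b"caf\xc3\xa9-\xff.txt"

# Strict decoding rejects the whole name.
try:
    raw_name.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# With surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate U+DC00 + 0xNN, so 0xFF turns into U+DCFF. The valid parts
# decode to ordinary characters.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the escaped string, so the smuggled byte does
# not silently end up in output that is supposed to be clean UTF-8.
try:
    text_name.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

The string carries both real characters and escaped bytes, and the
round trip is lossless as long as the same handler is used in both
directions.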

I love the idea and it is one of my todos to add this to Encode should
no one else get there first. The core could then use this method to
provide clean and nice interfaces to any OS APIs which are textual in
intent but binary in practice – as Python does.
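
As a rough sketch of what that looks like on the Python side:
os.fsdecode and os.fsencode are the helpers Python 3 exposes for
converting file names. The sketch assumes a POSIX system, where they
use the filesystem encoding together with surrogateescape; the function
name and the sample file name are invented for the example.

```python
import os


def round_trip_filename(raw: bytes) -> bytes:
    """Decode a raw file name to str and encode it back to bytes.

    Assumes a POSIX system, where os.fsdecode/os.fsencode apply the
    filesystem encoding with the surrogateescape error handler.
    """
    name = os.fsdecode(raw)   # always yields a str, even for odd bytes
    return os.fsencode(name)  # yields the original bytes again


# Hypothetical name containing a byte that is not valid UTF-8.
odd_name = b"report-\xff.txt"
same_bytes = round_trip_filename(odd_name) == odd_name
```

Code that only displays or compares names works with str throughout,
and the exact bytes are still available when the name is handed back
to the OS.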

It would be a major step forward for Perl.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
