
Re: on broken manpages, trolling, inconsistent implementation and the difficulty to fix bugs

Glenn Linderman
May 20, 2008 03:45
On approximately 5/20/2008 2:48 AM, came the following characters from 
the keyboard of Ben Morrow:
> Quoth (Tom Christiansen):
>> But I still think that you are asking a lot if you want to make the
>> claim that filenames as used to access the system's underlying files 
>> VIA ITS OWN INTERFACES are data rather than metadata.  And I don't think
>> that filesystem metadata is reliably treated as anything but bytes, at
>> least on systems with which I am conversant.
> But this is exactly where the thread started... Win32 (and, IIUC, other
> systems such as VMS) don't treat filenames as (sequences of) bytes, but
> as sequences of Unicode characters. Win32 at least also has two sets of
> APIs: one takes parameters in some currently-selected encoding and
> converts to Unicode for you (the 'ANSI' API), and one which takes
> arguments in Unicode (the 'Unicode' API).
> This leaves three possibilities. At the moment (I think), all IO happens
> through the ANSI API, which leaves Perl in the unfortunate position of
> being unable to open files with names that don't fit in the current
> character set.
> I believe what Jan was suggesting (please correct me if I've
> misunderstood) was
>     - filenames which are !SvUTF8 use the ANSI    API,
>     - filenames which are SvUTF8  use the Unicode API.

I didn't see that in any of Jan's messages. I don't think he ever 
mentioned the Unicode API.  In the long term, I think the ANSI API 
should only be used on Windows 9x systems that have no native or 
subsequently installed Unicode support.

> However, since this would mean that
>     my $fn = "\xe0";
>     open my $F, '<', $fn;
> and
>     my $fn = substr "\xe0\x{100}", 0, 1;
>     open my $F, '<', $fn;
> potentially opened different files, 

Under the assumption you stated above, about which filenames use which
APIs, this could be the case.  But it wouldn't be if, instead of your
assumptions, Perl always converted the string to Unicode and used the
Unicode API.
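
A minimal sketch of that point, using the two strings from the quoted example. The helper wide_name() is hypothetical: it only builds the NUL-terminated UTF-16LE buffer that a wide-character Win32 call would take, and does not call any Win32 API. It assumes a non-upgraded string is read as ISO 8859-1, which is how Encode treats it.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The two strings from the quoted example: the same single character,
# held in two different internal representations.
my $plain    = "\xe0";
my $upgraded = substr "\xe0\x{100}", 0, 1;

# Hypothetical helper (not part of Perl or of any Win32 module): build the
# NUL-terminated UTF-16LE buffer a wide-character call would expect.
sub wide_name {
    my ($name) = @_;
    return encode('UTF-16LE', $name . "\0");
}

printf "equal as Perl strings: %s\n", $plain eq $upgraded ? 'yes' : 'no';
printf "UTF8 flag:             %d vs %d\n",
    (utf8::is_utf8($plain) ? 1 : 0), (utf8::is_utf8($upgraded) ? 1 : 0);
printf "same wide buffer:      %s\n",
    wide_name($plain) eq wide_name($upgraded) ? 'yes' : 'no';
```

If the API were instead chosen by the UTF8 flag, the second printf shows the only thing that would tell the two strings apart.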

> Perl's auto-upgrading on Win32 would
> also have to be changed to use the current ANSI encoding instead of
> ISO8859-1. My complaint with this is that it means that when a string is
> upgraded, the values reported by 'chr' (for instance) will mysteriously
> and silently change.

Because it hasn't done that from day 1 of Unicode support, I think this 
would be inappropriate.
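
To illustrate what would change, here is a rough sketch that compares today's upgrade with an upgrade through an ANSI code page. The code page is assumed to be cp1252, and Encode's decode() stands in for the hypothetical ANSI-based upgrade; neither choice comes from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $byte = "\x80";

# Current behaviour: upgrading changes only the internal representation,
# so the code point reported by ord() stays the same.
my $upgraded = $byte;
utf8::upgrade($upgraded);

# Stand-in for an upgrade through the ANSI code page (assumed cp1252).
my $ansi = decode('cp1252', $byte);

printf "original:       U+%04X\n", ord $byte;
printf "utf8::upgrade:  U+%04X\n", ord $upgraded;
printf "cp1252 upgrade: U+%04X\n", ord $ansi;
```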

> The potential alternative I was proposing was that all filenames should
> be upgraded to SvUTF8 (using ISO8859-1, as currently) and then passed to
> the Unicode API. This has the advantage of maintaining current in-Perl
> string semantics, and the disadvantage of breaking all Win32 programs
> that currently use non-ASCII filenames.

Hmm, yes, this is the same alternative I mentioned just above! 
Answering sequentially, I guess.
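
As a sketch of which existing programs the quoted disadvantage would hit, the following compares the name the ANSI API would resolve to with the name produced by upgrading as ISO 8859-1. The two code pages (cp1252 and cp1251) and the file name are assumptions chosen for illustration, and Encode's decode() only models the conversion the ANSI API performs.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A file name as a legacy program might hold it: raw bytes.
my $name = "\xe0.txt";

# Name the ANSI API would resolve to under two assumed code pages.
my %ansi = map { $_ => decode($_, $name) } qw(cp1252 cp1251);

# Name under "upgrade as ISO 8859-1, then call the Unicode API".
my $latin1 = decode('iso-8859-1', $name);

for my $cp (sort keys %ansi) {
    printf "%s: first char U+%04X, %s\n", $cp, ord $ansi{$cp},
        $ansi{$cp} eq $latin1 ? 'same file' : 'different file';
}
```

Under these assumptions a cp1252 system would keep opening the same file for this name, while a cp1251 system would not.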

> I don't think there's any way forward without breaking *something*. The
> question is what will cause least damage.

What would break under my proposal?  And does it not go forward?  Or is 
it just too complex?

> Ben

Glenn
--
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
