On approximately 5/20/2008 2:48 AM, came the following characters from
the keyboard of Ben Morrow:
> Quoth tchrist@perl.com (Tom Christiansen):
>> But I still think that you are asking a lot if you want to make the
>> claim that filenames as used to access the system's underlying files
>> VIA ITS OWN INTERFACES are data rather than metadata. And I don't think
>> that filesystem metadata is reliably treated as anything but bytes, at
>> least on systems with which I am conversant.
>
> But this is exactly where the thread started... Win32 (and, IIUC, other
> systems such as VMS) don't treat filenames as (sequences of) bytes, but
> as sequences of Unicode characters. Win32 at least also has two sets of
> APIs: one takes parameters in some currently-selected encoding and
> converts to Unicode for you (the 'ANSI' API), and one which takes
> arguments in Unicode (the 'Unicode' API).
>
> This leaves three possibilities. At the moment (I think), all IO happens
> through the ANSI API, which leaves Perl in the unfortunate position of
> being unable to open files with names that don't fit in the current
> character set.
>
> I believe what Jan was suggesting (please correct me if I've
> misunderstood) was
>
>     - filenames which are !SvUTF8 use the ANSI API,
>     - filenames which are SvUTF8 use the Unicode API.

I didn't see that in any of Jan's messages. I don't think he ever
mentioned the Unicode API. In the long term, I think the ANSI API should
only be used on Windows 9x systems that have no native or subsequently
installed Unicode support.

> However, since this would mean that
>
>     my $fn = "\xe0";
>     open my $F, '<', $fn;
>
> and
>
>     my $fn = substr "\xe0\x{100}", 0, 1;
>     open my $F, '<', $fn;
>
> potentially opened different files,

Under the assumption that you stated above, about which filenames used
which APIs, this could be the case. But it wouldn't be if, instead of
your assumptions, Perl always converted the string to Unicode and used
the Unicode API.
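
To make the point about the two snippets concrete, here is a minimal
sketch (an illustration only, not code from the thread). It shows that
the two strings compare equal in Perl while differing in the internal
SvUTF8 flag, and that converting both to UTF-16LE gives identical bytes.
The UTF-16LE step is an assumption about what "always converted the
string to Unicode" could look like; no Win32 call is made, and the
variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The two filenames from the quoted example: both hold the single
# character U+00E0, but they are stored differently inside Perl.
my $bytes    = "\xe0";                      # not flagged SvUTF8
my $upgraded = substr "\xe0\x{100}", 0, 1;  # flagged SvUTF8

# Perl-level string comparison sees no difference.
my $same_string = $bytes eq $upgraded ? 1 : 0;

# The internal flag is what an "ANSI vs Unicode API" rule would key on.
my $bytes_flag    = utf8::is_utf8($bytes)    ? 1 : 0;
my $upgraded_flag = utf8::is_utf8($upgraded) ? 1 : 0;

# Assumption: a Unicode-only path would convert every name to UTF-16LE.
# Both strings then yield the same bytes, regardless of the flag.
my $wide_bytes    = encode('UTF-16LE', $bytes);
my $wide_upgraded = encode('UTF-16LE', $upgraded);
my $same_wide     = $wide_bytes eq $wide_upgraded ? 1 : 0;

# prints "eq: 1, flags: 0 vs 1, same UTF-16LE: 1"
printf "eq: %d, flags: %d vs %d, same UTF-16LE: %d\n",
    $same_string, $bytes_flag, $upgraded_flag, $same_wide;
```

Under a rule that picks the API from the flag, these two equal strings
could name different files; under an always-convert rule they cannot.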

> Perl's auto-upgrading on Win32 would
> also have to be changed to use the current ANSI encoding instead of
> ISO8859-1. My complaint with this is that it means that when a string is
> upgraded, the values reported by 'chr' (for instance) will mysteriously
> and silently change.

Because it hasn't done that from day 1 of Unicode support, I think this
would be inappropriate.

> The potential alternative I was proposing was that all filenames should
> be upgraded to SvUTF8 (using ISO8859-1, as currently) and then passed to
> the Unicode API. This has the advantage of maintaining current in-Perl
> string semantics, and the disadvantage of breaking all Win32 programs
> that currently use non-ASCII filenames.

Hmm, yes, this is the same alternative I mentioned just above!
Answering sequentially, I guess.

> I don't think there's any way forward without breaking *something*. The
> question is what will cause least damage.

What would break under my proposal? And does it not go forward? Or is
it just too complex? (see
<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-05/msg00570.html>)

> Ben

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking