perl.perl5.porters | Postings from November 2008

[perl #60888] Win32: support full unicode in filenames (use Wide-system calls)

November 28, 2008 05:01
Message ID:
# New Ticket Created by  "ab" 
# Please include the string:  [perl #60888]
# in the subject line of all future correspondence about this issue. 
# <URL: >

OS: Windows XP (German); cp1252

There are two problems with the encoding of filenames on Windows:
1) cp1252 != Latin-1, but perl treats them as the same:
   for example, filenames returned by readdir (in cp1252) are silently interpreted as Latin-1,
   but some characters, such as the Euro sign, are encoded differently, so the result is a wrong/unusable filename.
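A small Python sketch of the byte-level mismatch (Python is used only for illustration; the filename bytes are a hypothetical example of what an ANSI API might return on a cp1252 system):

```python
# The same filename bytes decoded as Windows-1252 vs. ISO-8859-1 (Latin-1).
raw = b"Preis \x80.txt"  # hypothetical cp1252 filename bytes; 0x80 is the Euro sign in cp1252

as_cp1252 = raw.decode("cp1252")   # correct interpretation on a cp1252 system
as_latin1 = raw.decode("latin-1")  # what happens if the bytes are treated as Latin-1

print(repr(as_cp1252))  # 'Preis €.txt'
print(repr(as_latin1))  # 'Preis \x80.txt'  (U+0080 is a C1 control character, not the Euro sign)

# The two encodings only disagree in the 0x80-0x9F range
# (a few bytes there are undefined in cp1252 and decode to U+FFFD with errors="replace").
differing = [b for b in range(256)
             if bytes([b]).decode("cp1252", errors="replace") != bytes([b]).decode("latin-1")]
print(f"{differing[0]:#x}..{differing[-1]:#x}")  # 0x80..0x9f
```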

Note: the error may be invisible if the function that uses the filename again
silently applies the inverse conversion. However, if I use the filename anywhere else (print it to a UTF-8 text file,
pass it to a direct Win32 API call, ...), it is wrong.
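Why the error can stay hidden, sketched in Python under the same assumption (hypothetical cp1252 bytes): the Latin-1 round trip restores the original bytes, but any other output encoding exposes the wrong character.

```python
raw = b"Preis \x80.txt"  # hypothetical cp1252 filename bytes as returned by readdir

name = raw.decode("latin-1")        # wrong interpretation (Latin-1 instead of cp1252)
roundtrip = name.encode("latin-1")  # inverse conversion, e.g. when the name is passed back to open()
print(roundtrip == raw)             # True: the same bytes reach the file system, so no visible error

utf8_out = name.encode("utf-8")                  # e.g. writing the name to a UTF-8 text file
expected = raw.decode("cp1252").encode("utf-8")  # what the file should actually contain
print(utf8_out)  # b'Preis \xc2\x80.txt'      (U+0080, a control character)
print(expected)  # b'Preis \xe2\x82\xac.txt'  (U+20AC, the Euro sign)
```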

2) Unicode characters outside the ANSI code page (cp1252) cannot be used in filenames
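A Python sketch of the second problem, using a hypothetical filename with Greek and CJK characters: such a name has no cp1252 byte representation at all, so any interface limited to the ANSI code page loses information, while UTF-16 (what the W calls take) represents it exactly.

```python
name = "\u03a9mega-\u65e5\u672c.txt"  # hypothetical filename 'Ωmega-日本.txt'

try:
    name.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False: no cp1252 bytes exist for these characters

# Python's errors="replace" substitutes '?', so the original name is lost.
lossy = name.encode("cp1252", errors="replace")
print(lossy)  # b'?mega-??.txt'

# UTF-16LE, the encoding wide APIs use, round-trips the name without loss.
utf16 = name.encode("utf-16-le")
print(utf16.decode("utf-16-le") == name)  # True
```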

Since perl supports UTF-8 strings internally, filenames should be correct UTF-8 strings
(for opendir, open, stat, readdir, -d, -e, etc.). Currently this is not the case:
WinAnsi cp1252 byte strings are interpreted as Latin-1 (and vice versa),
which causes the problem described above.

NTFS supports Unicode filenames, and the Win32 API has "wide" system calls (suffix W,
e.g. CreateFileW, FindFirstFileW).

So perl should switch to these wide system calls (only a UCS-2 <=> UTF-8 conversion would remain to be done);
that would solve both problems above.
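The remaining conversion step can be sketched in Python. The helper names `utf8_to_wide` and `wide_to_utf8` are hypothetical; they only show the UTF-8 <=> UTF-16LE transformation (with the NUL terminator that W functions such as CreateFileW expect), not how perl would actually wire it into its Win32 layer. This sketch uses UTF-16 rather than strict UCS-2, so characters beyond U+FFFF become surrogate pairs.

```python
def utf8_to_wide(path_utf8: bytes) -> bytes:
    """Hypothetical helper: UTF-8 path -> NUL-terminated UTF-16LE buffer for a W API."""
    return path_utf8.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def wide_to_utf8(buf: bytes) -> bytes:
    """Hypothetical helper: NUL-terminated UTF-16LE buffer (e.g. from FindFirstFileW) -> UTF-8."""
    text = buf.decode("utf-16-le")   # strict decoding: raises on malformed input
    text = text.split("\x00", 1)[0]  # stop at the first NUL code unit
    return text.encode("utf-8")


wide = utf8_to_wide("Preis \u20ac.txt".encode("utf-8"))
print(wide_to_utf8(wide).decode("utf-8"))  # Preis €.txt

# A character outside the BMP (U+1D11E) is encoded as the surrogate pair D834 DD1E.
print(utf8_to_wide("\U0001d11e".encode("utf-8")))  # b'4\xd8\x1e\xdd\x00\x00'
```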

[ActivePerl 5.8.8, 5.10.0]

    severity=medium