
Better Win32 filename handling?

Christian Walde
August 19, 2018 00:47
This is just a quick shot to get some background info and opinions not obvious from the repo history.

Currently on Windows, if you call readdir and encounter a file with, say, kanji in its name, you get roughly this back: "?????.txt". On Linux the result wouldn't be perfect either, but at least you'd get the complete byte string of the filename and could process it afterwards. On Windows such filenames just get completely trashed and are unusable.

Internally, modern Windows stores all filenames in UTF-16 and expects the developer to use the wide-character versions of the file I/O functions. So I checked the source, and it turns out that win32.c already uses e.g. FindNextFileW to implement readdir. However, it ALSO runs the string it gets from that through WideCharToMultiByte(CP_ACP, ...), which converts it to the local default Windows ANSI code page. In practice that means all characters which look at least similar to characters in the local ANSI code page are converted to those, and everything else is converted into a question mark.
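To make the data loss concrete, here is a minimal, portable sketch of what such a narrowing conversion does to a UTF-16 name. It is not the code in win32.c and does not call the Windows API: narrow_lossy is a hypothetical stand-in that assumes a Latin-1-like single-byte code page and does not model the "similar-looking character" substitution described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a lossy wide-to-single-byte conversion.
 * Assumption: the target code page is Latin-1-like, so code units
 * U+0000..U+00FF map to one byte each and everything else becomes '?'.
 * Real Windows code pages differ in detail.
 * Returns the number of bytes written, not counting the trailing NUL. */
size_t narrow_lossy(const uint16_t *wide, size_t wide_len,
                    char *out, size_t out_size)
{
    assert(wide != NULL && out != NULL && out_size > 0);

    size_t n = 0;
    for (size_t i = 0; i < wide_len && n + 1 < out_size; i++) {
        uint16_t u = wide[i];
        /* Anything that does not fit in one byte is irrecoverably lost. */
        out[n++] = (u <= 0xFF) ? (char)(unsigned char)u : '?';
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the seven code units of a name made of three kanji followed by ".txt" leaves only the extension intact, and the original name cannot be reconstructed from the result, so the file can no longer be opened by that name.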

Jan Dubois implemented things this way in 2006 but didn't provide much explanation in the commit message. I suspect it may be for compatibility with older Windows.

A naive way of fixing this would probably be to simply remove the calls to WideCharToMultiByte from this and other I/O-related functions.

It might break some scripts that modify filenames as if they were single-byte strings, but this might be acceptable, given that it would make it possible to actually handle full UTF-8 filenames in Perl on Windows without writing complete duplicates of core code in XS and adding a myriad of special cases around Path::Tiny, IO::All and friends.
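For comparison, here is a sketch of a lossless way to carry a UTF-16 name in a byte string, namely encoding it as UTF-8. This is only an illustration of the idea, not a patch: utf16_to_utf8 is a hypothetical helper, and replacing unpaired surrogates with U+FFFD is my assumption rather than anything the existing code does. On Windows itself, WideCharToMultiByte accepts CP_UTF8 as a code page, which may be a more natural route than a hand-written encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode UTF-16 code units as UTF-8.
 * Returns the number of bytes written (not counting the trailing NUL),
 * or (size_t)-1 if the output buffer is too small.
 * Assumption: unpaired surrogates are replaced with U+FFFD. */
size_t utf16_to_utf8(const uint16_t *in, size_t in_len,
                     char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);

    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in_len &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            /* Combine a high/low surrogate pair into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00u);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need + 1 > out_size)
            return (size_t)-1;

        if (need == 1) {
            out[n++] = (char)cp;
        } else if (need == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

With this encoding a kanji takes three bytes instead of collapsing to a single question mark, which is exactly why scripts that treat filenames as single-byte strings could be affected.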


With regards,
Christian Walde
