On 4 October 2016 at 15:10, Klemen Marković <Klemen.Markovic@comtrade.com> wrote:
> Hello,
>
> We use perl as it allows us to use the same scripts on Windows and UX
> instead of relying on bash scripts for UX and something else for Windows.
> We were surprised to find out that we are unable to use Unicode input
> arguments and environment variables on Windows, like we do on UX platforms.
>
> Example:
>
> 1. open windows cmd.exe
> 2. chcp 65001
> 3. set VARIABLE=简化字
> 4. perl somescript.pl 简化字 (containing: print $ENV{VARIABLE}, @ARGV)
> 5. we do not receive proper characters on output
>
> Reasons why this happens are:
> - perl.exe uses main(int argc, char **argv) and not
>   wmain(int argc, wchar_t **argv)
> - perl.exe calls GetEnvironmentVariableA instead of
>   GetEnvironmentVariableW
> - perl.exe calls CreateProcessA instead of calling CreateProcessW and
>   specifying CREATE_UNICODE_ENVIRONMENT, therefore processes spawned by
>   perl will also not have Unicode environment variables and input
>   arguments
>
> On Windows, the Wide versions of all the APIs have to be used to have
> Unicode support, because the Windows kernel internally uses UCS2/UTF16, so
> in order to not lose any information, conversions from UTF8 to UTF16 and
> vice versa need to be performed. Using narrow characters in Windows results
> in loss of Unicode (even though UTF8 is used as input in the console).
>
> We have made these modifications ourselves, but we'd rather see that they
> get into the next version of perl, so we do not have to do this with each
> upgrade.
> Are there any design limitations to implementing this?
>
> We also noticed that most of the Windows W APIs were removed in favor of A
> APIs since v5.8.7, where we first encountered these issues. What was the
> reasoning for it?

I see it was removed from 5.9.3 (and presumably later backported to 5.8.8) by

http://perl5.git.perl.org/perl.git/commit/8c56068e9474ff1eb28abd58496550d54581dd25

which was mentioned briefly on p5p here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2005-11/msg00240.html

where it says, "the code never really worked anymore ever since Perl added UTF8 support for PVs internally. Therefore I think it is high time to get rid of it to improve maintainability (and the sharing with cygwin)."

The use of the wide APIs had been switched off since 2003 by

http://perl5.git.perl.org/perl.git/commit/581883cdf264875c9c1f1fd2c8d45ef942f553c1

There was brief talk of reviving it via a -C switch here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-02/msg00320.html

but it didn't happen:

http://perl5.git.perl.org/perl.git/commit/a05d7ebb5e798334196e3cff205b658506cc4384

I think it was basically a hang-over from 5.6 days and never really worked properly with the Unicode changes in 5.8.

There have been numerous discussions on p5p and the perl-unicode list regarding what to do about it in the future, e.g. see the huge thread starting at

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00742.html

The current consensus is still what Jan Dubois mentioned in that thread (at http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00745.html), namely to virtualize operating system access:

http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod#l1048
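
For anyone following along, the information loss described in the quoted message can be illustrated outside Perl. The following is only a minimal Python sketch, not Perl's or Windows' actual code path: cp1252 is assumed here as a stand-in for a typical Western ANSI code page, and the function names (via_utf16, via_ansi, utf8_to_utf16) are made up for the illustration. It shows that text survives a round trip through UTF-16, while a round trip through a narrow code page replaces characters that code page cannot represent.

```python
# Sketch: why a narrow ("A") code-page round trip can lose text that
# a UTF-16 ("W") round trip keeps. Helper names are hypothetical.

def via_utf16(text: str) -> str:
    """Round-trip through UTF-16LE, the encoding used by the wide APIs."""
    return text.encode("utf-16-le").decode("utf-16-le")


def via_ansi(text: str, codepage: str = "cp1252") -> str:
    """Round-trip through a narrow code page (cp1252 is an assumption).

    Characters the code page cannot represent are replaced with '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


def utf8_to_utf16(raw: bytes) -> bytes:
    """Convert UTF-8 bytes (e.g. console input under chcp 65001) to UTF-16LE."""
    return raw.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    value = "简化字"  # the value used in the quoted example
    print(via_utf16(value) == value)                    # wide path keeps the text
    print(via_ansi(value) == value)                     # narrow path does not
    print(len(utf8_to_utf16(value.encode("utf-8"))))    # size in UTF-16 bytes
```

The three characters in the example all lie in the Basic Multilingual Plane, so each one takes a single 16-bit code unit in UTF-16; none of them exists in cp1252, which is why the narrow round trip cannot preserve them.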