Re: Windows Unicode Environment variables and Input arguments

From: Steve Hay via perl5-porters
Date: October 5, 2016 17:23
Subject: Re: Windows Unicode Environment variables and Input arguments
Message ID: CADED=K6PPFcOwNOiiZxj27jm0W5P1FV+kcxvvfsb-5rP465KpA@mail.gmail.com

On 4 October 2016 at 15:10, Klemen Marković
<Klemen.Markovic@comtrade.com> wrote:
> Hello,
>
> We use perl because it lets us run the same scripts on Windows and UX,
> instead of relying on bash scripts for UX and something else for Windows.
>
> We were surprised to find that we cannot use Unicode command-line
> arguments and environment variables on Windows the way we do on UX
> platforms.
>
> Example:
>
> 1. Open a Windows cmd.exe console.
> 2. Run chcp 65001 (switches the console to the UTF-8 code page).
> 3. set VARIABLE=简化字
> 4. perl somescript.pl 简化字 (where somescript.pl contains:
>    print $ENV{VARIABLE}, @ARGV)
> 5. The output does not contain the correct characters.
>
> This happens because:
>
> - perl.exe uses main(int argc, char **argv) rather than wmain(int argc,
>   wchar_t **argv)
> - perl.exe calls GetEnvironmentVariableA instead of GetEnvironmentVariableW
> - perl.exe calls CreateProcessA instead of calling CreateProcessW and
>   specifying CREATE_UNICODE_ENVIRONMENT, so processes spawned by perl
>   will not have Unicode environment variables and arguments either
>
> On Windows, the wide ("W") versions of the APIs must be used to get
> Unicode support, because the Windows kernel uses UCS-2/UTF-16 internally.
> To avoid losing information, text has to be converted from UTF-8 to
> UTF-16 and back. Using narrow characters on Windows loses Unicode data,
> even when UTF-8 is used as input in the console.
>
> We have made these modifications ourselves, but we would rather see them
> in the next version of perl, so we do not have to redo them with each
> upgrade.
>
> Are there any design limitations that prevent implementing this?
>
> We also noticed that most uses of the Windows W APIs were replaced by the
> A APIs as of v5.8.7, which is where we first encountered these issues.
> What was the reasoning for this?
>

I see it was removed in 5.9.3 (and presumably later backported to 5.8.8) by

http://perl5.git.perl.org/perl.git/commit/8c56068e9474ff1eb28abd58496550d54581dd25

which was mentioned briefly on p5p here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2005-11/msg00240.html

where it says, "the code never really worked anymore ever since Perl
added UTF8 support for PVs internally.  Therefore I think it is high
time to get rid of it to improve maintainability (and the sharing with
cygwin)."

The use of the wide APIs had been switched off since 2003 by

http://perl5.git.perl.org/perl.git/commit/581883cdf264875c9c1f1fd2c8d45ef942f553c1

There was brief talk of reviving it via a -C switch here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-02/msg00320.html

but it didn't happen:

http://perl5.git.perl.org/perl.git/commit/a05d7ebb5e798334196e3cff205b658506cc4384

I think it was basically a hang-over from 5.6 days and never really
worked properly with the Unicode changes in 5.8. There have been
numerous discussions on p5p and the perl-unicode list regarding what
to do about it in the future, e.g. see the huge thread starting at

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00742.html

The current consensus is still what Jan Dubois mentioned in that
thread (at http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00745.html),
namely to virtualize operating system access:

http://perl5.git.perl.org/perl.git/blob/HEAD:/Porting/todo.pod#l1048
