perl.perl5.porters | Postings from June 2012

RE: [perl #113536] GetEnvironmentStrings() mess up the values for non-ascii strings

Steve Hay
June 14, 2012 00:56
bulk88 via RT wrote on 2012-06-13:
> Okay, here is the problem with the patch: the env var is NOT
> corrupted only if the env var is a utf8 char that can be
> represented in the current system high ascii CP. If the utf8 char can
> not be represented in the current system high ascii CP, it is
> converted to 0x3F. On pre-patch Perls, the previous case results in a
> different letter, but not "?", appearing in the child process.

Thanks for the sample programs. I intend to have a look through them when I have more time, but if I understand you correctly the problem here is only that things are still not working if the environment contains characters which cannot be represented in the current ANSI codepage?

That is to be expected, given the wide-char (UTF-16) to ANSI codepage conversion being done. The ? character, of course, comes from the WideCharToMultiByte() calls when they fail to map a Unicode character to the target encoding. On pre-patch perls something different happens because GetEnvironmentStringsA() was being used. That returns strings in the OEM codepage and presumably has its own algorithm for handling Unicode characters that cannot be mapped to that encoding, hence different characters (but still the wrong ones!) appear instead of ?.
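As a rough illustration of that conversion (a model, not the patch itself), Python's codecs behave similarly. Encoding a Unicode string into a single-byte ANSI codepage with `errors="replace"` substitutes `?` for anything the codepage cannot represent, much as WideCharToMultiByte() falls back to its default character. The choice of cp1252 as the ANSI codepage and the helper name `to_ansi` are assumptions for this sketch.

```python
def to_ansi(value: str, codepage: str = "cp1252") -> bytes:
    """Encode a Unicode string into a single-byte (ANSI-style) codepage.

    Hypothetical helper: characters the codepage cannot represent become
    b"?", modelling (not reproducing) WideCharToMultiByte()'s behaviour
    when it has no mapping for a character.
    """
    return value.encode(codepage, errors="replace")


print(to_ansi("caf\u00e9"))      # e-acute exists in cp1252 -> b'caf\xe9'
print(to_ansi("\u4e2d\u6587"))   # CJK is not in cp1252     -> b'??'
```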

I still think the patch is worthwhile (with the fix I posted previously) because it fixes the common case where the environment contains characters which *are* in the current ANSI codepage but are either not in the current OEM codepage (so the Unicode-to-OEM conversion in GetEnvironmentStringsA() would have failed) or else (more commonly) in the OEM codepage but at a different byte position (so perl then treated the OEM bytes as if they were the ANSI bytes it normally uses).
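Both pre-patch failure modes can be modelled in the same way. This sketch assumes cp850 as the OEM codepage and cp1252 as the ANSI codepage (a typical Western European pairing; the real codepages depend on the system locale). `misread_oem_as_ansi` is a hypothetical helper, not perl code.

```python
def misread_oem_as_ansi(value: str, oem: str = "cp850", ansi: str = "cp1252") -> str:
    # Hypothetical helper: produce OEM-codepage bytes (as the old
    # GetEnvironmentStringsA() path is described as doing), then decode
    # those same bytes as if they were in the ANSI codepage.
    oem_bytes = value.encode(oem, errors="replace")
    return oem_bytes.decode(ansi, errors="replace")


# Case 1: e-acute is in both codepages, but at different byte positions.
print("caf\u00e9".encode("cp850"))             # b'caf\x82'
print(ascii(misread_oem_as_ansi("caf\u00e9")))  # 'caf\u201a', not 'caf\xe9'

# Case 2: the euro sign is in cp1252 but not in cp850, so it is lost.
print(misread_oem_as_ansi("\u20ac"))           # ?
```

In the first case the byte 0x82 means e-acute in cp850 but a low single quotation mark in cp1252, which is why the wrong (but not `?`) character shows up.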

So the only remaining problem is what happens when characters outside the current ANSI codepage appear in the environment, and that would be worth logging as a separate bug. It isn't such a common case, and there are likely to be problems in other areas of the code in that case anyway. For example, you can't open a file whose name contains characters outside the ANSI codepage with the built-in open() function; you have to jump through some hoops instead.

If you agree and have no other issues with the patch then I'll apply it and raise the non-ANSI characters problem as a separate bug (and attach your scripts etc).
