perl.perl5.porters | Postings from September 2018

[perl #133496] @ARGV, -CA and Win32

From: Tony Cook via RT
Date: September 11, 2018 02:01
Subject: [perl #133496] @ARGV, -CA and Win32
Message ID: rt-4.0.24-22928-1536631275-1000.133496-15-0@perl.org

On Thu, 06 Sep 2018 11:44:09 -0700, me@xenu.pl wrote:
> On Wed, 05 Sep 2018 16:20:03 -0700
> "Tony Cook via RT" <perlbug-followup@perl.org> wrote:
> 
> > Also, it breaks embedding, so don't apply this patch.
> >
> > Maybe an alternative is to not make it depend on the -CA switch, but
> > on the current code page.
> >
> > If the current code page is 65001 then main() (in win32/runperl.c)
> > could do the conversion to utf-8 I do in my patch.
> >
> > The program then depends on the normal -CA behaviour to treat that as
> > UTF-8, so perl code sees Unicode in @ARGV.
> >
> > It does mean that a user has to do something unusual (chcp 65001) to
> > get reasonable behaviour.
> >
> > Tony
> 
> You mean the console codepage? There are some problems with that approach.
> 
> Console codepages don't exist in Windows-subsystem applications (like
> wperl.exe); GetConsoleCP() returns 0 in them:
> 
> C:\Users\xenu>wperl -MWin32 -E "open my($fh), '>', 'a.txt'; print {$fh} Win32::GetConsoleCP()"
> C:\Users\xenu>type a.txt
> 0
> 
> Another problem is that it won't cover situations where it's impossible
> to change the console codepage, for example when perl.exe is launched via
> explorer.exe (e.g. via a .lnk shortcut or when a file extension is
> associated with a perl script).
> 
> I think that the only reasonable way to fix the win32 Unicode bug is to
> introduce a way to globally force UTF-8 everywhere, i.e. @ARGV,
> filenames and env variables. The -C flag used to serve this exact
> purpose[1], but this functionality was removed in 5.8.1.

The argv handling looks similar to what my patch does, with the same problem for embedding.

The wide system call handling appears to assume that all SVs are UTF-8 encoded, even without the SVf_UTF8 flag set.

> 
> IMO we should reintroduce that switch.
> 
> On second thought, I think that in the long run we should enable Unicode
> handling by default and add a switch that would restore the old
> behaviour for scripts that rely on it. IMO that would be the most
> reasonable approach, because the current behaviour is *completely*
> broken and I'm pretty sure that changing it would fix more code than it
> would break.

I think fixing @ARGV would be reasonably painless for backcompat, but the rest of the wide-character support is too likely to break things.

I wonder how much CPAN testing was done with -C for perl 5.8.0.

Tony

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=133496


