
Re: "use v5.36.0" should imply ASCII source

From:
Felipe Gasper
Date:
October 4, 2021 13:21
Subject:
Re: "use v5.36.0" should imply ASCII source
Message ID:
E4B63395-FF5D-4A9E-B1B9-4435162E25CE@felipegasper.com

> On Oct 4, 2021, at 4:45 AM, Yuki Kimoto <kimoto.yuki@gmail.com> wrote:
> 
> 
> 2021-10-4 3:57 Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote:
> 
> ONE:  What's the end state we'd like to get to?
> 
> 
>  I have a question.
> 
>   echo -e '１' | perl -p -E 's/\d/1/'
> 
> '１' of echo argument is Japanese UTF-8. Output is ASCII 1.
> 
> Current Output(UTF-8 １)
> 
>   １
> 
> Ideal Output(ASCII 1)
> 
>   1
> 
> Do you want this to work ideally in the UNIX/Linux system?

For that to happen you would pass the `-CIO` flag to perl, which makes it decode STDIN from UTF-8 and encode STDOUT as UTF-8 automatically.

The one-liner as-is outputs "\xef\xbc\x91" (U+FF11 in UTF-8) instead of ASCII 1 because those 3 bytes are what Perl receives on STDIN, and nothing decodes them to U+FF11. Perl therefore sees three separate characters: U+00EF, U+00BC, and U+0091. Your s/\d/1/ only matches *digits*, and none of those three is one, so the substitution never fires and the input passes through unchanged.

-FG