
Re: "use v5.36.0" should imply ASCII source

From: Yuki Kimoto
Date: October 7, 2021 06:06
Subject: Re: "use v5.36.0" should imply ASCII source
Message ID: CAExogxOazB+dcHfKE9cFpJ1a5NxPF+K+8-iYoVxO9WKGT5XzEg@mail.gmail.com

Ricardo asked: "ONE:  What's the end state we'd like to get to?"

Here is what I think the goal should be.

1. In the future, "use vX" enables utf8 by default.

2. One-liners can handle UTF-8 in the same way as a normal Perl program
that uses "use utf8", "Encode::decode", and "Encode::encode".

Point 2 needs a little more explanation.

----------------------------------------------------
The source is UTF-8. Input strings are decoded from UTF-8 (arguments (A),
stdin (I), input file streams (i)), and output strings are encoded to
UTF-8 (stdout (O), stderr (E), output file streams (o)).

In a one-liner, I currently have to write it as follows. -CSAD is the same
as -CIOEAio.

  echo -e '1あい' | perl -Mutf8 -CSAD -p -e 's/\d\wい/1ai/'

Input

  1あい

Output

  1ai

The replacement is successful as expected.
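
For comparison, here is a minimal sketch of what the same replacement might
look like as a normal Perl program that uses "use utf8", "Encode::decode",
and "Encode::encode" explicitly. The helper name replace_line is
hypothetical, and the sample input is hard-coded as raw bytes instead of
being read from stdin, so treat this as an illustration and not an exact
equivalent of -CSAD.

```perl
use strict;
use warnings;
use utf8;                        # the literal い in the regex is read as a character
use Encode qw(decode encode);

# Hypothetical helper (not from the thread): raw UTF-8 bytes in, raw UTF-8 bytes out.
sub replace_line {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> characters, roughly what -CI does for stdin
    $text =~ s/\d\wい/1ai/;               # the match now works on characters, not bytes
    return encode('UTF-8', $text);        # characters -> bytes, roughly what -CO does for stdout
}

# "1あい\n" written as the raw UTF-8 bytes a pipe would deliver.
my $input = "1\xE3\x81\x82\xE3\x81\x84\n";
print replace_line($input);
```

The decode and encode calls are the part that -CSAD takes care of in the
one-liner, and "use utf8" corresponds to -Mutf8.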

I would like to be able to write this more easily, for example with a
hypothetical --utf8 option:

  echo -e '1あい' | perl --utf8 -p -e 's/\d\wい/1ai/'
----------------------------------------------------------------------------------

I think this is independent of the topic of the string flag.

What do you think?

2021-10-5 17:25 Yuki Kimoto <kimoto.yuki@gmail.com> wrote:

>
>
> 2021-10-4 22:21 Felipe Gasper <felipe@felipegasper.com> wrote:
>
>>
>> > On Oct 4, 2021, at 4:45 AM, Yuki Kimoto <kimoto.yuki@gmail.com> wrote:
>> >
>> >
>> > 2021-10-4 3:57 Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote:
>> >
>> > ONE:  What's the end state we'd like to get to?
>> >
>> >
>> >  I have a question.
>> >
>> >   echo -e '１' | perl -p -E 's/\d/1/'
>> >
>> > The '１' in the echo argument is a Japanese fullwidth digit in UTF-8.
>> > The desired output is ASCII 1.
>> >
>> > Current output (UTF-8 １)
>> >
>> >   １
>> >
>> > Ideal output (ASCII 1)
>> >
>> >   1
>> >
>> > Ideally, should this work on UNIX/Linux systems?
>>
>> For that to happen you would pass the `-CIO` flag to perl, which causes
>> STDIN & STDOUT to automatically decode/encode UTF-8.
>>
>> The one-liner as-is outputs "\xef\xbc\x91" (U+FF11 in UTF-8) instead of
>> ASCII 1 because those 3 bytes are what Perl receives on STDIN, and nothing
>> is decoding those to U+FF11. Your s/\d/1/ only works on *digits*, and none
>> of U+00EF, U+00BC, or U+0091 is. So no change happens.
>>
>> -FG
>
>
>  I understand that I can use the -CIO flag to get this result. I will
> spend some time learning these flags.
>
>
