Re: [perl #21395] rcatline doesn't grok utf8

From: Nicholas Clark
Date: February 28, 2003 04:20
Subject: Re: [perl #21395] rcatline doesn't grok utf8
Message ID: 20030228122031.U2347@plum.flirble.org

On Fri, Feb 28, 2003 at 11:29:28AM +0200, Enache Adrian wrote:
> On Thu, Feb 27, 2003 at 10:25:19PM -0000, Nicholas Clark wrote:
> > This didn't appear to get logged as a bug:
> > 
> > $ echo >test; perl5.6.1 -lwe '$_ = chr 256; $_ .= <>; print ord $_' test
> > 256
> > $ echo >test; perl5.8.0 -lwe '$_ = chr 256; $_ .= <>; print ord $_' test
> > 196
> 
> Please try this ( that's bleedperl, if you try on 5.8.0, you
> have to remove the else .. in 2 or 3 places ).
> 
> note: the 'else SvUTF8_off()' is kind of useless, since
> SvPOK_only has already turned off the SVf_UTF8 flag.
> 
> Regards
> 
> Adi
> 
> -------------------------------------------------------------------
> --- /arc/perl-current/sv.c	2003-02-26 04:50:55.000000000 +0200
> +++ sv.c	2003-02-28 11:20:29.000000000 +0200
> @@ -6247,7 +6247,7 @@ Perl_sv_gets(pTHX_ register SV *sv, regi
>      (void)SvUPGRADE(sv, SVt_PV);
>  
>      SvSCREAM_off(sv);
> -    SvPOK_only(sv);    /* Validate pointer */
> +    append ? SvPOK_only_UTF8(sv) : SvPOK_only(sv);
>  
>      if (PL_curcop == &PL_compiling) {
>  	/* we always read code in line mode */
> @@ -6546,8 +6546,6 @@ screamer2:
>  check_utf8_and_return:
>      if (PerlIO_isutf8(fp))
>  	SvUTF8_on(sv);
> -    else
> -	SvUTF8_off(sv);
>  
>      return (SvCUR(sv) - append) ? SvPVX(sv) : Nullch;
>  }
> -------------------------------------------------------------------

Thanks. That passes the tests I sent. However, the reason I still don't
know how to make a good patch is the other permutations. I've just tried
testing them, and some of them fail.

I create my file like this:

$ ./perl -le 'binmode STDOUT, ":utf8"; print chr 256' >testutf8

These pass, as I'd expect, on 5.8.0 and on blead with your patch:

$ perl5.8.0 -lwe '$_ = chr 127; binmode STDIN, ":utf8"; $_ .= <STDIN>; print ord $_' <testutf8 
127
$ ./perl -lwe '$_ = chr 127; binmode STDIN, ":utf8"; $_ .= <STDIN>; print ord $_' <testutf8 
127

These fail:

$ perl5.8.0 -lwe '$_ = chr 128; binmode STDIN, ":utf8"; $_ .= <STDIN>; print ord $_' <testutf8 
Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in ord at -e line 1, <STDIN> line 1.
0
$ ./perl -lwe '$_ = chr 128; binmode STDIN, ":utf8"; $_ .= <STDIN>; print ord $_' <testutf8 
Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in ord at -e line 1, <STDIN> line 1.
0
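
To illustrate why chr 127 passes while chr 128 fails, here is a minimal standalone sketch in C. It assumes the scalar's buffer ends up holding the single byte 0x80 followed by the raw bytes read from the file (0xC4 0x80 for chr 256, then a newline), with the UTF-8 flag switched on afterwards. The helper first_codepoint() is hypothetical and is not perl's own decoder; it handles only 1- and 2-byte sequences and returns -1 on malformed input, where perl warns and yields 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of perl): decode the first code point of a
 * buffer that is claimed to be UTF-8. Returns -1 if the first sequence is
 * malformed. Only 1- and 2-byte sequences are handled in this sketch. */
static long first_codepoint(const unsigned char *s, size_t len)
{
    if (len == 0)
        return -1;
    if (s[0] < 0x80)
        return s[0];                      /* ASCII: same byte in both encodings */
    if ((s[0] & 0xC0) == 0x80)
        return -1;                        /* continuation byte with no start byte */
    if ((s[0] & 0xE0) == 0xC0) {
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return -1;                    /* start byte without its continuation */
        return ((long)(s[0] & 0x1F) << 6) | (long)(s[1] & 0x3F);
    }
    return -1;                            /* longer sequences: out of scope here */
}
```

Under those assumptions, the buffer 0x7F 0xC4 0x80 0x0A decodes to 127 as its first character, while 0x80 0xC4 0x80 0x0A is rejected at the first byte, which matches the "unexpected continuation byte 0x80" warning above.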


Is it as simple as checking whether the file handle is flagged as UTF8
and, if it is, upgrading the existing scalar before appending to it?
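
As a rough sketch of what "upgrading" would mean at the byte level, the standalone C below re-encodes a Latin-1 buffer as UTF-8. The function latin1_to_utf8() is a hypothetical helper written for illustration only; inside perl the corresponding operation on an SV would presumably go through something like sv_utf8_upgrade(), and where exactly such a call would belong in Perl_sv_gets is not settled by this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of perl): re-encode Latin-1 bytes as UTF-8.
 * `out` must have room for 2 * len bytes. Returns the number of bytes
 * written to `out`. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];             /* ASCII is unchanged */
        } else {
            /* 0x80..0xFF become a two-byte sequence */
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

With this transformation the existing byte 0x80 becomes 0xC2 0x80, so appending the file's 0xC4 0x80 gives a well-formed UTF-8 buffer whose first character is still 128.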

Nicholas Clark
