perl.perl5.porters | Postings from January 2001

Re: Not OK: perl v5.7.0 +DEVEL8325 on os390 05.00 (UNINSTALLED)

From: Jarkko Hietaniemi
Date: January 9, 2001 09:13
Subject: Re: Not OK: perl v5.7.0 +DEVEL8325 on os390 05.00 (UNINSTALLED)
Message ID: 20010109111312.B6541@chaos.wustl.edu

> > Note too the "B" characters inserted into the C<print "ok\n";> statements
> > that are a part of the comp/require.t test.  See my original report
> > for things like:
> > 
> > String found where operator expected at bleah.pm line 1, near "BpBrBiBnBt
> > "BoBk"
> >         (Do you need to predeclare BpBrBiBnBt?)
> > 
> > [answer: no, I do not need to predeclare BpBrBiBnBt; I need perl not to
> > try to improperly utf8-ize C<print> :-]
> > 
> > $ echo B | od -t d1
> > 0000000000   194  21
> > 0000000002
> > 
> > Peter Prymmer
> > 
> comp/require.t is doing utf tests like:
> 
> # UTF-encoded things
> my $utf8 = chr(0xFEFF);
> $i++; do_require(qq(${utf8}print "ok $i\n"; 1;\n));
> 
> Should EBCDIC machines support utf8?

Well, yes, we are working rather hard to reach that goal :-)

> Here is some explanation of the cause of the problem.
> The test script builds bleah.pm with the following contents, written in hex:
> ef bb bf c2 97 c2 99 c2 89 c2 95 c2 a3 40 7f c2 96 c2 92 40
> c3 b1 15 7f 5e 40 c3 b1 5e 15
> The first three bytes (\ef \bb \bf) are the UTF-8 byte order mark (BOM),
> followed by "BpBrBiBnBt "BoBk" (as displayed on an EBCDIC terminal).
> We can see that the EBCDIC characters have been UTF-8-encoded.
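The hex dump above can be reproduced with a short Python sketch. It assumes, as the dump suggests, that the test line is first written in native EBCDIC (code page 1047) and that each native byte is then UTF-8-encoded as if it were a Latin-1 code point. `EBCDIC_1047` is a hypothetical partial table covering only the characters of this one line, and `to_ebcdic`/`utf8ize` are illustrative names, not perl functions. Python's `cp037` codec is used only for display; it agrees with code page 1047 for these characters, and in it 0xC2, the UTF-8 lead byte for code points 0x80 to 0xBF, is the letter "B".

```python
# Hypothetical partial table for EBCDIC code page 1047, covering only the
# characters of the test line; "\n" is NL (0x15), as the od output shows.
EBCDIC_1047 = {
    "p": 0x97, "r": 0x99, "i": 0x89, "n": 0x95, "t": 0xA3, " ": 0x40,
    '"': 0x7F, "o": 0x96, "k": 0x92, "1": 0xF1, ";": 0x5E, "\n": 0x15,
}

UTF8_BOM = "\ufeff".encode("utf-8")  # chr(0xFEFF) as UTF-8: ef bb bf


def to_ebcdic(text: str) -> bytes:
    """Encode text to native EBCDIC bytes using the partial table above."""
    return bytes(EBCDIC_1047[ch] for ch in text)


def utf8ize(native: bytes) -> bytes:
    """UTF-8-encode each native byte as if it were code point 0-255."""
    return "".join(chr(b) for b in native).encode("utf-8")


# qq() has already turned the inner \n into a real newline, and $i is 1.
line = 'print "ok 1\n"; 1;\n'
bleah_pm = UTF8_BOM + utf8ize(to_ebcdic(line))
print(bleah_pm.hex(" "))

# cp037 agrees with code page 1047 for these characters; 0xC2 shows as "B".
print(bleah_pm[3:19].decode("cp037"))  # BpBrBiBnBt "BoBk
```

Every EBCDIC letter in the range 0x80 to 0xBF gains a 0xC2 prefix, which is the "B" in front of each letter of C<print> in Peter's error message (and the 194 that `od` reports for "B"); the digit 1 (0xF1) becomes c3 b1.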

If the input stream is utf8 or UTF-16[LB]E, I guess Unicode should be
assumed, even on EBCDIC hosts.  This might, in turn, mean using perlio?
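A minimal sketch of the check described here: look at the first bytes of the stream and, if they form a UTF-8 or UTF-16 byte order mark, treat the rest as Unicode regardless of the host's native character set. `sniff_bom` and `read_source` are hypothetical helpers, not perl's swallow_bom; they ignore UTF-32 and BOM-less Unicode input.

```python
from typing import Optional, Tuple

# Byte order marks for the encodings mentioned above (UTF-32 not handled).
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length), or (None, 0) for native input."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def read_source(data: bytes) -> Optional[str]:
    """Decode as Unicode when a BOM is present; None means 'use native'."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        return None
    return data[skip:].decode(encoding)


print(sniff_bom(b"\xef\xbb\xbfprint 1;"))                  # ('utf-8', 3)
print(read_source(b"\xfe\xff" + "ok".encode("utf-16-be")))  # ok
```

The design choice is that only an explicit BOM switches the reader into Unicode mode, so plain EBCDIC sources keep their native interpretation.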

> In toke.c, the first three bytes are removed (in swallow_bom), but the
> UTF-8-encoded characters are not converted back to native bytes.
> 
> -- Ignasi Roca 
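The missing step Ignasi points to can be sketched as follows: after the BOM is swallowed, decode the UTF-8 and turn each code point (all below 256 here) back into a single native byte. The function names are hypothetical and this is not the toke.c code; it only contrasts dropping the BOM alone with also downgrading the bytes that follow it. `cp037` again stands in for code page 1047 when displaying the result.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def swallow_bom_only(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM and leave the rest untouched (the reported bug)."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def swallow_bom_and_downgrade(data: bytes) -> bytes:
    """Drop the BOM, then map every decoded code point back to one byte."""
    text = swallow_bom_only(data).decode("utf-8")
    # bytes() raises ValueError if a code point does not fit in a byte.
    return bytes(ord(ch) for ch in text)


# bleah.pm as given in the report above.
bleah_pm = bytes.fromhex(
    "ef bb bf c2 97 c2 99 c2 89 c2 95 c2 a3 40 7f c2 96 c2 92 40"
    " c3 b1 15 7f 5e 40 c3 b1 5e 15"
)

print(swallow_bom_only(bleah_pm)[:4].hex(" "))  # c2 97 c2 99: still UTF-8
# cp037 stands in for code page 1047; they agree on these characters.
print(swallow_bom_and_downgrade(bleah_pm)[:11].decode("cp037"))  # print "ok 1
```

With the downgrade, the lexer would see the original EBCDIC bytes of `print "ok 1` again instead of the "Bp Br ..." pairs.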

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen


