perl.perl5.porters | Postings from March 2012

[perl #5956] Perl_warner isn't utf8 aware

From: Nicholas Clark via RT
Date: March 29, 2012 08:26
Subject: [perl #5956] Perl_warner isn't utf8 aware
Message ID: rt-3.6.HEAD-4610-1333034773-1463.5956-15-0@perl.org

On Sun Sep 12 18:09:11 2010, sprout wrote:
> On Wed Jul 28 20:23:36 2010, greerga wrote:
> > On Sun Aug 24 11:25:14 2003, nicholas wrote:
> > > The latter bug is still present on 5.6.1:
> > > 
> > > $ perl5.6.1 -wle '$a=v240.257; $a=substr($a,0,1); $a = -$a;'
> > > $ perl5.6.1 -wle '$a=v240; $a=substr($a,0,1); $a = -$a;'
> > > Argument "M-p" isn't numeric in negation (-) at -e line 1.
> > > 
> > > The former bug is fixed by 5.8.0, at least for the case of the numeric
> > > conversion warning.
> > 
> > The perl 5.10.1 that comes with my Ubuntu Linux as well as standard
> > 5.13.3 and 5.12.1 are printing for those two lines:
> > 
> > Argument "\x{f0}" isn't numeric in negation (-) at -e line 1.
> > 
> > http://rt.perl.org/rt3/Ticket/Display.html?id=5956
> 
> Does that mean this can be marked as resolved?

Well, re-resolved, as Jarkko originally marked it as resolved,
when he made this change to fix it:

commit 8eb28a70b2ec19f2782a68fd1ccf1a9a24131140
Author: Jarkko Hietaniemi <jhi@iki.fi>
Date:   Tue Oct 23 22:19:34 2001 +0000

    Negation and Unicode: sort of solves 20010303.010,
    except not quite like reported in the Subject
    (Perl_warner is still utf8-ignorant).
    
    p4raw-id: //depot/perl@12614

 pp.c              |   21 ++++++++++++++-------
 t/lib/warnings/sv |    8 ++++++++
 2 files changed, 22 insertions(+), 7 deletions(-)


What's strange is that prior to that commit, post 5.6.0, the warning
had been eliminated. Turns out to have been by this seemingly
unrelated commit:

commit 3bd709b1a63d554f3d98d5394be78ed628eb46da
Author: Peter Prymmer <PPrymmer@factset.com>
Date:   Thu Mar 8 08:23:25 2001 -0800

    Re: Unicode/EBCDIC
    Message-ID: <Pine.OSF.4.10.10103081617390.377472-100000@aspara.forte.com>
    
    p4raw-id: //depot/perl@9082

 perl.c |    6 +-
 perl.h |  216 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 sv.c   |    4 ++
 toke.c |   10 ++-
 utf8.c |    4 +-
 utf8.h |   19 ++++++
 6 files changed, 251 insertions(+), 8 deletions(-)

which doesn't *seem* to touch anything in the relevant code paths.
(But I have checked manually - it did do it. Yet more strange
action at a distance. Crazy codebase.)

On Wed Jan 11 09:58:17 2012, sprout wrote:
> On Sat Dec 07 14:31:20 2002, nick@unfortu.net wrote:
> > On Sat, Dec 07, 2002 at 06:16:52PM -0000, Jarkko Hietaniemi wrote:
> > > The original difference between the warning being there or not has
> > > been resolved in Perl 5.8.0-- one gets
> > > Argument "\x{f0}" isn't numeric in negation (-) at -e line 1.
> > > in both cases.
> > > As regards to the Perl_warner not being utf8-aware, I don't know,
> > > is there a problem?  I think if the STDOUT is UTF-8 aware, we can
> > > just spit out UTF-8?
> > > So I'm marking the problem ticket as resolved.
> > 
> > The prototype is:
> > 
> > void
> > Perl_warn(pTHX_ const char *pat, ...)
> > 
> > How does it know whether the 8 bit values I'm passing are utf8, or latin 1
> > bytes that also happen to represent a valid utf8 sequence?
> 
> I don’t know when it was added, but we have SVf for precisely this.  In
> 5.15.4, almost all error messages with symbol names were changed to use
> SVf or the new HEKf instead of %s, so I think this ticket can be resolved.

The macro SVf was added in 2000:

commit 894356b32151f778d4d2915c6db38e5d049b115a
Author: Gurusamy Sarathy <gsar@cpan.org>
Date:   Sat Jan 22 10:06:53 2000 +0000

    add patch for printf-style format typechecks (from Robin Barker
    <rmb1@cise.npl.co.uk>); fixes for problems so identified
    
    p4raw-id: //depot/perl@4836

but I know that the definition of a format for SVs goes back much
further...

5.004 it seems (commit fc36a67e8855d031, in April 1997)

However, I think that the historic problem was that a lot of code
called /warn/ and /croak/ functions with char * pointers, and at
the time Jarkko and I were assuming that the logical fix was to have
some way of saying "char * but UTF-8". Probably partly because we'd
not thought of SVf, but I'm guessing also because in at least
some places (at the point of calling /warn/ or /croak/) the value
about to be complained about was in a char *.

I'm guessing that these days, after a lot of UTF-8 cleanup, the value
is often already in an SV that has been passed down far enough, as
half the problem (alluded to in parens above) is that the
"UTF-8 or not" had already been lost earlier in the C call stack,
when something naively used SvPV() without caring about SvUTF8().
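
A minimal sketch of how that information gets lost, written as
standalone C rather than against the real Perl API. The names
tagged_str, next_cp and render are hypothetical, and the is_utf8
field is only an assumed stand-in for what SvUTF8() reports on a
real SV. The decoder handles one- and two-byte UTF-8 sequences only,
which is enough for the \x{f0} case in this ticket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an SV: the bytes plus a flag saying how to
   read them. A bare char * carries only the bytes, so the flag is gone. */
typedef struct {
    const char *bytes;
    size_t      len;
    int         is_utf8;
} tagged_str;

/* Decode one code point from s (n bytes remain, n >= 1) and return the
   number of bytes consumed. Only 1- and 2-byte UTF-8 is understood; any
   other byte is passed through as a single Latin-1 code point. */
static size_t next_cp(const unsigned char *s, size_t n, int is_utf8,
                      unsigned long *cp)
{
    if (is_utf8 && n >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (unsigned long)(s[1] & 0x3F);
        return 2;
    }
    *cp = s[0];
    return 1;
}

/* Render ts into out: printable ASCII as-is, everything else as \x{...}.
   Returns the number of characters written, not counting the NUL. */
size_t render(const tagged_str *ts, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)ts->bytes;
    size_t i = 0, used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (i < ts->len) {
        unsigned long cp;
        char tmp[16];
        int w;

        i += next_cp(s + i, ts->len - i, ts->is_utf8, &cp);
        if (cp >= 0x20 && cp < 0x7F)
            w = snprintf(tmp, sizeof tmp, "%c", (int)cp);
        else
            w = snprintf(tmp, sizeof tmp, "\\x{%lx}", cp);
        if (used + (size_t)w >= outsz)
            break;                      /* stop rather than truncate an escape */
        memcpy(out + used, tmp, (size_t)w + 1);
        used += (size_t)w;
    }
    return used;
}
```

Given the two bytes 0xC3 0xB0, render() writes \x{f0} when is_utf8
is set and \x{c3}\x{b0} when it is clear. The bytes are identical in
both cases, so a callee that receives only the char * cannot tell
which rendering is right.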

So yes, both problems referenced in this bug are resolved.
I'm sure specific issues with UTF-8 wrongness remain, but thanks to
you, Brian, Karl and others there are far far fewer. And any little
critters still existing deserve their own tickets.

Nicholas Clark


---
via perlbug:  queue: perl5 status: resolved
https://rt.perl.org:443/rt3/Ticket/Display.html?id=5956
