develooper Front page | perl.perl5.porters | Postings from May 2011

GSOC Status Report: Week 1

From: Brian Fraser
Date: May 30, 2011 16:33
Subject: GSOC Status Report: Week 1
Message ID: BANLkTi=u+a9q_ZxU3qY9NnNWGToAzVTujg@mail.gmail.com
Nicholas and rafl requested that I send this to p5p too, so all blame goes
to them :)

For those not in the loop, howdy! My name is Brian, or Hugmeir on IRC, and
I'm working on getting the core UTF-8 clean.

That's about it. Feel free to tear and shred to your heart's content!

---------- Forwarded message ----------
From: Brian Fraser <fraserbn@gmail.com>
Date: Mon, May 30, 2011 at 6:12 PM
Subject: Status Report: Week 1
To: tpf-gsoc-students@googlegroups.com
Cc: zefram@fysh.org, public@khwilliamson.com, rafl@debian.org,
sprout@cpan.org


Howdy all.

As usual, dailyish progress is in github/Hugmeir/hugmeir_gsoc, specifically
in the stash_clean branch. I don't feel comfortable enough with what I've done
so far to push it into the review repos (primarily because it's more of a
prototype than anything else, but also because I accumulated about ninety
TODOs over this weekend alone), but here's a snapshot of what I've been
working on, and some areas where I need some input.

After the previous progress report I got quite a few comments on the pad
branch, and most of the corrections made in response are already in the
pad review repo. The only things not pushed yet are one of Reini's
corrections and the bit discussed in my previous mail.

Unfortunately, the unflagged OP from last week took me until Friday or so to
track down: a gv_fetchpvn_flags call where I had mistakenly passed the UTF-8
flag as the fourth parameter instead of the third. So I used the weekend as a
sort of test run, advancing as far as I could and changing only what was
needed to get things working (not necessarily working right; for instance, I
mortalized next to nothing), to get a feel for the rough spots where I'll
need some mentoring :) As such, most of the progress you can see in
/stash_clean is little more than a lie built on matchsticks, but ultimately
an extremely helpful lie. A rough summary:

Stash names were already stored as HEKs, so what I had expected to be a
cyclopean task turned out to be mostly done for me already. With the flag now
being passed to gv.c, and with hv_name_set and hv_ename_add already taking a
flags parameter, the biggest ordeal was actually going through everything
that assumes stash names are byte strings. Some of those are still
unfinished (primarily in mro and other code dealing with @ISA, but also
error messages). I haven't checked to what extent dist/ and ext/ remain
unclean, but that's a can of worms I'm not opening yet.
Other things that assume chars are op.h and cop.h in threaded builds. I've
only skimmed the former, but it appears to do something similar to cop.h,
plus some savesharedpv() magic that I haven't dug into very deeply.
To fix caller(), which currently returns unflagged SVs for the package and
sub names, I loosely bandaged cop.h by adding a U32 cop_stashflags member to
the cop struct (for threaded builds). This seems to be holding up fine, but
it may be causing a segfault in op/stash.t (look for \&{"six"}). That is the
one big, glaring regression in what I've done this weekend.
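A minimal sketch of the idea behind the cop_stashflags bandage, assuming a
heavily simplified COP: when a threaded build remembers the package only by
name (a char *), the UTF-8 flag has to travel in a separate field, or
caller() has nothing to flag its result with. mock_cop,
mock_cop_set_stash(), mock_cop_package() and MOCK_UTF8 are hypothetical
stand-ins, not perl's actual structures or API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bit for this sketch; not perl's real flag. */
#define MOCK_UTF8 0x20000000u

/* Hypothetical, heavily simplified COP for a threaded build: the package is
 * remembered by name only, so its UTF-8 flag needs a field of its own,
 * analogous to the proposed cop_stashflags member. */
typedef struct {
    char     *stashpv;     /* package name as raw bytes (owned copy) */
    unsigned  stashflags;  /* MOCK_UTF8 set if stashpv holds UTF-8 */
} mock_cop;

/* What a caller()-style accessor can hand back: bytes plus a flag. */
typedef struct {
    const char *pv;
    int         utf8;
} mock_flagged_name;

/* Store a copy of the package name together with its UTF-8 flag.
 * Returns 0 on success, -1 if allocation fails. */
static int mock_cop_set_stash(mock_cop *cop, const char *name, unsigned flags)
{
    size_t len = strlen(name);
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return -1;
    memcpy(copy, name, len + 1);
    free(cop->stashpv);
    cop->stashpv = copy;
    cop->stashflags = flags & MOCK_UTF8;  /* keep only the UTF-8 bit */
    return 0;
}

/* Pair the stored bytes with the stored flag; without stashflags only
 * the bytes would be available here. */
static mock_flagged_name mock_cop_package(const mock_cop *cop)
{
    mock_flagged_name n;

    n.pv = cop->stashpv;
    n.utf8 = (cop->stashflags & MOCK_UTF8) != 0;
    return n;
}
```

The design point is simply that the flag is written wherever the name is
written; any code path that copies stashpv without stashflags reintroduces
the unflagged result.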

Also, a couple of things in hv.c itself still need work, primarily some
memEQ()s, but this doesn't look troublesome.
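Why a bare memEQ() on names is not enough: the same package name can be
stored either as Latin-1 bytes or as UTF-8, and the two byte sequences
differ. The sketch below only falls back to a plain memcmp() when both flags
agree, and otherwise walks the strings character by character. names_eq()
is a hypothetical helper written for illustration, not an existing perl
function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two names that may differ in encoding.
 * A name with utf8 == 0 is treated as Latin-1, one with utf8 != 0 as UTF-8.
 * Returns 1 if they denote the same characters, 0 otherwise. */
static int names_eq(const char *a, size_t alen, int a_utf8,
                    const char *b, size_t blen, int b_utf8)
{
    const unsigned char *l1, *u8;
    size_t l1len, u8len, i, j = 0;

    /* Same encoding on both sides: a byte comparison is valid. */
    if ((a_utf8 != 0) == (b_utf8 != 0))
        return alen == blen && memcmp(a, b, alen) == 0;

    /* Make l1 the Latin-1 side and u8 the UTF-8 side. */
    if (a_utf8) {
        u8 = (const unsigned char *)a; u8len = alen;
        l1 = (const unsigned char *)b; l1len = blen;
    } else {
        l1 = (const unsigned char *)a; l1len = alen;
        u8 = (const unsigned char *)b; u8len = blen;
    }

    for (i = 0; i < l1len; i++) {
        unsigned char c = l1[i];

        if (c < 0x80) {
            /* ASCII is encoded identically in both. */
            if (j >= u8len || u8[j] != c)
                return 0;
            j += 1;
        } else {
            /* Latin-1 0x80..0xFF becomes a two-byte UTF-8 sequence. */
            if (j + 1 >= u8len
                || u8[j] != (unsigned char)(0xC0 | (c >> 6))
                || u8[j + 1] != (unsigned char)(0x80 | (c & 0x3F)))
                return 0;
            j += 2;
        }
    }
    return j == u8len;  /* no leftover characters on the UTF-8 side */
}
```

For example, names_eq("L\xe8on", 4, 0, "L\xc3\xa8on", 5, 1) treats both as
the same name, while a plain memcmp() of the same buffers already differs at
the second byte.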

Moving on, GV names were also stored as HEKs, but this is more of a mixed
blessing (more on that in a moment). Once past that unflagged OP issue,
storing and fetching the correct GV went quite well, and barewords didn't
take much longer to get clean either. Since they are closely related, I also
got __PACKAGE__ and package UTF8; working. There appears to be a regression
of sorts here too: some harmless uninitialized-variable warnings show up in
a couple of modules, and I have yet to track them down.
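The unflagged-OP bug is an easy one to make in C. gv_fetchpvn_flags takes
(name, len, flags, sv_type), and both of the last two arguments are
integer-like, so a flag passed one slot too late compiles silently and is
simply dropped. The sketch below shows this, assuming a hypothetical
mock_fetch_is_utf8() that mirrors that argument order but does no real
lookup. MOCK_UTF8 and mock_svtype are likewise stand-ins.

```c
#include <stddef.h>

/* Hypothetical flag bit and type enum; not perl's real values. */
#define MOCK_UTF8 0x20000000u

typedef enum { MOCK_SVt_NULL, MOCK_SVt_PVCV } mock_svtype;

/* Mirrors the (name, len, flags, sv_type) argument order of
 * gv_fetchpvn_flags. Returns 1 if the lookup would treat the name as
 * UTF-8, which here depends only on the flags argument. */
static int mock_fetch_is_utf8(const char *name, size_t len,
                              unsigned flags, mock_svtype sv_type)
{
    (void)name;
    (void)len;
    (void)sv_type;  /* a misplaced flag lands here and is ignored */
    return (flags & MOCK_UTF8) != 0;
}
```

Calling it as mock_fetch_is_utf8(name, len, MOCK_UTF8, MOCK_SVt_PVCV) keeps
the flag. Passing 0 for flags and MOCK_UTF8 in the sv_type slot is still
accepted by a typical compiler with default warnings, but the name is then
treated as bytes.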

Since package UTF8; now works and GV names can be in UTF-8, I did some
mending on gv_fetchmeth*, ref(), and bless(). Unexpectedly (to me), the
latter's pp function assumed byte strings, even though sv_bless simply takes
an SV. In any case, simple UTF-8 method fetching works now: all of the weird
behavior reported by Jured in #31991 has been fixed, but three tests in
op/method.t now fail. One is a deprecation warning regarding inherited
AUTOLOAD, which I haven't touched yet, and the other two output an
incomplete warning when a method is not found. As I didn't want to get into
mro.c yet, that remains undone. I think parent and base are dual-lifed, so
I won't be working on those for the time being.
Meanwhile, ref() depends on sv_reftype(), which returns a plain char *, so
I'd like to suggest adding an sv_reftypesv(). For the time being, I've just
added a check in pp_ref that gets the flag from the object.
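The difference between a char-returning sv_reftype() and the suggested
sv_reftypesv() comes down to whether the encoding flag survives the trip
back to pp_ref. The sketch below uses simplified models: mock_referent,
mock_reftype() and mock_reftype_flagged() are hypothetical. A real
sv_reftypesv() would presumably return an SV rather than a small struct.

```c
#include <stddef.h>

/* Hypothetical model of a referent: either unblessed (stash_name == NULL)
 * or blessed into a package whose name may be UTF-8. */
typedef struct {
    const char *basetype;    /* e.g. "HASH", "ARRAY" */
    const char *stash_name;  /* package name, or NULL if not blessed */
    int         stash_utf8;  /* nonzero if stash_name is UTF-8 */
} mock_referent;

/* Char-returning accessor in the spirit of sv_reftype(): the caller gets
 * bytes only, so the package name's UTF-8 flag is lost. */
static const char *mock_reftype(const mock_referent *r, int ob)
{
    return (ob && r->stash_name) ? r->stash_name : r->basetype;
}

/* Hypothetical flag-preserving variant: bytes and flag travel together. */
typedef struct {
    const char *pv;
    int         utf8;
} mock_name;

static mock_name mock_reftype_flagged(const mock_referent *r, int ob)
{
    mock_name n;

    if (ob && r->stash_name) {
        n.pv = r->stash_name;
        n.utf8 = r->stash_utf8;
    } else {
        n.pv = r->basetype;  /* built-in type names are plain ASCII */
        n.utf8 = 0;
    }
    return n;
}
```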

I also worked on AUTOLOAD for a while, and while I'm pretty sure that
autoloading functions works (it passes all tests regardless of UTF-8ness! :),
I haven't really tested autoloading methods beyond the bare minimum. In
principle, though, it should churn along more or less fine with UTF-8
packages.
Strangely, somewhere along the way tie broke when fetching AUTOLOADed
methods. The fix was adding a G_AUTOLOAD flag to a gv_fetchmethod call in
pp_tie, but I'm puzzled as to why it suddenly requires this.

Having the GV's name in a HEK is a mixed blessing in the sense that the UTF-8
flag is stored in the HEK, not in SvFLAGS, which means that SvUTF8 will
always return false on a GV. This is problematic, because things like
SvPV_nolen work without a hitch on a GV (returning, I think, GvNAME_get()),
and several areas of the core treat SVs and GVs the same in that regard.
Case in point: the stringification of GVs happens primarily in three places,
and all three take an SV parameter and use SvUTF8() or DO_UTF8() to get a
correctly flagged result, which will never happen with GVs.
Somewhat expectedly, manually fixing those three was nowhere near enough:
pp_seq (and really all of the pp_s* operations) also relies on SvUTF8().
This means that *Lèon eq "*main::Lèon" will not return true, even though
("" . *Lèon) eq "*main::Lèon" will.
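One way to see why *Lèon eq "*main::Lèon" fails: when exactly one operand is
flagged UTF-8, a comparison has to bring both operands to a common encoding
before comparing bytes. If a glob's stringified name already holds UTF-8
bytes but reports no flag, that conversion treats those bytes as Latin-1
and encodes them a second time, so the comparison cannot succeed.
flagged_eq() and latin1_to_utf8() are hypothetical, simplified models. The
sketch upgrades the Latin-1 side for simplicity, and this is not claimed to
be how perl's own comparison routine is implemented.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encode Latin-1 bytes as UTF-8 into out, which must hold at least
 * 2 * len bytes. Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, o = 0;

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}

/* Simplified mixed-encoding equality: if both flags agree, compare bytes;
 * otherwise upgrade the unflagged (Latin-1) side and compare the results.
 * Returns 1 if equal, 0 if not, -1 if allocation fails. */
static int flagged_eq(const char *a, size_t alen, int a_utf8,
                      const char *b, size_t blen, int b_utf8)
{
    const char *u8, *l1;
    size_t u8len, l1len, n;
    unsigned char *buf;
    int eq;

    if ((a_utf8 != 0) == (b_utf8 != 0))
        return alen == blen && memcmp(a, b, alen) == 0;

    if (a_utf8) { u8 = a; u8len = alen; l1 = b; l1len = blen; }
    else        { u8 = b; u8len = blen; l1 = a; l1len = alen; }

    buf = malloc(2 * l1len + 1);  /* +1 avoids a zero-byte request */
    if (buf == NULL)
        return -1;
    n = latin1_to_utf8((const unsigned char *)l1, l1len, buf);
    eq = n == u8len && memcmp(buf, u8, n) == 0;
    free(buf);
    return eq;
}
```

With the glob's bytes "*main::L\xc3\xa8on" reported as unflagged, the upgrade
turns the two bytes of è into four, so the lengths no longer match. With the
flag reported correctly, the same bytes compare equal.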

The quick solution would be to also check isGV(sv) && GvNAMEUTF8(sv) inside
SvUTF8(), but I'm wary of changing a macro as heavily used as SvUTF8().
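Here is a sketch of the check proposed above. It is kept as a separate
helper rather than folded into the hot macro, which sidesteps the worry
about changing SvUTF8() everywhere. The mock_sv and mock_hek types and
MOCK_SVf_UTF8 are hypothetical simplifications. A plain string keeps its
UTF-8 bit in its flags word, while a glob's bit lives with its name.

```c
#include <stddef.h>

/* Hypothetical flag bit in the flags word; not perl's real value. */
#define MOCK_SVf_UTF8 0x20000000u

typedef enum { MOCK_SVt_PV, MOCK_SVt_PVGV } mock_type;

/* Hypothetical HEK-like name record: a glob's UTF-8 flag lives here. */
typedef struct {
    const char *name;
    int         utf8;
} mock_hek;

typedef struct {
    mock_type       type;
    unsigned        sv_flags;  /* plain strings keep their UTF-8 bit here */
    const mock_hek *gv_name;   /* only meaningful for MOCK_SVt_PVGV */
} mock_sv;

/* Flags-word check only, like a plain SvUTF8(): false for these globs,
 * because their name's flag is stored in the HEK instead. */
static int mock_svutf8(const mock_sv *sv)
{
    return (sv->sv_flags & MOCK_SVf_UTF8) != 0;
}

/* The suggested extra check, as a separate helper: also consult the
 * glob's name HEK when the value is a glob. */
static int mock_svutf8_gv_aware(const mock_sv *sv)
{
    return mock_svutf8(sv)
        || (sv->type == MOCK_SVt_PVGV && sv->gv_name != NULL
            && sv->gv_name->utf8);
}
```

Callers that stringify or compare globs could call the GV-aware helper
explicitly. The cost is that every such call site has to be found, which is
the trade-off against changing the macro itself.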

A final tidbit on universal.c: getting ->can clean didn't take much, but
->isa and ->DOES will remain untouched until mro.c is cleanish.

Finally, I tried merging with the pad branch to see if 'our' would be clean
on its own, but no such luck, and the weekend came to an end before I could
look too far into it.

This week, I'll start by figuring out what is causing that stash segfault.
If it's not the cop.h changes, then I can apply the same approach to op.h
too. Beyond that, I intend to fix all of the TODOs regarding memEQs and
related functions (stashpv_hvname_match comes to mind), and some toke.c
functions that look at UTF instead of being passed the flag. Some of the
changes from the weekend, such as package UTF8; and __PACKAGE__, or HEK
flag copying on glob assignment, are probably good enough to stay, so I'll
also take some time to review those.



