
GSOC Status Report, Week 5

Brian Fraser
June 29, 2011 22:36
Howdy all.

Apologies for the late report. I screwed up and wiped out a bunch of mro
tests and some changes with git clean/reset, so I wanted to make up for the
lost work before writing this.

mro is pretty much finished. A part of the tokenizer that isn't clean is
causing some tests to die under strict. I have patched it locally to get
the desired behavior, but it'll probably be saner to turn off strict in the
final version, and leave toke.c changes to whenever I actually tackle that.

->can, ->isa, call_method() and gv_fetchmethod_(autoload|flags) are now both
UTF-8 and null clean; I forgot about ->DOES, so I'll be doing that soon, but
I don't expect it to be problematic. There are a couple of error messages
regarding versions that are still undone, as are error messages for scalar
filehandles (I can't quite recall the details now, but last I checked I
suspected they might fix themselves post-merge with the pad stuff)
and things dealing with SvUTF8() on globs.

Beyond that, I think there's only one thing pending implementation for
stashes and GVs, which is somewhat related to the SvUTF8() issue: when to
downgrade. Specifically: Right now, when initializing a stash/GV, [hg]v_name_set calls
share_hek(), which does the downgrading as necessary. And that appears to be
sufficient, as the rest of the gv/hv code is robust enough to handle
whatever you throw at it.  Should I rely on that? The pad had to jump
through a few hoops because of this issue, but it doesn't deal in heks.
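
To make the "downgrading as necessary" step concrete, here is a minimal standalone sketch of the idea: if every character in a UTF-8 name fits in one byte, store the one-byte form and drop the flag; otherwise keep the UTF-8 bytes. try_downgrade is a hypothetical helper written for this illustration, not the code in share_hek(), and it assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Perl API: try to downgrade a UTF-8 buffer to
 * one byte per character. Returns 1 and sets *out_len when every character
 * is below 0x100; returns 0 and leaves *out_len untouched otherwise (out may
 * have been partially written). out must hold at least len bytes. */
static int try_downgrade(const unsigned char *s, size_t len,
                         unsigned char *out, size_t *out_len)
{
    size_t i = 0, n = 0;

    assert(s != NULL && out != NULL && out_len != NULL);
    while (i < len) {
        unsigned char c = s[i];
        if (c < 0x80) {
            /* ASCII: copy through unchanged */
            out[n++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (s[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            out[n++] = (unsigned char)(((c & 0x03) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else {
            /* character above 0xFF (or malformed input): keep UTF-8 */
            return 0;
        }
    }
    *out_len = n;
    return 1;
}
```

With this sketch, the five UTF-8 bytes of "Lèon" come back as four bytes with 0xE8 in the second position, while a name containing U+2603 is refused and would stay flagged as UTF-8.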

But more importantly, it brings up another issue: When should I pass in the
flag (from toke.c)? Simply doing UTF ? SVf_UTF8 : 0 is obviously wrong
(you'll end up with a UTF-8 flagged _, for instance). I have a branch with a
macro (unimaginatively named UTF_T) in toke.c that replaces those UTF with
something like UTF_T(s, len, UTF), where UTF_T is defined as
#define UTF_T(s,len,u) ((u) && !is_ascii_string((const U8*)(s), (len)) && \
                        is_utf8_string((const U8*)(s), (len)))
Something like that - name notwithstanding - appears to make do just fine,
but I've been wrong before.
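
For anyone who wants to poke at the logic outside of a perl build, the following is a rough standalone model of what UTF_T decides. all_ascii and valid_utf8 are stand-ins written for this sketch, not Perl's is_ascii_string() and is_utf8_string(); the validity check is deliberately simplified (it checks lead and continuation bytes only) and utf_flag_wanted is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for is_ascii_string(): true if every byte is below 0x80. */
static int all_ascii(const unsigned char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (s[i] >= 0x80)
            return 0;
    return 1;
}

/* Stand-in for is_utf8_string(): simplified structural check only. */
static int valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0, k, need;
    while (i < len) {
        unsigned char c = s[i];
        if (c < 0x80)                    need = 0;
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if ((c & 0xF0) == 0xE0)     need = 2;
        else if (c >= 0xF0 && c <= 0xF4) need = 3;
        else return 0;                   /* stray continuation or bad lead */
        if (need > len - i - 1)
            return 0;                    /* sequence runs off the end */
        for (k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;
        i += need + 1;
    }
    return 1;
}

/* Same shape as UTF_T(s, len, u): only ask for the flag when the source is
 * in UTF-8 mode AND the name really contains non-ASCII, valid UTF-8. */
static int utf_flag_wanted(const unsigned char *s, size_t len, int in_utf8)
{
    assert(s != NULL);
    return in_utf8 && !all_ascii(s, len) && valid_utf8(s, len);
}
```

Under this model a plain "_" never gets the flag, even under use utf8, while the UTF-8 bytes of "Lèon" get it only when the third argument is true.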

On a less than cheerful note, I've fallen behind schedule with tests,
particularly regarding the new versions of several functions. I'll try to
get back on track asap.

Also, I seem to have introduced a bug somewhere along the road that's
breaking the test suite depending on how it's run, which is stopping me from
celebrating anything. So, while I fix that, here are two semi-contentious
changes that need addressing:

First, S_not_a_number calls sv_uni_display with a 0 for its flags. This
means that now, if you try to do something like 1 + *Lèon, you'll get a
fairly useless warning ('Argument "\x{2a}\x{6d}..." isn't numeric', etc,
where \x{6d} is the 'm' of *main::Lèon).
Would there be any objections to passing in UNI_DISPLAY_ISPRINT instead? It
would get the right behavior for globs ('Argument "*main::L\x{e8}..." isn't
numeric'), but it _will_ break tests outside of the core. Uh, probably.
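
To illustrate the difference between the two flag values, here is a small model of the escaping. It is a sketch, not Perl's display code: sketch_display and SKETCH_ISPRINT are hypothetical names, the input is an array of code points rather than an SV, and the length limit of 10 used in the note below is an assumption inferred from where the quoted warnings get cut off.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_ISPRINT 0x1  /* hypothetical, modeled on UNI_DISPLAY_ISPRINT */

/* Render code points into out. Without SKETCH_ISPRINT every character is
 * written as \x{...}; with it, printable ASCII passes through as-is. Once
 * the output has reached `limit` characters, stop and append "...". */
static void sketch_display(char *out, size_t outsz,
                           const unsigned *cps, size_t n,
                           size_t limit, int flags)
{
    size_t used = 0, i;
    int truncated = 0;

    assert(out != NULL && outsz >= 4);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        char piece[16];
        size_t plen;

        if (used >= limit) {
            truncated = 1;
            break;
        }
        if ((flags & SKETCH_ISPRINT) && cps[i] >= 0x20 && cps[i] < 0x7F)
            snprintf(piece, sizeof piece, "%c", (int)cps[i]);
        else
            snprintf(piece, sizeof piece, "\\x{%x}", cps[i]);
        plen = strlen(piece);
        if (used + plen + 4 > outsz) {   /* keep room for "..." and NUL */
            truncated = 1;
            break;
        }
        memcpy(out + used, piece, plen + 1);
        used += plen;
    }
    if (truncated)
        strcpy(out + used, "...");
}
```

Fed the code points of *main::Lèon with a limit of 10, this produces \x{2a}\x{6d}... with flags 0 and *main::L\x{e8}... with SKETCH_ISPRINT, which lines up with the two warnings quoted above.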

Second, I need a change like this in
diff --git a/t/ b/t/
index 99d77cc..d9b9432 100644
--- a/t/
+++ b/t/
@@ -750,9 +750,7 @@ sub _fresh_perl {
     $runperl_args->{progfile} = $tmpfile;
     $runperl_args->{stderr} = 1;

-    open TEST, ">$tmpfile" or die "Cannot open $tmpfile: $!";
+    my $mode = $prog =~ /\P{ASCII}/ ? '>:utf8' : '>';
+    open TEST, $mode, $tmpfile or die "Cannot open $tmpfile: $!";

     # VMS adjustments
     if( $is_vms ) {

(Though I suppose that \p{} might be a bit too much for that file. Maybe
utf8::is_utf8($prog) will make do? Or a mode param?)
Essentially, I require some way to say that it should write the program with
one layer or another. Otherwise, I'll have to split all the fresh_perl tests
with UTF-8 into different files, and I dunno what the policy for changes to
that file is.

That's about it, I think. This week (shortened as it is due to the lateness
of this report) I'll finish the last TODOs, keep writing tests, and
start reviewing things for a preliminary version. If we can reach a
consensus on the SvUTF8() & flag passing issues (the latter, I suppose,
could be left as it is until I get to toke.c), then aiming for a "cleanup
done, review at will" mail next week isn't entirely insane. /motivation
