GSOC Status Report, Week 9

Brian Fraser
July 27, 2011 01:41
Howdy people.

Last week I took a couple of extra stabs at rebasing, and the repo is now in
somewhat better shape. Reviews are welcome as usual! I also dove into mg.c
(which shared the magic-granting errors that I tackled in gv.c the
previous week) and toke.c (though nothing much is ready for pushing at the
moment). Cleaning up attributes turned out to be trivial, so that's done.

Finally, in the last few days, I've been taking a look at string eval, with
much-appreciated handholding from Zefram : )
These two programs show the crux of the issue:

perl -CS -E 'use utf8; my $prog = "say qq!\x{f9}!"; eval $prog;
utf8::upgrade($prog); eval $prog;'
perl -le 'use utf8; print eval "q!\360\237\220\252!" eq eval "q!\x{1f42a}!"'

In the former, to paraphrase Zefram, "use utf8;" shouldn't affect the
correctness of the evaled program. And the latter shows why eval shouldn't
pay attention to the hints of its enclosing scope, but only to the scalar
passed in.
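
To make the two cases concrete without depending on how any particular perl
treats the eval itself, here is a rough sketch that only inspects the strings
involved. The variable names are mine, not from the one-liners above, and it
deliberately leaves out "use utf8;" so that nothing but the scalars is in play.

```perl
use strict;
use warnings;

# First program: upgrading changes the internal storage, not the characters.
my $prog     = "say qq!\x{f9}!";   # U+00F9 stored as a single octet
my $upgraded = $prog;
utf8::upgrade($upgraded);          # same characters, UTF-8 internally

my $same_text  = ($prog eq $upgraded) ? "yes" : "no";
my $flag_plain = utf8::is_utf8($prog)     ? 1 : 0;
my $flag_up    = utf8::is_utf8($upgraded) ? 1 : 0;
print "same text: $same_text; UTF8 flag: $flag_plain vs $flag_up\n";

# Second program: four octets versus the one character they encode.
my $octets  = "\360\237\220\252";  # UTF-8 encoding of U+1F42A, as raw octets
my $char    = "\x{1f42a}";         # the character itself
my $decoded = $octets;
utf8::decode($decoded);            # interpret the octets as UTF-8, in place

printf "lengths: %d octets, %d character\n", length($octets), length($char);
printf "octets eq char: %s; decoded eq char: %s\n",
    ($octets eq $char ? "yes" : "no"), ($decoded eq $char ? "yes" : "no");
```

The first half is why both evals in the first program ought to behave the
same: $prog holds the same text before and after the upgrade. The second half
is why the eq in the second program ought to be false: the octets only become
the character once something explicitly decodes them.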
Fixing this, however, steps on a landmine: it's not particularly
backwards-compatible. When working correctly, the first eval in the first
program stops being a syntax error, and the eq in the second program returns
false. And things that expected to pass octets to eval and have them
interpreted as UTF-8 will now suddenly break. There are at least two such
occurrences in the test suite right now, both of which are kind of buggy in
their own right: op/utfhash.t, which reads from DATA without setting an
encoding on the filehandle and passes the return value to eval, expecting
it to be interpreted as UTF-8 (which is a mortal sin, don'tyouknow; to
quote Tom Christiansen, "If you have a DATA handle, you must explicitly set
its encoding."), and lib/utf8.t, which admittedly I haven't given more than
a cursory glance, but I get the feeling it's testing things in a completely
backwards way, by using "use utf8/no utf8" inside string evals to test how
UTF-8 hash keys work (that also makes me wonder whether the tests are
misplaced, as there's an op/utfhash.t).
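
For the DATA point, a minimal sketch of what setting the encoding buys you.
This is not the code from op/utfhash.t; it uses an in-memory filehandle as a
hypothetical stand-in for DATA so that it can run on its own, and read_line is
a helper name made up for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DATA section: the UTF-8 encoding of U+00F9,
# followed by a newline.
my $bytes = "\xc3\xb9\n";

# Open the buffer with the given mode/layers and return its first line.
sub read_line {
    my ($mode, $buffer) = @_;
    open(my $fh, $mode, \$buffer) or die "open failed: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

my $raw     = read_line('<', $bytes);                  # no layer: octets
my $decoded = read_line('<:encoding(UTF-8)', $bytes);  # explicit layer: characters

printf "raw: %d, decoded: %d\n", length($raw), length($decoded);
```

With a real DATA handle the equivalent step would be
binmode(DATA, ':encoding(UTF-8)') before the first read. Without it, whatever
gets handed to eval is a string of octets, which is exactly the case that
stops being silently treated as UTF-8.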

As for this week, I'll continue working on string eval and reviewing the
GV/stash stuff, plus tackling whatever toke.c throws at me (I already did a
bit of work on the tokenizer in other branches while working on GVs, so
chances are I'll also go back and see if there's anything usable there).
And finally, the sidequest of the week involves ironing out a few wrinkles
that pop up when using UTF-8 labels with characters in the Latin-1 range.
