perl.perl5.porters | Postings from July 2011

Re: GSOC Status Report, Week 5

Brian Fraser
July 3, 2011 16:06
On Sun, Jul 3, 2011 at 5:40 PM, Father Chrysostomos <> wrote:

> I’m not so sure that that *is* wrong. Having a UTF8-flagged _ is harmless.
> If toke.c has to check for non-ASCII characters to determine when to pass
> the flag, how is that different from having share_hek do the check, from an
> efficiency standpoint?
Functionally, it has no impact, I think. Even with globs flagged liberally
like that, the whole test suite passes; it simply looks messier.
As far as efficiency goes, I don't know. It depends on how slow using
bytes_from_utf8 is (which share_hek calls for UTF-8 flagged pvs), compared
to discarding the flag early with !is_ascii_string() && is_utf8_string().
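
A minimal standalone sketch of the "discard the flag early" check, for
illustration only. The sketch_* functions are hypothetical stand-ins, not
perl's own is_ascii_string()/is_utf8_string(); in particular the UTF-8 check
here is structural only and does not reject overlong sequences.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for is_ascii_string(): true if every byte is < 0x80. */
static bool sketch_is_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return false;
    return true;
}

/* Hypothetical stand-in for is_utf8_string(): checks only that each lead
 * byte is followed by the right number of continuation bytes. */
static bool sketch_is_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t extra;
        if (s[i] < 0x80)                extra = 0;
        else if ((s[i] & 0xE0) == 0xC0) extra = 1;
        else if ((s[i] & 0xF0) == 0xE0) extra = 2;
        else if ((s[i] & 0xF8) == 0xF0) extra = 3;
        else return false;              /* stray continuation or invalid lead */
        if (extra > len - i - 1)
            return false;               /* sequence truncated */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += extra + 1;
    }
    return true;
}

/* Pass the UTF-8 flag along only when the name has non-ASCII bytes
 * and those bytes look like UTF-8. */
static bool sketch_wants_utf8_flag(const unsigned char *name, size_t len)
{
    return !sketch_is_ascii(name, len) && sketch_is_utf8(name, len);
}
```

With this check a plain "_" never gets the flag, at the cost of up to two
scans over the name; whether that beats letting share_hek downgrade the key
is the open question.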

> See above. I think you’re spreading too much complexity around
Well, that's quite likely :)

> That patch you are suggesting is the wrong way to do it. It means that a
> string containing "\xFF" won’t be encoded as UTF-8, but add one \x{100}, and
> the \xFF is suddenly encoded differently. While it may be OK for perl’s own
> tests (it not being a public API), I think it will lead to badly written
> tests later on.
> You could add a new function or a new argument to the appropriate function
> in But to keep things simpler it might be better to call
> utf8::encode on the program before passing it to That way people
> reading the tests can see exactly what’s happening.
Hm, I hadn't thought about any of that. Sounds good, utf8::encode() it is!
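
To illustrate why encoding unconditionally is the more predictable choice,
here is a small standalone sketch of a code-point-to-UTF-8 encoder. It is not
how utf8::encode is implemented; sketch_utf8_encode is a hypothetical name,
and the caller is assumed to supply an output buffer of at least 4 * n bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n code points as UTF-8 into out; returns the number of bytes written.
 * Every code point is encoded the same way no matter what else is in the
 * input, so U+00FF always becomes C3 BF. */
static size_t sketch_utf8_encode(const uint32_t *cps, size_t n,
                                 unsigned char *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        if (c < 0x80) {
            out[w++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[w++] = (unsigned char)(0xC0 | (c >> 6));
            out[w++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[w++] = (unsigned char)(0xE0 | (c >> 12));
            out[w++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[w++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[w++] = (unsigned char)(0xF0 | (c >> 18));
            out[w++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[w++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[w++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return w;
}
```

Encoding { 0xFF } alone and encoding { 0xFF, 0x100 } both start with the
bytes C3 BF, which is the consistency the conditional approach lacks.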

> Concerning the SvUTF8 flag on GVs (that’s what you mean, isn’t it), I just
> had a look in gv.h, and I don’t see any UTF8 flag that goes in GvFLAGS.
> (Based on what you said earlier, I had assumed there was, without even
> looking.) Am I missing something?
> If I’m correct, then you can use SvUTF8(gv) itself to store the UTF8 flag
> and your problem is solved. Any time a GV is copied, however, you need to
> make sure that flag is copied too, just as for strings.
It is; sorry for the fuzziness in that mail, I should've had a coffee before
writing it.
You could set the UTF8 flag on a GV, but that's problematic by itself. You
end up with a proxy flag that can mean several different things, and may not
even be correct: if either the stash or the GV changes, you have to change
the flags on the GV, but what if you only have the stash at hand?

It's certainly doable though. It just requires a bit more work, and I assume
more discipline from extension writers, i.e. "if you mess with a GV, don't
forget to update the flags."

The alternative would be to do something like this in sv.h:
#define SvUTF8(sv)   (isGV(sv) \
    ? ((GvNAMEUTF8(sv) || (GvSTASH(sv) && HvNAMEUTF8(GvSTASH(sv)))) ? SVf_UTF8 : 0) \
    : (SvFLAGS(sv) & SVf_UTF8))
...But that's going to slow things down.
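
The trade-off between the two approaches can be modelled outside of perl.
This is only a toy sketch: the sketch_* structs and functions are
hypothetical and do not mirror perl's real GV or stash layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a stash and a glob; not perl's actual structures. */
struct sketch_stash {
    bool name_is_utf8;
};

struct sketch_gv {
    bool name_is_utf8;
    struct sketch_stash *stash;   /* may be NULL */
    bool cached_utf8;             /* the "proxy" flag stored on the glob */
};

/* Computed on every query, like the macro above: always correct,
 * but costs a couple of extra dereferences each time. */
static bool sketch_computed_utf8(const struct sketch_gv *gv)
{
    return gv->name_is_utf8 || (gv->stash != NULL && gv->stash->name_is_utf8);
}

/* Cached variant: cheap to read, but whoever changes the glob or its
 * stash must remember to call this, or the flag goes stale. */
static void sketch_refresh_cache(struct sketch_gv *gv)
{
    gv->cached_utf8 = sketch_computed_utf8(gv);
}
```

If code that holds only the stash flips name_is_utf8, it has no way to reach
the globs whose cached_utf8 now disagrees with the computed value.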

Oh, since we are at it, question. I have a HEK. Doing this fails when it has
UTF-8 data:
hv_common(isa, NULL, HEK_KEY(canon_name), HEK_LEN(canon_name),
HEK_FLAGS(canon_name), HV_FETCH_ISEXISTS, NULL, HEK_HASH(canon_name));

But this succeeds:
hv_common(isa, NULL, HEK_KEY(canon_name), HEK_LEN(canon_name),

Does that mean that I'm storing the wrong hash somewhere?
