perl.perl5.porters | Postings from May 2008

Re: on the almost impossibility to write correct XS modules

Rafael Garcia-Suarez
May 20, 2008 07:39
2008/5/20 Juerd Waalboer <>:
>> Now, at the perl language level, I think the problem we have is that
>> we sometimes want uc, lc or //i to have Unicode semantics, and sometimes
>> not. (Other operations here?)
> Er, why "sometimes not"?
> Why would you uppercase something that's not text?
> I suggest that we keep the possibility to uppercase only the ASCII
> character range, and call that ASCII::uc(), while the normal uc() is
> made Unicode compliant regardless of the PV's state.
> Maybe this should even be called Unicode::uc(), and uc() should
> "default" to Unicode, with "use ASCII qw(uc);" and "use Unicode qw(uc);"
> as ways to override the default.

Likewise for character classes? So that's the unicode pragma I was talking
about, with possible sub-pragmas:
    use unicode qw(uc);
    no unicode qw(regex);
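A minimal sketch of the behaviors being contrasted, assuming Perl 5.12 or later, where the lexically scoped `unicode_strings` feature exists. Neither ASCII::uc() nor a `unicode` pragma exists, so `ascii_uc` below is a hypothetical helper standing in for the former, and the real `feature` pragma stands in for something like "use unicode qw(uc)".

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed ASCII::uc(): uppercase a-z only
# and leave every other character alone, whatever the SvUTF8 flag says.
sub ascii_uc {
    my ($s) = @_;
    (my $copy = $s) =~ tr/a-z/A-Z/;
    return $copy;
}

my $word = "caf\xe9";    # "cafe" with e-acute as a Latin-1 byte, SvUTF8 off

# Default rules: on a non-UTF8 string, uc leaves the \xe9 byte alone.
my $default = uc $word;

# Lexically scoped Unicode rules (Perl 5.12+): \xe9 becomes \xc9 here,
# even though the string's SvUTF8 flag is still off.
my $unicode = do {
    use feature 'unicode_strings';
    uc $word;
};

# ASCII-only uppercasing, identical for byte and upgraded strings.
my $ascii = ascii_uc($word);

printf "default: %vX\n", $default;
printf "unicode: %vX\n", $unicode;
printf "ascii:   %vX\n", $ascii;
```

The pragma only changes code compiled inside its block, which is the property a per-operation "use unicode qw(uc)" would also need.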

>>   Additionally, we can add a regexp flag qr//u that says "this
>>   regexp matches with Unicode semantics". (I'm thinking out loud
>>   here)
> I have suggested /u(nicode), /a(scii) before. These are "needed" in
> addition to the pragma, because of qr//: there must be a way to
> stringify the lexically selected behavior so it survives the end of the
> lexical scope.

What about (?u:...)? What about mixing qr//u and qr//a in the same
match?
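For reference, Perl 5.14 later added charset modifiers along these lines (/u, /a, /l and /d), usable both as qr// flags and inline as (?u:...) or (?a:...). The sketch below assumes Perl 5.14 or later and switches off the `unicode_strings` feature so the default rules stay flag-dependent; it checks that the modifier survives stringification and that both rules can be mixed in one pattern.

```perl
use strict;
use warnings;
use 5.014;                      # /a and /u first appeared in Perl 5.14
no feature 'unicode_strings';   # keep the old flag-dependent default

my $latin1 = "\xe9";            # e-acute, SvUTF8 off
my $upgraded = "\xe9";
utf8::upgrade($upgraded);       # same character, SvUTF8 on

my $uni   = qr/\A\w\z/u;        # Unicode rules whatever the flag says
my $ascii = qr/\A\w\z/a;        # \w limited to [A-Za-z0-9_]

say "/u on latin1:   ", ($latin1   =~ $uni   ? "match" : "no match");
say "/a on upgraded: ", ($upgraded =~ $ascii ? "match" : "no match");

# The modifier is recorded in the stringified pattern, so it still
# applies after interpolation into a pattern compiled elsewhere.
my $str = "$uni";
say "stringified:    $str";
say "plain \\w:       ", ($latin1 =~ /\A\w\z/ ? "match" : "no match");
say "interpolated:   ", ($latin1 =~ /$str/    ? "match" : "no match");

# Both rules in one match: first char ASCII-only, second Unicode.
my $mixed = qr/\A(?a:\w)(?u:\w)\z/;
say "mixed:          ", ("x$latin1" =~ $mixed ? "match" : "no match");
```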

>> * Drop relying on the SvUTF8 flag to choose whether Unicode semantics
>>   should be applied. Big change, not backwards compatible, but IMO
>>   needed for sanity.
> Yes!
> However, there's also a way to

Missing sentence?
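The SvUTF8 dependence behind the "drop relying on the SvUTF8 flag" point can be reproduced with a minimal sketch, assuming default rules (no `use locale`, no `unicode_strings` feature, no `use v5.12` or later bundle): two strings that compare equal get different uc() and \w results depending only on whether the flag is set.

```perl
use strict;
use warnings;

my $bytes = "\xe9";          # e-acute stored as one byte, SvUTF8 off
my $chars = "\xe9";
utf8::upgrade($chars);       # same character, SvUTF8 on

# eq compares characters, so the two strings are equal...
print $bytes eq $chars ? "same string\n" : "different strings\n";

# ...yet case mapping and \w follow the internal representation.
printf "uc bytes: %vX\n", uc $bytes;
printf "uc chars: %vX\n", uc $chars;
printf "\\w bytes: %s\n", ($bytes =~ /\w/ ? "match" : "no match");
printf "\\w chars: %s\n", ($chars =~ /\w/ ? "match" : "no match");
```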

>> But sometimes we want perl to magically switch between Unicode and
>> non-Unicode semantics depending on the data it's handling.
> No, we don't want this to happen MAGICALLY. Or at least I really do not
> want Perl to do that. This is one place where DWIM heuristics simply
> cannot work.

I see your point. Sometimes I'm thick.

>> Does that mean that we need to add a new kind of data to perl,
>> a "Unicode SV"? Will that solve problems? What problems will this
>> create?
> Indeed there could be a way to indicate "I intend this string to be a
> byte string". I have a module, called, in the works that makes
> this very easy. I'll try to release it really soon so you can have a
> look.

Didn't you talk about it at one of the meetings?

> Because of the way BLOB works, it could probably be used by XS and core
> code too. BLOB assumes that everything is text until explicitly marked
> as binary.

Indeed, the symmetrical alternative to a "Unicode SV" would be a "Binary
SV". But Unicode SVs look less appealing to me if we force Unicode
semantics on everything and don't apply heuristics depending on the data
type.