
Re: on the almost impossibility to write correct XS modules

Rafael Garcia-Suarez
May 20, 2008 07:39
2008/5/20 Juerd Waalboer:
>> Now, at the perl language level, I think the problem we have is that
>> we sometimes want uc, lc or //i to have Unicode semantics, and sometimes
>> not. (other operations here ?)
> Er, why "sometimes not"?
> Why would you uppercase something that's not text?
> I suggest that we keep the possibility to uppercase only the ASCII
> character range, and call that ASCII::uc(), while the normal uc() is
> made Unicode compliant regardless of the PV's state.
> Maybe this should even be called Unicode::uc(), and uc() should
> "default" to Unicode, with "use ASCII qw(uc);" and "use Unicode qw(uc);"
> as ways to override the default.

Likewise for char-classes? So that's the unicode pragma I was talking
about, with possible sub-pragmas:
    use unicode qw(uc);
    no unicode qw(regex);
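
Neither pragma exists, so as a rough sketch of the two behaviours being contrasted, here are two plain functions instead. The names ascii_uc and unicode_uc are made up for illustration; unicode_uc relies on uc applying Unicode rules to a string once it has been upgraded, which is an assumption about the perl in use rather than part of the proposal.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase only a-z and leave every other
# character alone, whatever the internal representation.
sub ascii_uc {
    my ($s) = @_;
    $s =~ tr/a-z/A-Z/;
    return $s;
}

# Hypothetical helper: uppercase with Unicode rules regardless of the
# PV's state, by upgrading a copy before calling uc.
sub unicode_uc {
    my ($s) = @_;
    utf8::upgrade($s);
    return uc $s;
}

my $word = "caf\xe9";    # e-acute as a single character below 256

# %vX prints the ordinal of each character in hex, joined by dots.
printf "ascii_uc:   %vX\n", ascii_uc($word);
printf "unicode_uc: %vX\n", unicode_uc($word);
```

A lexical pragma would only have to choose which of these two the builtin uc() dispatches to in the current scope.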

>>   Additionally, we can add a regexp flag qr//u, that says "this
>>   regexp matches with Unicode semantics". (I'm thinking out loud
>>   here)
> I have suggested /u(nicode), /a(scii) before. These are "needed" in
> addition to the pragma, because of qr//: there must be a way to
> stringify the lexically selected behavior so it survives the end of the
> lexical scope.

What about (?u:...)? What about mixing qr//u and qr//a in the same
match?
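
To make the two questions concrete, here is a sketch using the /u and /a modifiers as they are spelled in perl 5.14 and later. That is an assumption relative to this thread, where they are only a proposal. It shows the flag being carried in the stringified qr// object, the inline (?u:...) form, and two differently flagged qr// objects interpolated into one match.

```perl
use strict;
use warnings;

my $e_acute = "\xe9";    # a single character below 256, not UTF-8 flagged

my $re_u = qr/\w/u;      # \w with Unicode rules
my $re_a = qr/\w/a;      # \w restricted to ASCII

# The flag is part of the stringified form, so it survives interpolation
# into a pattern compiled in some other lexical scope.
print "$re_u $re_a\n";

my $default = $e_acute =~ /^\w$/      ? 1 : 0;    # flag-dependent rules
my $unicode = $e_acute =~ /^$re_u$/   ? 1 : 0;
my $ascii   = $e_acute =~ /^$re_a$/   ? 1 : 0;
my $inline  = $e_acute =~ /^(?u:\w)$/ ? 1 : 0;

# Mixing: each interpolated piece keeps its own semantics.
my $mixed_ok  = "\xe9x" =~ /^$re_u$re_a$/ ? 1 : 0;    # e-acute under /u, x under /a
my $mixed_bad = "x\xe9" =~ /^$re_u$re_a$/ ? 1 : 0;    # e-acute under /a fails

print "default=$default unicode=$unicode ascii=$ascii inline=$inline\n";
print "mixed_ok=$mixed_ok mixed_bad=$mixed_bad\n";
```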

>> * Drop relying on the SvUTF8 flag to choose whether Unicode semantics
>>   should be applied. Big change, not backwards compatible, but IMO
>>   needed for sanity.
> Yes!
> However, there's also a way to

Missing sentence?
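
For reference, a minimal sketch of the flag dependence in question, assuming no "use locale" and no feature 'unicode_strings' in scope. The two scalars compare equal as strings, yet uc treats them differently depending on the internal representation.

```perl
use strict;
use warnings;

my $native   = "\xe9";      # e-acute, UTF-8 flag off
my $upgraded = $native;
utf8::upgrade($upgraded);   # same character, UTF-8 flag on

my $same        = $native eq $upgraded ? 1 : 0;
my $uc_native   = uc $native;      # ASCII-only rules: left unchanged
my $uc_upgraded = uc $upgraded;    # Unicode rules: becomes E-acute

printf "equal strings: %d\n", $same;
printf "uc, flag off:  U+%04X\n", ord $uc_native;
printf "uc, flag on:   U+%04X\n", ord $uc_upgraded;
```

An XS module that returns the "same" string in either representation therefore changes what uc, lc and //i do to it downstream.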

>> But sometimes we want perl to magically switch between Unicode and
>> non-Unicode semantics depending on the data it's handling.
> No, we don't want this to happen MAGICALLY. Or at least I really do not
> want Perl to do that. This is one place where DWIM heuristics simply
> cannot work.

I see your point. Sometimes I'm thick.

>> Does that mean that we need to add a new kind of data to perl,
>> "Unicode SV" ?  Will that solve problems ? What problems will this
>> create ?
> Indeed there could be a way to indicate "I intend this string to be a
> byte string". I have a module, called, in the works that makes
> this very easy. I'll try to release it really soon so you can have a
> look.

Didn't you talk about it at one of the meetings?

> Because of the way BLOB works, it could probably be used by XS and core
> code too. BLOB assumes that everything is text until explicitly marked
> as binary.

Indeed, the symmetrical alternative to a "Unicode SV" would be a "Binary
SV". But Unicode SVs look less appealing to me if we force Unicode
semantics on everything and don't apply heuristics depending on the data.
