2008/5/20 Juerd Waalboer <juerd@convolution.nl>:
>> Now, at the perl language level, I think the problem we have is that
>> we sometimes want uc, lc or //i to have Unicode semantics, and sometimes
>> not. (other operations here ?)
>
> Er, why "sometimes not"?
>
> Why would you uppercase something that's not text?
>
> I suggest that we keep the possibility to uppercase only the ASCII
> character range, and call that ASCII::uc(), while the normal uc() is
> made Unicode compliant regardless of the PV's state.
>
> Maybe this should even be called Unicode::uc(), and uc() should
> "default" to Unicode, with "use ASCII qw(uc);" and "use Unicode qw(uc);"
> as ways to override the default.

Likewise for char-classes ? So that's the unicode pragma I was talking
about, with possible sub-pragmas :

    use unicode qw(uc);
    no unicode qw(regex);

>> Additionally, we can add a regexp flag qr//u, that says "this
>> regexp matches with Unicode semantics". (I'm thinking out loud
>> here)
>
> I have suggested /u(nicode), /a(scii) before. These are "needed" in
> addition to the pragma, because of qr//: there must be a way to
> stringify the lexically selected behavior so it survives the end of the
> lexical scope.

What about (?u:...) ?
What about mixing qr//u and qr//a in the same match ?

>> * Drop relying on the SvUTF8 flag to choose whether Unicode semantics
>> should be applied. Big change, not backwards compatible, but IMO
>> needed for sanity.
>
> Yes!
>
> However, there's also a way to

Missing sentence ?

>> But sometimes we want perl to magically switch between Unicode and
>> non-Unicode semantics depending on the data it's handling.
>
> No, we don't want this to happen MAGICALLY. Or at least I really do not
> want Perl to do that. This is one place where DWIM heuristics simply
> cannot work.

I see your point. Sometimes I'm thick.

>> Does that mean that we need to add a new kind of data to perl,
>> "Unicode SV" ? Will that solve problems ? What problems will this
>> create ?
>
> Indeed there could be a way to indicate "I intend this string to be a
> byte string". I have a module, called BLOB.pm, in the works that makes
> this very easy. I'll try to release it really soon so you can have a
> look.

Didn't you talk about it at one of the Amsterdam.pm meetings ?

> Because of the way BLOB works, it could probably be used by XS and core
> code too. BLOB assumes that everything is text until explicitly marked
> as binary.

Indeed, the symmetrical alternative to a "Unicode SV" would be a
"Binary SV". But Unicode SVs look less appealing to me if we force
Unicode semantics on everything and don't apply heuristics depending
on the data type.
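
For readers following the SvUTF8 point in this message, here is a minimal sketch (not taken from either poster) of how the internal flag can change what uc() returns for the same character. It assumes an ASCII platform with no "use locale" and no "use feature 'unicode_strings'" in scope; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumptions: ASCII (non-EBCDIC) platform, no "use locale",
# no "use feature 'unicode_strings'" in scope.
my $bytes = "\xe9";        # U+00E9, stored internally as one byte (SvUTF8 off)
my $chars = $bytes;
utf8::upgrade($chars);     # same character, now stored as UTF-8 (SvUTF8 on)

# The two strings compare equal ...
my $same = ($bytes eq $chars) ? 1 : 0;

# ... yet uc() treats them differently.
my $uc_bytes = uc $bytes;  # stays "\xe9": only ASCII case rules are applied
my $uc_chars = uc $chars;  # becomes "\xc9": Unicode case rules are applied

printf "equal before uc: %d\n", $same;
printf "uc(bytes) = U+%04X\n", ord $uc_bytes;
printf "uc(chars) = U+%04X\n", ord $uc_chars;
```

Two strings that are equal under eq giving different uc() results is the kind of data-dependent switch that the quoted proposal to stop relying on the SvUTF8 flag is meant to remove.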