On Sat, Apr 26, 2008 at 05:33:02PM +0200, Rafael Garcia-Suarez <rgarciasuarez@gmail.com> wrote:
> > LOG ("see this object:");
> > LOG ($obj);
> > # $obj no longer an object here, it became a string
>
> You mean, like "Class=HASH(0xDEADBEEF)", I suppose (I haven't checked).

Yes, for example references.

> > I think the situation with unicode and cpan perl modules cannot improve
> > as long as it is so difficult to do something as simple as get at the
> > string data in a non-random/godgiven encoding.
>
> That's right, and that's probably also why people find it difficult to
> handle utf-8 in perl as soon as they begin using XS modules.

Yes, it becomes unpredictable.

It is also not very helpful when one has to fight for every bugfix
regarding unicode in perl (the bugfix could break existing code), and when
everybody mentally uses and propagates a slightly different unicode model.

The biggest problem with perl and unicode is that it isn't consistent, and
little is done to make it consistent. And little can be done from the side
of XS to make this work.

Now, to make this mail a bit more helpful, which flags would I need to
check to achieve the following:

1. if the SV can safely be SvPVutf8'd, just do it
2. if not, SvPVutf8 a mortalcopy

i.e., how could I check that calling SvPVutf8 (or bytes) on a scalar is
"safe" in the sense of not modifying it w.r.t. perl visibility?

> I think that this screams for a new macro which would be more or less
> the one you suggested here, maybe implemented in a more efficient way if
> possible.

Well, I think that would be a very bad way to solve the problem. For
example, in the reference case, the stringified non-utf-8 reference is
still valid utf-8, so we don't have to create an sv with an utf-8 string.

(if backwards-compatibility is a problem, name the macro differently, but
both solutions would require that).

> You mean this one ?
> T_PV
>     $var = ($type)SvPV_nolen($arg)
> Here, the result would be dependent on the internal representation of
> the string in perl, so I suppose you would like to change this to
> something that uses SvPVbyte. I wonder, however, how much code would
> break with that change. Actually I suspect that more code would be fixed
> than broken, but that's a wild intuition...

That's my point: all modules that haven't been updated would be fixed.

The problem that people out there have with unicode in perl is that it
doesn't work with their favourite module. When it doesn't work, they start
to google and find stuff about this utf-8 flag that they then want to
manipulate, thinking unicode==utf-8 flag, and this doesn't work too well
either, so they get frustrated etc.

I keep giving talks about perl's unicode model, but it is hard to explain
it without also saying that it doesn't work in practice, and everything is
immensely complicated in practice, and....

Now, as for changing T_PV: what could be done trivially (I'd happily send
a patch) would be to provide

   typedef char char_bytes;
   typedef char char_utf8;

   char_utf8    T_PVutf8
   char_bytes   T_PVbytes

*preferably* with the mortalcopy trick or a more efficient/perlish solution
to the SvPVutf8 problem.

Changing T_PV itself is something that should really be considered, though.
Code that breaks almost certainly exists, but isn't it somewhat questionable
anyways to use "char *" in perl and then get access to the flag in other
ways (e.g. by counting arguments and using ST(n))?

The point is, code that uses char * without looking at the flag in other
ways is simply broken, it has no defined/documentable behaviour on the perl
level.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      pcg@goof.com
      -=====/_/_//_/\_,_/ /_/\_\
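
For illustration only, the T_PVutf8/T_PVbytes idea with the mortalcopy
trick might look roughly like the typemap fragment below. This is a sketch
under assumptions, not an existing part of perl's core typemap: the pointer
types (char_utf8 *, char_bytes *), the temporary name tmp_ and the exact
INPUT bodies are hypothetical. Only SvPVutf8_nolen, SvPVbyte_nolen and
sv_mortalcopy are existing perl API calls. The temporary is there because,
on some perl versions, the SvPV* macros may evaluate their argument more
than once.

```
# Hypothetical typemap sketch, not part of perl's core typemap.
# Assumes these typedefs are visible to the XS code:
#   typedef char char_bytes;
#   typedef char char_utf8;
TYPEMAP
char_utf8 *	T_PVutf8
char_bytes *	T_PVbytes

INPUT
T_PVutf8
	{
	    SV *tmp_ = sv_mortalcopy($arg);
	    $var = ($type)SvPVutf8_nolen(tmp_);
	}
T_PVbytes
	{
	    SV *tmp_ = sv_mortalcopy($arg);
	    $var = ($type)SvPVbyte_nolen(tmp_);
	}
```

With entries along these lines, an XSUB parameter declared as char_utf8 *
would receive UTF-8 encoded data regardless of the scalar's internal
representation, and the caller's scalar (for example a reference) would be
left untouched because only the mortal copy is stringified or upgraded.
Note that SvPVbyte croaks when the string contains characters above 255,
and that copying every argument has a cost, which is what the "more
efficient/perlish solution" above would need to address.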