Re: on the almost impossibility to write correct XS modules

From:
Glenn Linderman
Date:
May 21, 2008 10:17
Subject:
Re: on the almost impossibility to write correct XS modules
Message ID:
4834590C.5040708@NevCal.com
On approximately 5/21/2008 8:56 AM, came the following characters from 
the keyboard of demerphq:
> 2008/5/21 Glenn Linderman <perl@nevcal.com>:
>> On approximately 5/21/2008 1:29 AM, came the following characters from the
>> keyboard of Rafael Garcia-Suarez:
>>
>>> Some way to mark PVs as "binary" and not upgradeable to SvUTF8 would be
>>> handy, though.
>>
>> What's the goal?
>>
>> If, during the lifetime of a binary string, data gets attached to it that
>> makes it get upgraded, and later that data is detached, and the storage
>> format is truly transparent, then when the string is used in a context that
>> needs bytes, it should be handled properly (if not, let's fix that bug),
>> either by downgrading, or by accessing the data and validating that the
>> values are each < 256 (which downgrading does as a side effect).
> 
> So how would that work exactly? Seriously. Give a general framework
> about how it would work. Consider that if it makes things massively
> slower, it's probably not going to fly.


So where are the places where Perl string operations need bytes?  The O
(of I/O) springs to mind.  Encode::decode springs to mind.

Output already warns if a data item contains values > 255.  Not sure how
that is implemented, but since it already does it, it seems there is
little added cost.
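
To make the output case concrete, here is a minimal sketch of the check that print already performs on a handle with no encoding layer.  The null device and the variable names are my own choices for illustration; the point is only that an upgraded string whose characters are all < 256 goes out silently, while a character > 255 draws a warning.

```perl
use strict;
use warnings;
use File::Spec;

# Collect warnings so they can be counted instead of going to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# A handle with no encoding layer, i.e. one that needs bytes.
open(my $out, '>', File::Spec->devnull) or die "cannot open null device: $!";

my $latin1 = "caf\x{e9}";                  # every character < 256
utf8::upgrade(my $upgraded = $latin1);     # same characters, upgraded storage

print {$out} $latin1;       # no warning
print {$out} $upgraded;     # no warning: downgraded transparently on output
print {$out} "\x{2603}";    # character > 255: "Wide character in print"
close $out;

printf "warnings seen: %d\n", scalar @warnings;
```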

Decode is already looking at data byte-by-byte.  Changing that to 
character by character, and bounds checking the values doesn't seem like 
a huge added cost.
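
A rough sketch of such a bounds check, under the assumption that a consumer of bytes is allowed to work on a copy: utf8::downgrade with a true second argument returns false instead of dying when a character is > 255, so it validates as a side effect.  The helper name require_bytes is hypothetical, not anything in Encode.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: return a byte-string copy of the argument,
# or die if any character in it is > 255.
sub require_bytes {
    my ($str) = @_;                 # work on a copy, leave the caller's string alone
    utf8::downgrade($str, 1)        # true second argument: return false on failure
        or croak "string contains a character above 255";
    return $str;
}

my $binary = "\x00\xff\x80";
utf8::upgrade(my $upgraded = $binary);   # same characters, different storage

print "same content\n"
    if require_bytes($upgraded) eq $binary;
print "rejected wide string\n"
    unless eval { require_bytes("\x{100}"); 1 };
```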

Others?


>> If the goal is to prevent the cost of upgrading and downgrading, well, just
>> fix the bug that attached the upgraded data... and the cost of doing so also
>> vanishes.
> 
> I don't think it's so easy. The code responsible may be very hard to identify.


Especially when the storage format is truly transparent, the responsible 
code may be very hard to identify.  I don't think, though, that it would 
be necessary to remove utf8::is_utf8; switching it to diagnostic use 
only would allow code to be instrumented to discover where data is 
upgraded... via binary search instrumentation, breadth first tree 
searching, etc.  This would allow it to be tracked down when necessary. 
The thing is, if your string _is_ byte-oriented, any operation that 
upgrades it truly is a bug, and it should be tracked down.
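
As an illustration of that kind of instrumentation, here is a small sketch using utf8::is_utf8 purely as a diagnostic.  The probe function, its labels, and the sample data are made up for the example; in real code the probes would be moved around (binary search style) until the upgrading operation is bracketed.

```perl
use strict;
use warnings;

# Hypothetical diagnostic probe: record the label of every checkpoint at
# which a supposedly binary string is found in upgraded form.
my @report;
sub probe {
    my ($label) = @_;      # $_[1] is an alias to the caller's string, not a copy
    push @report, $label if utf8::is_utf8($_[1]);
    return;
}

my $payload = "\x89PNG\r\n";
probe('after creation', $payload);

my $suffix = "\xe9";
utf8::upgrade($suffix);          # simulate data that arrived upgraded
$payload .= $suffix;             # appending it upgrades $payload as a side effect
probe('after append', $payload);

substr($payload, -1) = '';       # detaching the data does not downgrade the storage
probe('after detach', $payload);

print "upgraded at: $_\n" for @report;
```

The first label reported marks the earliest checkpoint after the upgrade happened, which is where to narrow the search.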

So if it is cheap to add a flag to prevent upgrades, and produce errors 
or warnings at the point of upgrade attempts, maybe that is OK, but 
correct code wouldn't need the checks, as far as I can see.  So that's 
why I wondered what the goal was...


> Yvesb


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking