Re: use bytes; - what does/should it mean?

From: Jarkko Hietaniemi
Date: March 12, 2001 07:15
Subject: Re: use bytes; - what does/should it mean?
Message ID: 20010312091423.M3966@chaos.wustl.edu

On Mon, Mar 12, 2001 at 02:55:01PM +0000, Simon Cozens wrote:
> On Mon, Mar 12, 2001 at 08:50:50AM -0600, Jarkko Hietaniemi wrote:
> > The Hairy Beast is rather sparse and/or vague on "use bytes".
> 
> Good! More freedom for us.
> 
> > 	In this case, you may put a use bytes declaration around
> > 	the byte-oriented code to force it to use byte semantics
> > 	even on strings marked as utf8 strings. 
> 
> This is what I think "use bytes" should do, and currently does.

In other words, you think the internal UTF-8 representation MUST
be exposed at all times when using 'use bytes'?

I do not see that in the Camel.  What I see is that "use bytes" should
guarantee that *if* *the* *code* *is* *byte-oriented* "use bytes"
shall cause "byte semantics" to be used for the code.  Now, what is
this "byte semantics"?  Does that mean that you shall see the internal
UTF-8 bytes encoding the characters, or that you shall see the values
of the characters modulo 256?  That we like to use { use bytes; eq }
in the test suite to double check that we got it right should not
be seen as very representative as to what people in normal day-to-day
use need or want to do.
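
To make the two candidate readings of "byte semantics" concrete, here is a minimal sketch. It assumes a Perl with the built-in utf8::encode function, and the variable names are purely illustrative. It does not use "use bytes" at all; it only shows the two different answers one might expect to see for the same one-character string.

```perl
use strict;
use warnings;

# One character, U+263A, well above 255.
my $s = "\x{263A}";

# Reading 1: "byte semantics" exposes the internal UTF-8 bytes.
# utf8::encode converts a copy of the string to its UTF-8 octets in place.
my $copy = $s;
utf8::encode($copy);
my @utf8_bytes = unpack("C*", $copy);

# Reading 2: "byte semantics" shows each character's value modulo 256.
my @mod256 = map { ord($_) % 256 } split //, $s;

printf "UTF-8 bytes: %s\n", join " ", map { sprintf "%02X", $_ } @utf8_bytes;
printf "mod 256:     %s\n", join " ", map { sprintf "%02X", $_ } @mod256;
```

Under the first reading the string looks like three bytes (E2 98 BA); under the second it looks like a single byte (3A). The two readings disagree on both content and length.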

> > In Camel the former is not used.  bytes::length() is used instead.
> > How did we come to think that 'use bytes; length' should be equal
> > to bytes::length?
> 
> The Camel also says that "use bytes" defines byte-oriented wrappers around
> functions like length; that would suggest to me that {use bytes; length} 
> is a byte-oriented length.

I quoted the Camel verbatim in my previous message, and it doesn't say that.
The Camel says bytes::length is the wrapper, not { use bytes; length }.
The Camel says:

	use bytes();	# Load wrappers without importing byte semantics.
	...
	$charlen =        length("\x{ffff_ffff}");	# Returns 1.
	$bytelen = bytes::length("\x{ffff_ffff}");	# Returns 7.
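
A smaller variant of the same idea that should run on any Perl shipping the bytes pragma. This is a sketch only: the code point U+263A is my choice for illustration, not the Camel's, picked because it avoids the portability warnings that very large code points can trigger.

```perl
use strict;
use warnings;
use bytes ();   # load the bytes:: wrappers without enabling byte semantics

my $s = "\x{263A}";                 # one character, stored internally as UTF-8

my $charlen = length($s);           # counts characters
my $bytelen = bytes::length($s);    # counts bytes of the internal representation

print "chars=$charlen bytes=$bytelen\n";
```

Here the plain length stays character-oriented, and only the explicitly named bytes::length looks at the internal representation.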

Personally I think as a general rule it would be beneficial for Joe
and Jill Average Perl User if the number of spots where the internal
UTF-8-ness shines through were minimized.  pack/unpack, bytes::, and
Encode (and as derived from Encode, the I/O disciplines), those should
be more than enough.
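
As a sketch of what going through Encode looks like, the following assumes the encode and decode functions of the Encode module as it is distributed with Perl today; the variable names are illustrative. The point is that the conversion between characters and UTF-8 octets happens at an explicit, visible call, not implicitly in ordinary string operations.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars  = "\x{263A}";                 # character string, length 1

# Explicitly ask for the UTF-8 octets.
my $octets = encode("UTF-8", $chars);    # byte string

# Explicitly turn the octets back into characters.
my $back   = decode("UTF-8", $octets);

printf "chars=%d octets=%d roundtrip=%s\n",
    length($chars), length($octets), ($back eq $chars ? "ok" : "mismatch");
```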

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
