perl.perl5.porters | Postings from March 2001

Re: use bytes; - what does/should it mean?

Jarkko Hietaniemi
March 12, 2001 06:51
On Mon, Mar 12, 2001 at 02:23:10PM +0000, Simon Cozens wrote:
> On Mon, Mar 12, 2001 at 12:56:50PM +0000, Nick Ing-Simmons wrote:
> > So at the risk of opening the flood-gates - what does p5p think 
> > use bytes should do? 
> At risk of being rude, haven't we done this *enough* times already now? Are
> you merely going to keep asking this question over and over until you get the
> answer you want?

Let's calm down, shall we.

I don't think Nick is suggesting reopening the whole issue; he is
mostly suggesting fine-tuning the corners.

If you were going to suggest that "use bytes" Shall Be As In The
Camel III, I venture that this may not be enough.
The Hairy Beast is rather sparse and/or vague on "use bytes".

	Goal #1: Old byte-oriented programs should not spontaneously
	break on the old byte-oriented data they used to work on.

	Goal #2: Old byte-oriented programs should magically start
	working on the new character-oriented data when appropriate.


	Sometimes you want to mix code that understands character
	semantics with code that has to run with byte semantics,
	such as I/O code that reads or writes fixed-size blocks.
	In this case, you may put a use bytes declaration around
	the byte-oriented code to force it to use byte semantics
	even on strings marked as utf8 strings.  You are then
	responsible for any necessary conversions.  But it's a way
	of enforcing a stricter local reading of Goal #1, at the
	expense of a looser global meaning of Goal #2.


	[this paragraph included for contrast]
	The utf8 pragma is primarily a compatibility device that
	enables recognition of UTF-8 in literals and identifiers
	encountered by the parser.  It may also be used for
	enabling some of the more experimental Unicode support
	features.  Our long-term goal is to turn the utf8 pragma
	into a no-op.

	The use bytes pragma will never turn into a no-op.
	Not only is it necessary for byte-oriented code, but
	it also has the side effect of defining byte-oriented
	wrappers around certain functions for use outside the scope
	of use bytes.  As of this writing, the only defined wrapper
	is for length, but there are likely to be more as time
	goes by. ...

Now, please go back and see what Nick said.

The way I read it: are we supposed to leak out the internal
UTF-8-based representation, and if so, exactly when and where?
I think that's a perfectly valid question, even now.

As far as I understand, we are in amazingly good agreement on our
abstract character model.  Now we just have to draw the line on where
and how we allow poking holes in that abstraction.

Please look at the tests that fail if Nick's DO_UTF8() tweak is used;
you'll see that it's mostly these two that break:

	use bytes; length

	use bytes; eq

In Camel the former is not used.  bytes::length() is used instead.
How did we come to think that 'use bytes; length' should be equal
to bytes::length?
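
To make the two readings concrete, here is a minimal sketch of how the
character length and the byte length differ on a string whose internal
representation is UTF-8.  It assumes a later perl (5.8 or newer, where
utf8::upgrade is available) and the variable names are illustrative
only.  In those releases "use bytes; length" does agree with
bytes::length, which is exactly the equivalence being questioned here.

```perl
use strict;
use warnings;
use bytes ();    # load bytes.pm for bytes::length without enabling the pragma

# One non-ASCII character (U+00E9), forced into the internal UTF-8 form.
my $str = "caf\x{e9}";
utf8::upgrade($str);

# Character semantics: 4 characters.
my $chars = length($str);

# bytes::length exposes the internal representation: 5 octets
# ("caf" is 3 octets, U+00E9 is encoded as 0xC3 0xA9).
my $octets = bytes::length($str);

# Under a lexically scoped "use bytes", the builtin length also counts octets.
my $lexical = do { use bytes; length($str) };

print "$chars $octets $lexical\n";    # prints: 4 5 5
```

Note that the same string gives a different "use bytes" length depending
only on whether it happens to be stored upgraded, which is the leak of
the internal representation under discussion.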

In Camel the latter is not used at all.  How did we start using that?
I admit that you can easily derive that interpretation from "sometimes
you want to mix code ...", and having a lexically scoped operator
like bytes::eq would be a somewhat new concept :-) But the question
is, do we want to expose the UTF-8ness exactly like this?  Somehow,
yes, but using your standard eq?  Why not a function bytes::eq()
for those that need to grovel at that level?
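
As a rough illustration of the function-style alternative, here is a
sketch of a hypothetical helper, bytes_eq (no such function exists in
bytes.pm; the name is purely illustrative), that compares two strings by
their internal octets and leaves the ordinary eq with character
semantics.  It assumes perl 5.8.1 or newer for utf8::is_utf8.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT an existing bytes:: function: compare two
# strings by their internal octets rather than by their characters.
sub bytes_eq {
    my ($x, $y) = @_;    # work on copies; the caller's strings are untouched
    for my $s ($x, $y) {
        # A UTF-8-flagged string's internal octets are the UTF-8 encoding of
        # its characters; utf8::encode turns the copy into exactly those octets.
        utf8::encode($s) if utf8::is_utf8($s);
    }
    return $x eq $y;     # both are now plain byte strings
}

my $latin = "\x{e9}";    # stored as the single octet 0xE9
my $wide  = "\x{e9}";
utf8::upgrade($wide);    # same character, stored as 0xC3 0xA9

my $char_equal = ($latin eq $wide)        ? 1 : 0;    # character semantics
my $byte_equal = bytes_eq($latin, $wide)  ? 1 : 0;    # internal octets

print "eq: $char_equal, bytes_eq: $byte_equal\n";    # prints: eq: 1, bytes_eq: 0
```

Keeping the byte-level comparison behind an explicitly named function
means only code that asks for it sees the UTF-8ness; a plain eq stays
faithful to the abstract character model.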

Camel III is the best guide we can go by -- but it's not always
as clear as we would want it to be.

$jhi++; #
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
