develooper Front page | perl.perl6.internals | Postings from June 2001

Re: The internal string API

From: Bart Lateur
Date: June 20, 2001 06:15
Subject: Re: The internal string API
Message ID: l681jto27ehmka6phb51lrabbq3536g2ri@4ax.com

On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote:

>> * Do a substr operation by character and glyph
>
>The byte based is more useful. I have utf-8, and I want to substr it
>to another utf-8. It is painful to convert it or linear search for
>character position.

I tend to agree.

I currently use substr(), length() and read()/sysread(), based on a byte
count. It's a mindset. Even if my data is encoded in (16-bit) Unicode or
UTF-8, I still prefer to use bytes as my count base.

Personally, I would prefer if it stayed this way, i.e. that the raw,
non-OO keywords for the above kept counting in bytes.
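
To make the difference between the two counts concrete, here is a minimal
sketch. It assumes Perl 5.8 or later with the Encode module, which postdates
this thread, and the sample text is made up for illustration. It shows
length() and substr() applied to an encoded byte string versus the same text
as characters.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text: four characters, the last one non-ASCII.
my $chars = "caf\x{e9}";
my $bytes = encode('UTF-8', $chars);    # byte string: 63 61 66 c3 a9

my $byte_len = length($bytes);          # counts bytes in the encoded form
my $char_len = length($chars);          # counts characters

# Byte-based substr needs no scan of the string, but the offsets must
# fall on character boundaries for the pieces to remain valid UTF-8.
my $head = substr($bytes, 0, 3);
my $tail = decode('UTF-8', substr($bytes, 3, 2));

printf "%d bytes, %d characters\n", $byte_len, $char_len;
```

The byte offsets here were chosen by hand; finding them for an arbitrary
character index is exactly the linear search mentioned in the quoted text.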

Why? Just imagine processing a binary file like a JPEG file, with
embedded comments in (16-bit) Unicode. You wouldn't want Perl to prevent
you from treating this comment as Unicode, or to force you to process the
entire binary file as Unicode, would you? I'd hate that. I want to
remain in control.
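
The scenario above can be sketched as follows. This is an illustration under
stated assumptions, not a real JPEG parser: the buffer is made up, the segment
layout (a 2-byte marker, a 2-byte big-endian length that counts itself, then
the payload) is simplified, and the Encode module used for the decoding step
is from Perl 5.8 or later. All offsets are in bytes, and only the comment
slice is ever treated as Unicode.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical buffer: leading binary bytes, one comment segment,
# trailing binary bytes. The comment text is sample data.
my $comment = encode('UTF-16BE', "Caf\x{e9} \x{263a}");
my $buffer  = "\xFF\xD8"
            . "\xFF\xFE" . pack('n', 2 + length($comment)) . $comment
            . "\xFF\xD9";

# Work purely in bytes: find the marker, read the length, slice the payload.
my $pos     = index($buffer, "\xFF\xFE");
my $seg_len = unpack('n', substr($buffer, $pos + 2, 2));
my $payload = substr($buffer, $pos + 4, $seg_len - 2);

# Only this slice is decoded; the rest of the buffer stays raw bytes.
my $text = decode('UTF-16BE', $payload);

printf "payload: %d bytes, %d characters\n",
    length($payload), length($text);
```

If length() and substr() counted characters on the whole buffer, the marker
search and the length field arithmetic above would no longer line up with the
file's byte layout.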

I would not mind if OO versions of these words were smarter, and did
their count in characters for whatever character mode they're set to.
For example, if $string is a UTF-8 object, then $string->length may
return a length in (UTF-8) characters.
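
A rough sketch of what such an object could look like, assuming Perl 5.8 or
later with Encode. The class name UTF8String and the byte_length method are
hypothetical names invented for this illustration, not part of any proposed
API. The object stores raw bytes, its length method counts characters, and the
builtin length() keeps counting bytes.

```perl
use strict;
use warnings;

package UTF8String;    # hypothetical class name
use Encode qw(decode);

sub new {
    my ($class, $bytes) = @_;
    return bless { bytes => $bytes }, $class;
}

# Character count: decode a copy of the stored bytes, then count.
sub length {
    my ($self) = @_;
    my $copy = $self->{bytes};
    return CORE::length(decode('UTF-8', $copy));
}

# Byte count: the builtin applied to the raw storage.
sub byte_length {
    my ($self) = @_;
    return CORE::length($self->{bytes});
}

package main;

# "caf" followed by the two-byte UTF-8 encoding of U+00E9.
my $string = UTF8String->new("caf\xC3\xA9");

printf "%d characters, %d bytes\n", $string->length, $string->byte_length;
```

Calling CORE::length explicitly keeps the builtin and the method apart, which
mirrors the split between raw keywords and OO versions described above.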

-- 
	Bart.
