perl.perl6.internals | Postings from June 2001

Re: The internal string API

Dan Sugalski
June 20, 2001 09:50
At 03:17 PM 6/20/2001 +0200, Bart Lateur wrote:
>On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote:
> >> * Do a substr operation by character and glyph
> >
> >The byte based is more useful. I have utf-8, and I want to substr it
> >to another utf-8. It is painful to convert it or linear search for
> >character position.
>I tend to agree.
>I currently use substr(), length() and read()/sysread(), based on a byte
>count. It's a mindset. Even if my encoding is in (16 bit) Unicode or
>UTF8, I still prefer to use bytes as my count base.

Sure, but that's at the language level. That's not where we're at.

>Personally, I would prefer if it stayed this way, i.e. that the raw,
>non-OO keywords for the above kept counting in bytes.

That one's Larry's call, and it's a language level thing anyway. The 
internals should give you access to lengths by byte and character at least, 
if not byte, character, and glyph.
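
As a rough sketch of the three kinds of length (Python only for brevity; the 
function names are made up for illustration and are not a proposed Perl API, 
and the glyph count is an approximation that skips combining marks rather 
than doing full grapheme segmentation):

```python
import unicodedata


def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def char_length(text: str) -> int:
    """Number of Unicode code points in the text."""
    return len(text)


def glyph_length(text: str) -> int:
    """Approximate glyph count: code points that are not combining marks.

    Assumption: a base character plus its combining marks is one glyph.
    Real grapheme-cluster rules are more involved than this.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' + COMBINING ACUTE ACCENT, then 't', then precomposed e-acute
sample = "e\u0301t\u00e9"

print(byte_length(sample), char_length(sample), glyph_length(sample))
```

The same string gives three different answers, which is why the internals 
need to say which unit a length or offset is counted in.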

>Why? Just imagine processing a binary file like a JPEG file, with
>embedded comments in (16-bit) Unicode. You wouldn't want Perl preventing
>you from treating this comment as Unicode, or having to process this
>entire binary file as Unicode, would you? I'd hate that. I want to
>remain in control.

Of course, but in that case you don't have UTF-8/16/32 data--you have binary 
data. The scalar with the info shouldn't be tagged as anything but binary.
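
A minimal sketch of that idea, again in Python for brevity: the whole blob 
stays untyped bytes, and only the slice known to hold the comment is decoded. 
The blob layout, the offsets, and the extract_comment name are invented for 
the example and say nothing about the real JPEG format.

```python
def extract_comment(blob: bytes, start: int, length: int,
                    encoding: str = "utf-16-le") -> str:
    """Decode only the byte range [start, start + length) as text.

    The rest of the blob is never interpreted as character data.
    """
    return blob[start:start + length].decode(encoding)


# Hypothetical layout: 4 opaque bytes, a UTF-16 comment, 2 opaque bytes.
header = b"\x00\x01\x02\x03"
comment_bytes = "caf\u00e9".encode("utf-16-le")
trailer = b"\xfe\xff"
blob = header + comment_bytes + trailer

comment = extract_comment(blob, len(header), len(comment_bytes))
print(len(blob), len(comment))
```

Byte offsets apply to the binary container; character counts only make sense 
once the slice has been decoded.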

>I would not mind if OO versions of these words were smarter, and did
>their count in characters for whatever character mode they're set to.
>For example, if $string is a UTF8 object, then $string->length may
>return a length in (UTF8) characters.

Hassle Damian about it--I expect he's got a proposal for this already. 
(Granted you might have to program in Klingon to get it... :)


--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
                                      have teddy bears and even
                                      teddy bears get drunk
