On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote:

>> * Do a substr operation by character and glyph
>
>The byte based is more useful. I have utf-8, and I want to substr it
>to another utf-8. It is painful to convert it or linear search for
>character position.

I tend to agree. I currently use substr(), length() and read()/sysread(), all based on a byte count. It's a mindset. Even if my encoding is (16-bit) Unicode or UTF8, I still prefer to use bytes as my count base.

Personally, I would prefer that it stay this way, i.e. that the raw, non-OO keywords for the above keep counting in bytes.

Why? Just imagine processing a binary file, like a JPEG file, with embedded comments in (16-bit) Unicode. You wouldn't want Perl to prevent you from treating such a comment as Unicode, or to force you to process the entire binary file as Unicode, would you? I'd hate that. I want to remain in control.

I would not mind if OO versions of these words were smarter, and did their counting in characters for whatever character mode they're set to. For example, if $string is a UTF8 object, then $string->length may return a length in (UTF8) characters.

-- 
Bart.
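A minimal Perl 5 sketch of the split being argued for, assuming only the core Encode module. The plain keywords count bytes on undecoded data, while a small wrapper counts characters. The `Utf8String` class is hypothetical, a stand-in for the proposed OO interface and not an existing Perl or CPAN API; the "binary file" at the end is a made-up byte layout, not a real JPEG.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical wrapper (not a real module): holds raw UTF-8 octets,
# but its length() method counts characters instead of bytes.
package Utf8String;

sub new {
    my ($class, $octets) = @_;
    return bless { octets => $octets }, $class;
}

# Character count: decode the stored octets, then count characters.
sub length {
    my ($self) = @_;
    return CORE::length(Encode::decode('UTF-8', $self->{octets}));
}

package main;

# "caf" + U+00E9 (two bytes in UTF-8) + "!": 5 characters, 6 bytes.
my $octets = encode('UTF-8', "caf\x{E9}!");

# Plain keywords on undecoded data count bytes.
my $byte_len = length($octets);          # 6
my $prefix   = substr($octets, 0, 3);    # "caf"

# The hypothetical OO method counts characters.
my $str      = Utf8String->new($octets);
my $char_len = $str->length;             # 5

# Made-up binary blob: 4 raw bytes, a UTF-16BE comment, 2 raw bytes.
# Byte offsets locate the comment; only that slice gets decoded.
my $blob    = "\xFF\xD8\x00\x06" . encode('UTF-16BE', "Hi!") . "\xFF\xD9";
my $comment = decode('UTF-16BE', substr($blob, 4, 6));

print "bytes=$byte_len chars=$char_len prefix=$prefix comment=$comment\n";
```

The binary data is never decoded as a whole; byte-based substr() finds the comment, and decoding is applied only where the program chooses to, which is the control the post asks to keep.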