Re: [perl #24541] substr and utf8 and use bytes

From: Gisle Aas
Date: November 23, 2003 00:11
Subject: Re: [perl #24541] substr and utf8 and use bytes
Message ID: lrr7zzld39.fsf@caliper.activestate.com

William R Ward (via RT) <perlbug-followup@perl.org> writes:

> We have a need to take a string containing utf8-encoded multibyte
> characters, and then, treating the string as bytes, extract a
> substring of N characters from it.
> 
> This is what "use bytes" was meant for, and it works great on Perl
> 5.6.1. 

"use bytes" is evil.  It exposes internal implementation details that
you are not supposed to know about, so I'm not surprised the results
differ between versions of perl.

Just use Encode to state your intent clearly, in a way that will work
whatever internal representation of wide chars Perl might have.
Something like this:

  substr(encode_utf8($string), $m, $n);

will do what you describe above.
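
A slightly fuller sketch of the same idea, for illustration only.  The
helper name utf8_byte_substr and the sample string are made up for
this example; encode_utf8 is the function the Encode module exports.
Note that a byte offset can land in the middle of a multibyte
character, so the result is a string of octets, not necessarily valid
UTF-8 on its own.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return $n octets starting at octet offset $m
# of the UTF-8 encoding of the character string $string.
sub utf8_byte_substr {
    my ($string, $m, $n) = @_;
    my $octets = encode_utf8($string);   # character string -> UTF-8 octets
    return substr($octets, $m, $n);      # substr now counts octets
}

# Sample data (an assumption for this example): five characters,
# one of which (U+00EF) takes two octets in UTF-8.
my $string = "na\x{EF}ve";

print length($string), "\n";                    # 5 characters
print length(encode_utf8($string)), "\n";       # 6 octets

# The first three octets end halfway through the U+00EF character.
print unpack("H*", utf8_byte_substr($string, 0, 3)), "\n";   # 6e61c3
```

Because the encoding step is explicit, this does not depend on how
perl happens to store the string internally.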

Regards,
Gisle
