perl.perl5.porters

Re: [perl #24541] substr and utf8 and use bytes

From: Matt Sergeant
Date: November 30, 2003 12:03
Subject: Re: [perl #24541] substr and utf8 and use bytes
Message ID: 485BD548-2370-11D8-ABC3-000393DA6672@sergeant.org

On 23 Nov 2003, at 8:10, Gisle Aas wrote:

> William R Ward (via RT) <perlbug-followup@perl.org> writes:
>
>> We have a need to take a string containing utf8-encoded multibyte
>> characters, and then, treating the string as bytes, extract a
>> substring of N characters from it.
>>
>> This is what "use bytes" was meant for, and it works great on Perl
>> 5.6.1.
>
> "use bytes" is evil.  It exposes internal implementation details that
> you are not supposed to know about and I'm not surprised the results
> differ between versions of perl.
>
> Just use Encode to clearly state your intents in a way that will work
> whatever internal representation of wide chars Perl might have.
> Something like this:
>
>   substr(encode_utf8($string), $m, $n);
>
> will do what you describe above.

Encode isn't available for 5.6.1 though.
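
For comparison, on a perl that does ship Encode (5.8.0 and later), the
approach quoted above might look roughly like this. It is only a sketch:
the sample string and the $m/$n values are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical sample: a character string holding one wide character
# (U+263A, which takes 3 bytes in UTF-8).
my $string = "ab\x{263A}cd";

# State the intent explicitly: turn characters into UTF-8 octets,
# then slice by byte offset and byte length.
my $octets = encode_utf8($string);
my ($m, $n) = (0, 5);    # assumed offset and length, in bytes
my $slice = substr($octets, $m, $n);

# Prints: chars=5 bytes=7 slice_bytes=5
printf "chars=%d bytes=%d slice_bytes=%d\n",
    length($string), length($octets), length($slice);
```

Slicing at arbitrary byte offsets can cut a multibyte character in half,
so the result is a string of octets and not necessarily valid UTF-8.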

Matt.



