Simon Cozens <simon@netthink.co.uk> writes:
>On Mon, Feb 19, 2001 at 09:53:14PM -0500, Andrew Pimlott wrote:
>> Let me say first that the reason all of my pseudo-code has been "OO
>> crap" is that I'm trying to make it as painfully clear as I can
>> think to (yes, emphasis on pain). Since nobody else has yet
>> proposed any specific interfaces,
>
>What about the one we've got?
>
>> I'm saying you call an explicit function, eg to_utf8(), which gives
>> you back a string such that if you say "substr $str, 0, 1", you get
>> the first byte of the UTF-8 representation, and "length $str" is the
>> length of the UTF-8 representation. Period.
>
>I wonder if this would work:
>
>  sub to_utf8 {
>      use bytes;
>      return $_[0];
>  }

That only works if $_[0] is already UTF-8 encoded. This does work:

  use Encode qw(utf8_encode);

  sub to_utf8 {
      my $copy = $_[0];
      utf8_encode($copy);   # upgrade, then turn SvUTF8 off
      return $copy;
  }

Personally I would rather the functions in Encode returned the modified
value than munged their argument in place. The documented interface
might be:

  use Encode qw(from_to);

  sub to_utf8 {
      my $copy = $_[0];
      my $length = from_to($copy, 'Unicode', 'UTF-8');
      return $copy;
  }

It is unclear what value there is to the _perl_ API in returning the
length.

-- 
Nick Ing-Simmons <nik@tiuk.ti.com>
Via, but not speaking for: Texas Instruments Ltd.
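The semantics Andrew asks for (a string on which "length" and "substr" count UTF-8 bytes) can be sketched in a modern Perl as follows. This is a minimal sketch assuming Perl 5.8 or later: it uses the core utf8::encode, which postdates this 2001 thread, in place of the utf8_encode shown in the message, and it copies the argument first so the caller's string is left alone.

```perl
use strict;
use warnings;

# Return a copy of the argument as a string of UTF-8 octets.
# Assumes Perl 5.8+: utf8::encode is core there and works in place,
# so we encode a private copy rather than the caller's variable.
sub to_utf8 {
    my $copy = $_[0];
    utf8::encode($copy);    # characters -> UTF-8 octets, UTF-8 flag off
    return $copy;
}

my $str   = "caf\x{e9}";    # 4 characters; U+00E9 takes 2 bytes in UTF-8
my $bytes = to_utf8($str);

print length($str), "\n";      # 4 (characters)
print length($bytes), "\n";    # 5 (UTF-8 bytes)
print substr($bytes, 0, 1), "\n";    # c (first byte)
printf "%vX\n", $bytes;        # 63.61.66.C3.A9
```

Because utf8::encode itself modifies its argument, the copy inside the sub is what gives the "return the modified value" behaviour preferred in the message.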
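Both interface styles discussed here, in-place conversion that returns a length and conversion that returns a new value, are available in the current Encode module. The sketch below assumes a modern Encode and, purely for illustration, converts from ISO-8859-1 rather than the 'Unicode' source name used in the message: from_to rewrites its first argument and returns the length of the converted octets (undef on failure), while decode/encode return new strings and leave their input unchanged.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

my $latin1 = "caf\xE9";    # 4 octets in ISO-8859-1 (illustrative input)

# In-place style: from_to overwrites $buf and returns the octet length.
my $buf = $latin1;
my $len = from_to($buf, 'iso-8859-1', 'UTF-8');
print "$len\n";            # 5

# Return-value style: decode to characters, then encode to UTF-8 octets.
# $latin1 itself is not modified.
my $utf8 = encode('UTF-8', decode('iso-8859-1', $latin1));
print length($utf8), "\n";                        # 5
print $buf eq $utf8 ? "same\n" : "different\n";   # same
```

The returned length from from_to is just length($buf) after the call, which illustrates why it adds little to a Perl-level API.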