Encode, take four
From: Jarkko Hietaniemi
Date: September 12, 2000 13:59
Subject: Encode, take four
Message ID: 20000912155915.A16457@chaos.wustl.edu
=head1 NAME
Encode - character encodings
=head2 TERMINOLOGY
byte a B<number> in the range 0..255
char a B<character> in the range 0..maxint (at least 2**32-1)
The marker [INTERNAL] marks Internal Implementation Details, in
general meant only for those who think they know what they are doing;
such details may change in future releases.
=head2 bytes
bytes_to_utf8(STRING)
The bytes in STRING are encoded in-place into UTF-8. Returns the new
size of STRING, or undef if there's a failure. [INTERNAL] Also the
UTF-8 flag is turned on.
utf8_to_bytes(STRING [, STRICT])
The UTF-8 in STRING is decoded in-place into bytes. Returns the new
size of STRING, or undef if there's a failure, or dies if STRICT is
true and the UTF-8 in STRING is malformed. [INTERNAL] The UTF-8 flag
of STRING is not checked.
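As a rough illustration of the in-place behaviour described above, here is a minimal sketch built on the core utf8::upgrade() and utf8::downgrade() functions. The names sketch_bytes_to_utf8 and sketch_utf8_to_bytes are hypothetical stand-ins, not the draft functions themselves, and the failure condition differs: utf8::downgrade() fails on characters above 255, whereas the draft talks about malformed UTF-8.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the draft bytes_to_utf8(): upgrade the string
# in place and return its new size in octets. $_[0] aliases the caller's
# variable, which is what makes the change visible to the caller.
sub sketch_bytes_to_utf8 {
    return utf8::upgrade($_[0]);
}

# Hypothetical stand-in for the draft utf8_to_bytes(STRING [, STRICT]).
# Assumption: "failure" here means a character above 255, which is not
# the malformed-UTF-8 condition the draft describes.
sub sketch_utf8_to_bytes {
    my $strict = $_[1];
    if ($strict) {
        utf8::downgrade($_[0]);                     # dies on characters > 255
    }
    else {
        utf8::downgrade($_[0], 1) or return undef;  # FAIL_OK: no die
    }
    return length $_[0];
}

my $s    = "caf\xE9";                  # four bytes, UTF-8 flag off
my $size = sketch_bytes_to_utf8($s);   # internal size in octets
print "size=$size flag=", (utf8::is_utf8($s) ? "on" : "off"), "\n";
# prints "size=5 flag=on"
```

The string still compares equal to "caf\xE9" after the upgrade; only the internal representation and the flag change.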
=head2 chars
chars_to_utf8(STRING [, STRICT])
The chars in STRING are encoded in-place into UTF-8. The chars are
assumed to be encoded in US-ASCII. Returns the new size of STRING, or
undef if there's a failure, or dies if there are characters > 127.
[INTERNAL] Also the UTF-8 flag of STRING is turned on.
utf8_to_chars(STRING [, STRICT])
The UTF-8 in STRING is decoded in-place into chars. The chars are
assumed to be encoded in US-ASCII. Returns the new size of STRING,
or undef if there's a failure, or dies if there are characters > 127.
[INTERNAL] The UTF-8 flag of STRING is not checked.
utf8_to_chars_strict(STRING)
The UTF-8 in STRING is decoded in-place into chars. Returns the new
size of STRING, or dies if the UTF-8 in STRING is malformed. Note
that this interface is exceptionally named since a two-argument
utf8_to_chars() has different semantics. [INTERNAL] The UTF-8 flag of
STRING is not checked.
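The "dies if there are characters > 127" rule can be approximated with the Encode module as it exists today. This is only a sketch: sketch_ascii_to_utf8 is a hypothetical name, and the use of decode() with FB_CROAK is an assumption about how such a check might be done, not the draft's implementation.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the draft one-argument chars_to_utf8().
sub sketch_ascii_to_utf8 {
    my $octets = $_[0];                    # work on a copy
    # FB_CROAK makes decode() die on any octet above 127.
    my $chars = decode("ascii", $octets, Encode::FB_CROAK());
    $_[0] = encode("UTF-8", $chars);       # write the result back in place
    return length $_[0];
}

my $ok   = "plain text";
my $size = sketch_ascii_to_utf8($ok);
my $bad  = "caf\xE9";                      # contains an octet > 127
my $died = eval { sketch_ascii_to_utf8($bad); 1 } ? 0 : 1;
print "size=$size died=$died\n";           # prints "size=10 died=1"
```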
=head2 chars With Encoding
chars_to_utf8(STRING, ENCODING)
The chars in STRING encoded in ENCODING are recoded in-place into
UTF-8. Returns the new size of STRING, or undef if there's a failure.
[INTERNAL] Also the UTF-8 flag of STRING is turned on.
utf8_to_chars(STRING, ENCODING [, STRICT])
The UTF-8 in STRING is decoded in-place into chars encoded in
ENCODING. Returns the new size of STRING, or undef if there's a
failure, or dies if STRICT is true and the UTF-8 in STRING is
malformed. [INTERNAL] The UTF-8 flag of STRING is not checked.
from_to(STRING, FROM_ENCODING, TO_ENCODING [, STRICT])
The chars in STRING encoded in FROM_ENCODING are recoded in-place into
TO_ENCODING. Returns the new size of STRING, or undef if there's a
failure, or dies if STRICT is true and a mapping between the encodings
is impossible.
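For comparison, the Encode module that eventually shipped with Perl provides a from_to() with a very similar shape. The sketch below uses that released function, whose optional fourth argument is a CHECK value rather than the boolean STRICT of this draft; treating FB_CROAK as the equivalent of STRICT is an assumption.

```perl
use strict;
use warnings;
use Encode qw(from_to);

my $s    = "caf\xE9";                              # ISO-8859-1 octets
my $size = from_to($s, "iso-8859-1", "UTF-8");     # recoded in place
printf "size=%d hex=%s\n", $size, unpack("H*", $s);
# prints "size=5 hex=636166c3a9"

# With a CHECK value of FB_CROAK, a character that has no mapping in the
# target encoding is fatal instead of being substituted.
my $t  = "caf\xC3\xA9";                            # UTF-8 octets
my $ok = eval { from_to($t, "UTF-8", "ascii", Encode::FB_CROAK()); 1 };
print "strict conversion ", ($ok ? "succeeded" : "died"), "\n";
# prints "strict conversion died"
```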
=head2 Testing For UTF-8
is_utf8(STRING [, STRICT])
[INTERNAL] Test whether the UTF-8 flag is turned on in the STRING.
If STRICT is true, also checks the data in STRING for being
well-formed UTF-8. Returns true if successful, false otherwise.
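The released Encode module has an is_utf8(STRING [, CHECK]) that behaves much like the function described here. A small sketch, assuming that released function rather than the draft one:

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

my $octets = "caf\xC3\xA9";               # raw UTF-8 octets, flag off
my $chars  = decode("UTF-8", $octets);    # character string, flag on

printf "octets: flag %s\n", is_utf8($octets) ? "on" : "off";
printf "chars:  flag %s\n", is_utf8($chars)  ? "on" : "off";

# A true second argument additionally validates the data itself.
printf "chars well-formed: %s\n", is_utf8($chars, 1) ? "yes" : "no";
```

Note that the octet string is perfectly good UTF-8 and still reports "off": the test is about the flag, not about what the data looks like.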
=head2 Toggling UTF-8-ness
on_utf8(STRING)
[INTERNAL] Turn on the UTF-8 flag in STRING. The data in STRING is
B<not> checked for being well-formed UTF-8. Do not use unless you
B<know> that the STRING is well-formed UTF-8. Returns the previous
state of the UTF-8 flag (so please do I<not> treat the return value as
an indication of success or failure).
off_utf8(STRING)
[INTERNAL] Turn off the UTF-8 flag in STRING. Do not use frivolously.
Returns the previous state of the UTF-8 flag (so please do I<not>
treat the return value as an indication of success or failure).
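The effect of toggling the flag is easiest to see through length(). The sketch below uses Encode::_utf8_on() and Encode::_utf8_off() from the released Encode module as stand-ins for the on_utf8() and off_utf8() described here; that correspondence is an assumption.

```perl
use strict;
use warnings;
use Encode ();

my $s      = "caf\xC3\xA9";            # five octets, flag off
my $before = length $s;                # counted as bytes
my $prev   = Encode::_utf8_on($s);     # returns the previous flag state
my $after  = length $s;                # same data, now counted as characters
Encode::_utf8_off($s);                 # back to a plain byte string

print "before=$before after=$after\n"; # prints "before=5 after=4"
```

No data is converted at any point; only the interpretation of the same octets changes, which is why the flag must not be turned on for data that is not well-formed UTF-8.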
=head2 UTF-16 and UTF-32 Encodings
utf_to_utf(STRING, FROM, TO [, STRICT])
The data in STRING is converted from Unicode Transfer Encoding FROM to
Unicode Transfer Encoding TO. Both FROM and TO may be any of the
following tags (case-insensitive):
    tag     meaning

    '7'     UTF-7
    '8'     UTF-8
    '16be'  UTF-16 big-endian
    '16le'  UTF-16 little-endian
    '16ne'  UTF-16 native-endian
    '32be'  UTF-32 big-endian
    '32le'  UTF-32 little-endian
    '32ne'  UTF-32 native-endian

UTF-16 is also known as UCS-2, 16-bit or 2-byte chunks, and UTF-32 as
UCS-4, 32-bit or 4-byte chunks. Returns the new size of STRING, or
undef if there's a failure, or dies if STRICT is true and FROM
is '8' and the UTF-8 in STRING is malformed. [INTERNAL] Even if
STRICT is true and FROM is '8' the UTF-8 flag of STRING is not
checked. If TO is '8' also the UTF-8 flag of STRING is turned on.
Identical FROM and TO are fine.
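One way the tag table could be mapped onto encodings that the released Encode module knows about is sketched below. Everything here is illustrative: sketch_utf_to_utf and %utf_name are hypothetical names, native endianness is detected with pack(), and unlike the draft this version dies on malformed input for any FROM when STRICT is true, not only for '8'.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Detect native byte order: "V" is always little-endian, "L" is native.
my $ne = pack("L", 1) eq pack("V", 1) ? "LE" : "BE";

# Assumed mapping from the draft's tags to Encode's encoding names.
my %utf_name = (
    '7'    => 'UTF-7',    '8'    => 'UTF-8',
    '16be' => 'UTF-16BE', '16le' => 'UTF-16LE', '16ne' => "UTF-16$ne",
    '32be' => 'UTF-32BE', '32le' => 'UTF-32LE', '32ne' => "UTF-32$ne",
);

# Hypothetical stand-in for utf_to_utf(STRING, FROM, TO [, STRICT]).
sub sketch_utf_to_utf {
    my (undef, $from, $to, $strict) = @_;
    my $f = $utf_name{lc $from} or return undef;   # unknown tag: failure
    my $t = $utf_name{lc $to}   or return undef;
    my $octets = $_[0];                            # work on a copy
    my $check  = $strict ? Encode::FB_CROAK() : Encode::FB_DEFAULT();
    my $chars  = decode($f, $octets, $check);      # dies here if strict
    $_[0] = encode($t, $chars);                    # write back in place
    return length $_[0];
}

my $s = "A\xC3\xA9";                               # "A", e-acute in UTF-8
my $n = sketch_utf_to_utf($s, '8', '16BE');
printf "%d %s\n", $n, unpack("H*", $s);            # prints "4 004100e9"
```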
=cut
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen