Encode, take three
From:
Jarkko Hietaniemi
Date:
September 12, 2000 11:56
Subject:
Encode, take three
Message ID:
20000912135652.A7124@chaos.wustl.edu
=head1 NAME
Encode - character encodings
=head2 TERMINOLOGY
byte a number in the range 0..255
char a character in the range 0..maxint (at least 2**32-1)
The marker [INTERNAL] marks Internal Implementation Details, in
general meant only for those who think they know what they are doing.
Such details may change in future releases.
=head2 bytes
bytes_to_utf8(STRING)
The bytes in STRING are encoded in-place into UTF-8. Returns the new
size of STRING, or undef if there's a failure. [INTERNAL] Also the
UTF-8 flag is turned on.
utf8_to_bytes(STRING [, STRICT])
The UTF-8 in STRING is decoded in-place into bytes. Returns the new
size of STRING, or undef if there's a failure, or dies if STRICT is
true and the UTF-8 in STRING is malformed. [INTERNAL] The UTF-8 flag
of STRING is not checked.
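
For comparison, here is a minimal sketch of the same in-place round trip written with utf8::upgrade and utf8::downgrade, which exist in released perls (5.8 and later). These are not the functions proposed above, only the closest existing analogs; the wrapper names upgrade_in_place and downgrade_in_place are hypothetical. Note one difference in failure mode: utf8::downgrade fails when a character does not fit in one byte, not when the UTF-8 is malformed.

```perl
use strict;
use warnings;

# Hypothetical wrapper: switch the representation of $_[0] to UTF-8
# in place. The UTF-8 flag ends up on; the return value is the size
# of the string in octets, as reported by utf8::upgrade.
sub upgrade_in_place {
    return utf8::upgrade($_[0]);
}

# Hypothetical wrapper: switch $_[0] back to one byte per character
# in place. Returns the new size, or undef if some character is above
# 0xFF. With a true second argument it dies instead of returning undef.
sub downgrade_in_place {
    my $strict = $_[1];
    my $ok = utf8::downgrade($_[0], !$strict);
    return $ok ? length($_[0]) : undef;
}

my $bytes_demo = "caf\xE9";                        # 4 bytes, Latin 1
my $up_size    = upgrade_in_place($bytes_demo);    # 5 octets, flag on
my $down_size  = downgrade_in_place($bytes_demo);  # 4 bytes, flag off
```

Both wrappers modify their argument through @_ aliasing, which is how an in-place interface like the one described above would have to work in pure Perl.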
=head2 chars
chars_to_utf8(STRING)
The chars in STRING are encoded in-place into UTF-8. The chars are
assumed to be encoded in ISO 8859-1 (Latin 1) or US-ASCII. Returns
the new size of STRING, or undef if there's a failure. [INTERNAL]
Also the UTF-8 flag is turned on.
utf8_to_chars(STRING)
The UTF-8 in STRING is decoded in-place into chars. The chars are
assumed to be in ISO 8859-1 (Latin 1) or US-ASCII. Returns the new
size of STRING, or undef if there's a failure. [INTERNAL] The UTF-8
flag of STRING is not checked.
utf8_to_chars_strict(STRING)
The UTF-8 in STRING is decoded in-place into chars. Returns the new
size of STRING, or dies if the UTF-8 in STRING is malformed.
[INTERNAL] The UTF-8 flag of STRING is not checked.
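
The "dies if malformed" behaviour can be illustrated with the Encode module as it was later released, whose decode() accepts a check argument. This is a sketch only: decode() returns a new string rather than working in place, and the helper name decode_utf8_strict is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode UTF-8 octets into characters and die on
# malformed input. LEAVE_SRC stops Encode from modifying the argument.
sub decode_utf8_strict {
    my ($octets) = @_;
    return Encode::decode('UTF-8', $octets,
                          Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

my $strict_chars = decode_utf8_strict("caf\xC3\xA9");   # 4 characters

# 0xFF can never appear in well-formed UTF-8, so this call dies.
my $strict_ok = eval { decode_utf8_strict("caf\xFF"); 1 };
print "malformed input rejected\n" unless $strict_ok;
```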
=head2 chars With Encoding
chars_to_utf8(STRING, ENCODING)
The chars in STRING encoded in ENCODING are recoded in-place into
UTF-8. Returns the new size of STRING, or undef if there's a failure.
[INTERNAL] Also the UTF-8 flag of STRING is turned on.
utf8_to_chars(STRING, ENCODING [, STRICT])
The UTF-8 in STRING is decoded in-place into chars encoded in
ENCODING. Returns the new size of STRING, or undef if there's a
failure, or dies if STRICT is true and the UTF-8 in STRING is
malformed. [INTERNAL] The UTF-8 flag of STRING is not checked.
from_to(STRING, FROM_ENCODING, TO_ENCODING [, STRICT])
The chars in STRING encoded in FROM_ENCODING are recoded in-place into
TO_ENCODING. Returns the new size of STRING, or undef if there's a
failure, or dies if STRICT is true and mapping between the encodings
is impossible.
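
A short usage sketch of in-place recoding, using from_to() from the Encode module as later released. The encoding names are the ones that release accepts and are an assumption relative to this draft, which does not spell out how encodings are named.

```perl
use strict;
use warnings;
use Encode qw(from_to);

my $recode_data = "caf\xE9";                       # ISO 8859-1 octets
my $recode_size = from_to($recode_data, 'iso-8859-1', 'UTF-8');

# $recode_data now holds the UTF-8 octets "caf\xC3\xA9",
# and $recode_size is the new length in octets.
printf "%d octets\n", $recode_size;                # 5 octets
```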
=head2 Testing For UTF-8
is_utf8(STRING [, STRICT])
[INTERNAL] Test whether the UTF-8 flag is turned on in the STRING. In
other words, the data in STRING is B<not> checked for being
well-formed UTF-8. If STRICT is true, also checks the data in STRING
for being well-formed UTF-8. Returns true if successful, false
otherwise.
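
To make the flag-versus-data distinction concrete, here is a small sketch using is_utf8() and decode() from the Encode module as later released. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode ();

my $flag_octets = "caf\xC3\xA9";                         # UTF-8 octets, flag off
my $flag_chars  = Encode::decode('UTF-8', $flag_octets); # characters, flag on

# The octets are perfectly good UTF-8, but only the flag is tested.
my $octets_flagged = Encode::is_utf8($flag_octets) ? 1 : 0;
my $chars_flagged  = Encode::is_utf8($flag_chars)  ? 1 : 0;

# With a true second argument the data is checked as well.
my $chars_checked  = Encode::is_utf8($flag_chars, 1) ? 1 : 0;

print "octets: $octets_flagged, chars: $chars_flagged\n";  # octets: 0, chars: 1
```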
=head2 Toggling UTF-8-ness
on_utf8(STRING)
[INTERNAL] Turn on the UTF-8 flag in STRING. The data in
STRING is B<not> checked for being well-formed UTF-8. Do not
use unless you B<know> that the STRING is well-formed UTF-8.
Returns nothing.
off_utf8(STRING)
[INTERNAL] Turn off the UTF-8 flag in STRING. Do not use
frivolously. Returns nothing.
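
The effect of toggling the flag can be seen from length(). This sketch uses _utf8_on() and _utf8_off() from the Encode module as later released, which play the role of on_utf8 and off_utf8 here; the names differ from this draft.

```perl
use strict;
use warnings;
use Encode ();

my $toggle = "caf\xC3\xA9";          # well-formed UTF-8 octets, flag off
my $len_before = length $toggle;     # 5: each octet counts as a char

Encode::_utf8_on($toggle);           # same octets, now read as UTF-8
my $len_flagged = length $toggle;    # 4: the last two octets are one char

Encode::_utf8_off($toggle);          # back to plain octets
my $len_after = length $toggle;      # 5
```

Only the interpretation changes; the octets themselves are untouched, which is why turning the flag on over data that is not well-formed UTF-8 is dangerous.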
=head2 UTF-16 and UTF-32 Encodings
utf_to_utf(STRING, FROM, TO [, STRICT])
The data in STRING is converted from Unicode Transformation Format FROM
to Unicode Transformation Format TO. Both FROM and TO may be any of
the following:
'7' UTF-7
'8' UTF-8
'16be' UTF-16 big-endian
'16le' UTF-16 little-endian
'32be' UTF-32 big-endian
'32le' UTF-32 little-endian
UTF-16 is also known as UCS-2, 16-bit or 2-byte chunks, and UTF-32 as
UCS-4, 32-bit or 4-byte chunks. Returns the new size of STRING, or
undef if there's a failure, or dies if STRICT is true and FROM
is '8' and the UTF-8 in STRING is malformed. [INTERNAL] Even if
STRICT is true and FROM is '8' the UTF-8 flag of STRING is not
checked. If TO is '8' the UTF-8 flag of STRING is also turned on.
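
As a rough illustration of such a conversion, the sketch below goes from UTF-8 to big-endian UTF-16 using from_to() from the Encode module as later released. The names 'UTF-8' and 'UTF-16BE' belong to that release and are assumptions relative to the '8' and '16be' selectors proposed here.

```perl
use strict;
use warnings;
use Encode qw(from_to);

my $utf_data = "caf\xC3\xA9";                             # UTF-8 octets
my $utf_size = from_to($utf_data, 'UTF-8', 'UTF-16BE');   # new size in octets
my $utf_hex  = unpack 'H*', $utf_data;

print "$utf_size octets: $utf_hex\n";   # 8 octets: 00630061006600e9
```

Each of the four characters becomes one 2-byte unit, high byte first, and no byte order mark is added because the byte order is named explicitly.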
=cut
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen