perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

February 19, 2001 17:34
On Mon, Feb 19, 2001 at 07:14:07PM -0600, Jarkko Hietaniemi wrote:
> Protocols: if all I know is that my output is 500 Unicode characters
> long, how am I to print out Content-Length?

As I said to Abigail, I would love a concrete explanation of what
you have in mind.  In particular, what is your mechanism for
ensuring that perl is representing $output as utf8?

Let me show you what I would fancy (modulo syntax, which I haven't
been following):

    $eh = new EncodingHandler 'UTF-8';
    $out = new IO::Socket {
        output_discipline => $eh->output_discipline, ... };
    print $out "Content-length: " . $eh->length($output) . "\n\n";
    print $out $output;
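
For comparison, roughly the same effect can be had with the Encode module, if you are willing to encode the whole body in one go. This is only a sketch under assumptions: http_message is a made-up helper name, the sample payload is invented, and STDOUT stands in for the socket. The point is that the length is taken from the encoded octets, not from the character string.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of any library): build a header plus body
# where Content-Length counts octets of the UTF-8 encoded body.
sub http_message {
    my ($output) = @_;
    my $octets = encode('UTF-8', $output);   # byte string, one element per octet
    # HTTP proper wants CRLF line endings, hence "\r\n" here.
    return "Content-Length: " . length($octets) . "\r\n\r\n" . $octets;
}

# Assumed sample payload: 5 characters, two of them non-ASCII.
my $output = "caf\x{e9}\x{263A}";

binmode STDOUT;                              # we are printing raw octets
print STDOUT http_message($output);
```

Here length($output) is 5 while the header reports 8, which is exactly the gap the question is about.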

Let me also (horror of horrors[1]) tell you what you would probably
do in Java:

    OutputStream o;
    String output;
    byte[] output_bytes = output.getBytes("UTF-8");
    String header = "Content-length: " + output_bytes.length + "\n\n";

(Note my imagined Perl interface didn't require converting the whole
string to utf8 at once.)
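
To make that remark concrete, here is one way the length could be computed without ever holding a fully encoded copy of the string. It is a sketch, not the imagined EncodingHandler: utf8_octet_length and the chunk size are hypothetical, and it assumes the Encode module is available.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: count the UTF-8 octets of the string behind
# $string_ref by encoding one chunk of characters at a time.
sub utf8_octet_length {
    my ($string_ref, $chunk_chars) = @_;
    $chunk_chars ||= 4096;                   # characters per chunk (arbitrary)
    my $total = 0;
    my $chars = length $$string_ref;
    for (my $pos = 0; $pos < $chars; $pos += $chunk_chars) {
        # substr counts in characters, so no code point is ever split.
        my $piece = substr($$string_ref, $pos, $chunk_chars);
        $total += length(encode('UTF-8', $piece));
    }
    return $total;
}

my $output = "\x{263A}" x 10_000;            # 10,000 characters, 3 octets each
print utf8_octet_length(\$output), "\n";     # octet count, not character count
```

The string is passed by reference so that the helper itself does not copy it; only one chunk is encoded at any moment.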

> If I have a scalar which according to length() is 10E7 Unicode characters,
> will it fit within my disk quota of which I have 20E7 bytes left?

Again, it depends on the output discipline you will use to get it on
disk, and thus should be part of whatever library you use for output
disciplines.  Why do you think it should be otherwise?
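
A small illustration of why the character count alone cannot answer the quota question: the same string occupies a different number of octets under each encoding. In standard UTF-8 a code point takes 1 to 4 octets, so 10E7 characters could need anywhere from 10E7 to 40E7 bytes. The helper name octets_in and the sample text are assumptions for the sketch; the encoding names are ones Encode recognises.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: how many octets would $text occupy in $encoding?
sub octets_in {
    my ($encoding, $text) = @_;
    return length(encode($encoding, $text));
}

# Assumed sample: 8 characters mixing ASCII, Latin-1 and CJK.
my $text = "na\x{ef}ve \x{65e5}\x{672c}";

for my $encoding ('UTF-8', 'UTF-16LE', 'UTF-32LE') {
    printf "%-8s %2d octets for %d characters\n",
        $encoding, octets_in($encoding, $text), length $text;
}
```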

> Any encoding which hasn't yet been encoded in Encode?

In that case, how did it ever get internally represented as utf8?  I
would expect in this case that bytes of the string would end up as
Perl characters, just like with non-Unicode perl.
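
A sketch of what that looks like in practice, assuming the Encode module for the decoding step: data that nobody has decoded is just a string whose characters are the byte values, and it only becomes Unicode text once some decoder is applied. The sample octets are an invented example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Three octets as they might arrive from a file or socket, undecoded.
# (They happen to be the UTF-8 form of U+65E5.)
my $raw = "\xE6\x97\xA5";

# Without decoding, each byte is one Perl character with ord below 256.
my @byte_values = map { ord } split //, $raw;
printf "undecoded: %d characters (%s)\n",
    length($raw), join ' ', map { sprintf '%02X', $_ } @byte_values;

# Only an explicit decode turns them into a single Unicode character.
my $text = decode('UTF-8', $raw);
printf "decoded:   %d character (U+%04X)\n", length($text), ord $text;
```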


[1] Contrary to what you might guess, I mean that.