perl.perl5.porters | Postings from April 2012

Re: unicode question

From: Linda W
Date: April 26, 2012 19:02
Subject: Re: unicode question
Message ID: 4F99FDC9.8040909@tlinx.org

Zefram wrote:
> Linda W wrote:
>   
>>   Because historically STDIN/STDOUT were not always 8-bit safe?
>>     
>
> Unix file descriptors have always been 8-bit safe, and furthermore
> binary-safe.  (9-bit binary safe in at least one implementation.)
>   
I didn't say unix file descriptors -- argue the straw man will ya?

>   
>> especially
>> over telnet connections?
>>     
>
> TCP is 8-bit safe, and so is telnet.  telnet is not by default
> *binary*-safe, however, due to its intended purpose as a virtual terminal.
>   
----
    We are talking 8-bit safe in the context of binary, so please don't
confuse people by talking between the lines.


> Actual terminals are never properly binary-safe (actually the concept
> doesn't properly apply), and many will do something funny with the top
> bit of 8-bit bytes.
>   
----
    We are talking I/O to terminals -- not files.

>   
>>   I don't know if that's the case any more, but even today, there
>> are some mail handlers that don't do well with 8-bit encoding
>>     
>
> Mail transport has historically been neither binary-safe nor 8-bit safe.
> Today it's commonly 8-bit safe, but that doesn't really matter, because
> MIME makes mail messages binary-safe even on old transport infrastructure.
>   
====
    Um... yeah. That's sorta what I said, though not in as much detail.

>   
>> you need to 7-bit encode things to send such things "through pipes".
>>
>>   Pipes were designed to manipulate I/O to or from a terminal or a file
>> containing terminal displayable data
>>     
>
> Rubbish.  Unix pipes were designed to handle any data, and as such have
> always been 8-bit (or 9-bit) binary safe.
>   
----
    Well, double rubbish... yeah, you are right, but I'm thinking of pipes in
a different sense -- not Unix pipes as they are now, but as they were
originally created in Unix: a way to chain together various filters that
processed text and DID make assumptions about the data being TEXTual... so
it wasn't safe to blindly pipe binary data through random programs.  Also,
offhand, I don't know of any Linux pipes that handle a 9-bit data type...
but I'm sure there have been all sorts of word sizes...
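
A minimal sketch of that distinction, written in Python purely for
illustration (nothing here comes from the original exchange, and
`pipe_roundtrip` and `text_filter` are hypothetical names): the OS pipe hands
back all 256 byte values untouched, while a filter that assumes ASCII text
mangles everything above 0x7F.

```python
import os


def pipe_roundtrip(payload: bytes) -> bytes:
    """Write payload into an OS pipe and read it back.

    Assumes a small payload that fits in the pipe buffer, so the
    single write does not block before the read starts.
    """
    read_fd, write_fd = os.pipe()
    os.write(write_fd, payload)
    os.close(write_fd)  # closing the write end gives the reader end-of-file
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()


def text_filter(payload: bytes) -> bytes:
    """Hypothetical text-only filter: anything outside ASCII becomes '?'."""
    as_text = payload.decode("ascii", errors="replace")
    return as_text.encode("ascii", errors="replace")


if __name__ == "__main__":
    all_bytes = bytes(range(256))
    print("pipe preserved every byte:", pipe_roundtrip(all_bytes) == all_bytes)
    print("text filter preserved every byte:", text_filter(all_bytes) == all_bytes)
```

In this sketch the loss happens in the program that assumed text, not in the
pipe itself, which is the sense of "not safe" meant above.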
>   
>> happen on some platforms if you pipe random binary text into any
>> program that does I/O
>> to STDIN/STDOUT.
>>     
>
> Unix utility programs have historically often not been binary-safe.
> That's a bug in those programs, not present in modern versions, and not
> a feature of the OS.
>   
----
    This is the whole point -- expecting your STDIN/STDOUT to be binary
safe is not logical -- it's not done.

    Perl wasn't written as a binary processor.  It was written as
a super "shell+awk+grep+sed+tr" all rolled into one -- those all processed
TEXT... None of those were designed for binary (doesn't mean they might not
be used for such), but Perl was designed for text files.

Text today is Unicode in most environments: UTF-8 on *nix, and usually UCS-2
on Windows (though some of their programs really do support Unicode > V2.0,
not many do).  Unicode is at version 6.1, and MS support is somewhere around
3.5 at most... Idiots... they build roadblocks into their SW when they could
just display the decoded chars by following the formula... but they block
chars they haven't approved of yet (not Unicode -- MS!).


>   
>>   Try using cat /etc/bash sometime and see how well your terminal
>> likes that.
>>     
>
> Terminals again; yes, they're not binary-safe.  Terminals, as the
> name suggests, are not intended to act as transparent pipes to convey
> arbitrary data.
>   
> You grossly misunderstand Unix by supposing that stdin and stdout
> necessarily refer to terminals, or that any other part of the I/O
> infrastructure is specific to terminals.
>   
-----
    You grossly misunderstand the problem.

    We are talking about using Perl to process material on STDIN/STDOUT at
a terminal, in an environment with a standard locale set.

Any other stuff about Unix pipes is you confusing the issue.

    I know you can transfer binary data over pipes -- but those are pipes
that are not connected to terminals (usually)... sockets, named pipes, etc.
all handle binary data... But UTF-8 is an encoding designed for humans to
look at -- not machines.  We are discussing Perl's ability to decipher and/or
encode data to be read directly by humans -- not binary data.
Please don't confuse the issue.
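
To illustrate the two views of the same input, a small sketch in Python (an
illustration only; the thread is about Perl's handling of STDIN/STDOUT, and
the `describe` helper is a made-up name): the bytes are either counted as raw
octets or decoded as UTF-8 into the characters a person would actually read.

```python
def describe(raw: bytes):
    """Return (octet count, character count) for UTF-8 encoded input."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 input
    return len(raw), len(text)


if __name__ == "__main__":
    # "na\u00efve caf\u00e9" contains two non-ASCII characters,
    # each of which takes two octets in UTF-8.
    sample = "na\u00efve caf\u00e9".encode("utf-8")
    octets, chars = describe(sample)
    print("octets:", octets, "characters:", chars)
```

Whether a program sees the octets or the characters depends on whether a
decoding step is applied to the stream, which is the question at issue for a
terminal with a UTF-8 locale.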


>>   Besides, doesn't perl do default text processing on STDIN/OUT?
>>     
>
> If I understand you correctly, "default text processing" has historically
> been null on Unix.
You don't understand -- Perl != unix.  Unix != Perl.   I'm sure Larry 
would glow at your equating the two, but they aren't the same.


>   A stream can be used equally well for ...
>   
---
    Looking to fabricate arguments?   Bored?  Troll much?






