perl.perl5.porters | Postings from August 2012

[perl #114602] utf8 problems (still)...

From: Linda W
Date: August 31, 2012 00:29
Subject: [perl #114602] utf8 problems (still)...
Message ID: 504067C9.8070400@tlinx.org


Leon Timmermans wrote:
>>> When three or four clever people on this list tell you ....  
>> yeah, I'm well aware this side of the problem:
>>
>>   http://en.wikipedia.org/wiki/Illusory_superiority
>>
>> http://www.newyorker.com/online/blogs/frontal-cortex/2012/06/daniel-kahneman-bias-studies.html
>>
>> Ability to learn and correct is inversely proportional to self perceived
>> cleverness and knowledge.
>
> Insulting people who are putting an effort into helping you is
> exceedingly aggravating. This habit of yours is downright abusive and
> simply not appropriate on this list, or any other place for that
> matter.
>   
----

 It wasn't intended as an insult.  It was presenting data backed by 
evidence. It really wasn't intended personally -- it describes a group 
of people who have certain characteristics -- most certainly those who 
have a high self-perception of their ability, talent and/or knowledge.  
It describes a tendency of *EVEN* those that ARE bright to be blind to 
their blind spots -- but it's much worse as the discrepancy increases 
between what the person actually knows and what they think they know.

I'm sorry if you took it personally - it wasn't intended as such.

For myself, I am aware I know only a little about multiple things; 
however, in my experience, the number of things and the amount usually 
is enough to put most people to sleep if I go into it too much.  But I 
would definitely NOT think nor claim to know more than someone who is 
truly a master of their field (usually I find that they are the ones that 
don't claim such -- but you find out in talking to them.  Those that 
repeatedly tell you are more often trying to convince you for 
purposes of some point or argument).

Now anyone can take that as an insult if they want, or not.  It's not 
personally directed at anyone.  But if it offends you, I would suggest 
that perhaps it is touching on an uncomfortable truth.

> Stop playing the victim of a conspiracy, start taking responsibility
> for your own actions.
>   
-----

  So far Nicholas Clark has been the only person who I'd say has been on 
the level and truthful.

That doesn't mean I necessarily agree with his stance, but he is someone 
with whom I **could** discuss the problem that I was trying to 
show/demonstrate/discuss for the past week or more (months if you count 
earlier discussions).  

That he would 'get' what I was talking about in 1 response -- that's 
someone who is able to communicate (bidirectionally) without me feeling like 
someone is playing word games.

    You (Leon) at least got the essential function of 'P' without 
me feeling like I was talking to people who had no clue of programming 
or perl, or who had it bother them so much they couldn't understand the point 
I was trying to make.   I felt the details and exact implementation of P 
would simply be another side track for discussion about its internals 
when they had nothing to do with the point I was making.

And folks, given the directions this has gone off on -- when Nick summed 
up the issue in 1 note -- you know, I'm right.   Anything and everything was
picked at about how I said this or not  having crossed a 't' or dotted 
an 'i'.   

Eric went off on E9 vs. 0xe9... and my point wasn't about my thinking I 
was writing "E9" vs. "\xe9", but that I was using chr(0xe9), which I 
would expect to produce different output than if I did a
printf("%c", 0xe9)  (cf. printf("%c", chr(0xe9)) ).
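
For reference, a minimal sketch comparing the three expressions mentioned above. The helper name `hex_codes` is made up for illustration; the sketch only inspects the resulting strings and does not print them to a terminal, so no output layer is involved.

```perl
use strict;
use warnings;

# Return the code point of each character in a string as a hex list.
# (hex_codes is a hypothetical helper, not part of any module.)
sub hex_codes {
    my ($string) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $string;
}

# The two forms being compared in the message.
my $from_chr     = chr(0xe9);
my $from_percent = sprintf('%c', 0xe9);

# Passing a character (not a number) to %c: the string is numified first,
# which normally raises a "isn't numeric" warning, silenced here.
my $from_both = do {
    no warnings 'numeric';
    sprintf('%c', chr(0xe9));
};

printf "chr(0xe9)                   -> [%s]\n", hex_codes($from_chr);
printf "sprintf('%%c', 0xe9)         -> [%s]\n", hex_codes($from_percent);
printf "sprintf('%%c', chr(0xe9))    -> [%s]\n", hex_codes($from_both);
printf "chr and %%c give same string: %s\n",
    ($from_chr eq $from_percent ? 'yes' : 'no');
```

The first two lines should report the same single character; the third shows what happens when a character rather than a number is handed to `%c`.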

   I could have posted the module, but I felt it would detract 
focus from the essential issue: perl -- instead of doing something 
useful with output -- throws up a warning (and maybe even an error 
someday). 

Instead of throwing a warning on a wide char and then corrupting 
output, it could do what it does for every OTHER wide char not in the 
0x80-0xff range -- and put out its unicode representation.
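
For reference, a minimal sketch of the output behaviour under discussion. It prints single characters to an in-memory filehandle opened without any encoding layer, then reports the bytes written and the number of warnings raised. The helper name `bytes_for` is made up for illustration, and the sketch assumes no `-C` switch or `PERL_UNICODE` setting is in effect.

```perl
use strict;
use warnings;

# Print one character to a handle with no encoding layer and return
# the bytes written (as hex) plus the number of warnings perl raised.
# (bytes_for is a hypothetical helper, not part of any module.)
sub bytes_for {
    my ($codepoint) = @_;
    my $buffer = '';
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    print {$fh} chr($codepoint);
    close $fh;

    my $hex = join ' ', map { sprintf '%02x', ord } split //, $buffer;
    return ($hex, scalar @warnings);
}

for my $cp (0x41, 0xe9, 0x100, 0x263a) {
    my ($hex, $warned) = bytes_for($cp);
    printf "U+%04X -> bytes [%s], warnings: %d\n", $cp, $hex, $warned;
}
```

Each character is printed on its own here; a string that mixes characters below and above 0xff is handled as a whole, so results for mixed strings can differ from the per-character view.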

Whereas -- I knowingly, **for this example**, didn't set UTF-8 -- this 
isn't a problem in most of my programs...
   BUT it comes up frequently enough because there are many gotchas in 
perl related to this problem.


Having perl knowingly do the wrong thing (as it does now), or having it 
die altogether when it has a good idea of what the user wants -- cannot 
be called something "serving backwards compatibility".

It's a warning that you want to make an error?   How can that be backwards
compatible with any code?

I assert that people refer back to basic perl design philosophy: DWIM.

    If this was cobol or fortran, I'd expect it to stay broken on 
principle / standard.  But here perl is being 'hard assed', deliberately throwing
errors and warnings on output AND corrupting it to make sure they are
screwed -- rather than following perl's internal design that would normally
auto-convert to the right format.

Compare:

   my $a="42"; my $b="43"; my $c=$a+$b; print "c=$c\n";
c=85

Do you get a warning for string to integer to string conversion?

It happens automatically.

Why generate a warning when printing a wide char out to a terminal -- why
not assume the user has a terminal that prints in unicode and just print it
like you do with the string?   You don't print "Warning integer 
encountered in string" or "strings encountered in addition".  

"Perl is about helping you get from here to there with minimum fuss and 
maximum enjoyment."   What about generating warnings and then converting 
output inconsistently is either?

"...One of the things that changes is how the community thinks Perl 
should behave by **default** [emphasis mine].  (This is in conflict with 
the desire for Perl to behave as it always did.)".... so added was 
strict, threads came and morphed ... "Other things have come or gone.  
Some experiments didn't work out and we took them out of Perl, replacing 
them with other experiments.  Pseudohashes, for instance..."...   (Camel)

The point is perl changes, and changing to default to Unicode would be a 
move toward the future that wouldn't hurt compatibility -- as it's 
already an "illegal case".   I simply propose to make it put out UTF-8 
output and be consistent ACROSS its character set -- because right now, it
throws out a warning and only converts wide chars <0x100 (& >0x7f) to binary --
the rest IT'S ALREADY PUTTING OUT IN UTF-8.   So why the "deadzone" in 0x80-0xff?
It doesn't work without warnings in any program today.  If some chase their
kneejerk reactions, it won't work at all -- so it CAN'T be for 
compatibility.

What is the point?



Perl was supposed to be something that tried to "Do what you meant" -- it was
a stated design philosophy.



This isn't about compatibility -- as it already warns, anyone who would 
try to use the feature set the way I am describing wouldn't be able to without 
suppressing warnings.

demerphq wrote:

On 30 August 2012 11:44, Linda W <perl-diddler@tlinx.org> wrote:

> demerphq wrote:
>   
>> Let me say this once again, there is no bug.
>>
>>     
> By bug you mean there is nothing there that isn't "intentional".  I'm
> not disputing that.
>
>
>
>   
>> And stop arguing with people that are trying to help you.
>>
>>     
> By argue you mean try to get you to understand something that others have
> said they don't understand?
>
> By help me, what do you mean?   Do you mean they are helping me to
> try to get perl to process characters uniformly on output by default?
>
> As the fact that perl does not do that by default is my problem.  It doesn't
> generate UTF-8 for characters, it  doesn't generate binary for charact

demerphq wrote:

> C) Unless you tell it otherwise, if you ask Perl to output a string
> which is flagged as "unicode" and that string contains "wide
> characters" which would require it to output octets whose values do
> not correspond 1 to 1 with the codepoints of the unicode string, it
> warns that it is doing so.

----
Exactly.
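
One way of "telling it otherwise" is an explicit encoding layer on the handle. The sketch below is only an illustration of that existing mechanism, not of the proposed default: it opens an in-memory handle with `:encoding(UTF-8)` and shows the bytes written for the same kind of characters. The helper name `utf8_bytes_for` is made up for this example.

```perl
use strict;
use warnings;

# Print one character through an explicit UTF-8 layer and return the
# bytes written (as hex) plus the number of warnings perl raised.
# (utf8_bytes_for is a hypothetical helper, not part of any module.)
sub utf8_bytes_for {
    my ($codepoint) = @_;
    my $buffer = '';
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    open my $fh, '>:encoding(UTF-8)', \$buffer
        or die "cannot open in-memory handle: $!";
    print {$fh} chr($codepoint);
    close $fh;

    my $hex = join ' ', map { sprintf '%02x', ord } split //, $buffer;
    return ($hex, scalar @warnings);
}

for my $cp (0x41, 0xe9, 0x100, 0x263a) {
    my ($hex, $warned) = utf8_bytes_for($cp);
    printf "U+%04X -> bytes [%s], warnings: %d\n", $cp, $hex, $warned;
}
```

With the layer in place, characters in the 0x80-0xff range are encoded the same way as those above 0xff. For STDOUT the equivalent would be `binmode(STDOUT, ':encoding(UTF-8)')`.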


> D) Absent requests to do otherwise, chr() outputs a binary string
> containing one octet for the range 0..255 and a unicode string for
> codepoints of 256 and above. The internal representation of this
> codepoint will be in UTF8 and will be multi-octet.
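
Point D can be checked with a small sketch, assuming a stock perl 5 with none of the `bytes`, `locale`, or `encoding` pragmas in scope. It reports the internal UTF8 flag, the character length, and the internal octet count for a few code points. The helper name `describe_chr` is made up for illustration; `bytes::length` is used purely as a debugging aid.

```perl
use strict;
use warnings;
use bytes ();    # load only, so bytes::length() is callable; nothing imported

# Return (UTF8 flag, length in characters, length in internal octets)
# for the string produced by chr() on the given code point.
# (describe_chr is a hypothetical helper, not part of any module.)
sub describe_chr {
    my ($codepoint) = @_;
    my $char    = chr($codepoint);
    my $flagged = utf8::is_utf8($char) ? 1 : 0;
    return ($flagged, length($char), bytes::length($char));
}

for my $cp (0x41, 0xe9, 0xff, 0x100, 0x263a) {
    my ($flagged, $chars, $octets) = describe_chr($cp);
    printf "chr(0x%04x): utf8 flag=%d, chars=%d, internal octets=%d\n",
        $cp, $flagged, $chars, $octets;
}
```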

