develooper Front page | perl.perl5.porters | Postings from February 2001

Re: IV preservation (was Re: [PATCH 5.7.0] compiling on OS/2)

From: Ilya Zakharevich
Date: February 16, 2001 15:27
Subject: Re: IV preservation (was Re: [PATCH 5.7.0] compiling on OS/2)
Message ID: 20010216182727.B21880@math.ohio-state.edu

On Fri, Feb 16, 2001 at 04:17:11PM -0600, Jarkko Hietaniemi wrote:
> Ilya, I'm starting to think that we are so far from agreement in our
> Unicode and locale issues that we are generating more heat than light.

What locale issues?

> You have some strong obvious objections to both, I as the pumpkin am
> defending the current model and implementation because it seems to
> work, and agrees with what the Camel III says about the matter.  If you
> don't like what the Camel says, I can't help you.

If you want to *keep* broken the things which are broken, nobody can
help Perl.  And the louder you declare this, the fewer people you will
find willing to fix *other* broken aspects.

> I have more than enough to do in applying patches and
> closing/fixing smaller bugs, I do not have the time to redesign all
> that you seem to dislike.

I'm not the least bit surprised, given the amount of time you needed
to put into rewriting the REx code instead of adding 10 lines of code
for switching the struct regexp*.

> > The &|^ disaster should have taught us this.
> 
> The &|^ disaster that is so well-known.  What are you talking about?

  $modified = $var | $flag;
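The one-liner above presumably points at the fact that Perl's `|`, `&` and `^` choose between integer and string semantics from the internal state of their operands, not from anything visible in the source. A minimal sketch with made-up values (every name other than `$var` and `$flag` is illustrative only):

```perl
use strict;
use warnings;

# Both operands are numbers: ordinary integer OR.
my $numeric = 150 | 105;                    # 255

# Both operands are strings never used as numbers: character-wise OR.
my $stringy = '150' | '105';                # '155'

# Same source line, different answer, depending on how the data arrived.
my ($var, $flag) = ('12', '3');
my $as_strings = $var | $flag;              # '32'
my $as_numbers = ($var + 0) | ($flag + 0);  # 15

print "$numeric $stringy $as_strings $as_numbers\n";
```

Whether `$var | $flag` yields '32' or 15 thus depends on where `$var` and `$flag` came from, which is the kind of hidden state the thread is arguing about.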

> > Of course, but here we discuss the internal operations, not the I/O.
> > Each I/O channel (including system-calls) needs to be marked by the
> > translation used.
> 
> Including cases where one single I/O channel needs to carry both
> 8-bit data and UTF-8.

Irrelevant.  You cannot mix bytes and utf8.  You can mix bytes and
UTF-8 translation of Perl strings (since this translation consists of
characters in 0..255 range).
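A small sketch of that last point, assuming a Perl recent enough to provide `utf8::encode` (which postdates the 5.7.0 under discussion); the sample data and variable names are invented for illustration:

```perl
use strict;
use warnings;

my $text   = "caf\x{e9} \x{263a}";   # a Perl string of 6 characters
my $binary = "\xff\xfe\x00";         # arbitrary 8-bit data

# Translate the string to UTF-8; every element is now in 0..255.
my $octets = $text;
utf8::encode($octets);

# Bytes and translated text can now share one buffer or channel.
my $record = $binary . $octets;
my $max    = 0;
for my $c (map { ord } split //, $record) {
    $max = $c if $c > $max;
}

printf "chars=%d octets=%d record=%d max=%d\n",
    length($text), length($octets), length($record), $max;
```

Concatenating `$binary` with the untranslated `$text` instead would be the "mixing bytes and utf8" case that the paragraph above rules out.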

> Wrong.  There is no such thing in the EBCDIC implementations of Perl today.
> If you are talking about something new, you are not talking about 'use locale'.

AFAIU, you err (but I may be wrong).  What I'm discussing is the
situation of today, or 2 years ago (as far as locales are concerned).

You imply that there is a difference between EBCDIC and switching on
locale.  Which one?  [And do not consider the difference between
0..127 and 128..255 relevant.  AFAIR, Perl has no knowledge of this
difference.]

> The current implementation of locales in Perl is tightly tied to the
> (regrettably non-standard and broken) implementation of locales in
> vendors' lib(c)s.  The locale implementation you seem to be
> referring to does not exist: it is neither supported by the vendors
> nor implemented in Perl, so I have a hard time commenting on
> what you are saying.

I'm not discussing details of implementation, I'm discussing the
semantics, i.e., the rationale behind the implementation.  I would
think that all that 'use locale' is doing *semantically* is
switching on different upcase/lowercase rules and tables of what is a
digit etc (it may also switch sorting rules, which is indeed not
easily covered by the "cultural information" carpet).

> It does not, but the problem lies in the part "might have utf8 mark if
> byte-encoded, but does not contain chars above 127".  How does that
> utf8 mark get in there, 

Everything touched by high-chars is marked as "may have high chars" -
unless explicitly cleared, as in pack 'C*'.

As I wrote, $a = substr "\x{101}", 1;
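A hedged sketch of that case, using `utf8::is_utf8` from later Perls to peek at the internal mark (that function, and the variable names, are assumptions not taken from this thread); the source string is lengthened to "\x{101}abc" so that the substring is not empty:

```perl
use strict;
use warnings;

my $wide = "\x{101}abc";        # one character above 255, then ASCII
my $tail = substr $wide, 1;     # 'abc', cut out of a high-char string

# Report the internal mark; the value itself is plain ASCII.
printf "tail=%s marked=%s\n", $tail, utf8::is_utf8($tail) ? 'yes' : 'no';

# The mark is not part of the value: comparison ignores it.
my $same = ($tail eq 'abc') ? 1 : 0;

# Explicit clearing via pack 'C*', for the cases that need raw bytes.
my $cleared = pack 'C*', unpack 'C*', $tail;
printf "cleared=%s marked=%s same=%d\n",
    $cleared, utf8::is_utf8($cleared) ? 'yes' : 'no', $same;
```

The sketch deliberately prints the mark on `$tail` rather than asserting it, since that is an internal detail that may differ between Perl versions; the string's value is the same either way.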

>			  and how does it get cleared?

Most of the time, there is no need to.  Or some kind of pack 'C*' if
the need arises.

>						       Do we set/clear
> on all input strings?

Depending on the translation registered for the input channel.

> We cannot set it if there are any high-bit bytes, and we must clear
> it if the string gets modified and such high-bit bytes are
> introduced.

Cannot happen.

Ilya
