perl.perl5.porters | Postings from February 2001

Re: Perl-Unicode fundamentals

Ilya Zakharevich
February 21, 2001 15:35
On Wed, Feb 21, 2001 at 03:06:49PM -0600, Jarkko Hietaniemi wrote:

> You seem to misinterpret everything I try to say about the matter

To calm yourself down, change "misinterpret" to "misunderstand".  I
explicitly asked you several times to clarify things which I find so
doubtful, to no avail.

> I have several times asked you to supply either the list of things to do
> or the list of things you seem as broken to be removed.  You have failed
> to supply such a list.

I have supplied this list many times already.  I cannot make it
definitive now; just some items:

  x) h2xs not producing backward-compatible code by default;

  x) v-thingies which are neither numbers nor strings;

  x) "support for integer operations" - at least it needs some more
     thinking about; (this is the situation now - the real problem
     with p5p is that it was *included*)

  x) all non-transparent operations w.r.t. byte/utf8 duality in the core;

  x) unneeded obfuscation of the REx engine;

  x) the qu// operator, which in a transparent world is equivalent to
     qq// (if I understood correctly what it does).

> You keep on froth-mouthing about "transparency" without ever giving a
> clear definition of it

Are you not froth-mouthing now?  You did see my original proposal on
Unicode, right?  What is the reason for all this sudden animosity?

Transparency is a very simple concept.  

  An operation is byte/utf8 transparent if it produces the "same"
  output given the "same" arguments.

Here two strings are the "same" if they contain the same characters.
[Characters are numbers in 0..FFFFFFFFFFFFFFFFFF range, but this is
probably not that important for this discussion.  Take them to be in
the 0..FFFF range if this can make things simpler.]
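
As an illustration only (this sketch is not from the original message,
and it uses Python rather than Perl), the definition above can be modelled
with two storage forms for the same characters.  The `Str` class, its
`is_utf8` flag, and the two length functions are hypothetical names chosen
for the sketch; they are not Perl internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Str:
    """Hypothetical model of a string with two possible storage forms."""
    raw: bytes      # the stored octets
    is_utf8: bool   # True: raw is UTF-8; False: one octet per character

    def chars(self) -> list:
        """Return the characters as numbers, independent of storage form."""
        if self.is_utf8:
            return [ord(c) for c in self.raw.decode("utf-8")]
        return list(self.raw)


def same(a: Str, b: Str) -> bool:
    """Two strings are the "same" if they contain the same characters."""
    return a.chars() == b.chars()


def length_transparent(s: Str) -> int:
    """Counts characters, so "same" inputs give the same output."""
    return len(s.chars())


def length_opaque(s: Str) -> int:
    """Counts stored octets, so the result depends on the storage form."""
    return len(s.raw)


# The character sequence (0x63, 0xE9) stored in both forms.
as_bytes = Str(bytes([0x63, 0xE9]), is_utf8=False)
as_utf8 = Str("c\u00e9".encode("utf-8"), is_utf8=True)
```

In this model `same(as_bytes, as_utf8)` holds and `length_transparent`
agrees on both values, while `length_opaque` does not; the latter is
what a non-transparent operation looks like under the definition above.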

> while at the same time accusing me of fatally breaking that
> transparency, without listing item by item to things that are not
> "transparent".

I have no idea which things are not "transparent".  The Unicode stuff
is not documented (or at least I did not see any documentation), so I
have no way to figure out what works how.

What I know is that were there *plans* to make things transparent,
there would be no qu// operator (equivalent to )

> As Nick translated to you, we now have transparency: Unicode works.

If we had it now, how would you explain
<>?  Why the rush to fix things
which are not broken?

> Now you are telling me that I both don't understand Unicode and that
> I lie.

Well, this "lie" thing is completely new to me.  Sorry if I said
something which led you to this impression.  About understanding
Unicode: either you do not understand it, or I+Nick+Graham do not
understand it.  I'm quite ready to admit that I may be confused, but
what was your answer to all my pleas to unconfuse me?

> You seem to be all paniced about qu, how I sneakily floated it to the
> language.

Yes, very much.  Putting it in says volumes about how you understand
Unicode - and how "we" do not understand this understanding of yours.

> You keep on talking about locales and Unicode and EBCDIC, with some
> magic "locale-think" as the solution, without any technical details
> that would make any sense, not at least in the context of the current
> locale implementation.  What 'use locale' currently affects are the
> following things:
> (1) definition of some character classes like \w are changed,
>     similarly for uc() et al

This is the change of the "cultural info" which I mention.

> (2) the collation order of strings gets changed

Here you use buzzwords which I think are not relevant to the operation
of Perl.  I think it is better to restate it as "the results of cmp
and related operations change to a system-defined locale-cognizant
order", right?
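
A rough sketch of that distinction, in Python for illustration and not
taken from the original message: the default comparison orders strings by
character number, while a locale-cognizant order is whatever the system
defines.  Since the available locales vary between systems, the
`toy_collation_key` function below is a hypothetical stand-in (it ranks
accented letters next to their base letters), not a real locale.

```python
import unicodedata


def toy_collation_key(s: str):
    """Hypothetical stand-in for a system-defined, locale-cognizant order.

    Decomposes each character (NFD) and drops combining marks, so accented
    letters sort next to their base letters; the original string is kept
    as a tie-breaker.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, s)


words = ["zebra", "\u00e9clair", "apple"]

# Order by character number: U+00E9 is greater than "z" (U+007A).
by_codepoint = sorted(words)

# Order under the toy stand-in for a locale.
by_toy_locale = sorted(words, key=toy_collation_key)
```

The same three strings come out in a different order depending on which
comparison is in force, which is the only effect of (2) that matters here.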

> (3) the decimal number separator (".", ",", ..) used in output
>     and input is changed, similarly date format may change (strftime)

I thought this behaviour changes from one version of Perl to another,
so it may be considered an implementation detail.

> (4) the error messages given by various libraries may be changed
>     (to be in languages other than English), so "$!" may change

This is outside of the control of Perl.  The string value of $! was
never defined, and may fluctuate based on other parameters as well
(compiler vendor etc).

The only effects which I want people to concentrate their attention on
are (1) and (2).  After having done this, I also throw away (2).  ;-)
[It is not very hard to bring (2) into the discussion later, but let
us discuss simple things first.]

Only *then*, after this great simplification, do the effects of 'use
locale' become parallel to Larry's other idea: that it may make sense
to let Perl use "cultural info" tables other than Unicode's.  I note
that we already do it on EBCDIC and after 'use locale' - when working
in the 0..255 range.

My new proposal *defines* the interaction of the effects of (1) and
Unicode by defining an *extension* of the cultural info modification
done in 0..255 range to the whole 0..whatever range.

Sorry that I bombastized what 'use locale' does, but the "other 2"
effects of 'use locale' are not (?) well defined, so I instinctively
omitted them...  ;-)

> Why I keep repeating that this has nothing to do with Unicode or

Who cares?  All we need is to *deduce* which changes happened in the
0..255 range.  And we already do it.

> Notice also that there is no correlation between or locale and a
> character codeset or its encoding.  You can have "fr" or "es_AR" and
> you have no way of knowing whether they are using ISO 8859-1, -15, or
> Unicode, if Unicode, whether they are using UTF-8, UTF-16, or UTF-32.

My proposal on the behaviour of the "cultural info" table does not
need this information.  This information is vital in *other* aspects
of Perl operations, like i/o filters, but it is needed for these
operations without the proposal as well.  So I see no reason to bring
this murky topic into the discussion.

Ilya