Re: [perl #133588] Symbol for 'micro' is erroneously uppercased to Greek 'MU'.

From: "G.W. Haywood" via perl5-porters
Date: October 20, 2018 06:31
Subject: Re: [perl #133588] Symbol for 'micro' is erroneously uppercased to Greek 'MU'.
Message ID: alpine.DEB.2.11.1810191029030.19099@mail6.jubileegroup.co.uk

Hi there,

Thanks for the reply, very much appreciated.

On Thu, 18 Oct 2018, karl williamson via RT wrote:

> On 10/18/18 1:07 AM, Bo Lindbergh wrote:
>> Quoth G.W. Haywood via perl5-porters:
>>>
>>> If you're saying that Perl does this because that's what the Unicode
>>> rules say it must do then I can understand the dilemma, but AFAICT
>>> it's the only symbol to be abused this way and at the very least it
>>> seems to be a violation of the principle of least astonishment.
>
> Yes we are saying that we are obeying Unicode rules ...

Understood.  We'll just have to live with the astonishment.

> Before computers, there was no need for semantic compatibility ...

Quite so.  I was, unfortunately, there at the time. :/  I still have
letters from my bank which now look like they were produced by my
great-nephew for his play group.

> The Greek question mark looks like ";", and I consider it a defect
> in Unicode that whenever you encounter one, the program is supposed
> to immediately turn it into the semi-colon.

I wasn't aware of that - and that particular issue appears to be much
more serious and deeply-rooted than the one I've reported.
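The Greek question mark behaviour comes from Unicode normalization: U+037E GREEK QUESTION MARK has a singleton canonical decomposition to U+003B SEMICOLON, so any normalized text loses the distinction. A minimal sketch using Python's standard unicodedata module (Python rather than Perl, purely for illustration):

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"  # U+037E GREEK QUESTION MARK, renders like ";"

# The character's canonical decomposition is just U+003B SEMICOLON.
print(unicodedata.decomposition(GREEK_QUESTION_MARK))  # 003B

# Because the decomposition is canonical, every normalization form replaces it.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, GREEK_QUESTION_MARK)
    print(form, f"U+{ord(normalized):04X}", normalized == ";")
```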

> .. you can use https://www.unicode.org/reporting.html but I doubt
> the response you will receive would be very enlightening.

Given your view, perhaps I'll let things lie.  Seems like I'd be
raking over old ground to little purpose, and there are other fish to
fry around here.

>> The abbreviation of the micro- prefix is an ordinary lowercase Greek
>> letter, just as the abbreviation of the milli- prefix is an ordinary
>> lowercase Latin letter.  The duplicate encoding exists only for historical
>> compatibility.

And there's me thinking that the duplicate encoding was so you can use
the one that's appropriate!
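The micro/mu mapping can be reproduced outside Perl. This is a minimal sketch in Python, assuming that Python's str.upper() and str.lower() apply the same Unicode default case mappings as Perl's uc() and lc(); the describe() helper is hypothetical, written only for this illustration. It shows that U+00B5 MICRO SIGN uppercases to U+039C GREEK CAPITAL LETTER MU, and that lowercasing the result yields U+03BC GREEK SMALL LETTER MU rather than the original micro sign.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # U+00B5 MICRO SIGN (also present in ISO-8859-1)

def describe(s: str) -> str:
    """Hypothetical helper: return 'U+XXXX NAME' for each code point in s."""
    return ", ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s)

# Assumption: Python's case methods follow the same Unicode mappings as Perl's uc()/lc().
upper = MICRO_SIGN.upper()   # MICRO SIGN maps to GREEK CAPITAL LETTER MU
round_trip = upper.lower()   # MU lowercases to GREEK SMALL LETTER MU, not MICRO SIGN

print(describe(upper))           # U+039C GREEK CAPITAL LETTER MU
print(describe(round_trip))      # U+03BC GREEK SMALL LETTER MU
print(round_trip == MICRO_SIGN)  # False
```

The round trip is the practical consequence: once uppercased, the original ISO-8859-1 micro sign cannot be recovered by lowercasing.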

> There are more similar cases, e.g. U+212A KELVIN SIGN.

I suppose then I'd need to complain about what lc() does to it, but I
haven't tried it... :)
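For reference, a minimal sketch of what lowercasing does to U+212A KELVIN SIGN, again in Python and assuming its str.lower() follows the same Unicode mappings as Perl's lc(). The sign lowercases to plain Latin "k", and since it also has a canonical decomposition to U+004B, NFC normalization alone replaces it with an ordinary "K".

```python
import unicodedata

KELVIN_SIGN = "\u212a"  # U+212A KELVIN SIGN, renders like "K"

# Assumption: Python's str.lower()/str.upper() mirror Perl's lc()/uc() here.
lowered = KELVIN_SIGN.lower()   # lowercases to U+006B LATIN SMALL LETTER K
restored = lowered.upper()      # uppercasing "k" gives U+004B, not the Kelvin sign

print(unicodedata.name(lowered))                  # LATIN SMALL LETTER K
print(restored == "K", restored == KELVIN_SIGN)   # True False

# The canonical decomposition means NFC also turns it into a plain "K".
print(unicodedata.normalize("NFC", KELVIN_SIGN) == "K")  # True
```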

> http://unicode.org/reports/tr25/tr25-6.html#_Toc25

Hmmm.  Looks like the only thing that's unequivocal is a pair of angle
brackets.  Even if I cared about them, I can't see them being affected
by uc() or lc().

Thanks once again for the clarifications.  Oddly enough, the impetus
came from a growing number of spammers who use Portuguese, not Greek,
and our mishandling of ISO-8859-1.  A milter had been producing the
odd less-than-intuitive result.  We'll just deal with it differently.
Thankfully the Greek-speaking spammers seem less troublesome at the
moment, but I guess we'd better think about them too.

-- 

73,
Ged.
