
Re: with malice aforethought (Re: Unicode cheatsheet for Perl)

From: Christian Hansen
Date: March 1, 2012 17:59
Subject: Re: with malice aforethought (Re: Unicode cheatsheet for Perl)
Message ID: 1A27A6BF-10DE-4A98-8FBD-1E25E9C6FBD7@mac.com

On 26 Feb 2012, at 20:22, Tom Christiansen wrote:

> Christian Hansen <christian.hansen@mac.com> wrote
>   on Tue, 21 Feb 2012 02:07:08 +0100:
> 
>>>> I would love for this to happen, I have advocated this on #p5p several
>>>> times, but there is always the battle of  "backwards compatibility
>>>> disease". About 10 months ago I reported a security issue regarding the
>>>> relaxed UTF-8 implementation (still undisclosed and still exploitable)
>>>> on the perl security mailing list.
> 
> Then we are currently in a security-through-obscurity situation, wherein
> only overall ignorance of an exploit "protects" us.  That's not protection;
> it's a vulnerability.  Would you estimate the vulnerability is severe
> enough for us to consider whether in this particular case we should
> consider issuing patches for old releases, like make a 5.12.5 or 5.10.2?

The vulnerability has been present since early releases of 5.8.x (I haven't confirmed every perl release, but the implementation is the same). It makes it possible to smuggle character strings (specially crafted for malicious purposes) through the :utf8 layer; in this case they get past perl's regex engine, which fails the match/validation.
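
To make "ill-formed" concrete without touching the undisclosed issue, here is a minimal sketch of what a strict UTF-8 decoder, as defined by Unicode, is expected to reject. It is written in Python only because Python comes up later in this thread; the function name `is_well_formed_utf8` is made up for illustration. It does not show how perl's :utf8 layer behaves, and it is not the reported exploit.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        # Python's "utf-8" codec is strict by default: it rejects overlong
        # forms, encoded surrogates and code points above U+10FFFF.
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "plain ASCII slash": b"/",
    "overlong 2-byte slash": b"\xc0\xaf",
    "overlong 3-byte slash": b"\xe0\x80\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "beyond U+10FFFF": b"\xf4\x90\x80\x80",
}

for label, raw in samples.items():
    print(f"{label}: {'ok' if is_well_formed_utf8(raw) else 'rejected'}")
```

Only the first sample should pass. A validation step that never sees the rejected forms because a relaxed decoder already accepted them is the kind of gap I am worried about.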

Whether this is severe enough to warrant patching older releases, I'll leave unsaid.

>>> There is absolutely no need to remain compatible with security-related
>>> bugs, and every reason not to.  Indeed, security is the only thing that
>>> we ever issue patches to releases that are past their end-of-life support.

I agree!

>> I lack the political skills to make this happen, but I'm more than willing
>> to provide the proper UTF-8 implementation for this (as defined by
>> Unicode/ISO/IEC 10646:2011) we could always discuss the need for the
>> invented meaning of relaxed. During my years as a professional programmer
>> for several high profile financial institutions in Sweden, I have only
>> encountered ill-formed UTF-8 through malicious attempts or clients that
>> thought that they were sending UTF-8 but using ISO-8859-1, that's my
>> experience, perhaps yours looks different?
> 
> My own experiences are finding the wrong encoding used by accident, not by
> malicious intent.  The situation you mention is therefore outside of my own
> experiences, which makes me all the more concerned about it.  I have gigabytes
> of corrupt data because of Java having the wrong defaults for what to do 
> with wrong encodings.  It was a design mistake, but they locked themselves
> into it forever and everyone keeps paying for that blunder.  Let's not
> mimic their bad decisions.  Let's fix ours.

Sounds like the CESU-8 issue, been there ;)
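
For anyone who hasn't run into it: CESU-8 encodes a supplementary character as two 3-byte surrogate sequences instead of one 4-byte sequence, and a strict UTF-8 decoder must reject that. A rough Python sketch follows; the `cesu8_encode` helper is my own illustration, not a library function.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (illustrative helper, not a standard codec)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split into a UTF-16 surrogate pair, then encode each
            # surrogate as its own 3-byte sequence.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


ch = "\U0001F600"
utf8 = ch.encode("utf-8")   # 4 bytes: f0 9f 98 80
cesu8 = cesu8_encode(ch)    # 6 bytes: ed a0 bd ed b8 80

try:
    cesu8.decode("utf-8")   # strict decoding refuses the surrogates
    cesu8_accepted = True
except UnicodeDecodeError:
    cesu8_accepted = False

print(utf8.hex(), cesu8.hex(), cesu8_accepted)
```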

> The thing I don't want is to have to tell people that they cannot trust
> perl -C, that they cannot trust PERL_UNICODE, that they cannot trust use
> utf8, that they cannot trust use open, that they cannot trust binmode,
> that they cannot trust :encoding(UTF-8), and that the only thing they can
> trust is laborious and error-prone manual encoding/decoding with FB_CROAK.

":encoding(UTF-8)" is currently what's offered by "core" perl. PerlIO::encoding provides a global, $PerlIO::encoding::fallback, to alter the behaviour of Encode. I have not tested altering this global (using FB_CROAK), but judging from the internals I guess that exceptions aren't expected.
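
For comparison only, this is the strict-versus-lenient distinction I have in mind for an I/O layer, sketched in Python rather than with PerlIO::encoding (whose behaviour with FB_CROAK I have not tested). The helper name `read_text` is hypothetical, and the input is the ISO-8859-1-sent-as-UTF-8 case mentioned above.

```python
import io


def read_text(data: bytes, errors: str) -> str:
    """Decode bytes through a UTF-8 text layer, as a file handle would."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", errors=errors)
    return stream.read()


# "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8 here.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

# Lenient: the bad byte is silently replaced with U+FFFD.
lenient = read_text(latin1_bytes, errors="replace")

# Strict: the same input raises instead of yielding altered text.
try:
    read_text(latin1_bytes, errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(lenient), strict_failed)
```

The point is only that the strict mode is the one that lets the caller notice the problem at the layer boundary.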

> If that position is nonetheless correct, it drastically needs to be fixed.
> Christian, I don't know what political skills you allude to as needed to
> make this happen.  Political skills to achieve a consensus that backwards
> compatibility with previous behavior known to be wrong is undesirable?

It's quite easy: we need a Benevolent Dictator, such as Larry Wall. Someone who can make the tough calls. Personally I think we should just implement Unicode as most people expect it to work (according to the Unicode standard).

What happened to the Perl mantra "Making Easy Things Easy and Hard Things Possible"?

> It seems to me that Python went through a transition where encoding-decoding
> errors changed from some sort of non-fatal to proper exceptions.  I don't know
> what sort of conniptions they experience there, since it's not a backwards-
> contemptible change.  But it doesn't have to be b-c, and probably shouldn't be.
> Jarkko is right.

What you are saying is correct. Python supports two different compile options, UCS2 and UCS4. Our case is worse: we support two different internal encodings depending on platform. On EBCDIC we use UTF-EBCDIC, and on US-ASCII platforms we use a relaxed UTF-8 encoding.

Just to cut the crap: there seems to be a group of people who like EBCDIC, but so far we haven't heard from anyone with such facilities.

Why are we trying to support two different encodings when we can barely support the proper one?


> It's better to fix bugs than to document them, and it's better to document them
> than not.  Right now I'm very hazy on the real status of all this stuff, and I
> am very uncomfortable with the idea of relentlessly charging ahead toward a
> release like a freight train with no brakes.

We should  

> Absolutely nothing depends upon any particular release date, but quite a bit
> depends on correct behavior, especially if it is security-related.  I know which
> one of those *I* consider immeasurably more important, but Aristotle appears to
> be of the opposite opinion.  Is this the "political will" problem you mention?

Partly. 
But you have the power to change the track!

Best regards,
chansen



