
Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set

From: Nicholas Clark
Date: September 28, 2011 04:50
Subject: Re: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
Message ID: 20110928115017.GA23881@plum.flirble.org

On Tue, Sep 27, 2011 at 05:09:33PM -0600, Karl Williamson wrote:

> My understanding is that the original reason for not doing the input
> checks was performance.  Security is a far more important issue now, and 
> Nicholas has demonstrated code that does the parsing with a minimal 
> performance hit.

I had hoped to work on it over last Christmas, but everyone got ill and
my laptop power supply failed. So it didn't happen.

Whilst I have a feel for how to do it for UTF-8, I have no idea how to do
it for UTF-8 and UTF-EBCDIC, or at least "not break EBCDIC platforms" or
"make something hard to port to EBCDIC" as a side effect.
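
For readers unfamiliar with what such an input check involves, here is a
minimal sketch in C. It is not the test code mentioned in this thread and
not what utf8.c does. The function name is hypothetical, it assumes an
ASCII platform, it follows the strict RFC 3629 definition of well-formed
UTF-8 (which may differ from what Perl accepts internally), and it makes
no attempt to handle UTF-EBCDIC.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if buf[0..len) is well-formed UTF-8 per RFC 3629, else 0.
 * Assumes an ASCII platform; UTF-EBCDIC is not handled. */
int is_wellformed_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                 /* continuation bytes expected */
        unsigned char lo = 0x80;     /* allowed range for the first */
        unsigned char hi = 0xBF;     /* continuation byte           */
        size_t k;

        if (c < 0x80) {              /* single-byte (ASCII) */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;                /* 0xC0, 0xC1 would be overlong */
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
            if (c == 0xE0) lo = 0xA0;    /* reject overlong forms   */
            if (c == 0xED) hi = 0x9F;    /* reject UTF-16 surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
            if (c == 0xF0) lo = 0x90;    /* reject overlong forms   */
            if (c == 0xF4) hi = 0x8F;    /* reject above U+10FFFF   */
        } else {
            return 0;                /* stray continuation or invalid lead */
        }

        if (len - i <= need)         /* sequence truncated by end of input */
            return 0;
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return 0;
        for (k = 2; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;
        }
        i += need + 1;
    }
    return 1;
}
```

The special-cased ranges for the first continuation byte are what reject
overlong encodings, surrogates and code points above U+10FFFF without
having to decode the code point first.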

I also wasn't sure how to benchmark it properly, to be confident about the
magnitude of the performance change. I had thought that my test code should
be *more* efficient than the current code in utf8.c [it did less work], but
all the numbers I could collect showed it to be slightly slower. Hence why
I'm not trusting my intuition about what's happening.

It's also blocking on lack of feedback to bug #79960.

Nicholas Clark
