perl.perl5.porters | Postings from February 2007

Re: Future Perl development

From: mark
Date: February 7, 2007 16:07
Subject: Re: Future Perl development
Message ID: 20070208000653.GB7800@mark.mielke.cc

On Wed, Feb 07, 2007 at 03:32:44PM -0800, Jan Dubois wrote:
> >I mean heck, utf8 was a kludge worked out on a napkin to make it
> >possible to store unicode filenames in a unix style filesystem. (utf8
> >has the property that no encoding of a high codepoint contains any
> >special character used by a unix filesystem) WTF would we use a kludge
> >as our primary internal representation when there are better
> >representations to use? Especially when you consider the performance
> >impact of doing so (use unicode and watch the regex engine get much
> >sloooooweeeeeerrrrrrr.)
> This is probably the main reason some big enterprise users stick with
> Perl 5.6.1.  I've seen several companies approach ActiveState, desperate
> to get help in moving to 5.8 while maintaining their application
> performance.  Unfortunately there is not much you can do to help them
> beyond the "avoid using Unicode strings, and downgrade every time a
> module returns stuff in Unicode" advice.

I don't understand this. Computers are much faster than they ever were
before. I don't understand how a company would be 'desperate' to stick
with Perl 5.6.1, because Perl 5.8 is slower at some task. 18 months
later? Problem solved.

It's come to the point again where I've reverted to "write the code
that is simple and maintainable instead of efficient, as you won't notice
the difference without benchmark testing anyway."

Then there is the subject of people assuming that the world is ASCII.
UTF-8 is only more efficient than UTF-16 at storage for ASCII characters.
For non-ASCII characters, UTF-16 is equal or more efficient in terms of
storage, and much more efficient in performance. While popular
processors are almost all 64-bit now, code is still doing per-byte
comparisons. I've done timings before and found that my processor (AMD64)
can deal with 16-bit and 8-bit quantities at approximately the same
speed, even though more cache lines are used with 16-bit. This makes me
conclude that 8-bit is actually *slower* on modern architectures in terms
of processing requirements.
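
To make the storage half of that concrete, here is a rough sketch in
Python (not Perl, and nothing to do with Perl's internals) that counts
the encoded bytes for a few sample strings. The helper name
encoded_sizes and the sample strings are made up for illustration, and
UTF-16 is counted without a byte-order mark. It says nothing about the
speed claim, only about size.

```python
# Compare encoded sizes of a few sample strings in UTF-8 and UTF-16.
# The samples and the helper name are illustrative assumptions only.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, UTF-16 without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

SAMPLES = {
    "ascii": "hello",                            # U+0000..U+007F
    "greek": "\u03b1\u03b2\u03b3\u03b4\u03b5",   # U+0080..U+07FF
    "cjk":   "\u65e5\u672c\u8a9e",               # U+0800..U+FFFF
    "astral": "\U0001F600",                      # above U+FFFF
}

for name, text in SAMPLES.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name:7} utf-8={utf8_len:2} bytes  utf-16={utf16_len:2} bytes")
```

If this is right, only the ASCII sample should come out smaller in
UTF-8; the Greek and astral samples should tie, and the CJK sample
should be smaller in UTF-16.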

It's a different world. Perl tries to play both sides with its
8-bit/UTF-8 strings. The result is confusion. Perl should have done
a better job of making the encoding transparent. Should have. Could
have. Oh well.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



