perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

March 30, 2007 15:04


On Friday 30 March 2007 21:28:44 Juerd Waalboer wrote:
> Tels skribis 2007-03-30 22:32 (+0000):
> > However, if you have 200Mbyte of ASCII string, it is more efficient to
> > *not* copy the data around just to find out that, yes, all of it is
> > 7bit :)
> Indeed, but this is an optimization. Optimization isn't part of teaching
> how things work, it always comes after.

I almost agree. :)

Some decisions really need to be made early on, in the design phase. You 
cannot optimize when the design is broken. E.g. if your data needs to be 
copied around *per design*, the best you can achieve is O(N). When you do 
not have to copy the data, you can suddenly achieve O(1). This distinction 
is quite important, and not something you can fix afterwards, apart from 
redesigning (aka let's break and re-assemble it :)
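
To make the O(N) versus O(1) point concrete, here is a minimal sketch. It is 
in Python rather than Perl and is purely an illustration, not anything from 
the Perl core; the helper names tail_by_copy and tail_by_view are made up 
for this example, and the copying behaviour described is that of CPython.

```python
# Illustration only: an interface that hands out copies vs. one that hands out views.
data = b"A" * 10_000_000  # large 7-bit payload, comparable to "A" x 10000000


def tail_by_copy(buf: bytes, start: int) -> bytes:
    # Slicing a bytes object (with start > 0) allocates a new object and
    # copies len(buf) - start bytes in CPython: O(N) per call.
    return buf[start:]


def tail_by_view(buf: bytes, start: int) -> memoryview:
    # Slicing a memoryview only records a new offset and length over the
    # same buffer; no payload bytes are copied: O(1) per call.
    return memoryview(buf)[start:]


copied = tail_by_copy(data, 1)
viewed = tail_by_view(data, 1)

# Both expose the same bytes ...
same_content = (viewed == copied)
# ... but only the view still refers to the original buffer.
shares_buffer = viewed.obj is data
```

Which of the two a caller gets is decided by the interface, which is why 
it is hard to change once other code depends on it.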

A recent (non-Perl) example of such a methodology/design change was 
zero-copy networking - I remember there being a lot of talk about this, 
especially in the Unix/Linux world. Basically, when you want to send data to 
the network, it is wasteful to copy it around many times just to output it 
to the hardware - up to the point where the copying takes more time than all 
the rest of the work to be done. However, avoiding the copy isn't that easy :)

I know it is hard to design your code so that it works fine for small data 
("A") and large data ("A" x 10000000) alike, but usually these things need 
to be considered early on, or you end up with a system that is only useful 
for demos and toying around and breaks under real-world use :)

Just like security, a performant design usually can't just be bolted on later.

And how to design your program to be secure, fast, reliable etc. should be 
taught, too. Maybe not in the same hour, but close :-)

Just saying... :)

> Information overload is probably the single most problematic thing in
> Perl's unicode documentation. Constantly people are told all those
> internal implementation details that they don't have to know. It's no
> wonder that they start assuming that they actually need this
> information, and use manual setting of UTF8 flags as their first resort
> in case of trouble.

I think I agree. Luckily I managed to completely avoid this whole issue by 
ignoring Unicode until very recently - and by then the docs and code had 
improved quite a lot, so that Unicode is really usable in Perl (Thank you 
guys! especially Jarkko!)

All the best,

