On Fri, Mar 30, 2007 at 10:09:29PM +0200, Juerd Waalboer <juerd@convolution.nl> wrote:
> Marc Lehmann skribis 2007-03-30 14:24 (+0200):
> > In fact, I teach a lot of people about unicode in perl.
>
> At the German Perl Workshop, I saw your unicode presentation. I don't
> know if this is a good representation for your teaching of unicode, but

It is, if a bit short (and I consider it a matter of taste).

> > If perl had the abstract model juerd dreams of
>
> and uses in day-to-day coding, without encountering ANY of the problems
> that you describe

Frankly, that is not a very good sign. It means either you are extremely
lucky or you don't use any of the many XS modules that silently break, or
even the Perl modules (such as the example from Compress::Zlib) that break
less silently, but more miraculously.

> It kind of makes one wonder if this dream might be reality (and your
> reality a dream?)

The dream isn't reality. If it were, people would not report bugs against
JSON::XS because it happens to create scalar values with the UTF-X bit
set. And they do so for some of my other modules doing that, too.

I have two options: either tell them perl is broken w.r.t. e.g.
unpack "C", or tell them their code is broken because they do not call
downgrade.

Obviously, I prefer the former over the latter, but last time I was told
unpack "C" was mentioned to break the abstraction in the Camel book, so
it's correct. Which suddenly invalidates a lot of code.

> > then perl would have a very easy unicode model that boils down to
> > what I talked about on the perl workshop: encode/decode when doing
> > I/O, otherwise, enjoy.
>
> And keep text strings and byte strings separate!!!!!!!!!!!!!eleven

I find "text strings" and "byte strings" not adequate either, as Perl makes
no difference between those two concepts (being typeless), and they do not
map well to encoded/decoded text either.
Perl only knows how to concatenate characters; it does not know anything
about byte or text, so utf8::encode does not necessarily create a byte
string out of a text string. It could just as well create a text string
out of a byte string (think JSON, which creates JSON _text_ out of e.g.
byte strings by encoding them to UTF-8).

> So, recap: encode/decode when doing I/O, keep text strings and byte
> strings separate, otherwise, enjoy.

I do not think that maps clearly to Perl (or my programs either). It might
be good, simplified advice for a beginner, though I prefer never to tell
people simplified (but wrong) things.

The perl unicode model is rather simple, but leaves you in control, and I
found that teaching people that perl simply allows character indices
beyond 0..255 works best (although people differ).

-- 
                The choice of a
      -----==-     _GNU_
      ----==-- _       generation     Marc Lehmann
      ---==---(_)__  __ ____  __      pcg@goof.com
      --==---/ / _ \/ // /\ \/ /      http://schmorp.de/
      -=====/_/_//_/\_,_/ /_/\_\      XX11-RIPE
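
A minimal sketch of the two points made in this mail: a Perl string is
only a sequence of character indices, and utf8::encode maps characters to
characters without recording whether the result is "bytes" or "text". It
assumes perl 5.8.1 or later, where the utf8:: functions are available
without loading anything. The variable names and sample strings are made
up for illustration and are not taken from JSON::XS or any other module
mentioned here.

```perl
use strict;
use warnings;
use List::Util qw(max);

# A string is a sequence of character indices; nothing limits them to 0..255.
# This sample (an assumption for illustration) holds 233 and 8364.
my $s = "caf\x{e9} \x{20ac}";

# utf8::encode works in place: every character is replaced by the
# characters 0..255 that make up its UTF-8 encoding. The result is still
# just a string of characters.
my $enc = $s;
utf8::encode($enc);

# Nothing marks $enc as "bytes", so encoding it a second time works too.
my $twice = $enc;
utf8::encode($twice);

for my $str ($s, $enc, $twice) {
    printf "length %2d, largest character index %d\n",
        length($str), max(map { ord } split //, $str);
}

# The internal UTF-X flag is a storage detail: upgrading changes the
# representation, not the characters, so both strings compare equal.
my $plain = "caf\x{e9}";
my $up    = $plain;
utf8::upgrade($up);

printf "equal: %s, flag on plain: %s, flag on upgraded: %s\n",
    ($plain eq $up          ? "yes" : "no"),
    (utf8::is_utf8($plain)  ? "yes" : "no"),
    (utf8::is_utf8($up)     ? "yes" : "no");

# utf8::downgrade goes the other way. It succeeds here because every
# index fits into 0..255; the true second argument asks it to return
# false instead of dying when that is not the case.
my $ok = utf8::downgrade($up, 1);
print "downgrade ", ($ok ? "succeeded" : "failed"), "\n";
```

Whether $enc counts as a "byte string" or a "text string" after the first
utf8::encode is something only the program knows; perl itself keeps no
such distinction.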