Gerard Goossen wrote on 2007-02-08 14:22 (+0100):
> > You'd look for \x{20ac}, you'd be looking for \xe2\x82\xac.
> Do NOT use \xe2\x82\xac to create bytes. Use pack (or
> \x[e2]\x[82]\x[ac]) to create bytes.

As if all previous discussion never happened. Tiresome.

I, too, used to say that \x should be avoided for creating bytes. Later
I learned that it is a safe way to create bytes, as long as you're not
under "use encoding".

\xff HAS TO BE a safe way to create a single byte, because otherwise it
would not be backwards compatible with a decade of pre-existing code.

Of course, these bytes are upgraded to UTF8 (**INTERNALLY**!!) if you
use them with strings that are also in UTF8 (again, internally). That's
perfectly okay, because one cannot mix byte strings like "\xe2\x82\xac"
with text strings like "3,00: goedkóóp!" in any meaningful way:
"3,00: goedkóóp" makes no sense in the context of bytes unless you
encode it first.

\x[] does not exist in Real Perl, mind you!

> But looking for this byte sequence is already what the current regex
> engine does:

I don't know exactly how your benchmark produces those results. Maybe
because the single match is at position 0, maybe because you "use
bytes", maybe for some other reason.

But I'll just show you one of the benchmarks that I did before:

    juerd@lanova:~$ perl -MBenchmark=cmpthese -MEncode -e'
        # Text string, and the same data with the UTF8 flag removed
        my $unicode = "f\x{20ac}oo";
        Encode::_utf8_off(my $utf8 = $unicode);

        my $re_unicode = qr/\x{20ac}/;
        my $re_utf8    = qr/\xe2\x82\xac/;

        cmpthese -1, {
            unicode => sub { (my $dummy = $unicode) =~ s/$re_unicode/E/; },
            utf8    => sub { (my $dummy = $utf8)    =~ s/$re_utf8/E/; }
        }
    '
                Rate unicode    utf8
    unicode 314139/s      --    -27%
    utf8    428740/s     36%      --

Nomenclature:

    unicode: Unicode string ("text string", "character string")
    utf8:    The same Unicode string, encoded as UTF-8 (by the ugly
             means of removing the UTF8 flag from the aforementioned
             Unicode string). It is now a byte string.

By the way, when you say "current perl", do you refer to stable, blead,
or your own branch?
I'm currently using 5.8.8.
-- 
Kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <sales@convolution.nl>

I do not trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.
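
A minimal sketch of the byte string versus text string distinction made
in the message above. It assumes a perl with the core Encode module
(5.8 or later) and no "use encoding" in effect. The variable names are
illustrative only and do not come from the thread. It shows that \x
escapes below 0x100 create single bytes, that encoding the text string
gives exactly those bytes, and that an internal upgrade does not change
the value of a byte string.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{20ac}";        # text string: one character, U+20AC
my $bytes = "\xe2\x82\xac";    # byte string: three bytes made with \x

# Encoding the text string explicitly yields the same three bytes.
my $encoded = encode('UTF-8', $text);

# Upgrading changes only the internal representation, not the value.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

print "length(text)     = ", length($text),  "\n";    # 1
print "length(bytes)    = ", length($bytes), "\n";    # 3
print "encoded eq bytes:  ", ($encoded  eq $bytes ? "yes" : "no"), "\n";
print "upgraded eq bytes: ", ($upgraded eq $bytes ? "yes" : "no"), "\n";

# The byte pattern matches the byte string but not the text string.
print "bytes match byte pattern: ",
    ($bytes =~ /\xe2\x82\xac/ ? "yes" : "no"), "\n";
print "text matches byte pattern: ",
    ($text  =~ /\xe2\x82\xac/ ? "yes" : "no"), "\n";
```

The last two lines are the practical consequence: the byte pattern only
finds the euro sign in data that is still encoded, so the two kinds of
string have to be kept apart and converted explicitly.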