perl.perl5.porters | Postings from August 2013

[perl #117355] [lu]cfirst don't respect 'use bytes'

Victor Efimov via RT
August 12, 2013 23:04
On Mon Aug 12 14:02:02 2013, Hugmeir wrote:
> but the way to get it is not necessarily "use bytes", but "if you don't
> want unicode semantics, encode your strings before matching" 

Well, utf8::encode($bytes) will change the string. So if

a) I have an ASCII regexp
b) I have data that is 7-bit ASCII in most cases, but is
sometimes Unicode with wide characters
c) I want the regexp to work fast, at least when the data is ASCII
d) I want the code not to break if the data is not ASCII.

utf8::encode($bytes) won't work as needed. It will damage the string if
it's Unicode: it won't be a character string anymore (and I might want to
process it after the regexp match, or use the regexp match variables).
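
To illustrate the point above, here is a minimal sketch. The sample strings and variable names are hypothetical; only utf8::encode and utf8::decode come from Perl itself. It shows that encoding in place leaves an ASCII string as it was, but turns a wide-character string into octets, so lengths and match variables no longer count characters.

```perl
use strict;
use warnings;

# Hypothetical sample data: one pure-ASCII string, one with a wide character.
my $ascii = "hello";
my $wide  = "smile \x{263A}";    # 7 characters; U+263A is one of them

utf8::encode($ascii);            # ASCII: contents unchanged
utf8::encode($wide);             # in place: now a string of UTF-8 octets

print length($ascii), "\n";      # 5
print length($wide),  "\n";      # 9, because U+263A became three octets

# Match variables now hold octets as well, not characters.
my $captured_len = 0;
if ($wide =~ /(\S+)$/) {
    $captured_len = length $1;   # 3 octets instead of 1 character
}
print $captured_len, "\n";       # 3

# Getting characters back needs an explicit decode of a copy.
my $restored = $wide;
utf8::decode($restored);
print length($restored), "\n";   # 7
```

So any code after the match has to remember to decode again, which is exactly the extra bookkeeping I would like to avoid.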

> and as of
> blead, looks like ascii+utf8 now matches just as fast as plain ascii.

Yes, indeed: 5.18 is still slow, but blead is already fast.
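
For anyone who wants to check this on their own perl, a rough sketch of such a comparison follows. The data and the pattern are made up for illustration and are not the ones from the ticket. It uses the core Benchmark module and utf8::upgrade to get the same ASCII text with and without the internal UTF8 flag.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: the same ASCII text twice, once as a plain byte
# string and once with the internal UTF8 flag switched on.
my $plain    = join ' ', ('lorem ipsum dolor sit amet') x 200;
my $upgraded = $plain;
utf8::upgrade($upgraded);    # same characters, different internal representation

my $re = qr/amet\s+lorem/i;  # hypothetical ASCII-only pattern

# Run each case for at least one CPU second and print a comparison table.
cmpthese(-1, {
    plain    => sub { my $n = () = $plain    =~ /$re/g },
    upgraded => sub { my $n = () = $upgraded =~ /$re/g },
});
```

The numbers depend on the perl version, the pattern and the data, so treat the result as an indication only.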

via perlbug:  queue: perl5 status: open
