[perl #117355] [lu]cfirst don't respect 'use bytes'

From: Victor Efimov via RT
Date: August 12, 2013 23:04
Subject: [perl #117355] [lu]cfirst don't respect 'use bytes'
Message ID: rt-3.6.HEAD-2552-1376348647-205.117355-15-0@perl.org

On Mon Aug 12 14:02:02 2013, Hugmeir wrote:
> but the way to get it is not necessarily "use bytes", but "if you don't
> want unicode semantics, encode your strings before matching" 

Well, utf8::encode($bytes) will change the string. So if

a) I have an ASCII regexp
b) I have data which is 7-bit ASCII in most cases, but sometimes
Unicode with wide characters
c) I want the regexp to work fast, at least when the data is ASCII
d) I want the code not to break if the data is not ASCII

then utf8::encode($bytes) won't work as needed. It will damage the string
if it's Unicode: it won't be a character string anymore (and I might want
to process it after the regexp match, or use the regexp match variables).
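
To illustrate what I mean by "damage", here is a minimal sketch. The sample string and the variable names are made up for the example; the only thing taken from the discussion is utf8::encode itself, which modifies its argument in place. I encode a copy so both versions can be compared.

```perl
use strict;
use warnings;

# Hypothetical sample data: "caf\x{e9}", a space, and U+263A (a wide character).
my $chars    = "caf\x{e9} \x{263A}";
my $char_len = length $chars;             # length counted in characters

# An ASCII-only pattern captures a whole character from the character string.
my ($last_char) = $chars =~ /(.)\z/;

# utf8::encode works in place, so encode a copy to keep the original intact.
my $bytes = $chars;
utf8::encode($bytes);
my $byte_len = length $bytes;             # length now counted in bytes

# The same pattern on the encoded copy captures only the final byte.
my ($last_byte) = $bytes =~ /(.)\z/;

printf "chars: %d, bytes: %d\n", $char_len, $byte_len;
printf "last char: U+%04X, last byte: 0x%02X\n",
    ord($last_char), ord($last_byte);
```

After encoding, length() and the capture no longer describe characters, so anything done with the match result afterwards operates on UTF-8 bytes rather than on the original text.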

> and as of
> blead, looks like ascii+utf8 now matches just as fast as plain ascii.

Yes, indeed: 5.18 is still slow, but blead is already fast.


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=117355
