develooper Front page | perl.perl5.porters | Postings from December 2017

Re: Behavior of bitwise ops on unencountered wide characters

From: Karl Williamson
Date: December 20, 2017 18:01
Subject: Re: Behavior of bitwise ops on unencountered wide characters
Message ID: 4ab6e8bd-687e-31b0-b76f-dc10d93297a8@khwilliamson.com

On 12/20/2017 09:42 AM, Paul "LeoNerd" Evans wrote:
> On Tue, 19 Dec 2017 19:16:24 -0700
> Karl Williamson <public@khwilliamson.com> wrote:
> 
>> 3) Is there enough usage of quantified [[:ascii:]] in the wild to
>> justify doing this optimization?  (I was surprised to see only 132
>> CPAN modules have plain :ascii: (this grep also would catch negation))
> 
> Perhaps they use
> 
>   [\x00-\x7F]
> 
> or something similar?

Good point; a bunch more modules use this.  Doing so means their code
is not portable to EBCDIC machines.

Perhaps that's why I didn't think to optimize these into [[:ascii:]] on
ASCII machines, though it's easy to do so, and I will.


> I'd imagine looking for one of those would be
> much shorter too, as you can AND with 0x80808080 (or 64bit equivalent)
> and get 4 (8) chars at once.
> 

That's what I meant by vectorization, or word-at-a-time operations. 
That's what I just added to core.
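
As a rough illustration of the word-at-a-time idea (a minimal sketch,
not the code that went into core), the check for a run of ASCII bytes
can be written as below.  The function name is_all_ascii is
hypothetical, and the sketch assumes an ASCII platform with a 64-bit
integer type; memcpy is used for the loads so that alignment does not
matter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not the function in Perl core.
 * Returns 1 if every byte in buf[0..len) is below 0x80, otherwise 0. */
static int is_all_ascii(const unsigned char *buf, size_t len)
{
    /* One 0x80 per byte: any set bit after the AND means a non-ASCII byte. */
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;

    /* Word-at-a-time: examine 8 bytes per iteration. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof word);  /* alignment-safe load */
        if (word & high_bits)
            return 0;
    }

    /* Tail: fewer than 8 bytes remain, so check them one at a time. */
    for (; i < len; i++) {
        if (buf[i] & 0x80)
            return 0;
    }
    return 1;
}
```

Because the test only asks whether any high bit is set, byte order
within the word is irrelevant here.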

In fact this will work on any exact pattern whose length evenly divides 
the word size.  The regex engine could be changed to take advantage of 
this in several places.
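
A sketch of how that generalization might look, under the same
assumptions as above (64-bit words, memcpy loads): replicate the
pattern to fill one word, then compare whole words against it.  The
name span_repeats and its interface are hypothetical and are not taken
from the regex engine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only.
 * Returns how many leading bytes of buf[0..len) consist of whole
 * repetitions of pat.  pat_len must evenly divide the word size
 * (1, 2, 4, or 8 here); otherwise the function returns 0. */
static size_t span_repeats(const unsigned char *buf, size_t len,
                           const unsigned char *pat, size_t pat_len)
{
    unsigned char rep[sizeof(uint64_t)];
    uint64_t want, got;
    size_t i = 0, k;

    if (pat_len == 0 || sizeof(uint64_t) % pat_len != 0)
        return 0;

    /* Fill one word with back-to-back copies of the pattern. */
    for (k = 0; k < sizeof rep; k += pat_len)
        memcpy(rep + k, pat, pat_len);
    memcpy(&want, rep, sizeof want);

    /* Word-at-a-time: each equal word is 8 / pat_len repetitions. */
    while (i + sizeof(uint64_t) <= len) {
        memcpy(&got, buf + i, sizeof got);
        if (got != want)
            break;
        i += sizeof(uint64_t);
    }

    /* Finish one pattern at a time: covers a partly matching word
     * and any tail shorter than a word. */
    while (i + pat_len <= len && memcmp(buf + i, pat, pat_len) == 0)
        i += pat_len;

    return i;
}
```

Both the expected word and the input word are built with memcpy from
bytes in memory order, so the equality test is independent of
endianness.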


