
Re: [perl #119445] performance bug: perl Thread::Queue is 20x slower than Unix pipe

From: John Heidemann
Date: August 26, 2013 20:35
Subject: Re: [perl #119445] performance bug: perl Thread::Queue is 20x slower than Unix pipe
Message ID: 7259.1377532694@dash.isi.edu
On Mon, 26 Aug 2013 08:11:12 -0700, "Dave Mitchell via RT" wrote: 
>On Sun, Aug 25, 2013 at 05:37:39PM -0700, James E Keenan via RT wrote:
>> On Fri Aug 23 17:28:00 2013, johnh@isi.edu wrote:
>> > Why is Thread::Queue *so* slow?
>> > 
>> > I understand it has to do locking and be careful about data
>> > structures, but it seems like it is about 20x slower than opening up a
>> > Unix pipe, printing to that, reading it back and parsing the result.
>
>Because it is nothing like a UNIX pipe.
>
>A UNIX pipe takes a stream of bytes, and reads and writes chunks of them
>into a shared buffer.
>
>A T::Q buffer takes a stream of perl "things", which might be objects or
>other such complex structures, and ensures that they are accessible by
>both the originating thread and any potential consumer thread. Migrating a
>perl "thing" across a thread boundary is considerably more complex than
>copying a byte across.
>
>
>> > To speculate, I'm thinking the cost is in making all IPC data shared.
>> > It would be great if one could have data that is sent over
>> > Thread::Queue that is copied, not shared.
>
>But T::Q is built upon a shared array, and is designed to handle shared
>data. 
>
>I think the performance you are seeing is the performance I would expect,
>and that this is not a bug.

I understand that Thread::Queue and perl threads allow shared data, and that
that's much more than a pipe provides.

My concern is that Thread::Queue also *forces* shared data, even when
it's not required.  If that sharing comes with a 20x performance hit,
that should be clear.
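
To make the 20x figure concrete, the comparison I have in mind is roughly
the following single-process sketch (not the exact program from the
original report; the queue here never even crosses a thread boundary, and
the numbers will vary by platform and perl build):

    #!/usr/bin/perl
    # Rough single-process sketch of per-item cost:
    # Thread::Queue enqueue/dequeue vs. one line written to and read
    # back from a pipe.  Illustrative only, not the original benchmark.
    use strict;
    use warnings;
    use threads;              # load before Thread::Queue so sharing is real
    use Thread::Queue;
    use IO::Handle;
    use Benchmark qw(cmpthese);

    my $q = Thread::Queue->new();

    pipe(my $rd, my $wr) or die "pipe: $!";
    $wr->autoflush(1);

    my $item = "one line of data\n";

    cmpthese(-2, {
        'Thread::Queue' => sub {
            $q->enqueue($item);
            my $got = $q->dequeue();
        },
        'pipe'          => sub {
            print {$wr} $item;
            my $got = <$rd>;
        },
    });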

From perlthrtut, the "Pipeline" model

       The pipeline model divides up a task into a series of steps, and passes
       the results of one step on to the thread processing the next.  Each
       thread does one thing to each piece of data and passes the results to
       the next thread in line.

For the pipeline model, one does not need repeated sharing, just a
one-time hand-off.  Each queue is FIFO with data touched by only one
thread at a time.  That's exactly what my particular application needs
to do.
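
As a minimal sketch of that pattern (the stage names and the trivial
"uc" work are made up for illustration):

    #!/usr/bin/perl
    # Minimal sketch of the perlthrtut pipeline model with Thread::Queue:
    # each item is handed off once, FIFO, touched by one thread at a time,
    # yet every enqueue still copies the item into shared storage.
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $to_stage2  = Thread::Queue->new();    # stage 1 -> stage 2
    my $to_collect = Thread::Queue->new();    # stage 2 -> collector

    my $stage2 = threads->create(sub {
        while (defined(my $item = $to_stage2->dequeue())) {
            $to_collect->enqueue(uc $item);   # do one thing, pass it on
        }
        $to_collect->enqueue(undef);          # propagate end-of-stream
    });

    # stage 1: feed the pipeline
    $to_stage2->enqueue("record $_\n") for 1 .. 1000;
    $to_stage2->enqueue(undef);               # signal end-of-stream

    # collector: drain the final queue
    while (defined(my $out = $to_collect->dequeue())) {
        # ... consume $out ...
    }
    $stage2->join();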

But one does not *want* sharing for the pipeline model if it comes with
a 20x performance hit.

If the statement is that queues should require shared data and the
corresponding performance hit, that's a design choice one could make.
Then I'd suggest the bug becomes: perlthrtut should say "don't use
Thread::Queue for the pipeline model if you expect high performance,
roll your own IPC".

Alternatively, I'd love some mechanism to share data between threads
that allows a one-time handoff (not repeated sharing) with pipe-like
performance.  One would *think* that shared memory ought to be faster
than round-tripping through a pipe (with perl parsing and kernel IO).
It seems a shame that perl forces full-on sharing when it's slow and
not required (in this case).
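
For what it's worth, the "roll your own IPC" I end up picturing is
something like the following: a process-based one-way hand-off over an
ordinary pipe, with Storable doing the per-item copy.  This is a sketch
of one possible workaround, not a proposal for Thread::Queue itself:

    #!/usr/bin/perl
    # Sketch of a one-way hand-off of Perl structures over a plain pipe,
    # serialized with Storable.  Each item is copied exactly once
    # (freeze, write, read, thaw) instead of being migrated into shared
    # storage.  Process-based, not thread-based.
    use strict;
    use warnings;
    use IO::Handle;
    use Storable qw(store_fd fd_retrieve);

    pipe(my $rd, my $wr) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {
        # child: consumer end of the pipeline
        close $wr;
        while (my $item = eval { fd_retrieve($rd) }) {
            # ... process $item ...
        }
        exit 0;               # fd_retrieve dies on EOF, ending the loop
    }

    # parent: producer end of the pipeline
    close $rd;
    $wr->autoflush(1);
    for my $i (1 .. 1000) {
        store_fd({ n => $i, payload => "record $i" }, $wr);
    }
    close $wr;                # EOF tells the child we're done
    waitpid($pid, 0);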

   -John

