From: Eric Brine
Date: August 26, 2013 20:42
Subject: Re: [perl #119445] performance bug: perl Thread::Queue is 20x slower than Unix pipe
Message ID: CALJW-qF6+ZJpOaWihVyE2=syRbmFUH0865Gwc3us3w58OxZbQw@mail.gmail.com
How does Thread::Queue::Any compare?
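
For reference, roughly what I had in mind -- an untested sketch, on my understanding
that Thread::Queue::Any serializes whatever you enqueue with Storable, so the queue
itself only ever holds plain frozen strings:

    use threads;
    use Thread::Queue::Any;

    my $q = Thread::Queue::Any->new;

    my $worker = threads->create(sub {
        while (my ($job) = $q->dequeue) {        # dequeue returns the enqueued list
            last if $job->{done};
            # ... process $job->{data} ...
        }
    });

    $q->enqueue({ data => [1 .. 1000] });        # frozen with Storable, not deep-shared
    $q->enqueue({ done => 1 });                  # hypothetical end-of-work marker
    $worker->join;
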
On Mon, Aug 26, 2013 at 11:58 AM, John Heidemann <johnh@isi.edu> wrote:
> On Mon, 26 Aug 2013 08:11:12 -0700, "Dave Mitchell via RT" wrote:
> >On Sun, Aug 25, 2013 at 05:37:39PM -0700, James E Keenan via RT wrote:
> >> On Fri Aug 23 17:28:00 2013, johnh@isi.edu wrote:
> >> > Why is Thread::Queue *so* slow?
> >> >
> >> > I understand it has to do locking and be careful about data
> >> > structures, but it seems like it is about 20x slower than opening up a
> >> > Unix pipe, printing to that, reading it back and parsing the result.
> >
> >Because it is nothing like a UNIX pipe.
> >
> >A UNIX pipe takes a stream of bytes, and reads and writes chunks of them
> >into a shared buffer.
> >
> >A T::Q buffer takes a stream of perl "things", which might be objects or
> >other such complex structures, and ensures that they are accessible by
> >both the originating thread and any potential consumer thread. Migrating a
> >perl "thing" across a thread boundary is considerably more complex than
> >copying a byte across.
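
As I understand it (a simplified sketch, not the actual Thread::Queue source), every
enqueue amounts to deep-copying the item into shared memory under a lock, roughly:

    use threads;
    use threads::shared;

    # rough sketch only -- $queue is assumed to be a reference to a shared array;
    # every item gets a full recursive copy into shared memory while holding the
    # lock, which is where the time goes for large structures
    sub enqueue_sketch {
        my ($queue, @items) = @_;
        lock(@$queue);
        push @$queue, map { shared_clone($_) } @items;
        cond_signal(@$queue);
    }
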
> >
> >
> >> > To speculate, I'm thinking the cost is in making all IPC data shared.
> >> > It would be great if one could have data that is sent over
> >> > Thread::Queue that is copied, not shared.
> >
> >But T::Q is built upon a shared array, and is designed to handle shared
> >data.
> >
> >I think the performance you are seeing is the performance I would expect,
> >and that this is not a bug.
>
> I understand that Thread::Queue and perl threads allow shared data, and
> that that's much more than a pipe offers.
>
> My concern is that Thread::Queue also *forces* shared data, even when
> it's not required. If that sharing comes with a 20x performance hit,
> that should be made clear.
>
> From perlthrtut, the "Pipeline" model:
>
>     The pipeline model divides up a task into a series of steps, and
>     passes the results of one step on to the thread processing the next.
>     Each thread does one thing to each piece of data and passes the
>     results to the next thread in line.
>
> For the pipeline model, one does not need repeated sharing, just a
> one-time hand-off. Each queue is FIFO with data touched by only one
> thread at a time. That's exactly what my particular application needs
> to do.
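
(If I follow, that's the classic two-queue pipeline -- a rough sketch below, with
process(), output() and read_input() as stand-ins for the real work:)

    use threads;
    use Thread::Queue;

    my $q1 = Thread::Queue->new;               # stage 1 -> stage 2
    my $q2 = Thread::Queue->new;               # stage 2 -> stage 3

    my $stage2 = threads->create(sub {
        while (defined(my $rec = $q1->dequeue)) {
            $q2->enqueue(process($rec));       # one-time hand-off; never touched again here
        }
        $q2->enqueue(undef);                   # propagate end-of-stream
    });

    my $stage3 = threads->create(sub {
        while (defined(my $rec = $q2->dequeue)) {
            output($rec);
        }
    });

    $q1->enqueue($_) for read_input();         # stage 1: producer
    $q1->enqueue(undef);
    $_->join for $stage2, $stage3;
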
>
> But one does not *want* sharing (for the pipeline model) there if it's a
> 20x performance hit.
>
> If the position is that queues must use shared data, and must accept the
> corresponding performance hit, that's a design choice one could make.
> Then I'd suggest the bug becomes: perlthrtut should say "don't use
> Thread::Queue for the pipeline model if you expect high performance;
> roll your own IPC".
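
("Roll your own" presumably meaning something along these lines -- untested, with
@records standing in for the real data; the length prefix is needed because frozen
Storable strings have no natural delimiter:)

    use threads;
    use Storable qw(freeze thaw);

    pipe(my $rd, my $wr) or die "pipe: $!";
    binmode($_) for $rd, $wr;                  # frozen data is binary

    my $consumer = threads->create(sub {
        close $wr;                             # drop our copy of the write end so EOF arrives
        while (read($rd, my $len_buf, 4) == 4) {
            my $len = unpack 'N', $len_buf;
            read($rd, my $frozen, $len) == $len or die "short read";
            my $rec = thaw($frozen);           # a plain private copy, nothing shared
            # ... process $rec ...
        }
    });

    for my $rec (@records) {                   # producer side
        my $frozen = freeze($rec);
        print {$wr} pack('N', length $frozen), $frozen;
    }
    close $wr;                                 # EOF tells the consumer to stop
    $consumer->join;
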
>
> Alternatively, I'd love some mechanism to pass data between threads
> that allows a one-time hand-off (not repeated sharing) with pipe-like
> performance. One would *think* that shared memory ought to be faster
> than round-tripping through a pipe (with perl parsing and kernel IO).
> It seems like a shame that perl forces full-on sharing when it's slow
> and not required (in this case).
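
One middle ground that might get most of the way there (I haven't benchmarked it, and
it's essentially what Thread::Queue::Any does for you): freeze each item yourself and
push the resulting byte string through an ordinary Thread::Queue, so the only thing
crossing the thread boundary is a single scalar per item rather than a deep shared
clone. Again, @records is a stand-in:

    use threads;
    use Thread::Queue;
    use Storable qw(freeze thaw);

    my $q = Thread::Queue->new;

    my $worker = threads->create(sub {
        while (defined(my $frozen = $q->dequeue)) {
            my $rec = thaw($frozen);           # private copy; no further sharing
            # ... process $rec ...
        }
    });

    for my $rec (@records) {
        $q->enqueue(freeze($rec));             # only one string is shared per item
    }
    $q->enqueue(undef);                        # end-of-stream
    $worker->join;
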
>
> -John
>
>