On Mon, 26 Aug 2013 08:11:12 -0700, "Dave Mitchell via RT" wrote:

>On Sun, Aug 25, 2013 at 05:37:39PM -0700, James E Keenan via RT wrote:
>> On Fri Aug 23 17:28:00 2013, johnh@isi.edu wrote:
>> > Why is Thread::Queue *so* slow?
>> >
>> > I understand it has to do locking and be careful about data
>> > structures, but it seems like it is about 20x slower than opening up a
>> > Unix pipe, printing to that, reading it back and parsing the result.
>
>Because it is nothing like a UNIX pipe.
>
>A UNIX pipe takes a stream of bytes, and reads and writes chunks of them
>into a shared buffer.
>
>A T::Q buffer takes a stream of perl "things", which might be objects or
>other such complex structures, and ensures that they are accessible by
>both the originating thread and any potential consumer thread. Migrating
>a perl "thing" across a thread boundary is considerably more complex than
>copying a byte across.
>
>> > To speculate, I'm thinking the cost is in making all IPC data shared.
>> > It would be great if one could have data that is sent over
>> > Thread::Queue that is copied, not shared.
>
>But T::Q is built upon a shared array, and is designed to handle shared
>data.
>
>I think the performance you are seeing is the performance I would expect,
>and that this is not a bug.

I understand that Thread::Queue and perl threads allow shared data, and
that that's much more than a pipe. My concern is that Thread::Queue also
*forces* shared data, even when it's not required. If that sharing comes
with a 20x performance hit, that should be made clear.

From perlthrtut, the "Pipeline" model:

    The pipeline model divides up a task into a series of steps, and
    passes the results of one step on to the thread processing the next.
    Each thread does one thing to each piece of data and passes the
    results to the next thread in line.

For the pipeline model, one does not need repeated sharing, just a
one-time hand-off. Each queue is FIFO, with data touched by only one
thread at a time.
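To make the usage pattern concrete, here is a minimal sketch of that one-time hand-off using Thread::Queue (assumes a threads-enabled perl build; the worker logic and item values are just for illustration). Note that each item is touched by exactly one thread at a time, even though T::Q shares the underlying array:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new();

# Worker stage: doubles each item handed to it, stops at end-of-queue.
my $worker = threads->create(sub {
    my @out;
    while (defined(my $item = $q->dequeue())) {
        push @out, $item * 2;
    }
    return @out;
});

# Producer stage: hand each item off exactly once.
$q->enqueue($_) for 1 .. 3;
$q->end();    # no more items; dequeue() will return undef when drained

my @results = $worker->join();
print "@results\n";
```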
That's exactly what my particular application needs to do. But one does
not *want* sharing there (for the pipeline model) if it comes with a 20x
performance hit.

If the statement is that queues should require shared data and the
corresponding performance hit, that's a design choice one could make.
Then I'd suggest the bug becomes: perlthrtut should say "don't use
Thread::Queue for the pipeline model if you expect high performance; roll
your own IPC".

Alternatively, I'd love some mechanism to share data between threads that
allows a one-time hand-off (not repeated sharing) with pipe-like
performance. One would *think* that shared memory should be able to be
faster than round-tripping through a pipe (with perl parsing and kernel
IO). It seems a shame that perl forces full-on sharing when it's slow and
not required (in this case).

-John
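P.S. For reference, the pipe-style round trip I'm comparing against looks roughly like this (a single-process sketch for illustration; in the real pipeline the two ends would be in separate threads or processes, and each item pays a serialize, write, read, and parse cost):

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";

# Writer end: serialize each item to a line of text.
print {$writer} "$_\n" for 1 .. 3;
close $writer;    # flushes the pipe and signals EOF to the reader

# Reader end: read the lines back and parse each one (here: double it).
my @items = map { chomp; $_ * 2 } <$reader>;
print "@items\n";
```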