
> Eh what? Word counting is embarrassingly parallel, at least for files.

wc does not assume that it is working on a file that can be seeked into (and, more importantly, rewound). in c++ terms, it behaves as though the source it is reading from is an "input iterator"[1].

if you are piping into wc, then you'll never be able to parallelise it. word counting a file is the edge case here, not the general use.

[1] - https://en.cppreference.com/w/cpp/named_req/InputIterator



You don't need to seek: Spin up workers, have them wait on a mutex, and read blocks. If the producer is slow, then you won't need to feed multiple workers. If it's fast, it's a matter of finding the tradeoff between block size vs. the processing time per block.


i expect you're being downvoted because it's likely that the cost of deciding which thread to assign work to is an order of magnitude more expensive than accumulating the required counters.

in any case: you're right, it is of course possible. what i should have said was:

> if you are piping into wc, then you'll never see a performance gain from any attempts at parallelisation.


Well, I'm back at 1 point now. In any case, the amount of work needed to assign a block to a thread is trivial: for a small number of threads, it is at most a couple of mutex operations and a couple of loads/stores per block. Since the block size can be set arbitrarily high, the assignment overhead per byte can be made arbitrarily low. The biggest issue is that you won't know the input size, and below some cutoff point multiple threads add cost rather than remove it. But for small inputs, performance won't be an issue anyway.

EDIT: Actually, the simplest way of assigning work is probably just to have each thread try to acquire a mutex, call read(), and then release the mutex and do their work.



