Assuming pv uses splice(), there is one only copy in the workload: copy_from_user() from fixed source buffer to some kernel allocated page, then those pages are spliced to /dev/null.
If the pages are not "recycled" (through LRU scheme for allocation), the destination changes every time and the L2 cache is constantly trashed.
I only learned of pv from this article so I can't speak much about its buffering. I would guess that the kernel would try to re-use recent freed pages to minimise cache thrashing. But anyway, on the 'yes' side, the program isn't re-allocating its 8kb buffer after every write(), so there's a lot of data being re-read from the same memory location.
Assuming pv uses splice(), there is one only copy in the workload: copy_from_user() from fixed source buffer to some kernel allocated page, then those pages are spliced to /dev/null.
If the pages are not "recycled" (through LRU scheme for allocation), the destination changes every time and the L2 cache is constantly trashed.