What precisely are you trying to measure? In theory, if you know which performance counters you want to measure, you can replace the IPC counter in the kernel module. I believe Dick has a different version of the kernel module which measures LLC misses instead of CPU cycles. Does that answer your question?
Hey, thanks for the response. Is it just a matter of measuring the LLC miss rate and then figuring out the max DRAM bandwidth somehow? What about in a multicore setting? Or with NUMA? It would be nice to have a library or tool that works this out - it always surprises me there isn't something off the shelf.
In that case, if you have an Intel CPU, you might want to try Intel VTune. I believe it has a profiling option that shows memory bandwidth over time [1].
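
If you only want a back-of-the-envelope number from an LLC miss counter, the arithmetic is roughly misses times cache line size divided by elapsed time. Here is a minimal sketch in Python. The function names, the 64-byte line size, and the per-socket dictionary are all assumptions for illustration, not part of the kernel module or VTune. The result is also only a lower-bound estimate, since it ignores hardware prefetches and writebacks.

```python
# Rough DRAM bandwidth estimate from LLC miss counts.
# Assumption: each LLC miss moves one cache line from DRAM, and lines are
# 64 bytes (typical on x86; check your CPU). Prefetch and writeback
# traffic are not counted, so real bandwidth is likely higher.

CACHE_LINE_BYTES = 64  # assumed line size


def estimate_bandwidth_gbs(llc_misses, elapsed_seconds, line_bytes=CACHE_LINE_BYTES):
    """Return estimated bandwidth in GB/s (1 GB = 1e9 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return llc_misses * line_bytes / elapsed_seconds / 1e9


def estimate_per_socket(misses_by_socket, elapsed_seconds):
    """Estimate bandwidth separately for each socket (NUMA node).

    misses_by_socket is a hypothetical mapping of socket id to the LLC
    miss count sampled on that socket over the same interval.
    """
    return {
        socket: estimate_bandwidth_gbs(misses, elapsed_seconds)
        for socket, misses in misses_by_socket.items()
    }


def utilization(measured_gbs, peak_gbs):
    """Fraction of the peak bandwidth in use; peak_gbs must be supplied
    by you (from a datasheet or a streaming benchmark)."""
    if peak_gbs <= 0:
        raise ValueError("peak_gbs must be positive")
    return measured_gbs / peak_gbs
```

For a multicore run you would sum the misses across the cores that share an LLC before calling `estimate_bandwidth_gbs`. On a NUMA machine, keeping the sockets separate as in `estimate_per_socket` matters because each socket has its own memory controller and its own peak.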