I use PhantomJS + jQuery for my own scraping/testing engine, so I thought I would chuck in some extra information for those not familiar with PhantomJS.
The one thing that sets PhantomJS apart is that it is a full headless WebKit browser rather than just an HTML parsing engine, which is what most other solutions are. The big win is that you can scrape and test Comet/JavaScript-heavy apps without having to mock the polling or the submits/responses.
I run it like a bot controlled by Node.js, with NowJS sending commands to it and PhantomJS returning the results of tests. I believe there are plans to get process-to-process communication working, which would make controlling it and pushing data out easier.
I, too, use a Node.js server to control multiple PhantomJS processes. There's a patch that lets your script read from stdin; last weekend I modified it to support my platform's preferred line ending. I also added commands for mousemove/mousedown/mouseup. They push actual mouse events onto the Qt event queue, so you don't have to worry about the edge cases where JavaScript-faked mouse events fail.
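For anyone wondering what a stdin-driven command channel might look like on the Node side, here is a minimal sketch. It is not the patch mentioned above: the helper names (`encodeCommand`, `createLineSplitter`) and the JSON-per-line format are assumptions made up for illustration. It only shows the idea of sending one command per line and tolerating both `\n` and `\r\n` line endings when reading.

```javascript
'use strict';

// Hypothetical helper: serialize one command as a single JSON line.
// JSON.stringify escapes newlines, so each command stays on one line.
function encodeCommand(name, args) {
  return JSON.stringify({ cmd: name, args: args || [] }) + '\n';
}

// Hypothetical helper: returns a push(chunk) function that buffers raw
// text and calls onLine once per complete line. Accepts \n and \r\n.
function createLineSplitter(onLine) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    let index;
    while ((index = buffer.indexOf('\n')) !== -1) {
      let line = buffer.slice(0, index);
      buffer = buffer.slice(index + 1);
      // Drop the carriage return left over from a \r\n line ending.
      if (line.endsWith('\r')) line = line.slice(0, -1);
      if (line.length > 0) onLine(line);
    }
  };
}

// Simulate two commands arriving in arbitrary chunks, the second one
// terminated with \r\n instead of \n.
const received = [];
const push = createLineSplitter(function (line) {
  received.push(JSON.parse(line));
});
const wire =
  encodeCommand('mousemove', [10, 20]) +
  encodeCommand('mousedown', [10, 20]).replace('\n', '\r\n');
push(wire.slice(0, 15)); // partial line: nothing is emitted yet
push(wire.slice(15));    // rest of the data: both commands are emitted
console.log(received);
```

In a real setup the encoded lines would presumably be written to the child process's stdin, and the splitter would be fed from its stdout `data` events. The chunking above just stands in for that.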
I am also working on a jQuery/JS scraping framework of my own. I think this is the way to go, because no library is used more than jQuery to extract HTML. It also enables you to scrape JS code on the page.
I have used node + jsdom so far. I will have a look at PhantomJS.