One metric I've seen is gzip-compressed size, which has the nice property that it identifies the size of the incompressible elements -- i.e., it discounts repetitive boilerplate.
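A minimal sketch of what this looks like in practice (the function name and the `*.py` glob are just illustrative assumptions -- you'd point it at whatever file set you care about):

```python
import gzip
from pathlib import Path

def gzipped_size(root: str, pattern: str = "*.py") -> tuple[int, int]:
    """Return (raw_bytes, gzip_bytes) for all files under `root` matching `pattern`.

    Concatenating before compressing lets gzip exploit boilerplate
    repeated ACROSS files, not just within one file.
    """
    blob = b"".join(p.read_bytes() for p in sorted(Path(root).rglob(pattern)))
    return len(blob), len(gzip.compress(blob, compresslevel=9))
```

The ratio of the two numbers gives a rough sense of how much of the codebase is repetitive filler: a codebase full of copy/paste will compress much further than a dense one of the same raw size.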
Another interesting set of metrics is Halstead's "software science" metrics[1]. They fell out of favour because initially they were hard to count and didn't seem to correlate with anything else.
You're trying to understand the "true" size of the software in spite of the idiosyncrasies of a given language.
As I noted somewhere above, "size" is an abstract, dimensionless quality. It can only be approached through proxies. The more the merrier, I reckon, especially if they turn out to correlate with different things.
In most projects, copy/paste code isn't just a product of the language; it's a product of lousy programmers. I've seen large codebases where a full 40% is duplicate code. There's no way to blame that on the language.
You're missing the point of the parent: There is no 'one true' metric. If you use different metrics (actual lines, logical lines, gzip'd size, etc) you may well find different correlations.
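To make the point concrete, here's a rough sketch that computes several of those proxies side by side for one file (the "logical lines" heuristic is naive and Python-specific -- it's an assumption for illustration, not a standard definition):

```python
import gzip
from pathlib import Path

def size_metrics(path: str) -> dict:
    """Compute several rough 'size' proxies for a single source file."""
    text = Path(path).read_text()
    lines = text.splitlines()
    # Naive logical-line count: drop blanks and '#' comment lines.
    logical = [l for l in lines if l.strip() and not l.strip().startswith("#")]
    return {
        "physical_lines": len(lines),
        "logical_lines": len(logical),
        "gzip_bytes": len(gzip.compress(text.encode())),
    }
```

Run this over a corpus of projects and the three columns won't rank them identically -- which is exactly why correlating against only one of them can mislead.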
I've changed my mind. I'm interested in what I originally said: what's the best way to measure code size, and what are those studies (if they exist). Otherwise we get into debates about size vs. complexity, which is actually less interesting IMO. Size as a proxy for complexity is good enough for me.
[1] http://en.wikipedia.org/wiki/Halstead_complexity_measures