One metric I've seen is gzip-compressed size, which has the nice property that it identifies the size of the incompressible elements -- i.e., it discounts repetitive boilerplate.
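A minimal sketch of what this looks like in practice (the function name and the `*.py` glob are just illustrative assumptions -- you'd point it at whatever file set you care about):

```python
import gzip
from pathlib import Path

def gzipped_size(root: str, pattern: str = "*.py") -> tuple[int, int]:
    """Return (raw_bytes, gzip_bytes) for all files under `root` matching `pattern`.

    Concatenating before compressing lets gzip exploit boilerplate
    repeated ACROSS files, not just within one file.
    """
    blob = b"".join(p.read_bytes() for p in sorted(Path(root).rglob(pattern)))
    return len(blob), len(gzip.compress(blob, compresslevel=9))
```

The ratio of the two numbers gives a rough sense of how much of the codebase is repetitive filler: a codebase full of copy/paste will compress much further than a dense one of the same raw size.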
Another interesting set of metrics is Halstead's "software science" metrics[1]. They fell out of favour because initially they were hard to count and didn't seem to correlate with anything else.
You're trying to understand the "true" size of the software in spite of the idiosyncrasies of a given language.
As I noted somewhere above, "size" is an abstract, dimensionless quality. It can only be approached through proxies. The more the merrier, I reckon, especially if they turn out to correlate with different things.
In most projects, copy/paste code isn't just a product of the language; it's a product of lousy programmers. I've seen large codebases where a full 40% is duplicate code. There's no way to blame that on the language.
You're missing the point of the parent: There is no 'one true' metric. If you use different metrics (actual lines, logical lines, gzip'd size, etc) you may well find different correlations.
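To make the point concrete, here's a rough sketch that computes several of those proxies side by side for one file (the "logical lines" heuristic is naive and Python-specific -- it's an assumption for illustration, not a standard definition):

```python
import gzip
from pathlib import Path

def size_metrics(path: str) -> dict:
    """Compute several rough 'size' proxies for a single source file."""
    text = Path(path).read_text()
    lines = text.splitlines()
    # Naive logical-line count: drop blanks and '#' comment lines.
    logical = [l for l in lines if l.strip() and not l.strip().startswith("#")]
    return {
        "physical_lines": len(lines),
        "logical_lines": len(logical),
        "gzip_bytes": len(gzip.compress(text.encode())),
    }
```

Run this over a corpus of projects and the three columns won't rank them identically -- which is exactly why correlating against only one of them can mislead.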
I've changed my mind. I'm interested in what I originally said: what's the best way to measure code size, and what are those studies (if they exist). Otherwise we get into debates about size vs. complexity, which is actually less interesting IMO. Size as a proxy for complexity is good enough for me.
[1] http://en.wikipedia.org/wiki/Halstead_complexity_measures