
Why would random answers correlate? Statistical significance for something like this is all about rejecting results consistent with randomness. Correlation means it appears to be non-random.


So for a simple example:

Suppose 1/20 people are sadistic, and 1/20 people love eating bitter food.

Let's suppose each question is a true/false question.

Let's suppose also 1/10 respondents are bots that answer randomly.

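Of the people who answer that they like sadism on a given question, roughly half will be bots: bots answering "true" make up 1/10 × 1/2 = 5% of all respondents, versus 9/10 × 1/20 = 4.5% for genuine respondents. The same holds for the people who say they like bitterness.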

For simplicity's sake, consider a simple two-question survey (one question about sadism, one about bitter food).

In this case you will get the following numbers, even if there's no genuine correlation:

- 1/40 like both (out of 40 respondents, the 4 bots land one in each of these four categories)

- 3/40 like bitterness but NOT sadism

- 3/40 like sadism but NOT bitterness

- 33/40 like neither

So you would conclude that if you like bitterness (4 people) you have a 25% (1/4) chance of liking sadism, whereas if you don't like bitterness (36 people) you have an 8% (3/36) chance of liking sadism. Therefore liking bitterness would appear to predict liking sadism (when really both are just predictors of being a bot).
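The 40-person table rounds everything to whole people. Here is a rough sketch of the same calculation done exactly, assuming bots flip a fair coin on each question and that the two traits are truly independent among real respondents. The function name `answer_probs` and its parameters are made up for illustration.

```python
from fractions import Fraction


def answer_probs(p_bot, p_sadism, p_bitter):
    """Joint distribution of (says sadism, says bitter) over all respondents.

    Assumes bots answer each true/false question with a fair coin flip and
    that real respondents answer honestly with independent traits.
    """
    half = Fraction(1, 2)
    joint = {}
    for s in (True, False):
        for b in (True, False):
            # A bot lands in each of the four answer combinations with prob 1/4.
            bot = half * half
            # A real respondent's answers follow the two independent base rates.
            human = (p_sadism if s else 1 - p_sadism) * (p_bitter if b else 1 - p_bitter)
            joint[(s, b)] = p_bot * bot + (1 - p_bot) * human
    return joint


joint = answer_probs(Fraction(1, 10), Fraction(1, 20), Fraction(1, 20))

# P(says sadism | says bitter) and P(says sadism | does not say bitter)
p_sadism_given_bitter = joint[(True, True)] / (joint[(True, True)] + joint[(False, True)])
p_sadism_given_not_bitter = joint[(True, False)] / (joint[(True, False)] + joint[(False, False)])

print(float(p_sadism_given_bitter))      # about 0.287
print(float(p_sadism_given_not_bitter))  # about 0.075
```

The exact figures (about 29% versus about 7.5%) are close to the rounded 25% versus 8% above, so the spurious association does not depend on the rounding. Setting `p_bot` to 0 makes the two conditional probabilities equal, as you'd expect with no bots.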



