Correlation vs Causality
Have you ever wondered how it would be possible to predict what someone will do? Of course, predicting the action of a single individual is almost impossible, but predicting the behavior of a portion of the population is a little more plausible, as described in Asimov’s book Foundation.
To do this, we could look at people who have already taken the action we want to predict, and try to understand what motivated it. From there, it becomes possible to look for correlation and causality. As incredible as it seems, this is an important step when working with machine learning.
Of course, this is not a simple task, and sometimes these relationships are not so clear. For example, it’s common for your mother to tell you to take an umbrella when you go out. As a rebellious teenager, you don’t take it because it’s sunny, and what happens? Yes, the city is almost flooded! Do we have causality here? Unless your mother is Storm from the X-Men, we just have a coincidence.
Jokes aside, one important thing to keep in mind is that correlation is not the same thing as causality. The two are related, but they are different concepts and do not always appear in the same scenario.
With that in mind, we can see that two events can be highly correlated and still have no causal link. But what exactly is each of these things?
Let’s look at correlation first. Correlation is a statistical concept that shows how closely two variables are related: when the value of one variable changes, the value of the other tends to change as well. For example, in a hypothetical scenario, a customer of a telecom operator who has a package with several services and pays by automatic debit is more likely to stay longer as a customer than someone who has only one basic service in the monthly plan.
This relationship falls into one of three categories. It can be positive, when the value of one variable increases and the other increases as well; it can be negative, when the value of one variable increases and the value of the other decreases; or there may be no correlation at all, as the picture shows.
Correlation can be measured on a scale that goes from -1 (perfectly negative) to 1 (perfectly positive), with values close to 0 indicating no linear correlation.
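To make the -1 to 1 range concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common way to measure correlation (the text above does not name a specific coefficient, so this choice is an assumption). The function name `pearson` and the small telecom-style datasets are made up for illustration only.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations (covariance, up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same constant factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: number of contracted services per customer
services = [1, 2, 3, 4, 5]
# Hypothetical outcomes, built by hand to show the three categories
months_as_customer = [2, 4, 6, 8, 10]   # rises together with services
support_tickets = [10, 8, 6, 4, 2]      # falls as services rise
favorite_number = [2, 1, 3, 1, 2]       # no linear relation to services

positive = pearson(services, months_as_customer)
negative = pearson(services, support_tickets)
none = pearson(services, favorite_number)

print("positive:", positive)
print("negative:", negative)
print("no correlation:", none)
```

Real data will rarely land exactly on 1, -1 or 0. These toy lists were chosen so that the three categories are easy to tell apart.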
And what about the other concept? For causality to exist, event A must be the trigger of event B, and this link can sometimes be difficult to establish. I say this because not every correlation implies causality, whereas a causal relationship will generally show up as some form of correlation.
With causality, we can see a cause-and-effect relationship. For instance, poor service could be the cause of your client signing a contract with the competition.
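The idea that two variables can be correlated without one causing the other can be sketched with a small simulation. Everything here is hypothetical: a hidden factor (called `quality`, standing in for service quality) drives both `app_usage` and `tenure`, so the two look related even though neither causes the other. When `app_usage` is instead assigned at random, independently of `quality`, the relationship disappears. The names, the noise levels and the use of the Pearson coefficient are all assumptions made for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=2000, seed=0):
    """Return (observed_r, randomized_r) for a made-up telecom scenario."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # Hidden common cause: perceived service quality for each customer
    quality = [rng.gauss(0, 1) for _ in range(n)]

    # Both variables depend on quality, but not on each other
    app_usage = [q + 0.5 * rng.gauss(0, 1) for q in quality]
    tenure = [q + 0.5 * rng.gauss(0, 1) for q in quality]
    observed_r = pearson(app_usage, tenure)

    # "Experiment": app usage is now assigned at random, ignoring quality.
    # Tenure is unchanged, because usage never caused it in this model.
    forced_usage = [rng.gauss(0, 1) for _ in range(n)]
    randomized_r = pearson(forced_usage, tenure)

    return observed_r, randomized_r


observed_r, randomized_r = simulate()
print("correlation in observed data:", round(observed_r, 2))
print("correlation after randomizing usage:", round(randomized_r, 2))
```

In this toy model the observed correlation is strong only because of the shared hidden factor. That is the kind of situation in which it is tempting to "find" a causality that is not there.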
These two concepts are important, and need to be applied correctly, because they help uncover relationships in a universe of data. However, care must be taken not to “find” causalities that do not exist in real life, as described in the first example.