Correlation vs Causation in Data Science

A short and sweet explanation using real-world examples.

Sundas Khalid
3 min read · Aug 2, 2020

Let’s jump into it right away.

Correlation

Correlation means that two variables are related, or associated: a movement in one variable is associated with a movement in the other. For example, ice-cream sales go up as the weather turns hot.

A positive correlation means the variables move in the same direction (left plot); a negative correlation means they move in opposite directions (middle plot). The far-right plot shows no correlation between the variables.
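To make the three plots concrete, here is a minimal sketch that computes the Pearson correlation coefficient r, one common way to measure correlation (the article does not name a specific measure). r runs from +1 (perfect positive) to -1 (perfect negative), and values near 0 mean no linear relationship. All numbers are synthetic: the temperatures, the ice-cream sales, and the hypothetical hot-drink series used as a stand-in for a negative relationship are generated, not real data.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)
# Synthetic daily temperatures in °C (made-up data for illustration).
temps = [random.uniform(15, 35) for _ in range(500)]

# Positive: hypothetical ice-cream sales rise with temperature, plus noise.
ice_cream = [10 * t + random.gauss(0, 20) for t in temps]
# Negative: hypothetical hot-drink sales fall as temperature rises.
hot_drinks = [400 - 8 * t + random.gauss(0, 20) for t in temps]
# None: a quantity generated independently of temperature.
unrelated = [random.gauss(0, 1) for _ in temps]

print(f"positive: {pearson(temps, ice_cream):+.2f}")
print(f"negative: {pearson(temps, hot_drinks):+.2f}")
print(f"none:     {pearson(temps, unrelated):+.2f}")
```

The first value comes out close to +1, the second close to -1, and the third close to 0, matching the left, middle, and right plots.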

Causation

Causation means that a change in one variable causes a change in another, so one variable depends on the other. This is also called cause and effect. For example, as the weather gets hot, people experience more sunburns. In this case, the hot weather is the cause and sunburn is the effect.

Correlation is not causation (photo by Anthony Figueroa)

Correlation vs Causation Difference

Let’s try another example with this visualization. When your computer runs out of battery, it shuts down, and the video player shuts down too. The computer shutting down and the video player shutting down are correlated events; the actual cause is running out…
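The battery example can be simulated to show how a common cause produces correlation. This is a minimal sketch under a hypothetical model: the battery running out is the only shared cause of both shutdowns, each device can also shut down from an unrelated glitch, and the 30% battery-out rate and 5% glitch rate are made-up parameters. The helper names `simulate` and `rate` are illustrative, not from any library. It checks three things: the two shutdowns are correlated overall, the correlation vanishes once the cause is held fixed, and forcing the computer off (an intervention) does not change how often the player shuts down.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def simulate(n, force_computer_off=False, seed=0):
    """Hypothetical model: a dead battery is the common cause of both shutdowns.

    The 30% battery-out rate and 5% glitch rate are made-up parameters.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        battery_out = rng.random() < 0.30
        # Each device shuts down if the battery runs out, or from an unrelated glitch.
        computer_off = battery_out or rng.random() < 0.05
        player_off = battery_out or rng.random() < 0.05
        if force_computer_off:
            computer_off = True  # intervention: we switch the computer off ourselves
        rows.append((battery_out, computer_off, player_off))
    return rows

def rate(flags):
    """Fraction of True values in a sequence of booleans."""
    return sum(flags) / len(flags)

observed = simulate(20_000)
comp = [c for _, c, _ in observed]
player = [p for _, _, p in observed]

# 1. Observationally, the two shutdown events are strongly correlated.
r_all = pearson(comp, player)

# 2. Holding the cause fixed (battery fine), the correlation disappears.
ok = [(c, p) for b, c, p in observed if not b]
r_battery_ok = pearson([c for c, _ in ok], [p for _, p in ok])

# 3. Seeing the computer off makes a player shutdown likely...
p_player_given_comp = rate([p for _, c, p in observed if c])
# ...but forcing the computer off leaves the player's shutdown rate unchanged.
forced = simulate(20_000, force_computer_off=True, seed=1)
p_player_forced = rate([p for _, _, p in forced])

print(f"corr(computer_off, player_off), all runs: {r_all:.2f}")
print(f"same correlation, battery fine only:      {r_battery_ok:.2f}")
print(f"P(player off | computer observed off):    {p_player_given_comp:.2f}")
print(f"P(player off | computer forced off):      {p_player_forced:.2f}")
```

In this made-up model, observing the computer off is strong evidence that the player is off too, because both point back to the battery. Switching the computer off by hand does not touch the battery, so the player's shutdown rate stays at its baseline: the correlation is real, but the computer shutdown is not the cause.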


I write about data science, diversity & lifestyle | currently at Google | more learning content at sundaskhalid.com