Correlation is a statistical measure that describes the extent to which two variables change together. It quantifies the degree to which the variables are related. When the values of one variable change systematically with the values of another variable, they are said to be correlated.
Key Points about Correlation
Direction of Correlation:
Positive Correlation: Both variables move in the same direction. As one variable increases, the other also increases. Conversely, as one decreases, the other also decreases.
Negative Correlation: The variables move in opposite directions. As one variable increases, the other decreases, and vice versa.
No Correlation: There is no systematic linear relationship between the variables. Knowing that one variable has increased tells you nothing about whether the other tends to increase or decrease.
Strength of Correlation:
Perfect Correlation: When all data points fall exactly on a straight line, the variables have a correlation of +1 (perfect positive correlation) or -1 (perfect negative correlation).
Strong Correlation: The variables have a correlation close to +1 or -1.
Weak Correlation: The correlation is close to 0.
Zero Correlation: There is no linear relationship between the variables (correlation is 0). A nonlinear relationship may still exist.
Correlation Coefficient:
The correlation coefficient (often denoted as r) is a numerical value that ranges from -1 to +1 and quantifies the degree of correlation between two variables.
+1 indicates a perfect positive correlation.
-1 indicates a perfect negative correlation.
0 indicates no linear correlation.
Types of Correlation Coefficients
Pearson Correlation Coefficient:
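Measures the strength and direction of the linear relationship between two continuous variables. The coefficient itself can be computed for any paired numeric data; the usual significance tests and confidence intervals for r additionally assume that the data are approximately bivariate normal. It is also sensitive to outliers.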
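$$r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the individual sample points, and $\bar{x}$ and $\bar{y}$ are the means of the x and y variables, respectively.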
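The Pearson formula can be translated term by term into code. The sketch below uses only the standard library. The function name pearson_r and the sample data are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed directly from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator: sum of products of the deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical sample data.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # points on a rising line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # points on a falling line
```

The guard for a constant variable matters because the denominator would otherwise be zero.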
Spearman's Rank Correlation Coefficient:
Measures the strength and direction of the monotonic association between two variables, using the ranks of the values rather than the values themselves. It does not assume a linear relationship or normally distributed data.
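$$r_s = 1 - \frac{6 \sum_{i} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the ranks of corresponding values, and $n$ is the number of observations. This shortcut formula is exact only when there are no tied ranks.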
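As a rough sketch, the rank-difference formula can be computed as follows. The helper names ranks and spearman_rs are hypothetical. The code assumes no tied values, because the shortcut formula is exact only in that case, and it raises an error otherwise.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(xs, ys):
    """Spearman's rank correlation via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this shortcut formula assumes no tied values")

    # d_i is the difference between the two ranks of observation i.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical data: y grows with x, but not along a straight line.
print(spearman_rs([10, 20, 30, 40, 50], [1, 4, 9, 16, 25]))
```

Because only the ordering matters, any strictly increasing relationship gives the same result as a straight line would.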
Kendall's Tau:
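Measures the strength of association between two variables by comparing the number of concordant and discordant pairs. A pair of observations is concordant if the observation with the larger x value also has the larger y value, and discordant if the orderings disagree.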
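The pair-counting idea can be sketched in a few lines. The text does not say which variant of Kendall's tau is meant, so this example assumes the simplest one, often called tau-a, which divides the difference between concordant and discordant counts by the total number of pairs. The function name kendall_tau_a is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")

    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the variables order this pair oppositely
        # direction == 0 means a tie; tau-a counts it as neither

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one adjacent pair is out of order.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

This version compares every pair, so it takes time proportional to n squared, which is fine for small samples.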
Example in Python Using Pearson Correlation
Here’s how you can calculate and visualize the Pearson correlation coefficient using Python's pandas and seaborn libraries:
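The example below is a minimal sketch rather than a definitive implementation. The column names hours_studied and exam_score and their values are made up for illustration, and plot_correlation is a hypothetical helper. The plotting libraries are imported inside the helper so that the calculation runs even when they are not installed.

```python
import pandas as pd

# Hypothetical illustrative data (not from any real study).
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score": [52, 55, 61, 64, 70, 73, 79, 85],
})

# Pairwise correlation matrix; Pearson is the default method in pandas.
corr_matrix = df.corr(method="pearson")

# Correlation between two specific columns.
r = df["hours_studied"].corr(df["exam_score"])

print(corr_matrix)
print(f"Pearson r = {r:.3f}")


def plot_correlation(frame, matrix):
    """Show a scatter plot with a fitted line next to a correlation heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    sns.regplot(data=frame, x="hours_studied", y="exam_score", ax=axes[0])
    sns.heatmap(matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=axes[1])
    plt.tight_layout()
    plt.show()
```

Calling plot_correlation(df, corr_matrix) displays the two plots. Fixing the heatmap's color scale to the range -1 to +1 keeps the colors comparable across different datasets.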
Understanding correlation is crucial for determining relationships between variables, which can help in predictive modeling, risk management, and decision-making processes. However, it's important to remember that correlation does not imply causation. Just because two variables are correlated does not mean that one causes the other to change.
You are given two integer variables, x and y. You have to swap the values stored in x and y.
Input: Two numbers x and y separated by a comma.
Output: Print 5 lines. The first two lines will have values of variables shown before swapping, and the last two lines will have values of variables shown after swapping. The third line will be blank.
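One possible way to approach the exercise is sketched below. The problem statement does not say how each value should be formatted, so this version assumes each line shows the bare value, with x first and then y. The function name swap_report is hypothetical.

```python
def swap_report(line):
    """Parse 'x,y', swap the two integers, and return the five output lines."""
    x_text, y_text = line.split(",")
    x, y = int(x_text), int(y_text)  # int() ignores surrounding whitespace

    before = [str(x), str(y)]
    x, y = y, x  # tuple unpacking swaps the values without a temporary variable
    after = [str(x), str(y)]

    # Two lines before the swap, a blank line, two lines after the swap.
    return before + [""] + after


print("\n".join(swap_report("3, 7")))
```

To read from standard input instead of a fixed string, pass input() to swap_report.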