Very well done for all your hard work so far!
The concepts we have covered are complex and difficult
Mastery takes time and practice
You have all made an excellent start!!!!
We will now begin putting these ideas into practice
Much less new information!
Applying the same ideas to different research questions/scenarios
Finally at the confluence of your stats knowledge and R skill
Let's get started!
This week: Correlation
Week 5: Chi-Square
Week 6: t-test
Week 7: The Linear Model
Week 8: The Linear Model
We will not start the lab report for a couple more weeks
We will talk about the lab report in the lectures and work on it in the practicals
After this lecture you will understand:
The concepts behind statistical correlation
How to interpret the values of the correlation coefficient r
How to read a correlation matrix
How to interpret and report significance tests of r
The relationship between correlation and causation
Everything from the past few weeks we will now put into action!
For each statistical analysis, we will have the same ingredients:
Data, from which we calculate...
A test statistic that represents the relationship of interest, which we compare to...
The distribution of that test statistic under the null hypothesis to get...
The probability p of getting a test statistic as large as the one we have (or larger) if the null hypothesis is true
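The four ingredients above can be sketched in R. This is a minimal illustration on simulated data, not the lecture's questionnaire data; the variable names (x, y, t_value, p_value) are made up for the example.

```r
# Simulated data (an assumption for illustration only)
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# 1. Data, from which we calculate...
# 2. ...a test statistic: Pearson's r
r <- cor(x, y)

# 3. Under the null hypothesis (true correlation = 0), r can be converted
#    to a t value that follows a t-distribution with n - 2 degrees of freedom
df <- n - 2
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# 4. Probability of a t value at least this far from 0, in either direction
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

# cor.test() carries out the same steps in one call
result <- cor.test(x, y)
c(r = r, t = t_value, p = p_value)
```

The values computed by hand should agree with result$estimate, result$statistic and result$p.value.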
We want to believe true things about the world, and disbelieve false things
Statistics is a system to help us make decisions about whether, and to what degree, we believe something is supported by evidence
Essential question: how do two variables change in relation to each other?
When one variable changes, does the other...
Change in a similar way?
Change in the opposite way?
Not change very much at all?
In other words: to what degree do two variables behave the same way?
Quantifies the degree and direction of a relationship
Typically used with two (or more) continuous variables
Today's example: Gender and Sexuality data from the questionnaire
People who gave high ratings for femininity tended to give low ratings for masculinity, and vice versa
We might like to know:
How strong is this relationship?
Should we believe that it's real (i.e. representative of people/first-year psychology students in general)?
We can quantify the strength and direction of the relationship between femininity and masculinity with Pearson's correlation coefficient r
Values range from -1 (perfect negative) through 0 (no relationship) to 1 (perfect positive)
Strength: the absolute value of r, between 0 and 1
Direction: whether the value of r is positive or negative
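To see how a single value of r carries both pieces of information, here is a small sketch. The helper describe_r() is hypothetical and written just for this example; it is not part of any package.

```r
# Hypothetical helper: split r into its strength and its direction
describe_r <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1, r >= -1, r <= 1)
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"
  list(strength = abs(r), direction = direction)
}

# A value of -.76 has a strength of .76 and a negative direction
describe_r(-0.76)
```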
gensex %>%
  select(Gender_fem_1, Gender_masc_1) %>%
  cor(method = "pearson")
##               Gender_fem_1 Gender_masc_1
## Gender_fem_1     1.0000000    -0.7563823
## Gender_masc_1   -0.7563823     1.0000000
So, our correlation coefficient r is -.76
POP QUIZ: How can we interpret this?
The negative sign (-) means as femininity increases, masculinity tends to decrease (and vice versa)
The absolute value of .76 is very strong - quite close to 1!
We now have our data, from which we calculated...
Our test statistic r (-.76)
We also know the distribution of r with different degrees of freedom
We can now ask how likely we are to get a value of r at least as far from 0 as -.76 if in fact femininity and masculinity have a true r of 0
i.e. the null hypothesis is in fact true
We will use the standard significance level of .05 in this case
##
##  Pearson's product-moment correlation
##
## data:  Gender_fem_1 and Gender_masc_1
## t = -20.128, df = 303, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8006746 -0.7038665
## sample estimates:
##        cor
## -0.7563823
We can report this as: "There was a significant negative correlation between femininity and masculinity, r(303) = -.76, p < .001."
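As a check on how the numbers in this output fit together, the sketch below starts from only the reported r and df and recovers t, p and the confidence interval. It assumes the interval is built with the Fisher z transformation, which is the usual approach for Pearson's r; the variable names are made up for the example.

```r
# Values copied from the output above
r  <- -0.7563823
df <- 303
n  <- df + 2   # for a correlation, df = n - 2

# t statistic implied by r and df
t_value <- r * sqrt(df) / sqrt(1 - r^2)

# Two-tailed p-value from the t-distribution
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

# 95% confidence interval via the Fisher z transformation (assumed method)
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_value, 3)   # compare with t in the output
round(ci, 4)        # compare with the 95 percent confidence interval
p_value < .001      # the threshold used when reporting
```

Because the interval does not include 0, it tells the same story as the p-value: a true correlation of 0 is not a plausible explanation for these data.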
Correlations are often presented in matrices
Each cell contains the correlation coefficient r for the variables in the corresponding row and column
POP QUIZ: Why is there a diagonal line of 1s?
##             comfortable  masc   fem stability
## comfortable        1.00 -0.31  0.17      0.61
## masc              -0.31  1.00 -0.76     -0.28
## fem                0.17 -0.76  1.00      0.18
## stability          0.61 -0.28  0.18      1.00
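A matrix like this can be built and inspected with a few lines of R. The sketch below uses mtcars, a dataset that ships with R, as a stand-in for the questionnaire data; the choice of columns is arbitrary.

```r
# mtcars ships with R; it stands in for the questionnaire data here
vars <- mtcars[, c("mpg", "hp", "wt")]
m <- cor(vars, method = "pearson")
round(m, 2)

# Each variable correlates perfectly with itself, hence the diagonal of 1s
diag(m)

# cor(x, y) equals cor(y, x), so the matrix is mirrored across the diagonal
isSymmetric(m)
```

Because of the mirroring, you only need to read the cells on one side of the diagonal.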
More useful version with GGally::ggscatmat()
Scatterplots, distributions, and r values
Our analysis showed that higher ratings of femininity tended to correspond to lower ratings of masculinity, and vice versa
Can we conclude from this that being more feminine causes you to be less masculine?
Why not? :(
No distinction between cause and effect
Which is the chicken and which is the egg?
Which came first: femininity or masculinity?
No experimental manipulation (randomisation)
The problem of tertium quid - a third variable that influences both the variables you're actually measuring
Consider the number of hours per day you and a friend on this course spend studying
Both tend to study less on similar days (e.g. the weekend)
Both tend to study more on similar days (e.g. right before an assessment is due)
So, your hours of studying and your friend's are likely to be highly correlated
Does this mean that you studying more (or less) causes your friend to study more (or less)?
Of course not! Which of you "causes" the other to study more/less?
Tertium quid: An unmeasured third factor that influences both of you
Some sources of variation:
Differences in experience or interest
Which electives you're each taking
Friends and family obligations
Part time work
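The study-hours example can be simulated to show a third variable at work. Everything here is made up for illustration: deadline_pressure is a hypothetical third variable, and neither person's hours are allowed to influence the other's.

```r
set.seed(42)
n_days <- 1000

# Hypothetical third variable: how much deadline pressure there is each day
deadline_pressure <- rnorm(n_days)

# Each person's hours depend on the pressure plus their own independent noise
your_hours   <- 4 + deadline_pressure + rnorm(n_days, sd = 0.5)
friend_hours <- 4 + deadline_pressure + rnorm(n_days, sd = 0.5)

# The two sets of hours are strongly correlated...
r_raw <- cor(your_hours, friend_hours)

# ...but once the part explained by the third variable is removed,
# what is left over is essentially unrelated
your_resid   <- resid(lm(your_hours ~ deadline_pressure))
friend_resid <- resid(lm(friend_hours ~ deadline_pressure))
r_adjusted <- cor(your_resid, friend_resid)

round(c(raw = r_raw, adjusted = r_adjusted), 2)
```

In real data the third variable is usually unmeasured, so it cannot be removed like this; the simulation only shows that a strong r can arise with no causal link between the two measured variables.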
In common language, "correlated" means "related to in some way, usually causally"
In statistics-ese, it means "the (standardised) degree to which two or more variables covary", i.e. change in relation to each other
"Correlation" is a technical term!
In your reports, do not say two things are "correlated" unless you report r as evidence!
Instead: variables "have a relationship"/"are related to each other"
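The "standardised" part of the statistical definition can be checked numerically. In this sketch on simulated data (all names are made up for the example), r is the covariance divided by the product of the two standard deviations, which is why it does not depend on the units of measurement.

```r
set.seed(7)
x <- rnorm(50, mean = 10, sd = 2)
y <- -0.8 * x + rnorm(50)

# Covariance depends on the units of x and y
cov(x, y)

# Dividing by both standard deviations standardises it to the -1 to 1 scale
r_by_hand <- cov(x, y) / (sd(x) * sd(y))

# Rescaling a variable changes the covariance but not the correlation
cov(x * 100, y)
cor(x * 100, y)

all.equal(r_by_hand, cor(x, y))
```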
The correlation coefficient r quantifies the strength and direction of relationships between variables
The p-value associated with r is the probability of encountering a value of r as large as the one we have, or larger, if in fact the true value of r in the population is 0
Correlation DOES NOT IMPLY CAUSATION!!!!!!!
More practice with interpreting r with this fun little game
Recognise people who have helped you or others by nominating them for a SavioR award
Give us feedback, ideas, or suggestions in the Suggestion Box
Don't try to go it alone!
Ask to study with practical teams, friends on the course
Set up Zoom calls to work on the tutorials together
Be the change you wish to see in the world 😄
Revise all new definitions/concepts (see previous slide)
Revise how to read the output of GGally::ggscatmat() and cor.test()
Do NOT need to memorise function names or syntax