Summaries of Data

Setting Up

Task 1

Open your project for this week in RStudio. Then, create a new R Markdown file with HTML output and save it in the r_docs folder. (Give it a sensible name, like worksheet_03 or similar!)

For each of the tasks in Analyses, write your code to complete the task in a new code chunk.

Remember, you can add new code chunks by:

- Using the RStudio toolbar: click Code > Insert Chunk.
- Using a keyboard shortcut: the default is Ctrl + Alt + I (Windows) or ⌘ Command + Option + I (macOS), but you can change this under Tools > Modify Keyboard Shortcuts…
- Typing it out: type ```{r}, press ↵ Enter, then type ``` again.
- Copying and pasting a code chunk you already have (but be careful of duplicated chunk names!)

To prepare for the take-away paper, make sure you knit this document when you’ve finished the tasks.

Task 2

Load the tidyverse package and read in the data in the setup code chunk.

library(tidyverse)
gensex <- readr::read_csv("https://and.netlify.app/datasets/gensex_2022.csv")

Task 3

Review the Codebook at the link below, which has all the information you need about this dataset.

View the Codebook

Analyses

You will need the output from all of the following tasks in order to complete the worksheet quiz. If you are having any trouble with this, or you aren’t sure how to understand what the output means, ask for help in your practical!

Task 4

Starting with the gensex data, do the following in a single pipeline.

- Keep only rows where the participant completed the questionnaire in 10 minutes or less.
  - Hint: use the Codebook to check how the duration variable is measured!
- Group by gender.
- Create a summary of the means of each of the four gender_ rating variables for each gender group, and the number of people in each group.

gensex %>% 
  dplyr::filter(duration <= 600) %>% 
  dplyr::group_by(gender) %>% 
  dplyr::summarise(
    mean_comf = mean(gender_comfort),
    mean_masc = mean(gender_masc),
    mean_fem = mean(gender_fem),
    mean_stable = mean(gender_stable, na.rm = TRUE),
    n = dplyr::n()
  )

# A tibble: 4 x 6
  gender mean_comf mean_masc mean_fem mean_stable     n
  <chr>      <dbl>     <dbl>    <dbl>       <dbl> <int>
1 Female      8.77      3.19     6.88        8.27   231
2 Male        8.70      6.23     3.96        8       47
3 Other       5.6       6.6      5.4         2.2      5
4 <NA>        8.67      3.67     6.33        7        3
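Only the gender_stable mean above is given na.rm = TRUE, presumably because that variable contains missing values. A minimal sketch of what the argument does, using a made-up vector (toy_ratings is hypothetical and not part of the gensex data):

```r
# Hypothetical ratings with one missing response (not from gensex)
toy_ratings <- c(8, NA, 6)

# By default, a single NA makes the whole mean NA
mean_with_na <- mean(toy_ratings)

# na.rm = TRUE drops missing values before averaging
mean_without_na <- mean(toy_ratings, na.rm = TRUE)

mean_with_na     # NA
mean_without_na  # 7
```

If a summary unexpectedly comes back as NA, a missing value in that variable is a likely cause.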

## A nicer (less typing) solution - optional!
## Run ?across or vignette("colwise") to learn more about this method
## Or ask in your practical or at a help desk

gensex %>% 
  dplyr::filter(duration <= 600) %>% 
  dplyr::group_by(gender) %>% 
  dplyr::summarise(
    dplyr::across(starts_with("gender_"), ~ mean(.x, na.rm = TRUE)),
    n = dplyr::n()
  )

# A tibble: 4 x 6
  gender gender_comfort gender_masc gender_fem gender_stable     n
  <chr>           <dbl>       <dbl>      <dbl>         <dbl> <int>
1 Female           8.77        3.19       6.88          8.27   231
2 Male             8.70        6.23       3.96          8       47
3 Other            5.6         6.6        5.4           2.2      5
4 <NA>             8.67        3.67       6.33          7        3

Task 5

Starting with the gensex data, do the following in a single pipeline.

- Group by gender.
- Create a summary of the mean, standard deviation, number of observations, and standard error of romantic_freq for each group.

Remember that you can find the formula for standard error in the lecture slides from last week, or indeed anywhere on the Internet. You may find the sqrt() function useful for this - if you can’t guess what it does, try bringing up its help documentation.

See the R Graphics Cookbook if you’re really stuck!
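The usual formula for the standard error of the mean is the standard deviation divided by the square root of the number of observations; this is assumed to match the one in the lecture slides. A minimal sketch on a made-up vector (toy_scores and the other names here are hypothetical, not from the gensex data):

```r
# Hypothetical scores (not from gensex)
toy_scores <- c(2, 4, 4, 4, 5, 5, 7, 9)

n_obs <- length(toy_scores)           # number of observations
sd_scores <- sd(toy_scores)           # sample standard deviation
se_scores <- sd_scores / sqrt(n_obs)  # standard error of the mean

se_scores
```

Note that length() counts missing values too, as does dplyr::n() inside summarise(). If a variable contained NAs, the count of non-missing values, sum(!is.na(x)), would be the appropriate denominator.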

gensex %>% 
  dplyr::group_by(gender) %>% 
  dplyr::summarise(
    mean_rom_freq = mean(romantic_freq),
    sd_rom_freq = sd(romantic_freq),
    n = dplyr::n(),
    se_rom_freq = sd_rom_freq/sqrt(n)
  )

# A tibble: 4 x 5
  gender mean_rom_freq sd_rom_freq     n se_rom_freq
  <chr>          <dbl>       <dbl> <int>       <dbl>
1 Female          5.95        2.09   250       0.132
2 Male            5.65        2.27    48       0.328
3 Other           5.8         1.64     5       0.735
4 <NA>            8           1        3       0.577

Task 6

Turn the summary you produced in the previous task into a nicely formatted HTML table, including the following elements:

- Human-readable column names (NOT variable names)
- Numbers rounded to two decimal places
- An informative caption

Remember to load the kableExtra package, and see skills lab 2 for more help!

gensex %>% 
  dplyr::group_by(gender) %>% 
  dplyr::summarise(
    mean_rom_freq = mean(romantic_freq),
    sd_rom_freq = sd(romantic_freq),
    n = dplyr::n(),
    se_rom_freq = sd_rom_freq/sqrt(n)
  ) %>% 
  kableExtra::kbl(
    col.names = c("Gender", "*M*", "*SD*", "*N*", "*SE*"),
    digits = 2,
    caption = "Descriptives for ratings of frequency of romantic attraction"
  ) %>% 
  kableExtra::kable_styling()

Table 1: Descriptives for ratings of frequency of romantic attraction
Gender	M	SD	N	SE
Female	5.95	2.09	250	0.13
Male	5.65	2.27	48	0.33
Other	5.80	1.64	5	0.73
NA	8.00	1.00	3	0.58

Knit!

Task 7

Knit your worksheet once you’ve finished. You should see all of your code and output in the HTML document that it produces. The HTML file will be saved in the same folder as the R Markdown file you knitted it from.

If you encounter a knitting error, ask for help in your practical!

Well done!

Make sure you have the RMarkdown or knitted HTML on hand when you take the worksheet quiz - you will need your answers to the above tasks.

Good luck!
