What is the difference between the average age of death due to breast cancer for women within each of the following races: WHITE, BLACK OR AFRICAN AMERICAN, ASIAN, and AMERICAN INDIAN OR ALASKA NATIVE?
Express this difference in years (or a fraction of a year), and sort the resulting tibble in ascending order by these average ages.
# Load the data and packages
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.2 v purrr 0.3.4
v tibble 3.0.1 v dplyr 1.0.0
v tidyr 1.1.0 v stringr 1.4.0
v readr 1.3.1 v forcats 0.5.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
load("/Users/Aki/Desktop/small_brca.Rdata")
small_brca %>%
  select(participant, gender, race, age_at_diagnosis, death_days_to) %>%
  filter(gender == "FEMALE" &
           # Keep only deceased patients (living patients have no days-to-death value)
           death_days_to != "[Not Applicable]" &
           # Drop patients whose race was not recorded
           race != "[Not Evaluated]" &
           race != "[Not Available]") %>%
  # Keep one row per patient
  distinct(participant, .keep_all = TRUE) %>%
  # Add a new column with the age at death in years
  # floor() rounds down to whole years; it is optional, since fractions of a year are allowed
  mutate(age_of_death = as.numeric(age_at_diagnosis) +
           floor(as.numeric(death_days_to) / 365)) %>%
  # Group by race and summarise with the mean
  group_by(race) %>%
  summarise(mean.age.of.death = mean(age_of_death), .groups = 'drop') %>%
  # Sort in ascending order by mean.age.of.death
  arrange(mean.age.of.death)
race <chr> | mean.age.of.death <dbl>
---|---
ASIAN | 56.00000
BLACK OR AFRICAN AMERICAN | 60.78947
WHITE | 65.11392
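The question asks for the differences between the group averages, while the table reports the averages themselves. A minimal sketch of one way to get the gaps, assuming the three means printed in the table are typed in by hand (the object names `mean_ages` and `age_gaps` and the column `diff_from_previous` are illustrative and not part of the original analysis):

```r
library(dplyr)

# Means copied from the summary table (hypothetical object name)
mean_ages <- tibble::tribble(
  ~race,                       ~mean.age.of.death,
  "ASIAN",                     56.00000,
  "BLACK OR AFRICAN AMERICAN", 60.78947,
  "WHITE",                     65.11392
)

age_gaps <- mean_ages %>%
  arrange(mean.age.of.death) %>%
  # Gap in years between each group and the group ranked just below it
  mutate(diff_from_previous = mean.age.of.death - lag(mean.age.of.death))

age_gaps
```

The first row has no preceding group, so its gap is `NA`. In the real pipeline the same `mutate()` step could presumably be appended after `arrange(mean.age.of.death)` instead of retyping the means.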
Notice that AMERICAN INDIAN OR ALASKA NATIVE is missing from the result. There was only one patient in that category, and she is alive (death_days_to == "[Not Applicable]"), so the filter removed her.
( which(small_brca$race == "AMERICAN INDIAN OR ALASKA NATIVE") )
[1] 660
c(small_brca[660, ]$vital_status, small_brca[660, ]$death_days_to)
[1] "Alive" "[Not Applicable]"
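The same check can be run for every category at once instead of looking up one row index. A minimal sketch on a small made-up tibble (the `toy_brca` data and the `n_patients` / `n_deceased` column names are hypothetical; only the `race` and `death_days_to` columns and the "[Not Applicable]" marker come from the analysis above):

```r
library(dplyr)

# Toy stand-in for small_brca; the values are invented for illustration only
toy_brca <- tibble::tibble(
  participant   = c("P1", "P2", "P3", "P4"),
  race          = c("WHITE", "WHITE", "ASIAN",
                    "AMERICAN INDIAN OR ALASKA NATIVE"),
  death_days_to = c("1200", "[Not Applicable]", "800", "[Not Applicable]")
)

race_counts <- toy_brca %>%
  group_by(race) %>%
  summarise(
    n_patients = n(),
    # Rows with a recorded death, i.e. rows that survive the filter
    n_deceased = sum(death_days_to != "[Not Applicable]"),
    .groups = "drop"
  )

# Categories that would vanish from the final summary
dropped_races <- race_counts %>%
  filter(n_deceased == 0)

dropped_races
```

Any race listed in `dropped_races` has patients in the data but none with a recorded death, which is why it would not appear after the filter.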