What is the difference between the average ages at death from breast cancer for women in each of the following races: WHITE, BLACK OR AFRICAN AMERICAN, ASIAN, and AMERICAN INDIAN OR ALASKA NATIVE?
Express this difference in years (or fractions of a year), and sort the resulting tibble in ascending order of these average ages.
# Load the data and packages
library(tidyverse)
load("/Users/Aki/Desktop/small_brca.Rdata")
small_brca %>%
  select(participant, gender, race, age_at_diagnosis, death_days_to) %>%
  filter(gender == "FEMALE" &
           # Keep only patients who have died
           death_days_to != "[Not Applicable]" &
           # Drop rows whose race was not recorded
           race != "[Not Evaluated]" &
           race != "[Not Available]") %>%
  # Keep one row per patient
  distinct(participant, .keep_all = TRUE) %>%
  # Age at death = age at diagnosis + years survived after diagnosis.
  # floor() truncates the years survived to whole years; drop it to keep
  # the fraction of a year, which the question allows
  mutate(age_of_death = as.numeric(age_at_diagnosis) +
           floor(as.numeric(death_days_to) / 365)) %>%
  # Average age at death for each race
  group_by(race) %>%
  summarise(mean.age.of.death = mean(age_of_death), .groups = "drop") %>%
  # Sort in ascending order of mean.age.of.death
  arrange(mean.age.of.death)
Note that AMERICAN INDIAN OR ALASKA NATIVE no longer appears in the result. The data contain only one patient in that category, and she is alive (death_days_to == "[Not Applicable]"), so the filter for deceased patients removed her. Her row can be looked up directly:
which(small_brca$race == "AMERICAN INDIAN OR ALASKA NATIVE")
[1] 660
c(small_brca[660, ]$vital_status, small_brca[660, ]$death_days_to)
[1] "Alive" "[Not Applicable]"
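The question asks for the differences between the per-race averages, but the pipeline stops at the averages themselves. One way to express those gaps is to add a lag-based difference column after arrange(). The sketch below runs on a small hypothetical tibble, `toy_brca`, whose values are entirely made up and do not come from small_brca. The column name `diff.from.previous` is also illustrative. It keeps the fraction of a year instead of flooring it, and calls `dplyr::lag()` explicitly because dplyr's `lag()` masks `stats::lag()`.

```r
library(dplyr)

# Hypothetical toy data standing in for small_brca; every value is made up
toy_brca <- tibble(
  participant      = c("p1", "p2", "p3", "p4", "p5"),
  gender           = "FEMALE",
  race             = c("WHITE", "WHITE", "BLACK OR AFRICAN AMERICAN",
                       "ASIAN", "AMERICAN INDIAN OR ALASKA NATIVE"),
  age_at_diagnosis = c("50", "60", "45", "70", "55"),
  death_days_to    = c("730", "365", "1825", "730", "[Not Applicable]")
)

race_means <- toy_brca %>%
  # Keep deceased women only, as in the pipeline above
  filter(gender == "FEMALE", death_days_to != "[Not Applicable]") %>%
  distinct(participant, .keep_all = TRUE) %>%
  # Keep the fraction of a year instead of flooring it
  mutate(age_of_death = as.numeric(age_at_diagnosis) +
           as.numeric(death_days_to) / 365) %>%
  group_by(race) %>%
  summarise(mean.age.of.death = mean(age_of_death), .groups = "drop") %>%
  arrange(mean.age.of.death) %>%
  # Gap in years between each race and the next-lowest average;
  # the first row has no predecessor, so its gap is NA
  mutate(diff.from.previous = mean.age.of.death - dplyr::lag(mean.age.of.death))

race_means

# Spread between the highest and lowest average age at death, in years
diff(range(race_means$mean.age.of.death))
```

With these made-up values, the tibble lists BLACK OR AFRICAN AMERICAN (50), WHITE (56.5) and ASIAN (72), with gaps of NA, 6.5 and 15.5 years. The overall spread is 22 years. AMERICAN INDIAN OR ALASKA NATIVE drops out for the same reason as in the real data: its only patient is alive.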