Assignment 1

Please note that due dates can be found in the Syllabus; submission instructions can be found on the Assignment Instructions page. For this assignment, submit your written answers as a Google Doc (or a document from another text editor, with pictures, etc.) together with your R code via Google Drive. Aki will go over the submission process in the lab.

50 total marks.

Question 1 [7 points] You obtained a Gmail account for this course. With this account, you can also initiate your Google Drive workspace. (i) Does Google have the right to look at your emails, documents, drawings or other objects? (ii) Does Google have the right to use this information? What is metadata? Can Google collect metadata about your information? (iii) How can you turn on or off the geographical tracking mechanism associated with your account? (iv) Are you satisfied with the level of security and the user agreement with Google? (Answer in ~10 sentences please.)

Question 2 [3 points] Does cloud computing reduce the overall cost for computers and computation for the “average” person? If so, how? Make reference to the definitions and components of cloud and traditional computing from the lecture notes. 4 or 5 sentences.

Question 3 [3 points] Give an example of how concepts from computation have drifted into the life sciences. 3 or 4 sentences.

Question 4 [3 points] Argue for and against each of the following items as a computing device (make references to the 3 fundamental properties of modern computers).

  1. a compass
  2. a car
  3. an assembly line to make cars at the Ford Inc. factory

Question 5 [4 points] What privacy issues might arise in -omic studies? Are the issues of privacy more or less acute in a transcriptome-based study (as we saw in class) than in, say, a genomic DNA-based study (sequencing the genome of individuals)? Remember that both studies are based on sequencing the RNA or DNA of samples.

Question 6 [4 points] Create an R script under File/New File. Write R code to load the tidyverse library and the small_brca dataset. Note that in the course slides, I load the dataset from a directory specific to my computer. However, if you look at the R code in the src directory on RStudio Cloud (Project 03), you will find the correct path for you.

Make a comment that this is Question 6, Assignment 1 before your code. Find the function in R that reports the date and the version of R that you are using. Put the code in your file.

Save your R code in your src directory of the project and name the file lastname_assignment1.R. Take a screenshot with your file open (top left), the Environment list showing (top right), the code executed in your R session (bottom left), and the contents of the src folder (bottom right). Congratulations, you are now an R programmer.

For Questions 7-10 below, put a comment in your file that states which question you are working on and put your code below it. Put any pictures (e.g. the plot that your code generates) and text into a text document (e.g. using Google Docs), stating which question they belong to.

Question 7 [5 points]

Recall from the lecture that HER2 is an important protein in some subtypes of breast cancer, and remember that ERBB2 is the official name for HER2.

The variable ERBB2 in our tibble corresponds to estimates of the number of transcripts present in each sample (row). These estimates are obtained using RNA-seq technologies, as discussed in the lecture. Clinically, HER2 is not measured using transcriptomics. Typically, the copy number of HER2 is measured at the DNA level. This is because we believe that HER2 over-expression at the transcript and protein levels is due to an amplification of the genomic region that contains HER2. In the clinic, Fluorescence In Situ Hybridization (FISH) is used.

The variable her2_fish_status gives exactly this, although it is not available for many observations (rows/patient samples).

Using ggplot, make the following scatter plot. In your file from Question 6, add a comment that this is Question 7 and put your R code below it.

Comment on or interpret the graph in 1 or 2 sentences: Does it make sense? Is it what you expected? Are there issues? etc.

Question 8 [5 points]

It is a little bit hard to see the status of tumor and her2_fish_status in the figure of Question 7 because so many points are bunched up around the origin of the graph. Using online resources, find out how to draw a pie chart for each of these variables. Does this add any insight to the plot from Question 7?

Question 9 [5 points] Similar to Question 7, the variable ESR1 corresponds to transcript levels of the estrogen receptor. In the clinic, the estrogen receptor protein is measured by immunohistochemistry (IHC).

The variable er_status_by_ihc has this value. Create a scatterplot of ESR1 transcript versus ERBB2 transcript, but use color, shape, size or other options to also display both er_status_by_ihc and her2_fish_status. Interpret this graph in 1 or 2 sentences.
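As a reminder of how several variables can share one scatterplot, here is a minimal sketch on the built-in mtcars dataset (a stand-in only; it is not the course data, and the choice of colour and shape is just one assumption among many possible mappings). Two numeric variables go on the axes and two categorical variables are mapped to colour and shape.

```r
library(ggplot2)

# mtcars is a built-in dataset, used here purely as a stand-in
cars <- mtcars
cars$cyl <- factor(cars$cyl)   # categorical, mapped to colour
cars$am  <- factor(cars$am)    # categorical, mapped to shape

# x and y are numeric; colour and shape carry the two categorical variables
p <- ggplot(cars, aes(x = wt, y = mpg, colour = cyl, shape = am)) +
  geom_point(size = 3)

print(p)
```

The same pattern should carry over to other tibbles: each extra variable gets its own aesthetic inside aes().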

Question 10 [5 points] Create a barplot (like Slide 28 of L03) that explores the relationship between ajcc_pathologic_tumor_stage and er_status_by_ihc. Comment (interpret) in 1 or 2 sentences on the graph.
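For orientation, a barplot relating two categorical variables might be sketched as follows. This uses the mpg dataset that ships with ggplot2, not the course data, and the dodged layout is an assumption that may differ from the styling on the slide.

```r
library(ggplot2)

# mpg ships with ggplot2; class and drv are both categorical variables
# One variable defines the bars, the other defines the fill colour
p <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")   # side-by-side bars of counts

print(p)
```

Replacing position = "dodge" with position = "fill" would show proportions within each bar instead of raw counts.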

Question 11 [2 points] Suppose I create five logical variables; the first three represent each of my children and the last two represent my wife and me. The kids' variables are set to TRUE if and only if they had vegetables at lunch. The variables for my wife and me are set to TRUE if and only if we are exhausted. So in the example below, my wife is exhausted and I’m ok; two of our kids had vegetables.

c1 <- TRUE; c2 <- FALSE; c3 <- TRUE; wife <- TRUE; me <- FALSE

I would like you to write some R code to evaluate whether we will order pizza or not. The condition is that at least one of my children ate vegetables at lunch.

To give you an example, suppose I don’t care if my kids had vegetables or not; we order pizza if both my wife and I are exhausted. My logical expression would be:

(wife & me)
## [1] FALSE
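As a quick reminder of how the three logical operators behave, here is a minimal sketch on two made-up variables (raining and umbrella are hypothetical and unrelated to the five variables above).

```r
# Hypothetical variables, used only to illustrate the operators
raining  <- TRUE
umbrella <- FALSE

both     <- raining & umbrella   # AND: TRUE only if both are TRUE
either   <- raining | umbrella   # OR: TRUE if at least one is TRUE
not_rain <- !raining             # NOT: flips TRUE to FALSE and vice versa

print(c(both = both, either = either, not_rain = not_rain))
```

Parentheses can be used to group sub-expressions when several operators are combined.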

Question 12 [4 points] Building on Question 11, write a logical expression using the AND (&), OR (|) and NOT (!) operators on these five variables. Here the logical expression should be TRUE only when all of my children had vegetables at lunch and both my wife and I are exhausted. Otherwise it is FALSE and we make salad.

Good luck!