Assignment 4

Please note that due dates can be found in the Syllabus; submission instructions can be found on the Assignment Instructions page. For this assignment, you may submit your answers as a Google Doc (or a file from another text editor, pictures, etc.), along with your \({\tt R}\) code, via Google Drive. Aki will go over the submission process in the lab.

You may want to use R Markdown to write your answers, but it is not mandatory.

\({\bf 50}\) total marks.

Question 1 [points 10] Using the S. cerevisiae (Baker's yeast) data that we imported into R in Lectures 11 and 12, show the R code you would use to estimate the frequencies of the A, C, G, and T nucleotides in coding regions only. Use only chromosome 1.
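One way to frame this estimate (the notation is illustrative, not taken from the lectures): if \(n_{\mathrm{cod}}(x)\) is the number of positions on chromosome 1 that lie inside annotated coding regions and carry nucleotide \(x\), the maximum-likelihood estimate of each frequency is the relative count

\[
\hat{e}_{\mathrm{cod}}(x) = \frac{n_{\mathrm{cod}}(x)}{\sum_{y \in \{\mathrm{A},\mathrm{C},\mathrm{G},\mathrm{T}\}} n_{\mathrm{cod}}(y)}, \qquad x \in \{\mathrm{A},\mathrm{C},\mathrm{G},\mathrm{T}\}.
\]

The four values sum to one, so they could serve directly as the emission probabilities of a "coding" state.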

Question 2 [points 10] Using the S. cerevisiae (Baker's yeast) data that we imported into R in Lectures 11 and 12, show the R code you would use to estimate the frequencies of the A, C, G, and T nucleotides in non-coding regions only. Use only chromosome 1. Also estimate the self-transition probabilities (coding to coding, non-coding to non-coding) and the transition probabilities between the two region types (coding to non-coding, non-coding to coding).
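A possible set-up, with illustrative notation: label every position \(t = 1, \dots, L\) of chromosome 1 as coding (cod) or non-coding (nc), giving a state sequence \(s_1, \dots, s_L\), and let \(m_{ij}\) count the adjacent pairs with \((s_t, s_{t+1}) = (i, j)\). The maximum-likelihood estimate of each transition probability is then the row-normalised count

\[
\hat{a}_{ij} = \frac{m_{ij}}{m_{i,\mathrm{cod}} + m_{i,\mathrm{nc}}}, \qquad i, j \in \{\mathrm{cod}, \mathrm{nc}\},
\]

so each row sums to one; for example, \(\hat{a}_{\mathrm{cod},\mathrm{nc}} = 1 - \hat{a}_{\mathrm{cod},\mathrm{cod}}\). The non-coding nucleotide frequencies are estimated exactly as in Question 1, but over the non-coding positions.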

Question 3 [points 20] Using the \({\tt HMM}\) package in R (available in \({\tt A04-Assignment/src/A04-Assginment.Rmd}\)), implement your two-state (coding / non-coding) model with the probabilities you estimated in Questions 1 and 2. The documentation for this package is here. You may find the \({\tt dishonestCasino()}\) function a useful example; the \({\tt viterbi}\) function and its example may also help. Show your code. Apply the model back to chromosome 1 (the chromosome used for estimation), and then to chromosome 2.
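For reference, a Viterbi decoder finds the most probable state path given the observed nucleotides \(x_1, \dots, x_L\). Below is a sketch of the standard recurrence, written in log space to avoid numerical underflow on a sequence as long as a chromosome. Here \(\pi_k\), \(a_{jk}\), and \(e_k(x)\) are the start, transition, and emission probabilities, corresponding to the \({\tt startProbs}\), \({\tt transProbs}\), and \({\tt emissionProbs}\) arguments of \({\tt initHMM()}\). This describes the general algorithm, not the package's internals.

\[
\begin{aligned}
v_1(k) &= \log \pi_k + \log e_k(x_1),\\
v_t(k) &= \log e_k(x_t) + \max_{j} \bigl[ v_{t-1}(j) + \log a_{jk} \bigr], \qquad t = 2, \dots, L,\\
\hat{s}_L &= \arg\max_{k} v_L(k), \qquad \hat{s}_{t-1} = \arg\max_{j} \bigl[ v_{t-1}(j) + \log a_{j \hat{s}_t} \bigr], \qquad t = L, \dots, 2,
\end{aligned}
\]

where \(j, k \in \{\mathrm{cod}, \mathrm{nc}\}\). The decoded path \(\hat{s}_1, \dots, \hat{s}_L\) gives a coding / non-coding label for every position, which can then be compared with the annotation.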

Question 4 [points 10] Compute the specificity, sensitivity, and accuracy of your predictions for each chromosome separately. Comment on your findings.
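A minimal sketch of the definitions. It assumes two things the assignment does not state: predictions are compared with the annotation nucleotide by nucleotide, and coding is treated as the positive class. Let \(TP\) and \(FN\) be the coding positions predicted as coding and as non-coding, and let \(TN\) and \(FP\) be the non-coding positions predicted as non-coding and as coding. Then

\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}, \qquad
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.
\]

If one class makes up most of the positions, accuracy is dominated by that class, so it is worth reading it alongside sensitivity and specificity.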

Good luck!