Question 0 – Hello World!

We’ve included a little “Hello World!” example. There will be an accompanying video on Moodle (the link will be posted on the forum as an announcement) which you can follow to learn how to use this Jupyter Notebook.

Even if you’ve used Jupyter before, it’s highly recommended that you watch the video and go through this introductory exercise, because this assignment uses auto-marking and requires you to follow a certain (straightforward) convention.

Question 0.a – Saying Hello in Markdown

For this assignment, you will need to answer the questions in the “cells” below each question. Some questions will require written work (which you can either do on paper and scan, or do in the cell using Markdown), and some will require R code (which you must write in the provided cells).

Watch through the video to see how this should be done.

Question 0.b – Saying Hello in R

You will need to write some R code in this assignment. In this section, save a variable called “hello.world” (don’t include the quotes in the variable name) and set it to the value “Hello World!”. Then run the cell below the one you wrote your code in to verify that your answers have been registered and given the correct variable names.
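As a minimal sketch of the step described above (the video remains the guide for the exact cell layout the auto-marker expects), the assignment in R could look like this:

```r
# Store the greeting under the variable name requested in this section
hello.world <- "Hello World!"

# Print the value to confirm it was saved
print(hello.world)
```

Note that the dot in `hello.world` is simply part of the variable name in R; it is not an accessor as in some other languages.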

Like before, follow the video tutorial to see how this is done.

Question 1 – Probabilities

Suppose we are playing a simple collectable card game (e.g. Hearthstone or Magic: The Gathering). In this game, each player has a card deck which contains 30 cards (with no duplicate cards). At the start of this game, both players shuffle their decks. Then the player going first draws five cards, and the player going second draws six cards. After this, the game starts, and players alternate turns.

Each player draws an additional card at the start of their turn. So, for example, after their third turn player one should have drawn eight cards in total (the five cards they started with, plus another three cards over three turns). Player two should have drawn nine cards in total after their third turn.
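The card counts above follow a simple rule, sketched here in R. The helper name `cards_drawn` and its argument names are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: total cards drawn after a given number of completed turns
cards_drawn <- function(starting_hand, turns) {
  starting_hand + turns  # one extra card at the start of each turn
}

player_one <- cards_drawn(starting_hand = 5, turns = 3)  # 8
player_two <- cards_drawn(starting_hand = 6, turns = 3)  # 9
```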

For the following questions, suppose there is a special combination of five cards, and if you have those five cards in your hand you instantly win the game.

A bit of help

As a little hint for some of the questions below, you’re reminded that if you have some product $n \times (n-1) \times (n-2) \times \dots \times (n-k)$, we can express this as $n!/(n-k-1)!$. That is,

$$n(n-1)(n-2)(n-3)\dots(n-k) = \frac{n!}{(n-k-1)!}$$

This is because we can think of the product $n \times (n-1) \times (n-2) \times \dots \times (n-k)$ as $n!$, which is $n \times (n-1) \times (n-2) \times \dots \times 2 \times 1$, but with the last $(n-k-1)$ terms removed. Because this is a multiplication of terms, we can think of removing terms as the same as dividing by them, meaning that

$$n \times (n-1) \times (n-2) \times \dots \times (n-k) = \frac{n \times (n-1) \times (n-2) \times \dots \times 2 \times 1}{(n-k-1)(n-k-2)\dots(2)(1)} = \frac{n!}{(n-k-1)!}$$
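The identity in this hint can be checked numerically. The sketch below uses illustrative values ($n = 10$, $k = 2$) that are not taken from the questions, and compares the direct product with the factorial ratio.

```r
# Illustrative values only: n = 10, k = 2
n <- 10
k <- 2

# Left-hand side: n * (n - 1) * ... * (n - k)
direct_product <- prod(n:(n - k))

# Right-hand side: n! / (n - k - 1)!
factorial_ratio <- factorial(n) / factorial(n - k - 1)

# all.equal() compares with a small tolerance, since factorial() returns doubles
isTRUE(all.equal(direct_product, factorial_ratio))
```

For large $n$ the factorials themselves become very large, so working with the shorter product (or with `lfactorial`, which returns log-factorials) is usually the more robust choice.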

Question 1.a

What is the probability that the first player will draw this combination on their first turn and win the game immediately? What about the second player?

Question 1.b

What is the probability that the five cards required for victory are all at the bottom of a player’s deck (i.e. they are the last five cards in their deck)?

Question 1.c

Suppose a player has drawn 15 cards from their deck. What is the probability that all of the cards in the winning combination are still in their deck?

Question 1.d

Suppose a player has drawn $n$ cards from their deck, where $n$ is between 0 and 30. What is the probability that all of the cards in the winning combination are still in their deck (i.e. that they have not drawn any piece of the winning combination yet) in terms of $n$?

Question 2 – PDFs and Expectations

Suppose we have defined a probability density function for a random variable $X$ as follows:

$$p(x) = \begin{cases} c x^2 & 0 \le x \le \alpha \\ 0 & \text{otherwise} \end{cases}$$

Notice that our PDF has two constants, $\alpha$ and $c$. $\alpha$ is a parameter, and $c$ is a coefficient which we will carefully choose so the integral of $c x^2$ between $0$ and $\alpha$ (with respect to $x$) is equal to $1$.

Question 2.a

Suppose $\alpha = 1$. Find the value of $c$ which would cause the integral of $p(x)$ from $0$ to $\alpha$ with respect to $x$ to be equal to $1$. That is, find $c$ such that

$$\int_0^1 c x^2 \, dx = 1$$

Question 2.b

Find the value of $c$ for a general value of $\alpha$. That is, find $c$ such that

$$\int_0^\alpha c x^2 \, dx = 1$$

(you can do this in a way similar to how you answered question 2.a).

Question 2.c

Suppose $c = 3$ and $\alpha = 1$. Find $E(X)$, the expected value of our variable $X$.

Question 2.d

Suppose $c = 3$ and $\alpha = 1$. Find $\mathrm{Var}(X)$, the variance of our variable $X$.
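A numerical check can complement the written working for this question. The sketch below assumes the density has the form $c x^2$ on $[0, \alpha]$ with $c = 3$ and $\alpha = 1$; the names `c_value`, `alpha` and `density_fn` are hypothetical. It uses R's built-in `integrate` function to confirm that the density integrates to 1.

```r
# Assumed constants for the density c * x^2 on [0, alpha]
c_value <- 3
alpha <- 1

# The density on its support
density_fn <- function(x) c_value * x^2

# Numerical integral of the density over [0, alpha]; should be close to 1
total_probability <- integrate(density_fn, lower = 0, upper = alpha)$value
```

The same pattern, with a different integrand, can be used to sanity-check a hand-computed expectation or variance.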

Question 3 – Distributions

Suppose we are given the following information:

You are modelling the number of people visiting a particular doctor’s office within a day, with the hope of identifying a disease outbreak in the local area of the doctor.

It is known that, on an average day, 30 patients will see this doctor, with a standard deviation of 3 patients per day.

Question 3.a

Describe a model you might use to model the number of patients on a given day (there might be more than one choice, so pick one and justify it). Also give the parameters of this model based on the given information.

Question 3.b

On one particular day, 45 patients visit the doctor. Considering the model you developed in your answer to the previous question, do you think that this number of patients in a given day is cause for alarm? Use calculations to back up your answer by determining the probability of seeing 45 or more patients in a given day.
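One way to carry out a tail calculation of this kind in R is sketched below. It assumes, purely for illustration, that a normal distribution with mean 30 and standard deviation 3 was the model chosen in Question 3.a; a different model would need its own distribution function. The sketch also ignores any continuity correction for count data.

```r
mean_patients <- 30
sd_patients <- 3
observed <- 45

# Number of standard deviations the observation lies above the mean
z_score <- (observed - mean_patients) / sd_patients

# Upper-tail probability P(X >= 45) under the assumed normal model
tail_prob <- pnorm(observed, mean = mean_patients, sd = sd_patients,
                   lower.tail = FALSE)
```

Using `lower.tail = FALSE` avoids computing `1 - pnorm(...)`, which can lose precision when the tail probability is very small.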

Question 4 – Maximum Likelihood Estimation of Parameters

Suppose we are developing a new plant treatment which will (hopefully) improve crop yields. We have a dataset which contains weights for two candidate treatments, as well as a control group (which receives neither of the candidate treatments).

Question 4.a

Suppose we want to create models for the weight of each group. You think a normal distribution would be suitable for this purpose, but a colleague has suggested that you should use a binomial distribution instead. Someone else proposed using a uniform distribution.

For each of the binomial and uniform distributions, explain whether it would be a good choice (justifying your answer).

Also justify why using the normal distribution is a good choice here.

Question 4.b

Suppose, rather than modelling the weights directly, we instead want to model the probability that a plant will grow to weigh over 6 units of weight, for each of the three treatments we are testing (treatments 1 and 2, and the control). Suggest a model that would be suitable for this purpose, and justify your choice.

Question 4.c

After considering our answers to questions 4.a and 4.b, we have decided to model the weights directly (i.e. we will use the model discussed in question 4.a, not 4.b). To do this, we will create three models: one for each of the three groups. We will use normal distributions to model each of the three groups, and then compare the estimated means of each group.

We now have to decide how we will calculate our estimates of the mean ($\mu$) and standard deviation ($\sigma$) of each of our datasets. One approach is to use the maximum likelihood method, where we wish to maximize the likelihood of the data given the parameters $\mu$ and $\sigma$ (that is, we wish to find the values of $\mu$ and $\sigma$ which cause $P(x \mid \mu, \sigma)$ to be maximized). Note that maximizing something is the same as maximizing the log of that thing, because log (for any base greater than 1) is “monotonically increasing”: that is, if $a > b$, then $\log(a) > \log(b)$. We’re actually going to maximize the log-likelihood below.
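To make the idea concrete, here is a hedged sketch of maximum likelihood estimation for a normal model in R. The `weights` vector is made-up illustrative data, not the assignment dataset, and the names `neg_log_lik`, `mu_hat` and `sigma_hat` are hypothetical. Since `optim` minimizes by default, the sketch minimizes the negative log-likelihood, which is equivalent to maximizing the log-likelihood.

```r
# Hypothetical weights, made up for illustration only (not the assignment data)
weights <- c(4.2, 5.1, 4.8, 5.6, 4.9, 5.3)

# Negative log-likelihood of a normal model; sigma is optimised on the log
# scale so that it always stays positive
neg_log_lik <- function(params, x) {
  mu <- params[1]
  sigma <- exp(params[2])
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Numerical search for the parameters, starting from mu = 5 and sigma = 1
fit <- optim(par = c(5, 0), fn = neg_log_lik, x = weights)
mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])

# Closed-form maximum likelihood estimates for comparison
mu_closed <- mean(weights)
sigma_closed <- sqrt(mean((weights - mu_closed)^2))
```

The closed-form standard deviation divides by the number of observations, whereas R's `sd` divides by one less than that, so `sd(weights)` will be slightly larger than the maximum likelihood estimate.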

A colleague of yours seems to think that maximizing the log-likelihood is the same as minimizing the mean absolute error. Another colleague disagrees, saying that the first is misremembering and that maximizing the likelihood is the same as minimizing the mean squared error. Yet another colleague seems to believe that we minimize the negative log-likelihood by minimizing the log-cosh loss (since they both have the word “log” in them; you are not convinced by this argument).

Question 4.d

One of your colleagues in Question 4.c is correct; which one does it appear to be based on our calculations in the previous question? Prove this colleague correct using algebra (you only have to prove them correct; you don’t have to disprove the other two).

Question 4.e

Given your maximum likelihood estimates for the mean of each population (and keeping in mind that we have a very small number of samples for each group), which treatment appears to work best?

Question 5 – Central Limit Theorem

Suppose our company is trialling a new production method for phone cases, based on 3D printing. 3D printing can be a volatile process, and the company has decided to accept the fact that there will be a certain proportion of failures out of the total number of 3D prints.

However, before committing to the new process, management would like to estimate the probability of failure by printing a number of phone cases. They have asked you how many cases they should print to ensure they have a reasonably good idea of the probability of failure.

The engineers developing the new production method assure management that the probability of failure is somewhere between 1% and 20%, but they are unwilling to make any guarantees beyond this without testing the method first.

Question 5.a

We will model this problem with a binomial distribution. Justify why the binomial distribution is a good choice for this problem.

Question 5.b

Suppose that we are considering three potential failure probabilities: 0.01, 0.05 and 0.2.

We are also considering three potential sizes for our test production run (i.e. the number of phone cases we will print in our test run): 50, 200 and 800.

For each combination of failure probability and number of cases printed, calculate the limiting distribution for the sample mean. You should calculate 9 limiting distributions in total.

For this question, do this using written calculations (i.e. not using R) and with the Central Limit Theorem.
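As a reminder of the result being applied here, the Central Limit Theorem approximation for a sample proportion is stated below. It assumes each print is an independent trial with the same failure probability $p$. The symbols $X_i$ (1 if print $i$ fails, 0 otherwise) and $\hat{p}$ (the proportion of failures in $n$ prints) are notation introduced for this illustration.

```latex
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i,
\qquad
\hat{p} \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(p,\ \frac{p(1-p)}{n}\right)
```

The variance shrinks in proportion to $1/n$, so the standard deviation of the estimate shrinks in proportion to $1/\sqrt{n}$.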

Question 5.c

Verify the results you obtained by hand in the previous question using R code.

Question 5.d

For each of the sample sizes and potential failure probabilities listed above, we now know the theoretical distribution by the Central Limit Theorem (we calculated this in Questions 5.b and 5.c). However, management is still not convinced and has asked us to develop a simulation which will experimentally demonstrate our calculations were correct.

R has a built-in function called rbinom, which takes three arguments (the number of simulations you want to run, the number of trials per simulation, and the probability of success for each trial). Hint: you are allowed to use the rbinom function, although you don’t have to.
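A minimal sketch of how `rbinom` could be used for one combination is shown below, taking a failure probability of 0.05, a test run of 50 cases and 50,000 simulations. The variable names and the seed are arbitrary choices for this illustration, and a "success" in `rbinom` terms is treated here as a failed print.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_sims <- 50000   # number of simulated test runs
n_cases <- 50     # phone cases printed per test run
p_fail <- 0.05    # assumed true failure probability

# Each element is the number of failed prints in one simulated run
failures <- rbinom(n_sims, size = n_cases, prob = p_fail)

# Estimated failure probability in each run
p_hat <- failures / n_cases

# Compare the simulation with the Central Limit Theorem approximation
c(simulated_mean = mean(p_hat), theoretical_mean = p_fail)
c(simulated_sd = sd(p_hat),
  theoretical_sd = sqrt(p_fail * (1 - p_fail) / n_cases))
```

Wrapping the same steps in a loop over the three probabilities and three sample sizes would cover all nine combinations.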

Question 5.e

We’re presenting our findings to management; they have asked us to provide visualisations for our results. For each failure probability discussed above (0.01, 0.05 and 0.2) and for each potential sample size discussed above (50, 200, and 800), produce a histogram plot of the maximum likelihood estimates of the failure probability (calculated 50,000 times through 50,000 simulations).

Question 5.f

Management has asked us to recommend how many tests they should run. Based on all the information we have computed, do you recommend 50, 200, 800, or even more tests than that? Justify your answer using relevant calculations and/or by referring to the above plots.
