**Assignment n. 9 (due January 19)**

**NEW!**

Let's try Moodle, a learning platform used worldwide and now available to Unicas students as well.

Please sign up and log in at

http://webuser3.unicas.it/dipeg_didattica_innovativa/moodle/login/index.php

After selecting "Global Economy and Business" and then "Applied Statistics", you should be able to see the assignments for next week. If you run into any trouble, please send me an e-mail.

The lecture on Monday 16 is cancelled. An extra class will be held on **Friday 20 January** at 9:00.

**Assignment n. 8 (due November 21)**

*Readings*

Sections 1.6, 3.3, 3.4, and 3.5 from Weisberg (2014).

*Problems*

Come prepared for the next class. You will be required to discuss one of the problems assigned as homework during the term. You choose the problem. You may bring your laptop and/or a file with your R scripts. Each student has ten minutes at most. Partial solutions are accepted. Students who do not discuss any problem in class will not be allowed to sit the midterm exam on December 20.

Do problems 1.5 Water runoff in the Sierras, 1.6 Professor ratings, 2.16 United Nations data, 3.3 Berkeley Guidance Study, 3.6 Water runoff again.

**Assignment n. 7 (due November 14)**

*Readings*

Read the Section "Testing" in Hand (2008), pages 85-89.

Chapter 2, Section 6, from Weisberg (2014).

*Problems*

1. As an established scholar, you are asked to evaluate whether Customer Relationship Management (CRM) affects the financial performance of firms. Your plan is to carry out a study using some available data. The main question will then be settled by means of a hypothesis test. Two hypotheses will be tested against each other: CRM is related to performance, and CRM is not related to performance. Please state which one will be chosen as the null and which as the alternative in your report. Justify your answer.

2. You are the owner of a firm that produces small cookies. Your consultant has suggested a new package to increase your sales. You are not really convinced, so you decide to run a simple experiment. Cookies in the new package will be placed in a big store. After two weeks, the change in the number of packages sold in that store will be compared with the change observed in a different store where cookies are still sold in the old package. The difference between the two changes will then be tested. The point is to decide which of the following is the null and which is the alternative hypothesis: the new package is effective, or it is not.

3. Do problem 2.1.1 in the Weisberg textbook. In addition, for the same dataset:

a) Obtain the estimated standard errors of beta_hat_0 and beta_hat_1.

b) Test the hypotheses that beta_hat_0 and beta_hat_1 are equal to zero at the 5% level.

c) Discuss whether these hypotheses must be rejected.

d) Discuss whether your conclusion implies that there is a relationship between Height and Weight at the population level.

e) Find a mistake in how this exercise is presented.

4. Solve 2.5 and 2.6 from Weisberg.
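
For parts a) and b) of Problem 3, the following is a minimal sketch of where R reports standard errors and t-tests for a simple regression. The data are simulated and purely hypothetical: the names `height` and `weight`, and the numbers used to generate them, are assumptions for illustration, not the textbook dataset.

```r
# Simulated, hypothetical data: NOT the dataset of problem 2.1.1
set.seed(1)
height <- runif(30, min = 150, max = 190)          # cm
weight <- -50 + 0.7 * height + rnorm(30, sd = 5)   # kg

fit <- lm(weight ~ height)

# Coefficient table: estimate, standard error, t statistic, two-sided p-value
tab <- coef(summary(fit))
print(tab)

# The t statistic is the estimate divided by its standard error
t_by_hand <- tab[, "Estimate"] / tab[, "Std. Error"]

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_by_hand <- 2 * pt(abs(t_by_hand), df = df.residual(fit), lower.tail = FALSE)
```

Comparing `t_by_hand` and `p_by_hand` with the last two columns of `tab` is a quick way to check that you understand how the table is built.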

*Optional problem* (quite difficult, just for those of you who might enjoy it!):

5. Let Y be a Bernoulli random variable with parameter p. You want to test the following system of hypotheses:

H0 : p = 0.4 vs H1 : p = 0.6.

Consider the test with rejection rule given by:

L(0.4)/L(0.6) < k

where L(.) is the appropriate likelihood function. Assume a sample of size n=5 is drawn.

a. Compute the probability of a type I error for k=1.

b. Find the value of k such that the probability of a type I error is equal to 0.01.

c. (even more difficult) For this value of k, compute the probability of a type II error.
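
To get started on this kind of problem, here is a rough sketch of how a likelihood-ratio rejection rule can be explored in R. All the numbers (`p0`, `p1`, `n`, `k`) are hypothetical and deliberately different from those of the problem. It relies on the fact that the Bernoulli likelihood depends on the sample only through the number of successes.

```r
# Hypothetical values, chosen to differ from the problem above
p0 <- 0.5   # value of p under H0
p1 <- 0.7   # value of p under H1
n  <- 4     # sample size
k  <- 1     # threshold in the rejection rule L(p0)/L(p1) < k

# Bernoulli likelihood of a sample with s successes out of n
lik <- function(p, s, n) p^s * (1 - p)^(n - s)

s      <- 0:n                            # possible numbers of successes
ratio  <- lik(p0, s, n) / lik(p1, s, n)  # likelihood ratio for each s
reject <- ratio < k                      # rejection region

# Probability of rejecting H0 when p = p0
type1 <- sum(dbinom(s[reject], size = n, prob = p0))

# Probability of not rejecting H0 when p = p1
type2 <- sum(dbinom(s[!reject], size = n, prob = p1))

print(data.frame(s, ratio, reject))
```

Changing `k` and watching how `reject`, `type1`, and `type2` move is a good way to see the trade-off between the two kinds of error.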

**Assignment n. 6 (due November 7)**

*Readings*

(just to justify the importance of problem 2.14): Why every statistician should know about cross-validation

From:

http://robjhyndman.com/hyndsight/crossvalidation/

*Problems*

1. Come prepared for the next two classes. Do your best to get a feeling for what hypothesis testing is about (use books, videos, your friends, whatever ...).

2. Do the problems you have not done in the past weeks.

**Assignment n. 5 (due Thursday November 3)**

*Readings*

Chapter 2, Sections 7 and 8; Chapter 3, Sections 1 and 2, from Weisberg's 2014 textbook (4th Edition).

*Problems*

Do problems 2.9 Invariance (comparison between the t-tests excluded), 2.14 Average prediction error, 2.21 Windmills (quite difficult, but very nice - just try!), 3.1, 3.2.1, 3.2.4, 3.2.5.

**Assignment n. 4 (due October 24)**

*Readings*

Chapter 2, Sections 4, 5, and 6 from Weisberg's 2014 textbook (4th Edition).

*Problems*

Do problems 2.7 More with Forbes’s data, 2.8 Deviations from the mean, 2.13 Heights of mothers and daughters, 2.15 Smallmouth bass, 2.20 Old Faithful (2.20.3 optional).

**Assignment n. 3 (due October 17)**

*Readings*

Chapter 1, Sections 3, 4, and 5; Chapter 2, Sections 1, 2, and 3, from Weisberg's 2014 textbook (4th Edition).

*Problems*

1. Come prepared for the next two classes. Do your best to get a feeling for what a sampling distribution and a confidence interval are (use books, videos, your friends, whatever ...).

2. Do problems 1.2, 2.2, 2.3 (from Weisberg 2014).
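
As one possible way to build intuition for item 1, the simulation sketch below draws many samples from a known population and looks at how the sample mean and its confidence interval behave. The population values (`mu`, `sigma`), the sample size, and the number of replications are all assumptions chosen for illustration.

```r
set.seed(123)

mu    <- 10     # hypothetical population mean
sigma <- 2      # hypothetical population standard deviation
n     <- 25     # sample size
reps  <- 2000   # number of simulated samples

# Draw many samples; keep each sample mean and its 95% confidence interval
sims <- replicate(reps, {
  y  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(y, conf.level = 0.95)$conf.int
  c(mean = mean(y), lower = ci[1], upper = ci[2])
})

sample_means <- sims["mean", ]

# Sampling distribution of the mean: centred near mu, spread near sigma / sqrt(n)
if (interactive()) hist(sample_means, main = "Sampling distribution of the mean")
print(c(mean = mean(sample_means), sd = sd(sample_means), theory_sd = sigma / sqrt(n)))

# Share of intervals that contain the true mean; should be close to 0.95
coverage <- mean(sims["lower", ] <= mu & mu <= sims["upper", ])
print(coverage)
```

Try a smaller `n` or a different `conf.level` and see how the histogram and the coverage change.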

**Assignment n. 2 (due October 10)**

*Readings*

Chapter 1, Sections 1 and 2, from Weisberg's 2014 textbook (4th Edition).

*Problems*

1. Do problems 1.1, 1.3, 1.4 (from Weisberg 2014). To obtain the datasets, the R package "alr4" is needed. Alternatively, if you do not have the 2014 edition, you may do problems 1.2, 1.3, 1.4 from Weisberg's 2005 edition. In that case, the R package "alr3" is needed to obtain the datasets.
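
A small sketch of how the package can be checked and its datasets listed from within R. It assumes an internet connection for the installation step, which is left as a message rather than run automatically; `has_alr4` and `available` are illustrative names.

```r
# Check whether the package is already installed (does not attach it)
has_alr4 <- requireNamespace("alr4", quietly = TRUE)

if (!has_alr4) {
  message('Package "alr4" is not installed; run install.packages("alr4") first.')
} else {
  # List the names of the datasets shipped with the package
  available <- data(package = "alr4")$results[, "Item"]
  print(head(available))
}
```

Once the package is installed, `library(alr4)` makes its datasets available by name. The same steps apply to "alr3".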

**Assignment n. 1 (due October 3)**

*Readings*

To get an idea of possible applications of statistics, please read the Section "Examples" starting on page 13 of Hand (2008).

To get an introduction to maximum likelihood estimates, read "Point Estimation", pages 76-78 of Hand (2008).

*Problems*

1. Let us assume that a random sample of size n=6 has been drawn from a certain population of students. They have been asked if they have ever worked with people from other countries.

The sample {Yes, No, Yes, Yes, Yes, No} has been observed. Let p be the probability of a single student answering "Yes". That is, let p=P("Yes").

Compute:

a) the probability of observing such a sample if p=0.2

b) the probability of observing such a sample if p=0.3

c) the probability of observing such a sample if p=0.8

d) the probability of observing such a sample if p=0.9

Write the likelihood function.

Provide the value of p that maximizes the function.

2. Let us now assume that the sample {Yes, No, No, Yes, No, No} has been observed.

Compute:

a) the probability of observing such a sample if p=0.2

b) the probability of observing such a sample if p=0.3

c) the probability of observing such a sample if p=0.8

d) the probability of observing such a sample if p=0.9

Write the likelihood function.

Provide the value of p that maximizes the function.

3. Using the software R, draw the likelihood function of Problem 1. Just download the software, open it, and type the command:

curve(x^4 * (1-x)^2, xlim=c(0, 1))

Then, draw a series of plots by replacing the values "4" and "2" in the formula with other numbers of your choice. Observe what you obtain, and write a short sentence about your findings (you do not need to include the plots in your solution).

4. Explain in your own words what the likelihood function represents and why the value that maximizes it is of some interest.
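
For Problem 3, one convenient variation (a sketch, not a required approach) is to wrap the formula in a function so that the two exponents can be changed without retyping it. The function name `lik` and the counts used below are hypothetical and differ from the samples in Problems 1 and 2.

```r
# Likelihood of a Bernoulli sample with `yes` "Yes" answers and `no` "No" answers
lik <- function(p, yes, no) p^yes * (1 - p)^no

# Hypothetical counts, different from the samples in Problems 1 and 2
yes <- 3
no  <- 1

# Draw the curve (only when R is run interactively)
if (interactive()) {
  curve(lik(x, yes, no), xlim = c(0, 1), xlab = "p", ylab = "likelihood")
}

# Numerical maximiser of the likelihood over the interval (0, 1)
p_hat <- optimize(lik, interval = c(0, 1), yes = yes, no = no, maximum = TRUE)$maximum
print(p_hat)
```

Comparing `p_hat` with the share of "Yes" answers in the sample, for several choices of `yes` and `no`, may help with Problem 4.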