Tutorial 3: Worked Examples

Image credit: Mick Haupt

Introduction

This is the third tutorial in a series on common problems in Statistics, each presented with a suggested solution. This tutorial focuses on problems (graded as basic or medium) at the level of undergraduate or first-year master's Statistics. Your suggestions are appreciated and highly welcome.

Problem 1 (Basic)

1.(a)

Suppose the side length of a square is a random variable uniformly distributed on $[0,1]$. If $X$ is the side length, calculate the expected area of the square.

Sol.

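Given $X \sim U(0,1)$, so that $f_X(x) = 1$ for $0 \le x \le 1$. Define $g(X)$ as the area of the square, that is, $g(X) = X^2$. Then

$$E(g(X)) = \int_0^1 g(x) f_X(x)\,dx \implies E(X^2) = \int_0^1 x^2\,dx = \left.\tfrac{1}{3}x^3\right|_0^1 = \tfrac{1}{3} \text{ sq. units.}$$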
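As a quick numerical sanity check, the expectation can be approximated with a midpoint Riemann sum. This is only a minimal sketch; the function name `expected_area` and the number of subintervals are illustrative assumptions, not part of the original solution.

```python
def expected_area(n_steps: int = 100_000) -> float:
    """Approximate E[X^2] for X ~ U(0, 1) with a midpoint Riemann sum."""
    width = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        x = (k + 0.5) * width   # midpoint of the k-th subinterval
        total += x * x * 1.0    # g(x) * f_X(x), with f_X(x) = 1 on [0, 1]
    return total * width


approx = expected_area()
print(approx)  # should be very close to 1/3
```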

1.(b)

Let $X_1, X_2, \dots, X_n$ be independent random variables. Suppose the distribution of each $X_i$ is Poisson with parameter $\lambda_i$, $i = 1, 2, \dots, n$. Using moment generating functions, show that $Y = \sum_{i=1}^{n} X_i$ has a Poisson distribution with parameter $\sum_{i=1}^{n} \lambda_i$.

Sol.

If $X_i \sim \text{Poi}(\lambda_i)$, then its moment generating function (m.g.f.) is $M_{X_i}(t) = e^{\lambda_i(e^t - 1)}$.

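Since the variables are independent, the m.g.f. of $Y = \sum_{i=1}^{n} X_i$ is the product of the individual m.g.f.s:

$$M_Y(t) = M_{X_1}(t)\,M_{X_2}(t)\cdots M_{X_n}(t) = e^{\lambda_1(e^t-1)}\,e^{\lambda_2(e^t-1)}\cdots e^{\lambda_n(e^t-1)} = e^{(\lambda_1+\lambda_2+\cdots+\lambda_n)(e^t-1)} = e^{\left(\sum_{i=1}^{n}\lambda_i\right)(e^t-1)}$$

This is the m.g.f. of a Poisson distribution with parameter $\sum_{i=1}^{n}\lambda_i$, and an m.g.f. determines its distribution uniquely, so $\sum_{i=1}^{n} X_i \sim \text{Poi}\left(\sum_{i=1}^{n}\lambda_i\right)$.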
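The result can also be checked numerically for two variables by convolving their probability mass functions directly. The sketch below is illustrative only: the rates 1.5 and 2.5 and the helper names `poisson_pmf` and `pmf_of_sum` are assumptions chosen for the example.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


def pmf_of_sum(k: int, lam1: float, lam2: float) -> float:
    """P(X1 + X2 = k) by direct convolution of two independent Poisson pmfs."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))


lam1, lam2 = 1.5, 2.5  # illustrative rates, not taken from the problem
max_gap = max(
    abs(pmf_of_sum(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2)) for k in range(20)
)
print(max_gap)  # expected to be at floating-point rounding level
```

The convolution never uses the m.g.f., so agreement with the Poisson pmf at rate `lam1 + lam2` is an independent confirmation of the algebra.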

1.(c)

Let $X$ be a random variable which follows a Gamma distribution with density

$$f(x; n, \beta) = \frac{1}{(n-1)!\,\beta^n}\,x^{n-1}e^{-x/\beta}, \quad 0 < x < \infty,\ \beta > 0,\ n > 0.$$

Find the probability density function of $g(X) = \frac{1}{X}$.

Sol.

Given a density function, one way to solve this is to use the method of transformation.

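Let $y = g(x) = \frac{1}{x}$. The density of $Y = g(X)$ can be computed as $f_Y(y) = \left|\frac{dx}{dy}\right| f(W(y))$, where $W(y)$ is the inverse function of $g(x)$.

$$y = \frac{1}{x} \implies x = W(y) = \frac{1}{y} \implies \left|\frac{dx}{dy}\right| = \left|-\frac{1}{y^2}\right| = \frac{1}{y^2}$$

$$f_Y(y) = \frac{1}{y^2} \times \frac{1}{(n-1)!\,\beta^n}\left(\frac{1}{y}\right)^{n-1} e^{-\frac{1}{y\beta}} = \frac{1}{(n-1)!\,\beta^n}\left(\frac{1}{y}\right)^{n+1} e^{-\frac{1}{y\beta}}, \quad y > 0.$$

This is the density of an inverse-gamma distribution with shape $n$ and scale $\frac{1}{\beta}$, that is, $\frac{1}{X} \sim \text{Inv-Gamma}\left(n, \frac{1}{\beta}\right)$. It is not itself a Gamma density, because the exponent contains $\frac{1}{y}$ rather than $y$.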
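A simulation can serve as a rough check on a density obtained by the transformation method. The sketch below draws $X$ from the Gamma density in the problem using the standard-library `random.gammavariate` (shape and scale arguments), and compares the empirical value of $P(1/X \le c)$ with the integral of $\frac{1}{(n-1)!\,\beta^n}\,y^{-(n+1)}e^{-1/(\beta y)}$ over $(0, c]$. The values $n = 3$, $\beta = 2$, $c = 0.25$, the seed, and the helper names are assumptions made for illustration.

```python
import random
from math import exp, factorial


def density_of_reciprocal(y: float, n: int, beta: float) -> float:
    """Density of Y = 1/X obtained by the transformation method."""
    return y ** -(n + 1) * exp(-1.0 / (beta * y)) / (factorial(n - 1) * beta**n)


def cdf_by_integration(c: float, n: int, beta: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of P(Y <= c) from the derived density."""
    width = c / steps
    return width * sum(
        density_of_reciprocal((k + 0.5) * width, n, beta) for k in range(steps)
    )


n, beta, c = 3, 2.0, 0.25   # illustrative values, not from the problem
rng = random.Random(0)      # fixed seed so the run is repeatable
draws = [1.0 / rng.gammavariate(n, beta) for _ in range(200_000)]
empirical = sum(d <= c for d in draws) / len(draws)
theoretical = cdf_by_integration(c, n, beta)
print(round(empirical, 3), round(theoretical, 3))  # the two should nearly agree
```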

Problem 2 (Medium)

2.(a)

In a certain population, it is believed that 1.5% of the population has disease X. Suppose health providers in that community embark on a free screening exercise for the disease. It is known that, for a person who has the disease, the test returns a positive result 97% of the time. Also, for a person who does not have the disease, the test returns a negative result 95% of the time.

i.) What is the probability that a test result returns positive?

ii.) Suppose you presented yourself for testing and your result came out positive for the disease. What is the probability that you actually have the disease?

Sol.

This is a direct application of Bayes' theorem. Define $D$ as the event that a subject has the disease, $D^c$ as the event that the subject does not, $T^+$ as the event that a subject’s test result returns positive, and $T^-$ as the event that it returns negative. We proceed with the following pieces of information:

$$P(D) = 0.015 \ (\text{prevalence}), \quad P(T^+ \mid D) = 0.97 \ (\text{sensitivity}), \quad P(T^- \mid D^c) = 0.95 \ (\text{specificity})$$

i.)

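A positive result can come from a subject with the disease ($D$) or without it ($D^c$). By the law of total probability,

$$P(T^+) = P(D)P(T^+ \mid D) + P(D^c)P(T^+ \mid D^c) = 0.015 \times 0.97 + (1 - 0.015) \times (1 - 0.95) = 0.01455 + 0.04925 = 0.0638$$

ii.)

We’re interested in $P(D \mid T^+)$, the probability that you have the disease given that you tested positive. Using Bayes' theorem, we proceed as follows:

$$P(D \mid T^+) = \frac{P(D)P(T^+ \mid D)}{P(D)P(T^+ \mid D) + P(D^c)P(T^+ \mid D^c)} = \frac{0.015 \times 0.97}{0.0638} \approx 0.2281 \ (22.81\%)$$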
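The arithmetic for both parts takes only a few lines to reproduce. The following is a minimal sketch; the variable names are illustrative, and the three inputs are the prevalence, sensitivity and specificity stated in the problem.

```python
prevalence = 0.015   # P(D)
sensitivity = 0.97   # P(T+ | D)
specificity = 0.95   # P(T- | no disease)

# Law of total probability: P(T+) = P(D)P(T+|D) + P(no D)P(T+|no D)
p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)

# Bayes' theorem: P(D | T+) = P(D)P(T+|D) / P(T+)
p_disease_given_positive = prevalence * sensitivity / p_positive

print(round(p_positive, 4))                # 0.0638
print(round(p_disease_given_positive, 4))  # 0.2281
```

Even with a fairly accurate test, the low prevalence means that most positive results come from the much larger disease-free group.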

2.(b)

Machine A produces 40% of the products in a certain factory, machine B produces 10%, and machine C produces 50%. Of the products made by these three machines, 2%, 3% and 4% respectively are defective. One of the products in the factory is selected at random.

i.) Find the probability that this product is defective.

ii.) If the product is defective, find the probability it is coming from:

α. Machine A

β. Machine B

γ. Machine C

Sol.

P(A)=0.40,P(B)=0.10,P(C)=0.50

Let D be the event that a defective product is produced

P(D|A)=0.02
P(D|B)=0.03
P(D|C)=0.04

i.)

P(D)=P(A)P(D|A)+P(B)P(D|B)+P(C)P(D|C)=0.40(0.02)+0.10(0.03)+0.50(0.04)=0.031

ii.)

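α. $P(A \mid D) = \frac{P(A)P(D \mid A)}{P(D)} = \frac{0.40 \times 0.02}{0.031} = \frac{8}{31}$

β. $P(B \mid D) = \frac{P(B)P(D \mid B)}{P(D)} = \frac{0.10 \times 0.03}{0.031} = \frac{3}{31}$

γ. $P(C \mid D) = \frac{P(C)P(D \mid C)}{P(D)} = \frac{0.50 \times 0.04}{0.031} = \frac{20}{31}$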
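Because every input is a simple percentage, the whole problem can be reproduced exactly with rational arithmetic. This is a small sketch using the standard-library `fractions` module; the dictionary names are illustrative assumptions.

```python
from fractions import Fraction

# Machine shares P(machine) and defect rates P(D | machine) from the problem.
share = {"A": Fraction(40, 100), "B": Fraction(10, 100), "C": Fraction(50, 100)}
defect_rate = {"A": Fraction(2, 100), "B": Fraction(3, 100), "C": Fraction(4, 100)}

# Law of total probability for P(D).
p_defective = sum(share[m] * defect_rate[m] for m in share)

# Bayes' theorem for each machine: P(machine | D).
posterior = {m: share[m] * defect_rate[m] / p_defective for m in share}

print(p_defective)  # 31/1000
for machine, prob in posterior.items():
    print(machine, prob)  # A 8/31, then B 3/31, then C 20/31
```

The three posterior probabilities sum to 1, which is a useful check that no conditional probability was mismatched with its machine.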

2.(c)

X1,X2 are independent variables and Y is the dependent variable. A sample of 10 units is drawn as shown in the table below.

X.1 X.2 y
5 1 3
5 1 4
6 3 5
6 4 5
7 5 5
7 6 6
7 6 7
8 5 7
9 3 8
10 6 10

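Find the fitted regression equation $\hat{y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2$ for the model $y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$.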

Sol.

Using the matrix approach, let $y = X\beta + \epsilon$, so that the least-squares estimate is

$$\hat\beta = (X'X)^{-1}X'y$$

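$$X'X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 5 & 5 & \cdots & 10 \\ 1 & 1 & \cdots & 6 \end{pmatrix} \begin{pmatrix} 1 & 5 & 1 \\ 1 & 5 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 10 & 6 \end{pmatrix} = \begin{pmatrix} 10 & 70 & 40 \\ 70 & 514 & 298 \\ 40 & 298 & 194 \end{pmatrix}$$

$$(X'X)^{-1} = \begin{pmatrix} 2.2179 & -0.3374 & 0.0610 \\ -0.3374 & 0.0691 & -0.0366 \\ 0.0610 & -0.0366 & 0.0488 \end{pmatrix}$$

$$X'y = \begin{pmatrix} 60 \\ 449 \\ 264 \end{pmatrix}$$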

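$$\hat\beta = (X'X)^{-1}X'y = \begin{pmatrix} 2.2179 & -0.3374 & 0.0610 \\ -0.3374 & 0.0691 & -0.0366 \\ 0.0610 & -0.0366 & 0.0488 \end{pmatrix} \begin{pmatrix} 60 \\ 449 \\ 264 \end{pmatrix} = \begin{pmatrix} -2.3211 \\ 1.1260 \\ 0.1098 \end{pmatrix}$$

$$\therefore \hat{y} = -2.3211 + 1.1260X_1 + 0.1098X_2$$

These coefficients are computed with the unrounded inverse. Multiplying by the four-decimal entries shown gives $-2.3146$ and $1.1195$ for the first two coefficients, a difference due purely to rounding.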
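The normal equations $(X'X)\beta = X'y$ can be solved exactly with rational arithmetic, which sidesteps any rounding in the inverse. The sketch below is one possible way to do it in plain Python; the helper `solve` (Gauss-Jordan elimination) is an illustrative assumption, not the method used in the original post.

```python
from fractions import Fraction

x1 = [5, 5, 6, 6, 7, 7, 7, 8, 9, 10]
x2 = [1, 1, 3, 4, 5, 6, 6, 5, 3, 6]
y = [3, 4, 5, 5, 5, 6, 7, 7, 8, 10]

# Design matrix with a leading column of ones for the intercept.
X = [[1, a, b] for a, b in zip(x1, x2)]

# Normal equations: (X'X) beta = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]


def solve(a, b):
    """Solve a small linear system exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]           # bring a nonzero pivot up
        m[col] = [v / m[col][col] for v in m[col]]    # scale pivot row to 1
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]


beta_hat = solve(xtx, xty)
print(xtx)  # [[10, 70, 40], [70, 514, 298], [40, 298, 194]]
print(xty)  # [60, 449, 264]
print([round(float(b), 4) for b in beta_hat])  # [-2.3211, 1.126, 0.1098]
```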

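Go a little further and test $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$ at $\alpha = 0.05$.

Hint: Use $t = \frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)}$ and reject $H_0$ if $|t| > t_{\alpha/2,\,(n-p)}$, where $n$ is the sample size and $p$ is the number of estimated coefficients.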
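One way to carry out the suggested test numerically is sketched below. It refits the model from the table, estimates the residual variance with $n - p = 7$ degrees of freedom, and forms the $t$ statistic for $\beta_1$. The helper name `inverse_3x3` and the critical value 2.365 (read from a standard $t$-table for a two-sided 5% test with 7 degrees of freedom) are assumptions introduced for illustration.

```python
from math import sqrt

x1 = [5, 5, 6, 6, 7, 7, 7, 8, 9, 10]
x2 = [1, 1, 3, 4, 5, 6, 6, 5, 3, 6]
y = [3, 4, 5, 5, 5, 6, 7, 7, 8, 10]
n, p = len(y), 3  # 10 observations, 3 estimated coefficients


def inverse_3x3(a):
    """Invert a 3x3 matrix via the adjugate (cofactor) formula."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    cof = [
        [a22 * a33 - a23 * a32, -(a21 * a33 - a23 * a31), a21 * a32 - a22 * a31],
        [-(a12 * a33 - a13 * a32), a11 * a33 - a13 * a31, -(a11 * a32 - a12 * a31)],
        [a12 * a23 - a13 * a22, -(a11 * a23 - a13 * a21), a11 * a22 - a12 * a21],
    ]
    det = a11 * cof[0][0] + a12 * cof[0][1] + a13 * cof[0][2]
    # The inverse is the transposed cofactor matrix divided by the determinant.
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]


X = [[1, a, b] for a, b in zip(x1, x2)]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
xtx_inv = inverse_3x3(xtx)
beta_hat = [sum(xtx_inv[i][j] * xty[j] for j in range(3)) for i in range(3)]

# Residual variance s^2 = SSE / (n - p); SE(beta1) = sqrt(s^2 * [(X'X)^-1]_22).
fitted = [sum(b * v for b, v in zip(beta_hat, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
s2 = sse / (n - p)
se_beta1 = sqrt(s2 * xtx_inv[1][1])
t_stat = (beta_hat[1] - 0.0) / se_beta1  # hypothesised value of beta1 is 0

T_CRIT = 2.365  # assumed t-table value for alpha/2 = 0.025 and 7 d.f.
reject_h0 = abs(t_stat) > T_CRIT
print(round(t_stat, 2), reject_h0)
```

Under these assumptions the statistic lies well beyond the critical value, so $H_0: \beta_1 = 0$ would be rejected at the 5% level.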

Problem 3 (Basic-medium)

3.(a)

Suppose X is a random variable with its density defined such that;

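$$f(x) = \begin{cases} \frac{2}{5}|x|, & -1 < x < 2 \\ 0, & \text{otherwise} \end{cases}$$

i. Evaluate $\int_{-\infty}^{\infty} x f(x)\,dx$.

ii. Find the variance of $X$, $\text{Var}(X)$.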

Sol.

i.

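$$\int_{-\infty}^{\infty} x f(x)\,dx = \int_{-1}^{0} \frac{2x|x|}{5}\,dx + \int_{0}^{2} \frac{2x|x|}{5}\,dx$$

$$E(X) = -\frac{2}{5}\int_{-1}^{0} x^2\,dx + \frac{2}{5}\int_{0}^{2} x^2\,dx = \left.-\frac{2}{15}x^3\right|_{-1}^{0} + \left.\frac{2}{15}x^3\right|_{0}^{2} = -\frac{2}{15} + \frac{16}{15} = \frac{14}{15}.$$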

ii.

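$$\text{Var}(X) = E(X^2) - (E(X))^2 = \int_{-1}^{0} \frac{2x^2|x|}{5}\,dx + \int_{0}^{2} \frac{2x^2|x|}{5}\,dx - \left(\frac{14}{15}\right)^2$$

$$= \left.-\frac{1}{10}x^4\right|_{-1}^{0} + \left.\frac{1}{10}x^4\right|_{0}^{2} - \left(\frac{14}{15}\right)^2 = \frac{1}{10} + \frac{16}{10} - \left(\frac{14}{15}\right)^2 = \frac{17}{10} - \frac{196}{225} = \frac{373}{450} \approx 0.829.$$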
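Both moments can be cross-checked by numerical integration of the density, without splitting the integral at zero by hand. The following is a rough sketch; the function names and the number of subintervals are illustrative assumptions.

```python
def density(x: float) -> float:
    """f(x) = (2/5)|x| on (-1, 2), zero elsewhere."""
    return 0.4 * abs(x) if -1.0 < x < 2.0 else 0.0


def moment(order: int, steps: int = 300_000) -> float:
    """Midpoint-rule approximation of E[X^order] over the support (-1, 2)."""
    lo, hi = -1.0, 2.0
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width  # midpoint of the k-th subinterval
        total += x**order * density(x)
    return total * width


mean = moment(1)
variance = moment(2) - mean**2
print(round(mean, 4), round(variance, 4))  # compare with 14/15 and 373/450
```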

3.(b)

A random variable $Y$ is uniformly distributed over the interval $(-1, 8)$, i.e. $Y \sim U(-1, 8)$.

Find the probability that the equation $2x^2 + 4Yx + 3Y + 2 = 0$ has real roots.

Sol.

$$Y \sim U(-1, 8) \implies f_Y(y) = \begin{cases} \frac{1}{9}, & -1 < y < 8 \\ 0, & \text{otherwise} \end{cases}$$

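For the equation $2x^2 + 4Yx + 3Y + 2 = 0$ to have real roots, the discriminant must be non-negative, thus:

$$b^2 - 4ac \ge 0 \implies (4Y)^2 - 4(2)(3Y + 2) \ge 0 \implies 16Y^2 - 24Y - 16 \ge 0 \implies 2Y^2 - 3Y - 2 \ge 0 \implies (2Y + 1)(Y - 2) \ge 0 \implies Y \ge 2 \ \text{or} \ Y \le -\tfrac{1}{2}$$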

Using the probability density function above, and the idea of mutually exclusive events, we have;

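$$P\left(Y \le -\tfrac{1}{2} \cup Y \ge 2\right) = P\left(Y \le -\tfrac{1}{2}\right) + P(Y \ge 2) = \int_{-1}^{-\frac{1}{2}} f_Y(y)\,dy + \int_{2}^{8} f_Y(y)\,dy = \frac{1}{9}\left(\frac{1}{2} + 6\right) = \frac{13}{18}.$$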
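A seeded Monte Carlo run gives a rough independent check of this probability by testing the discriminant directly for simulated values of $Y$. This is only a sketch; the seed, the number of trials and the function name `has_real_roots` are illustrative assumptions.

```python
import random


def has_real_roots(y: float) -> bool:
    """True when 2x^2 + 4y*x + (3y + 2) = 0 has real roots (discriminant >= 0)."""
    a, b, c = 2.0, 4.0 * y, 3.0 * y + 2.0
    return b * b - 4.0 * a * c >= 0.0


rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 200_000
hits = sum(has_real_roots(rng.uniform(-1.0, 8.0)) for _ in range(trials))
estimate = hits / trials
print(round(estimate, 3), round(13 / 18, 3))  # estimate should be near 0.722
```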

3.(c)

A random variable $X$ has moment generating function $M_X(t) = \frac{0.5}{0.5 - t}$, $t < \frac{1}{2}$. Using this piece of information, find:

i. $P(X > \ln 4)$
ii. $E(X)$
iii. $\text{Var}(X)$.

Sol.

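Note that the moment generating function of the Exponential distribution with rate $\lambda$ is $M_X(t) = \frac{\lambda}{\lambda - t}$, $t < \lambda$. Comparing this with the function above, we may infer that $X$ is Exponential with $\lambda = \frac{1}{2}$, with density

$$f(x) = \begin{cases} \frac{1}{2}e^{-\frac{1}{2}x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$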

i.

The survival function of the Exponential distribution is $1 - F(x) = e^{-\lambda x}$, $x > 0$. Thus, with $\lambda = \frac{1}{2}$,

$$P(X > \ln 4) = e^{-\frac{1}{2}\ln 4} = \frac{1}{2}.$$

ii.

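$$E(X) = \left.\frac{d}{dt}M_X(t)\right|_{t=0} = \left.\frac{0.5}{(0.5 - t)^2}\right|_{t=0} = 2$$

iii.

$$E(X^2) = \left.\frac{d^2}{dt^2}M_X(t)\right|_{t=0} = \left.\frac{2(0.5)}{(0.5 - t)^3}\right|_{t=0} = 8$$

$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 8 - 2^2 = 4$$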
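The moments can be checked without any calculus by differentiating the m.g.f. numerically at $t = 0$. The sketch below uses central finite differences; the step size `h` and the variable names are illustrative assumptions, and the survival probability uses the Exponential form with rate $\frac{1}{2}$ inferred from the m.g.f.

```python
from math import exp, log


def mgf(t: float) -> float:
    """M_X(t) = 0.5 / (0.5 - t), valid for t < 0.5."""
    return 0.5 / (0.5 - t)


h = 1e-4  # finite-difference step (assumed; small relative to the bound t < 0.5)
first_moment = (mgf(h) - mgf(-h)) / (2 * h)                 # approximates M'(0)
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2    # approximates M''(0)
variance = second_moment - first_moment**2

rate = 0.5
survival_at_ln4 = exp(-rate * log(4))  # P(X > ln 4) = e^(-rate * ln 4)

print(round(first_moment, 3), round(second_moment, 3), round(variance, 3))
print(round(survival_at_ln4, 3))
```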

Did you find this post helpful? Any suggestions? Consider sharing it😊😊😊

Abubakari Sumaila Salpawuni
PhD candidate (Biostatistics)

My research interests include the applications of survival analysis in Medicine, sequential decision processes, dynamics of visualizations in R and Python.
