Tutorial 1: Worked Examples

Image credit: Sigmund

Introduction

This tutorial works through common problems in Statistics alongside their suggested solutions. It does not target a particular level, so a problem may be at undergraduate, master’s, or doctoral level. Questions may come from any topic in Statistics rather than following a chapter-by-chapter order. In most cases, I did not design these questions. Some come from my lecture notes, from online portals that do not give solutions, from well-written Statistics books such as those by Robert V. Hogg et al. and Sheldon Ross, or from sources that are not clearly known (anonymous). Where I am certain of the original source of a question, I do my best to cite it. If you feel a problem is not properly cited, please draw my attention to it.

Problem 1.

To study the effect of temperature on yield in a chemical process, five batches were produced at each of three temperature levels. The results are given below.

i.) Construct an analysis of variance (ANOVA) table for this problem.

ii.) Use a 0.05 level of significance to test whether the temperature level has an effect on the mean yield of the process.

Source

multiple online sources

Table 1: Temperature level versus batch
Batch 50°C 60°C 70°C
1 34 30 23
2 24 31 28
3 36 34 28
4 39 23 30
5 32 27 31

Suggested solution

This is a one-way ANOVA problem, and it can be solved in several ways. In my opinion, the clearest approach is to decompose the array of observations into a mean array, a treatment array, and a residual array, and then obtain the sums of squares from each array.

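Let $x_{lj}$ be the observation for the $j$th batch under treatment $l$. Thus, by decomposition:

$$x_{lj} = \bar{x}_{..} + \underbrace{(\bar{x}_{l.} - \bar{x}_{..})}_{\text{treatment}} + \underbrace{(x_{lj} - \bar{x}_{l.})}_{\text{residual}}$$

where

$$
\begin{aligned}
\bar{x}_{..} &= \frac{34 + \cdots + 32 + 30 + \cdots + 27 + 23 + \cdots + 31}{15} = 30 \\
\bar{x}_{1.} &= \frac{34 + \cdots + 32}{5} = 33 \\
\bar{x}_{2.} &= \frac{30 + \cdots + 27}{5} = 29 \\
\bar{x}_{3.} &= \frac{23 + \cdots + 31}{5} = 28
\end{aligned}
$$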

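The decomposition of the observations, with one row per temperature level, is thus:

$$
\begin{pmatrix} 34 & 24 & 36 & 39 & 32 \\ 30 & 31 & 34 & 23 & 27 \\ 23 & 28 & 28 & 30 & 31 \end{pmatrix}
=
\begin{pmatrix} 30 & 30 & 30 & 30 & 30 \\ 30 & 30 & 30 & 30 & 30 \\ 30 & 30 & 30 & 30 & 30 \end{pmatrix}
+
\begin{pmatrix} 3 & 3 & 3 & 3 & 3 \\ -1 & -1 & -1 & -1 & -1 \\ -2 & -2 & -2 & -2 & -2 \end{pmatrix}
+
\begin{pmatrix} 1 & -9 & 3 & 6 & -1 \\ 1 & 2 & 5 & -6 & -2 \\ -5 & 0 & 0 & 2 & 3 \end{pmatrix}
$$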

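The sums of squares (SS) from the above, writing trt for treatment, res for residual, and cor for corrected total, are:

$$
\begin{aligned}
SS_{\text{obs}} &= SS_{\text{mean}} + SS_{\text{trt}} + SS_{\text{res}} \\
SS_{\text{cor}} &= SS_{\text{obs}} - SS_{\text{mean}}
\end{aligned}
$$

$$
\begin{aligned}
SS_{\text{obs}} &= 34^2 + 24^2 + \cdots + 30^2 + 31^2 = 13806 \\
SS_{\text{mean}} &= 30^2 + 30^2 + \cdots + 30^2 = 13500 \\
SS_{\text{cor}} &= 13806 - 13500 = 306 \\
SS_{\text{trt}} &= 3^2 + 3^2 + \cdots + (-2)^2 + (-2)^2 = 70 \\
SS_{\text{res}} &= 1^2 + (-9)^2 + \cdots + 2^2 + 3^2 = 236
\end{aligned}
$$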
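
The decomposition above can be checked numerically. The following is a minimal Python sketch, assuming the Table 1 yields are typed in by hand; the names `yields`, `decompose`, and `sums_of_squares` are my own and are only illustrative.

```python
# Yields from Table 1, one list per temperature level (treatment).
yields = {
    "50C": [34, 24, 36, 39, 32],
    "60C": [30, 31, 34, 23, 27],
    "70C": [23, 28, 28, 30, 31],
}


def decompose(groups):
    """Split every observation into grand mean + treatment effect + residual."""
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = sum(all_obs) / len(all_obs)
    parts = []  # one (observation, mean, treatment effect, residual) tuple per value
    for obs in groups.values():
        group_mean = sum(obs) / len(obs)
        for x in obs:
            parts.append((x, grand_mean, group_mean - grand_mean, x - group_mean))
    return parts


def sums_of_squares(parts):
    """Sum the squared entries of each component array."""
    names = ("obs", "mean", "treatment", "residual")
    return {name: sum(p[i] ** 2 for p in parts) for i, name in enumerate(names)}


parts = decompose(yields)
ss = sums_of_squares(parts)
ss["corrected"] = ss["obs"] - ss["mean"]  # corrected total sum of squares
print(ss)
```

The printed dictionary should agree with the hand calculation: 13806, 13500, 70, 236, and a corrected total of 306.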

i.

ANOVA table (SS = sum of squares; MS = mean square = SS/DF; F-value = MS treatment / MS error = 35/19.667)

Source DF SS MS F-value
Treatment 2 70 35 1.78
Error 12 236 19.667
Total 14 306
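
The table entries follow mechanically from the two sums of squares. Here is a small Python sketch of that bookkeeping, assuming the values 70 and 236 obtained above; the function name `anova_table` and its arguments are hypothetical, chosen only for this illustration.

```python
def anova_table(ss_treatment, ss_residual, n_groups, n_total):
    """Return the rows of a one-way ANOVA table as (source, df, SS, MS, F)."""
    df_trt = n_groups - 1          # treatment degrees of freedom
    df_res = n_total - n_groups    # error degrees of freedom
    ms_trt = ss_treatment / df_trt
    ms_res = ss_residual / df_res
    f_value = ms_trt / ms_res
    return [
        ("Treatment", df_trt, ss_treatment, ms_trt, f_value),
        ("Error", df_res, ss_residual, ms_res, None),
        ("Total", df_trt + df_res, ss_treatment + ss_residual, None, None),
    ]


rows = anova_table(ss_treatment=70, ss_residual=236, n_groups=3, n_total=15)
for source, df, ss, ms, f in rows:
    ms_text = "" if ms is None else f"{ms:.3f}"
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<10}{df:>4}{ss:>6}{ms_text:>9}{f_text:>7}")
```

Three temperature levels give 2 treatment degrees of freedom, and 15 observations give 12 error degrees of freedom, which is where the DF column comes from.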

ii.

Hypothesis formulation

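$$H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{(the mean yield is the same across temperature levels)}$$

$$H_1: \mu_i \neq \mu_j \quad \text{for at least one pair } i \neq j$$

$$F_{\text{cal}} = 1.78, \qquad F_{\text{table}} = F_{[2,\,12;\;1-\alpha = 0.95]} = 3.89$$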

Since $F_{\text{cal}} < F_{\text{table}}$, we fail to reject $H_0$ and conclude that there is not sufficient evidence that the temperature level has an effect on the mean yield of the process.
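
If no F table is at hand, the critical value for this particular test can be computed directly. The sketch below relies on a special case: when the numerator degrees of freedom equal 2, the upper tail of the F distribution has the closed form P(F > f) = (d / (d + 2f))^(d/2), where d is the error degrees of freedom. The function names are my own, and this shortcut does not apply to other numerator degrees of freedom, where a library routine such as SciPy's `scipy.stats.f.ppf` would be the usual choice.

```python
def f_sf_numerator_df2(f_value, df_error):
    """P(F > f_value) for an F(2, df_error) variable (numerator df = 2 only)."""
    return (df_error / (df_error + 2 * f_value)) ** (df_error / 2)


def f_critical_numerator_df2(alpha, df_error):
    """Upper-alpha critical value of F(2, df_error), found by inverting the tail formula."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)


f_cal = 35 / (236 / 12)  # MS treatment / MS error
f_table = f_critical_numerator_df2(alpha=0.05, df_error=12)
p_value = f_sf_numerator_df2(f_cal, df_error=12)

print(f"F_cal = {f_cal:.2f}, F_table = {f_table:.2f}, p-value = {p_value:.3f}")
print("reject H0" if f_cal > f_table else "fail to reject H0")
```

Comparing the p-value with 0.05 leads to the same decision as comparing the calculated F with the tabulated F.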

Rather than decomposing the data into an array format, you could also use formulae to obtain the sum of squares, and subsequently, an ANOVA table.
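
As an illustration of the formula route, here is a minimal Python sketch using the usual shortcut formulae: a correction factor (grand total squared over N), the corrected total sum of squares, and the treatment sum of squares from the treatment totals. The variable names are my own assumptions, and the data are the Table 1 yields typed in by hand.

```python
# Yields from Table 1, one list per temperature level (treatment).
yields = {
    "50C": [34, 24, 36, 39, 32],
    "60C": [30, 31, 34, 23, 27],
    "70C": [23, 28, 28, 30, 31],
}

n_total = sum(len(obs) for obs in yields.values())
grand_total = sum(sum(obs) for obs in yields.values())

# Correction factor: (grand total)^2 / N, which equals the mean sum of squares.
correction = grand_total ** 2 / n_total

# Corrected total SS: sum of squared observations minus the correction factor.
ss_total = sum(x ** 2 for obs in yields.values() for x in obs) - correction

# Treatment SS: sum of (treatment total)^2 / n_j, minus the correction factor.
ss_treatment = sum(sum(obs) ** 2 / len(obs) for obs in yields.values()) - correction

# Residual SS by subtraction.
ss_residual = ss_total - ss_treatment

print(ss_total, ss_treatment, ss_residual)
```

These values match the ones obtained from the array decomposition (306, 70, and 236), so either route leads to the same ANOVA table.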

Problem 2.

Compute the correlation coefficient for each of the following probability densities:

i.) $f(x,y) = \begin{cases} \frac{1}{3}(x+y) & 0 < x < 1;\ 0 < y < 2 \\ 0 & \text{elsewhere} \end{cases}$

ii.) $f(x,y) = \begin{cases} \frac{1}{22}(x+2y) & (x,y) = (1,1), (1,3), (2,1), (2,3) \\ 0 & \text{elsewhere} \end{cases}$

Source

Hogan Craig et al.

Suggested solution

Mathematically, the correlation coefficient, $\rho$, is defined as $\rho_{X,Y} = \dfrac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}$. This implies that, to find it, we need the marginal densities, $f_X(x)$ and $f_Y(y)$, as well as $E(X)$, $E(Y)$, $E(XY)$, $\text{Var}(X)$, and $\text{Var}(Y)$.

i.)

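$$f(x,y) = \begin{cases} \frac{1}{3}(x+y) & 0 < x < 1;\ 0 < y < 2 \\ 0 & \text{elsewhere} \end{cases}$$

The marginals are:

$$f_X(x) = \int_0^2 f(x,y)\,dy = \int_0^2 \frac{1}{3}(x+y)\,dy = \frac{1}{3}\left(xy + \frac{y^2}{2}\right)\Bigg|_0^2 = \frac{2}{3}(x+1)$$

$$f_Y(y) = \int_0^1 f(x,y)\,dx = \int_0^1 \frac{1}{3}(x+y)\,dx = \frac{1}{3}\left(\frac{x^2}{2} + xy\right)\Bigg|_0^1 = \frac{1}{6}(1+2y)$$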

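The expectation and variance of X:

$$
\begin{aligned}
E(X) &= \int_0^1 x f_X(x)\,dx = \int_0^1 x \cdot \frac{2}{3}(x+1)\,dx = \frac{5}{9} \\
E(X^2) &= \int_0^1 x^2 f_X(x)\,dx = \int_0^1 x^2 \cdot \frac{2}{3}(x+1)\,dx = \frac{7}{18} \\
\text{Var}(X) &= E(X^2) - (E(X))^2 = \frac{7}{18} - \left(\frac{5}{9}\right)^2 = \frac{13}{162}
\end{aligned}
$$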

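The expectation and variance of Y (integrating the marginal $f_Y(y) = \frac{1}{6}(1+2y)$ over $0 < y < 2$):

$$
\begin{aligned}
E(Y) &= \int_0^2 y f_Y(y)\,dy = \int_0^2 y \cdot \frac{1}{6}(1+2y)\,dy = \frac{11}{9} \\
E(Y^2) &= \int_0^2 y^2 f_Y(y)\,dy = \int_0^2 y^2 \cdot \frac{1}{6}(1+2y)\,dy = \frac{16}{9} \\
\text{Var}(Y) &= E(Y^2) - (E(Y))^2 = \frac{16}{9} - \left(\frac{11}{9}\right)^2 = \frac{23}{81}
\end{aligned}
$$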

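The covariance of X and Y:

$$
\begin{aligned}
E(XY) &= \int_0^1\!\!\int_0^2 xy\,f(x,y)\,dy\,dx = \int_0^1\!\!\int_0^2 xy \cdot \frac{1}{3}(x+y)\,dy\,dx = \int_0^1\!\!\int_0^2 \frac{1}{3}\left(x^2 y + x y^2\right)dy\,dx \\
&= \int_0^1 \frac{1}{3}\left(\frac{x^2 y^2}{2} + \frac{x y^3}{3}\right)\Bigg|_0^2 dx = \int_0^1 \frac{1}{3}\left(2x^2 + \frac{8x}{3}\right)dx = \frac{2}{3} \\
\text{Cov}(X,Y) &= E(XY) - E(X)E(Y) = \frac{2}{3} - \frac{5}{9} \cdot \frac{11}{9} = -\frac{1}{81}
\end{aligned}
$$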

Correlation coefficient:

$$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{-1/81}{\sqrt{\frac{13}{162}}\sqrt{\frac{23}{81}}} = -0.0818$$
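
The integrals in part i.) can be verified exactly, because the density is a polynomial on a rectangle. The sketch below is one possible check in Python using exact fractions; the helper `moment(a, b)` is a hypothetical name that returns E(X^a Y^b) from the antiderivatives over 0 < x < 1 and 0 < y < 2.

```python
from fractions import Fraction
from math import sqrt


def moment(a, b):
    """E[X^a Y^b] for f(x, y) = (x + y)/3 on 0 < x < 1, 0 < y < 2."""
    # Integral of x^(a+1) * y^b over the rectangle.
    term_x = Fraction(1, a + 2) * Fraction(2 ** (b + 1), b + 1)
    # Integral of x^a * y^(b+1) over the rectangle.
    term_y = Fraction(1, a + 1) * Fraction(2 ** (b + 2), b + 2)
    return Fraction(1, 3) * (term_x + term_y)


mean_x, mean_y = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - mean_x ** 2
var_y = moment(0, 2) - mean_y ** 2
cov_xy = moment(1, 1) - mean_x * mean_y
rho = float(cov_xy) / sqrt(float(var_x * var_y))

print(mean_x, mean_y, var_x, var_y, cov_xy, round(rho, 4))
```

Note that `moment(0, 0)` should equal 1, which confirms that the density integrates to one. The negative covariance gives a small negative correlation.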

ii.) $f(x,y) = \begin{cases} \frac{1}{22}(x+2y) & (x,y) = (1,1), (1,3), (2,1), (2,3) \\ 0 & \text{elsewhere} \end{cases}$

The bivariate joint distribution is:

X/Y 1 3
1 3/22 7/22
2 2/11 4/11

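The marginals are:

$$f_X(x) = \begin{cases} \frac{5}{11} & \text{if } x = 1 \\ \frac{6}{11} & \text{if } x = 2 \\ 0 & \text{elsewhere} \end{cases}$$

$$f_Y(y) = \begin{cases} \frac{7}{22} & \text{if } y = 1 \\ \frac{15}{22} & \text{if } y = 3 \\ 0 & \text{elsewhere} \end{cases}$$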

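The expectation and variance of X:

$$
\begin{aligned}
E(X) &= \sum x f_X(x) = (1)\frac{5}{11} + (2)\frac{6}{11} = \frac{17}{11} \\
E(X^2) &= \sum x^2 f_X(x) = (1^2)\frac{5}{11} + (2^2)\frac{6}{11} = \frac{29}{11} \\
\text{Var}(X) &= E(X^2) - (E(X))^2 = \frac{29}{11} - \frac{289}{121} = \frac{30}{121}
\end{aligned}
$$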

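The expectation and variance of Y:

$$
\begin{aligned}
E(Y) &= \sum y f_Y(y) = (1)\frac{7}{22} + (3)\frac{15}{22} = \frac{52}{22} \\
E(Y^2) &= \sum y^2 f_Y(y) = (1^2)\frac{7}{22} + (3^2)\frac{15}{22} = \frac{142}{22} \\
\text{Var}(Y) &= E(Y^2) - (E(Y))^2 = \frac{142}{22} - \frac{676}{121} = \frac{105}{121}
\end{aligned}
$$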

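The covariance of X and Y:

$$
\begin{aligned}
E(XY) &= \sum\sum xy\,f(x,y) = (1)(1)\frac{3}{22} + (1)(3)\frac{7}{22} + (2)(1)\frac{2}{11} + (2)(3)\frac{4}{11} = \frac{40}{11} \\
\text{Cov}(X,Y) &= E(XY) - E(X)E(Y) = \frac{40}{11} - \frac{52}{22} \cdot \frac{17}{11} = -\frac{2}{121}
\end{aligned}
$$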

Correlation coefficient:

$$\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{-2/121}{\sqrt{\frac{30}{121}}\sqrt{\frac{105}{121}}} = -0.0356$$
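
For the discrete case in part ii.), every expectation is a finite weighted sum, so the whole calculation fits in a few lines. This is a minimal Python sketch with exact fractions; `pmf` and `expect` are illustrative names of my own.

```python
from fractions import Fraction
from math import sqrt

# Joint pmf f(x, y) = (x + 2y)/22 on the four support points.
support = [(1, 1), (1, 3), (2, 1), (2, 3)]
pmf = {(x, y): Fraction(x + 2 * y, 22) for x, y in support}


def expect(g):
    """E[g(X, Y)] as a probability-weighted sum over the support."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - mean_x ** 2
var_y = expect(lambda x, y: y * y) - mean_y ** 2
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y
rho = float(cov_xy) / sqrt(float(var_x * var_y))

print(mean_x, mean_y, var_x, var_y, cov_xy, round(rho, 4))
```

Working from the joint pmf directly avoids forming the marginals first; the means and variances come out the same either way.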

Problem 3.

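A construction company wins two road rehabilitation projects, RH1 and RH2, which are to be executed simultaneously. The marginal distribution of the completion time (in months) of each project is given below.

$$f_{RH1}(t_1) = \begin{cases} 0.30 & \text{if } t_1 = 24 \\ 0.60 & \text{if } t_1 = 30 \\ 0.10 & \text{if } t_1 = 36 \\ 0 & \text{elsewhere} \end{cases}$$

$$f_{RH2}(t_2) = \begin{cases} 0.30 & \text{if } t_2 = 18 \\ 0.50 & \text{if } t_2 = 24 \\ 0.20 & \text{if } t_2 = 30 \\ 0 & \text{elsewhere} \end{cases}$$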

Assuming that the completion times of the projects are independent, find the probability that:

i.) the two projects will be completed at the same time,

ii.) both projects will be completed in less than 30 months, and

iii.) project RH1 takes longer to complete than project RH2.

iv.) Find the expected completion time for each project and interpret your results.

Source

unknown

Suggested solution

The joint distribution table is shown below.

RH1/RH2 18 24 30
24 0.09 0.15 0.06
30 0.18 0.30 0.12
36 0.03 0.05 0.02
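
Because the completion times are assumed independent, each cell of the joint table is the product of the two marginal probabilities. A short Python sketch of that construction follows; the dictionary names `rh1`, `rh2`, and `joint` are my own.

```python
rh1 = {24: 0.30, 30: 0.60, 36: 0.10}  # marginal of RH1 completion time (months)
rh2 = {18: 0.30, 24: 0.50, 30: 0.20}  # marginal of RH2 completion time (months)

# Under independence, P(t1, t2) = P(t1) * P(t2) for every pair of outcomes.
joint = {(t1, t2): p1 * p2 for t1, p1 in rh1.items() for t2, p2 in rh2.items()}

# Print the table with RH1 outcomes as rows and RH2 outcomes as columns.
print("RH1/RH2 " + " ".join(str(t2) for t2 in rh2))
for t1 in rh1:
    print(t1, " ".join(f"{joint[(t1, t2)]:.2f}" for t2 in rh2))
```

The nine cell probabilities should add up to 1, which is a useful sanity check on the table.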

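i.) $P(t_1 = t_2) = P(t_1 = 30, t_2 = 30) + P(t_1 = 24, t_2 = 24) = 0.12 + 0.15 = 0.270$

ii.) $P(t_1 < 30, t_2 < 30) = P(t_1 = 24, t_2 = 18) + P(t_1 = 24, t_2 = 24) = 0.09 + 0.15 = 0.240$

iii.) $P(t_1 > t_2) = P(t_1 = 24, t_2 = 18) + P(t_1 = 30, t_2 = 18) + P(t_1 = 30, t_2 = 24) + P(t_1 = 36, t_2 = 18) + P(t_1 = 36, t_2 = 24) + P(t_1 = 36, t_2 = 30) = 0.09 + 0.18 + 0.30 + 0.03 + 0.05 + 0.02 = 0.670$

iv.) Using $E(X) = \sum x f(x)$:

$$
\begin{aligned}
E(RH1) &= \sum t_1 f_{RH1}(t_1) = 24(0.30) + 30(0.60) + 36(0.10) = 28.8 \\
E(RH2) &= \sum t_2 f_{RH2}(t_2) = 18(0.30) + 24(0.50) + 30(0.20) = 23.4
\end{aligned}
$$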

Whenever the company wins a rehabilitation project categorized as RH1, the expected time to completion is about 28.8 months. For a project categorized as RH2, the expected time to completion is approximately 23.4 months.
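
The four answers can be reproduced by summing joint probabilities over the outcomes that satisfy each condition. The following Python sketch assumes independence, as the problem states; the helper `prob` and the variable names are hypothetical and only for illustration.

```python
rh1 = {24: 0.30, 30: 0.60, 36: 0.10}  # marginal of RH1 completion time (months)
rh2 = {18: 0.30, 24: 0.50, 30: 0.20}  # marginal of RH2 completion time (months)

# Joint distribution under independence: product of the marginals.
joint = {(t1, t2): p1 * p2 for t1, p1 in rh1.items() for t2, p2 in rh2.items()}


def prob(event):
    """Add up the joint probabilities of the outcomes where event(t1, t2) is true."""
    return sum(p for (t1, t2), p in joint.items() if event(t1, t2))


p_same = prob(lambda t1, t2: t1 == t2)                      # part i
p_both_under_30 = prob(lambda t1, t2: t1 < 30 and t2 < 30)  # part ii
p_rh1_longer = prob(lambda t1, t2: t1 > t2)                 # part iii

# Part iv: expected completion times from the marginals.
expected_rh1 = sum(t * p for t, p in rh1.items())
expected_rh2 = sum(t * p for t, p in rh2.items())

print(round(p_same, 3), round(p_both_under_30, 3), round(p_rh1_longer, 3))
print(round(expected_rh1, 1), round(expected_rh2, 1))
```

Writing each event as a condition on (t1, t2) makes it harder to miss a cell of the table, which is the main risk when adding the probabilities by hand.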

Dr. Abubakari S. Sumaila
Former Research Assistant

My research interests include the applications of survival analysis in Medicine, sequential decision processes, dynamics of visualizations in R and Python.