Tutorial 1: Worked Examples

Last updated on Sep 8, 2021 Question and Answers

Image credit: Sigmund

Introduction

This is a tutorial of common problems in Statistics, together with suggested solutions. No particular level is targeted, so any problem may be at undergraduate, master's or doctoral level. Questions may come from any topic in Statistics rather than following a chapter-by-chapter structure. In most cases, I did not design these questions myself. Some come from my lecture notes, from online portals where solutions to the problems are not given, from well-written Statistics books such as those by Robert V. Hogg et al. and Sheldon Ross, or from sources that are not clearly known (anonymous). Where I am certain of the original source of a question, I will do my best to cite it. If you feel a problem is not properly cited, please draw my attention to it.

Problem 1.

To study the effect of temperature on yield in a chemical process, five batches were produced at each of three temperature levels. The results are given below.

i.) Construct an analysis of variance (ANOVA) table for this problem.

ii.) Use a 0.05 level of significance to test whether the temperature level has an effect on the mean yield of the process.

Source

multiple online sources

Table 1: Temperature level versus batch
Batch	50°C	60°C	70°C
1	34	30	23
2	24	31	28
3	36	34	28
4	39	23	30
5	32	27	31

Suggested solution

Clearly, this is a one-way ANOVA problem, and it can be solved in several ways. I think it's best, IMHO, to decompose the observations into arrays (grand mean, treatment effects and residuals) and then compute the treatment and residual sums of squares from those arrays.

Let $x_{lj}$ be the observation for batch $j$ under treatment (temperature level) $l$, where $l = 1, 2, 3$ and $j = 1, \ldots, 5$. Thus, by decomposition:

$$x_{lj} = \bar{x}_{..} +\underbrace{\bar{x}_{l.} - \bar{x}_{..}}_{treatment} +\underbrace{ x_{lj} - \bar{x}_{l.}}_{residual};\: \mbox{where}$$

\begin{equation*} \begin{split} \bar{x}_{..} &= \frac{1}{15}(34 + \ldots + 32 + 30 + \ldots + 27 + 23 + \ldots +31) = 30\\
\bar{x}_{1.} &= \frac15(34 + \ldots + 32) = 33\\
\bar{x}_{2.} &= \frac15(30 + \ldots + 27) = 29\\
\bar{x}_{3.} &= \frac15(23 + \ldots +31) = 28 \end{split} \end{equation*}

Decomposing the observations in this way gives:

\begin{equation*} \begin{split} \underbrace{\begin{pmatrix} 34&24&36&39&32\\
30&31&34&23&27\\
23&28&28&30&31 \end{pmatrix}}_{observations} &= \underbrace{\begin{pmatrix} 30&30&30&30&30\\
30&30&30&30&30\\
30&30&30&30&30 \end{pmatrix}}_{mean} + \underbrace{\begin{pmatrix} 3&3&3&3&3\\
-1&-1&-1&-1&-1\\
-2&-2&-2&-2&-2 \end{pmatrix}}_{treatment}\\
&\quad + \underbrace{\begin{pmatrix} 1&-9&3&6&-1\\
1&2&5&-6&-2\\
-5&0&0&2&3 \end{pmatrix}}_{residual} \end{split} \end{equation*}

The sums of squares (SS) from the arrays above are:

\begin{equation*} \begin{split} \mbox{SS}_{obs} &= \mbox{SS}_{mean} + \mbox{SS}_{treatment} + \mbox{SS}_{residual}\\
\mbox{SS}_{cor} & = \mbox{SS}_{obs} - \mbox{SS}_{mean} \end{split} \end{equation*}

\begin{equation*} \begin{split} \mbox{SS}_{obs} &= 34^2 + 24^2 + \ldots + 30^2 + 31^2 = 13806\\
\mbox{SS}_{mean} &= 30^2 + 30^2 + \ldots + 30^2 = 13500\\
\mbox{SS}_{cor} &= 13806 - 13500 = 306\\
\mbox{SS}_{trt} &= 3^2 + 3^2 + \ldots + (-2)^2 + (-2)^2 = 70\\
\mbox{SS}_{res} &= 1^2 + (-9)^2 + \ldots + 2^2 + 3^2 = 236 \end{split} \end{equation*}
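The array decomposition and the sums of squares above can be checked numerically. The following is a minimal plain-Python sketch (no external libraries; the names `data`, `sum_sq` and so on are my own), with one row per temperature level and one column per batch, as in the arrays above:

```python
# Yields from Table 1: one row per temperature level (50°C, 60°C, 70°C), one column per batch
data = [
    [34, 24, 36, 39, 32],
    [30, 31, 34, 23, 27],
    [23, 28, 28, 30, 31],
]

n_total = sum(len(row) for row in data)
grand_mean = sum(sum(row) for row in data) / n_total   # x-bar_{..}
trt_means = [sum(row) / len(row) for row in data]       # x-bar_{l.}

# Decompose each observation: x_lj = grand mean + treatment effect + residual
mean_arr = [[grand_mean] * len(row) for row in data]
trt_arr = [[m - grand_mean] * len(row) for m, row in zip(trt_means, data)]
res_arr = [[x - m for x in row] for m, row in zip(trt_means, data)]

def sum_sq(arr):
    """Sum of the squared entries of a 2-D list."""
    return sum(v * v for row in arr for v in row)

ss_obs, ss_mean = sum_sq(data), sum_sq(mean_arr)
ss_trt, ss_res = sum_sq(trt_arr), sum_sq(res_arr)
ss_cor = ss_obs - ss_mean

print(grand_mean, trt_means)    # 30.0 [33.0, 29.0, 28.0]
print(ss_obs, ss_mean, ss_cor)  # 13806 13500.0 306.0
print(ss_trt, ss_res)           # 70.0 236.0
```

The treatment and residual arrays are orthogonal, which is why $\mbox{SS}_{cor} = \mbox{SS}_{trt} + \mbox{SS}_{res}$ holds exactly here (306 = 70 + 236).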

ANOVA table

Source	DF	Sum of squares	Mean square	F-value
Treatment	2	70	35	1.78
Error	12	236	19.667
Total	14	306

ii.)

Hypothesis formulation

\begin{equation*} \begin{split} &H_0 : \mu_1 = \mu_2 = \mu_3 \quad \mbox{(equal mean yield across temperature levels)}\\
&H_1 : \mu_i \neq \mu_j \quad \mbox{for at least one pair}\quad i \neq j \end{split} \end{equation*}

\begin{equation*} \begin{split} F_{cal} &= \frac{\mbox{MS}_{trt}}{\mbox{MS}_{res}} = \frac{35}{19.667} = 1.78\\
F_{table} &= F_{[2,\, 12;\; 1-\alpha = 0.95]} = 3.89 \end{split} \end{equation*}

Since $F_{cal} < F_{table}$, we fail to reject $H_0$ and conclude that, at the 0.05 level of significance, there is not sufficient evidence that temperature level has an effect on the mean yield of the process.
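The table value 3.89 and the p-value can also be reproduced without statistical tables. This sketch relies on a special property of the F distribution with 2 numerator degrees of freedom: its upper-tail probability has the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$, which can be inverted directly. The shortcut only holds for 2 numerator degrees of freedom (as here); in general one would use a library routine such as SciPy's `scipy.stats.f.ppf` and `scipy.stats.f.sf`. The helper names `f2_sf` and `f2_crit` are my own.

```python
# Closed-form tail of F(2, nu): P(F > f) = (1 + 2f/nu) ** (-nu/2)
# (valid only when the numerator degrees of freedom equal 2, as in this problem)
nu = 12                    # error degrees of freedom
alpha = 0.05               # significance level
f_cal = 35 / (236 / 12)    # MS_trt / MS_err from the ANOVA table

def f2_sf(f, nu):
    """Upper-tail probability P(F > f) for F with (2, nu) degrees of freedom."""
    return (1 + 2 * f / nu) ** (-nu / 2)

def f2_crit(alpha, nu):
    """Critical value c with P(F > c) = alpha, found by inverting f2_sf."""
    return (nu / 2) * (alpha ** (-2 / nu) - 1)

f_table = f2_crit(alpha, nu)
p_value = f2_sf(f_cal, nu)
print(f"F_cal = {f_cal:.2f}, F_table = {f_table:.2f}, p-value = {p_value:.3f}")
print("reject H0" if f_cal > f_table else "fail to reject H0")
# F_cal = 1.78, F_table = 3.89, p-value = 0.210
# fail to reject H0
```

A p-value of about 0.21 is well above 0.05, which agrees with the decision reached by comparing $F_{cal}$ with $F_{table}$.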

Rather than decomposing the data into arrays, you could also use the computing formulae to obtain the sums of squares and, subsequently, the ANOVA table.
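For reference, here is a minimal sketch of that formula-based route in plain Python (names are my own). It uses the usual one-way ANOVA computing formulae: with $T_{l.}$ the total of treatment $l$, $n_l$ its number of observations, $T_{..}$ the grand total and $N$ the total number of observations, $\mbox{SS}_{cor} = \sum_{l,j} x_{lj}^2 - T_{..}^2/N$ and $\mbox{SS}_{trt} = \sum_l T_{l.}^2/n_l - T_{..}^2/N$, with $\mbox{SS}_{res}$ obtained by subtraction.

```python
# Yields from Table 1, keyed by temperature level
groups = {
    "50°C": [34, 24, 36, 39, 32],
    "60°C": [30, 31, 34, 23, 27],
    "70°C": [23, 28, 28, 30, 31],
}

k = len(groups)                                    # number of treatments
N = sum(len(g) for g in groups.values())           # total number of observations
grand_total = sum(sum(g) for g in groups.values())
correction = grand_total ** 2 / N                  # T_{..}^2 / N

# Computing formulae; each T_l^2 is divided by its own n_l, so unequal group sizes also work
ss_total = sum(x * x for g in groups.values() for x in g) - correction
ss_trt = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
ss_res = ss_total - ss_trt

df_trt, df_res = k - 1, N - k
ms_trt, ms_res = ss_trt / df_trt, ss_res / df_res
f_value = ms_trt / ms_res

# Print a small ANOVA table
print(f"{'Source':<10}{'DF':>4}{'SS':>8}{'MS':>10}{'F':>7}")
print(f"{'Treatment':<10}{df_trt:>4}{ss_trt:>8.1f}{ms_trt:>10.3f}{f_value:>7.2f}")
print(f"{'Error':<10}{df_res:>4}{ss_res:>8.1f}{ms_res:>10.3f}")
print(f"{'Total':<10}{N - 1:>4}{ss_total:>8.1f}")
```

It reproduces the table above (SS of 70 and 236, F of about 1.78). If SciPy is available, `scipy.stats.f_oneway(*groups.values())` should give the same F statistic together with its p-value.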

Problem 2.

Compute the correlation coefficient for each of the following probability densities:

i.) \begin{equation*} f(x,y) = \begin{cases} \frac13 (x+y) & 0 \le x \le 1; 0\le y \le 2\\
\\
0 & \mbox{elsewhere} \end{cases} \end{equation*}

ii.) \begin{equation*} f(x,y) = \begin{cases} \frac{1}{22} (x+2y) & (1,1), (1, 3), (2, 1), (2, 3)\\
\\
0 & \mbox{elsewhere} \end{cases} \end{equation*}

Source

Hogg, Craig et al.

Suggested solution

Mathematically, the correlation coefficient $\rho$ is defined as $\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X\sigma_Y}$, where $Cov(X,Y) = E(XY) - E(X)E(Y)$. To find it, we therefore need the marginal densities, $f_X(x),\: f_Y(y)$, as well as $E(X),\: E(Y),\: E(XY),\: \mbox{var}(X),\: \mbox{var}(Y)$.

i.)

\begin{equation*} f(x,y) = \begin{cases} \frac13 (x+y) & 0 \le x \le 1; 0\le y \le 2\\
\\
0 & \mbox{elsewhere} \end{cases} \end{equation*}

The marginals are: \begin{equation*} \begin{split} f_X(x) & = \int_0^2 f(x,y)dy = \int_0^2 \frac13(x+y)dy\\
& = \frac 13(xy + \frac{y^2}{2})\Big\rvert_0^2\\
& = \frac23(x+1) \end{split} \end{equation*}

\begin{equation*} \begin{split} f_Y(y) & = \int_0^1 f(x,y)dx = \int_0^1 \frac13(x+y)dx\\
& = \frac 13(\frac{x^2}{2}+ xy)\Big\rvert_0^1\\
& = \frac16(1 +2y) \end{split} \end{equation*}

The expectation and variance of $X$: \begin{equation*} \begin{split} E(X) &= \int_0^1 xf(x)dx = \int_0^1 x\cdot\frac 23(x+1)dx = \frac 59\\
E(X^2) & = \int_0^1 x^2f(x)dx = \int_0^1 x^2 \cdot \frac 23(x+1)dx = \frac{7}{18}\\
Var(X) & = E(X^2) - \left(E(X)\right)^2 = \frac{7}{18} - \left(\frac59\right)^2 = \frac{13}{162} \end{split} \end{equation*}

The expectation and variance of $Y$: \begin{equation*} \begin{split} E(Y) &= \int_0^2 yf(y)dy = \int_0^2 y\cdot\frac 16\left(1+2y\right)dy = \frac{11}{9}\\
E(Y^2) & = \int_0^2 y^2f(y)dy = \int_0^2 y^2\cdot\frac 16\left(1+2y\right)dy = \frac{16}{9}\\
Var(Y) & = E(Y^2) - \left(E(Y)\right)^2 = \frac{16}{9} - \left(\frac{11}{9}\right)^2 = \frac{23}{81} \end{split} \end{equation*}

The covariance of $X$ and $Y$: \begin{equation*} \begin{split} E(XY) & = \int_0^1\int_0^2 xyf(x,y)dydx = \int_0^1\int_0^2 xy\cdot\frac13(x+y)dydx\\
&= \int_0^1\int_0^2\frac13(x^2y+xy^2)dydx = \int_0^1\frac13\left(\frac{x^2y^2}{2} + \frac{xy^3}{3}\right)\Bigg\rvert_0^2dx\\
& = \int_0^1 \frac13 (2x^2 + \frac{8x}{3})dx = \frac23\\
Cov(X,Y) & = E(XY) - E(X)E(Y)\\
& = \frac23 - \frac59 \cdot \frac{11}{9} = -\frac{1}{81} \end{split} \end{equation*}

Correlation coefficient: \begin{equation*} \begin{split} \therefore \rho_{X,Y}& = \frac{Cov(X,Y)}{\sigma_X\sigma_Y}\\
&= \frac{-1/81}{\sqrt{\frac{13}{162}\cdot \frac{23}{81}}} = \mathbf{-0.0818} \end{split} \end{equation*}
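The moments in part i.) can be verified exactly with rational arithmetic. A sketch, assuming only Python's standard `fractions` and `math` modules (the helper name `moment` is my own): because the density is a polynomial on a rectangle $[0,a]\times[0,b]$, every moment reduces to the closed form $E(X^pY^q) = \frac13\left[\frac{a^{p+2}}{p+2}\cdot\frac{b^{q+1}}{q+1} + \frac{a^{p+1}}{p+1}\cdot\frac{b^{q+2}}{q+2}\right]$, here with $a = 1$ and $b = 2$.

```python
from fractions import Fraction
import math

A, B = 1, 2  # support: 0 <= x <= A, 0 <= y <= B

def moment(p, q):
    """E(X^p Y^q) for f(x, y) = (x + y)/3 on [0, A] x [0, B], computed exactly."""
    # contribution of the x term in (x + y): integral of x^(p+1) y^q
    term_x = Fraction(A ** (p + 2), p + 2) * Fraction(B ** (q + 1), q + 1)
    # contribution of the y term in (x + y): integral of x^p y^(q+1)
    term_y = Fraction(A ** (p + 1), p + 1) * Fraction(B ** (q + 2), q + 2)
    return Fraction(1, 3) * (term_x + term_y)

total = moment(0, 0)                   # should be 1 for a valid density
ex, ey, exy = moment(1, 0), moment(0, 1), moment(1, 1)
var_x = moment(2, 0) - ex ** 2
var_y = moment(0, 2) - ey ** 2
cov = exy - ex * ey
rho = cov / math.sqrt(var_x * var_y)

print(total, ex, ey, exy)   # 1 5/9 11/9 2/3
print(var_x, var_y, cov)    # 13/162 23/81 -1/81
print(round(rho, 4))        # -0.0818
```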

ii.) \begin{equation*} f(x,y) = \begin{cases} \frac{1}{22} (x+2y) & (1,1), (1, 3), (2, 1), (2, 3)\\
\\
0 & \mbox{elsewhere} \end{cases} \end{equation*}

Evaluating $f(x,y) = \frac{1}{22}(x+2y)$ at the four support points gives the joint distribution:

X/Y	1	3
1	3/22	7/22
2	2/11	4/11

The marginals are: \begin{equation*} \begin{split} f_X(x) & = \begin{cases} \frac{5}{11} & \mbox{if}\: x = 1\\
\frac{6}{11} & \mbox{if}\: x = 2\\
0 & \mbox{elsewhere} \end{cases} \end{split} \end{equation*}

\begin{equation*} \begin{split} f_Y(y) & = \begin{cases} \frac{7}{22} & \mbox{if}\: y = 1\\
\frac{15}{22} & \mbox{if}\: y = 3\\
0 & \mbox{elsewhere} \end{cases} \end{split} \end{equation*}

The expectation and variance of $X$: \begin{equation*} \begin{split} E(X) & = \sum xf(x) = (1)\cdot\frac{5}{11} + (2)\cdot\frac{6}{11} = \frac{17}{11}\\
E(X^2) & = \sum x^2f(x) = (1^2)\cdot\frac{5}{11} + (2^2)\cdot\frac{6}{11} = \frac{29}{11}\\
Var(X) & = E(X^2) - \left(E(X)\right)^2 = \frac{29}{11} - \frac{289}{121} = \frac{30}{121} \end{split} \end{equation*}

The expectation and variance of $Y$: \begin{equation*} \begin{split} E(Y) & = \sum yf(y) = (1)\cdot\frac{7}{22} + (3)\cdot\frac{15}{22} = \frac{52}{22}\\
E(Y^2) & = \sum y^2f(y) = (1^2)\cdot\frac{7}{22} + (3^2)\cdot\frac{15}{22} = \frac{142}{22}\\
Var(Y) & = E(Y^2) - \left(E(Y)\right)^2 = \frac{142}{22} - \frac{676}{121} = \frac{105}{121}\\
\end{split} \end{equation*}

The covariance of $X$ and $Y$: \begin{equation*} \begin{split} E(XY) & = \sum_x\sum_y xy\,f(x,y)\\
&= (1)(1)\cdot\frac{3}{22} + (1)(3)\cdot\frac{7}{22} + (2)(1)\cdot\frac{2}{11} + (2)(3)\cdot\frac{4}{11} = \frac{40}{11}\\
Cov(X,Y) & = E(XY) - E(X)E(Y) = \frac{40}{11} - \frac{52}{22}\cdot \frac{17}{11} = -\frac{2}{121} \end{split} \end{equation*}

Correlation coefficient: \begin{equation*} \begin{split} \therefore \rho_{X,Y}& = \frac{Cov(X,Y)}{\sigma_X\sigma_Y}\\
& = \frac{-2/121}{\sqrt{\frac{30}{121}\cdot \frac{105}{121}}}\\
& = \mathbf{-0.0356} \end{split} \end{equation*}
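Part ii.) can be checked the same way. This sketch, again using only the standard library (`fractions`, `math`; the helper `expect` is my own), computes every expectation directly from the joint pmf, so the marginals are not needed explicitly:

```python
from fractions import Fraction
import math

# Joint pmf f(x, y) = (x + 2y)/22 on the four support points
support = [(1, 1), (1, 3), (2, 1), (2, 3)]
pmf = {(x, y): Fraction(x + 2 * y, 22) for x, y in support}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov = expect(lambda x, y: x * y) - ex * ey
rho = cov / math.sqrt(var_x * var_y)

print(sum(pmf.values()))     # 1  (valid pmf)
print(ex, ey, var_x, var_y)  # 17/11 26/11 30/121 105/121
print(cov, round(rho, 4))    # -2/121 -0.0356
```

Note that $E(Y) = \frac{52}{22}$ is printed in lowest terms as $\frac{26}{11}$.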

Problem 3.

A construction company wins two road rehabilitation projects, $RH1$ and $RH2$, which are to be executed simultaneously. The marginal distributions of their completion times (in months) are: \begin{equation*} f_{RH1}(t_1)= \begin{cases} 0.30; &\mbox{if}\quad t_1 = 24\\
0.60; &\mbox{if}\quad t_1 = 30\\
0.10 ; &\mbox{if}\quad t_1 = 36\\
0; &\mbox{elsewhere} \end{cases} \end{equation*}

\begin{equation*} f_{RH2}(t_2)= \begin{cases} 0.30; &\mbox{if}\quad t_2 = 18\\
0.50; &\mbox{if}\quad t_2 = 24\\
0.20 ; &\mbox{if}\quad t_2 = 30\\
0; &\mbox{elsewhere} \end{cases} \end{equation*}

Assuming that the completion times of the projects are independent, find the probability that:

i.) the two projects will be completed at the same time,

ii.) both projects will be completed in less than 30 months, and

iii.) project RH1 takes longer to complete than project RH2.

iv.) Find the expected completion time for each project and interpret your results.

Source

unknown

Suggested solution

Since the completion times are assumed independent, the joint probability of each pair of completion times is the product of the marginals, $f(t_1, t_2) = f_{RH1}(t_1)\, f_{RH2}(t_2)$; for example, $f(24, 18) = 0.30 \times 0.30 = 0.09$. The joint distribution (rows: RH1 completion time; columns: RH2 completion time; both in months) is:

RH1/RH2	18	24	30
24	0.09	0.15	0.06
30	0.18	0.30	0.12
36	0.03	0.05	0.02
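The post does not show the worked answers to parts i.) to iv.), so the following is only a sketch of how they could be computed from the joint table, not the author's own solution. It assumes independence, as the problem states, and uses exact fractions from Python's standard library; the helper `prob` is my own:

```python
from fractions import Fraction

# Marginal pmfs of completion time (months), taken from the problem statement
rh1 = {24: Fraction("0.30"), 30: Fraction("0.60"), 36: Fraction("0.10")}
rh2 = {18: Fraction("0.30"), 24: Fraction("0.50"), 30: Fraction("0.20")}

# Under independence, the joint pmf is the product of the marginals
joint = {(t1, t2): p1 * p2 for t1, p1 in rh1.items() for t2, p2 in rh2.items()}

def prob(event):
    """Sum the joint probabilities of all (t1, t2) pairs satisfying `event`."""
    return sum(p for (t1, t2), p in joint.items() if event(t1, t2))

p_same = prob(lambda t1, t2: t1 == t2)                   # i)   same completion time
p_both_lt30 = prob(lambda t1, t2: t1 < 30 and t2 < 30)   # ii)  both under 30 months
p_rh1_longer = prob(lambda t1, t2: t1 > t2)              # iii) RH1 takes longer

# iv) expected completion time of each project, from its marginal
e_rh1 = sum(t * p for t, p in rh1.items())
e_rh2 = sum(t * p for t, p in rh2.items())

for label, value in [("P(same time)", p_same), ("P(both < 30)", p_both_lt30),
                     ("P(RH1 > RH2)", p_rh1_longer), ("E[RH1]", e_rh1), ("E[RH2]", e_rh2)]:
    print(f"{label}: {float(value)}")
```

Under these assumptions the probabilities come out as 0.27, 0.24 and 0.67, and the expected completion times are 28.8 months for RH1 and 23.4 months for RH2; that is, on average RH2 would be expected to finish about 5.4 months before RH1.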

Abubakari Sumaila Salpawuni

PhD candidate (Biostatistics)