Let $X_1,\ldots,X_n$ be a random sample of size $n$ from a population and let $T(x_1,\ldots,x_n)$ be a real-valued or vector-valued function whose domain includes the sample space of $(X_1,\ldots,X_n).$ Then the random variable or random vector $Y=T(X_1,\ldots,X_n)$ is called a statistic; the only restriction is that a statistic cannot be a function of any unknown parameter. The probability distribution of a statistic $Y$ is called the sampling distribution of $Y.$
The sample mean is the arithmetic average of the values in a random sample. It is usually denoted by \[\overline{X} = \frac{X_1 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^n X_i.\]
The sample variance is the statistic defined by \[S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i-\overline{X})^2.\] The sample standard deviation is the statistic defined by $S=\sqrt{S^2}.$
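For instance, for the observed sample $1, 2, 3, 4, 5$ (so $n=5$), \[\bar{x} = \frac{1+2+3+4+5}{5} = 3, \qquad s^2 = \frac{(-2)^2+(-1)^2+0^2+1^2+2^2}{4} = \frac{10}{4} = 2.5,\] and $s = \sqrt{2.5} \approx 1.58.$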
Theorem 5.2.1
Let $x_1,\ldots,x_n$ be any numbers and $\bar{x}=(x_1+\cdots+x_n)/n.$ Then
a) $\min_a \sum_{i=1}^n (x_i-a)^2 = \sum_{i=1}^n (x_i-\bar{x})^2,$
b) $(n-1)s^2 = \sum_{i=1}^n (x_i-\bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2.$
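Both parts follow from a single add-and-subtract step: for any $a,$ \[\sum_{i=1}^n (x_i-a)^2 = \sum_{i=1}^n \bigl((x_i-\bar{x})+(\bar{x}-a)\bigr)^2 = \sum_{i=1}^n (x_i-\bar{x})^2 + n(\bar{x}-a)^2,\] because the cross term $2(\bar{x}-a)\sum_{i=1}^n (x_i-\bar{x})$ equals $0.$ The right-hand side is minimized at $a=\bar{x},$ which gives part (a), and setting $a=0$ gives $\sum_{i=1}^n x_i^2 = \sum_{i=1}^n (x_i-\bar{x})^2 + n\bar{x}^2,$ which rearranges to part (b).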
Lemma 5.2.2 Let $X_1,\ldots,X_n$ be a random sample from a population and let $g(x)$ be a function such that $Eg(X_1)$ and $Var\, g(X_1)$ exist. Then \[E\left(\sum_{i=1}^n g(X_i)\right) = n(Eg(X_1))\] and \[Var\left(\sum_{i=1}^n g(X_i)\right) = n(Var\, g(X_1)).\]
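The lemma is a direct consequence of the iid assumption: the $X_i$ are identically distributed, so $Eg(X_i) = Eg(X_1)$ for every $i$ and \[E\left(\sum_{i=1}^n g(X_i)\right) = \sum_{i=1}^n Eg(X_i) = n(Eg(X_1)),\] while independence makes all covariance terms vanish, \[Var\left(\sum_{i=1}^n g(X_i)\right) = \sum_{i=1}^n Var\, g(X_i) + 2\sum_{i<j} Cov(g(X_i),g(X_j)) = n(Var\, g(X_1)).\]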
Theorem 5.2.3
Let $X_1,\ldots,X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty.$ Then
a) $E\overline{X} = \mu,$
b) $Var \, \overline{X} = \frac{\sigma^2}{n},$
c) $ES^2 = \sigma^2.$
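A sketch of the proof: parts (a) and (b) follow from Lemma 5.2.2 with $g(x)=x,$ since \[E\overline{X} = \frac{1}{n}E\left(\sum_{i=1}^n X_i\right) = \frac{1}{n}(n\mu) = \mu, \qquad Var\,\overline{X} = \frac{1}{n^2}Var\left(\sum_{i=1}^n X_i\right) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.\] For part (c), Theorem 5.2.1(b) together with $EX_1^2 = \sigma^2 + \mu^2$ and $E\overline{X}^2 = Var\,\overline{X} + (E\overline{X})^2 = \sigma^2/n + \mu^2$ gives \[ES^2 = \frac{1}{n-1}E\left(\sum_{i=1}^n X_i^2 - n\overline{X}^2\right) = \frac{1}{n-1}\left(n(\sigma^2+\mu^2) - n\left(\frac{\sigma^2}{n}+\mu^2\right)\right) = \sigma^2.\]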
Theorem 5.2.4 Let $X_1,\ldots,X_n$ be a random sample from a population with mgf $M_X(t).$ Then the mgf of the sample mean is \[M_{\overline{X}}(t) = [M_X(t/n)]^n.\]
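For example, if the population is $N(\mu,\sigma^2),$ then $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$ and \[M_{\overline{X}}(t) = \left[e^{\mu(t/n) + \sigma^2(t/n)^2/2}\right]^n = e^{\mu t + (\sigma^2/n)t^2/2},\] which is the mgf of a $N(\mu,\sigma^2/n)$ distribution; hence $\overline{X} \sim N(\mu,\sigma^2/n).$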
Theorem 5.2.5 If $X$ and $Y$ are independent continuous random variables with pdfs $f_X(x)$ and $f_Y(y),$ then the pdf of $Z=X+Y$ is \[f_Z(z) = \int_{-\infty}^{\infty} f_X(w)f_Y(z-w)\,dw.\]
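As an illustration, let $X$ and $Y$ be independent uniform$(0,1)$ random variables, so $f_X(w)=1$ for $0<w<1$ and $f_Y(z-w)=1$ for $z-1<w<z.$ Then \[f_Z(z) = \int_{\max(0,z-1)}^{\min(1,z)} 1\,dw = \begin{cases} z, & 0 < z \le 1,\\ 2-z, & 1 < z < 2,\end{cases}\] the triangular pdf on $(0,2).$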
Theorem 5.2.6 Suppose $X_1,\ldots,X_n$ is a random sample from a pdf or pmf $f(x|\theta),$ where \[f(x|\theta) = h(x)c(\theta)\exp\left(\sum_{i=1}^k w_i(\theta)t_i(x)\right)\] is a member of an exponential family. Define statistics $T_1,\ldots,T_k$ by \[T_i(X_1,\ldots,X_n) = \sum_{j=1}^n t_i(X_j), \:\: i = 1,\ldots,k.\] If the set $\{(w_1(\theta),w_2(\theta),\ldots,w_k(\theta)),\, \theta \in \Theta\}$ contains an open subset of $\mathbb{R}^k,$ then the distribution of $(T_1,\ldots,T_k)$ is an exponential family of the form \[f_T(u_1,\ldots,u_k | \theta) = H(u_1,\ldots,u_k)[c(\theta)]^n\exp\left(\sum_{i=1}^k w_i(\theta)u_i\right).\]
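For example, a Bernoulli$(p)$ pmf can be written \[f(x|p) = p^x(1-p)^{1-x} = (1-p)\exp\left(x\log\frac{p}{1-p}\right), \quad x \in \{0,1\},\] so $h(x)=1,$ $c(p)=1-p,$ $w_1(p)=\log\frac{p}{1-p},$ and $t_1(x)=x.$ The theorem then gives \[f_{T_1}(u|p) = H(u)(1-p)^n\exp\left(u\log\frac{p}{1-p}\right) = H(u)\,p^u(1-p)^{n-u},\] and with $H(u)=\binom{n}{u}$ this is the binomial$(n,p)$ pmf of $T_1 = \sum_{j=1}^n X_j.$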