Sample variance

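Let X be a random variable with finite second moment, and let Y be an independent copy of X, that is, independent of X and identically distributed. We redefine the variance of X to be the expected energy between X and Y, where the energy of two values is half their squared difference: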

\displaystyle V(X)=E\left(\frac{(X-Y)^2}{2}\right).

This agrees with the usual definition of variance since

\displaystyle E\left(\frac{(X-Y)^2}{2}\right)=E(X^2)-E(X)^2=E((X-E(X))^2).
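To spell out the first equality: expand the square and use that X and Y are independent with the same distribution, so that E(XY)=E(X)E(Y)=E(X)^2 and E(Y^2)=E(X^2). Then

\displaystyle E\left(\frac{(X-Y)^2}{2}\right)=\frac{E(X^2)-2E(XY)+E(Y^2)}{2}=\frac{2E(X^2)-2E(X)^2}{2}=E(X^2)-E(X)^2.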

Given a sample \{x_1,\dots,x_n\}, each pair \{x_i,x_j\}, i\neq j, has a mutual energy (x_i-x_j)^2/2. We can then estimate the variance by the mean of the pairwise energies, which we also take as our redefinition of the sample variance:

\displaystyle s^2=\frac{2}{n(n-1)}\sum_{i<j}\frac{(x_i-x_j)^2}{2}.

This agrees with the usual definition of sample variance since

\displaystyle \frac{2}{n(n-1)}\sum_{i<j}\frac{(x_i-x_j)^2}{2}=\frac{1}{n-1}\left(\sum_{i=1}^nx_i^2-n\bar{x}^2\right)=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2

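and explains why we divide by n-1 in the final expression: the mean is taken over the n(n-1)/2 pairs, and the sum over pairs contributes a factor of n, since \sum_{i<j}(x_i-x_j)^2=n\sum_{i=1}^nx_i^2-n^2\bar{x}^2, leaving n-1 in the denominator.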
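As a quick numerical sanity check of the identity, here is a minimal Python sketch that computes the mean of the pairwise energies directly and compares it with the standard library's `statistics.variance`, which divides by n-1. The function name `pairwise_energy_variance` and the sample values are made up for illustration.

```python
import math
from itertools import combinations
from statistics import variance


def pairwise_energy_variance(xs):
    """Mean of (x_i - x_j)^2 / 2 over all unordered pairs i < j."""
    xs = list(xs)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    # Sum the energy of every unordered pair exactly once.
    total = sum((a - b) ** 2 / 2 for a, b in combinations(xs, 2))
    # There are n(n-1)/2 unordered pairs to average over.
    return total / (n * (n - 1) / 2)


# Illustrative data only.
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
print(pairwise_energy_variance(sample))
print(variance(sample))
print(math.isclose(pairwise_energy_variance(sample), variance(sample)))
```

The pairwise form takes on the order of n^2 operations, so it is useful as a check or as a definition rather than as the way to compute s^2 for a large sample.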
