Summary:
A collection of data (hopefully randomly sampled) is usually called a sample or
a population (of data). From that sample/population one can construct the basic
sample/population statistics:
The Sample/Population Mean
Numerical measure of the average or most probable value in some distributions.
Can be measured for any distribution. Knowing the mean value alone for some
sample is not very meaningful.
The Sample/Population Distribution Plot of the frequency of occurrence of ranges of data values in the sample. The
distribution needs to be represented by a reasonable number of data intervals
(counting in bins).
The
Sample Dispersion or standard deviation Numerical measure of the range of the data about the
mean value. Defined such that +/- 1 dispersion
unit contains 68% of the sample, +/- 2 dispersion units contains 95% and +/- 3
dispersion units contains 99.7%. This is schematically shown below:
In general, we map dispersion/standard deviation units on to
probabilities: http://zebu.uoregon.edu/2003/es202/ptable.html.
For instance:
- The probability that some event will be greater
than 0 dispersion units above the mean is 50%.
- The probability that some event will be greater
than 1 dispersion unit above the mean is 15%.
-
The probability that some event will be greater
than 2 dispersion units above the mean is 2%.
- The probability that some event will be greater
than 3 dispersion units above the mean is 0.1% (1 in 1000).
The above is an approximation but its fine. The more exact
values are shown below:
For our purposes, risk assessment of natural hazards, we only care about order of magnitude probabilities; i.e. 1 in 10, 1 in 100, 1 in 1000, etc.
The
calculation of dispersion (standard deviation) in a distribution is very important because it
represents a uniform way to determine probabilities and therefore to determine
if some event in the data is expected (i.e., probable) or is significantly
different than expected (i.e., improbable).
Sampling
|