In many ways, measures of central tendency are less useful in
statistical analysis than measures of dispersion of values around the central
tendency. The dispersion of values within variables is especially important in
social and political research because:
- Dispersion or "variation" in observations is what we seek to explain.
- Researchers want to know WHY some cases lie above average and others below average for a given variable:
  o TURNOUT in voting: why do some states show higher rates than others?
  o CRIMES in cities: why are there differences in crime rates?
  o CIVIL STRIFE among countries: what accounts for differing amounts?
- Much of statistical explanation aims at explaining DIFFERENCES in observations, also known as VARIATION, or the more technical term, VARIANCE.
If everything were the same, we would have no need of statistics. But
people's heights, ages, etc., do vary. We often need to measure the extent to
which scores in a dataset differ from each other. Such a measure is called the
dispersion of a distribution. Some measures of dispersion are:
- Range
- Percentile Range
- Inter-quartile Range
- Average deviation
- Standard deviation and variance
- Concentration ratio
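As a rough illustration of the simpler measures in this list, here is a short Python sketch; the list of scores is purely hypothetical, and the computations follow the usual textbook definitions of the range, inter-quartile range, and average deviation.

```python
import statistics

# Hypothetical scores, used only for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: distance between the largest and smallest score
score_range = max(scores) - min(scores)

# Inter-quartile range: spread of the middle 50% of the scores
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Average deviation: mean absolute difference from the mean
mean = statistics.mean(scores)
avg_dev = sum(abs(x - mean) for x in scores) / len(scores)

print(score_range, iqr, avg_dev)
```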
Variance and Standard Deviation
Variance is the average squared difference of each score from the mean score
of the distribution. Standard deviation is the square root of the variance. In
calculating the variance of data points, we square the difference between each
point and the mean because if we summed the differences directly, the result would
always be zero.
For example, three friends work on campus and earn $5.50, $7.50, and
$8.00 per hour, respectively. The mean of these values is $(5.50 + 7.50 + 8.00)/3 =
$7.00 per hour. If we summed the differences of each wage from the mean, we would
get:
$(5.50 - 7.00) + $(7.50 - 7.00) + $(8.00 - 7.00)
= (-1.50) + (0.50) + (1.00)
= 0.00
Instead, we square the differences: 2.25 + 0.25 + 1.00 = 3.50. Dividing this
sum of squared differences by the number of scores gives the variance,
3.50/3 ≈ 1.17. This figure is a measure of dispersion in the set of scores.
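A minimal Python sketch of the same arithmetic, using the three hourly wages from the example (the variable names are just illustrative):

```python
wages = [5.50, 7.50, 8.00]
mean = sum(wages) / len(wages)                      # 7.00

# The raw differences from the mean cancel out to zero
raw_sum = sum(w - mean for w in wages)              # 0.00

# The squared differences do not cancel
squared_sum = sum((w - mean) ** 2 for w in wages)   # 2.25 + 0.25 + 1.00 = 3.50

# Variance (average squared difference) and standard deviation
variance = squared_sum / len(wages)                 # about 1.17
std_dev = variance ** 0.5                           # about 1.08

print(mean, raw_sum, squared_sum, variance, std_dev)
```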
The mean is the value that minimizes the sum of squared differences. In
other words, if we used any number other than the mean as the value from which
each score is subtracted, the resulting sum of squared differences would be
greater.
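This property is easy to check numerically. The sketch below (again using the wage data from the example, with an arbitrary grid of candidate values) confirms that the sum of squared differences is smallest when the differences are taken from the mean:

```python
wages = [5.50, 7.50, 8.00]
mean = sum(wages) / len(wages)                      # 7.00

def sum_sq_diff(center, data):
    """Sum of squared differences of each score from `center`."""
    return sum((x - center) ** 2 for x in data)

# Try several candidate centers around the data
for center in [6.00, 6.50, 7.00, 7.50, 8.00]:
    print(center, sum_sq_diff(center, wages))

# No candidate beats the mean
assert all(sum_sq_diff(mean, wages) <= sum_sq_diff(c, wages)
           for c in [6.00, 6.50, 7.00, 7.50, 8.00])
```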
The standard deviation is simply the square root of the variance. In
some sense, taking the square root of the variance "undoes" the
squaring of the differences that we did when we calculated the variance.
Variance and standard deviation of a population are designated by σ² and σ,
respectively. Variance and standard deviation of a sample are designated by s²
and s, respectively.
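In Python's standard library these two versions correspond to separate functions; note that the sample versions divide the sum of squared differences by N - 1 rather than N. The numbers below are the wage data from the example:

```python
import statistics

wages = [5.50, 7.50, 8.00]

# Population variance and standard deviation (divide by N): sigma squared and sigma
pop_var = statistics.pvariance(wages)   # about 1.17
pop_std = statistics.pstdev(wages)      # about 1.08

# Sample variance and standard deviation (divide by N - 1): s squared and s
samp_var = statistics.variance(wages)   # 1.75
samp_std = statistics.stdev(wages)      # about 1.32

print(pop_var, pop_std, samp_var, samp_std)
```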
The standard deviation (σ or s) and variance (σ² or s²) are more complete
measures of dispersion that take into account every score in a distribution.
The other measures of dispersion we have discussed are based on considerably
less information. However, because the variance relies on the squared differences
of scores from the mean, a single outlier has a greater impact on the size of the
variance than does a single score near the mean.
Some statisticians view this property as a shortcoming of variance as a
measure of dispersion, especially when there is reason to doubt the reliability
of some of the extreme scores.
The standard deviation and variance are the most commonly used measures
of dispersion in the social sciences because:
- Both take into account the precise difference between each score and the mean. Consequently, these measures are based on a maximum amount of information.
- The standard deviation is the baseline for defining the concept of a standardized score or "z-score" (see the brief sketch after this list).
- Variance in a set of scores on some dependent variable is a baseline for measuring the correlation between two or more variables (the degree to which they are related).
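To illustrate the z-score idea mentioned above: a standardized score expresses each observation as the number of standard deviations it lies above or below the mean. A short Python sketch using the wage data from the earlier example (variable names are illustrative):

```python
import statistics

wages = [5.50, 7.50, 8.00]
mean = statistics.mean(wages)
std = statistics.pstdev(wages)          # population standard deviation, about 1.08

# z-score: how many standard deviations each score lies from the mean
z_scores = [(w - mean) / std for w in wages]
print(z_scores)                         # roughly [-1.39, 0.46, 0.93]
```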