24 March 2008

Statistics - what is "average"?

As part of your Unit of Inquiry, you've collected a lot of numbers and put them in a spreadsheet, using Excel.

Now you want to "do the statistics" on your findings. Read on to find out about the "measures of central tendency" for each year group: mean, median and mode. All of these have to do with finding the middle ground among a set of numbers, or the "average".




(This video is reposted from YouTube, so that we can see it in class. It was created for Nutshell Math)

  • Mean

The mean is the sum of all the numbers in a dataset divided by the count. The formula for computing the mean is easy:

mean = (sum of all measurements) / (number of measurements)

This is illustrated with this set of eight numbers:

8,5,3,5,10,7,6,4

These numbers add up to 48, so the mean is:

48/8 = 6

Excel provides a simple function for computing the mean:

=average(RANGE)

In Excel, the =average(RANGE) function ignores empty cells, i.e. cells that contain no data do not contribute anything to the computation of the mean. Cells that contain a zero do count towards the average.
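
If you want to check the arithmetic outside Excel, here is a minimal sketch in Python (not part of the original lesson). The names scores and cells are made up for illustration, and None stands in for an empty cell. The second part only imitates the "ignore empty cells, count zeros" rule described above; it is not Excel's own code.

```python
# The eight numbers from the example above.
scores = [8, 5, 3, 5, 10, 7, 6, 4]

total = sum(scores)    # add all the numbers together
count = len(scores)    # how many numbers there are
mean = total / count   # sum divided by count

print(total, count, mean)

# Hypothetical column with one empty cell (None) and one zero.
cells = [4, None, 0, 8]

# Drop the empty cell, but keep the zero: zeros still count.
filled = [c for c in cells if c is not None]
mean_filled = sum(filled) / len(filled)

print(mean_filled)
```

The zero stays in the list, so the second mean is divided by 3, not by 2.
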
  • Median

The median is the value in the middle. (Imagine the line of bushes down the center of a highway. That's called the "median strip".)

Finding the median is a two-stage process. The example uses the same dataset that was used for the mean.

Step 1 - Sort the numbers into ascending order

3,4,5,5,6,7,8,10

Step 2 - Pick the value in the middle

In this case the dataset has an even number of values. The middle numbers are 5 and 6. Take the mean of these to get the median:

(5+6)/2 = 5.5

If the dataset has an odd number of values, the process is simpler, as there will be a single middle value. This is illustrated by removing the largest value from the dataset, i.e.

3,4,5,5,6,7,8

The middle value, and thus the median, is 5.

The median is usually easy to compute when the data is sorted and there are not too many numbers. For unsorted numbers, or for lots of numbers, the median becomes quite tedious, mainly because you have to sort the data first.

Excel has a built-in function

=median(RANGE)
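
The two steps above (sort, then pick the middle) can be sketched in Python like this. The function name median and the list names are chosen just for this illustration, and the sketch assumes the list is not empty.

```python
def median(values):
    """Return the middle value of a non-empty list of numbers."""
    ordered = sorted(values)   # Step 1: sort into ascending order
    n = len(ordered)
    middle = n // 2            # position of the (upper) middle value

    if n % 2 == 1:
        # Odd count: there is exactly one middle value.
        return ordered[middle]

    # Even count: take the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


even_set = [8, 5, 3, 5, 10, 7, 6, 4]   # the eight numbers from the example
odd_set = [3, 4, 5, 5, 6, 7, 8]        # the same set without the 10

median_even = median(even_set)
median_odd = median(odd_set)

print(median_even, median_odd)
```

The sorting happens inside the function, so the numbers can be given in any order.
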
  • Mode

The mode is the observation that occurs most often. It is not always unique, which is one reason it is used less often than the mean or median, but it has the advantage that it applies to categorical as well as numerical variables. As with the median, the mode is easy to find if the dataset is small and sorted:

Example: Scores from a test were: 1, 2, 2, 4, 7, 7, 7, 8, 9. What is the mode?

The mode is 7, because that number occurs more often than any other number.

Example: Scores from a test were: 1, 2, 2, 2, 3, 7, 7, 7, 8, 9. What is the mode?

This time there are two modes, 2 and 7, because both numbers occur three times, more than any of the other numbers. Variables that are distributed this way are sometimes called bimodal variables.

For data that consists of lots of numbers, and/or data that is not sorted, the mode, like the median, is cumbersome to compute by hand.

Excel provides a mode formula:

=mode(RANGE)

However, if the cell range contains several numbers that share the highest frequency (i.e. a bimodal variable as in the second example above) then the Excel =mode(RANGE) function returns only one of them: the one that appears first in the range, which for sorted data is the smallest.

If all values occur exactly once, the Excel mode function returns #N/A, an error value meaning "not available".
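
To see every mode at once, rather than only one of them, you can count the values yourself. Here is a minimal sketch in Python. The function name modes is made up for this illustration. Returning an empty list when nothing repeats is a choice made here to mirror the #N/A case above, not something Excel does.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in sorted order."""
    counts = Counter(values)         # how often each value occurs
    highest = max(counts.values())   # the largest frequency

    if highest == 1:
        # Nothing repeats, so there is no meaningful mode.
        return []

    return sorted(v for v, c in counts.items() if c == highest)


first_test = [1, 2, 2, 4, 7, 7, 7, 8, 9]
second_test = [1, 2, 2, 2, 3, 7, 7, 7, 8, 9]

print(modes(first_test))     # one mode
print(modes(second_test))    # two modes (bimodal)
print(modes([1, 2, 3]))      # every value occurs once

# The mode also works for categories, not just numbers.
print(modes(["red", "blue", "red", "green"]))
```
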

Resources for this post were:
Measures of Central Tendency
Basic Math — Mean, Median, Mode, Range
Mean, Median and Mode

