1.5 Distribution

In statistics we examine one attribute at a time, and the measurements of that attribute are taken separately for each observation (e.g. individual). Each measurement differs from others to a lesser or greater extent, and the chain of measurements thus produces a distribution for the attribute concerned. The simplest distribution is obtained by using the tally method.

Population by age (5 year classification), 2006

Source: Statistics Finland, Population 2007

One way of describing a distribution on a scale is to use a histogram. In a histogram, cases from the population concerned are stacked upon one another in columns representing the category measured. The figure thus obtained describes how many cases fall into each category.

The peak of the distribution refers to the point with the highest number of observations. The peak is often near the mean of the distribution.

The shape of the distribution varies

Distributions take on many different shapes. The best-known distribution is the normal distribution, or Gauss curve, where most observations concentrate around the mean to create a symmetrical pattern. This means that there are roughly the same number of observations deviating in either direction from the middle of the distribution. The further we move in the positive or negative direction from the mean, the smaller the number of observations. In a normal distribution, about 68% of the observations are within one standard deviation of the mean and about 95% are within two standard deviations of the mean.
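The shares of observations that lie close to the mean of a normal distribution can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name share_within is an illustrative choice, not something defined in this course.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    # The result is the same for any mean and standard deviation,
    # so the standard normal distribution (mean 0, sd 1) is used here.
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
```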

Many statistical models and theories have been developed with the normal distribution in mind. The idea is that in large groups, variation arises from many small random influences, which tends to produce a bell-shaped normal distribution. For example, height in the adult population grouped by gender is often approximately normally distributed around the mean height.

However, not all distributions are normal. A distribution may have two peaks, or it may be skewed. In a skewed distribution the peak lies towards one end of the distribution. A two-peaked distribution is obtained by combining two very different groups; in the measurement of height, for example, by combining two different age groups. A good example of a skewed distribution is provided by the breakdown of incomes. The majority of income earners are at the lower end of the income distribution. In view of the overall income spread, the large groups in the middle income bracket do not earn very much. The income distribution has a long tail to the right, i.e. from the middle income bracket upwards, whereas there is no room for a tail to the left.

Taxable earned income by income category in 2005

Source: Statistics Finland, Income Distribution Statistics
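The shape of a right-skewed distribution can also be seen in a simple text histogram. The sketch below uses hypothetical incomes invented for illustration; they are not taken from the Statistics Finland figures, and the class width of 20 thousand euros is likewise an assumption.

```python
from collections import Counter

# Hypothetical annual incomes in thousands of euros (made up for illustration,
# not real Statistics Finland data).
incomes = [8, 11, 12, 14, 15, 16, 17, 18, 19, 21,
           22, 24, 26, 29, 33, 38, 45, 58, 76, 115]

CLASS_WIDTH = 20  # assumed width of each income class, in thousands of euros


def income_class(income: float) -> int:
    """Return the lower bound of the income class that an income falls into."""
    return int(income // CLASS_WIDTH) * CLASS_WIDTH


counts = Counter(income_class(x) for x in incomes)

# Print one row of '#' marks per class, including empty classes in the tail.
for lower in range(0, max(counts) + CLASS_WIDTH, CLASS_WIDTH):
    upper = lower + CLASS_WIDTH - 1
    print(f"{lower:>3}-{upper:<3} {'#' * counts[lower]}")

# The peak is the class with the highest number of observations.
peak_class = max(counts, key=counts.get)
print("peak class starts at", peak_class)
```

With these made-up values the peak falls in the lowest income class, and the rows get shorter towards the higher classes, which is the long tail to the right.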

The measures in a distribution are usually called variables. A variable may be continuous or categorical. In a continuous variable, all observations have their own value, whereas in a categorical variable the observations are placed into larger groups. In practice, distributions are usually presented in categorical format because that makes them easier to handle. In a sense the classification is a crude measure that overlooks minor details.

Measurement of height as a continuous variable

Height (cm)    Observations
165.5          1
167            1
169.3          1
170.7          1
172            1
175            1
176.5          1
180.5          1
181            1
183.7          1

Measurement of height as a categorical variable

Height (cm)    Observations
165-169        3
170-179        4
180-           3
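The step from the continuous measurements to the categorical table can be sketched as a simple tally. The heights are the ten values listed above. The exact class boundaries are an assumption: the class 165-169 is taken to mean heights below 170 cm, and 170-179 to mean heights from 170 cm up to but not including 180 cm.

```python
from collections import Counter

# The ten height measurements from the continuous table, in centimetres.
heights_cm = [165.5, 167, 169.3, 170.7, 172, 175, 176.5, 180.5, 181, 183.7]


def height_class(height: float) -> str:
    """Place a height into one of the three classes of the categorical table.

    Assumed boundaries: below 170 -> "165-169", below 180 -> "170-179",
    otherwise "180-".
    """
    if height < 170:
        return "165-169"
    if height < 180:
        return "170-179"
    return "180-"


# Tally how many observations fall into each class.
tally = Counter(height_class(h) for h in heights_cm)

for label in ("165-169", "170-179", "180-"):
    print(label, tally[label])
# 165-169 3
# 170-179 4
# 180- 3
```

The tally keeps only the class of each observation, so details such as the difference between 165.5 cm and 169.3 cm are lost.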
