Sampling 3: The effect of bias in samples

 

Daniel Gile

September 8, 2006

 

Samples are selected to be representative of populations. However, natural variability within the population makes it likely that with respect to the feature(s) in which investigators are interested, any sample will deviate to some extent from the population. If the sample is chosen at random in the strict sense of the word (that is, every unit in the population has the same probability of being selected into the sample), such sampling error will also take random values. It can be reduced by increasing sample size, and what is more, it can be measured probabilistically, thus giving indications of the magnitude of potential differences between population characteristics and sample characteristics. In other words, sampling error limits the accuracy of inferences about the population from a sample but does not challenge their validity.
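The shrinking of random sampling error with sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 10,000 interpreters and its distribution of working languages are invented for illustration, and `sample_mean_spread` is a hypothetical helper, not something taken from the text above.

```python
import random
import statistics


def sample_mean_spread(population, sample_size, repetitions=2000, seed=0):
    """Standard deviation of the sample mean over repeated simple random samples."""
    rng = random.Random(seed)
    means = [
        # rng.sample gives every unit the same chance of selection.
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.pstdev(means)


# Hypothetical population (made-up numbers): 10,000 interpreters,
# each with between 2 and 5 working languages.
_rng = random.Random(42)
population = [_rng.choice([2, 2, 3, 3, 3, 4, 5]) for _ in range(10_000)]

# Spread of the sample mean for increasing sample sizes.
spread_by_size = {n: sample_mean_spread(population, n) for n in (10, 40, 160)}

for n, spread in spread_by_size.items():
    print(f"n={n:>3}  spread of sample mean = {spread:.3f}")
```

Under these assumptions, the spread of the sample mean should roughly halve each time the sample size is quadrupled, which is the usual behaviour of the standard error of a mean under random sampling.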

A different type of error may arise from sampling procedures which make the selection of units with certain characteristics more likely than the selection of other units. For instance, in a study of multilingualism in interpreters, if all subjects are sampled in Western Europe, for obvious reasons, it is likely that the mean number of working languages will be close to 3 or above. If all subjects are sampled in East Asia, for equally obvious reasons it is likely that the mean number of working languages will be very close to 2. In either case, a bias will have been introduced which will result in values in the sample deviating systematically and not at random from values in the population.

Note that bias will not go away if sample size is increased. Whether 10, 20, 30 or 100 West-European interpreters are included in the sample, the mean number of working languages will remain close to 3. Similarly, whether 10, 20, 30 or 100 East-Asian interpreters are included in the sample, the mean number of working languages will remain very close to 2.
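The persistence of bias can be sketched in the same way. The example below assumes an invented population made up of two equally large regional groups, with made-up distributions chosen so that the West-European mean is close to 3 and the East-Asian mean is close to 2; the names `west_europe`, `east_asia` and `mean_of_sample` are hypothetical.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical regional groups (made-up numbers of working languages).
west_europe = [rng.choice([2, 3, 3, 3, 4]) for _ in range(5_000)]
east_asia = [rng.choice([2] * 9 + [3]) for _ in range(5_000)]

population = west_europe + east_asia
true_mean = statistics.fmean(population)


def mean_of_sample(units, n):
    """Mean of a simple random sample of size n drawn from `units`."""
    return statistics.fmean(rng.sample(units, n))


results = {}
for n in (10, 100, 1000):
    results[n] = {
        # Random sample: every unit of the whole population can be selected.
        "random": mean_of_sample(population, n),
        # Biased sample: only West-European units can be selected.
        "biased": mean_of_sample(west_europe, n),
    }

print(f"population mean = {true_mean:.2f}")
for n, means in results.items():
    print(f"n={n:>4}  random={means['random']:.2f}  biased={means['biased']:.2f}")
```

In this sketch, the mean of the random sample should move closer to the population mean as the sample grows, while the mean of the biased sample should stay near the West-European value however many units are added.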

Another, more fundamental problem with bias is that, unlike random sampling error, it cannot be estimated. This means that if bias is present in the sample, it is difficult to draw conclusions about the population, except where the direction of the bias is known. For instance, if it is known that Japanese interpreters tend to have more working days per year than interpreters in all other parts of the world, and the mean number of working days per year in a Japanese sample is 125, the one inference that can be made is that the mean number of working days per year for the population of interpreters worldwide is less than 125.

Bias is therefore an important challenge in research, and the risk of bias should be considered and guarded against in every study.