Sampling 1: Sampling-based studies as case studies


Daniel Gile

12 July 2006


Sampling rests on the idea that what is sampled is representative of a larger entity with similar characteristics. Thus, the most fundamental property sought in a sample is its representativeness with respect to that larger entity.

In statistics, samples are subsets of populations (of people, light bulbs, rocks, manufacturing errors, crime occurrences, votes, etc.). In Translation Studies (TS), the units that make up samples and populations are translators, students, texts, text genres, words, errors, user reactions, etc.

If all units were identical with respect to the characteristics under investigation (translation strategies, speed, errors, quality, linguistic features of the target text etc.), studying one unit would be enough to learn about the population. In real life, such cases are rare. Variability is the rule; it forces investigators to look at larger samples (whose size depends to a large extent on how much the relevant features vary) and to draw inferences on the basis of central tendencies and variability measured in the sample. Because of this variability, such inferences carry uncertainty. Inferential statistics quantifies that uncertainty with mathematical tests.
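The link between variability and sample size can be illustrated with a minimal sketch. It assumes a normal approximation for the sample mean and a 95% confidence level (z = 1.96); the function names and the numbers used (standard deviations of 0, 5 and 10 quality points, a target margin of 2 points) are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Approximate half-width of a confidence interval for a mean.

    Assumes the normal approximation; z = 1.96 corresponds to ~95%.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return z * sd / sqrt(n)


def required_sample_size(sd: float, margin: float, z: float = 1.96) -> int:
    """Smallest n whose approximate margin of error is at most `margin`."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    # With no variability at all (sd == 0), a single unit is enough.
    return max(1, ceil((z * sd / margin) ** 2))


# Hypothetical quality scores: the more the units vary, the larger the sample.
for sd in (0.0, 5.0, 10.0):
    print(f"sd={sd}: n={required_sample_size(sd, margin=2.0)}")
# prints n=1, n=25 and n=97 respectively
```

Under these assumptions, doubling the variability roughly quadruples the sample needed for the same precision, while identical units (zero variability) require only one observation.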

It is important to understand that in most studies, sampling actually occurs along many dimensions, only one or a few of which are controlled. For instance, an experiment on the effect of experience on translation quality may be designed around a comparison of the performance of translators with different levels of experience (say 0-4, 5-9, 10-14, and 15 or more years); this is the main dimension of the study. Experimenters will probably try to make sure that all participants have the same language combination and similar knowledge of the passive language (a second dimension) and of the theme addressed in the text (a third dimension). Perhaps they will also try to make sure that all participants have a similar educational background (fourth dimension) and similar translation training (fifth dimension), that their usual professional market is similar (sixth dimension), etc., but they may not control motivation (seventh dimension), general personality features (eighth dimension), physiological parameters at the time of the experiment (ninth dimension), the subjects’ mood (tenth dimension) etc. Different values along any of these dimensions could affect the subjects’ work, and variability in the uncontrolled dimensions may “hide” fundamental tendencies.
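How uncontrolled variability can “hide” a real tendency can be sketched with a small simulation. Everything in it is an assumption made for illustration: two groups of 10 hypothetical translators, a true experience effect of 1 quality point, all uncontrolled dimensions lumped together as normally distributed noise, and a rough rule of thumb (difference larger than twice its standard error) standing in for a proper significance test.

```python
import random
import statistics
from math import sqrt


def detection_rate(true_effect: float, noise_sd: float,
                   n_per_group: int = 10, n_experiments: int = 500,
                   seed: int = 0) -> float:
    """Fraction of simulated experiments in which the effect is 'detected'.

    Uncontrolled dimensions (motivation, mood, etc.) are modelled, as a
    simplifying assumption, by Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    detected = 0
    for _ in range(n_experiments):
        novices = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        experts = [rng.gauss(true_effect, noise_sd) for _ in range(n_per_group)]
        diff = statistics.mean(experts) - statistics.mean(novices)
        se = sqrt(statistics.variance(novices) / n_per_group
                  + statistics.variance(experts) / n_per_group)
        # Rough criterion, not an exact t-test: difference > 2 standard errors.
        if se > 0 and diff / se > 2.0:
            detected += 1
    return detected / n_experiments


# Same true effect, different amounts of uncontrolled variability.
print("little uncontrolled variability:", detection_rate(1.0, noise_sd=0.5))
print("much uncontrolled variability:  ", detection_rate(1.0, noise_sd=5.0))
```

In this sketch the underlying effect is identical in both runs; only the amount of uncontrolled variability changes, and with it the proportion of simulated experiments in which the effect shows up.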

On the other hand, if a particular experiment does show some “significant” trend (i.e. one which is not likely to have been caused by chance alone), this trend is specific to a particular set of parameter values: a certain text or text genre, a specific language combination, certain levels of knowledge of the passive language and of the theme, etc. There is no guarantee that the same trend would be found in different sets of parameter values, say with different texts or text genres, different language combinations, etc. In other words, even with a fairly large sample, most studies remain case studies for several potentially relevant dimensions.