Cross-Cultural Test Adaptation
Author: memjavad

In one early cross-cultural study, Porteus administered tests of cognitive abilities to members of various cultural populations around the world. The test he relied on most consisted of mazes drawn on paper. On the basis of the distributions of the test scores, Porteus (1937) made comparative statements about the intelligence of such groups as the Bushmen and Bantu-speaking peoples of Southern Africa and the Australian Aborigines. He concluded that the level of intelligence, as reflected in such abilities as foresight and planning (presumably measured by the maze test), was lower among the Bushmen than in any other race. Porteus discussed factors that might have affected his findings, such as the absence of schooling, but he explained these away and, in the end, accepted the validity of his findings.

Few psychologists today would share Porteus’s conclusion, which presupposes that culture-free or culture-fair tests of intelligence are feasible; as such, this example is mainly of historical interest. However, the transfer of tests to cultural populations other than the one for which they were originally designed has greatly increased. Sometimes tests are administered without any modification except translation of the verbal materials; in other cases, item content and/or administration procedures are changed extensively. Any such change, including translation, is referred to as cross-cultural test adaptation.

The Logic of Cross-Cultural Test Adaptation

A test is a standardized procedure to assess a cognitive ability, a dimension or aspect of personality, or some other domain of behavior or aspect of psychological functioning. Here the term trait is used generically to refer to any of these. Cross-cultural use of tests makes sense only if the same trait is measured in the various populations to which the testees belong. If this is not the case, there is no rationale whatsoever for transfer and adaptation.

Continuing with the Porteus example, the most fundamental issue is whether traits like “foresight and planning,” as aspects of intelligence, are meaningful concepts for explaining the psychological makeup of a people like the Bushmen. A related question is whether the Bushmen themselves would consider a maze task an adequate way to assess individual differences. Some traditions in psychology, such as the school of cultural psychology, tend to see the transfer of any (Western) test as an imposition, likely to conflict with the meaning of behavior as it is socially construed within a particular (non-Western) society (Miller, 1997). More common, however, is the assumption that psychological functions and mechanisms are universal, manifestations of the psychic unity of humankind. If foresight and planning are relevant scientific concepts for European American peoples, they should be relevant everywhere.

Assuming the universality of foresight and planning, one may ask whether a paper-and-pencil test with mazes is an appropriate task for assessing cognitive functioning in Bushmen hunter-gatherers, whose only familiar graphic representations are those on ostrich eggshells, and who live in a flat, semiarid environment where barriers that necessitate detours are virtually unknown. One indication of the inappropriateness of Porteus’s test is that Bushmen subjects could solve more mazes when presented with a wooden version that had actual slats between the alleys (Reuning & Wortley, 1973). This example illustrates the general principle that transfer of a test presumes that the test is an appropriate operationalization of the trait in each of the cultural populations concerned.

Even when a test provides a valid assessment of a particular trait in each of two cultural populations, there is another difficulty: it is still possible, and even likely, that performance on the task is influenced unequally by factors that are not part of the target trait. For example, a test of arithmetic can contribute to the validity of a battery assessing intelligence in the sense of the capacity for learning and problem solving, but testees in one population may on average obtain higher scores, perhaps because of better-quality schooling. This implies that, for members of different populations, an equal standing on the trait (intelligence) can be reflected in different scores, and a different standing in equal scores.
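The last point can be made concrete with a toy calculation. The sketch below assumes a purely hypothetical additive model in which the expected score is a weighted trait standing plus a bonus due to schooling; the function name `expected_score`, the weight, and the bonus values are invented for illustration and are not estimates from any study.

```python
# Hypothetical additive model (illustration only, not an empirical estimate):
# expected score = weight * trait standing + schooling bonus.
def expected_score(trait: float, schooling_bonus: float, weight: float = 2.0) -> float:
    return weight * trait + schooling_bonus


# Equal standing on the trait, different schooling quality -> different scores.
a = expected_score(trait=10.0, schooling_bonus=6.0)
b = expected_score(trait=10.0, schooling_bonus=1.0)
print(a, b)  # 26.0 21.0

# Different standing on the trait -> equal scores once schooling is added in.
c = expected_score(trait=9.0, schooling_bonus=6.0)
d = expected_score(trait=11.5, schooling_bonus=1.0)
print(c, d)  # 24.0 24.0
```

Under this assumed model, a score difference between populations cannot be read as a trait difference unless the schooling term is known to be equal, and that is precisely what cannot be taken for granted across cultures.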

Thus, the transfer of a test requires cross-cultural invariance of the relevant trait (or composite of traits) and a culturally appropriate operationalization. Interpreting cross-cultural differences in score levels additionally requires that the test form a quantitatively identical scale for the trait across cultural populations.

If a test does not meet these demands, the scores are said to be biased or inequivalent. Effects that are likely to influence all items in a test, or at least a major proportion of them, differentially across populations are referred to as method bias. A well-known example for personality scales is cross-cultural differences in response styles, such as social desirability and the tendency to avoid extreme responses. An example for speeded tests of cognitive abilities is differences in the speed-accuracy trade-off. Response procedures can also produce method bias: a multiple-choice format, for instance, can disadvantage inexperienced testees who are reluctant to guess when they are uncertain of an answer, while more experienced testees answer more freely or make educated guesses.

A biasing effect can also be specific to a single item. In this case we speak of item bias, also called differential item functioning. The most common cause is differential exposure to particular experiences and knowledge. There is a range of judgmental procedures and psychometric techniques for identifying biased items, but method bias is difficult to evaluate (Van de Vijver & Leung, 1997).
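As an illustration of one psychometric technique for detecting item bias, the sketch below computes the Mantel-Haenszel common odds ratio, a widely used DIF statistic. The article does not name a specific method, so this is a swapped-in example rather than the procedure of any study cited here. Testees from a reference and a focal group are matched on total test score, a 2x2 table (group by correct/incorrect on the studied item) is formed at each score level, and the odds ratios are pooled across levels. A value near 1 suggests that matched testees from both groups have similar odds of success on the item; the ETS delta transformation, -2.35 ln(alpha), rescales it so that negative values indicate an item favoring the reference group. The record format, function names, and toy data are hypothetical.

```python
import math
from collections import defaultdict

# One stratum: counts at a single matched total-score level, ordered as
# (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
Table = tuple[int, int, int, int]


def build_tables(records: list[tuple[str, list[int]]], item: int) -> list[Table]:
    """Match testees on total score and tabulate correct/incorrect on `item`.

    Each record is a hypothetical (group, responses) pair: group is "ref" or
    "focal", responses is a list of 0/1 item scores.
    """
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, responses in records:
        total = sum(responses)                # matching criterion: total score
        offset = 0 if group == "ref" else 2   # ref cells first, then focal cells
        strata[total][offset + (0 if responses[item] == 1 else 1)] += 1
    return [tuple(cell) for cell in strata.values()]


def mantel_haenszel_alpha(tables: list[Table]) -> float:
    """Pooled (common) odds ratio; 1.0 means no uniform DIF on this item."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n   # ref correct x focal incorrect
        den += b * c / n   # ref incorrect x focal correct
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def mh_delta(alpha: float) -> float:
    """ETS delta scale: negative values indicate the item favors the reference group."""
    return -2.35 * math.log(alpha)


# Toy data: three items, item 0 is the studied item.
records = [
    ("ref", [1, 1, 0]), ("ref", [1, 0, 0]), ("ref", [0, 1, 0]), ("ref", [1, 1, 1]),
    ("focal", [0, 1, 1]), ("focal", [0, 1, 0]), ("focal", [1, 0, 0]), ("focal", [1, 1, 1]),
]
alpha = mantel_haenszel_alpha(build_tables(records, item=0))
print(alpha)                       # 3.0
print(round(mh_delta(alpha), 2))   # -2.58
```

With eight testees this is only a toy; operational DIF analyses use large samples and pair the odds ratio with a significance test (the Mantel-Haenszel chi-square). Note also that because testees are matched on total score, a bias that affects all items alike (method bias) is absorbed by the matching and goes undetected, which is one reason method bias is harder to evaluate.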

The empirical record has shown that for complex behavior, such as cognitive abilities or personality traits, it is often impossible to rule out bias as an explanation of score differences, even for cultural populations with quite similar behavioral repertoires. This imposes severe limitations on cross-cultural comparison of test scores.

Some Frequently Adapted Tests

The cognitive tests with the most extensive cross-cultural usage are probably Raven’s Progressive Matrices and the Wechsler Adult Intelligence Scale (WAIS). [See Wechsler Intelligence Tests.] The Raven is a nonverbal test (with figural stimuli), so its items require no translation or adaptation. It has frequently been administered to obtain a quick assessment of the “general intelligence” of testees with little or no formal schooling. For quite some time this test had a reputation for being (relatively) culture-fair, although as early as the 1950s it was shown that, for unschooled Congolese subjects, the last of three repeated administrations had the highest validity (Ombredane, Robaye, & Plumail, 1956). But the most pervasive argument against its culture-fairness lies in the large effects of Western-style school education on the score level.

The WAIS, which consists of verbal as well as nonverbal subtests, requires at a minimum translation of the verbal items for non-English-speaking populations. Adapting the WAIS, or another intelligence battery, for a particular country involves elaborate tryouts of items and standardization studies. These batteries are rarely used to compare score levels between populations: the purpose is to have an instrument for assessment and diagnosis with the same qualities as in the original population.

Even more than for cognitive tests, adaptation of personality tests tends to be limited to translation of the verbal items, with or without checks on the quality of the translation and on the psychometric properties of the translated items. The need for more far-reaching forms of test adaptation has long been recognized. In the early 1960s, for example, De Ridder (1961) published a Thematic Apperception Test (TAT) with pictures drawn for black testees in Southern Africa. Nevertheless, keeping item content identical has remained the rule.

Among the research traditions that have emerged around the cross-cultural use of specific instruments, the best-known example in clinical diagnosis is the Minnesota Multiphasic Personality Inventory (MMPI). Both the original version and the current MMPI-2 have been translated into many languages. In a number of countries, the validity of diagnostic profiles originally established in the United States has been investigated. In addition, there is a literature that addresses issues of equivalence and bias (Butcher, 1996).

Another tradition, originating from the assessment of examination anxiety with the State-Trait Anxiety Inventory (STAI) and similar instruments, has even led to an international psychological association, the Stress and Anxiety Research Society (STAR). In this field, cross-cultural comparison of score levels tends to be seen as problematic (Hocevar & El-Zahhar, 1992), even though it is not always avoided.

In the area of personality theory, the Eysenck Personality Questionnaire (EPQ) is among the most translated instruments. Cross-cultural similarities in factor structures have been put forward as evidence for the universality of the dimensions in Eysenck’s personality theory, as well as for the validity of quantitative score differences (Eysenck & Eysenck, 1983). It has been argued, however, that the psychometric methods used are rather insensitive to differences in structure and to effects of item bias. Recent research along lines similar to that of the Eysencks concerns the cross-cultural testing of the personality structure represented by the “Big Five” dimensions of the NEO-PI-R.

Evaluation of Cross-Cultural Test Adaptation

There are important reasons for favoring the transfer and cross-cultural adaptation of existing tests over the construction of new local instruments. The first reason is mainly scientific and comparative in flavor: the analysis of a trait across cultural populations can lead to new insights, but common standards are a necessary condition for precise and valid comparisons. As noted before, the Achilles heel of this type of research is the pervasive presence of all kinds of trait-irrelevant differences, collectively referred to as cultural bias.

Other reasons for test adaptation are primarily economic and concern the local use of tests. An existing item set offers a shortcut to the development of a new instrument, even if this set has to be extended with locally formulated items to capture relevant manifestations of the trait not covered by the original instrument. More important, if a test does not have to undergo major changes in the process of adaptation, there is an implicit expectation that the existing literature on the original version will also be valid for the new version. Whether this expectation is justified has rarely been examined critically, but the available evidence points to similarities in validity rather than major differences, at least for Western industrialized countries. For well-known tests like the WAIS and the MMPI, there is an extensive body of research that would be hard to replicate in any of the countries for which these tests have been adapted.

None of the tests mentioned by name in this article originated in a non-English-speaking country. Two originated in Britain (Raven’s matrices and the EPQ), while the others are of U.S. origin. This reflects the dominant position of Western, particularly U.S., psychology. The consequences of this situation are unclear, but it is a source of serious concern, particularly in the non-Western world (Sinha, 1997).

There are tendencies toward greater recognition of the cultural identity of minorities within various countries and of its psychological consequences, while at the same time there are movements toward internationalization of the workforce and international standardization of criteria for well-being (quality of life), educational curricula, and so on. The first tendency leads to awareness of cultural bias in tests, the need to adapt test content and norms, and careful interpretation of test score differences. The second points to the need for a further increase in the cross-cultural use of tests.

A pragmatic orientation, in which weaknesses and potential errors are recognized and the process of test adaptation is optimized, is reflected in a set of “Guidelines for Adapting Educational and Psychological Instruments and Establishing Score Equivalence.” These guidelines, prepared by representatives of major international psychological associations at the initiative of Hambleton, address the various aspects of test adaptation, including the danger of overinterpreting score differences.

References:

Andor, L. E. (1966). Aptitudes and abilities of the Black man in Sub-Saharan Africa 1784-1963. Johannesburg, South Africa: National Institute for Personnel Research.
Butcher, J. N. (Ed.). (1996). International adaptations of the MMPI-2: Research and clinical applications. Minneapolis, MN: University of Minnesota Press.
Eysenck, H. J., & Eysenck, S. B. G. (1983). Recent advances in the cross-cultural study of personality. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 2, pp. 41-70). Hillsdale, NJ: Erlbaum.
Hocevar, D., & El-Zahhar, N. (1992). Cross-cultural differences in test anxiety: Establishing transliteral equivalence. In K. A. Hagtvet & T. Backer Johnsen (Eds.), Advances in test anxiety research (Vol. 7, pp. 48-61). Amsterdam: Swets & Zeitlinger.
Miller, J. G. (1997). Theoretical issues in cultural psychology. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology: Vol. 1. Theory and method (2nd ed., pp. 85-128). Boston: Allyn & Bacon.
Porteus, S. D. (1937). Primitive intelligence and environment. New York: Macmillan.
Reuning, H., & Wortley, W. (1973). Psychological studies of the Bushmen. Psychologia Africana, Monograph Supplement No. 7.
Sinha, D. (1997). Indigenizing psychology. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology: Vol. 1. Theory and method (2nd ed., pp. 129-169). Boston: Allyn & Bacon.
Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage.