An Analysis of Test-Wiseness

EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, Vol. XXV, No. 3, 1965, p. 707. DOI: 10.1177/001316446502500304

AN ANALYSIS OF TEST-WISENESS [1]

JASON MILLMAN AND CAROL H. BISHOP, Cornell University
ROBERT EBEL, Educational Testing Service [2]

THE purpose of this article is to offer an analysis of test-wiseness which can serve as a theoretical framework for empirical investigations. Thorndike (1949), Ebel and Damrin (1960), and Vernon (1962), among others, write that there may be sources of variance in educational test scores other than item content and random error. Test-wiseness was one suggested source.
General Definition

"Test-wiseness" is defined as a subject's capacity to utilize the characteristics and formats of the test and/or the test-taking situation to receive a high score. Test-wiseness is logically independent of the examinee's knowledge of the subject matter for which the items are supposedly measures. A somewhat limited concept of test-wiseness will be analyzed in this article. The analysis will exclude factors concerned with the general mental attitude (such as anxiety and confidence) and motivational state of the examinee, and it will be restricted to the actual taking of (not preparing for) objective achievement and aptitude tests.

Empirical Studies of Test-Wiseness

There appears to be no systematic study of either the importance of test-wiseness or the degree to which it can be taught or measured. This is true even though both professional writers (e.g. Vernon, 1962; Anastasi, 1961; Pauck, 1950) and popular writers (e.g. Whyte, 1956; Huff, 1961; Hoffmann, 1962) have referred to test-wiseness as a potentially large source of test variance. Almost all textbooks in educational measurement provide rules intended to help the test constructor avoid giving away the answer. Numerous pamphlets on how to take tests are available (e.g. Hook, 1958; Heston, 1953; and Pettit, 1960).

[1] This research was supported, in part, by the Cornell Social Science Research Center through a grant from the Ford Foundation.
[2] Now at Michigan State University.

Although no comprehensive investigation of test-wiseness is known to exist, data from a few empirical studies indicate that it is a factor which may deserve attention. Comparisons made by Vernon of British and American students on different types of examinations "illustrate the importance of facility or sophistication at such tests" (1962, p. 285). This study concludes that "assessments of the understanding of complex concepts, whether by objective tests, written essays, oral or other methods, are affected not only by the level and type of concept tested, but also by many factors arising from the method of testing, and the subject's facility in handling it" (1962, p. 285).

In an unpublished study, Millman and Setijadi (in press) recently compared American and Indonesian students on a series of algebra items. The Indonesian students had never taken an objective test before, but during the practice session quickly learned the mechanics of reporting the best choice. The two groups scored equally well on the open-ended version of the items. Nevertheless, the performance of the American students on the multiple-choice version (having plausible options) was significantly better than that of the Indonesians. The American students gained an additional advantage when the items contained one or more implausible options.

Test-wiseness seems to be responsible, in part, for the effects of practice and coaching on test scores. During the early twenties and thirties, a series of studies indicated that a substantial rise in Stanford-Binet scores was possible after practice and/or coaching with similar material. (See Rugg and Colloton, 1921, for a review.) The effects of practice and coaching on the Moray House Test Examinations received great emphasis in England. The average gains from these sources were estimated at around nine I.Q. points, with greater gains among testees who were genuinely unsophisticated about tests to begin with (Vernon, 1954). The effects of practice and coaching on S.A.T. scores have not been so dramatic, perhaps because the students are more sophisticated (Levine and Angoff, 1958). Differences between coached and uncoached groups
are usually less than the standard error of the test, but they are in the positive direction and represent systematic bias.

Recent reviews of the non-equivalence of measures of supposedly the same traits (Campbell and Fiske, 1959; Smith and Kendall, 1963) suggest the importance of the effects of testing method and situation on test scores. Test-wiseness may contribute appreciably to these effects.

There is some evidence that high school students can verbalize many principles of test-wiseness. Two hundred and forty high achieving students in a suburban high school were instructed by the senior investigator as follows.

    Let us pretend a new student from a different part of the United States has just moved into this area and is now attending your school. He is having trouble getting good scores on the tests of some of the teachers. Where he comes from they do not give tests like the tests the teachers in this school give. What help can you be to this student? Write below any suggestions you can give the student. Tell him about things you have found successful when preparing for and taking tests in certain courses.

The students were assured their answers would be seen only by project personnel at Cornell. Some of the responses elicited by this primarily unstructured questionnaire, together with the percent of students providing the response, are shown below:

    General Response                                              Percent
    Plan your time.                                                     7
    Answer easier questions first.                                      8
    Do not spend too much time on one question (or come
      back later if you don't know the answer).                        27
    Read directions (or questions) carefully.                          44
    Recheck your answers for errors.                                   20
    Guess if you don't know the answer.                                18
    Eliminate impossible foils.                                        17
    Look for leads from other questions.                               13
    Don't read into questions (or answers) too deeply.                  5
    Watch for specific determiners.                                     3

Test-wiseness was cited as an important reason for success on examinations by college students also.
Gaier asked 276 students to "assume that you will receive a letter grade of (A)/(D) on the test you are to take. List the specific activities, either on your part or on the part of the instructor, that you feel were influential or responsible in making this grade" (1962, p. 561). Of the 136 students who were to assume they received a grade of A on the test, 21 percent volunteered "test understanding" as a reason for success, 21 percent indicated "comprehension and reasoning ability," and 18 percent suggested "test characteristics" as a reason for successful test performance. "Not able to understand and to reason" (26 percent of the responses), "not understanding tests" (13 percent), "teacher's tests" (12 percent), and "test characteristics" (34 percent) were volunteered by the 140 students who were to assume they received a grade of D on the test as reasons why their test performance was unsuccessful.

Further evidence of the importance of test-wiseness comes from investigations in which the problem solving styles of students answering objective-type test items are studied (e.g., Bloom and Broder, 1950; Earle, 1950; Connerly and Wantman, 1964; French, 1965). Under the supervision of the authors, 40 college students in two institutions were interviewed individually as they were taking regular course examinations. (Time limits of the examinations were extended to permit the examinees to explain why they chose the answer they did at the time they responded to the item.) There was a great range in the sophistication of reasons given by students for responding as they did to test items for which they did not know the answers. The particular styles reported are highly specific to the nature of the items being solved.
However, the problem solving styles of high and low test performers could be distinguished. Bloom and Broder (1950) report that students trained in those general problem solving techniques (including the comprehension of test directions, the understanding of the nature of the specific test questions, and the ability to reason logically) used by high test performers, but not additionally trained in subject-matter knowledge, made significant gains in subsequent achievement test scores. French (1965) demonstrated that the factor composition of a test frequently depended upon the problem solving styles used in answering the test items.

The results of the references cited above and an analysis of the literature dealing with principles of test construction or advice for taking examinations were useful in developing the following outline.

An Outline of Test-Wiseness Principles

I. Elements independent of test constructor or test purpose.
   A. Time-using strategy.
      1. Begin to work as rapidly as possible with reasonable assurance of accuracy.
      2. Set up a schedule for progress through the test.
      3. Omit or guess at items (see I.C. and II.B.) which resist a quick response.
      4. Mark omitted items, or items which could use further consideration, to assure easy relocation.
      5. Use time remaining after completion of the test to reconsider answers.
   B. Error-avoidance strategy.
      1. Pay careful attention to directions, determining clearly the nature of the task and the intended basis for response.
      2. Pay careful attention to the items, determining clearly the nature of the question.
      3. Ask examiner for clarification when necessary, if it is permitted.
      4. Check all answers.
   C. Guessing strategy.
      1. Always guess if right answers only are scored.
      2. Always guess if the correction for guessing is less severe than a "correction for guessing" formula that gives an expected score of zero for random responding.
      3. Always guess even if the usual correction or a more severe penalty for guessing is employed, whenever elimination of options provides sufficient chance of profiting.
   D. Deductive reasoning strategy.
      1. Eliminate options which are known to be incorrect and choose from among the remaining options.
      2. Choose neither or both of two options which imply the correctness of each other.
      3. Choose neither or one (but not both) of two statements, one of which, if correct, would imply the incorrectness of the other.
      4. Restrict choice to those options which encompass all of two or more given statements known to be correct.
      5. Utilize relevant content information in other test items and options.
II. Elements dependent upon the test constructor or purpose.
   A. Intent consideration strategy.
      1. Interpret and answer questions in view of previous idiosyncratic emphases of the test constructor or in view of the test purpose.
      2. Answer items as the test constructor intended.
      3. Adopt the level of sophistication that is expected.
      4. Consider the relevance of specific detail.
   B. Cue-using strategy.
      1. Recognize and make use of any consistent idiosyncrasies of the test constructor which distinguish the correct answer from incorrect options.
         a. He makes it longer (shorter) than the incorrect options.
         b. He qualifies it more carefully, or makes it represent a higher degree of generalization.
         c. He includes more false (true) statements.
         d. He places it in certain physical positions among the options (such as in the middle).
         e. He places it in a certain logical position among an ordered set of options (such as the middle of the sequence).
         f. He includes (does not include) it among similar statements, or makes (does not make) it one of a pair of diametrically opposite statements.
         g. He composes (does not compose) it of familiar or stereotyped phraseology.
         h. He does not make it grammatically inconsistent with the stem.
      2. Consider the relevance of specific detail when answering a given item.
      3. Recognize and make use of specific determiners.
      4. Recognize and make use of resemblances between the options and an aspect of the stem.
      5. Consider the subject matter and difficulty of neighboring items when interpreting and answering a given item.

General Description of the Outline

The suggested outline is divided into two main categories: elements which are independent of the test constructor or test purpose and those which are dependent upon the constructor or purpose. The principles falling into the former category are potentially profitable regardless of previous contact with the test constructor or previous contact with tests having a similar purpose.

The first two subdivisions, time-using and error-avoidance strategies, contain strategies which allow the examinee to demonstrate fully his knowledge of the specific subject matter by helping him avoid losing score points for reasons other than lack of knowledge of the test content. Time-using strategy applies only to those tests which restrict the time allotted the examinee. In general, the elements are concerned with the most efficient use of allotted time. Error-avoidance strategy applies to all testing situations. The elements are simply concerned with the avoidance of careless mistakes.
The last two subdivisions in the first category, guessing and deductive reasoning strategies, contain strategies which allow the examinee to gain points beyond those which he would receive on the basis of sure knowledge of the specific subject matter. Guessing strategy may allow the examinee to receive credit for responses made on a completely chance basis; deductive reasoning strategy deals with methods of obtaining the correct answer indirectly or with only part of the knowledge necessary to answer a question. It should be noted that some knowledge of the subject matter is involved in deductive reasoning strategies, but the correct answer itself would not be known if no choices were given or no other questions were asked.

The second main category, elements dependent upon the test constructor or purpose, includes those strategies which the examinee may employ only after knowing the test constructor's views or the test purpose, or after having had contact with and feedback from similar tests.

The first subdivision in this category is consideration of intent. It is similar to the first two subdivisions in the previous category in that it is concerned with strategies which allow the examinee to avoid being penalized for anything other than lack of knowledge of the subject matter of the test.

The second subdivision, cue-using strategy, pertains to the use of cues which are available when a specific answer is not known. As in the last two subdivisions in the previous category, partial knowledge of the subject matter may be needed. The cues can be used successfully, however, only to the extent that a correlation has been established, under similar conditions, between the cues and the correct answer.

Further description of selected elements, with examples, may clarify the tactics and strategies.
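The guessing strategies of subdivision I.C. can be illustrated numerically. The following is a minimal Python sketch, assuming a four-option item and the point values used in this article's examples; the language, the function name, and the variable names are illustrative choices, not part of the original study.

```python
# Expected value of a blind guess, E.V. = Pc*Uc + (1 - Pc)*Ui, where Pc is
# the probability of selecting the correct option and Uc, Ui are the
# utilities (score points) attached to correct and incorrect choices.

def expected_value(p_correct, u_correct=1.0, u_incorrect=0.0):
    """Expected score contribution of guessing on a single item."""
    return p_correct * u_correct + (1 - p_correct) * u_incorrect

k = 4  # options per item (an illustrative assumption)

# I.C.1 -- rights-only scoring: a blind guess is worth 1/4 point, an omit 0.
ev_rights_only = expected_value(1 / k)

# I.C.2 -- a penalty milder than the usual 1/(k-1) correction, here 1/4 point.
ev_mild_penalty = expected_value(1 / k, u_incorrect=-1 / 4)

# I.C.3 -- usual correction (penalty 1/(k-1) = 1/3) after one option has been
# eliminated with certainty: guess among the remaining three options.
ev_after_elimination = expected_value(1 / 3, u_incorrect=-1 / 3)

print(ev_rights_only, ev_mild_penalty, ev_after_elimination)
# In each case the expected value exceeds 0, the value of an omitted item.
```

Under a different criterion, such as the minimax rule mentioned in the text, the same arithmetic would recommend never guessing whenever any penalty is imposed; the sketch only covers maximization of expected score.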
Elaboration of Selected Principles

I.A.1. Begin to work as rapidly as possible with reasonable assurance of accuracy.

The pace at which one can work without sacrificing accuracy varies with individuals. One should attempt, however, to complete the test in less time than is allotted in order to allow time to check answers.

I.A.2. Set up a schedule for progress through the test.

A rule of thumb is to determine how far one should be when a specific proportion of the testing period has elapsed. A periodic check on rate of progress facilitates the maintenance of proper speed (Cook, 1957). This principle suggests the necessity of determining the scope of the test before beginning work.

I.A.3. Omit or guess at items which resist a quick response.

When time is limited the examinee should work first on those items which will yield the most points in a given amount of time. The order in which he works on the items may be determined by the relative difficulty of items, by the relative time needed to read and answer the items, or by possible heavier weighting of some items. If such differences are not apparent, the examinee should work in order of presentation of the items.

I.A.5. Use time remaining after completion of the test to reconsider answers.

Change answers if it seems desirable. Examinees generally increase their scores when they do (Berrien, 1939; Briggs and Reile, 1952). There is some evidence that persistence (i.e., using full time on the test) pays off (Briggs and Johnson, 1942).

I.B.2. Pay careful attention to the items, determining clearly the nature of the question.

Guard against inferring the answer before completely reading the question. Exercise special care in answering more complex questions such as negatively stated items and items having more than one clause.

I.C. Guessing strategy.

A distinction can be made between informed and blind guessing. This subdivision is concerned with blind guessing, i.e., selecting the answer at random without considering the content of the options. Although it is recognized that students rarely respond to items in this way and that they should be encouraged to make informed rather than blind guesses, there are times when, because of lack of time or motivation, blind guessing is used. These strategies may also be of value to the examinee who, on the basis of the item content, is able to assign to the options probabilities of being correct. (But see caution in II.B.)

The validity of any set of recommended strategies for guessing depends upon what the examinee is attempting to maximize.[3] The strategies listed in subdivision I.C. are based upon the assumption that the examinee wishes to maximize the expected value of his test score. This expected value is a function of the probability that the correct option is selected and the utilities associated with correct and incorrect answers. This relation may be expressed algebraically as:

    E.V. = Pc(Uc) + (1 - Pc)(Ui)

where E.V. is the expected value, Pc is the probability that the correct option is selected, and Uc and Ui are the utilities associated with correct and incorrect choices.

[3] If a minimax decision function were used, that is, if an examinee wanted to minimize his maximum loss, he would never guess when a penalty for guessing was employed. As another example, suppose the testee wished to guess only when the probability of improving his score by guessing was greater than one-half. If the usual correction for guessing formula were employed, he would guess only when (n+1)/k = an integer (k>2), where n is the number of items to be guessed and k the number of choices per item (Graesser, 1958).

I.C.1. Always guess if right answers only are scored.

Example: Consider a four-alternative multiple-option item in which one point is given for the correct answer and no points are subtracted for an incorrect answer (i.e., scored rights only). A person who makes a pure guess is expected to earn

    E.V. = 1/4 (1) + (1 - 1/4) (0) = 1/4 point.

Since 1/4 is greater than 0, the value of an omitted item, the examinee should guess.

I.C.2. Always guess if the correction for guessing is less severe than a "correction for guessing" formula that gives an expected score of zero for random responding.

Example: If, in the above illustration, 1/4 of a point were subtracted for an incorrect answer, then:

    E.V. = 1/4 (1) + (1 - 1/4) (-1/4) = 1/16 point.

This value is greater than that of an omitted item, and therefore, the examinee should guess.

I.C.3. Always guess, even if the usual correction or a more severe penalty for guessing is employed, whenever elimination of options provides sufficient chance of profiting.

Example: Assume that in the above illustration, the usual correction for guessing formula were employed, and thus 1/3 of a point is deducted for an incorrect answer. Assume further that the examinee can definitely eliminate one incorrect option, that is, there is no guessing involved in eliminating this option. The expected value associated with choosing from among the remaining three options is:

    E.V. = 1/3 (1) + (1 - 1/3) (-1/3) = 1/9 point.

Again, the examinee should guess.

I.D. Deductive reasoning strategy.

The test-wise person who does not know the correct option directly may be able to deduce the answer by logical analysis or by using information gained from other items. This strategy differs from cue-using strategy in that it is not necessary to establish correlations between cues and the correct answer. The following strategies may be used successfully in any objective testing situation, their success depending upon the examinee's ability to reason in a logically valid manner.

I.D.1. Eliminate options which are known to be incorrect and choose from among remaining options.

The examinee may be able to eliminate some options with partial knowledge of the subject matter. Options may often be eliminated because they are logically inconsistent with the stem.

Examples:

    The state with the highest total population in the United States in 1950 was:
    *a) New York
     b) Chicago
     c) Michigan
     d) California

Option b is inconsistent with the stem since it is not a state, and the choice is, therefore, restricted to a, c, or d.

    Which one of the following is an advantage of using high beds in hospitals?
     a) High beds cost more than regular size beds.
    *b) The care of patients is less difficult when the beds are high.
     c) High beds can be used all day.
     d) People are less likely to fall out of high beds.

Since high cost is never an advantage to the buyer, option a is logically inconsistent with the stem and may be eliminated.

I.D.2. Choose neither or both of two options which imply the correctness of each other.

Example:

    A mental disorder which is often classified as a neurosis is:
    *a) hysteria
     b) dementia praecox
     c) schizophrenia
     d) involutional melancholia

If only one answer can be chosen and the examinee knows that "dementia praecox" is another name for "schizophrenia," he knows the answer must be either a or d. Care must be taken in deciding whether two options do, in fact, imply the correctness of each other.

I.D.3. Choose neither or one (but not both) of two statements, one of which, if correct, would imply the incorrectness of the other.
This situation occurs most frequently with items containing options which are the opposite of each other.

Example:

    The 18th amendment to the U.S. Constitution
     a) prohibited the manufacture, sale or transportation of intoxicating liquors within the United States.
     b) repealed the prohibition amendment.
     c) gave women the right to vote.
     d) prohibited the President from being elected to office more than twice.

Because the correctness of option a implies the incorrectness of option b, options a and b cannot both be correct. Note, however, that had the item stem asked about the nineteenth amendment, neither option a nor option b would have been correct. (See II.B.1.f.)

I.D.4. Restrict choice to those options which encompass all of two or more given statements known to be correct.

Examples:

    Which of the following men were presidents of the United States?
     a) George Washington
     b) Andrew Jackson
     c) Abraham Lincoln
    *d) All of the above

A test-wise examinee who was sure that Washington and Lincoln were presidents but was undecided about Jackson would nevertheless select option d.

    A statistical average is a:
     a) mean
     b) mode
     c) median
    *d) measure of central tendency

The more inclusive option d would be chosen by a test-wise person who knew that at least two of the first three choices were correct and that there was only one keyed answer.

I.D.5. Utilize relevant content information in other test items and options.

Examples:

    Which one of the following four animals is warm-blooded?
     a) snake
     b) frog
    *c) bird
     d) lizard

    Which one of the following four animals is cold-blooded?
    *a) snake
     b) dog
     c) kangaroo
     d) whale

Assume the examinee knows that a bird is a warm-blooded animal, and that all animals are either warm-blooded or cold-blooded but not both. He can then reason that a snake, which is an option in both items, must be a cold-blooded animal and can, therefore, answer the second question.

II.A. Intent consideration strategy.

It is possible that the examinee may receive a low score on an objective test merely because his views differ from the test constructor's or his level of knowledge is higher than that being tested. By recognizing and acting upon the biases of the test constructor or the intent of the test, the examinee may avoid loss of points due to misinterpretation, rather than lack of knowledge of the subject matter. (The use of this strategy assumes that the goal of the examinee is to earn the greatest possible number of points; i.e., he is not willing to lose points on principle.)

II.A.1. Interpret and answer questions in view of previous emphases of the test constructor or in view of the test purpose.

It occasionally is difficult to determine what a question is "getting at," and as a consequence the answer is elusive. If the test is taken with a set for the test constructor's views or purposes, one may be able to interpret the question more easily.

Example:

    There is a test for normality.  T  F

If the examinee knows that the purpose of the test is to measure statistical knowledge rather than knowledge of abnormal psychology, the question may be interpreted easily.

II.A.2. Answer items as the test constructor intended.

If the examinee believes that the test constructor had a certain answer in mind, but the examinee can think of a possible objection to the answer, he should answer the item as he believes the test constructor intended. That is, he should choose the option he believes has the greatest chance of being correct, even though others have merits and even though the chosen option is not completely satisfactory.
(This assumes that no explanation can be communicated to the grader; only the answer may be marked.)

Example:

    Thomas Jefferson wrote the Declaration of Independence.  T  F

The examinee may object to answering "true" on the basis that the statement is not complete, since four other men were also appointed to the committee in charge of writing it. The test-wise person, however, would answer "true" if he felt this was the answer which the test constructor considered correct.

II.A.3. Adopt the level of sophistication that is expected.

An item (especially a true-false item) may have different correct answers depending upon the "depth" which the constructor or purpose demands. The test-wise person will recognize and adopt the appropriate level.

Example:

    Light travels in a straight line.  T  F

This item may be keyed as true in a test given at an elementary grade level, but may be keyed as false in an advanced college course in physics.

II.A.4. Consider the relevance of specific detail.

Depending upon the test constructor or test purpose, the specific detail in an item may or may not have bearing upon the answer.

Example:

    A picture of George Washington, the most influential man in shaping the history of the United States, appears on the one dollar bill.  T  F

The test-wise person would have to decide whether the appositive were a mere insertion by an enthusiastic admirer of Washington, or whether it had relevance to the answer.

II.B. Cue-using strategy.

Since one test constructor may inadvertently give away the correct answer, whereas another may use these same cues as foils, the successful use of cues is dependent upon previous contact with similar tests to establish the relationship between the cues and the correct answer.
These cues should be used only when the examinee is unable to arrive at the answer using his knowledge of the subject matter and his reasoning ability.

Once the examinee has detected a cue, he is faced with the decision whether or not to make an informed guess. The blind guessing strategies may be of help here, with the replacement of subjective for objective probabilities in the expected value formulas. In assigning subjective probabilities, the examinee should bear in mind that

    The item writer attempts to make each wrong response so plausible that every examinee who does not possess the desired skill or ability will select a wrong response.... In actual practice, this aim of the item writer is never fully realized, but it is doubtless often sufficiently realized that the standard formula markedly over-corrects (Traxler, 1951; p. 349).

There is some empirical evidence that this is actually the case (e.g. Gupta and Penfold, 1961; Little, 1962).

II.B.1. Recognize and make use of any consistent idiosyncrasies of the test constructor which distinguish the correct answer from incorrect options.

II.B.1.e. He places it in a certain logical position among an ordered set of options (such as the middle of the sequence).

Example:

    The population of Albany, New York in 1960 was approximately
    *a) 130,000
     b) 460,000
     c) 75,000
     d) 220,000

To guard against the examinee who greatly over- or underestimates the desired value answering the item correctly, the inexperienced test maker usually favors including foils with values more extreme (in both directions) than the value of the correct option.

II.B.1.f. He includes (does not include) it among similar statements, or makes (does not make) it one of a pair of diametrically opposite statements.
Examples containing similar statements:

In a second class lever the
a) effort is between the fulcrum and the weight.
*b) weight is between the fulcrum and effort.
c) fulcrum is not used.
d) mechanical advantage is less than one.

Behavior for which the specific eliciting stimulus is not determined is called
*a) operant
b) operational
c) apopathetic
d) prehensory

Example containing opposite statements:

Adding more items to a test would probably
a) decrease its reliability
*b) increase its reliability
c) decrease its cost of administration
d) increase its standard error of measurement

II.B.1.g. He composes (does not compose) it of familiar or stereotyped phraseology.

Example: Behavior for which the specific eliciting stimulus is not determined is called
a) apopathetic
*b) operant
c) abient
d) prehensory

If "operant" is the only word the student recalls hearing or reading, he may tend to select it, even though its meaning is unclear.

II.B.2. Consider the relevance of specific detail when answering a given item.

Examples:

According to your textbook, the best way to teach children an activity is to have them perform the activity themselves. T F

The Iliad, written by Homer, is an early Greek epic. T F

In the above examples, if the test-wise person did not know whether the textbook made the specific statement or whether Homer wrote the Iliad (but did know the truth of the main statement), he would need to determine whether these details were a mere insertion of no consequence or whether such details usually had relevance to the answer. The distinction between this principle and principle II.A.4. is whether or not the specific detail in a test item is known to be true by the examinee.

II.B.3. Recognize and make use of specific determiners.
The following lists contain examples of words which may be correlated with correct or incorrect answers, respectively:

Correlated with correct answers: often, seldom, perhaps, sometimes, generally, may, etc.
Correlated with incorrect answers: always, never, necessarily, only, merely, must, etc.

II.B.4. Recognize and make use of resemblances between the options and an aspect of the stem.

These resemblances may take the form of a direct repetition, synonym, or more general associative connection.

Example of direct repetition:

The aeronautics board which has jurisdiction over civil aircraft is called:
*a) Civil Aeronautics Board
b) Committee to Investigate Airlines
c) Division of Passenger Airways

Examples of general associative connections:

"Glean 1. polish 2. gather 3. skim 4. praise" (Votaw, 1955; Form B, p. 4)

The similarity between the sounds and spelling of glean and gleam creates an associative connection which would lead the test-wise person to consider option 1 as merely an attractive foil.

Descriptions:
(E) 1. Planting a sloping field alternately with rows of corn, then rows of wheat, then rows of corn, etc.
(D) 2. Plowing a crop underground instead of harvesting it.
(A) 3. Removing brush and weeds along the fence between the fields.
(B) 4. Planting around a hillside in level rows instead of planting up and down over the hill.
(C) 5. Planting a field one year with wheat, the second year with oats, the third year with alfalfa, the fourth year with corn.

Titles:
A. Clean farming
B. Contour farming
C. Crop rotation
D. Green manuring
E. Strip cropping
(Ahmann and Glock, 1963; p. 101)

Associations such as alternate rows-strip; removing-clean; hillside rows-contour; and one year wheat, second year oats-rotation lessen the difficulty of the item for the test-wise person.

II.B.5. Consider the subject matter and difficulty of neighboring items when interpreting and answering a given item.
A question may be embedded among questions dealing with the same general subject, and the given context may make it possible to deduce the answer.

Examples:

The 13th, 14th, and 15th Amendments to the Constitution are commonly called the:
*a) reconstruction amendments
b) Big Three amendments
c) Bill of Rights
d) presidential amendments

If the item were embedded among items dealing with the post-Civil War period, the test-wise person would be given a cue to the correct answer.

"What is the chief obstacle to effective homogeneous grouping of pupils on the basis of their educational ability?
a) Resistance of children and parents to discriminations on the basis of ability.
b) Difficulty of developing suitably different teaching techniques for the various levels.
c) Increased costs of instruction as the number of groups increases and their average size decreases.
*d) Wide differences in the level of development of various abilities within individual pupils." (National Council on Measurement in Education, 1962; p. 239)

If this item were embedded among items dealing with educational testing and evaluation, the test-wise person would favor option d. Resistance of children, teaching techniques, and costs of instruction are not of the same species as the questions and options of other items in the test.

Implications for Testing. The analysis of test-wiseness proposed in this paper is intended to serve as a framework for studying its importance. If it does make a significant difference, it would be desirable to seek ways to reduce differences in test-wiseness among examinees in order to provide more valid estimates of their actual abilities and achievement levels. Examples of specific questions which would then be of interest follow.
How can we change test items and test directions, or other conditions of administration, to minimize harmful effects of differences in test-wiseness? How can we more nearly equalize test-wiseness among examinees? Can it be taught? If so, how long will it take, and can guidelines be published? What are the correlates of test-wiseness? Where is it in the spectra of intelligence? Is it related to the sheer number of tests taken? Does knowledge of how tests are constructed increase test-wiseness? Is the degree of test-wiseness of an individual dependent upon the subject matter of the test? At what grade level does evidence of the various aspects of test-wiseness first appear?

To help answer these questions, valid measures of test-wiseness are desired.

REFERENCES

Ahmann, J. Stanley and Glock, Marvin D. Evaluating Pupil Growth. Boston: Allyn and Bacon, Inc., 1963.
Anastasi, Anne. Psychological Testing. New York: The Macmillan Company, 1961.
Berrien, F. K. "Are First Impressions Best on Objective Tests?" School and Society, L (1939), 319-20.
Bloom, Benjamin S. and Broder, Lois J. Problem-solving Processes of College Students: An Exploratory Investigation. Chicago: The University of Chicago Press, 1950.
Briggs, Arvella and Johnson, D. M. "A Note on the Relation between Persistence and Achievement on the Final Examination." Journal of Educational Psychology, XXXIII (1942), 623-27.
Briggs, Leslie J. and Reile, Patricia J. "Should Students Change Their Initial Answers on Objective-Type Tests?: More Evidence Regarding an Old Problem." Journal of Educational Psychology, XLIII (1952), 110-15.
Campbell, Donald T. and Fiske, Donald W. "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix." Psychological Bulletin, LVI (1959), 81-105.
Connerly, John A. and Wantman, Morey J.
"An Exploration of Oral Reasoning Processes in Responding to Objective Test Items." Journal of Educational Measurement, I (1964), 59-64.
Cook, Desmond. "A Comparison of Reading Comprehension Scores Obtained Before and After a Time Announcement." Journal of Educational Psychology, XLVIII (1957), 440-46.
Earle, Dotsie. "A Study of Problem-Solving Methods Used on Comprehensive Examinations." Chicago: University Examiner's Office, University of Chicago, 1950. (Mimeographed)
Ebel, Robert L. and Damrin, Dora E. "Tests and Examinations." In Chester Harris (Editor), Encyclopedia of Educational Research. New York: The Macmillan Company, 1960. (Pp. 1502-517.)
French, John W. "The Relationship of Problem-Solving Styles to the Factor Composition of Tests." EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XXV (1965), 9-28.
Gaier, Eugene L. "Students' Perceptions of Factors Affecting Test Performance." Journal of Educational Research, LV (1962), 561-66.
Graesser, R. F. "Guessing on Multiple-Choice Tests." EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XVIII (1958), 617-20.
Gupta, R. K. and Penfold, D. M. Edwards. "Correction for Guessing in True-False Tests: An Experimental Approach." British Journal of Educational Psychology, XXXI (1961), 249-56.
Heston, Joseph C. How to Take a Test. Chicago: Science Research Associates, Inc., 1953.
Hoffmann, Banesh. The Tyranny of Testing. New York: Crowell-Collier Press, Inc., 1962.
Hook, J. N. How to Take Examinations in College. New York: Barnes & Noble, Inc., 1958.
Huff, Darrell. Score: The Strategy of Taking Tests. New York: Appleton-Century-Crofts, Inc., 1961.
Levine, Richard S. and Angoff, William H. "The Effects of Practice and Growth on Scores on the Scholastic Aptitude Test." Research and Development Report No. 58-6/SR-586. Princeton, N.J.: Educational Testing Service, 1961.
Little, E. B. "Overcorrection for Guessing in Multiple-Choice Test Scoring." Journal of Educational Research, LV (1962), 245-52.
Millman, Jason and Setijadi. "A Comparison of American and Indonesian Students on Three Types of Test Items." Journal of Educational Research, in press.
National Council on Measurement in Education (Wilbur L. Layton, Secretary-Treasurer). "Multiple-Choice Items for a Test of Teacher Competence in Educational Measurement." Ames, Iowa: Iowa State University, 1962.
Pauck, Charles E. "A Square Deal for the Examination." Education, LXXI (1950), 222-25.
Pettit, Lincoln. How to Study and Take Exams. New York: John F. Rider Publisher, Inc., 1960.
Rugg, Harold and Colloton, Cecile. "Constancy of the Stanford-Binet I.Q. as Shown by Retests." Journal of Educational Psychology, XII (1921), 315-22.
Smith, Patricia Cain and Kendall, Lorne M. "Cornell Studies of Job Satisfaction: VI. Implications for the Future." Cornell Notes in Psychology. Ithaca, New York: 1963. (Mimeographed)
Thorndike, Robert L. Personnel Selection: Test and Measurement Techniques. New York: John Wiley & Sons, 1949.
Traxler, Arthur E. "Administering and Scoring the Objective Test." In E. F. Lindquist (Editor), Educational Measurement. Washington: American Council on Education, 1951. (Chapter X.)
Vernon, Philip E. "Symposium on the Effects of Coaching and Practice in Intelligence Tests: Conclusions." British Journal of Educational Psychology, XXIV (1954, Part 2), 57-63.
Vernon, Philip E. "The Determinants of Reading Comprehension." EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, XXII (1962), 269-86.
Votaw, David F. High School Fundamentals Evaluation Test. Austin, Texas: The Steck Company, 1955.
Whyte, William H., Jr. The Organization Man. New York: Simon and Schuster, 1956.

Recent Reference

Gibb, Bernard Gordon. "Test-wiseness as Secondary Cue Response." Unpublished doctoral dissertation, Stanford University, 1964.