Multiple-choice test items: selected references


The International Research Foundation for English Language Education


(Last updated 12 September 2016)

Ackerman, T. A., & Smith, P. L. (1988). A comparison of the information provided by essay, multiple-choice, and free-response writing tests. Applied Psychological Measurement, 12(2), 117-128.

Albanese, M. A., Kent, T. H., & Whitney, D. R. (1979). Cluing in multiple-choice test items with combinations of correct responses. Academic Medicine, 54(12), 948-950.

Al-Hamly, M., & Coombe, C. (2005). To change or not to change: Investigating the value of MCQ answer changing for Gulf Arab students. Language Testing, 22(4), 509-531.

Amini, M., & Ibrahim-González, N. (2012). The washback effect of cloze and multiple choice test items on vocabulary acquisition. Language in India, 12(7), 71-91.

Attali, Y., & Bar‐Hillel, M. (2003). Guess where: The position of correct answers in multiple‐choice test items as a psychometric variable. Journal of Educational Measurement, 40(2), 109-128.

Bacon, D. R. (2003). Assessing learning outcomes: A comparison of multiple-choice and short-answer questions in a marketing context. Journal of Marketing Education, 25(1), 31-36.
Bailey, K. M., & Curtis, A. (2015). Learning about language assessment: Dilemmas, decisions and directions (2nd ed.). Boston, MA: National Geographic Learning.

Becker, W. E., & Johnston, C. (1999). The relationship between multiple choice and essay response questions in assessing economics understanding. Economic Record, 75(4), 348-357.

Bennett, R. E., Rock, D. A., & Wang, M. (1991). Equivalence of free‐response and multiple‐choice items. Journal of Educational Measurement, 28(1), 77-92.
Ben‐Shakhar, G., & Sinai, Y. (1991). Gender differences in multiple‐choice tests: The role of differential guessing tendencies. Journal of Educational Measurement, 28(1), 23-35.
Birenbaum, M., & Tatsuoka, K. K. (1987). Open-ended versus multiple-choice response formats—it does make a difference for diagnostic purposes. Applied Psychological Measurement, 11(4), 385-395.
Bormuth, J. R. (1967). Comparable cloze and multiple-choice comprehension test scores. Journal of Reading, 10(5), 291-299.
Brame, C. J. (2014). Writing good multiple choice test questions. Nashville, TN: Vanderbilt University.

Bridgeman, B. (1992). A comparison of quantitative questions in open‐ended and multiple‐choice formats. Journal of Educational Measurement, 29(3), 253-271.

Bridgeman, B., & Lewis, C. (1994). The relationship of essay and multiple‐choice scores with grades in college courses. Journal of Educational Measurement, 31(1), 37-50.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33-63.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment. New York, NY: McGraw Hill.
Bruno, J. E., & Dirkzwager, A. (1995). Determining the optimal number of alternatives to a multiple-choice test item: An information theoretic perspective. Educational and Psychological Measurement, 55(6), 959-966.
Buck, G., Tatsuoka, K., & Kostin, I. (1997). The subskills of reading: Rule‐space analysis of a multiple‐choice test of second language reading comprehension. Language Learning, 47(3), 423-466.
Burton, R. F. (2005). Multiple‐choice and true/false tests: Myths and misapprehensions. Assessment & Evaluation in Higher Education, 30(1), 65-72.
Burton, S. J., Sudweeks, R. R., Merrill, P. F., & Wood, B. (1991). How to prepare better multiple-choice test items: Guidelines for university faculty. Provo, UT: Brigham Young University Testing Services.
Bush, M. (2001). A multiple choice test that rewards partial knowledge. Journal of Further and Higher Education, 25(2), 157-163.
Butler, A. C., Karpicke, J. D., & Roediger III, H. L. (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied, 13(4), 273.
Butler, A. C., & Roediger, H. L. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36(3), 604-616.
Celce-Murcia, M., Kooshian, G. B., & Gosak, A. J. (1974). Goal: Good multiple-choice language test items. English Language Teaching, 28(3), 257-262.
Cheng, H. F. (2004). A comparison of multiple-choice and open-ended formats for the assessment of listening proficiency in English. Foreign Language Annals, 37(4), 544-555.

Cizek, G. J., & O'Day, D. M. (1994). Further investigation of nonfunctioning options in multiple-choice test items. Educational and Psychological Measurement, 54(4), 861-872.
Crocker, L., & Schmitt, A. (1987). Improving multiple-choice test performance for examinees with different levels of test anxiety. The Journal of Experimental Education, 55(4), 201-205.
Cross, L. H., & Frary, R. B. (1977). An empirical test of Lord's theoretical results regarding formula scoring of multiple‐choice tests. Journal of Educational Measurement, 14(4), 313-321.
Currie, M., & Chiramanee, T. (2010). The effect of the multiple-choice item format on the measurement of knowledge of language structure. Language Testing, 27(4), 471-491.
Daneman, M., & Hannon, B. (2001). Using working memory theory to investigate the construct validity of multiple-choice reading comprehension tests such as the SAT. Journal of Experimental Psychology: General, 130(2), 208.
Davis, F. B. (1959). Estimation and use of scoring weights for each choice in multiple-choice test items. Educational and Psychological Measurement, 19(3), 291-298.
Delgado, A. R., & Prieto, G. (2003). The effect of item feedback on multiple‐choice test responses. British Journal of Psychology, 94(1), 73-85.
Divgi, D. R. (1986). Does the Rasch model really work for multiple choice items? Not if you look closely. Journal of Educational Measurement, 23(4), 283-298.
Dolly, J. P., & Williams, K. S. (1986). Using test-taking strategies to maximize multiple-choice test scores. Educational and Psychological Measurement, 46(3), 619-625.
Drasgow, F., Levine, M. V., Tsien, S., Williams, B., & Mead, A. D. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19(2), 143-166.
Dressel, P. L., & Schmid, J. (1953). Some modifications of the multiple-choice item. Educational and Psychological Measurement, 13(4), 574-595.
Dudley, A. (2006). Multiple dichotomous-scored items in second language testing: Investigating the multiple true-false item type under norm-referenced conditions. Language Testing, 23(2), 198-227.

Ellsworth, R. A., Dunnell, P., & Duell, O. K. (1990). Multiple-choice test items: What are textbook authors telling teachers? The Journal of Educational Research, 83(5), 289-293.

Farley, J. K. (1989). The multiple-choice test: Writing the questions. Nurse Educator, 14(6), 10-12.
Farr, R., Pritchard, R., & Smitten, B. (1990). A description of what happens when an examinee takes a multiple‐choice reading comprehension test. Journal of Educational Measurement, 27(3), 209-226.
Frary, R. B. (1980). The effect of misinformation, partial information, and guessing on expected multiple-choice test item scores. Applied Psychological Measurement, 4(1), 79-90.
Frary, R. B. (1995). More multiple-choice item writing do's and don'ts. Practical Assessment, Research & Evaluation, 4(11).

Frary, R. B., Tideman, T. N., & Watts, T. M. (1977). Indices of cheating on multiple-choice tests. Journal of Educational and Behavioral Statistics, 2(4), 235-256.

Frederick, R. I., & Foster, H. G. (1991). Multiple measures of malingering on a forced-choice test of cognitive ability. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3(4), 596-602.
Freedle, R., & Kostin, I. (1999). Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks. Language Testing, 16(1), 2-32.
Friedman, S., & Cook, G. (1995). Is an examinee’s cognitive style related to the impact of answer-changing on multiple-choice tests? Journal of Experimental Education, 63(3), 199-213.
Fuhrman, M. (1996). Developing good multiple-choice tests and test questions. Journal of Geoscience Education, 44(4), 379-384.
Geiger, M. (1991a). Changing multiple choice answers: A validation and extension. College Student Journal, 25(2), 181-186.

Geiger, M. (1991b). Changing multiple-choice answers: Do students accurately perceive their performance? The Journal of Experimental Education, 59(3), 250-257.

Geiger, M. (1996). On the benefits of changing multiple-choice answers: Student perception and performance. Education, 117, 108-116.

Green, K. (1981). Item-response changes on multiple-choice tests as a function of test anxiety. Journal of Experimental Education, 49(4), 225-228.

Haladyna, T. M. (2012). Developing and validating multiple-choice test items. New York, NY: Routledge.

Haladyna, T. M., & Downing, S. M. (1989a). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

Haladyna, T. M., & Downing, S. M. (1989b). Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Haladyna, T. M., & Downing, S. M. (1993). How many options is enough for a multiple-choice test item? Educational and Psychological Measurement, 53(4), 999-1010.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333.
Haladyna, T. M., & Shindoll, R. R. (1989). Item shells: A method for writing effective multiple-choice test items. Evaluation & the Health Professions, 12(1), 97-106.
Hambleton, R. K., Roberts, D. M., & Traub, R. E. (1970). A comparison of the reliability and validity of two methods for assessing partial knowledge on a multiple‐choice test. Journal of Educational Measurement, 7(2), 75-82.
Hancock, G. R. (1994). Cognitive complexity and the comparability of multiple-choice and constructed-response test formats. The Journal of Experimental Education, 62(2), 143-157.
Hansen, J. D., & Dexter, L. (1997). Quality multiple-choice test questions: Item-writing guidelines and an analysis of auditing testbanks. Journal of Education for Business, 73(2), 94-97.
Heim, A. W., & Watts, K. P. (1967). An experiment on multiple-choice versus open-ended answering in a vocabulary test. British Journal of Educational Psychology, 37(3), 339-346.
Helwig, R., Rozek-Tedesco, M. A., Tindal, G., Heath, B., & Almond, P. J. (1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. The Journal of Educational Research, 93(2), 113-125.
Horst, P. (1933). The difficulty of a multiple choice test item. Journal of Educational Psychology, 24(3), 229-232.
In'nami, Y., & Koizumi, R. (2009). A meta-analysis of test format effects on reading and listening test performance: Focus on multiple-choice and open-ended formats. Language Testing, 26(2), 219-244.

Kehoe, J. (1995). Writing multiple-choice test items. Practical Assessment, Research & Evaluation, 4(9).

Kruglov, L. P. (1953). Qualitative differences in the vocabulary choices of children as revealed in a multiple-choice test. Journal of Educational Psychology, 44(4), 229-243.
Kulhavy, R. W., & Anderson, R. C. (1972). Delay-retention effect with multiple-choice tests. Journal of Educational Psychology, 63(5), 505-512.
Lehrl, S., Triebig, G., & Fischer, B. (1995). Multiple choice vocabulary test MWT as a valid and short test to estimate premorbid intelligence. Acta Neurologica Scandinavica, 91(5), 335-345.
Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational and Behavioral Statistics, 4(4), 269-290.
Little, J. L., Bjork, E. L., Bjork, R. A., & Angello, G. (2012). Multiple-choice tests exonerated, at least of some charges: Fostering test-induced learning and avoiding test-induced forgetting. Psychological Science, 23(11), 1337-1344.

Lord, F. M. (1952). The relation of the reliability of multiple-choice tests to the distribution of item difficulties. Psychometrika, 17(2), 181-194.

Lukhele, R., Thissen, D., & Wainer, H. (1994). On the relative value of multiple‐choice, constructed response, and examinee‐selected items on two achievement tests. Journal of Educational Measurement, 31(3), 234-250.
Marsh, E. J., Roediger, H. L., Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14(2), 194-199.
Mason, V. (1984). Using multiple-choice tests to promote homogeneity of class ability levels in large EGP and ESP programs. System, 12(3), 263-271.

Mason, V. (1992). A good word for multiple-choice tests. CATESOL Journal, 5(2), 29-44.

Masters, J. C., Hulsmeyer, B. S., Pike, M. E., Leichty, K., Miller, M. T., & Verst, A. L. (2001). Assessment of multiple-choice questions in selected test banks accompanying text books used in nursing education. The Journal of Nursing Education, 40(1), 25-32.
McCoubrie, P. (2004). Improving the fairness of multiple-choice questions: A literature review. Medical Teacher, 26(8), 709-712.

Meara, P., & Buxton, B. (1987). An alternative to multiple choice vocabulary tests. Language Testing, 4(2), 142-154.

Mehrens, W. A., & Lehmann, I. J. (1978). Measurement and evaluation in education and psychology (2nd ed.). New York, NY: Holt, Rinehart and Winston.
Mitkov, R., Ha, L. A., & Karamanis, N. (2006). A computer-aided environment for generating multiple-choice test items. Natural Language Engineering, 12(2), 177-194.
Morrison, S., & Free, K. W. (2001). Writing multiple-choice test items that promote and measure critical thinking. Journal of Nursing Education, 40(1), 17-24.
Nevo, N. (1989). Test-taking strategies on a multiple-choice test of reading comprehension. Language Testing, 6(2), 199-215.
Nicol, D. (2007). E‐assessment by design: Using multiple‐choice tests to good effect. Journal of Further and Higher Education, 31(1), 53-64.
Norris, S. P. (2009). Informal reasoning assessment: Using verbal reports of thinking to improve multiple-choice test validity. In J. F. Voss, D. N. Perkins, & J. W. Segal (Eds.), Informal reasoning and education (pp. 451-471). New York, NY: Routledge.

Oller, J. W., Jr. (1979). Language tests at school. London, UK: Longman.

Paxton, M. (2000). A linguistic perspective on multiple-choice questioning. Assessment & Evaluation in Higher Education, 25(2), 109-119.

Pressley, M., & Ghatala, E. S. (1988). Delusions about performance on multiple-choice comprehension tests. Reading Research Quarterly, 454-464.
Pressley, M., Ghatala, E. S., Woloshyn, V., & Pirie, J. (1990). Sometimes adults miss the main ideas and do not realize it: Confidence in responses to short-answer and multiple-choice comprehension questions. Reading Research Quarterly, 232-249.
Pyrczak, F. (1972). Objective evaluation of the quality of multiple-choice test items designed to measure comprehension of reading passages. Reading Research Quarterly, 8(1), 62-71.
Rankin, E. F., & Culhane, J. W. (1969). Comparable cloze and multiple-choice comprehension test scores. Journal of Reading, 13(3), 193-198.
Rodriguez, M. C. (2003). Construct equivalence of multiple‐choice and constructed‐response items: A random effects synthesis of correlations. Journal of Educational Measurement, 40(2), 163-184.
Rodriguez, M. C. (2005). Three options are optimal for multiple‐choice items: A meta‐analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3-13.
Roediger III, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(5), 1155.
Roid, G. H., & Haladyna, T. M. (1980). The emergence of an item-writing technology. Review of Educational Research, 50(2), 293-314.
Rosenthal, R., & Rubin, D. B. (1989). Effect size estimation for one-sample multiple-choice-type data: Design, analysis, and meta-analysis. Psychological Bulletin, 106(2), 332-337.
Rupp, A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: A cognitive processing perspective. Language Testing, 23(4), 441-474.
Schultheis, N. M. (1998). Writing cognitive educational objectives and multiple-choice test questions. American Journal of Health-System Pharmacy, 55(22), 2397-2401.
Scouller, K. (1998). The influence of assessment method on students' learning approaches: Multiple choice question examination versus assignment essay. Higher Education, 35(4), 453-472.
Scouller, K. M., & Prosser, M. (1994). Students' experiences in studying for multiple choice question examinations. Studies in Higher Education, 19(3), 267-279.
Shizuka, T., Takeuchi, O., Yashima, T., & Yoshizawa, Y. (2006). A comparison of 3 and 4 option English tests for university entrance selection purposes in Japan. Language Testing, 23(1), 35-57.

Smith, J. K. (1982). Converging on correct answers: A peculiarity of multiple-choice items. Journal of Educational Measurement, 19(3), 211-220.

Spaan, M. (2007). Evolution of a test item. Language Assessment Quarterly, 4(3), 279-293.
Spolsky, B. (1986). A multiple choice for language testers. Language Testing, 3, 147-158.
Steinberg, R. N., & Sabella, M. S. (1997). Performance on multiple-choice diagnostics and complementary exam problems. Physics Teacher, 35, 150-155.
Stewart, J. (2014). Do multiple-choice options inflate estimates of vocabulary size on the VST? Language Assessment Quarterly, 11(3), 271-282.
Tamir, P. (1971). An alternative approach to the construction of multiple choice test items. Journal of Biological Education, 5(6), 305-307.
Tarrant, M., Knierim, A., Hayes, S. K., & Ware, J. (2006). The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Education in Practice, 6(6), 354-363.

Tarrant, M., & Ware, J. (2008). Impact of item‐writing flaws in multiple‐choice questions on student achievement in high‐stakes nursing assessments. Medical Education, 42(2), 198-206.

Tarrant, M., Ware, J., & Mohammed, A. M. (2009). An assessment of functioning and non-functioning distractors in multiple-choice questions: A descriptive analysis. BMC Medical Education, 9(1), 40.
Thissen, D., & Steinberg, L. (1984). A response model for multiple choice items. Psychometrika, 49(4), 501-519.
Thissen, D., Steinberg, L., & Fitzpatrick, A. R. (1989). Multiple‐choice models: The distractors are also part of the item. Journal of Educational Measurement, 26(2), 161-176.
Thissen, D., Wainer, H., & Wang, X. B. (1994). Are tests comprising both multiple‐choice and free‐response items necessarily less unidimensional than multiple‐choice tests? An analysis of two tests. Journal of Educational Measurement, 31(2), 113-123.
Tinkelman, S. N. (1968). Checklist for reviewing local school tests. In N. E. Gronlund (Ed.), Readings in measurement and evaluation (pp. 103-108). New York, NY: Macmillan.
Traub, R. E., & Fisher, C. W. (1977). On the equivalence of constructed-response and multiple-choice tests. Applied Psychological Measurement, 1(3), 355-369.
Treagust, D. (1986). Evaluating students' misconceptions by means of diagnostic multiple choice items. Research in Science Education, 16(1), 199-207.
Votaw, D. F. (1936). The effect of do-not-guess directions upon the validity of true-false or multiple choice tests. Journal of Educational Psychology, 27(9), 698-703.
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6(2), 103-118.
Ward, W. C. (1982). A comparison of free-response and multiple-choice forms of verbal aptitude tests. Applied Psychological Measurement, 6(1), 1-11.
Wesman, A. G. (1971). Writing the test item. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 99-111). Washington, DC: American Council on Education.
Wilhite, S. C. (1986). The relationship of headings, questions, and locus of control to multiple-choice test performance. Journal of Literacy Research, 18(1), 23-40.
Willey, C. F. (1960). The three-decision multiple-choice test: A method of increasing the sensitivity of the multiple-choice item. Psychological Reports, 7(3), 475-477.

Wu, Y. (1998). What do tests of listening comprehension test? A retrospection study of EFL test-takers performing a multiple-choice task. Language Testing, 15(1), 21-44.

Zeidner, M. (1987). Essay versus multiple-choice type classroom exams: The student’s perspective. The Journal of Educational Research, 80(6), 352-358.
Zimmerman, D. W., & Williams, R. H. (1965). Chance success due to guessing and non-independence of true scores and error scores in multiple-choice tests: Computer trials with prepared distributions. Psychological Reports, 17(1), 159-165.


177 Webster St., #220, Monterey, CA 93940 USA
