Standardized Test Bias Against
English Language Learners: What it is and how to reduce it
TBED 542: Multiculturalism and Acculturation
Dr. Gladys Scott
August 13, 2009
Researchers have studied evidence of bias in standardized tests only to conclude that tests which are well designed and appropriately normed show no bias (Sattler, 1992; Valdes & Figueroa, 1996). After extensive examination of factors such as item content, sequence, structure, difficulty, factor solutions, and predictions, researchers could not find reason to deem such tests unreliable (Neisser et al., 1996; Sattler, 1992). However, this paper reviews literature that questions the validity of standardized tests when assessing students who are non-native speakers of English and have been raised in a culture different from the American norm. The research asserts that bias will occur when a test of intelligence, ability, or achievement that was developed and normed in the United States is given to students who are culturally and linguistically diverse (Rhodes, Ochoa, & Ortiz, 2005). Considering that children of immigrant families represent 20 percent of the student population in the United States, and that English Language Learners (ELLs) score well below their native English-speaking peers, researchers must look for ways to improve the standardized test outcomes of culturally and linguistically diverse students (Dorner, Orellana, & Li-Grining, 2007).
The literature reviewed in this paper examined the achievement gap in standardized test scores between linguistically and culturally diverse students and their native English-speaking peers, determined what contributes to the gap, and suggested ways that schools can narrow it. The studies found two major shortcomings of standardized tests in relation to students who are culturally or linguistically diverse: their cultural content (every test projects the culture of its creator) and their norms. They also recommend ways in which standardized tests can be modified to account for the needs of students who are not native English speakers, as well as factors that boost the scores of ELLs on these tests.
Studies Reflecting Cultural Differences of Test Takers
All assessments of intelligence and cognitive ability reflect the culture (values, beliefs, ideals) of their creators; therefore, performance depends on having learned the rules of a society (Rhodes, Ochoa, & Ortiz, 2005). Standardized tests rest directly on the principle of the "assumption of comparability," which means that students are compared to a set of norms to determine their standing, and it is assumed that those students are similar to the students on whom the test was standardized. To be valid, the comparison should be made against students at the same level of acculturation (Salvia & Ysseldyke, 1991). When the norms are inappropriate, cultural bias occurs. In a book that synthesizes established research and theory on the topic for practitioners, Rhodes, Ochoa, and Ortiz contend that students who do not have the opportunity to become acculturated at the same pace as their peers are likely to score lower because they lack the knowledge and content, not because they are less able (Rhodes, Ochoa, & Ortiz, 2005).
Studies on ELLs’ Performance on New York Regents Exams
Three studies looked at different aspects of the achievement gap on the New York Regents high school exit exams, which have become part of New York's means of satisfying the requirements of the No Child Left Behind Act (2002). These high-stakes standardized tests play a central role in assessing student achievement, instructional methods, and school quality. New York is one of 21 states now using high school exit exams for all students, including special needs students, culturally diverse students, and ELLs, to meet NCLB's requirements. Because ELLs do not perform as well as their native English-speaking peers, the tests pose a great challenge in education (Dong, 2004).
Only 50% of ELL students in New York City schools passed the Regents in 2003 (Dong, 2004). Supporting the findings of Rhodes, Ochoa, and Ortiz regarding standardized tests, Dong attributed the low scores not only to a lack of English proficiency but also to the fact that these students do not share the same cultural experiences as mainstream American students. For example, Dong found that Asian and European students who come to the U.S. have a different approach to test-taking. They may not be used to multiple-choice tests, unlike native English speakers who have been trained in that format. In addition to the language and reading challenges posed by the questions, the tests frequently demand inferences that ELLs have difficulty making. Her findings called for reforming essay tests to build cross-cultural understanding into test design and grading (Dong, 2004). Dong referenced the supporting research of Mohan (1986), which compared and contrasted the levels of inference demanded of the reader on the New York Regents. Such semantic inferences, Mohan concluded, were testing cultural rather than content knowledge, thereby putting ELLs at a distinct disadvantage.
Dong's recommendations called for greater awareness of student diversity when formatting the test, development of language skills outside the classroom, use of assessments that go beyond traditional measures of intelligence, and use of a more precise measure of acculturation and English proficiency in interpreting the results. She supported several accommodations for ELLs taking standardized tests like the Regents that were also proposed by Butler and Stevens (2001). These include employing text modification strategies, assessing students' content knowledge in their native language, rephrasing questions to reduce linguistic complexity, providing cultural notes and glossaries, simplifying directions, and reducing cultural bias.
According to Dong, the Regents developed multilingual versions for ELL students, but not in every language. She recommended standardized test reforms on three levels: involve ESL/bilingual professionals in test design to review for cultural bias and language, build language awareness into the daily lesson plans and classroom assessments of teachers who work with ELL students, and use performance assessment techniques to better evaluate ELL students (Dong, 2004). Including the input of bilingual teachers would seem an obvious step, and that input has a strong presence in Menken's study.
While the above-mentioned studies look at the format and structure of the Regents, the research of Kate Menken of City University of New York analyzed how test-driven curriculum for the Regents affected scores on these tests and denied ELLs a sufficient bilingual education. Considering that ELLs across the United States are now being included in statewide assessments, her findings have broad implications. In New York City, ELLs make up 13.8% of the public school population. In 2005, only 33.2% of ELLs passed the English Regents, compared to 80.7% of all students. The ELL passage rate was 58.1% for the Math Regents, compared to 81.5% of the general population (Menken, 2006). Since 30% of ELLs drop out of New York City schools, the highest rate among all student groups, Menken considered her research important for understanding how to properly educate ELLs.
In her study of ten New York City high schools in 2005, she found that the Regents were really language proficiency exams and not measurements of content knowledge. Her research determined that the schools tried to raise the scores of ELLs by changing their language policies and "teaching to the test." She argued that such an approach promoted monolingual instruction and deprived ELLs of the true language arts curriculum their peers received.
While Dong's research looked at the content of the test, Menken's study sought to determine how high-stakes tests have changed the learning experience for ELLs and to understand the language policy implications of the focus of the assessment. Dong conducted her research through data sampling, while Menken's information was acquired through interviews; observations; state, district, and school policy documents; standardized test scores; and graduation, promotion/retention, and dropout data. Researchers interviewed 128 participants in ten schools, including New York City high school teachers, administrators, and ELL students. Like Dong and Rhodes, Ochoa, and Ortiz, Menken concluded that the Regents relied heavily on language proficiency, including the math portion. Therefore, to raise the scores of ELLs, schools increased the amount of English instruction instead of providing a strong bilingual program (Menken, 2006).
Among the ten schools, administrators took different approaches to deal with the pressure to show adequate yearly progress. For example, School 4 required its 606 ELLs to take a daily double-period English Regents preparation course and a Saturday program in addition to an extended school schedule, often 12 periods, compared to the eight periods a day attended by their English-proficient peers. All students, including the ELLs, were required to receive a score of at least 65 on the English Regents, 10 points more than the actual statewide passing score (Menken, 2006). In essence, this approach deviated from the strong bilingual program that New York City schools had in use. Additionally, School 4 placed ESL students who had just arrived in the U.S. into advanced English Regents preparation courses before they learned English language fundamentals (Menken, 2006).
While School 4 increased English language instruction, School 1 preserved native language instruction. The majority of its ELLs were Spanish speakers, all of whom received ESL instruction. Spanish-speaking students also received bilingual classes in math, science, and social studies. When the teachers realized that the skills on the Advanced Placement Spanish exam and in the national curriculum for the AP course were similar to those required by the English Regents, they required Latino ELLs to take Spanish as a Native Language at the lower levels and Advanced Placement Spanish at the more advanced levels. In addition, the school offered an English Regents preparation course in Spanish. This approach was so successful, increasing pass rates by 50 percent, that it was implemented at other schools as well. Menken reasons that School 1's approach supports bilingual education research showing that developing literacy in students' first language helps them develop literacy in their second language and facilitates the transfer of content knowledge from the first language (Menken, 2006).
The study's shortcoming is that it did not report the results of the other nine New York City schools, an obvious omission since those schools took an increased-English approach. The research described the approaches schools took but lacked a connection between each approach and the students' performance on the Regents. If the bilingual approach boosted pass rates by 50% at School 1, what was the outcome of the monolingual approach? To make a viable comparison, one needs more data. Also, what were the scores of all the foreign-language students? While Spanish speakers in the one high school could take AP Spanish, what could be offered to native speakers of other languages, whether European, Middle Eastern, or Asian?
One strength of this study, however, derives from its interviews of teachers and students, which gave a clear understanding of the test-driven curriculum's impact on learning. Pages of interviews relate their frustrations. ELLs' responses were the most disturbing: they told researchers that the push to raise scores had narrowed the curriculum so much that they did not think they were being prepared for college. Because they were only learning the content on the test, there was no room for projects and in-depth study of certain topics. The study concluded that language policies in schools must be carefully planned and decided upon by teachers, administrators, and the community to meet the needs of the students, not determined by high-stakes testing (Menken, 2006). This concurred with Rhodes, Ochoa, and Ortiz and with Dong, who expressed the need for bilingual educators to help create the tests and the correlating curriculum.
Socio-Economic Factors Affect Standardized Testing Results
In contrast, another study focusing on Latino students, specifically Mexican American students, examined standardized math test scores using an integrated model that viewed standardized test performance as the result of situational and cultural factors at the individual, family, peer, and institutional levels (Morales & Saenz, 2007). Rather than survey students and conduct interviews of teachers and students as Menken's study did, the researchers examined 12 hypotheses using data from a series of math comprehension tests administered by the National Center for Education Statistics (NCES). The probability sample provided a nationally representative sample of schools and high school seniors. For this study, the final sample included 490 Mexican-origin students, which seemed rather small compared to the 7,690 White students. The study concluded that Mexican-origin students scored significantly lower (by 10 points) on the math test than did Whites, a substantial difference given that the test has only 81 possible points (Morales & Saenz, 2007). Because the sample considered seniors only, the study noted that the gap could have been much larger had it sampled students in earlier grades.
To explain the gap, the researchers introduced different variables and found that 67% of the gap in scores on cognitive math tests was due to differences in socio-economic status (SES) between Mexican-origin and White students (Morales & Saenz, 2007). Also of note was the effect of generational status, since the study questioned the claim that Mexican immigrants hinder the success of the Mexican-origin population. When looking at the ethnic gap in cognitive test scores, generational status slightly widens the gap: first-generation students scored about four points above second-generation students.
This study considered factors such as students' study habits, gender, family background and home language, peer pressure, and positive school experiences, which were not included in Menken's research. Such factors would help explain the performance of ELLs on the Regents. Neither study considered the impact of the students' communities. If the studies had factored in variables regarding students' communities and neighborhoods, explanations for the achievement gap might have been better defined. For example, the studies did not correlate a community's crime rate, cultural segregation, or poverty level with academic achievement. Unlike the Menken study, the research of Morales and Saenz did not gather student feedback on test preparation classes for the math test or determine whether such courses were even part of the curriculum. If the students were getting test preparation classes, were they bilingual or monolingual, as in the New York City schools? Such factors might have helped determine how schools could offset the negative impact of Mexican-origin students' lower SES compared to their White peers.
Language Brokering May Increase Standardized Test Performance for ELLs
Sociologists have conducted many studies on the translating and interpreting work of immigrant children, most of which analyzed how that work related to the children's development (Orellana, Reynolds, Dorner, & Meza, 2003).
However, the study in this literature review examined the practice more closely to see how language brokering cultivates linguistic, mathematical, and socio-cultural aptitude, which in turn increases these students' performance on standardized tests in math and reading comprehension (Dorner, Orellana, & Li-Grining, 2007). It tested the hypothesis that language brokering is related to academic outcomes and called for further mixed-method studies on the topic. The study sought to answer two questions: What is the scope of children's experiences with language brokering in a particular Chicago immigrant community, and is there a connection between this practice of immigrant households and students' performance at school?
Because the study looked at students' scores over a period of five years, it was possible to better control for their early academic achievement (Dorner, Orellana, & Li-Grining, 2007). Since this longitudinal study controlled for children's gender, exposure to bilingual education, and generational status, its findings could be held with greater confidence. Earlier studies lacked these controls, and the other studies reviewed in this paper also lacked this perspective.
This 2001 research study was based at the Regan Elementary School in Chicago, where 90% of the students were low-income, 40% were limited English proficient, and 75% were Hispanic, most of them Mexican. Researchers surveyed the 10 fifth- and sixth-grade mainstream and bilingual Spanish classes, posing questions about the children's preferred language, their lives, and their experiences with translating, interpreting, reading, writing, and technology. Of 313 children, 280 responded, or about 89% of those surveyed. About half were girls. As expected, 90% of the first- and second-generation children said they translated for other people in everyday ways. Researchers created three categories of student activity: active, partial, and non-language brokers. To measure academic outcomes, they used students' scores on the standardized math and reading portions of the Iowa Tests of Basic Skills (Dorner, Orellana, & Li-Grining, 2007).
The results showed that 35 percent of the students were active language brokers, most of whom spoke Spanish at home and had some bilingual education; 34 percent were partial language brokers; and 53 percent were not language brokers. Significantly, by grade five the active brokers scored an average of eight points higher than both other groups in reading and math (Dorner, Orellana, & Li-Grining, 2007). The researchers concluded that language brokering was positively related to students' standardized test scores in reading comprehension. They noted that not all students who broker would reap higher scores, because some students find the activity stressful (Dorner, Orellana, & Li-Grining, 2007). The positive results appear to develop among students who are active language brokers, so educators and parents must determine how to enhance those skills for application in the classroom.
The study recommended that educators and curriculum developers continue research to determine if training bilingual students who are not active translators would result in higher academic performance.
It can be concluded from the five studies in this literature review that standardized testing poses significant biases against English Language Learners in both format and content, which calls into question the validity of such assessments. These tests are normed for American students whose primary language is English. They continue to inadequately measure content knowledge in an ELL's native language and to phrase questions and directions in a complex manner. When test preparation classes are conducted in English only, districts compromise their bilingual programs. To resolve these inequities, researchers urge that ESL/bilingual professionals be involved in test design to review for cultural bias and language, and that performance assessment techniques be developed to evaluate ELL students rather than relying on tests alone. Research is lacking in longitudinal studies to determine the most effective classroom approach to standardized test preparation and to learn which factors, aside from socio-economic status, most affect the success of ELLs on standardized tests. More studies must be conducted to learn how to more finely assess cultural proficiency, just as practitioners now measure language proficiency. While many studies look at Spanish-speaking students in regard to testing, there is a gap in research regarding students of other languages; studies must examine any test bias toward them as well.
References

Butler, F., & Stevens, R. (2001). Standardized assessment of the content knowledge of English language learners K-12: Current trends and old dilemmas. Language Testing, 18(4).
Dong, Y. R. (2004). Assessing and evaluating ELL students in mainstream classes. In T. A. Osborn (Ed.), Teaching Language and Content to Diverse Students (pp. 39-65). Greenwich, CT: Information Age Publishing.
Dorner, L., Orellana, M., & Li-Grining, C. (2007). "I helped my mom," and it helped me: Translating the skills of language brokers into improved standardized test scores. American Journal of Education, 113, 451-478.
Menken, K. (2006). Teaching to the test: How No Child Left Behind impacts language policy, curriculum, and instruction for English Language Learners. Bilingual Research Journal, 30(2), 521-546.
Mohan, B. (1986). What are we really testing? In B. Mohan, Language and Content (pp. 122-135). Reading, MA: Addison-Wesley.
Morales, M., & Saenz, R. (2007). Correlates of Mexican American students' standardized test scores: An integrated model approach. Hispanic Journal of Behavioral Sciences, 2, 349-365.
Neisser, U., Boodoo, G., Bouchard, T., Boykin, A., Brody, N., Ceci, S.J., et al. (1996).
Intelligence: Knowns and unknowns. American Psychologist, 51, 77-101.
No Child Left Behind Act. (2002). Pub. L. No. 107-110.
Orellana, M., Reynolds, J., Dorner, L., & Meza, M. (2003). In other words: Translating or "para-phrasing" as a family literacy practice in immigrant households. Reading Research Quarterly, 38(1), 12-34.
Rhodes, R., Ochoa, S., & Ortiz, S. (2005). Acculturational factors in psychoeducational assessment. In K. W. Merrell (Ed.), Assessing Culturally and Linguistically Diverse Students (pp. 124-135). New York, NY: The Guilford Press.
Salvia, J., & Ysseldyke, J. E. (1991). Assessment (5th ed.). New York: Houghton Mifflin.
Sattler, J. (1992). Assessment of children (rev. 3rd ed.). San Diego, CA: Jerome M. Sattler, Publisher.
Valdes, G., & Figueroa, R. A. (1996). Bilingualism and testing: A special case of bias. Norwood, NJ: Ablex.