Transition to technology-based administration through a considered approach.
Many states, including two of California’s close neighbors, Oregon and Washington, have implemented technology-based assessment for their summative tests. Smarter Balanced will be delivered via technology devices to leverage the adaptive nature of the assessments. This administration mode has several significant benefits and opportunities, as well as some challenges.
Benefits and Opportunities: The potential advantages to technology-based assessment include:
Better measurement of key constructs through use of a range of new question types not possible on paper
Use of adaptive or multistage testing for increased test efficiency
Use of automated scoring for certain types of constructed response items, greatly reducing the costs of scoring student-generated responses
Faster return of student results because the time currently used to transport answer documents to the scoring center and scan the documents is eliminated
More efficient data capture and data management because there are fewer steps between when a student records a response and when the response is recorded in the database
Mitigation of some forms of test security risks because there are fewer opportunities for test booklets to be seen by those who should not see them
Reduced procedural burdens on teachers and administrative staff since there are fewer forms to handle and complete
More efficiency and flexibility in the provision of various test accommodations (e.g., read-aloud text, enlargement of text and images for students with visual impairment, presentation of text in sign language, and extended time)
A potentially more motivating environment for students, who are accustomed to using technology in their everyday lives, in and out of school
Challenges: To realize the benefits and opportunities of technology-based delivery of tests, it may be desirable to make this transition as quickly as possible. However, there are significant challenges to reaching this goal, including the following:
LEAs will have to meet the technology requirements to support the assessments of Smarter Balanced. California has participated in the technology evaluations conducted by Smarter Balanced, and the state is aware of the current deficit in technology availability for assessment.
Students and teachers need sufficient opportunities to gain familiarity with the delivery system and the item types used for the tests.
Well in advance of any high-stakes administration, students would have to be trained to type essays and responses rather than write them by hand.
During the transition period, some schools will administer the tests via paper and others via computer. Supporting two modes of delivery is likely to increase administration costs and to place an increased administrative burden on district staff. (This report provides a discussion of a transition plan in Recommendation 2.)
Although computer delivery decreases some forms of test security risk, it may increase the risk of students observing other monitor screens or of electronic security breaches. One way to lessen these risks is to administer several parallel forms within each grade and course during the annual testing window. Another option is to develop these assessments as computer-adaptive tests (CAT), as Smarter Balanced is planning. A CAT drawing from a sufficiently large item bank reduces the likelihood that the same items appear on adjacent screens at the same time.
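The effect of item-bank size on exposure can be made concrete with a quick simulation. The sketch below is illustrative only: it simplifies adaptive item selection to random draws from the pool, and the function name and parameters are hypothetical.

```python
import random

def overlap_rate(pool_size, test_length, n_pairs=2000, seed=0):
    """Monte Carlo estimate of how many items two adjacent examinees
    share when each draws `test_length` items at random from a pool
    of `pool_size` items (a simplification of true adaptive selection)."""
    rng = random.Random(seed)
    pool = range(pool_size)
    total = 0
    for _ in range(n_pairs):
        seen_a = set(rng.sample(pool, test_length))
        seen_b = set(rng.sample(pool, test_length))
        total += len(seen_a & seen_b)
    return total / n_pairs

# Expected overlap is roughly test_length**2 / pool_size: a 40-item test
# drawn from a 400-item pool shares about 4 items between neighbors on
# average; drawn from a 2,000-item pool, fewer than 1.
```

Quintupling the pool cuts the expected overlap by a factor of five, which is the arithmetic behind requiring a sufficiently large item bank.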
In a state as large and diverse as California, a phased approach that moves each assessment online after the ELA and mathematics tests may have merit. The Smarter Balanced assessments are designed for technology-based administration, and a paper form will be available for the first three years. It is unknown at this time whether this paper version will remain available should states continue to need this administration mode. It is likely that an extraordinary amount of resources, both technical and human, will be put in place to administer the Smarter Balanced assessments online. After this push, California may wish to examine the lessons learned and carefully strategize where and when the next assessment should move to online administration. The assessment that moves the system forward without significant stress is a likely candidate.
The transition to technology-based assessment in Smarter Balanced ELA and mathematics will be substantial, and it would be appropriate to leverage this transition to technology for other content areas and instruction as appropriate. For example, a state assessment expecting a smaller EOC population may be able to make this transition more quickly. The transition in content areas not assessed by Smarter Balanced is likely to be constrained more by school infrastructure (the technology-to-student ratio) than by any technical measurement issues. Thus, there is great benefit to planning well in advance of a transition to technology-based administration, and this timeline will allow the lessons learned in the Smarter Balanced transition to be applied to other content areas.
Reduce the number of students tested when information is used for more global decisions.
While individual student scores are expected to remain a cornerstone of the California assessment system, schools and districts are often keenly interested in obtaining information on a broader range of content than can be measured in one test. This broader measurement of the content standards can be achieved at the group level, with minimal increase in individual student testing time, by using matrix sampling techniques. In this model, an operational test would consist of a substantial core of items taken by all students plus several small subsets of operational items, perhaps 10-15 items each. Each student would take the core set of operational items plus one of the subsets. All of the subsets within a content area could be tested in each school or district, except for very small schools or districts.
The subsets could be developed to probe more deeply a single strand, a specific group of content standards, and/or individual key standards more thoroughly. They would be randomly spiraled among students, and students would receive individual scores on the core, not the subsets.
The core-plus-subset model provides, at an aggregate level, more detailed information for each content area than is currently available. A large number of different items would be distributed through the test-taking population, and thus schools and districts would get a more thorough picture of how they are doing in teaching a wide variety of content.
Because the core-plus-subset approach enriches the information provided by the assessment at the LEA level, the major advantages of this approach would be providing schools and districts more thorough feedback on instructional effectiveness and reducing concerns about narrowing the curriculum. A disadvantage of pursuing this goal, however, is that there would be increased costs for developing the additional items that would be required. Costs would also be incurred for development of appropriate score reports for the LEAs. Careful communication about this design would be required so that stakeholders would understand the design, its purpose, and its appropriate uses.
The current Smarter Balanced model does not include a core-plus-subset design. If California adopts the Smarter Balanced assessments, those assessments would provide the core tests. If a broader assessment of the curriculum were desired at the group level, California could augment them with the subset design.
Beyond ELA and mathematics, other content areas could benefit from a matrix sample design. To keep the amount of testing time to a minimum, other content areas might not assess every student every year. These subjects could move to a sample more consistent with that used on NAEP, in which there is no core. This matrix approach would allow for a much richer sampling of content across a body of students. Students in grades 3-8 and high school might take an ELA and a mathematics assessment, plus a social studies assessment assigned to that grade level for that year. In this way, students do not participate in an assessment for every content area, and yet educators and policy leaders garner information about the performance of California students in these subjects on the whole.
In developing this matrix sampling approach in other content areas, there are a number of trade-offs to consider. First, to counter concerns about narrowing the curriculum, the state would be developing additional content assessments for other grade levels. However, item development quantities might not need to be as large as typical, because the state would not administer each assessment every year. Second, report information would not be available at the student level. Since no student takes the entire assessment, scaled-score or proficiency data at the individual student level would not be appropriate, and growth scores for individuals would not be possible: these data would be limited to groups. It would be possible, however, to evaluate grade-to-grade comparisons over time, as well as between-grade comparisons if the assessments were developed to support them.
Strengthen security of administration according to stakes of the exam.
As technology advances, it will be necessary to develop and implement security mechanisms that are unique to the next-generation assessments and their administration.
Some of the factors that California has likely considered in relation to the exposure of the new assessments include:
Many items will be unique, distinctive, or at least uncommon constructed-response and performance tasks, and therefore very memorable.
Technology and social networks have made communication among students (and teachers) easier and potentially more viral than ever.
The type of administration used and the size of item pools will have material impacts on the frequency with which individual items are used.
California can minimize exposure to secure items in content areas other than ELA or mathematics using a number of strategies, such as:
Prepublishing all constructed-response and performance prompts if the volume is high enough that memorization is not an issue. California can spiral items and forms within a classroom, school, and/or district to limit the number of students who see any given item.
Prepublishing sample prompts for use in instruction to prepare for assessment when the responses are complex enough that answers cannot be prepared before testing.
Publishing all actual items as well as sample items that will not be used, so that the sheer volume negates most attempts at memorization or preparation. The distinction between live items and sample items would not be revealed. This strategy requires developing more items than necessary, at a higher cost, but has proven to reduce exposure in our other assessment programs, such as in Virginia.
Staggering the release of constructed-response items and prompts.
In addition to exposure control, unauthorized distribution of, or access to, secure assessment content will remain a continuing threat. This can occur in any number of ways. In some cases, the student is responsible; for example, students can share items with others after testing, including posting exam questions on the Internet or sending them by text message or e-mail, giving future test takers an unfair advantage. State assessment systems often are high-stakes in nature, and unfortunately this sometimes produces incentives for school staff to tamper with assessment answer documents. Investigating reports of security breaches can be costly, so California should focus attention on methods to prevent breaches in order to lessen that expense.
California has already implemented formal procedures for auditing test administrations to prevent unauthorized distribution of, or access to, secure content. Unannounced visits by personnel familiar with the assessment system (either state employees or contracted vendor personnel) can be conducted at random for most classrooms, and more often for groups with a high potential for security problems. There are additional tools that the state can apply to the next-generation assessment for essay questions or other constructed-response items. Technologies exist for online assessments that check for similarity across essays. These programs have proven to be useful tools, although they produce some false positives, and assessment development staff must review flagged cases to verify an actual instance of plagiarism.
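As a rough illustration of how such similarity checks work, the sketch below compares two essays by the cosine similarity of their word-frequency vectors. Operational systems use far more sophisticated features, and the threshold shown is hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Similarity of two responses based on word-frequency overlap;
    values near 1.0 flag near-identical essays for human review."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

FLAG_THRESHOLD = 0.90  # hypothetical cutoff; flagged pairs go to staff review
```

Pairs scoring above the threshold would still be reviewed by assessment staff, consistent with the false-positive caveat above.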
Establishing requirements and procedures for technology-based testing greatly reduces the access that school staff members have to test materials, which mitigates the potential for responses to be changed after test administration is completed. For this reason, online assessment systems should provide a full audit trail tracking whenever a test is entered, exited, and reactivated. At a fundamental level, the system can restrict access to school operating hours. Some systems fully track and time-stamp activity within the system, such as when responses are modified, and can even record individual keystrokes. California should determine the appropriate level of system-based auditing to employ, considering the cost and use of the system.
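A minimal sketch of such an audit trail, assuming an append-only event log (all names hypothetical):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEvent:
    session_id: str
    action: str  # e.g., "enter", "exit", "reactivate", "response_modified"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class AuditTrail:
    """Append-only log of test-session activity: events are recorded
    with a UTC time stamp and are never edited in place."""

    def __init__(self):
        self._events = []

    def record(self, session_id, action):
        self._events.append(AuditEvent(session_id, action))

    def history(self, session_id):
        """All events for one session, in the order they occurred."""
        return [e for e in self._events if e.session_id == session_id]
```

Keeping the log append-only is the design choice that matters: an auditor can reconstruct exactly when a session was entered, exited, or reactivated, and tampering would itself leave a trace.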
Provide real-time results for computer-scored tests.
Technology-based assessments provide the opportunity for immediate results. While not every item can be scored immediately, such as some performance tasks that require human scoring or more intricate constructed-response items, technology-based assessments can provide real-time results for a number of item types, especially in low-stakes settings such as interim assessments. It is these types of assessments, along with classroom-based formative tools, that are designed to provide actionable information for instruction, and thus educators have the most potential to benefit from real-time reporting.
In an interim assessment, for example, teachers are interested in both the performance of their class overall as well as the performance of individual students on the content focus for that test. California should investigate a comprehensive reporting system that offers broad utility and flexibility to analyze these various levels of aggregation. For example, when reviewing the performance of an individual student, item analysis reports are both informative and revealing, yet care must be taken when using this information to ensure the appropriateness of the inferences from the results.
It is important that report design and content be easily understood and that reports be available on demand through the assessment management system. In an increasingly demanding world, especially that of the classroom teacher, building principal, or district administrator, articulating information via information graphics, or “infographics,” is becoming the norm and the expectation. Pictures that tell the story quickly and can be manipulated and disaggregated are much more helpful in focusing attention on specific aspects of student performance. For example, infographics such as the one below, which uses data from the Organization for Economic Cooperation and Development, are becoming more commonplace in telling the story of status and change over time.
Provide diagnostic information about the next steps in the teaching and learning process.
An important issue in reporting test results to inform classroom instruction is the granularity of the results provided. Reporting can range from one global score, which is not particularly useful for instruction, to many scores, each based on a specific objective or topic of instruction. A complicating issue is that the smaller the grain of reporting, the less reliable the reported scores, unless a greater number of items assessing that construct is administered. An unintended consequence of such unreliable score reports is that an uninformed classroom teacher may make significant changes in pedagogy or curriculum focus that the results do not warrant.
A single score based on many items can be highly reliable. Many scores, each based on a few items, are typically very unreliable. The challenge is to reach an acceptable level of reporting that will be reliable but useful for teachers in evaluating student progress. Recent studies (Sinharay, 2010; Sinharay & Haberman, 2008, 2011) have shown that augmented subscores often lead to more accurate diagnostic information than observed subscores.1 The results of that research can be used to help California stakeholders be confident that the levels of reporting are appropriate and defensible based on their purpose.
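The reliability trade-off can be made concrete with the Spearman-Brown prophecy formula, which predicts reliability when a test is shortened or lengthened. The numbers below are illustrative, not California data.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length changes by `length_ratio`
    (Spearman-Brown prophecy formula)."""
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)

# A 50-item total score with reliability 0.92, cut down to a 5-item
# subscore, drops to roughly 0.53, far too unreliable for decisions
# about individual students.
subscore_reliability = spearman_brown(0.92, 5 / 50)
```

This arithmetic is why augmented subscores, which borrow strength from the full set of administered items, can outperform observed subscores based on only a handful of items.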
Providing this information is only half of the equation. Aligned with the mission of the California assessment system articulated in the State Superintendent’s 12 recommendations, California can advance assessment for learning that is focused on improving student learning, building students’ confidence as learners via the use of classroom assessment, and helping teachers learn to use assessment for both accurate measurement and for good instruction, recognizing that different tools are needed for these very different purposes. California can develop a model that helps classroom teachers connect the expectations and performance levels of the state-level assessment to day-to-day classroom assessment practice. Such a model would develop educators who can do the following:
translate content standards into classroom-level learning targets and then into student-friendly versions of standards and targets
develop and use accurate, high-quality assessments in the classroom using the appropriate assessment method
involve students in their own assessment, including keeping track of and communicating their own progress, goal setting, and self-evaluation
create and recognize quality rubrics and performance tasks
assess more efficiently and economically
communicate effectively and accurately about student achievement, including the use of formative feedback
motivate students by making them responsible collaborators in the assessment process
In order to achieve these goals, a significant investment in high-quality ongoing professional development will be critical since formative assessment is at the center of high-quality instruction.
Articulating a coherent assessment system.
Parents, teachers, and others have many questions about how tests are used as tools to improve public education. How are the tests developed? Are they fair to all segments of our diverse student population? How are the results used? How can students, parents, and teachers prepare for tests? And most importantly, how do the tests contribute to improved student learning?
Like many other states, California often must describe the assessment system to two levels of audience. The first audience is interested in the reasonableness of the system. They want to see how the parts fit together to create a suite of activities that lead to improved learning and teaching. A certain segment of stakeholders, mostly parents and the general public, want to ensure that the system makes sense to them and is an appropriate part of their children’s education. Articulating this reasonableness requires thoughtful, planned communication by experts who are as knowledgeable about communication as they are about assessment. California should consider articulating its vision of a comprehensive assessment system using communication experts within its state system as well as those of its current and future contractors. Just as communicating test results in novel ways to teachers in a fast-paced environment can be effective, similar infographics can help explain the assessment system plans to parents and the general public in quick and easy pieces of information.
Articulating a technically defensible process.
In addition to the expectation of a reasonable system to improve teaching and learning, there is a subset of stakeholders who have an interest in and a responsibility to ensure that the assessment system California develops is technically defensible. As in the methodologies of many professions, there is more than one correct way to achieve a goal or objective; the same is true in developing an assessment system. It is unlikely that every stakeholder will agree on every decision or component of the assessment system that California eventually builds. However, it is the state’s responsibility to provide the evidence for its confidence in the system it has developed. Much of this evidence bears on the validity of the assessments and the overall program objectives they support.
Such communications involve aspects of the following:
how tests are developed
what is meant by the phrase “valid and reliable”
the derivation and meaning of scale scores
the interpretation of scores in light of the measurement error of the tests
appropriate means of comparing student performance
the relationship between performance levels and “grade level performance”
the utility of the tests and cluster scores for making diagnostic inferences
the impact state and federal accountability requirements have on California’s assessment system
In short, teachers and administrators want to be and need to be better informed about assessment, the use and interpretation of test results, and the development of classroom assessments and formative tools and practices — all this will help them determine how to best help students master the required content. Online webinars and modules can help communicate these more technically developed topics to this subset of stakeholders. In addition, the state could work with its higher education systems to develop an online course for pre-service teachers in California that would provide the basics on general large-scale assessment knowledge, as well as information specific to the California system that these novice teachers would need when they step into their classrooms.
Bailey, A., & Kelly, K. (2010). Creating enhanced home language survey instruments. EVEA Products.
Bennett, R. (2013). Preparing for the future: What educational assessment must do. In Gordon Commission on the Future of Assessment in Education, To assess, to teach, to learn: A vision for the future of assessment. Retrieved from http://www.gordoncommission.org/rsc/pdfs/technical_report_executive_summary.pdf
Bennett, R., & Gitomer, D. (2008). Transforming K-12 assessment: Integrating accountability testing, formative assessment, and professional support (ETS RM-08-13). Retrieved from Educational Testing Service website: https://www.ets.org/Media/Home/pdf/CBAL_TransformingK12Assessment.pdf
California Department of Education, DataQuest. (2013). English language learner students by language and grade, state of California, 2011-2012 [Demographic summary report]. Retrieved from http://dq.cde.ca.gov/dataquest/SpringData/StudentsByLanguage.aspx?Level=State&TheYear=201112&SubGroup=All&ShortYear=1112&GenderGroup=B&CDSCode=00000000000000&RecordType=EL
California Department of Education. (2013). Recommendations for transitioning California to a future assessment system. Sacramento, CA: Assessment Development and Administration Division, District, School, and Innovation Branch. Retrieved from http://www.cde.ca.gov/ta/tg/sa/documents/suptrecrpt2013.pdf
Conley, D. (2012). A complete definition of college and career readiness. Retrieved from EPIC website: http://www.epiconline.org/publications/documents/College%20and%20Career%20Readiness%20Definition.pdf?force_download=true
Darling-Hammond, L. (2010). Performance counts: Assessment systems that support high-quality learning. Washington, DC: Council of Chief State School Officers; Stanford, CA: Stanford Center for Opportunity Policy in Education.
Dorans, N. J. (1999). Correspondences between ACT and SAT I scores (Research Report No. 99-02). Princeton, NJ: Educational Testing Service.
Dorans, N. J., & Walker, M. E. (2007). Sizing up linkages. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and Aligning Scores and Scales (pp. 179-198). New York: Springer.
Gordon Commission on the Future of Assessment in Education. (2013). To assess, to teach, to learn: A vision for the future of assessment. Retrieved from http://www.gordoncommission.org/rsc/pdfs/technical_report_executive_summary.pdf
Guzman-Orth, D. A., Nylund-Gibson, K., Gerber, M. M., & Swanson, H. L. (2013). The classification conundrum: Identifying English learners at risk. Manuscript in preparation.
Hambleton, R. K., & Kang Lee, M. (2013). Methods for translating and adapting tests to increase cross-language validity. In D. H. Saklofske, V. L. Schwean, & C. R. Reynolds (Eds.), The Oxford Handbook of Child Psychological Assessment. New York: Oxford University Press. Retrieved from http://books.google.com/books?id=Qb_K4MCcovcC&lpg=PA172&ots=7ZYqgffe6x&dq=%22Language%20Testing%22&lr=lang_en&pg=PA172#v=onepage&q&f=false
Herman, J. L., Webb, N. M., & Zuniga, S. A. (2007). Alignment methodologies. Applied Measurement in Education, 20(1), 1-5.
Kieffer, M. J., Lesaux, N. K., Rivera, M., & Francis, D. J. (2009). Accommodations for English language learners taking large-scale assessments: A meta-analysis on effectiveness and validity. Review of Educational Research, 79 (3), 1168-1201.
Linquanti, R., & Cook, H. G. (2013). Toward a “common definition of English learner”: A brief defining policy and technical issues and opportunities for state assessment consortia. Retrieved from the Council of Chief State School Officers website: http://www.ccsso.org/Documents/2013/Common%20Definition%20of%20English%20Learner_2013.pdf
Mancilla-Martinez, J., & Kieffer, M. J. (2010). Language minority learners’ home language use is dynamic. Educational Researcher, 39, 545-546.
National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010). Common Core State Standards. Washington, DC: Author.
Sinharay, S. (2010). How often do subscores have added value? Results from operational and simulated data. Journal of Educational Measurement, 47, 150-174.
Sinharay, S., & Haberman, S. J. (2008). Reporting subscores: A survey (ETS Research Memorandum No. RM-08-18). Princeton, NJ: ETS.
Sinharay, S. & Haberman, S. J. (2011). Equating of augmented subscores. Journal of Educational Measurement, 48, 122-145.
1 Augmented subscores use statistical approaches to borrow information from all items administered to a student to improve the quality of reported subscores on a relatively small number of items.