Through discussion with the state, decide whether or not there is a need for a matriculation exam or an alternative to a matriculation exam
Written decision regarding need for, or alternative to, matriculation exam
If decided, develop plan to implement matriculation exam or alternative to matriculation exam
CDE and SBE
Prepare specifications and timeline to implement matriculation exam or alternative to matriculation exam
Specifications and timeline for implementing matriculation exam or alternative to matriculation exam
Intermediate Considerations for Recommendation 10:
The SSPI’s Recommendations articulate the potential advantages of these types of exams.
Matriculation or qualification examinations are used in numerous countries to assess student acquisition of prerequisite knowledge and skills for entrance into college, career, and/or upper high school levels. The use of such examinations in the United States is rare, but the potential benefits of this type of examination to students, LEAs, colleges, and business alike suggests that consideration be given to the idea of introducing them in California. Matriculation examinations can provide students with evidence of their requisite skills for prospective colleges or employers; in turn, these exams could make assessment relevant to students in a way that few other past state exams have.
In California, the concept of matriculation examinations was most recently introduced during the 2011-12 legislative session by Assembly Member [Susan] Bonilla in Assembly Bill (AB) 2001 [and the concept was again reintroduced in the current legislative session as AB 959]. AB 2001 called for California’s statewide assessment reauthorization legislation to include:
(a) A plan to bring together elementary and secondary school policy leaders, the community colleges, the California State University, the University of California, private colleges and universities, and postsecondary career technical and vocational programs to develop criteria and create non-punitive pathways in which assessments taken by middle and high school students are aligned with college and career readiness and may be recognized as one of a number of multiple measures for entry into college, placement in college-level courses, and career training.
(b) A plan for transitioning to a system of high-quality, non-punitive assessments that has tangible meaning to individual middle and high school students, including, but not limited to, recognition and rewards for demonstrating mastery of subject matter and progress toward mastery of subject matter. (pp. 45-46)
10.1 Exploration of Matriculation Exam Options
The SSPI Recommendations promote the examination of these options in California:
Assembly Bill 2001 was not enacted into law, but as the state considers its next generation of assessments, [California can engage in] further research and discussion . . . regarding matriculation examinations, including exam format (i.e., written, oral), cost, fee coverage (e.g., student, LEA), and ways in which such exams could be used to meet high school exit requirements. (p. 45)
Matriculation exams are typically used to provide information to employers or colleges regarding student readiness for employment or postsecondary education. The matriculation exam system traditionally practiced in some countries includes the administration of two levels of exams: the O-level is taken first and signals career readiness, and the A-level is taken after two years of additional study to determine college readiness.
However, Assembly Bill 2001 clearly articulates the need to “streamline and reduce state-mandated middle and high school testing.” As a means toward this end, the Smarter Balanced tests could be repurposed: the summative grade 11 tests could be considered for matriculation purposes. Consistent with the goals of the matriculation process, cut scores could be established for “O” (career readiness) and “A” (college readiness) levels. In addition, if the state continues to develop other assessments beyond the scope of Smarter Balanced in such areas as science and social studies, then these exams could also be used for matriculation purposes and cut scores could be established to determine “O” and “A” levels. This might include other end-of-course exams such as biology or U.S. history.
Recommendation 11 – Conduct Comparability Studies
Conduct comparability studies to link performance on the STAR assessments with performance on SBAC.
There are a number of major differences between the Smarter Balanced and STAR tests, including but not limited to the content assessed, constructs of measurement interest, item types, and administration mode. In addition, the STAR assessments are paper-and-pencil tests (PPT) and, with the exception of the essay portion of the grade 4 and 7 ELA tests, consist of multiple choice (MC) items only. In contrast, the Smarter Balanced tests will be administered online and will include a computer-adaptive test (CAT) component and a performance component. Smarter Balanced plans to make a PPT option available for the first three years of the operational assessment.
With such substantive differences in content standards and modes of administration, some programs have opted for a complete break between the assessments administered under one program and those offered under a different set of conditions. In these situations a new reporting scale is set and the first administration of the new assessment becomes the new “baseline” for future comparisons. Most importantly, the scores from the new and previous assessments are not comparable, a new trend line is established, and no comparability studies are required.
However, stakeholders often want to compare student performance on old and new assessments. Formally creating a mechanism for comparison, in the form of a concordance between the two performance measures, can curtail the misinterpretation of results that stakeholders may create during the bridge years in the absence of official information. Should California wish to conduct a comparability study, two options could be considered: conduct the study either during the Smarter Balanced field test year (spring 2014) or during the first year of the operational administration (spring 2015). The two implementation options are presented below.
Table 11: Immediate Implementation Tasks for Recommendation 11
Option 1: Use test data from the CSTs and the Smarter Balanced field test
Task # (Option 1)
Draft study design plan
Design comparability studies
Data collection for STAR and Smarter Balanced
Data is collected from the subset of CA students that participate in the Smarter Balanced test and is matched with their STAR data
Conduct comparability studies
Conduct concordance studies for all Smarter Balanced tests
Concordance table for each test and a report summarizing the procedure
Develop cut score concordance
Map preliminary cut scores for Smarter Balanced onto the STAR scale and compare to CST performance level cut scores
Cut score concordance table for each test
Option 2: Use Smarter Balanced operational data
Task # (Option 2)
Draft study design plan
Design comparability studies
Assemble common items from the STAR test
Select common STAR items to embed into the Smarter Balanced assessment that will serve as the trend set
Gather item information during the Smarter Balanced operational administration
Conduct comparability studies
Conduct concordance studies for all Smarter Balanced tests and map the cut scores for Smarter Balanced onto the STAR scale; compare to CST performance cuts
Concordance table for each test; cut score concordance table for each test and a report summarizing the procedure
Intermediate Considerations for Recommendation 11:
The details of these two options in establishing comparability are discussed below.
11.1 Comparability via Smarter Balanced Field Testing
The first option would leverage the subsample of California students who will participate in both the Smarter Balanced field test (FT) and the STAR spring 2014 administrations. These students would have their spring STAR response records matched to their corresponding Smarter Balanced FT response records. Score distributions could be compared and a concordance developed using one of two single-group linking methods described below. The method chosen will depend on the resulting data. If a correlation of 0.87 or greater exists between the Smarter Balanced FT and CST scores, a concordance linking approach is recommended (Dorans, 1999; Dorans & Walker, 2007). If the correlation is less than 0.87, a projection linking method is advised.
Concordance linking: An equipercentile concordance of Smarter Balanced and CST scores could be established using the smoothed joint distribution. Smarter Balanced test distributions would be divided into a certain number of increments to match CST distributions. Based on these concordances, Smarter Balanced scores corresponding to CST scale scores could be identified.
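To make the single-group approach concrete, the following is a minimal Python sketch, not the procedure the state or its contractor would necessarily use. It checks the 0.87 correlation rule on matched score pairs and then builds an equipercentile concordance by pairing each Smarter Balanced score with the CST score of nearest percentile rank. The function names and the score values are hypothetical, and the sketch works from raw score frequencies, omitting the smoothing of the joint distribution described above.

```python
from bisect import bisect_left, bisect_right


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of matched scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def choose_linking_method(r, threshold=0.87):
    """Apply the rule from the text: concordance at or above 0.87, else projection."""
    return "concordance" if r >= threshold else "projection"


def percentile_rank(sorted_scores, score):
    """Percent of scores below `score` plus half of the scores equal to it."""
    below = bisect_left(sorted_scores, score)
    at_or_below = bisect_right(sorted_scores, score)
    return 100.0 * (below + 0.5 * (at_or_below - below)) / len(sorted_scores)


def equipercentile_concordance(sb_scores, cst_scores):
    """Map each distinct Smarter Balanced score to the CST score whose
    percentile rank is closest (unsmoothed; a simplifying assumption)."""
    sb_sorted = sorted(sb_scores)
    cst_sorted = sorted(cst_scores)
    cst_ranks = [(percentile_rank(cst_sorted, c), c) for c in sorted(set(cst_sorted))]
    table = {}
    for s in sorted(set(sb_sorted)):
        pr = percentile_rank(sb_sorted, s)
        table[s] = min(cst_ranks, key=lambda rank_score: abs(rank_score[0] - pr))[1]
    return table


# Hypothetical matched records for eight students (illustrative values only).
sb = [2300, 2350, 2400, 2450, 2500, 2550, 2600, 2650]
cst = [260, 290, 300, 330, 350, 370, 400, 420]

method = choose_linking_method(pearson_r(sb, cst))
concordance = equipercentile_concordance(sb, cst)
```

An operational study would use the full matched sample and a presmoothed joint distribution; the sketch only shows the percentile-matching logic that underlies a concordance table.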
Projection linking: The probabilities from the smoothed joint distributions could be used to create projection tables containing conditional cumulative distributions of CST scale scores for Smarter Balanced scores. The projected conditional distributions could then be used to identify the Smarter Balanced scores associated with 50%, 60%, and 70% of students scoring at or above the CST cut scores. (Logistic regression would be used for this method.)
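The projection idea can be illustrated with a simplified Python sketch. It assumes each matched student record has been reduced to a Smarter Balanced score and a 0/1 flag for scoring at or above a given CST cut score, fits a one-predictor logistic regression by Newton-Raphson, and solves for the Smarter Balanced scores at which the fitted probability reaches 50%, 60%, and 70%. The function names and data are hypothetical, and this dichotomized model stands in for the full projection tables described above.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient with respect to b0
            g1 += (yi - p) * xi       # gradient with respect to b1
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def projection_thresholds(sb_scores, met_cst_cut, probabilities=(0.5, 0.6, 0.7)):
    """Return the Smarter Balanced score at which the fitted probability of
    meeting the CST cut equals each requested value."""
    n = len(sb_scores)
    mean = sum(sb_scores) / n
    sd = (sum((s - mean) ** 2 for s in sb_scores) / n) ** 0.5
    z = [(s - mean) / sd for s in sb_scores]   # standardize for numerical stability
    b0, b1 = fit_logistic(z, met_cst_cut)
    return {p: mean + sd * (math.log(p / (1.0 - p)) - b0) / b1 for p in probabilities}


# Hypothetical data: five students at each score, with 1 to 4 of them meeting the cut.
sb = []
met = []
for score, n_met in [(2350, 1), (2400, 2), (2450, 3), (2500, 4)]:
    sb += [score] * 5
    met += [1] * n_met + [0] * (5 - n_met)

thresholds = projection_thresholds(sb, met)
```

Because the illustrative data are symmetric around 2425, the 50% threshold lands at that score, and the 60% and 70% thresholds fall progressively higher. Newton-Raphson as written assumes the outcomes overlap across score levels; perfectly separated data would need a different fitting approach.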
With this option, it is expected that the concordance between performance levels will not be known at the time of the study, but can be mapped after standard setting is complete for the Smarter Balanced tests. A limitation of this approach is that, depending on the Smarter Balanced FT design, it is possible that no students will take a full test, which may impact the quality of the concordance. In addition, the concordance analyses can only be done after the Smarter Balanced scales have been established and approved.
11.2 Comparability via Smarter Balanced Operational Administration
The second option would embed STAR items into the first operational administration of Smarter Balanced in spring 2015. With the approval of Smarter Balanced, a subset of STAR items aligned to CCSS could be appended to or embedded in the Smarter Balanced tests using the field-test slots. STAR items would then be calibrated along with the Smarter Balanced operational items, serving as the bridge to develop a concordance between STAR and Smarter Balanced tests. A limitation of this approach is that when CST items are embedded in Smarter Balanced tests, the common items may not be representative of both tests, which may lead to biased estimation of the concordance.
In the transition from one assessment to another, decisions about managing the comparisons between the two assessments need to take into consideration a number of factors including the need or desire to maintain a “trend score,” as well as the differences between the two assessments. Given the distinct differences between Smarter Balanced and STAR tests, California may wish to conduct a one-time study to provide a bridge between performance on the two assessments. The bridge between the two tests will be, at best, in the form of a one-time concordance between test scores, which will relate the scores of the Smarter Balanced test to the STAR test.
Recommendation 12 – Maintain a Continuous Cycle of Improvement of the Assessment System
Provide for a continuous cycle of improvement to the statewide student assessment system.
Continuous improvement is, by definition, never complete. This recommendation expects that there is a documented and formal procedure for improving the assessment system over time. The ultimate goal is to develop a standardized system for development of new and/or improvement of existing assessment features, piloting those features, and adding them to the assessment system in an orderly manner and in a timely fashion. While the standardized system will take time to develop and mature, the guiding principles of such a system are ripe for discussion as the state makes the transition to the California Measurement of Academic Performance and Progress for the 21st Century (CalMAPP21).
Table 12: Immediate Implementation Tasks for Recommendation 12
CDE, SBE, State Stakeholders, Testing Contractor
Develop plans for a comprehensive evaluation program
Alignment evaluation specifications
Prepare specifications and timelines for evaluation of alignment of standards, instruction, and assessment, including the use of the formative and interim assessment data to impact instruction
Specifications for alignment evaluation
Validity study specifications
Prepare specifications and timelines for evaluation of validity, utility, and impact
Intermediate Considerations for Recommendation 12:
A robust assessment system is one that provides accurate and relevant data that can be used to draw reliable and valid inferences about student learning and instruction. Development and maintenance, as well as identification of opportunities to improve the rigor, validity, and reliability of an assessment system, are critical if the intended goals for score use are to be met. As educational needs evolve, so must the assessment system. To this end, the following evaluations may provide useful data to initially inform assessment system improvements and contribute to the standardized system for the development of new assessments and the improvement of existing ones.
12.1 Alignment and Instructional Sensitivity
The state should conduct periodic evaluations of alignment of standards, curriculum, instruction, and assessment, as well as the extent to which assessments both inform instruction and measure improvements caused by changes in instruction. Results of these evaluations can be used to update the assessment system (e.g., features, components, or focus of measurement interest) and inform associated professional development for teachers to better support policy goals related to curriculum and instruction.
12.2 Validity, Utility, and Impact
California should conduct periodic evaluations of the assessment system related to the validity and utility of test scores and the impact of the assessments. These evaluations can be attained by an ongoing collection of evidence to support the validity of assessment scores (e.g., evidence based on test content, response processes, internal structure of assessments, relationships to other variables, and intended consequences of testing). Results from these evaluations can be used to determine any additional requirements that are needed to support teaching and learning for all students and to provide continual refinement of the assessment system.
12.3 Scale Stability and Performance Standards
For all assessments, evaluation of scale stability and performance standards is critical as new curriculum is fully implemented and schools transition to online instruction and assessment. Results of these evaluations should be used to adjust scales and performance standards as needed.
Like all states in the two major consortia, California stands at a crossroads in large-scale assessment. During the ESEA era, states have worked tirelessly to achieve the expected technical quality required of such a testing system. California is no different in this respect, and yet the state has long been ahead of the country in providing a vision of what its assessment system can be.
Well before the No Child Left Behind era brought grade-level standards and assessments into sharp focus as key in this accountability movement, California had already pushed to the forefront, establishing rigorous content standards for ELA and mathematics in the early 1990s. The state had likewise begun the development of an assessment system aligned to these standards that transitioned over time to a custom-built, criterion-referenced assessment system. This system was well ahead of the progress of other states in providing summative information about student performance on a state-approved set of content standards.
California has also been a heralded leader in the movement to promote college and career readiness for all students. When other states were meeting around the table with policymakers to determine their options in providing the right signals about this critical marker, the California K-12 system was already implementing its Early Assessment Program (EAP), which came from a highly collaborative partnership with its postsecondary counterparts. The EAP system has garnered national attention for its design and implementation.
There is a new opportunity to lead again as state assessment systems move into their next generation. California is uniquely situated to be a leader in building a comprehensive state assessment system because of its prominence within the Smarter Balanced Assessment Consortium and because of its track record of innovation. While Smarter Balanced will alleviate some of the challenges faced by a comprehensive state assessment system (challenges such as how to elicit a greater depth of evidence about what students know and can do), there is plenty of room left for the state to innovate and address concerns regarding the narrowing of the curriculum and the provision of the right interim assessments and formative tools in content areas other than ELA and mathematics.
Through a focused plan on building a coherent and comprehensive assessment system, California has the opportunity to lead the nation in getting it right. With innovative approaches to assessing content, efficient means of administering assessments, and meaningful information that informs classroom teaching and learning, California has the opportunity to demonstrate what a fully developed assessment system looks like while leveraging the benefits of the Smarter Balanced Assessment Consortium.
Assessment across the country is now taking a different turn. States like California can take advantage of what we know and what we have invented since the last authorization of the state’s assessment system. We now have more innovative and valid means of assessing what students know and can do through item types that reach beyond our previous bounds. Technology now provides the opportunity for more innovation in items, administration, and reporting than there was even just a few years ago, and it is becoming more ubiquitous all the time. With these developments, California can establish an assessment system that is more responsive to the expectations of its users and stakeholders, one that models and promotes high-quality teaching and student learning.
Appendix: Long-Term Possibilities
A Vision toward the Future
While California will have a long list of activities to complete over the next three to five years if all twelve State Superintendent recommendations are enacted, there are other enhancements and revisions to its assessment system that stakeholders might consider. Especially in light of Recommendation 12, these long-term considerations can be a part of a regular review process. If selected, their implementation can be managed and monitored through a formal, continuous improvement process.
As the educational community awaits the reauthorization of ESEA and the two assessment consortia continue developing their assessment systems, educators will be provided with ample opportunity to consider what the next generation of educational assessment systems should look like. Toward the end of ESEA's initial term, researchers were proposing alternatives to the ESEA framework and the paradigm in which it placed K-12 education. In their paper entitled “Transforming K-12 Assessment: Integrating Accountability Testing, Formative Assessment, and Professional Support,” Bennett and Gitomer (2008) suggested that an alternative system could reframe the role of assessment in the classroom. Their article posits a way forward in the design of a comprehensive system:
Is based on modern scientific conceptions of domain proficiency and that therefore causes teachers to think differently about the nature of proficiency, how to teach it, and how to assess it?
Shifts the end goal from improving performance on an unavoidably shallow accountability measure toward developing the deeper skills we would like students to master?
Capitalizes on new technology to make assessment more relevant, effective, and efficient?
Primarily uses extended, open-ended tasks?
Provides not only formative and interim-progress information, but also accountability information, thereby reducing dependence on the one-time test?
Bennett and Gitomer go on to articulate how this new system should be developed, such that it provides coherency in two ways:
First, assessment systems are externally coherent when they are consistent with accepted theories of learning and valued learning outcomes. Second, assessment systems can be considered internally coherent to the extent that different components of the assessment system, particularly large-scale and classroom components, share the same underlying views of learners’ academic development. The challenge is to design assessment systems that are both internally and externally coherent. Realizing such a system is not straightforward and requires a long-term research and development effort. Yet, if successful, we believe the benefits to students, teachers, schools, and the entire educational system would be profound.
Additionally, there are several recent reports that articulate a future vision of assessments and the guiding principles for designing them.
The Findings and Recommendations of the Gordon Commission (The Gordon Commission, 2013)
Nature of Assessment
Assessment is a process of knowledge production directed at the generation of inferences concerning developed competencies, the processes by which such competencies are developed, and the potential for their development.
Assessment is best structured as a coordinated system focused on the collection of relevant evidence that can be used to support various inferences about human competencies. Based on human judgment and interpretation, the evidence and inferences can be used to inform and improve the processes and outcomes of teaching and learning.
Assessment Purposes and Uses
The Gordon Commission recognizes a difference between a) assessment OF educational outcomes, as is reflected in the use of assessment for accountability and evaluation, and b) assessment FOR teaching and learning, as is reflected in its use for diagnosis and intervention. In both manifestations, the evidence obtained should be valid and fair for those assessed and the results should contribute to the betterment of educational systems and practices.
Assessment can serve multiple purposes for education. Some purposes require precise measurement of the status of specific characteristics while other purposes require the analysis and documentation of teaching, learning, and developmental processes. In all cases, assessment instruments and procedures should not be used for purposes other than those for which they have been designed and for which appropriate validation evidence has been obtained.
Assessment in education will of necessity be used to serve multiple purposes. In these several usages, we are challenged to achieve and maintain balance such that a single purpose, such as accountability, does not so dominate practice as to preclude the development and use of assessments for other purposes and/or distort the pursuit of the legitimate goals of education.
Assessment Constructs
The targets of assessment in education are shifting from the privileging of indicators of a respondent’s mastery of declarative and procedural knowledge, toward the inclusion of indicators of respondent’s command of access to and use of his/her mental capacities in the processing of knowledge to interpret information and use it to approach solutions to ordinary and novel problems.
The privileged focus on the measurement of the status of specific characteristics and performance capacities, increasingly, must be shared with the documentation of the processes by which performance is engaged, the quality with which it is achieved, and the conditional correlates associated with the production of the performance.
Assessment theory, instrumentation, and practice will be required to give parallel attention to the traditional notion concerning intellect as a property of the individual and intellect as a function of social interactions — individual and distributive conceptions of knowledge — personal and collegial proprietary knowledge.
The field of assessment in education will need to develop theories and models of interactions between contexts and/or situations and human performance to complement extant theories and models of isolated and static psychological constructs, even as the field develops more advanced theories of dialectically interacting and dynamic bio-social behavioral constructs.
Emerging developments in the sciences and technologies have the capacity to amplify human abilities such that education for and assessment of capacities like recall, selective comparison, relational identification, computation, etc. will become superfluous, freeing up intellectual energy for the development and refinement of other human capacities, some of which may be at present beyond human recognition.
Assessment Practices
The causes and manifestations of intellectual behavior are pluralistic, requiring that the assessment of intellectual behavior also be pluralistic (i.e., conducted from multiple perspectives, by multiple means, at distributed times, and focused on several different indicators of the characteristics of the subject(s) of the assessment).
Traditional values associated with educational measurement, such as reliability, validity, and fairness, may require reconceptualization to accommodate changing conditions, conceptions, epistemologies, demands, and purposes.
Rapidly emerging capacities in digital information technologies will make possible several expanded opportunities of interest to education and its assessment. Among these are:
individual and mass personalization of assessment and learning experiences;
customization to the requirements of challenged, culturally and linguistically different, and otherwise diverse populations; and
the relational analysis and management of educational and personal data to inform and improve teaching and learning.
It is not the intent of this section to investigate each of these distinguished and thoughtful considerations for a vision of an assessment plan for California: that is more than can be accomplished here. Rather, this section offers additional considerations that are potential candidates for future development and are aligned with visions proposed in these reports as well as the SSPI Recommendations.
The long-term considerations listed below are separated into four categories: design, administration, reporting, and communication. Each focuses on aspects of the state assessment system that could be enhanced should these considerations be included.