5. Measurement and evaluation: what can we learn?
When discussing measurement and evaluation we need to be clear on what we are trying to measure, how we expect to measure it and what role the information is expected to play in guiding educational policy and practice. We can draw a distinction between, on the one hand, broad sectoral, national and international surveys for purposes of benchmarking and monitoring skill levels in the broader population and, on the other hand, assessment of the learning outcomes of individuals, classes, departments etc. within education institutions for purposes of evaluating and directly influencing learning outcomes.
The so-called 21st century skills have so far only been integrated into large-scale assessment surveys to a limited extent, mainly because of a lack of validated methods that can be used to measure them. As a result, surveys such as PISA, TIMSS, CIVED, ICCS, IALS, ALL and PIAAC have focused mainly on basic skills in the domains of literacy, numeracy, math, science and civic engagement. Recent developments in ICT use for assessments (simulations, adaptive testing etc.) have greatly improved the possibilities to develop tests that are more authentic and complex. The IEA is currently developing a new instrument for assessing the level of ICT skills and the PIAAC survey has developed a test for broader ICT-problem-solving skills (which are more focused on people’s abilities to use technologies purposefully than on their more narrowly defined ICT-skills). Apart from these developments, large-scale assessment surveys have paid little attention to the broader area of 21st century skills. Attempts to broaden the scope of PISA to include so-called cross-curricular competencies (OECD 2004) have not been met with broad acceptance, partly because of the difficulties involved in developing reliable and broadly accepted indicators that are comparable across relevant (sub-)populations. The heavy reliance on self-assessment means that the measures obtained are not convincingly different from those available in various broader social surveys.
There has been considerable discussion as to the policy relevance of skill measures obtained from large scale assessment surveys in the light of the rather narrow focus on a few basic skills. Many have argued for the desirability of extending large-scale assessments to include such things as so-called non-cognitive skills like social and communication skills. The proponents of such measures argue that such skills are as relevant or perhaps even more relevant for economic and social outcomes of education. These scholars worry that the absence of valid and internationally comparable measures of such skills will result in biased estimates of the effects of the skills that are measured, and in erroneous policy conclusions that could lead to a diversion of resources in favour of measurable skills and to the detriment of skills that are more difficult to measure.
The proponents of large scale surveys in their current form are quick to point out that the skill measures are strongly related to economic and social outcomes, both at an individual level and at the level of countries as a whole. Based on an analysis of the observed relation between test scores and economic growth, Hanushek and Woessmann (2011) argue that a net improvement in PISA scores by ¼ of a standard deviation would have increased economic growth in OECD countries by some 288%, or US$ 123 trillion, by 2090. They show that the skill measures completely account for the projected effect of increased educational attainment on economic growth in OECD countries.
Regardless of the outcome of this discussion, few scholars deny that a regular assessment of basic skills in surveys such as PISA is valuable. There is however strong disagreement as to what role such assessments should have in guiding educational policy and practice. At one extreme there are those who advocate directly confronting educational institutions with the results of such assessments and challenging them to do better (e.g. Ritzen, 2011). In the Netherlands this is already done to some extent through the actions of the education inspectorate in using CITO-scores for assessing schools (although not based on large-scale assessments, the publication of performance indicators in the higher education choice guide has a similar effect). Others point to the problems inherent in linking such assessment too tightly to policy and practice, arguing that for this to be effective, both the content and the modes of assessment need to be carefully designed to ensure a positive impact on education. Beller (2011) argues that effective assessments need to be complete, authentic and fully integrated into the learning process. Large scale assessments are currently none of these things, and it is doubtful whether they can (or even should) ever be designed in such a way. Attempting to use such data directly to guide educational policy and practice would result in a misalignment between teaching and learning, content and performance standards, and assessment, and is likely to result in an unwanted diversion of resources, “teaching to the test” and the like. More generally, there is a risk that education comes to be seen by students, teachers, schools or policymakers too much as a contest that needs to be won (who is at the top of the ranking list?), if necessary using calculating strategies, and too little as a means to gain a better understanding of the strong and weak points of their education.
The strength of broad surveys and their importance for educational policy and practices lies in the comparability of the measures obtained over whole populations or subgroups thereof. Even when only applied to a subset of the full range of skills needed for work and life in general, such information is highly informative, as long as the subset is shown to be sufficiently relevant to key outcomes (Hanushek and Woessmann, 2011). Differences between countries and subpopulations within countries are useful for identifying populations at risk and assessing progress or decline over time.
Assessment within schools
In contrast to large-scale assessments, assessment of individual learning processes and outcomes in schools is focused not only on basic skills, but also on 21st century skills and domain specific skills, with the aim of monitoring and where necessary intervening in the learning process. As mentioned above, it is important to achieve a good alignment between the content and performance standards aimed for, the teaching and learning methods applied, and the methods of assessment used. Against this background, Beller (2011) points to the need to strike the right balance between formative and summative assessments. Summative assessment aims to summarize the learning that has taken place at a given point in time. The outcome of summative assessment can be useful for diagnosing problems, identifying weaknesses etc., but does not in itself play a role in the learning process. By contrast, formative assessment is a bidirectional process between the teacher and the student which itself facilitates learning, serving to enhance, recognize and respond to the learning (see e.g. Black and Wiliam, 1998; Cowie and Bell, 1999). Wiliam (2010) identifies five key strategies for using assessment to improve the quality of instruction:
1. clarifying what students are expected to learn and what the criteria for successful learning are
2. facilitating the development of activities that can make clear to what extent learning is actually taking place
3. providing feedback that students can use to help them move forward
4. enabling ways for students to learn from each other
5. encouraging students to take responsibility for their own learning.
New developments in ICT appear to have enormous implications for the possibilities for formative assessment. Beller (2011) points to developments that can improve the efficiency and quality of existing assessment practices in education (e.g. automated test instrument development, computer delivery of tests, automated scoring of complex items) as well as developments that could potentially expand the scope of assessment into new domains (e.g. simulated assessment tasks, intelligent tutoring systems and virtual reality systems). She points out that new technologies can also enable far richer, more authentic tasks to be developed that can be used to probe precisely the kinds of things referred to under the heading 21st century skills, such as integrated knowledge, critical thinking and problem solving. At the same time, she also recognizes that there are major hurdles to be overcome before this huge potential can be utilized. For a start, it is far from clear that schools are currently equipped (pedagogically, technologically, logistically and socially) to implement technologies such as those described above into the teaching, learning and assessment processes. If these issues cannot be adequately resolved, the risk exists that (partial) implementation of systems into education may impose an unwanted burden on schools without this resulting in clear benefits, at least in the short term. Particularly in a time when educational budgets are coming under increasing pressure this is a serious issue.
New technologies have also radically changed the world of large scale national and international assessments, and this has led some to suggest that such large scale assessments can be more or less fully integrated into learning processes so as to play a direct role in both summative and formative assessments. The more continuous approach to assessment applied by CITO in Dutch primary education on the basis of a continuous learning approach (Referentiekader Doorgaande Leerlijnen) is an example of such an approach, allowing timely diagnosis of problem areas and where necessary corrective interventions. Another example is the so-called “cognitively-based assessment of, for, and as learning” approach (CBAL) (see O’Reilly and Sheehan 2008), which aims to spread out assessments over the whole school year and incorporate formative elements into the assessment process as an aid to teachers. The ambition of the CBAL consortium is to use new technologies to fully integrate large-scale assessments into the learning process. It is too early to judge to what extent such initiatives can realistically achieve all of their aims. If nothing else, analysis of the results of such initiatives can yield valuable insights into the relation between the outcomes of more conventional large scale assessment surveys and the more detailed and tailored learning outcomes aimed for by schools.
Measurement and evaluation of student achievement is key to any improvement of the skills of the population. Supported by new developments in ICT, our ability to measure a wide variety of skills in a valid, comparable and authentic manner has improved dramatically in recent years. There is however still much to be done, particularly in the areas of softer skills, and possibly also in developing comparable measures of specific skills. We have to be careful how we apply the information we do obtain. Although a certain degree of competition in education can be healthy, we have to avoid a situation where education is viewed by those involved as a contest in which the aim is to score well on assessments rather than to promote the learning process. Results of assessments can be useful indicators of where we stand in terms of skills, but it only makes sense to integrate them into the learning process or link them directly to educational policy if they are linked in an authentic way to educational goals.
6. What are the challenges that education is facing?
If we look at the many changes in today's world, the implications these changes have for the skill needs of the population and the central role of education and training in supplying these skills, it is clear that the task for education and educational professionals is unprecedented. The challenges that education is facing are many, but we would argue that the greatest challenge, which has enormous implications not only in itself but for the chances of meeting the other challenges, is the expansion of ICT use in schools. Within a very few years a tsunami of ICT is set to wash over education, and our education system is at present not fully equipped to deal with this. Young people are more ICT-savvy than their teachers, and ICT will enter the classroom whether we want it to or not. The problem is that the medium-related skills of young people are not well matched by content-related skills, and that teachers are currently not well placed to guide them in learning the latter type of skills. Nor are teachers currently sufficiently ICT-literate to make use of the enormous potential offered by new technology in terms of interactive and iterative learning and assessment, open source content and the like.
If the successful implementation of ICT into education were the only challenge, there would be some reason for optimism. Apart from the fact that our teachers are as yet underprepared, Dutch education is already rather advanced in terms of the implementation of ICT infrastructure. There are however many other challenges. Some of these challenges result from changes in the input in education such as the student population, budgets or the teacher population. Other challenges result from required changes in the process of education, for example the introduction of innovative learning environments, or the timing and organisation of the education process. And of course some challenges are directly related to changes in the required output of education, that is, changes in the skills that need to be taught. Below we list some of the most salient challenges.
6.1 Challenges related to the input
How to deal with individual differences?
As indicated in Section 2, student populations in schools have become more diverse. They are becoming more mixed in terms of ethnic, socio-economic and religious composition, and also in terms of marital status of parents, with a profound increase in the number of students from single parent families. Primary and secondary education has seen an increase in the number of students who have some form of learning disability. In higher education we see an increase in the number of international students, mature-age students, part-time students and students from lower social strata. Moreover, the sheer increase in the participation in higher education as well as in the higher tracks of secondary education also implies that there is more variation in talents and abilities.
An interesting phenomenon in this context relates to the rising gender differences in favour of women. In most western countries women have overtaken men. They now form the majority of students in higher education, they perform better on language tests, and in some countries they even do better on math tests. By contrast, men often have a more problematic school career: they are more likely to drop out of high school, get referred to special education, and repeat grades.
There is still some dispute about the general desirability of diversity (for a brief overview see Dronkers, 2010), but there is consensus that diversity at least increases the workload of teachers and the complexity of their task. Diversity decreases the efficiency of classroom instruction and increases the need for individual instruction for students who lag behind or have different educational needs. This may have a negative effect on the average performance of students. It is clear that the increase in diversity calls for more differentiation and more tailor-made solutions. The ‘one size fits all’ approach that is still very dominant in education will need to be abandoned.
How to deal with all the challenges with fewer resources?
Budgets for education in all western countries are increasingly under pressure. In 2006 the average spending in the Netherlands on education was 5.6% of GDP, which is close to the EU (5.5%) and OECD (5.8%) average. We are now facing an era in which budgets are being frozen or decreased. The financial crisis forces governments to make drastic cutbacks in their expenditures. Even when total expenditures for education have not decreased, the mean expenditure per student has. This is felt most strongly in higher education (OECD, 2008). The falling expenditures per student in higher education are likely to lead to a reopening of discussions related to access to education, the role of education in reducing social inequalities, and such. In addition, challenges will arise related to the possible social exclusion of those groups in society who for whatever reason do not progress to higher education.
These problems are exacerbated by major demographic changes in the labour market. Current forecasts show that education and health care are the two sectors most likely to be confronted with shortages in the supply of personnel in the near future (ROA, 2009).
This is made worse by the skewed age distribution of the Dutch teacher population. 33% of teachers in primary education and 44% of teachers in secondary education are aged 50 years or older. This compares unfavourably with the EU average of 28% and 36% respectively. Like all older people, older teachers are prone to processes of cognitive decline. Consequently, their ability to process new information, to adapt to changes in the environment, and so on is likely to decrease. In combination, these changes mean that Dutch education has to successfully implement curriculum reforms, introduce ICT tools into education and meet all the other challenges it is facing with increasingly strained budgets and a shortage of teachers, particularly those in the younger age range who are likely to be best equipped to deal with the changes.
6.2 Challenges related to the process
How to deal with a need for more flexibility?
It is clear that the above-mentioned increasing diversity of the student population has implications for the organisation of the educational process, mainly in terms of differentiation. This by itself induces a need for more flexibility. The need for flexibility is further increased by the role that VET and higher education are expected to play in lifelong learning.
Lifelong learning implies that VET and higher education will be faced with an increased demand for short, tailor-made courses for adults. In order to keep pace with changes in the work environment, adults will increasingly return to education or forms of non-formal training, to update their current skills, to increase their level of skills or to change their skill set completely. The current educational institutions in VET and higher education are hardly prepared for this new type of student (Denktank Leren en Werken, 2009). These institutions and the programs they offer are organised to function as initial education, with groups of students who are largely homogeneous in terms of educational needs and who all start at the same time at the beginning of an academic year. If these institutions are to play a significant role in lifelong learning, they will need to completely change their orientation and their organisation of the educational process. This will include the formal assessment and evaluation of previous learning experiences (EVC) in order to assess the current skills of the adults entering the educational programs. As both the entry skills of adults and the desired output level of their skills will differ significantly, individual tailor-made trajectories need to be designed. And these individual trajectories need to be offered in a way that is highly flexible in terms of time (starting dates and end dates, contact hours in evenings or weekends) and place (e-learning, etc.).
How to successfully implement innovative learning environments?
Over the past two decades many have advocated the use of innovative methods to develop 21st century skills, and these methods have indeed been widely introduced in education. In the Netherlands most, if not all, of the programs in upper secondary and tertiary education use some form of student-centred method like self-regulated learning, problem-based learning or project-based learning, and the goals of education are defined in terms of competencies rather than skills. The OECD Centre for Educational Research and Innovation (CERI) has made a valuable contribution to this discussion with the publication of the report ‘The Nature of Learning’ (OECD, 2010a), in which leading scholars advocate the development of innovative learning environments such as inquiry based learning, collaborative learning and other student-centred modes of teaching.
There is ample evidence that these innovative learning environments indeed foster relevant 21st century skills like communication, cooperation and problem-solving skills. Nevertheless there are some caveats. Innovative modes of teaching and learning can be highly effective, but only under specific, well-defined conditions. Failing to meet these conditions may render these innovative methods less efficient or even ineffective. To give a few examples:
There is probably little doubt that cooperative learning has a positive effect on cooperation skills, but the effect on cognitive achievement is less straightforward. Group dynamics may lead to a less than desirable learning environment and there is the constant danger of free riding. Slavin (2010) points out that cooperative learning only yields positive results on achievement outcomes when two conditions are present: clearly defined group goals and individual accountability. When these conditions are not met it is unlikely that cooperative learning will be effective.
Self-regulated learning is generally seen as an important way to develop meta-cognitive skills and to increase intrinsic motivation. It is therefore one of the key constituent elements of many innovative learning environments (e.g. the Dutch educational reform in secondary education, the “Studiehuis”). Recent insights from the neurosciences however cast some doubt on whether self-regulated learning is always possible (Jolles, 2007). It turns out that the adolescent’s brain is not yet mature enough to engage in the long-term planning that is necessary for effective self-regulation. This applies more to boys than to girls, which is probably one of the reasons why boys have profited less than girls from the introduction of self-regulated learning in the “Studiehuis”, as was shown by a recent evaluation by Coenen, Meng and Van der Velden (2011). It is crucial to take these and similar insights into account, in order to specify the conditions under which self-regulated learning is likely to be effective.
There is strong evidence that inquiry-based learning approaches such as project-based and problem-based learning develop academic skills. Students learn more deeply when they can apply classroom-based knowledge to real world problems, all the while nurturing 21st century skills like communication, cooperation and creativity. However it is less evident that this is always the most effective way to develop specific skills. To develop a body of knowledge in a given domain, students need structure. This structure enables them to see how new information fits within their existing frame of reference. In a traditional classroom setting this structure is usually provided by the teacher who acts as an expert, or by the classical textbook. This structure helps students to build a good overview of the whole body of knowledge to be learnt. Meng (2006) has shown that in a situation where the role of teachers is limited to supervising the process rather than serving as an important source of information, the development of domain-specific skills lags behind.
And finally, an excessive focus on innovative methods may easily obscure the fact that effective skills acquisition also requires practice, repetition and routine. Although we take this for granted in the case of skill acquisition in sports or music, it seems that for other skill domains this has become something old-fashioned and out-of-date. But there is no reason to assume that acquiring expertise in whatever domain can do without some form of practice and repetition.
The list can easily be expanded. The main message is that the success of innovative learning methods is crucially dependent on the conditions under which they are implemented. The knowledge about which conditions are crucial is unfortunately less well developed and also less widespread than the methods themselves. Many of the technologies listed above in relation to assessment (Beller, 2011) are potentially relevant to the implementation of innovative learning environments, and can, if used appropriately, help solve some of the problems related to their implementation. Once again however, this requires a teaching staff who know how to deal with the technologies in question.