Students respond “from scratch” based on a prompt.
Many variations exist, but items fall along a continuum from very simple responses to very extended responses.
Many educators view this item grouping more favorably than the selected response grouping. You will hear other education professionals use different terms, usually when saying “we need to replace our school’s multiple choice tests with
essay exams.” (the speaker is “old-school”; we now think of the essay as one of many types of constructed response items)
performance assessment.” (the speaker usually means he or she advocates something other than filling in bubbles on a test sheet)
authentic assessment.” (the speaker wants the assessment task to be closer to tasks we perform in everyday life)
alternative assessment.” (the speaker wants an alternative to multiple choice exams, but the term is ambiguous because it has other meanings, such as making individual student accommodations)
The essence of a constructed response item is that it allows for variation in response; thus evaluation requires human judgment. Different scorers may reach different judgments, and a single scorer may not be consistent over time or across the students in a class.
Adequacy of Content Coverage
Constructed response items tend to focus on central aspects of the content. If your testing target is a large body of knowledge or a large number of learning objectives, the constructed response approach may come up short. Constructed response items also take longer for students to answer, which further reduces coverage.
Consequences for Student Misunderstanding
The price a student pays when a constructed response item is misread is often more severe than when a selected response item is misunderstood. Consider a 50-item multiple-choice exam versus a 5-item essay exam. A student who misunderstands one essay item has 20% of the final score affected; misunderstanding one multiple-choice item affects only 2%.
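Assuming every item on an exam carries equal weight, the arithmetic behind this comparison is simply one item's share of the total:

```latex
\text{weight of one item} = \frac{1}{n}:\qquad
\frac{1}{5} = 20\%\ \text{(5-item essay exam)},\qquad
\frac{1}{50} = 2\%\ \text{(50-item multiple-choice exam)}
```

If items are weighted unequally, the share for a misread item is its points divided by the exam's total points.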
Constructed Response Items . . . Examples ahead
There is a wide variety of constructed response items. For the purposes of this class, we will restrict our discussion to creating and scoring the items that follow. As you come across other constructed response items, many of the suggestions and principles discussed here will also apply to them.
The short-answer item uses the constructed-response format. It requires the student to supply rather than select the correct answer.
The typical task relates to simple facts or skills.
The item can either be a direct question or an incomplete statement. Both versions can be answered by a word, phrase, number, or symbol.
NOTE: Professionals use the term “incomplete statement” rather than “fill-in-the-blank.” Why?
Short Answer Questions . . . Presented via classic chicken jokes.
What do you get when a chicken lays an egg on top of a barn?
How do two chickens dance when they dance together?
Chick to chick
Which ballroom dance will a chicken NOT do?
How do you stop a rooster from crowing on Sunday?
Eat him on Saturday
Short Answer Becomes Completion
Using principles of good item writing, change each of the following short answer questions to completion items. Consider good item construction techniques when answering these questions: Where should the “blank” be located in the sentence? How long should it be? Should I use multiple blanks for phrases?
Which day of the week do chickens hate most?
How do two chickens dance when they dance together?
Which ballroom dance will a chicken NOT do?
How do you stop a rooster from crowing on Sunday?
Scoring judgments . . . How close does the student’s response need to be to your envisioned answer to receive credit? What about spelling?
Some Final Thoughts on . . . Completion Items
Completion items are similar to multiple choice questions without the distracters. Students need to recall the information being asked based on context clues found in the stem. Reasons for using them include:
Completion items support learning objectives that focus on students being able to recall specific information from memory; however, don’t take stems verbatim from textbooks, lectures, overheads, etc. Why?
Completion items can be constructed more quickly than multiple-choice items, since you don't need to create distracters.
Sometimes it is difficult to construct a multiple-choice item without making the answer obvious. Because a completion item presents no answer to choose from, it avoids this problem.
Completion items can help students become proficient at using the context clues found in the item stem: if students look for memory associations to help their response, the only help available is in the stem.
One difficulty in creating a completion item is formulating the stem with sufficient contextual clues so that the intended word is clearly indicated and ambiguity is avoided.
Let’s Look at . . . Using and Assessing Essays
Some overall thoughts on . . . Using Essay Items
Because of the structure of the essay item, successful essay responses may measure writing skill as well as content knowledge.
Teach these skills before the test, not just on tests. Regular in-class essay writing should make the essay test approach less threatening and the test results more meaningful.
EXAMPLE FOR STUDENTS:
These simple steps will guide you through the essay writing process:
Decide on your topic.
Prepare an outline or diagram of your ideas.
Write your thesis statement.
Write the body.
  Write the main points.
  Write the subpoints.
  Elaborate on the subpoints.
Write the introduction.
Write the conclusion.
Add the finishing touches.
Essay items are best for measuring students' higher level cognitive abilities. Use them when freedom of response and originality are important, since they measure the ability to organize, integrate, relate, and evaluate ideas. If you intend to measure knowledge only, consider using something other than an essay.
The essay prompt (also called the question or stimulus) must be stated clearly so that students can write to it and so that you can evaluate their responses effectively later. Consider this example:
Poor item: “Why did we enter World War II?”
Better: “State three reasons cited by historians that you feel best explain America's entry into World War II.”
Provide a suggested length in terms of paragraphs or pages.
Avoid optional questions (e.g., choose 3 of the following 5). While this is good for student morale, it makes scoring problematic: the essays are unlikely to be of equal difficulty, and if students know there will be a choice, they can focus their study away from your learning objectives.
Thoughts to consider as you . . . Create Individual Essay Items
As you create high quality, valid essay items for your students, ask yourself these questions about every item and its scoring plan:
1. Does the item target a specified learning objective?
2. Is the reading level required by this item below your students' reading ability?
3. Can all students answer the item in less than the allotted time?
4. Are higher level thinking verbs like “predict” or “compare and contrast” used rather than recall verbs like “list” and “name” or ambiguous verbs like “discuss” and “tell”?
5. Will all or nearly all content experts agree that the scoring plan outlines the correct response to the item?
6. Will the scoring plan ensure that your judgments on each essay are protected from bias?
7. Are all students aware of how the essay will be scored?
Thoughts to consider when you . . . Create Your Scoring System
Essay Scoring Systems – Some Basic Choices
Point method - Have a written outline for yourself that expresses your preconceived model of a high quality answer (i.e., key points to be included or skills to be demonstrated). Credit each key point or skill the essay contains, then sum the points awarded.
Analytic method – use a two-way scoring rubric (e.g., rate on subscales from 1 to 4); raters break the essay task into important predetermined sub-tasks associated with key points and skills.
Holistic method – use a one-way scoring rubric (e.g., rate on an overall scale from 1 to 9); raters compare each essay, taken as a whole, to the model. In a variation of this method, raters sort all the essays into three categories (for example: “below average”, “average”, “above average”) and then fine sort within each category. Some teachers use this method to assign letter grades (A, B, C, D, F).
Primary Trait method – Used most often when the essay task is a practical one (for example, “Write a letter to your French pen pal.”). The score is determined by whether or not the task was completed; sometimes we say “met” or “unmet.” Students receive a predetermined score when the task is completed satisfactorily.
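To make the differences among these four methods concrete, here is a minimal Python sketch. It assumes each key point is worth one point, an analytic subscale runs 1 to 4, and a holistic scale runs 1 to 9 (the example ranges given above); it also assumes an unmet primary trait earns zero. All function names and criteria are hypothetical illustrations, not part of any grading tool.

```python
# Minimal sketch of the four essay-scoring methods described above.
# Function names, criteria, and scale ranges are hypothetical illustrations.


def point_score(key_points: list[str], points_found: set[str]) -> int:
    """Point method: one point for each key point of the model answer the essay includes."""
    return sum(1 for p in key_points if p in points_found)


def analytic_score(ratings: dict[str, int], low: int = 1, high: int = 4) -> int:
    """Analytic method: rate each predetermined subscale (e.g., 1-4), then total the ratings."""
    for criterion, rating in ratings.items():
        if not low <= rating <= high:
            raise ValueError(f"{criterion}: rating {rating} is outside {low}-{high}")
    return sum(ratings.values())


def holistic_score(overall: int, low: int = 1, high: int = 9) -> int:
    """Holistic method: a single overall rating (e.g., 1-9) of the essay taken as a whole."""
    if not low <= overall <= high:
        raise ValueError(f"rating {overall} is outside {low}-{high}")
    return overall


def primary_trait_score(task_met: bool, points_if_met: int) -> int:
    """Primary trait method: a predetermined score if the task is met (assumed zero if unmet)."""
    return points_if_met if task_met else 0
```

For example, an essay containing two of three key points earns `point_score(["thesis", "evidence", "counterargument"], {"thesis", "evidence"})`, which is 2, while subscale ratings of 3, 4, and 2 give an analytic total of 9. The analytic function rejects out-of-range ratings so that a typo in a gradebook cannot silently inflate a score.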
Thoughts to consider as you . . . Score Individual Essay Items
Have your scoring key or scoring rubric physically with you as you score.
Before you start reading your students’ essays, decide how to handle writing mechanics issues such as grammar, penmanship, spelling, and punctuation.
Evaluate one question at a time across all students, and avoid the “halo” effect, in which a student's first good or bad answer influences your judgment of that student's later answers.
Don't look at the student's name. Otherwise you may think, “I know she knows, but she just didn't express herself” (OUCH), or “How did he come up with this answer? He must have cheated” (DOUBLE OUCH). One solution is to have students place their names on the back of the essay . . . of course, you may still recognize their handwriting.
Watch for the tricks of bluffing: name dropping; addressing the significance of a problem but not its solution; making some great points that are off the topic; writing and writing while saying nothing.
Use two or more raters if the decision based on this essay is critical.
Some overall thoughts on . . . Using Performance Task Assessment
Certainly, by asking students to take written exams, we are interested in their performance, but here we are thinking of performance a bit differently. In performance task assessment, we are interested in having students do something other than paper and pencil testing.
Performance tests can be standardized and can have norms, just like paper and pencil tests. Most likely, however, you will create performance tests for use in your own classroom, much as you create essay exams.
So, the students actively produce something. In fact, the task might look like an instructional activity; what distinguishes it from an instructional activity is that it has an assessment component.
As the teacher, you might assess the process the student is using, the product, or both.
Assessing a performance task raises scoring issues similar to those of an essay, so look back at those guidelines.
Will group work encourage peer learning and replicate the teamwork expected in the real world of work, OR
become the vehicle for squabbling and freeloading?
Schools and teachers have historically valued the development of independent study habits and have oriented their students toward personal achievement. As a result, students may see little value in group activities for their own learning, or may be frustrated by the need to confer with others. Students can also perceive group work as a tool a teacher uses primarily to reduce his or her own workload.
If you decide to make group work part of your instructional repertoire, use it effectively by setting well thought out objectives and scoring systems.
Reserve group performance work for special tasks; don’t overuse it. A good start is to pick a task that is worthwhile, feasible, and best done (or only done) by a group.
Getting the assessment right is critical. Structuring the assessment comes down to answering four questions:
whether what is to be assessed is the product of the group work, the process of the group work, or both (and if the latter, what proportion of each)
what criteria will be used to assess the aspect(s) of group work of interest (and who will determine these criteria: teacher, students, or both)
who will apply the assessment criteria and determine marks (teacher, students through peer and/or self assessment, or a combination)
how marks will be distributed (shared group mark, group average, individually, or a combination)
#1 Assessing a Group Product . . . Teacher Assesses the Product and Decides Score Distribution
Group members submit one product. Using a predetermined scoring rubric, the teacher assesses the product. All group members receive the same score, regardless of individual contribution.
One product is submitted and assessed as above; the teacher then uses “some mechanism” to adjust each individual’s score up or down based on the teacher’s assessment of that student’s contributions to the group. The mechanism needs to be clear to the students and perceived as fair.
Each student in the group completes an allocated task that contributes to the final group product. Using a predetermined scoring rubric, the teacher assesses each task. Each student gets a score based only on the evaluation of their task.
Each student is allocated individually scored tasks, as above, and then all group members receive an average of these scores for their final score. (Modification: all group members receive a total of these scores.)
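The four group-product options above differ only in how a score reaches each student. The following Python sketch is one way to express them; the student names, the scores, and the per-student adjustment values (standing in for the teacher's unspecified “some mechanism”) are all hypothetical.

```python
# Minimal sketch of the four group-product scoring options listed above.
# Names, scores, and adjustment values are hypothetical illustrations.
from statistics import mean


def shared_score(product_score: float, members: list[str]) -> dict[str, float]:
    """Option 1: every member receives the product's score."""
    return {m: product_score for m in members}


def adjusted_score(product_score: float, adjustments: dict[str, float]) -> dict[str, float]:
    """Option 2: the shared score, then a per-student adjustment up or down.
    The adjustment values stand in for the teacher's "some mechanism"."""
    return {m: product_score + delta for m, delta in adjustments.items()}


def individual_scores(task_scores: dict[str, float]) -> dict[str, float]:
    """Option 3: each student keeps the score earned on his or her own task."""
    return dict(task_scores)


def pooled_scores(task_scores: dict[str, float], use_total: bool = False) -> dict[str, float]:
    """Option 4: everyone receives the average of the task scores
    (or, in the modification, their total)."""
    pooled = sum(task_scores.values()) if use_total else mean(task_scores.values())
    return {m: pooled for m in task_scores}
```

With task scores of 90, 70, and 80, `pooled_scores` gives every member 80; with `use_total=True`, every member receives 240. Comparing these results with `individual_scores` on the same data shows how much a strong or weak contributor's score moves under each option.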
#2 Assessing a Group Product . . . Both Teacher and Students Assess the Product
Each student in the group completes an allocated task that contributes to the final group product. Using a predetermined scoring rubric, the teacher assesses each task. Then the task product is randomly distributed to another student in the class. Using the scoring rubric, the student assesses the product. Each student gets a final score based on the average of the teacher’s score and the score awarded by the classmate peer.
#3 Assessing a Group Product . . . Teacher Assesses the Product then the Group Decides Score Distribution
Group members submit one product. Using a predetermined scoring rubric, the teacher assesses the product and tells the group its score. The students in the group then decide how to distribute the awarded score. For example, suppose the product is scored 80 (out of a possible 100) and there are four students in the group. We multiply the score by the number of students, so there are 320 points to distribute among the four members. No student can be given less than zero or more than 100. If the students decide that they all contributed equally to the product, each member receives a score of 80. If they decide that some made a bigger contribution, those students might get 85 or 90 points, and those who contributed less would receive correspondingly lower scores.
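A teacher checking a group's proposed split needs to confirm three things: the pool equals the product score times the group size, no share falls outside 0 to the maximum score, and the shares use up the pool. The Python sketch below checks those rules. It assumes the group must hand out the entire pool (not stated explicitly above), and the function name and student labels are hypothetical.

```python
# Minimal sketch of the "group decides the distribution" rule:
#   pool = product score x number of group members
#   each share must lie between 0 and the maximum possible score
#   (assumption) the shares must add up to exactly the pool
# The function name and student labels are hypothetical.


def validate_distribution(product_score: int, max_score: int,
                          shares: dict[str, int]) -> int:
    """Return the pool size if the proposed shares are valid; otherwise raise ValueError."""
    pool = product_score * len(shares)
    for student, share in shares.items():
        if not 0 <= share <= max_score:
            raise ValueError(f"{student}: {share} is outside 0-{max_score}")
    total = sum(shares.values())
    if total != pool:
        raise ValueError(f"shares total {total}, but the pool is {pool}")
    return pool
```

For a product scored 80 out of 100 in a four-person group, both an equal split (80 each) and an uneven split such as 90, 85, 75, and 70 pass and return a pool of 320, while any split giving one student more than 100 is rejected.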
#4 Assessing the Group Process . . . Examples of what might be assessed.
Criteria for process, as appropriate to the subject and group work objectives, may include, for example:
application of creative problem solving in resolving difficulties
appropriate level of engagement with task
evidence of capacity to listen
responsiveness to feedback/criticism
whether and how leadership responsibilities were exercised
#5 Assessing the Group Process . . . Teacher Assesses the Group Process
Using a predetermined scoring rubric, the teacher directly observes the group behaviors of each student. Each student is awarded a final score based on those observations.
The teacher engages in direct observation, as above, and each student is individually scored, but each student’s final score is the average of all scores in the group.
The group keeps written logs that document the content, dates, times, and durations of all group discussions and actions, including student names. The teacher collects the logs with the final product and uses the logs with a predetermined scoring rubric to assess the contributions of individual students.
#6 Assessing the Group Process . . . Peers Assess the Group Process
Students in a group individually evaluate each other's contribution using a predetermined list of criteria. The final score is an average of all scores awarded by members of the group. The teacher may or may not modify the scores awarded.
Students individually evaluate their own contribution using predetermined criteria and award themselves a score. The teacher may or may not modify the scores awarded.
Some Final Thoughts on . . . Dealing with Group Work Personalities
All groups have interpersonal dynamics. Some students find it difficult to retain focus and motivation because some members are preoccupied with their personal agendas.
As a teacher, you may find it useful to identify typical individual behaviors that emerge in groups and to help the groups develop procedures for dealing with them. If groups fail to deal with these behaviors, the group's work, both product and process, is at risk. The resulting negative feelings not only hurt this task but can also turn students against future group work.
Howard Culbertson has created a fairly complete list of problem characters whose behaviors can damage work groups. What would you do about each?
Some Final Questions to Consider on . . . Performance Projects and Assessment
Performance tasks, whether individual or group, raise special questions as we evaluate the products and processes associated with them, for example:
How can I restructure the class period to give students time to work on the products? The time needed will grow if the projects involve group work. Is this taking time away from important content I should be teaching?
How can I restructure my class time so I can fairly assess both the process and the product? What will the rest of the class be doing while I am assessing the performance task (since, by the nature of these assessments, not everyone is “on stage” at once)?
How can I be certain that tasks completed outside of my direct supervision were really done by the student? Certainly there is cheating on paper and pencil exams, but if work completed at home is a large percentage of a student's final score, I may be asking for trouble.
Using Constructed Response Items for . . . Formative Assessment
The intent of this group of techniques is to collect data that allow the teacher to redirect learning immediately, if necessary. Authors Angelo and Cross (1993) used the unfortunate term “Classroom Assessment Techniques” (why unfortunate?), and it has caught on in the literature.
The process is simple. At key points chosen by the teacher, students are asked for brief written responses to open-ended questions (some teachers prefer oral responses). Students are told their responses will not be graded (alternatively, the responses might be blanket scored with low point values). For written responses, 3 by 5 cards or even scrap paper might be used; allow students 1-3 minutes to write.
The teacher reviews the responses simply to see if the students “get it.” No rubric is used. Teachers can read these quickly and determine follow-up activities based on the cards.
The next slide has examples of constructed response items that might be used in formative assessment.
Examples of Brief Constructed Response Items for . . . Formative Assessment
WRITTEN (delayed feedback but private)
Directed paraphrasing – Write the meaning of a key concept or term in their own words.
Muddiest point – Identify the most confusing point discussed.
Pro and Con Grid – Provide thoughts both for and against an idea discussed.
Test Item – Prepare a test item appropriate for the topic.
ORAL (immediate feedback but public)
Lecture Pause - Teacher stops the lecture at 1 or 2 key points and asks students to reflect on how they are feeling about what they are learning. After allowing time for reflection, the teacher calls on a few students to sample their feelings.
Opinion Poll - Teacher poses questions, students respond in unison by each holding up cards (Yes or No; A, B, or C). Notice this is really a selected response variation. Some schools use electronic clickers.
Portfolios are part . . . of the constructed response assessment family
For those training to be teachers, a portfolio means a collection of work in your teaching area (TaskStream) and has the following characteristics:
Multiple entries, many of which are self-selected.
Assessments of your work conducted by others.
Self-reflection (self-assessment) on the entries.
On-going in nature with growth indicated.
So would it make sense to have your students do portfolios?
Caution: What is the purpose for using the portfolio? Is it a working portfolio or a showcase portfolio? Who will see this work and why? Be sure students understand. Be sure you understand.
Caution: How will you evaluate the student’s portfolio? Overall, will you be using it for formative or summative assessment?
Caution: How much should go into a student’s portfolio? Key works only. Start slowly with key entries one at a time. Don’t have students working on different entries of the portfolio all at once.
Because the portfolio is a collection rather than a single response, conducting quality assessment in this area requires one to make three somewhat unique decisions:
From the assortment of materials in the portfolio, what will be the focus of the evaluation? Three choices:
The student’s best work, most typical work, or progress
Who will select material for evaluation?
Who will evaluate?
From this point on, you can choose to evaluate using any scoring scheme discussed earlier with regard to essay and performance assessment (e.g., holistic or analytic).
Practical Advice . . . To Follow When Using Constructed Response Items
Become proficient in, and use a mix of, both selected response and constructed response items.
Devise your scoring system in advance.
Make sure you are assessing your learning objectives and not extraneous skills.
If you are interested in assessing higher level cognitive skills, make certain that the range of anticipated responses is truly open-ended. If there is only one possible response, consider recrafting the item as a selected response item.
Terms and Concepts to Review and Study on Your Own