Seek items giving high item/total test score correlations. Each item should correlate well with the assessment’s total score.
Said another way…
When you are giving a test that is measuring a specific characteristic (e.g., knowledge of the steps toward completing a financial aid form; knowledge of theoretical perspectives in psychology; knowledge of the components of a neuron), the items on the test should be intercorrelated (i.e., have “internal consistency” - they relate with one another).
Only when the items relate to one another can we be confident that we are measuring the characteristic we intended to measure (i.e., when the test has “internal consistency,” the test is a reliable measure of that characteristic and it is more likely to be valid).
How Do We Determine Internal Reliability?
Examine correlations between each item score and the total test score—this is one way to assess “internal consistency”
You are correlating students’ “pass” vs. “fail” status on each item with students’ overall test scores.
This analysis indicates whether the item and total scores are assessing the behavior in the same way.
In general, items should be answered correctly by those obtaining high total scores (thus there should be a positive correlation).
In your final test, select only those items that have high positive internal correlations.
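A minimal sketch of the item-total correlation described above, assuming each response is coded 1 (pass) or 0 (fail). The function names and the small response matrix are hypothetical, made up for illustration. The statistic is an ordinary Pearson correlation, which for a 0/1 item is the point-biserial correlation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined when a variable has no variance
    return sxy / sqrt(sxx * syy)

def item_total_correlations(responses):
    """Correlate each item's 0/1 scores with students' total test scores."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals) for j in range(n_items)]

# Hypothetical data: rows are students, columns are items (1 = pass, 0 = fail).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
correlations = item_total_correlations(responses)
for j, r in enumerate(correlations, start=1):
    print(f"Item {j}: r = {r:.2f}")
```

Items with a low or negative correlation (the last item in this made-up data) are candidates for revision or removal. The total score here includes the item itself, as in the description above.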
Item Difficulty
Difficulty
Permits the construction of a test in a specific way with specific characteristics.
Difficulty is based on the proportion of persons who pass or correctly answer the item.
The greater the proportion, the easier the item
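As a rough illustration of the definition above, item difficulty can be computed as the proportion of students passing each item. The function name and the response data are hypothetical; responses are assumed to be coded 1 (correct) and 0 (incorrect).

```python
def item_difficulty(responses):
    """Proportion of students answering each item correctly (higher = easier)."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_students for j in range(n_items)]

# Hypothetical data: rows are students, columns are items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
difficulties = item_difficulty(responses)
average_difficulty = sum(difficulties) / len(difficulties)
print(difficulties, average_difficulty)
```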
What is the optimum level of item difficulty?
Item Difficulty - Prediction
If you are assessing achievement, proficiency, or mastery of subject matter, AND the results will be used in studies or examinations for prediction, then you should strive for an average item difficulty of .50
(and each item should not deviate much from this—this gives maximum variance among test scores, which is good for reliability and validity)
With .50 difficulty, there are more “discriminations” possible, thus you have the maximum “variance” among the test scores (this leads to better reliability and validity)
If you are interested in classification (e.g., mastery or not of most of the material in the course), then you should use the proportion that represents that standard.
If you deem 80% on an exam as “mastering the material,” then you should use .80 as the average difficulty level
some items will be higher and some lower, but the average would be .80.
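The claim that a difficulty of .50 gives maximum variance follows from the variance of a single pass/fail item. A short derivation, using p for the proportion passing the item; the symbol X for an item score coded 0/1 is notation introduced here for illustration:

```latex
\operatorname{Var}(X) = p(1-p), \qquad
\frac{d}{dp}\,p(1-p) = 1 - 2p = 0 \;\Rightarrow\; p = .50, \qquad
\operatorname{Var}(X)\big|_{p=.50} = .25
```

For comparison, an item at p = .80 has variance .80 × .20 = .16, so a test built around a mastery standard gives up some score spread in exchange for matching that standard.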
Item Analysis: Validity
Test Validity: Relationship between total test scores and scores on an outside variable
Item Validity: Relationship between scores on each of the items and some external criterion.
Most are concerned with test validity, but test validity is a function of item validity.
More on Item Validity
Create external criterion groups: e.g., those with high scores (say, the upper 27%) and those with low scores (say, the lower 27%). Then find items on the test (e.g., a test to predict school aptitude) that are passed by a significantly greater number in one group than in the other. These are the more effective items.
To select items, calculate the “discrimination index” (D), which is the difference between the number of correct responses in the high (H) and low (L) groups. If 80 H scorers answered the item correctly, while 10 L scorers answered it correctly, then D = H - L = 70. Select items with positive and high D values (especially for achievement or aptitude tests) for inclusion in the final form of the test. D can also be computed from proportions: taking the difference between the proportions correct in each group makes D independent of sample size.
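The discrimination index can be sketched as follows, assuming a criterion score is available for each examinee and item responses are coded 1/0. The function names, the criterion scores, and the item responses are all hypothetical. The 27% cut follows the example given above, and the sketch returns D both as a count and as a difference of proportions.

```python
def split_groups(criterion_scores, fraction=0.27):
    """Return (high_idx, low_idx): indices of the top and bottom fraction of examinees."""
    order = sorted(range(len(criterion_scores)), key=lambda i: criterion_scores[i])
    k = max(1, round(len(order) * fraction))  # group size, at least one examinee
    return order[-k:], order[:k]

def discrimination_index(item_scores, high_idx, low_idx):
    """Return D as a count (H - L) and as a difference of proportions correct."""
    h = sum(item_scores[i] for i in high_idx)
    l = sum(item_scores[i] for i in low_idx)
    d_count = h - l
    d_prop = h / len(high_idx) - l / len(low_idx)
    return d_count, d_prop

# Hypothetical data for ten examinees: criterion scores and 0/1 scores on one item.
criterion_scores = [95, 90, 85, 70, 65, 60, 55, 40, 35, 30]
item_scores = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

high_idx, low_idx = split_groups(criterion_scores)
d_count, d_prop = discrimination_index(item_scores, high_idx, low_idx)
print(d_count, round(d_prop, 2))
```

With the figures from the example above (80 correct in H, 10 correct in L), the count form gives D = 70. The proportion form would also need the size of each group, which is not stated.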
Obsessive-compulsive disorder is characterized by which primary symptom?
Hallucination
Memory loss
Intense specific fear
Delusion
Unwanted repetitive thoughts*
Lower Order Question, Type 2
Which disorder is characterized by unwanted, intrusive thoughts and repetitive behavior?
Phobia
Obsessive-compulsive disorder*
Dissociative identity disorder
Major depressive disorder
Schizophrenia
Creating Higher-Order Questions
The question requires students to mediate their answers by doing an extra step they had not previously learned in their studies.
Students must transfer recalled knowledge to a new situation, break apart and reassemble concepts in new ways, or combine content of two areas in novel ways to answer a question.
Not always easy to distinguish between application and analysis questions
A student who misses deadlines in school while striving for perfection may be exhibiting symptoms of which of the following disorders?
Phobia
Obsessive-compulsive disorder*
Dissociative identity disorder
Major depressive disorder
Schizophrenia
Gene is always late for school because he spends an hour organizing his closet each morning. Which of the following treatments would be most effective for Gene’s problem?
In-depth interpretation of dreams
Electroconvulsive therapy
Medication affecting serotonin levels*
Systematic desensitization
Regular exposure to bright lights
Tips from ETS
Whenever possible write items using positive form.
Don’t include “teaching” in the stem.
Use plausible distracters.
Can you give a reason why each distracter is not an acceptable response?
The stem should be a complete question or statement.
The correct answer should be about the same length as the distracters.
Items should not ask trivial information. The point being tested should be one worth testing.
ETS Tips on Distracters
Should be reasonable.
May include misconceptions and errors typical of less prepared examinees.
May include truisms, rules-of-thumb that do not apply to or satisfy the problem requirements.
Negative stems should be avoided. Stems that include “EXCEPT” “NOT” “LEAST” can be difficult to process. Never use negatives in both the stem and in the options.
Take care when writing and/or selecting items from a test bank.
Look for at least some items that test higher levels of Bloom’s Taxonomy.
After the test, have your best students critique your test and find items needing revision.
When selecting software (clicker, scanner, survey, test), consider the item analysis capability that comes with the software, and factor that into your purchase decision.
Excerpted from eLumen: A Brief Introduction by David Shupe, July 2007
Scorecard for All Students in the Course
Class Scores by Student
Aggregated Data for Course
Course Aggregates by Program
Calibrated Peer Review
Web-based program that enables frequent writing assignments with minimal impact on instructor time
Uses peer review
Promotes deeper learning
http://cpr.molsci.ucla.edu/
Calibrated Peer Review in Psychology 101
Critical Thinking in Introductory Psych Course
SLO on Pseudoscience skepticism: Students will correctly identify non-scientific explanations of human behavior and explain why those explanations are not based upon science and do not provide reliable or valid explanations of behavior or predictions of future behavior.
The Pseudoscience Belief Test
Please rate how much you believe the following statements. Use the 7-point scale provided.
1 – Do not believe in this at all.
2 – I doubt very much that this is real.
3 – I doubt that this is real.
4 – I am unsure if this is real or not.
5 – I believe that this may be real.
6 – I believe that this is real.
7 – I strongly believe this is real.
__ 1. A person’s personality can be easily predicted by their handwriting.
__ 2. A person can use their mind to see the future or read other people’s thoughts.
__ 3. A person’s astrological sign can predict a person’s personality and their future.
__ 4. An ape-like mammal, sometimes called Bigfoot, roams the forests of America.
__ 5. The body can be healed by placing magnets on to the skin near injured areas.
__ 6. Healing can be promoted by placing a wax candle in your ear and lighting it.
__ 7. A dinosaur, sometimes called the Loch Ness Monster, lives in a Scottish lake.
__ 8. Sending chain letters can bring you good luck; ignoring them can bring you bad luck.
__ 9. The government is hiding evidence of alien visitation at places such as Area 51.
__ 10. Voodoo curses are real and have been known to kill people.
__ 11. A broken mirror can bring you bad luck for many years.
__ 12. Houses can be haunted by the spirits of people who have died in tragic ways.
__ 13. Water can be accurately detected by people using “Y” shaped tree branches.
__ 14. Animals, such as cats and dogs, are sensitive to the presence of ghosts.
Adapted from Walker, Hoekstra, & Vogl (2002). Science education is no guarantee of skepticism. Skeptic, 9(3).
Students wrote a short essay in response to the materials: Why I do or do not believe graphology is a reliable, valid way to measure and predict personality.
Students are “calibrated” – prepared to score essays written by their peers.
Students receive a detailed grade report for the assignment.
Graphology Belief Scores Statistical Summary
Treatment Group    Pre-test Average         Post-test Average          Paired t-tests
Graphology         4.41                     2.33                       t(26) = 6.40, p < .01
Conditioning       4.12                     3.69                       t(25) = 1.31, p = ns
                   t(51) = 0.67, p = ns     t(46.7) = 2.93, p < .01
Mean Pre and Post-Test Scores on Graphology Belief Question
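For readers who want to run the same kind of pre/post comparison on their own class data, here is a minimal sketch of a paired t statistic. The ratings below are hypothetical and are not the data behind the summary above; the function name is also an assumption. The degrees of freedom equal the number of paired students minus one, so a reported t(26) corresponds to 27 students.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (t, df) for a paired t-test on matched pre/post scores."""
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1

# Hypothetical 7-point belief ratings for six students, before and after the assignment.
pre = [5, 4, 6, 3, 5, 4]
post = [2, 3, 3, 2, 4, 2]

t_value, df = paired_t(pre, post)
print(f"t({df}) = {t_value:.2f}")
```

The p-value is not computed here; it would come from a t distribution with df degrees of freedom, via a statistics package or a printed table.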
Example Essay: The Detection of a Pseudoscience: Graphology
Elaine Quigley’s posting on the website www.businessballs.com is littered with “red flags” that expose graphology as the pseudoscience/pseudopsychology that it is. While an attempt to promote graphology, Quigley’s posting fails to measure up to several of Cotton and Scalise’s guidelines for “baloney detection.” This paper will examine four areas in which graphology fails to live up to its claim of being “science.”
In an attempt to display graphology’s validity, Quigley cites the notion that it is “a very old and respected science.” The fact that it has existed for approximately 3,000 years is used to justify Quigley’s notion that graphology is a science. However, one educated in the definition of science knows that the age of a theory is not a factor used to determine its validity. In fact, there are many beliefs that have been around for thousands of years that cannot be tested and therefore cannot be deemed as scientifically reliable. Graphology is just one of many ideas that cannot be justified despite their age.
Quigley also fails to tell how the “science” of graphology has been tested and proven. Instead, she simply states that graphology is a “reliable indicator of personality and behavior” and expects her readers to accept this statement as fact. She also mentions that “the science is still being researched and expanded.” This is the extent to which she approaches the issues related to the research of graphology. Without explaining the testing that was done to prove the method’s reliability, how is one to know that graphology is indeed reliable? Indeed, the answer is simple. It is impossible to be sure of the reliability of a measure of personality if the measure itself cannot be tested.
In addition to not presenting methods for testing the claims of graphology, Quigley also fails to present evidence in support of its validity. Instead, she simply states that “it is not easy to explain how and why graphology works, nevertheless it continues to be used, respected and appreciated by many.” Could it be that the only “evidence” for the reliability of graphology is the satisfaction that its users experience? Unfortunately, being “used” and “accepted” are not characteristics required of a science.
Finally, the vast majority of information provided by Quigley is anecdotal and leads up to a sales pitch for her services. She provides vague stories about how graphology has been used to produce more successful hiring processes and personal relationships. The information is presented more as an advertisement than a scientific work. Quigley goes into more detail on her experience as a graphologist than she does on the aspects of graphology that would qualify it as a science.
In conclusion, it is quite clear that based on the evidence presented in this paper, graphology qualifies as a pseudoscience rather than a science. The claims of graphologist Elaine Quigley fail to show that graphology is indeed a science. Instead, she relies on the age of graphology and anecdotal evidence in support of graphology while ignoring issues related to methods for testing graphology’s claims and the results that have resulted in tests of its validity. Looking critically at “discoveries” is no doubt a useful tool that extends beyond the subject of graphology. The methods for recognizing pseudosciences compiled by Cotton and Scalise are certainly tools that would empower all people and prevent them from being fooled by pseudoscientific claims.
Questions and Answers for CPR Peer Reviewers
1. Did the essay begin with a topic sentence?
2. Was the essay free of spelling and grammatical errors?
3. Did the essay present at least four (4) different reasons for supporting or denying the validity of graphology (or handwriting analysis) as a method of assessing personality and/or predicting behavior?
4. Did the essay have balance? Although this may seem subjective, do you feel that it provided a balance among each of the points made? For example, was each point explained in the same amount of detail?
5. Did the author's arguments seem convincing to you?
6. Did the author conclude with any reflection about whether this assignment was or was not helpful to his or her learning? In other words, did the author indicate that this assignment might help him or her judge the validity of explanations of behavior encountered in the popular media (newspaper, radio, TV, magazines, etc.)?
7. How would you rate this text? (Scale of 1 – 10)
Student’s Screen: Detailed Results
Instructor Screen: Student Progress
Instructor Screen: One Student’s Results
Instructor’s Screen: Student Results
SLO Data
http://calchautauqua.net/
UCLA
June 18-20, 2008
Calibrated Peer Review: A writing and critical thinking instructional tool. Arlene Russell, UCLA & Tim Su, CCSF
Classroom Responders
Engage students
Monitor student understanding
Quickly and easily collect and store assessment data
Which tool, if any, are you most likely to use for assessing your SLOs?
ePortfolio
Calibrated Peer Review (CPR)
On-line rubric generator
Scanning embedded items on Scantron answer sheets
Developing scannable forms using a product like Remark OMR
Survey software to capture students’ self-appraisals
Adobe PDF forms
None of these
Contact Info & Acknowledgements
Dr. Jerry Rudmann, Professor of Psychology, Irvine Valley College, jrudmann@ivc.edu
Much of this slide show was adapted (with express written permission) from Pat Arlington, Instructor/Coordinator, Instructional Research, Coastline Community College, parlington@coastline.edu