Automatic Essay Scoring is Here and Now Online




  • Welcome to CIT S234
  • Gary Greer
  • University of Houston Downtown
  • &
  • Michelle Overstreet
  • The College Board
  • Tuesday, Oct 25, 2005 (9:15 AM - 10:15 AM) Coral Tower, Lobby Level

AES, AI, ACCUPLACER/WritePlacer

  • When essays are scored by human experts, the scoring characteristics can be mapped by Artificial Intelligence (AI) and used in Automatic Essay Scoring (AES).
  • AI is used to identify and internalize essay features into scoring models (algorithms).
  • The algorithms are verified in simulation and subsequently on live essays.
  • The algorithms are used by AES to score an essay.

Automatic Essay Scoring

  • AI maps salient characteristics of freshman essays (about 300) into a linear model of each score (for example, 6s, 8s, 10s, etc.)
  • AES is carried out by mathematically matching live essays to these predetermined linear models to predict a score.
  • AES algorithms specify whether an essay’s characteristics mathematically match the semantic space previously specified by human graders.
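The matching step described above can be sketched as a toy nearest-model classifier. Everything here is illustrative: the three feature names, the per-score models, and the distance metric are assumptions for the sketch, not WritePlacer's actual algorithm.

```python
import math

# Hypothetical per-score "linear models": mean feature vectors learned from
# human-scored training essays. The three features (say, word variety, average
# sentence length, and discourse-marker density, each scaled 0-1) are invented
# stand-ins for the ~300 features a real engine uses.
SCORE_MODELS = {
    6:  [0.30, 0.40, 0.20],
    8:  [0.55, 0.60, 0.45],
    10: [0.80, 0.85, 0.70],
}

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def predict_score(features):
    """Assign the score whose model the essay's features match most closely."""
    return min(SCORE_MODELS, key=lambda s: distance(features, SCORE_MODELS[s]))
```

A live essay whose feature vector sits near the 10-point model, e.g. `predict_score([0.78, 0.90, 0.68])`, would be assigned a 10 under this sketch.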

AES

  • AES therefore emulates human raters: it evaluates characteristic essay features such as structure, content, style, syntax, discourse, and word choice, and predicts a maximum-likelihood estimate of the score according to the algorithms derived from the 300 human-expert-scored essays.
  • AES’s performance has been verified in national-level studies and now awaits performance tests by users at the local level.
  • We conducted our local performance study with ACCUPLACER/WritePlacer.

WritePlacer Employs AI Called IntelliMetric

  • WritePlacer infers and internalizes the rubric and pooled judgments of human scorers by analyzing over 300 semantic, syntactic and discourse features in five categories:
    • Focus and Unity
    • Development and Elaboration
    • Organization and Structure
    • Sentence Structure
    • Mechanics and Conventions
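As a toy illustration of surface feature extraction (not IntelliMetric's actual features, which number over 300 and are proprietary), a few simple measures of the kind an AES engine might start from can be computed like this:

```python
import re

def extract_features(essay):
    """Compute a handful of invented surface features from an essay.
    Real AES engines analyze far richer semantic, syntactic, and
    discourse features than these stand-ins."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)
    return {
        "n_sentences": len(sentences),
        "n_words": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # lexical variety: distinct words / total words
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }
```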

ACCUPLACER/WritePlacer is Online

  • ACCUPLACER Online offers an option for AES called WritePlacer Plus. Delivery is online, testing time is reduced, reliability is enhanced, and scoring is immediate.
  • At U. of Houston Downtown we asked whether scores from this AES agree with human-expert scores, or whether the two differ.

We Conducted a Local Study

  • Research Question 1
    • What is the correlation between WritePlacer scores and human expert scores? Is it significant?
  • Research Question 2
    • Do distributions of scores differ? (Are the medians equal?)

Our Hypotheses

  • Hypothesis 1
    • A significant correlation exists between WritePlacer scores and human expert scores.
    • (Ho: correlation = 0)
  • Hypothesis 2
    • The Median WritePlacer score is equal to the Median human expert score.
    • (Ho: Medians are equal.)

Our Method

  • Participants were 112 randomly selected college freshmen who wrote placement essays.
  • Each essay was scored twice: first by WritePlacer’s AES and then by human experts.
  • The correlation between the two sets of scores was computed.
  • To test whether the median scores differed, a non-parametric (Wilcoxon) test statistic was obtained.
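The correlation step can be sketched in plain Python. The tie-aware ranking below is standard Spearman; the five-essay data set is made up purely for illustration (the study's actual results appear in the tables). In practice the Wilcoxon medians test would come from a statistics library such as scipy.stats.wilcoxon.

```python
def ranks(xs):
    """Rank the values, assigning tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical paired scores (AES, human) for five essays:
aes = [6, 8, 8, 10, 12]
human = [6, 8, 10, 10, 12]
```

Applied to the study's 112 actual score pairs, this procedure yielded rho = .724 (Table 2).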

Table 1 – Frequencies of Differences

  Difference (Human - AES)   Frequency   Percent   Who Scored Higher?
  -2                         4           4%        AES
  -1                         7           6%        AES
   0                         67          60%       identical
   1                         28          25%       Human
   2                         6           5%        Human
  Total                      112         100%

Table 2 – Significance Tests

  Medians Test
  Group    n     Mean Rank   Sum of Ranks
  AES      112   119.63      13398
  Human    112   105.38      11802
  Wilcoxon test statistic: W = 11802, p > .05

  Correlation: rho = .724, p < .05

Discussion of Tables

  • Table 1 indicates that 91% of the paired scores were identical or agreed within 1 point and that 9% differed by 2 points.
  • [The 10 essays (9%) that differed by 2 points were split 60%-40%: 6 where Human > AES and 4 where AES > Human.]
  • Table 2 shows inferential statistics supporting the conclusion that AES assigns essentially the same scores to essays as human experts assign to the same essays.
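The agreement figures quoted in the discussion follow directly from the Table 1 frequencies; a quick check in Python:

```python
# Table 1 data from the study: score difference (human minus AES) -> frequency
diff_freq = {-2: 4, -1: 7, 0: 67, 1: 28, 2: 6}

total = sum(diff_freq.values())                      # 112 essays
exact = diff_freq[0]                                 # identical scores
within_one = exact + diff_freq[-1] + diff_freq[1]    # agree within 1 point
off_by_two = diff_freq[-2] + diff_freq[2]            # differ by 2 points

pct_within_one = round(100 * within_one / total)     # -> 91
pct_off_by_two = round(100 * off_by_two / total)     # -> 9
```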

Findings

  • The correlation between WritePlacer scores and human-expert scores is significant:
        • rho = .72, p < .05.
  • The distributions of WritePlacer scores and human-expert scores are the same:
        • Wilcoxon W = 11802, p > .05.

Conclusions

  • Scoring essays by AES (as implemented within ACCUPLACER/WritePlacer) is consistent with scoring essays by human experts. (Interrater reliability is significant.)
  • AES scoring of essays is not subject to unreliability (inconsistency) due to fatigue. AES never gets tired!
  • AES scoring is efficient and effective.

Additional Issues:

  • 1. Rater-related measurement error is reduced: the same algorithm scores every essay.
  • 2. An essay supplemented by multiple-choice items increases confidence about placement.
  • 3. Efficiency: faculty time is freed for instruction.
  • 4. GMAT/MCAT/SAT are adopting AES.
  • 5. Deep Blue learned chess moves.
  • Thank you


