Ucam-cl-tr-790 issn 1476-2986 Computer Laborator

Download 0.52 Mb.
Size0.52 Mb.
1   2   3   4   5   6

tel se wt+1

= wt

+ ˜at

3. 3 D iscr im inati ve Pr efer ence Ranking


- Lt- 1end if compute tt+1

end fo r

1. how far the algorithm has progresse d (t) 2. the magnitude of the increase in empirical loss (Lt

) 3. the balance of the training distributions ( ß )

The motivation b ehind the third heuristic is that in the early stages of the algorithm, unbalanced training distributions lead to aggregate vectors that are skewed toward the dominant class. I f the pro ce dure is stopp e d to o early, the e mpiric al loss will b e dis prop ortionately high for the s ub dominant clas s, leading to a s kewe d weight vector. The e ffect of ß is to relax the timing schedule for imbalanced data w hich results in higher quality solutions.

The TAP optimisation pro cedure requires storage of the input vectors along with the feature weight and up date ve ctors, yielding space complexity of O (N ) in the numb er of training samples. At each iteration, computation of the empirical loss and aggregate vector is O (N · ν ) (recall that ν is the average numb er of unique features p e r sample). Given the curre nt and previous loss value s, computing t is O (1) and thus each ite ration scales with time complexity O (N ) in the numb e r of training samples. The numb e r of training iterations is governed by the rapidity of the timing schedule which has no direct dep e nde nce on the numb er of training s amples, yielding an approximate overall complexity of O (N ) (linear) in the numb er of training s amples .

The TAP and SVM mo dels describ ed ab ove p erform binary disc riminative clas sification, in which training exam scripts mus t b e divided into ‘pass’ and ‘fail’ categories . The confidence margin ge nerated by the classifie r on a given test sc ript can b e inte rpreted as an e stimate of the degree to which that script has passed or failed, e.g. a ‘go o d’ pass or a ‘bad’ fail. However, this gradation of script quality is not mo de lled explicitly by the classifier, rather it relies on emerge nt correlation of key f eatures with script quality.

In this section, we intro duce an alte rnative ML technique c alled preferenc e ranking which is b e tter suited to the AAE T task. I t explicitly mo dels the relationships b etween sc ripts by learning an optimal ranking over a given sample domain, inferred through an optimisatio


pro cecure that utilises a sp ec ified orde ring on training samples . This allows us to mo del the fact that some sc ripts are ‘b etter’ than others, across an arbitrary grade range, without nece ssarily having to sp ecif y a numerical score for each, or intro duce an arbitrary pass /fail b oundary.

We now pres ent a version of the TAP algorithm that efficiently learns pref erence ranking mo dels. A de rivation of similar equations for learning SVM- base d mo dels , and pro of of their optimality is give n by Joachims (2002).

The TAP preference ranking optimis ation pro cedure requires a set of training samples, x1, x2, . . . , xn, and a ranking


xj) : ˆw · (xi

- xj) = µ. (15)

where µ is the margin, given a sp ec ific value b elow. The derived set of pairwise diff ere nce vectors grows quickly as a function of the numb er

of training samples. An upp er b ound on the numb er of difference vectors f or a se t of training vectors is give n by

:2u = a

* r (r - 1)/2 (16)

where r is the numb er of ranks and a is the average rank frequency. This yields intrac table numb e rs of diff erence vec tors for eve n mo dest numb ers of training

vectors, eg: r = 4, a = 2000 yields 24, 000, 000 differenc e vectors. To overcome this, the TAP optimisation pro cedure employs a sampling strategy to re duce

the numb er of differenc e vec tors to a manageable quantity. An upp e r b ound is sp e cified on the numb er of training vectors, and then the probability of s ampling an arbitrary difference ve ctor is given by u /u where u is the sp ecifie d upp er b ound and u is given ab ove.

The optimisation algorithm the n pro ceeds as for the classific ation mo del (Algorithm 1), except we have a one- sided margin. The mo dified pro cedure is shown in Algorithm 2

.The misclassification c ondition is:


· (xj

- xi) > 2 (17)xi


6and the aggregate vector ˜at



is cons tructed by:

= nor m (  


- xi

, t ) (18)
1   2   3   4   5   6

The database is protected by copyright ©sckool.org 2020
send message

    Main page