Ucam-cl-tr-790 issn 1476-2986 Computer Laborator



Date04.11.2016


    else
        $w_{t+1} = w_t + \tilde{a}_t$
    end if
    compute $\tau_{t+1}$
end for


Note that the one-sided preference ranking margin takes the value 2, mirroring the two-sided unit-width margin in the classification model.
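One possible reading of this remark can be written out as follows. This is an interpretive sketch rather than the report's own formulation: it assumes class labels $y_i \in \{-1, +1\}$, writes $x_{+}$ and $x_{-}$ for arbitrary positive and negative samples, and takes $x_i$ to be a script ranked above $x_j$.

```latex
% Classification (assumed form): every sample lies at least 1 from the
% decision boundary on its own side, so the two classes are separated by 2.
y_i \,(w \cdot x_i) \ge 1
\quad\Longrightarrow\quad
w \cdot x_{+} - w \cdot x_{-} \ge 2

% Preference ranking (assumed form): one one-sided constraint per
% pairwise difference vector, with the same total separation of 2.
w \cdot (x_i - x_j) \ge 2
\qquad \text{for every pair with } x_i \text{ ranked above } x_j
```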

The termination of the optimisation procedure is governed by the timing rapidity hyperparameter, as in the classification case, and training time is approximately linear in the number of pairwise difference vectors, upper bounded by u (see above). The output from the training procedure is an optimised weight vector $w_t$, where t is the iteration at which the procedure terminated. Given a test sample, x, predictions are made, analogously to the classification model, by computing the dot-product $w_t \cdot x$. The resulting real scalar can then be mapped onto a grade/score range via simple linear regression (or some other procedure), or used in rank comparison with other test samples. Joachims (2002) describes an analogous procedure for the SVM model which we do not repeat here.
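The prediction step described above can be illustrated with a minimal sketch. The weight vector, feature vectors, grades, and the 0 to 40 grade range are invented toy values, and the helper names (`dot`, `fit_line`, `to_grade`) are hypothetical. Clipping to the grade range is an added assumption, not something the report specifies.

```python
# Hypothetical sketch: score scripts with a trained weight vector, then map
# the raw scores onto a grade range with simple (one-variable) linear regression.

def dot(w, x):
    """Dot product w . x of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def to_grade(raw, slope, intercept, lo, hi):
    """Apply the fitted line and clip to the valid grade range (an assumption)."""
    return max(lo, min(hi, slope * raw + intercept))

# Toy values, invented for illustration only.
w = [0.5, -1.0, 2.0]                       # stands in for the trained w_t
train_x = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [2.0, 1.0, 2.0]]
train_grades = [30.0, 10.0, 40.0]

raw_train = [dot(w, x) for x in train_x]
slope, intercept = fit_line(raw_train, train_grades)

test_x = [[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
raw_test = [dot(w, x) for x in test_x]
grades_test = [to_grade(r, slope, intercept, 0.0, 40.0) for r in raw_test]

# Alternative use: rank test scripts directly by raw score, best first.
ranking = sorted(range(len(raw_test)), key=lambda i: -raw_test[i])
```

The raw scores alone are enough for rank comparison. The regression step is needed only when an absolute grade is required.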

As stated earlier, in application to AAET, the principal advantage of this approach is that we explicitly model the grade relationships between scripts. Preference ranking allows us to model ordering in any way we choose; for instance we might only have access to pass/fail information, or a broad banding of grade levels, or we may have access to detailed scores. Preference ranking can account for each of these scenarios, whereas classification models only the first, and numerical regression only the last.
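As a rough illustration of why one formulation covers all three scenarios, the sketch below builds pairwise difference vectors from the same scripts under pass/fail labels, grade bands, and detailed scores. The function name `difference_vectors` and all data are hypothetical. Skipping tied pairs is an assumption about how equal grades would be handled.

```python
from itertools import combinations

def difference_vectors(samples, labels):
    """Return x_hi - x_lo for every pair of samples with different labels.

    Pairs with equal labels express no ordering and are skipped (an assumption).
    """
    diffs = []
    for i, j in combinations(range(len(samples)), 2):
        if labels[i] == labels[j]:
            continue
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        diffs.append([a - b for a, b in zip(samples[hi], samples[lo])])
    return diffs

# Four toy scripts as feature vectors (invented for illustration only).
samples = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]

pass_fail = [1, 0, 1, 1]      # only pass/fail information
bands = [2, 1, 2, 3]          # broad grade bands
scores = [27, 12, 31, 38]     # detailed scores

pairs_pass_fail = difference_vectors(samples, pass_fail)
pairs_bands = difference_vectors(samples, bands)
pairs_scores = difference_vectors(samples, scores)
```

Finer-grained labels leave fewer ties, so the same scripts yield more ordering constraints: 3, 5, and 6 pairs respectively in this toy case.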

3.4 Feature Space

Intuitively AAET involves comparing and quantifying the linguistic variety and complexity, the degree of linguistic competence, displayed by a text against errors or infelicities in the performance of this competence. It is unlikely that this comparison can be captured optimally in terms of feature types like, for example, ngrams over word forms. Variety and complexity will not only be manifested lexically but also by the use of different types of grammatical construction, whilst grammatical errors of commission may involve nonlocal dependencies between words that are not captured by any given length of ngram. Nevertheless, the feature types used for AAET must be automatically extracted from text with good levels of reliability to be effectively exploitable.

We used the RASP system (Briscoe et al., 2006; Briscoe, 2006) to automatically annotate




Type                        Example

TFC:
lexical terms               and / mark
lexical bigrams             dear mary / of the
part-of-speech tags         NNL1 / JJ
part-of-speech bigrams      VBR DA1 / DB2 NN1
part-of-speech trigrams     JJ NNSB1 NP1 / VV0 PPY RG

TFS:
parse rule names            V1/modal bse/+- / A1/a inf
script length               numerical
corpus-derived error rate   numerical

Table 1: Eight AAET Feature Types

both training and test data in order to provide a range of possible feature types and their instances so that we could explore their impact on the accuracy of the resulting AAET system. The RASP system is a pipeline of modules that perform sentence boundary detection, tokenisation, lemmatisation, part-of-speech (PoS) tagging, and syntactic analysis (parsing) of text. The PoS tagging and parsing modules are probabilistic and trained on native English text drawn from a variety of sources. For the AAET system and experiments described here we use RASP unmodified with default processing settings and select the most likely PoS sequence and syntactic analysis as the basis for feature extraction. The system makes available a wide variety of output representations of text (see Briscoe, 2006 for details). In developing the AAET system we experimented with most of them, but for the subset of experiments reported here we make use of the set of feature types given along with illustrative examples in Table 1.
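To make the lexical and part-of-speech feature types concrete, the following sketch counts them from text that has already been tokenised and tagged. It does not call RASP. The `(word, tag)` input format, the feature-key layout, the lowercasing of words, and the example tags are all assumptions made for illustration. Parse rule names and the corpus-derived error rate are omitted because they need parser output and a background corpus.

```python
from collections import Counter

def ngrams(seq, n):
    """All contiguous n-grams of seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def extract_features(tagged_tokens):
    """Count lexical and PoS n-gram features from (word, tag) pairs.

    Keys are tuples whose first element names the feature type
    (a hypothetical layout, not the report's).
    """
    words = [word.lower() for word, _ in tagged_tokens]  # lowercasing is assumed
    tags = [tag for _, tag in tagged_tokens]

    feats = Counter()
    for w in words:
        feats[("word", w)] += 1
    for g in ngrams(words, 2):
        feats[("word_bigram",) + g] += 1
    for t in tags:
        feats[("pos_tag", t)] += 1
    for g in ngrams(tags, 2):
        feats[("pos_bigram",) + g] += 1
    for g in ngrams(tags, 3):
        feats[("pos_trigram",) + g] += 1
    feats[("script_length",)] = len(words)  # numerical feature: token count
    return feats

# Toy tagged input; the tags are illustrative, not real tagger output.
tagged = [("Dear", "JJ"), ("Mary", "NP1"), ("and", "CC"), ("Mark", "NP1")]
features = extract_features(tagged)
```

Each script then becomes a sparse vector indexed by these keys, which is the form the dot-product prediction operates on.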


