Nor are the uses of plagiarism detection software limited to guarding against the known and obvious pitfalls of plagiarism. It can assist examiners by showing how essays are constructed, whether or not they are technically plagiarised. It can be useful in the supervision and examination of theses. We should expect journal editors increasingly to use plagiarism detection software for submitted articles. Plagiarism detection software can also be useful to the students themselves, before finally submitting their work (arguably they should already know where their essays are sourced from, but poor note-taking may lead to mistakes and failures of attribution). In any case, their ability to see the reports their markers will see is relevant to some of the conclusions drawn at the end of the article.
One way of avoiding plagiarism is to set tasks which make plagiarism more difficult. Some of these assessment techniques have pedagogical merit in their own right, in which case we should consider adopting them, whether or not plagiarism is an issue. The fear of plagiarism may therefore trigger a review of our assessment which we should have made in any event. We may nonetheless legitimately conclude that there are skills which can best be assessed using the traditional long essay, written over an extended period; it is, after all, essentially what we are doing when we write our own academic pieces. It would be totally unacceptable to have to abandon a good assessment method just because we are not prepared to use the tools which are available to us to tackle the plagiarism menace.
In order to use plagiarism detection software, it is necessary for the marker to have the essays in a digital form. The simplest way to do this is to require submission in digital form, possibly alongside paper submission for those of us who, even today, object to reading material on screen. Alternatively, scanning and optical character recognition (OCR) software can be used, but the OCR software needs to be almost 100 per cent accurate to be of value. With the continuing improvement of screens, and the increasing familiarity of academics with on-screen reading and editing of email, it seems difficult to believe that objections to online submission are sustainable, other than in the very short term.
There are three main techniques used by the software packages currently available. First (fairly obviously), there are those which employ search engine techniques, to find matches on the Internet. Secondly, there are those which find similarities between files on a single computer; these are intended primarily to detect collusion. Thirdly, there are those, of which Turnitin is the best-known example, which build up their own archive databases from past essay submissions, and agreements with publishers. It is this third type which provides us with the tools to defeat plagiarism from any source, whether or not that source is Internet-derived.
Many packages use only the first or only the second technique. For example, EVE2 only finds matches on the Internet, whereas CopyCatch Gold and WCopyfind are collusion detectors. Turnitin uses the first and third techniques (and it is also possible to use its archive to check for collusion). Viper uses all three. In 2001, a number of packages were reviewed in a Technical Review of Plagiarism Detection Software Report, prepared for the Joint Information Systems Committee (JISC). EVE2, Turnitin and CopyCatch were included in the review. The review of WordCHECK links to a website which now links to Viper. Though the report is now quite old, and the software itself has moved on, the general principles identified in the report remain valid.
There is also the issue of what is searched for. Restricting the search to an exact string will not catch the student who makes minor changes to a plagiarised passage, whereas not so restricting it can result in many false leads. A software package needs to be able to compare passages of realistic lengths, and not be fooled by minor differences between the target and the checked passage. If the report allows the marker to easily compare suspicious parts of the essay with original sources, the match need not be particularly exact, especially if the software is to be of value in discovering how students construct essays, as well as in detecting plagiarism strictly so defined.
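The difference between exact and tolerant matching can be illustrated with a minimal sketch. It is not the method of any package named in this article; it assumes one common approach, in which each text is broken into overlapping runs of n words ("shingles") and the proportion of shared runs is measured. The function names, the sample sentences and the choice of n = 3 are all hypothetical.

```python
import re

def word_shingles(text, n=3):
    """Return the set of overlapping n-word runs in text (lower-cased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def containment(candidate, source, n=3):
    """Fraction of the candidate's n-word runs that also occur in the source."""
    cand = word_shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & word_shingles(source, n)) / len(cand)

# Hypothetical sample texts: 'copied' changes a single word of 'source'.
source = ("The doctrine of consideration requires that something of value "
          "be given in exchange for a promise.")
copied = ("The doctrine of consideration demands that something of value "
          "be given in exchange for a promise.")
unrelated = ("Negligence requires a duty of care, a breach of that duty, "
             "and resulting damage.")

exact_hit = copied in source                      # exact-string search misses the copy
score_copied = containment(copied, source)        # tolerant match still scores highly
score_unrelated = containment(unrelated, source)  # genuinely different text scores zero
```

A one-word change defeats the exact-string test, yet most of the three-word runs survive it. Shorter runs or a lower threshold catch more disguised copying, at the price of more of the false leads mentioned above.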
Even packages which find only material sourced from the Internet should become more effective as ever more material is made available online, unless digital rights management techniques are used to protect such material.
There are packages which compare documents on a single computer, whose main use is for detecting collusion between students, if, for example, all submitted essays are stored in a directory on the marker’s computer. Essays can also be compared against anything else on the marker’s computer, enabling (subject to copyright) private databases of likely sources to be held locally; indeed, Viper’s instructions positively encourage checking material held locally. In a specialist area, even quite a small local archive is likely to be a formidable tool.
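A collusion check of this kind can be sketched as a pairwise comparison of every essay in a folder. The sketch below is an illustration under stated assumptions, not a description of how Viper, CopyCatch or WCopyfind work: it assumes one plain-text essay per `.txt` file, uses the Jaccard overlap of four-word runs as the similarity measure, and flags pairs above an arbitrary threshold of 0.3. All names, the sample essays and the threshold are hypothetical.

```python
from itertools import combinations
from pathlib import Path
import re

def shingles(text, n=4):
    """Set of overlapping n-word runs in text (lower-cased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Shared runs divided by all distinct runs in either text."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_pairs(essays, n=4, threshold=0.3):
    """essays maps a name to its text; return (score, name_a, name_b), highest score first."""
    sets = {name: shingles(text, n) for name, text in essays.items()}
    hits = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            hits.append((score, a, b))
    return sorted(hits, reverse=True)

def load_essays(directory):
    """Read every .txt file in directory (assumed layout: one essay per file)."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in Path(directory).glob("*.txt")}

# In-memory demonstration with hypothetical essays.
demo = {
    "alice.txt": "An offer must be communicated to the offeree before it can be accepted by the offeree.",
    "bob.txt": "An offer must be communicated to the offeree before it may be accepted by the offeree.",
    "carol.txt": "Frustration discharges a contract when performance becomes radically different from what was agreed.",
}
flagged = rank_pairs(demo)  # only the alice/bob pair exceeds the threshold
```

In practice the marker would call `rank_pairs(load_essays("submissions"))`, where `submissions` is a hypothetical folder name. Adding likely source texts to the same folder gives the local archive described above, since every essay is then compared against those sources as well as against the other essays.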
Computers can greatly assist us, therefore, in combating a problem that they themselves have created. But we cannot abdicate our judgement to the machine. Care and professional judgement are needed to interpret the results. For example, Turnitin’s ‘Overall Similarity Index’, which records the percentage of the essay which matches an Internet or archived source, means very little, and a marker has to read the report very carefully to evaluate it. Many legal phrases, statutory provisions, etc., will naturally be on the Internet, and an ‘Overall Similarity Index’ of zero (even assuming the option has been taken to exclude direct quotes) would be neither expected nor indeed desirable. To some extent, perfectly legitimate paraphrases might also be caught, or conceivably the adoption of a writer’s views, but in an original context. But after all, careful evaluation is what markers do. It properly remains the role for the academic, and not the machine, to make a final judgement.
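Why a raw percentage means so little can be seen from a rough sketch of how such a figure might be computed. This is emphatically not Turnitin's algorithm, which is not described here; it assumes a simple definition in which a word counts as "matched" if it lies inside any run of five words that also appears in a known source. The function names and the sample essay are hypothetical; the statutory wording is the opening of the definition of theft in section 1 of the Theft Act 1968.

```python
import re

def tokens(text):
    """Lower-cased words with punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def similarity_index(essay, sources, n=5):
    """Percentage of essay words lying inside some n-word run found in a source."""
    words = tokens(essay)
    known = set()
    for src in sources:
        sw = tokens(src)
        known.update(tuple(sw[i:i + n]) for i in range(len(sw) - n + 1))
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in known:
            for j in range(i, i + n):
                covered[j] = True  # mark every word of the matched run
    return 100.0 * sum(covered) / len(words) if words else 0.0

# Statutory wording that any competent essay on the topic would reproduce.
statute = ("A person is guilty of theft if he dishonestly appropriates "
           "property belonging to another")
# Hypothetical, entirely legitimate sentence from a student essay.
essay = ("Under the Theft Act a person is guilty of theft if he dishonestly "
         "appropriates property belonging to another, and the courts have "
         "interpreted each element separately.")

index = similarity_index(essay, [statute])
```

Here a little over half of the words in a perfectly proper sentence are "matched", simply because the sentence sets out the statutory definition. The figure only directs the marker to passages worth reading; it says nothing by itself about whether they were properly attributed.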
It is difficult to see that a student can object to the use of plagiarism detection software as such.
Any marker will check for plagiarism, as thoroughly as time and other resources allow.
There has been litigation in the United States, however, about the archiving of essays. In A.V. v iParadigms LLC, high school students sued iParadigms, the producers of Turnitin, claiming that the archiving of their essays amounted to a breach of their copyright in them. iParadigms claimed that they were entitled to the defence of fair use, and also that the students, by clicking on an ‘I Agree’ button when they created their user profiles to submit essays to Turnitin, had consented to the use. In the US Court of Appeals, iParadigms succeeded on the fair use issue, and the court did not need to consider the issue of consent.
Space does not permit a detailed consideration of intellectual property law, but we should certainly not assume that fair dealing is defined in the same way in the UK as fair use in the US, nor that archiving essays would be regarded as a permitted act.
It is conceivable that a plagiarism archive would be protected by the notice and take-down provisions of the Electronic Commerce (EC Directive) Regulations 2002, at any rate until it had notice of a copyright infringement, but this cannot be certain, since the user of the service (probably the university) might well be regarded as ‘acting under the authority or the control of the service provider’, in which case the immunity conferred by the regulations would not apply.
Given that in the UK, a fair dealing defence would almost certainly fail, and given also the fragility of a public interest defence, it would be wise for universities also to obtain the consent of students, before submitting essays to an archive.
That is not the end of the problem. A student whose essay contains an appropriate proportion of quotes from elsewhere, properly acknowledged, will not infringe the copyright of the author quoted, but a plagiarised essay will, and the archiving might therefore infringe third party rights. Again, it would be wise, if possible, to deal with this through consent, and it is probable that many publishers would indeed consent to allowing their material to be used in the fight against plagiarism. Indeed, it might be possible to set up a licensing scheme, similar to that operated by the Copyright Licensing Agency (CLA) in respect of photocopies, etc.
Consent might not always be obtainable, however; an obvious category of objectors is the writers of essays intended for sale in paper mills, and the owners of such sites. The application of the public interest defence would be the same for paper mills as already observed for essay banks, and, given its fragility, it would be wise for software designers to make it possible to exclude essays in which plagiarism is identified, as well as material to whose use the copyright owners take stringent objection.
One of the objections taken by the students, in A.V. v iParadigms LLC, to the archiving of their essays, was that if they later submitted the same work to a literary journal it would appear to be plagiarised, though their own work. The District Court, whose view was upheld in the US Court of Appeals, had said:
Anyone who is reasonably familiar with Turnitin’s operation will be able to recognize that the identical match is not the result of plagiarism, but simply the result of Plaintiff’s earlier submission. Individuals familiar with Turnitin, such as those in the field of education, would be expecting the works submitted to have been previously submitted.
If this reasoning is convincing, it is another example of the care that needs to be taken when considering a Turnitin (or similar) report.
It is possible to counter plagiarism by setting tasks which make plagiarism more difficult. In law, from my own experience I know that we can set problem questions, changing them each time the assignment is set, and use very short deadlines, making plagiarism more difficult. We can reduce the proportion of coursework assessment and increase the role of the traditional examination. We can increase use of oral presentations and (at least to assess basic knowledge) multiple choice questions, which are impossible to plagiarise.
In the longer term, it is possible that plagiarism detection techniques will fail to live up to their potential. A darker possibility is for new software to emerge, written to frustrate plagiarism detection, thereby creating an additional challenge for plagiarism detection software.
But it is also possible that plagiarism detection software will become a seriously effective tool. Plagiarism requires at a minimum the copying of a text document from another source without acknowledgement.
Whatever the motivation, plagiarism constitutes bad work, but if it results from a failure of understanding, or from time pressure or incompetence, it may not be appropriate to penalise it further. What causes our concern is the student who deliberately passes off another’s work as his or her own, pretending to a merit that he or she does not possess. We take plagiarism so seriously, and punish it, because the motivation might be of the second type, rather than the first.
But suppose we lived in a world where the student knew the essay would be tested, and all sources discovered; indeed, he or she could even see the report before the essay was submitted. Copying would then be evidence of incompetence, rather than dishonesty. Indeed, once the taint of dishonesty is removed from the equation, we might even place a value on the ability simply to find material effectively on the Internet (while the value we place on this skill may not be high, we cannot entirely deny its utility in the modern world).
Plagiarism detection software will never be able to defeat the determined and well-funded cheat. Students will still be able to buy bespoke essays, written by others on their behalf, and if these essays are never re-used they will continue to go undetected. Since these essays will lose their value after just one use, and since the ghost-writers will know that their own work will be tested, it may be supposed that this form of cheating will become more expensive, and therefore rare. We may catch only 95 per cent or 99 per cent of cheats, rather than 100 per cent, but that is a lot better than nothing. That is, after all, the basis of much crime prevention.