It is widely recognised that recent improvements in generative artificial intelligence (GAI) pose significant challenges for higher education. In my role as GAI and Education Lead at Birmingham Law School, I have conducted a review of the existing literature on the issue, with a particular focus on the impact GAI has on assessments. This blog post summarises the findings of that review, arguing that the advent of GAI means a radical rethink of the way we assess is urgently required.
Should we be paranoid about the androids?
GAI is capable of creating original content (in the form of text, images, videos etc.) in response to prompts from human users. It works by using statistical models to analyse a set of data (known as ‘training data’) and then generating new data with similar characteristics. In the case of text generation, so-called ‘large language models’ (LLMs) are trained on huge amounts of text, which allows the GAI to reproduce realistic syntax and semantics, as well as plausible (though not necessarily accurate) substantive content.
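To make that mechanism concrete for readers who have not encountered it, here is a deliberately toy sketch in Python. Modern LLMs are built from neural networks trained on vast corpora, not from simple word-pair counts, so this is only a conceptual analogy; the training sentence and function names are my own illustrative inventions. But it shows the core statistical idea: learn which words tend to follow which in the training data, then sample new text with similar local characteristics.

```python
import random

# Illustrative 'training data': a short run of legal-sounding text.
training_data = (
    "the court held that the defendant was liable and "
    "the court found that the claimant was entitled to damages"
).split()

# Build a toy statistical model: for each word, record the words
# observed to follow it in the training data.
model = {}
for current, nxt in zip(training_data, training_data[1:]):
    model.setdefault(current, []).append(nxt)

def generate(start, length, seed=0):
    """Sample a word sequence whose local statistics mirror the training text."""
    rng = random.Random(seed)  # fixed seed so the toy output is repeatable
    words = [start]
    for _ in range(length - 1):
        followers = model.get(words[-1])
        if not followers:
            break  # no observed continuation; stop early
        words.append(rng.choice(followers))
    return " ".join(words)

print(generate("the", 8))
```

The output is new text that was never in the training data verbatim, yet every word transition was learned from it, which is, in vastly simplified form, why LLM prose reads as fluent without being copied from any single source.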
Today’s LLMs can produce coherent, complex texts that pass for the work of a human author. This obviously creates a difficulty for university assessments: put simply, students could get GAI to complete their assessments for them. The only way of avoiding this possibility entirely would be to carry out all assessments in conditions in which students have no access to GAI technology, most obviously in the form of in-person exams. This would not, however, be desirable, since there is ample evidence that in-person exams tend to encourage rote learning at the expense of higher-order cognitive skills.
Fitter, Happier, More Productive? Potential educational benefits of GAI
There are good reasons not to prohibit the use of GAI by students. It seems clear that GAI tools are likely to become increasingly integrated into the routine work of lawyers and other professionals, as well as in day-to-day life. We therefore need to work towards creating degree programmes that will give students the skills to use such tools effectively. And this is likely to require devising assessments that will evaluate these skills.
Furthermore, the educational literature identifies a range of potential advantages from incorporating GAI into teaching and learning. Numerous authors have argued that GAI promises a more personalised education experience, with the potential for self-paced learning and individualised feedback. Researchers have also pointed to the capacity of GAI to promote active learning, increase student engagement and improve outcomes for those who are not studying in their first language. It has also been argued that the use of chatbots can help students acquire basic knowledge more rapidly, thus allowing instructors to spend more time on encouraging higher-order learning.
Optimistic? I might be wrong
We should not, then, simply try to keep GAI out of higher education. However, we must also acknowledge that much of the optimistic rhetoric about GAI being an ‘opportunity’ not a ‘threat’ to university teaching and assessment is unduly glib. The concerns around academic integrity, threats to our students’ ability to think critically, as well as broader ethical worries, are genuine and substantial, and brushing them off as retrogressive fails to recognise the complexity of the issues at stake. A comment often found in opinion pieces is that academics worried about the impact of GAI in education are akin to those who, in the 1970s, campaigned to ban pocket calculators from schools. But this comparison overlooks both the fact that the introduction of calculators did require substantial reform to the way in which maths is taught and assessed, and that, in any event, GAI is not in any meaningful sense analogous to a calculator. We should not bury our heads in the sand by pretending that we can keep GAI out of higher education. But nor should we naively assume that the problems raised by GAI will simply sort themselves out – blind faith in the benefits of technology is just another way of refusing to engage with a difficult problem.
Scatterbrain: GAI as a threat to critical thought
One very real concern is that the use of GAI by students can inhibit critical thinking. As one study put it, GAI ‘simplifies the acquisition of answers or information, which can amplify laziness and counteract the learners’ interest to conduct their own investigations and come to their own conclusions or solutions’. This problem is compounded by the well-documented biases that GAI displays, with evidence suggesting that it does not merely repeat, but can in fact exacerbate, stereotypes that appear in its training data. More fundamentally, there is a concern that users of GAI will take its outputs as information – that is, objectively true statements of fact – rather than as content to be approached with a critical or even sceptical eye.
The Trickster: GAI as a threat to academic integrity
By far the most frequently raised concern about GAI in higher education is the difficulty it creates for academic integrity, with evidence suggesting that the problem of students illicitly submitting GAI-generated work is already a significant one. Cheating is, of course, not a new problem, though it is one which has attracted increased attention since the widespread shift to online assessments during the Covid-19 pandemic. Recent research shows that the number of students who admit to cheating is disturbingly high: a recent survey of students at an Austrian university revealed that 22% had engaged in plagiarism, and a report by the UK Quality Assurance Agency estimated that one in seven recent graduates may have paid someone to undertake their assignments. In this respect, GAI is not creating a brand-new risk, but it does make misconduct more tempting to students who might not have been prepared to use an essay mill, much as file-sharing made copyright infringement acceptable to consumers who would never have dreamed of shoplifting from a record store.
There are numerous empirical studies showing that GAI can perform at around, or slightly above, the level of the average student in multiple-choice tests, short essay questions and reflective writing, and my own experimentation suggests that it is similarly capable of answering problem questions and coursework essays. I am therefore of the view that the threat GAI poses to the integrity of our assessments should be treated with the utmost seriousness.
Not OK Computer: the limits of technological solutions
Those hoping for a technological solution to the problem are likely to be disappointed. Software that purports to detect the use of GAI is not reliable, and it is not difficult for a determined student to avoid detection by making relatively small changes to GAI-authored text. Online proctoring systems suffer from concerns relating to privacy and discrimination, and in any event it is not clear that they are effective. Lockdown browsers are easily circumvented by students simply using another device to access GAI chatbots.
How can you be sure? The limits of manual detection
Studies that have used GAI to answer university essay questions have found certain characteristic indicators, namely:
- failure to meet word count requirements;
- a failure to cite core texts;
- ‘hallucinated’ references (i.e. references purporting to cite sources that do not, in fact, exist);
- bland, repetitive language;
- over-elaborate metaphors;
- the over-use of certain words and phrases, such as ‘nuanced’, ‘illuminate’, ‘intricate tapestry’ and, most of all, ‘delve’.
Paying careful attention to these hallmarks can allow a marker to identify suspicious cases. However, numerous studies have shown that even experienced markers struggle to detect work that has been written by generative AI. There are a number of reasons for this. Firstly, with the possible exception of hallucinated references, none of the indicators of GAI use comes close to being a ‘smoking gun’. Secondly, all of these shortcomings can be circumvented by a careful student who uses effective prompts and double-checks any references provided by GAI. Thirdly, markers work under huge time pressure, and it is unrealistic to suppose that they could engage in the kind of careful scrutiny of a text required to reliably detect the use of GAI. For these reasons, the authors of a recent large-scale scoping review concluded that ‘unproctored online examinations are no longer a meaningful summative assessment method’. Sadly, I find myself agreeing with that conclusion.
Everything in its Right Place: a ‘two lane’ approach to assessment
If we accept that students will, and indeed should, use GAI in their studies, and that this fact brings with it a serious threat to the integrity of our current assessment modes, then it follows that we should rethink the way in which we are assessing. One can find numerous suggestions online as to how assessments may be ‘GAI-proofed’, but in my opinion such a strategy is a fool’s errand: given the pace of technological advancement, the fact that GAI might struggle to complete a particular task today does not imply that it will be unable to do so in six months’ time. Instead, I would suggest that higher education institutions consider implementing a ‘two lane’ approach, in which either the assessment is designed so as to completely prevent the use of GAI (‘Lane 1’), or GAI use is permitted on the condition that students upload their interactions with GAI along with their submission (‘Lane 2’).
Examples of Lane 1 assessments include traditional in-person exams, oral exams, in-class assignments, presentations that include a significant Q&A element, and dissertations combined with a viva. Moving to Lane 1 assessment types might simply involve returning to the way in which modules were assessed pre-Covid, though those who previously used in-person exams may wish to consider whether an alternative in-person assessment type might better promote higher-level learning outcomes.
Designing effective Lane 2 assessments is trickier, since incorporating GAI use into assessments takes most of us into uncharted waters. Nevertheless, there are a number of suggestions available. Students could, for example, have GAI assist them in producing a piece of work, and then submit this together with a record of their interactions with GAI and a piece of writing reflecting on the extent to which they found the GAI assistance useful, and why. A related idea would be for students to use GAI to produce an authentic artefact (e.g. an advice letter to a client) and then work to improve it. Assessments along these lines could allow students to harness the benefits of GAI-enhanced learning while also educating them about some of GAI’s limitations.
Bulletproof… I wish assessment in higher education was
The inexorable creep of GAI poses a serious threat to academic integrity in higher education, and the idea that we can go on as before is no more than a nice dream. If GAI is capable of passing our assessments with good grades, then there is a risk that our awards will be seen as just fake plastic degrees. Ignoring the problem – either by looking to ban GAI entirely or by embracing it uncritically – will ultimately let down our students, and leave us high and dry. While reform to teaching and learning in higher education tends to move slowly, anyone currently using online examinations urgently needs to rethink.