Asking students to critique ChatGPT

If the essay is “dead”, is this activity a worthy replacement?

One of the first questions many educators and journalists had when ChatGPT dropped late last year was: “Will AI kill the academic essay?”

There are many avenues to address this question, and the purpose of this post is not necessarily to give a polemic against, or a defense of, the traditional essay. I recommend Daisy Christodoulou’s thoughts on the topic. However, as we move forward in responding to AI in education, it is vital that we are not drawn to quick and shallow solutions.

I have been concerned by a fairly simplistic, but also problematic, “solution” offered by some who are calling for the academic essay to be replaced. The logic goes something like this:

Due to the ease with which essays and similar written assessment tasks can now be produced with the help of Generative AI, we need to change these traditional assessments. Instead, students can now have ChatGPT generate an essay and they can spend their time analysing and evaluating what is produced.

From my perspective, there are two key problems with what is being promoted above as a substitute for essay writing:

1. Is the task realistic?

Is it reasonable to expect school-aged students to authentically critique the work produced by a sophisticated generative AI such as GPT-4? This model can currently score in the 90th percentile on the American Uniform Bar Exam, a feat beyond the capability of most adults, let alone school-aged students. What criticisms will they scramble to find?

  • Text structure and accuracy of writing? Unlikely.
  • False or misleading information? Maybe. Generative AI can certainly be wrong or simply make things up (hallucinations). The more relevant knowledge a student holds in long-term memory, the better their chance of identifying false or misleading information. Without this necessary schema, however, students will likely be lost when discerning what is trustworthy. Maybe this comes from a position of academic snobbery, but I question the inherent value of the vanilla thoughts of a Large Language Model. Do we want our students to spend so much of their time critiquing these initially impressive but heartless artefacts? I would rather they spend their time immersed in the wondrous problems found within the Western Canon and other diverse cultural artefacts. Similarly, there is more educational value in asking students to engage with and critique differing views on current events. Critiquing what a robot thinks will, by comparison, just get boring.

  • Bias? Many on social media have accused ChatGPT of sounding like a politically left-leaning Silicon Valley elite. There are examples of the AI expressing more gripes about Trump than about Biden, and some have found that GPT-3.5 is more willing to make blasphemous jokes about Jesus than about other religious figures. However, in my own conversations with ChatGPT (especially GPT-4), I have found that the bot actually brings a tangible amount of nuance to contested issues (far more than is often found on Twitter). It usually attempts to give two sides to an issue and acknowledges when a contentious position has its critics. As with all critical literacy or source analysis, bias can be found in what has been produced, but bias is also in the eye of the beholder. Notwithstanding, can school-aged students successfully identify bias generated by ChatGPT? Maybe. Once again, the more students know about a topic, the more likely they are to sniff out subjective bias in the machine. But according to Sam Altman, the CEO of OpenAI (the company behind ChatGPT), as this generative AI develops it may end up being more objective than most humans. Hence, just like Wikipedia, this tool is in time likely to become a source of information to learn from. Better that students spend their time critiquing the bias found in the human world.

Ultimately, if students are asked to critique AI-generated text, particularly in language-rich domains like English and the Humanities, the complexity of the task could make finding genuine criticisms unrealistically difficult. Given this challenge, I predict that students will likely resort to an alternative approach: turning to AI for assistance in critiquing the AI. This calls into question why the activity is being proposed as a worthy replacement for the essay.

2. The assessment construct is totally different

While the academic essay might eventually be replaced with more effective assessments in an AI era, it’s crucial we don’t discard the valuable assessment constructs the traditional task aims to measure. I have argued elsewhere that the pressing challenge for educators who are too slow to respond to AI will be a significant compromise to the construct validity of their assessments. On the other hand, if we hastily jump to shallow assessment solutions, we may still lose the very constructs we ought to be measuring.

What is an assessment construct?

An assessment construct refers to the specific trait, attribute, skill, knowledge, or ability that an assessment or measure is designed to evaluate.

Depending on the context, here are some of the assessment constructs inherent in an academic essay. We are attempting to measure an individual’s ability to:

  • produce a clear and coherent thesis
  • organise and structure arguments
  • demonstrate content knowledge
  • demonstrate critical analysis and evaluation
  • support arguments with synthesised and well-referenced evidence
  • write accurately with engaging style

One can argue that most of these assessment constructs are not exclusively found in essay writing. They can be measured effectively in other assessment types. However, they need to be protected. Young people need to continue to think critically about topics or issues charged with inherent value. When students are asked to write an essay on a topic of significance, they need to synthesise their relevant knowledge and understanding to critically analyse and evaluate the key factors to ultimately form a position on an issue. This task gives the students an opportunity to join the human conversation about important societal matters.

If we instead ask students to analyse a robot’s thinking on a similar issue, we completely change the assessment construct. We are in fact assessing a different domain entirely: their ability to perform literary criticism. This is certainly another valuable assessment construct, but it is not a simple swap for essay writing. While an emerging goal of modern education will be for students to learn how to evaluate what they consume from generative AI, we must not be so quick to replace the important assessment constructs embedded within the academic essay.

We are time-poor in education. Rather than spending too much time analysing what ChatGPT thinks, students need to continue to grapple critically with big issues themselves. Until we find or create task types (potentially enhanced by AI) that can effectively, and at suitable scale, measure these valuable assessment constructs, the academic essay ought to stay on life support.
