The research paper as a tool to assess a student’s mastery of a given topic is dead. The killers are of course the various AI tools that can produce a document reasonably close to a mediocre paper by a graduate student or something that can be edited easily to look like one. In this post, I am not interested in the ethics or even the consequences for the profession of this fact, although I may go back to them in further posts.
Consider the irony of spending hours to prove that a paper was partially generated by AI, only to find out that the student will get away with it because you cannot prove with absolute certainty that they did so. The asymmetry is striking: one can get a perfectly average-sounding paper from any of the AIs on the market, for free or for very little money, in about 5 minutes. Conversely, it will take serious work for a faculty member to determine that cheating took place, and because the tools to do so are probabilistic, at best they can say that it is likely that an AI wrote it. There is, however, no guarantee that this is correct. This is not plagiarism, where identical text provides irrefutable evidence (although AIs have been known to outright plagiarize as well). Remember also that a student of average intelligence will know to re-read the output of the AI and make changes, for example deleting the “AI give-away” expressions such as “delve”. A reasonably smart student will have read the same articles you did on which words give away AI use. Can AIs detect AI-written papers? Long story short: no, they cannot, at least not reliably.
The Viva-Voce Exam
What’s a university teacher to do? We know our students are using AIs (https://www.wsj.com/tech/ai/chatgpt-ai-cheating-students-97075d3c). We could embrace the use of AI to write papers, a sort of hybrid process where the student refines their prompt and edits the output until they get a reasonable product.
However, this ignores that the point of asking a student to write a paper was not to get a mediocre paper summarizing research we’ve already read. The point was for the student to acquire knowledge and display it, thus proving that they had learned what we were trying to teach them. The research paper was not a perfect tool to do so, of course. Nonetheless it did the job with some degree of reliability. Increasingly, students are figuring out that they can get away with AI-generated papers. Obviously, not every paper you will read this semester will be, but if my experience is anything to go by, a significant number of them will be. This makes it unfair to the honest students and deprives the teacher of a tool to assess the students.
So, if a research paper is out of the question and objective quizzes are just as amenable to AI-based “help”, I suggest that the old-fashioned viva voce exam may be a solution. For the younger crowd, a viva is an oral exam: the student and the professor sit across a table from each other and talk about a topic. The professor asks questions, the student answers them. The professor assesses the quality of the answers and gives the student a grade. This goes back a few millennia, but the idea is still the same.

Creating a Structured Viva Voce Assessment Tool
I will present the process I have employed to create Structured Viva Voce Examinations (SVVE), using as an example a comprehensive exam for a course in Pragmatics and Second Language Teaching. Why does the “structured” aspect of the Viva Voce matter? It addresses the often-heard claim that viva voce exams are arbitrary and subjective. The point of structuring them is to eliminate much of the subjectivity from the question choices. Obviously, all assessment is subjective to some degree. This includes so-called “objective tests”, since someone subjectively determined what the question is and what the “correct” answer is. The goal is not to eliminate all subjectivity (axiomatically impossible) but to eliminate unnecessary arbitrariness and bias from the process. If you ask a randomly chosen question, then you cannot pick an easy question for a student you like.
The other concern expressed about vivas is that they lack inter-rater reliability. However, SVVEs have been shown to have adequate inter-rater reliability when the various examiners share the batteries of questions.
Preparing a battery of first level questions.
You should have a number of “starting” questions; for a class of around 25 students, 8 to 10 are enough. You should prepare them ahead of time. These should be general questions such as “What is speech act theory?” or “What do we know about politeness theory?” or “Tell me about functional sentence perspective.” Roughly speaking, each should coincide with the topic of a chapter in the book or perhaps an important subtopic.
The purpose here is to ensure coverage of the entire topic you are covering in the SVVE by asking 3 or 4 questions spanning the entire course. If you are doing a SVVE for a narrower topic, you should adjust to the gamut of the materials covered in the exam (a single chapter, a section of the book, the first half of the course in a midterm, etc.).
These questions are your baseline passing C-level performance. If the student is able to answer the question (i.e., can explain what a speech act is or what politeness theories have been examined, etc.) then they have a reasonable knowledge of the subject.
It is important that you randomize the questions, for example, by writing them on index cards and shuffling them, or letting students pick them out of a bag, or by writing a little software function that presents a choice of randomly chosen questions from your list. If the class is small and there is a low chance of the students communicating, the randomization process is less important. The main idea is to avoid having the students guess what your questions will be.
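The “little software function” mentioned above could look something like the following Python sketch. It is only an illustration: the names `LEVEL_1_QUESTIONS` and `draw_questions` are made up for this example, and the placeholder entries stand in for your own chapter-level questions.

```python
import random

# Hypothetical question bank. The first three questions are the examples used
# in this post; the placeholders should be replaced with your own questions.
LEVEL_1_QUESTIONS = [
    "What is speech act theory?",
    "What do we know about politeness theory?",
    "Tell me about functional sentence perspective.",
    "Placeholder question for chapter 4",
    "Placeholder question for chapter 5",
    "Placeholder question for chapter 6",
    "Placeholder question for chapter 7",
    "Placeholder question for chapter 8",
]


def draw_questions(questions, how_many=3, rng=None):
    """Return `how_many` distinct questions picked at random from `questions`."""
    if how_many > len(questions):
        raise ValueError("The bank holds fewer questions than requested.")
    if rng is None:
        rng = random.Random()  # unseeded, so every student gets a fresh draw
    # random.sample picks without replacement: no question is repeated.
    return rng.sample(questions, how_many)


if __name__ == "__main__":
    for number, question in enumerate(draw_questions(LEVEL_1_QUESTIONS), start=1):
        print(f"{number}. {question}")
```

Running the script once per student gives each of them a different, unpredictable set of starting questions, which is all the index cards or the bag are doing.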
A comprehensive SVVE, covering for example the material of an entire 15-week university course, may require more than 3 or 4 questions. If the subjects are equally important, you can let the randomization do its work. If the materials are progressively harder, or if some chapters are more important than others, you may ask more questions about the harder or more important chapters and limit the less important material to a single question, just to make sure that the students did not skip those chapters. If the material builds progressively on the previous chapters, then asking questions on the later chapters and ignoring the introductory ones is fine.
All in all, you are looking for a set of questions that is representative of the materials and weighted by importance and difficulty.
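If you draw the questions with software, one way to build in this weighting is to give each chapter a quota: more questions for the harder or more important chapters, a single one for the rest. The sketch below assumes that approach. The chapter names come from the examples in this post, but the quotas, the placeholder questions, and the names `QUESTION_BANK` and `draw_by_quota` are invented for illustration.

```python
import random

# Hypothetical bank: chapter -> (how many questions to ask, candidate questions).
# The quotas are made-up weights; set them by importance and difficulty.
QUESTION_BANK = {
    "Speech act theory": (
        2,
        [
            "What is speech act theory?",
            "Placeholder speech act question A",
            "Placeholder speech act question B",
        ],
    ),
    "Politeness theory": (
        1,
        [
            "What do we know about politeness theory?",
            "Placeholder politeness question",
        ],
    ),
    "Functional sentence perspective": (
        1,
        ["Tell me about functional sentence perspective."],
    ),
}


def draw_by_quota(bank, rng=None):
    """Draw each chapter's quota of questions at random, then shuffle the order."""
    if rng is None:
        rng = random.Random()
    drawn = []
    for quota, questions in bank.values():
        # Never ask for more questions than the chapter actually has.
        drawn.extend(rng.sample(questions, min(quota, len(questions))))
    rng.shuffle(drawn)  # so the chapter order does not give the plan away
    return drawn


if __name__ == "__main__":
    for question in draw_by_quota(QUESTION_BANK):
        print(question)
```

A quota keeps the guarantee that every chapter is touched at least once, which a purely random draw over the whole bank would not.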
Preparing second level follow up questions.
Once you have ascertained that the student has a passing knowledge of the subject of the question, you need follow up questions to assess “deeper” knowledge. For example, if the first question was “What are speech acts?”, a follow up question could be “What is a performative?” or “What are the three acts present in any speech act?” or “What are felicity conditions?”
It is easy to see how these questions differ from the starting questions: they presuppose knowledge of speech act theory (our starting question).
The purpose of follow up (level 2) questions is to ascertain whether the student has a reasonably deep knowledge of the topic. If the student can answer this kind of question, they are giving at least B-level answers.
You may want to have a few second level questions ready, but you should not ask them mechanically. It is quite possible that in the process of answering your first level question the student has already addressed some aspects of a second level question. If that’s the case, you may want to latch on to their answer and push them to go deeper on the subject, or you can tie your question back to their answer: for example, “You mentioned felicitous speech acts. How is that defined?” The purpose here is to show that you are listening and also to be supportive of the student, by acknowledging their knowledge. If they fully answered your second level question, you may want to ask another second level question on the topic, or skip the second level and move on to the third level question.
Preparing third level questions.
These are deep questions that presuppose the knowledge of level 1 and 2 questions. Continuing with the speech act theory example, a third level question may be “What are the felicity conditions of ‘promising’?” or “What are some of the realization patterns of requests?” Third level questions can include analysis of specific examples or specific phenomena. If the student can answer this type of question, you can safely give them a score in the A range. You may want to have 4 or 5 third level questions on your index card, and pick one or two based on connections to the student’s answers. You don’t need much more than a handful of questions, as many of the students will not make it to the third level question.
Conducting the SVVE
The goal is to assess the student’s knowledge and understanding of the material as quickly and effectively as possible. You should ask a set number of level 1 questions (3-5 seems an optimal number, but of course you can customize to your needs) and follow up to level 2 or 3, depending on whether the student can answer the questions knowledgeably and comfortably.
One of the problems of unstructured VVEs is that it is unclear how long you should spend on each topic. Let me start by saying that it is often very clear whether the student knows a subject within less than one minute of the answer. If they are presenting the topic clearly, knowledgeably, and effectively, it is OK to stop them one or two minutes in by saying “You obviously know this very well. I am giving you full credit. Let’s move on to X.” The student and you both know that this is not a chat among friends but an examination.
If the student gets a question wrong, point out that the answer is incorrect and give them a chance to correct the answer. You should note the mistake and continue. If they are unable to correct it or perseverate in it, you may conclude that this is a systematic error and that they do not have the correct knowledge or have no knowledge of the subject. Sometimes the student will admit to not knowing the subject. If they were able to answer to some extent, but then got it wrong, you may give them D credit for the answer. If they made a mistake but corrected themselves you may want to be lenient and “let it pass” if the exam is otherwise going well.
It is often the case that a student’s wrong answer reveals a deeper lack of understanding of the subject. You should follow up on any major mistake, but not necessarily correct all imprecision and minor misstatements. If you do so the student may feel “hounded” and become nervous.
Once you have determined that a student’s level of knowledge is, for example, B-level but that they are unable to answer third level questions, end the question and move on to the next level 1 question.
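For those who like to see the scoring logic spelled out, here is a minimal Python sketch of the per-question grading described so far: level 1 answered is a C, level 2 a B, level 3 an A, and a partial answer that then goes wrong is a D. The “F” for no answer at all is my assumption for this sketch, and the names `GRADE_BY_LEVEL` and `grade_for_question` are hypothetical. How the per-question grades combine into a final grade is left to your own policy.

```python
# Letter grade by the deepest level the student answered well.
GRADE_BY_LEVEL = {1: "C", 2: "B", 3: "A"}


def grade_for_question(deepest_level_answered, partial_answer=False):
    """Map the deepest level answered (0 to 3) to a letter grade.

    Level 0 means the student could not answer the level 1 question.
    `partial_answer` marks a level 1 answer that started well and then went
    wrong, which earns D credit. "F" for no answer is an assumption.
    """
    if deepest_level_answered in GRADE_BY_LEVEL:
        return GRADE_BY_LEVEL[deepest_level_answered]
    if deepest_level_answered == 0:
        return "D" if partial_answer else "F"
    raise ValueError("deepest_level_answered must be 0, 1, 2, or 3")


if __name__ == "__main__":
    # One (level, partial) pair per level 1 question asked in the exam.
    outcomes = [(3, False), (2, False), (0, True)]
    print([grade_for_question(level, partial) for level, partial in outcomes])
    # prints ['A', 'B', 'D']
```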
There is no reason for a SVVE to last longer than 30 minutes, assuming an average coverage of subject matter. If the student is well-prepared, you may be able to finish the exam in 10 minutes or even less. If the student is less prepared, you will need more time, as you will have to rephrase your question, or correct them, or point out inconsistencies, etc. If the student is clearly not prepared at all and provides no answer or clearly incorrect ones, there is no point in waiting: move on to the next question. After a string of missing or wrong answers, you can end the exam and fail the student, or give them an opportunity to retake the exam, or whatever your policy is. I find that these situations are fraught, as the student is likely upset at failing, and so I ask what went wrong and offer advice on how to prepare for the next exam. If this is the final exam, then just cut it short by asking, for example, if they have any questions and then end the exam.
If you are uncomfortable with telling the student their grade and your assessment of their performance, you may consider a policy of emailing the students with their score at the end of the exam session.
Should you provide the student with the correct answer?
If this is a final exam, it may be pointless, as the student may never encounter speech act theory again in their lives. However, if the exam occurs within a larger context in which the correct knowledge may benefit the student, then I feel that providing the correct answer is ethically the right thing to do. First, it is likely that the student will remember the correct explanation because it is now very salient. Second, it does not really matter when the student learned the correct answer. They may have failed the exam, but if they learned the correct answer, it’s a win.
Is the Viva-Voce the answer to all the problems in education? Will it cure cancer and ensure peace in the Middle East?
Obviously not. For starters, if you have more than 30 students in a class, a viva-voce exam is going to be a tough choice, as it will require 15 hours of time (30 students at 2 students per hour) or so. If you are teaching undergraduates, hiring graduate students to do the vivas is a possible option, but this will not work for graduate classes. There may be situations in which a viva is just not a feasible or desirable option. That’s to be expected. A research paper was never a universal solution either. However, I think that in many situations a viva voce exam, provided it is a SVVE, will be a tool that allows us to assess the students’ knowledge. At least until the Terminator class personal automaton with voice matching catches on. But then we’ll have other problems to deal with (see the Terminator movies).
Tips and practical suggestions
Start with an open-ended question, such as “What was your favorite topic?” or “Which topic have you prepared best?” The purpose of this first question is to reassure the students and put them at ease. By giving them an opportunity to shine and show off their preparation, you let them relax and overcome the “jitters” typical of exams. Conversely, if they are truly unprepared, this question is surprisingly revealing: if they are unable to recall any topic or unable to expound on any topic in the entire course, that tells you that they will probably fail.
Try to ensure that all the questions are of the same level of complexity. That’s implicit for first level questions (since they coincide with chapters of the textbook or similar subdivisions), but it is not as clear for second and third level questions. Preparing a list of questions ahead of time will allow you to check that the level of complexity is roughly the same for each question.
Keep a written record of the questions and of your assessment of the answers. We live in a litigious society. You may have to justify your assessment. If you keep notes, remember that they may become evidence against you, so avoid any comments outside the content of the answers. You may want to keep a recording of the whole exam. Obviously you will need to disclose to the student that you are recording the exam and ask for their permission. However, you should check with your campus administration if doing so violates any FERPA directives.
Be prepared to handle floundering students: a small number of students will “fall apart” during an exam and be unable to answer at all. That’s never a good thing. Roughly speaking, this may be indicative of the fact that their preparation was insufficient, or in other words, they just don’t know the material. Alternatively, they may know the material but get overwhelmed by their nerves. The latter case can be handled by reassuring the student (for example, by pointing out previous correct answers), or by suggesting that they take some notes and a few seconds to outline their answer prior to answering. If you allow the use of the book during the exam, you may suggest that they look at their notes and/or book, to refresh their memory. Worst case scenario, especially if the student is visibly rattled, you may want to take a break or reschedule the exam.
I appreciate the wisdom (+experience) and the tact you show whenever the student does not perform well. Definitely, a very human assessment. I enjoyed especially the part where you talk about giving the correct answer: you remind us that the goal of all this process of teaching is not evaluating (or just cutting heads like the Queen in Carroll's tale), but providing effective knowledge. Better to learn it late than never, for sure. I will discuss your post with my students, who are preparing to be teachers too. The fine irony in the title "Will it cure..." etc. makes it evident that information is often presented as conclusive and complete, making our palate search for more fast-feeding information: we usually search for simple answers and quick solutions for everything (bálsamo de Fierabrás, Cervantes called this). You give us, on the contrary, many motives to open reflection. In Spain our students are not used to Viva Voce exams, they dread them, but the students coming from Italy feel at ease: is this ability then cultural or the effect of training? I believe we can always learn to discuss subjects, and here, the sooner the better. Thank you.
Yes, most exams are still orally done in Italy, both at universities and in high schools, where I teach. I was asking myself the same questions, at a lower level, and since teenagers are even more keen on cheating - I myself have found some this year, proving with enough certainty their text was AI-generated using AI tools - the irony, and time, as you wrote - I was wondering what to do. I am thinking about continuous assessment, i.e., a grade for each interaction/answer through the school year. Maybe that's too demanding on the teaching side. I'll think about it, but for now thanks for your deep thoughts!