ChatGPT4 failed my pragmatics exam
I gave ChatGPT4 the prompt for the exam and my students criticized its response
I do not, despite what one may surmise, spend the better part of my waking hours coming up with ideas to embarrass AI programs (see below).
However, I had noticed that some of my students were beginning to quote ChatGPT4 and other AIs in class. Fine, I said to myself, if they want to treat AI as a source, let’s assess the source. So, I asked ChatGPT4 the following question: “Please discuss the influence of H. P. Grice on pragmatics, with a focus on comprehensiveness, in less than 500 words.” Then, as part of their final exam, I gave the students of my Pragmatics graduate course the answer generated by ChatGPT4 (see appendix). The students’ assignment was the following:
I would fail the answer that ChatGPT provided. Your assignment is to explain why. You should do so in less than 500 words and in so doing explain what ChatGPT4 did wrong.
Having been a teacher for over 30 years, I also clarified that I was not asking them to comment on the technology underlying the AI or on the ethics of AI, etc. This was a pragmatics class and the exam was a pragmatics question. In short, how good was ChatGPT4’s answer as far as knowledge of pragmatics goes?
The students duly complied, and I now have about thirty answers to my question. The students’ responses range from naive observations to sophisticated critiques of the comprehensiveness of the answer. Let’s start with the naive criticisms: true, ChatGPT4 only mentions two out of four maxims and it does not mention the Cooperative Principle by name, but it correctly states that Grice is known in pragmatics primarily for his theory of implicature. The question was not about the CP or implicature, but about the influence of Grice. So, point for ChatGPT4. My students also noted that ChatGPT4 fails to quote any sources. Fair enough, but I did not ask it to do so, so we can’t really blame it. So, a draw on this one.
Alas for the Jarvis-hoping masses, that’s where the good news ends. My students spotted several show-stopping, F-inducing errors in the answer: the biggest one is that ChatGPT4 gets the theory of implicature wrong. Implicatures can be derived from flouting the principle of cooperation, as ChatGPT4 correctly states, but also from following it. Another doozy is that the example of flout-based implicature is wrong: saying “I have no money” when asked for a loan is either a lie (a violation of the maxim of quality) or an exaggeration (in fact I have some money, but not enough to loan), neither of which is “not saying enough,” as ChatGPT4 states. Furthermore, a speaker who says “I have no money” when asked for a loan is not implying that they don’t want to loan the money; they are politely declining by implying that they cannot loan the money. More on ChatGPT4 ignoring politeness later.
We now come to the really painful part: comprehensiveness. Recall that I specifically asked for the answer to be comprehensive. And yet, ChatGPT4 ignores the rational underpinning of Grice’s theory. To be fair, this is not elementary stuff: it requires a graduate-level understanding of Grice’s underlying philosophy, as can be gathered, for example, from the entry on Grice in the Stanford Encyclopedia of Philosophy (https://plato.stanford.edu/entries/grice/). Not anyone’s idea of light beach reading, admittedly, but definitely a source that any self-respecting AI has access to and would be expected to have “read.” That’s bad, because it is precisely the rational foundation of Gricean implicature that made it attractive to the “standard” politeness theories, such as Brown and Levinson’s or Leech’s, an area of influence that ChatGPT4 ignores, as we saw above.
Other areas that ChatGPT4 ignores are the undeniable influence of Grice’s ideas on Relevance Theory (which started specifically from Grice’s maxim of relevance—and ran with it), on speech act theory (indirect speech acts, anyone?), and on Gumperz’s “conversational inferences.” Furthermore, the significant work in experimental pragmatics and in the application of pragmatics to second language acquisition, which tests the actual psychological reality and teachability of Grice’s ideas, gets no mention at all. So, sorry to be blunt, but comprehensive ChatGPT4’s answer is not.
Yet, ChatGPT4 discusses “criticisms” of Grice’s theories. Recall that my question was about the “influence” of Grice, so it may appear odd that ChatGPT4 chooses to discuss criticisms of his approach. Several students took ChatGPT4 to task for this. However, I have to side with the AI here (I hope when Skynet takes over they remember this and spare me—I can be very handy with the WD40, and who doesn’t like a well-oiled gear?): an influence can be negative. Reacting against a set of ideas molds a field just as much as following it. So, ChatGPT4 is justified in discussing the criticisms of Grice’s work, but if you are going to go there, then it’s only fair to mention that these criticisms have been addressed, and soundly refuted, several times.
Alas, in this area ChatGPT4 falls flat on its artificial face again. In one paragraph, it states that
Some scholars argue that his focus on conversational maxims and implicature overlooks other important aspects of language use, such as the social and cultural context in which communication takes place.
which is a fair, if one-sided, comment, as I discussed above. But then, literally three sentences later, it opines that
Moreover, his emphasis on the importance of context and the social dimensions of communication has been influential in shaping the broader field of linguistics.
Several students caught the contradiction: if you quote, without disagreeing, “some scholars” as blaming Grice for a lack of interest in “the social and cultural context” of communication, then you cannot turn around and credit Grice with an “emphasis on the importance of context and the social dimensions of communication.” You might get away with it by arguing at length that the scholars who blamed Grice were in fact wrong and Grice was the covert founder of sociolinguistics (Relax! He was not; I’m just making a point). But there is nothing of the sort. The truth is that the last sentence is a generic banality that applies to pretty much any research in pragmatics in the past 50 years or so. Context and the social dimension of language are pretty much the definition of pragmatics, except perhaps for the kind of formal pragmatics that Morris and Stalnaker had in mind, in 1938 and 1972, respectively.
In light of ChatGPT4’s incapacity to earn a passing grade on an easy graduate-level question about pragmatics, which required only the capacity to aggregate a bunch of information, what are we to make of the recent survey that shows that more than 50% of students said they would use AI despite being forbidden to do so by their institution? An obvious answer is that students have been swept up by the hype about AI and so they are in fact unaware of how flawed the answers are. Another possibility is that the students in the survey just don’t care. They are not interested in learning; they merely want a degree for some functional purpose (getting a job, for example) and they are just going through the motions of academic life. The first group, we can help, perhaps with assignments like the one I used. The second group—well, they will soon be replaced by ChatGPTn, so who cares? They will learn the hard way that there are no short cuts in knowledge. If you want to know something you have to learn it. The alternative is bullshit.
Amen! Keep the critiques coming. The avalanche of BS about the positive educational uses of ChatGPT4 (mostly devoid of empirical evidence) needs to be countered.
The end is powerful and insightful. We just have to help those who want to learn, which is a process.
"Another possibility is that the students in the survey just don’t care. They are not interested in learning; they merely want a degree for some functional purpose (getting a job, for example) and they are just going through the motions of academic life. The first group, we can help, perhaps with assignments like the one I used. The second group—well, they will soon be replaced by ChatGPTn, so who cares? They will learn the hard way that there are no short cuts in knowledge. If you want to know something you have to learn it. The alternative is bullshit."
I have a lot of issues with this. I can't tell what your main point is; you seem to be making two at once: 1) GPT-4 isn't accurate, and 2) students shouldn't be using it.
1)
- Your students may know what you want from them when you ask them to be "comprehensive", but you can't expect an LLM to know what you mean. The word "comprehensive" is doing too much of the work here, and the prompt should be more detailed (also, comprehensive in less than 500 words?).
- You prompted only once: no follow-up prompt, no refining.
- A good prompter would first pin down even a vague idea of the topics that should be covered and feed those in (see the sketch after this list). Again, the LLM isn't going to read your mind.
- You forgave it for not citing sources because you didn't ask it to, but you don't seem to realize that an LLM often literally cannot do this. Unless a source is directly from a book or something with a nice title, the model will often be unable to generate this information from its weights ("knowledge"), because it's not a database; that's not how the tech works.
- Combined point: your prompt is bad. Not only is it a bad prompt for humans, it's a really, really bad prompt for an LLM.
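To make this concrete, here is a minimal sketch of the kind of two-step prompting I mean, using the OpenAI Python client (the model name, prompt wording, and topic list are my own illustrative assumptions, not what the author actually ran):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Step 1: ask for an explicit topic list first, so coverage is
    # visible before any long-form answer is requested.
    messages = [{"role": "user", "content":
        "List the main areas of pragmatics influenced by H. P. Grice "
        "(implicature, politeness theory, Relevance Theory, etc.)."}]
    outline = client.chat.completions.create(model="gpt-4", messages=messages)
    messages.append({"role": "assistant",
                     "content": outline.choices[0].message.content})

    # Step 2: feed the topic list back in and constrain the final answer to it.
    messages.append({"role": "user", "content":
        "Now discuss Grice's influence on pragmatics in under 500 words, "
        "covering every area you just listed."})
    answer = client.chat.completions.create(model="gpt-4", messages=messages)
    print(answer.choices[0].message.content)

The exact wording doesn't matter; the point is that the second request is grounded in an explicit topic list instead of a single overloaded adjective like "comprehensive".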
2)
Many professors feel the same way you do, so you're in good company, but you all make the same mistakes - you stink at using this technology. It's a skill, and a single shit prompt isn't it. Having graduated college and come back 15 years later for follow-on degrees, I have had professors blanket-ban the use of LLMs, and a class that *required* using them. Deciding what is actually valuable to learn seems to be very difficult for some reason. What these AIs are doing, and thank God for this, is forcing professors to realize that writing an essay on a topic isn't a good way to learn. It's as archaic as "teaching" Shakespeare in high school. Decide what is valuable for a student to know. Don't make them guess at it when they are forced to do open-ended research - much like what you wanted in your "comprehensive" paper.