An attempt at a zero-shot AI discriminator
I have become interested in detecting AI-generated text. I have been reading about zero-shot methods: methods that don't require, and aren't trained on, explicit examples of AI- and human-written text. These methods have a frequentist, goodness-of-fit flavor. One constructs a statistic from the text under consideration and compares it to the distribution expected if the text were AI-generated.
Naturally, I wanted to detect AI text in a Bayesian way. My first thought was, when we consider whether a text was AI generated, the alternative is that it was human-generated. But which human! A child? A high-school student? A Nobel-prize winning scientist? A non-native English speaker? My second thought was, which AI! Which language model? Sampled at what temperature? With what system and user prompts?
We should make use of any information we have about the possible origin of the text. For example, suppose we were setting a homework problem for high-school students and wondered whether they copied and pasted our question into an LLM and submitted that as an answer, or whether they wrote an answer in the old-fashioned way. In this case, we would know that the AI prompt was likely to be the exact question we asked, and that the human writer might be a high-school student.
Two models for the generation of the unknown text could thus be:
- Model 0: Text generated from an unknown LLM with the homework question copied and pasted, and off-the-shelf system prompts and temperature.
- Model 1: Text generated from a high-school student with a reasonable grasp of the subject matter and strong English language skills.
My weapon of choice in model selection is, as always, the Bayes factor. This requires us to write the likelihood of the text under the two models. My idea was to use LLMs as surrogates for the two models. We usually interact with LLMs by chatting to them: here, we are sampling new text from them. Under the hood, they generate new text by computing the probabilities of parts of text, called tokens. Thus, if you load an LLM in, e.g., the transformers library in Python, you can not only sample text from it, but also compute the probability of that LLM generating a sequence of text that you specify from a given prompt.
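To illustrate what "computing the probability of a specified sequence" means, here is a toy sketch using an invented bigram model. The probabilities below are made up purely for illustration; a real implementation would read per-token probabilities from an LLM's output logits (e.g., via the transformers library), but the chain-rule computation is the same:

```python
import math

# Toy "language model": conditional probabilities P(next token | previous token)
# over a tiny vocabulary. The numbers are invented for illustration; a real LLM
# produces the analogous conditional probabilities from its logits.
BIGRAM = {
    "<s>":  {"the": 0.6, "a": 0.4},
    "the":  {"cat": 0.5, "dog": 0.5},
    "a":    {"cat": 0.3, "dog": 0.7},
    "cat":  {"sat": 0.8, "ran": 0.2},
    "dog":  {"sat": 0.4, "ran": 0.6},
}

def sequence_logprob(tokens, model=BIGRAM, start="<s>"):
    """Log-probability of a specified token sequence under the chain rule:
    log P(t_1, ..., t_n) = sum_i log P(t_i | t_{i-1})."""
    logp = 0.0
    prev = start
    for tok in tokens:
        logp += math.log(model[prev][tok])
        prev = tok
    return logp

print(sequence_logprob(["the", "cat", "sat"]))  # log(0.6 * 0.5 * 0.8)
```

The point is that the model scores text we hand it; we never need to sample from it at all.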
Thus, I would use LLMs as surrogate models that would approximate my models 0 and 1:
- Surrogate model 0: Text generated from a particular LLM, prompted by the homework question.
- Surrogate model 1: Text generated from a particular LLM, prompted by the homework question, and with a system prompt to answer in the manner of a high-school student with a reasonable grasp of the subject matter and strong English language skills.
For a given answer to the homework question, I could compute the probabilities of that answer under these two surrogate models. By taking the ratio of these probabilities, I would have the Bayes factor $$ B = \frac{P(\text{text} \,|\, \text{LLM prompted by homework question})}{P(\text{text} \,|\, \text{LLM prompted by homework question and system prompt to answer in style of student})} $$ The Bayes factor would tell me the relative probability of the text originating from surrogate model 0 versus surrogate model 1. If the surrogates are reasonable approximations to the real models, it would inform us about the chances that the text came from an AI versus our anticipated type of human writer.
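In log space, the Bayes factor is just the difference of the two sequence log-likelihoods. A minimal sketch, with invented per-token log-probabilities standing in for numbers that would, in practice, come from the LLM conditioned on each prompt:

```python
import math

def log_bayes_factor(logprobs_model0, logprobs_model1):
    """Log Bayes factor for model 0 versus model 1, given the per-token
    log-probabilities each surrogate model assigns to the same text:
    log B = log P(text | model 0) - log P(text | model 1)."""
    return sum(logprobs_model0) - sum(logprobs_model1)

# Hypothetical per-token probabilities for a short answer under each surrogate.
# In practice these would come from an LLM conditioned on the homework question
# alone (model 0) or with the student-style system prompt added (model 1).
lp0 = [math.log(p) for p in (0.20, 0.10, 0.30)]
lp1 = [math.log(p) for p in (0.05, 0.08, 0.10)]

B = math.exp(log_bayes_factor(lp0, lp1))
print(B)  # B > 1 favours model 0 (plain LLM) over model 1 (student-style)
```

Working with sums of log-probabilities rather than products of probabilities also avoids numerical underflow for long texts.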
I had an idea that one could tweak the precise system prompts by sampling texts from the two surrogate models until they were in line with our expectations. This is analogous to checking different choices of priors, a recommended part of any Bayesian workflow.
Does this work? The Bayes factor was computationally cheap (cheaper than sampling new text) and easy to compute using the standard transformers library and a freely available pretrained Qwen LLM. It appeared to perform well in discriminating quite badly written English text from smooth and polished AI text. Unfortunately, it's easy for adversaries to beat this system by prompting their own LLM to answer in the manner of a student. Furthermore, the resulting Bayes factors were sensitive to arbitrary aspects of the prior; that is, to the exact system prompts that I used in the surrogate models. Changes to the system prompt that appeared innocuous led to huge changes in the resulting Bayes factor. For example, slight changes to my system prompt could change the probability of the first token so much that the whole result changed.
For that reason, I abandoned this idea. The surrogates might appear to be well calibrated, generating acceptable sample text, but they secretly encode strange and arbitrary assumptions about the anticipated tokens. Additionally, I was concerned about the ethics of profiling the human writers in my system prompt.
Tags: ai
Shushan
Visited Shushan, a village in northwestern Suzhou at the base of a mountain. Thoroughly enjoyed the boardwalk through the terraced tea farms and fruit orchards.
The grace and stamina of the workers picking the tea was inspiring. I've since read that they are growing Biluochun tea, one of the ten great teas of China, and that the mixture of tea gardens and fruit orchards enhances the flavor of the tea.
Tags: travel
Supremacy
Read and enjoyed Supremacy: AI, ChatGPT and the Race That Will Change the World by Parmy Olson. It tells the story of the development of LLMs, and of generative AI more generally, through the rivalry between Sam Altman and OpenAI, and Demis Hassabis and DeepMind. This isn't a technical book and you won't find explanations of how AI works here.
Recurring themes are the relationship between AI and big tech, corporate governance structures that ensure safe and ethical AI, and the Faustian bargains and compromises that Altman and Hassabis make with Google and Microsoft. As the race between Altman and Hassabis heated up, they needed compute that only billions of dollars could buy. At that point, things changed from 'solve intelligence and then solve everything else' to 'solve intelligence and then create value for shareholders.'
I found the book overly trusting of both Altman and Hassabis and their stated ideals. After all, the book itself mentions the case of Sam Bankman-Fried: after his convictions, he indicated that his professed beliefs in charity and altruism were exaggerated. On the whole, however, I found it informative and genuinely exciting.
Tags: ai
Zen
Read Zen and the Art of Motorcycle Maintenance. A remarkable novel, or perhaps three novels in one, interspersed with one another. The ideas explored in the novel about quality and our relationship with technology feel so relevant today. On the whole I found the philosophical arguments put forth persuasive and compelling, apart from a few criticisms about the philosophy of science.
Read two works of children's literature with family. First, Stuart Little. I can't understand why this is a classic of American children's literature: it was alright, but hardly memorable, and its ending felt abrupt to me. Second, The Heartwood Hotel. This was a wonderful book, full of imagination and joy, and destined to become a classic, I think.
Tags: books
Muzzy
Read Muzzy Izzet's autobiography (Muzzy) on the commute to and from work. A strange choice, as I am not a Leicester fan.
I read it for nostalgia. I remember Muzzy Izzet as a player, as his unusual and memorable name (his full name is Mustafa Izzet) was always on regional sports reporting. He was also unusual as a player in that 90s era of English football, being a skilful player with composure under pressure.
I enjoyed some of the memories and stories, and his charity work and hospital visits were genuinely inspiring.
Do you do AI?
Do you do AI and machine learning? I am an academic and get asked this question, but I am never sure how to answer, and so perhaps answer rather hesitantly and mumble. I thought I'd write an answer here instead.
Besides physics, my interests for over a decade have included the foundations of statistics, the foundations of science, statistical computation and computing & technology. That started perhaps as early as my Ph.D. - where I applied Bayesian methods to models in particle physics - or perhaps as early as my undergraduate degree - where I thought about applying crude Markov chain Monte Carlo algorithms to a simplified supersymmetric model - or perhaps as early as high school - where a maths teacher ran a Monte Carlo simulation over the course of an hour, in real time, to demonstrate the central limit theorem - or maybe primary school - where I remember puzzling about the probability of outcomes of a football match (two teams; 50-50? But surely not. A paradox to me then.)
Yes, then, I am interested in AI and machine learning, and have many questions surrounding it. Regarding my more philosophical interests in the foundations of science and statistics, e.g., what are the foundational principles of AI and machine learning? Is it truly a novel way of learning from data? Why is it so focused on prediction versus inference? Is that what distinguishes it from other fields? Is it a new field or an intersection of existing ones?
On the computational side, can machine-learning methods improve on traditional methods of statistical computation? What should be their relationship to, e.g., MCMC? Should they replace it, or operate as a subalgorithm for efficient proposals? Do machine-learning methods for, e.g., density estimation truly evade the curse of dimensionality? If not, how does it manifest?
On a more technical and personal level, e.g., do I understand modern neural network architectures and transformer attention? Is there a simple statistical model that demonstrates double descent? Do I grok LLMs? (Do I grok how a calculator works!? If not, why do I expect to grok LLMs?)
On a political or sociological level, e.g., are LLMs a 'good thing'? Do we have a choice about the way technology impacts society and our future? What do we want our relationship with AI to be?
Is this what anyone wanted to hear? No. I'm rather afraid that it isn't. I've found that the question really means, have you embraced AI and machine learning? Are you part of an AI maximalist future where we press buttons and boost productivity? Which is why I mumble.
Tags: ai
Be still my beating Hearts...
Some people say that to find out which football team you should support, you should take a ruler and a map, and find the closest professional club to your place of birth. If I did that, I think I'd get Hearts or Hibs. I've always found the name Heart of Midlothian alluring and charming, and unusually cerebral for a football club, being named after a Walter Scott novel.
Now that I've shown you my credentials, I can say that I am thoroughly enjoying the title race for the Scottish Premier League and hoping that Hearts can win. I don't have them as favourites; I'm worried about Celtic's know-how in getting over the line. I think it might go down to the final round, perhaps even the final kick.
Tags: football
Delving into focal words on inSPIRE HEP
You've probably noticed that large language models (LLMs) have favourite words that they use more often than human writers. These are known as focal words, and the phenomenon of focal words is non-trivial. arXiv:2412.11385 calls it 'the puzzle of lexical overrepresentation'.
I thought I'd check the appearance of a focal word in the high-energy physics literature by querying the inSPIRE HEP database. I used the 'fulltext' search and looked at the word 'delve'. I think this does some kind of stemming, so that, e.g., 'delve' also matches 'delving'. I normalized the results to the total number of papers per year. The results are:
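For concreteness, here is a rough sketch of how such counts could be pulled from the inSPIRE REST API in Python. The `ft` fulltext operator and `date` filter are my best guesses at the query syntax (check against inSPIRE's search documentation), the helper names are my own, and the normalization by total papers per year is left as a comment:

```python
import json
import urllib.parse
import urllib.request

API = "https://inspirehep.net/api/literature"

def build_url(word, year):
    """Search URL counting papers whose fulltext contains `word` in `year`.
    The 'ft' (fulltext) operator and 'date' filter follow my reading of
    inSPIRE's SPIRES-style query syntax -- treat them as assumptions."""
    query = f"ft {word} and date {year}"
    return f"{API}?q={urllib.parse.quote(query)}&size=1"

def total_hits(response_json):
    """Number of matching records reported in an inSPIRE API response."""
    return response_json["hits"]["total"]

def count(word, year):
    """Fetch the count for one word-year pair (makes a network call)."""
    with urllib.request.urlopen(build_url(word, year)) as response:
        return total_hits(json.load(response))

# Example usage (not run here; each call hits the live API):
# for year in range(2018, 2026):
#     print(year, count("delve", year))  # then divide by that year's total papers
```

The `size=1` parameter keeps the responses small, since only the total hit count is needed, not the records themselves.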
Of course, authors could be influenced by LLMs or imitating 'good' writing produced by LLMs. I don't know much about this field. Make of it what you will.
I can understand the spike, but I'm not sure why it decreased back down in 2025. Perhaps LLMs have evolved and 'delve' isn't such a common focal word anymore? Perhaps writers are conscious of the hallmarks of LLMs in their work and edit out instances of 'delve'? Perhaps 'delve' was a buzzword that entered popular consciousness because of LLMs?