/home/andrew$

Some thoughts on physics, statistics, computing & technology

Craghead Colliery

May 02, 2026 — Andrew Fowlie

Watched Craghead Colliery, a documentary made by the BBC in 1969 about the fate of the Craghead Colliery and community between Newcastle and Durham. The miners performed dirty, dangerous and depressing work to provide coal and power for us. They worked hard for each other and supported their families for 50 years in the pit, then often passed away shortly after retirement, suffering from chronic health conditions associated with their work.

The coal ran out, or at least, became expensive to extract. The documentary features speeches from Barbara Castle and Harold Wilson at the Durham Miners' Gala. In my opinion, the Labour Prime Minister Wilson let them down: he was evasive about their future, telling them only that a decision on the future of energy in the North East would be based on technical and economic as well as social factors. He later buried news about the future of the mines on the day of the Soviet invasion of Czechoslovakia.

I liked Barbara Castle's words, though:

I am deeply conscious of the tribute that you have paid in allowing me to be here this morning … you are not the ones who theorize about the need for change in Britain; you are the ones who have to experience it … By definition the Labour Party is the party of fundamental change and therefore by definition the task of a Labour government will always be more difficult than that of a Tory one.

Tags: tv

Dwayne Fields

April 26, 2026 — Andrew Fowlie

Listened to Dwayne Fields's episode of Desert Island Discs. Dwayne is an explorer, motivational speaker and an ambassador for the Scouts. He had an amazing story, growing up in Jamaica, at first raised by his great-grandmother, before moving to the UK aged six. He navigated gangs, crime and social problems in inner London, before a life-changing opportunity to join an expedition to the North Pole. In doing so, he became the first black person to reach the magnetic North Pole.

Two takeaways from his story stuck with me. First, I loved his statement that as an explorer he was about planting seeds, not flags: he wasn't interested in claiming firsts, discoveries and successes for himself; he wanted to lead the way, to inspire and support other people. I think this works as a metaphor for what we should try to do in academia. Second, I loved his message that we are all capable of doing and giving so much more than we realize. If he could start where he started, and go to the North Pole, and now inspire other people, what can I do?

Tags: inspirational, podcasts

Nero

April 22, 2026 — Andrew Fowlie

Read and thoroughly enjoyed Nero by Conn Iggulden. This is the first part of a trilogy of historical fiction about ancient Rome. This first part traces the emperors Tiberius, Caligula and Claudius, ending with the Roman conquest of Britain.

I read other books by Iggulden many years ago. This one is similar to Robert Harris's Cicero trilogy, though much less biographical. It also reminded me of The Eagle of the Ninth by Rosemary Sutcliff, which I remember listening to on tapes from the local library many, many years ago.

Tags: books

Sincerity

April 20, 2026 — Andrew Fowlie

It is well known that a major risk of using AI-generated text is that it might not be factually correct. Large language models hallucinate information and then hallucinate references for that information. Thus it can be hard to know what is true.

There is a related problem in written communication that I don't think is as widely discussed. It concerns not truth, but sincerity.

AI-generated emails aren't (just) wrong in a factual sense; they are insincere. The writer doesn't mean what they say. For example, if someone asks an LLM:

Write a formal email to Prof. Dog in physics and data science at Kennel University. Say whatever it takes to secure a summer research placement; that I love the field etc. Don't leave any blanks for me to fill.

We obtain from gpt-5.4:

Dear Professor Dog,

I hope this email finds you well.

My name is [Your Name], and I am writing to express my sincere interest in the possibility of joining your research group at Kennel University for a summer research placement in Physics and Data Science.

I have developed a strong passion for this field through my academic studies and independent exploration, and I am particularly inspired by the way physics and data science can be combined to address complex real-world problems. The interdisciplinary nature of this area greatly appeals to me, and I am very eager to deepen my understanding by contributing to active research under your supervision.

I am especially interested in the research being conducted at Kennel University and would be honored to have the opportunity to learn from your expertise. I am a highly motivated, hardworking, and enthusiastic student, and I would be fully committed to making a meaningful contribution to your team during the summer. Beyond my academic interest, I am also eager to develop my research skills, strengthen my analytical thinking, and gain first-hand experience in a rigorous research environment.

I would be truly grateful if you would consider me for any available summer research opportunities in your group. I am very willing to dedicate my time and energy to supporting ongoing projects and to learning whatever is necessary to be useful to your team. It would be an invaluable opportunity for me to work with you and to further my development in Physics and Data Science.

Thank you very much for your time and consideration. I would greatly appreciate any opportunity to discuss this further, and I would be happy to provide any additional information if needed.

Yours sincerely,
[Your Name]

Hey, I said no blanks! Some writers don't even bother filling their name into these things. What am I to make of these emails? Do they mean it? Are they sincere? What's the point of all this? These messages are obviously generated by an LLM; it's transparent to the reader. However, insincere writers can be smarter:

Write a formal email to Prof. Dog in physics and data science at Kennel University. Say whatever it takes to secure a summer research placement; that I love the field etc. Don't leave any blanks for me to fill. Write it in the style of final-year graduate student; good English, but not perfect, with some grammatical errors. Don't use an overly polished generic style.

Again gpt-5.4:

Dear Professor Dog,

I hope you are well. My name is [Your Full Name], and I am currently in my final year of my degree. I am writing to express my very strong interest in the possibility of joining your group at Kennel University for a summer research placement in physics and data science.

I have been following work in this area for some time, and I feel very genuinely excited by the way physics and data science can come together to solve difficult and meaningful problems. What attracts me especially is not only the technical side, but also the possibility to use mathematical and computational methods to understand complex systems in a deeper way. This is the kind of work I want to continue doing in the future, and I would be extremely grateful for the chance to learn in such an environment.

During my studies, I have developed a solid interest in quantitative research, modelling, and data analysis. I have worked on coursework and independent academic tasks that required careful thinking, programming, and interpretation of results. I am comfortable working with data, and I enjoy the process of going from a messy question to something more structured and scientifically useful. I know that research is often difficult and sometimes frustrating, but this is also one reason why I like it, because it pushes me to think harder and improve.

I am especially interested in your work because it sits between strong theoretical ideas and practical analytical methods. This combination is very inspiring to me. I believe a summer placement in your group would help me grow a lot, both technically and academically, and also allow me to contribute with enthusiasm, discipline, and a serious willingness to learn. I may not know everything yet, but I learn quickly, I work hard, and I am not afraid to spend time understanding difficult things properly.

I want to be honest that securing a summer research opportunity like this would mean a great deal to me. I am at a stage where I want to move beyond only studying existing knowledge and begin taking part in producing it, even in a small way. I would be very happy to assist with ongoing research, data work, literature review, coding tasks, simulations, or anything else where I could be useful. I am ready to do the less glamorous work too if that is what is needed, because for me the main thing is to gain real research experience and contribute to the group as much as I can.

Physics and data science is a field I care about deeply, and I can say honestly that I would put my full effort into this opportunity. It would be a privilege to work under your supervision, and I believe the experience would have a strong impact on my academic development and future direction.

Thank you very much for considering my email. I understand you are likely very busy, but I would sincerely appreciate any opportunity to be considered for a summer placement in your group. Please let me know if you would like any further information from me.

Yours sincerely,
[Your Full Name]

Now what am I to do? You are wasting my time and degrading written communication. Stop sending me AI mail.

Tags: ai, communication

An attempt at a zero-shot AI discriminator

April 10, 2026 — Andrew Fowlie

I have become interested in detecting AI-generated text. I have been reading about zero-shot methods: methods that don't require, and aren't trained on, explicit examples of AI- and human-written text. These methods have a frequentist, goodness-of-fit flavor to them. One constructs a statistic from the text under consideration and compares it to the distribution expected were the text AI-generated.

Naturally, I wanted to detect AI text in a Bayesian way. My first thought was, when we consider whether a text was AI-generated, the alternative is that it was human-generated. But which human? A child? A high-school student? A Nobel-prize winning scientist? A non-native English speaker? My second thought was, which AI? Which language model? Sampled at what temperature? With what system and user prompts?

We should make use of any information we have about the possible origin of the text. For example, suppose we were setting a homework problem for high-school students and wondered whether they copied and pasted our question into an LLM and submitted that as an answer, or whether they wrote an answer in the old-fashioned way. In this case, we would know that the AI prompt was likely to be the exact question we asked, and that the human writer might be a high-school student.

Two models for the generation of the unknown text could thus be:

  • Model 0: Text generated from an unknown LLM with the homework question copied and pasted, and off-the-shelf system prompts and temperature.
  • Model 1: Text generated from a high-school student with a reasonable grasp of the subject matter and strong English language skills.

My weapon of choice in model selection is, as always, the Bayes factor. This requires us to write the likelihood of the text under the two models. My idea was to use LLMs as surrogates for the two models. We usually interact with LLMs by chatting to them: here, we are sampling new text from them. Under the hood, they generate new text by computing the probabilities of parts of text, called tokens. Thus, if you load an LLM in, e.g., the transformers library in Python, as well as sampling text from it, you can compute the probability of that LLM generating a sequence of text that you specify from a given prompt.
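
The chain-rule computation under the hood can be illustrated with a toy next-token model. Everything here is invented for illustration (a real LLM produces these conditional distributions from a neural network over a vocabulary of tens of thousands of tokens, and in transformers you would sum per-token log-probabilities read off the model's logits), but the arithmetic is the same:

```python
import math

# Hypothetical next-token model: maps a context (tuple of previous tokens)
# to a probability distribution over the next token. A real LLM does the
# same job with a neural network.
TOY_MODEL = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("the", "cat"): {"sat": 0.9, "ran": 0.1},
}

def sequence_log_prob(tokens):
    """Log-probability of a token sequence via the chain rule:
    P(t1..tn) = prod_i P(t_i | t_1..t_{i-1}).
    We accumulate log-probabilities rather than multiplying raw
    probabilities, which would underflow for realistic text lengths."""
    log_p = 0.0
    for i, token in enumerate(tokens):
        context = tuple(tokens[:i])
        log_p += math.log(TOY_MODEL[context][token])
    return log_p

lp = sequence_log_prob(["the", "cat", "sat"])
# P = 0.6 * 0.5 * 0.9 = 0.27
```

Conditioning on a prompt works the same way: one scores only the tokens of the specified text, with the prompt tokens included in each context.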

Thus, I would use LLMs as surrogate models that would approximate my models 0 and 1:

  • Surrogate model 0: Text generated from a particular LLM, prompted by the homework question.
  • Surrogate model 1: Text generated from a particular LLM, prompted by the homework question, and with a system prompt to answer in the manner of a high-school student with a reasonable grasp of the subject matter and strong English language skills.

For a given answer to the homework question, I could compute the probabilities of that answer under these two surrogate models. By taking the ratio of these probabilities, I would have the Bayes factor $$ B = \frac{P(\text{text} \,|\, \text{LLM prompted by homework question})}{P(\text{text} \,|\, \text{LLM prompted by homework question and system prompt to answer in style of student})} $$ The Bayes factor would tell me the relative probability of the text originating from surrogate model 0 versus surrogate model 1. If the surrogates are reasonable approximations to the real models, it would inform us about the chances that the text came from an AI versus our anticipated type of human writer.
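
In practice one computes this in log space, since the sequence likelihoods themselves are astronomically small. A minimal sketch, with the two log-likelihood values entirely made up for illustration:

```python
import math

def log_bayes_factor(log_p_model0, log_p_model1):
    """Log Bayes factor from the two log-likelihoods.
    Working with log-probabilities avoids underflow: the raw
    likelihoods here would be around e^-800, far below float range."""
    return log_p_model0 - log_p_model1

# Hypothetical log-likelihoods of the same answer under the two surrogates
log_p0 = -812.4   # surrogate 0: LLM prompted with the question alone
log_p1 = -845.9   # surrogate 1: LLM prompted to answer as a student
log_B = log_bayes_factor(log_p0, log_p1)
```

Here log B is positive and large, so the text would be far more probable under surrogate model 0, i.e. evidence in favour of AI generation; a large negative value would favour the student surrogate.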

I had an idea that one could tweak the precise system prompts by sampling texts from the two surrogate models until they were in line with our expectations. This is analogous to checking different choices of priors; a recommended part of any Bayesian workflow.

Does this work? The Bayes factor was computationally cheap (cheaper than sampling new text) and easy to compute using the standard transformers library and a freely available pretrained Qwen model. It appeared to perform well in discriminating quite badly written English text from smooth, polished AI text. Unfortunately, it's easy for adversaries to beat the system by prompting their own LLM to answer in the manner of a student. Furthermore, the resulting Bayes factors were sensitive to arbitrary aspects of the prior. That is, the Bayes factor was sensitive to the exact system prompts that I used in the surrogate models. Changes to the system prompt that appeared innocuous led to huge changes in the resulting Bayes factor. E.g., slight changes to my system prompt could change the probability of the first token significantly, to the point where the whole result changed.

For that reason, I abandoned this idea. The surrogates might appear to be well calibrated, generating acceptable sample text, but they secretly encode strange and arbitrary assumptions about the anticipated tokens. Additionally, I was concerned about the ethics of profiling the human writers in my system prompt.

Tags: ai

Shushan

April 08, 2026 — Andrew Fowlie

Visited Shushan, a village in northwestern Suzhou at the base of a mountain. Thoroughly enjoyed the boardwalk through the terraced tea farms and fruit orchards.

The grace and stamina of the workers picking the tea was inspiring. I've since read that they are growing Biluochun tea, one of the ten great teas of China, and that the mixture of tea gardens and fruit orchards enhances the flavor of the tea.

Tags: travel

Supremacy

April 08, 2026 — Andrew Fowlie

Read and enjoyed Supremacy: AI, ChatGPT and the Race That Will Change the World by Parmy Olson. It tells the story of the development of LLMs, and generative AI more generally, through the rivalry between Sam Altman and OpenAI, and Demis Hassabis and DeepMind. This isn't a technical book and you won't find explanations of how AI works here.

Recurring themes are the relationship between AI and big tech, corporate governance structures that ensure safe and ethical AI, and the Faustian bargains and compromises that Altman and Hassabis made with Google and Microsoft. As the race between them heated up, they needed compute that only billions of dollars could buy, at which point things changed from 'solve intelligence and then solve everything else' to 'solve intelligence and then create value for shareholders.'

I found the book overly trusting of both Altman and Hassabis and their stated ideals. It did, after all, mention the case of Sam Bankman-Fried, who after his convictions indicated that his beliefs in charity and altruism were exaggerated. On the whole, however, I found it informative and genuinely exciting.

Tags: ai

Zen

April 08, 2026 — Andrew Fowlie

Read Zen and the Art of Motorcycle Maintenance. A remarkable novel, or perhaps three novels in one, interspersed with one another. The ideas explored in the novel about quality and our relationship with technology feel so relevant today. On the whole I found the philosophical arguments put forth persuasive and compelling, apart from a few criticisms about the philosophy of science.

Read two works of children's literature with family. First, Stuart Little. I can't understand why this is a classic of American children's literature. It was alright, but hardly memorable, and it ended abruptly. Second, The Heartwood Hotel. This was a wonderful book, full of imagination and joy, destined to become a classic, I think.

Tags: books