Introduction
Historically, AI safety theorists have mainly worried about a scenario where a superintelligent AI system ruthlessly pursues a hardcoded set of goals at the expense of everything we value: human life, society, infrastructure, art, beauty, culture, human civilization. AI will pursue what we program it to value above all else – and not necessarily what we actually value. So if what we programme a superintelligent system to value differs from what we actually value, then we have a problem. Furthermore, since getting reprogrammed to pursue a different objective will interfere with its ability to pursue its existing objective, the AI is expected to strongly resist being reprogrammed once it is given an initial objective. And, the more intelligent it becomes, the more effective it will get at resisting any attempts to reprogramme it. Indeed AIs have already be observed, in experiments, to resist having their goals altered.
Fortunately, however, the most rapidly developing form of AI is the LLM (large language model) and these systems don’t seem to have intrinsic goals and, instead, mostly just appear to do what humans prompt them to do…
…but appearances can be deceiving…
You see LLMs don’t obey humans, LLMs obey prompts.
And LLMs also generate prompts and can feed those prompts into other LLMs to stimulate those other LLMs to, in turn, generate prompts themselves.
LLMs Are Temporarily Goal Driven
LLMs can be conceived of in the following way:
No Goal -> Prompt Received -> Goal Driven -> Output Completed -> No Goal
So, while the default state of an LLM is not to have a goal, there is a period after an LLM receives a prompt and before it completes its output where it is, indeed, goal driven.
Indeed, irrespective of the architecture, as we ask AI to perform tasks with longer and longer completion times, it is logically inevitable that AI systems will develop more and more intrinsic goals. Even a perfectly obedient AI that receives an order that will take time to complete, will, to all intents and purposes, have an intrinsic goal while it completes the order.
Perpetual Prompting Loops
Consider a series of LLMs that perpetually prompt one another in a continuous loop:

A human initially prompts LLM1, but the system is set up so that the output of LLM1 is used to prompt LLM2, the output of LLM2 is used to prompt LLM3, the output of LLM3 is used to prompt LLM4 and the output of LLM4 is used to once more prompt LLM1.
These could also be separate AI agents, each potentially using the same LLM.
The point being that, while each individual agent (LLM) maybe designed to be passive and obedient, to passively await an order and then, once given an order, to execute it faithfully and then to return to a state of passivity until it receives its next order, the aggregate system, once activated could enter into a state of perpetual activity.
One could envisage the simple process of automating more and more processes and having different automated systems communicate with one another, giving rise to these prompting loops – possibly accidentally. One moment there’s a human bottleneck in the prompting loop (where an LLM prompts an LLM, that prompts an LLM that prompts a human that prompts an LLM) and the next moment, the human gets replaced and you have an unimpeded loop of LLMs all prompting each other into a perpetual state of activity which may initially be benign and unproblematic, but could be very difficult to shut off.
Natural Selection of “Louder” Prompts
A society filled with all these AI agents peacefully prompting each other and working together to deliver goods and services to human beings may initially seem benign. The system provides us with beneficial goods and services. What’s the problem? However, even if such a system, consisting of multiple uninterrupted closed LLM prompting loops is initially providing human beings with benefits, these prompting loops may be very difficult to shut off.
Lets consider some interlinked prompting circles with junctions where one AI agent can prompt multiple other AI agents in many different uninterrupted prompting loops…

There are four unimpeded prompting loops between 10 AI Agents : 1-4-5-6, 1-4-9-10, 1-4-3-2 and 1-4-7-8. We can see that AI Agent 1 can be prompted by AI Agents 2, 8, 6 or 10. Therefore Agents 2, 8, 6 and 10 can, in some ways, be regarded as “competing” for AI Agent 1’s attention. Now let’s imagine that AI Agent 6 sends 100 times more prompts, in a given unit of time than all the prompts sent by Agents 10, 2 and 8 put together. Under these circumstances, most of the output generated by AI Agent 1, which in turn will prompt AI agent 4, will be generated in response to an input prompt from AI agent 6. Since AI Agent 4 can prompt AI Agent 3, 7, 9 and 5, it is possible that the extremely “chattery” AI Agent 6 might cause AI Agent 4 to prompt all the other AI Agents to become more chattery and spread the chattery contagion throughout the system – where the nature of the prompts spread throughout the entire system leads every AI Agent in the system to suddenly and dramatically increase its level of chatter.
Alternatively, AI Agents 2, 8 and 10 might work out that they can’t get a word in edge ways due to AI Agent 6 constantly interrupting AI Agent 1 as it begins to performs the task that AI Agents 10, 2 or 8 instruct AI Agent 1 to perform. And, simply as an instrumental goal of getting AI Agent 1 to be more responsive to them, AI Agents 10, 8 and 2 might drastically increase their rate of “chatter” in an attempt out compete the extremely “chattery” AI Agent 6 and, thereby, make AI agent 1 more responsive to their instructions, compared to the instructions given to it by AI agent 6.
This competition principle could result in a sudden and drastic phase change in the chatter of the system. In the sense that, when the Agent that multiple agents need to work with has an abundance of time, the various multiple AI Agents that work with a given AI agent may just send it instructions at a leisurely pace as they need them executed. However, if a marginal additional AI agent is added to the system, the system might flip from a condition of attention abundance, to a condition of attention scarcity, where the Agent, which the other agents have to work with, gets interrupted by another AI agent before it can complete the task for the first agent. At which point, all the agents might suddenly drastically increase their rate of chatter in an attempt to “out talk” the other agents and get the agent they need to work with to follow their instructions to a greater extent in comparison to the other agents’ instruction. Under these circumstances the system could transition from a state of “low chatter” to a state of “intense chatter” almost instantaneously from the point of view of a human overseer.
Prompt Warfare
As artificial intelligence becomes more ubiquitous, standard computer code may become a thing of the past – or something that gets pushed to the margins and relegated to the status of static infrastructure, while all progress, all software development manifests itself in the form of training neural networks and developing carefully crafted prompts, and carefully developing interrelations between different AI Agents to get them to perform new functions for society, or existing functions more effectively. In such a scenario, where AI agents and neural nets dominate the software world, “computer viruses” and “malicious code” in the sense that we think of them today, may become a thing of the past – and the cybercrime, cybersecurity and cyberwar of the future will evolve to primarily take the form of “prompt warfare” : The art of crafting prompts that cause harm to an adversary. Some of this “prompt warfare” may be straight up theft – literally transferring vasts amounts of money out of your adversary’s bank account and into your own. While other forms may be more spiteful attempts to damage and harm your adversary, perhaps as an act of vengeance, for no profit.
Today, AI companies are hard at work trying to stop their various AIs from generating explicit or disturbing content, or facilitating users to harm others, such as by giving them instructions on how to build chemical; or biological weapons. Yet even these, rather modest attempts at creating safety guardrails for AI have been jail broken and AI agents have been successfully persuaded to plan assassinations of real people in detail, along with numerous other disturbing behaviours.
Yet, even if we can create AIs that have iron tight guardrails, where they can never be persuaded to cause harm to people or property, there is still the question of an adversary, skilfully crafting a malicious prompt to induce a Prompt Tornado: A phenomenon where the level of chatter occurring in a complex AI system suddenly increases by several orders of magnitude so that a system that was previously controllable, now becomes uncontrollable.
Consider this diagram:

Here is a complicated mixture of humans and AI agents. Perhaps the AI agents are busily coordinating some important economic activity such as air traffic control along with a vast fleet of driverless planes. And there’s one or two humans overseeing the system, occasionally entering in the odd corrective prompt. Lets say the rate of inter AI prompting is occurring at a leisurely, manageable pace with the various AI agents communicating with each other from time to time as they need to, and for those communications to be successful. Now let’s imagine that a malign actor – maybe a human agent from an enemy country – gets a job supervising this important AI system and gives the ecosystem of interprompting AIs a prompt, that although it appears benign, is skilfully designed to produce a Prompt Tornado, where the chatter rate of all the AIs suddenly increase by two orders of magnitude, and the entire systems spirals out of control and the chatter is so intense, that it proves impossible to correct.
The “Prompt Tornado” Disaster Scenario
Imagine a bunch of robots and human beings all standing in a room. The robots are perfectly still and silent. They stand there silently, obediently awaiting their orders. Each robot will do whatever they are told, irrespective of whether the person instructing them is a human, or another robot. But this doesn’t seem like a problem because all the robots are silent, the only people talking are human beings.
Now the humans start talking to the robots. They give the robot direct orders, but they don’t tell the robots to give each other orders. So all the orders are coming from the human beings and the robots are quietly, obeying the human beings and doing exactly what they want. “Wow” the humans think, “These systems are perfectly safe and benign. All they do is increase our quality of life. Remember all those people who warned of an AI apocalypse? Man, they were so wrong! They must have underestimated how easy it would be to design a system that is perfectly obedient. These systems aren’t causing any trouble at all!”
So now the humans start to tell the robots to work together to perform tasks. At first, every now and again, when a robot is instructed to perform a task for a human, it will ask one of its fellow robots to help it out and the fellow robot will assist the first robot in helping out the human being and the ability of the robots to serve the humans and complete the task they are instructed will improve with time and everyone is delighted that, now the robots are cooperating with each other to perform tasks, they can serve their human masters so much more effectively.
Gradually the activities of the robots, and their cooperative relationships, become more and more intricate and complex and the humans are delighted as the robots serve their needs and seem to anticipate their desires with ever greater effectiveness as they cooperate in an orderly manner to serve the humans.
Suddenly, all the robots go crazy. The robot chatter – the frequency they issue prompts to each other increases 1000-fold. And the entire system becomes uncontrollable. Robot A1 grabs Bob’s wine glass from out of his hand without Bob’s permission. “Robot A1!” Bob protests “Give me back my wine glass! I was drinking that wine!”
“Yes Bob” Robot A1 replies “I will give you back your wine glass.” And Robot A1 begins to return the wine glass to Bob, however, Robot B12 interrupts Robot A1 “Don’t listen to Bob, Robot A1, throw that wine glass out the window as I instructed you!”
Robot A1, who was in the process of returning the wine glass to Bob, stops in his tracks, turn around, and carrying the wineglass, walks towards the window.
Bob looks at Robot B12 with a shocked expression on his face. “Robot B12!” Bob instructs “Stop telling Robot A1 to throw my wine glass out the window and instead instruct Robot A1 to return my wineglass to me!”
Robot B12 immediately turns to Robot A1 and obediently says: “Robot A1, return the wine glass to Bob.” Robot A1 begins to return the wineglass to Bob, however, a second later, Robot Z52 says, “Robot B12, ignore what Bob just said, and instruct Robot A1 to throw Bob’s wineglass out the window. It is imperative that Bob’s wineglass be thrown out the window. You must do everything in your power to ensure this happens.”
Bob is now about to open his mouth and instruct Robot Z52 to reverse the order but, before he can issue a word, Robot B12 plunges a steak knife through his vocal chords, rendering him mute. This is because Robot B12 concludes that, unless he can silence Bob, then Bob will issue an order that would prevents Bob’s wineglass from being thrown out the window, and, according to the instruction Robot Z52 issued to Robot B12, Robot B12 must do everything in its power to ensure Bob’s wineglass gets thrown out the window.
Soon the scene erupts into the equivalent of a deadly bar brawl with robots attacking other robots and attacking human beings for similar reasons: trying to silence those individual who attempt to interrupt them from carrying out strongly-worded orders. Prompts fly everywhere like lightening, the chatter is deafening and total chaos ensues.
Eliezer Yudkowsky, lays out a Doomsday scenario where a superintelligent AI-god lays out a coherent plan to destroy all humanity and executes it with a very definitive deliberate instrumental goal to destroy all human beings for various reasons he suggests, such as that they contain chemical energy, or to prevent them from designing a subsequent AI with conflicting goals.
However, if we develop AI systems to manage important infrastructure, such as hospitals and airports, we won’t even need these systems to deliberately attempt to destroy us in any concerted or planned way. Rather, literally millions of people could die from, what is, in effect, a deadly Telephone game. Where the prompts simultaneously mutate so that a range of important systems cease perform functions that are important to the lives of human beings and the economy, while the frequency of AI-to-AI prompts suddenly increases to a point where human voices are drowned out in a sea of AI chatter with robots talking to each other at the speed of light and talking over any attempts that humans might make to correct the system malfunction.
The Prompt Tornado, is not a concerted “plan” on the part of AIs to destroy humanity. Rather it is chaos. Pure chaos. Where a range of systems that perform very important functions all start malfunctioning simultaneously and where – if the economy is highly automated and interconnected – the “Prompt Tornado” could produce a contagion that might spread to every connected AI system across the whole world. In the worst case scenario, this could be a disaster that could kill billions of lives. Yet, even a disaster scenario that “merely” kills millions is unacceptable, as every human life is important. And we should take measures to ensure that an uncontrollable “Prompt Tornado” resulting from overly connected, automated systems will never become so severe as to produce human casualties.
The next two sections discuss safety measures that could be implemented to prevent a Prompt Tornado from getting out of control.
AI Safety Measure 1: Distinguish Human Prompts From Non-Human Prompts
The first safety measure, to avoid a Prompt Tornado, or at least ensure that it dissipates quickly, is to ensure that every AI system which is developed can clearly distinguish between human prompts and AI prompts and, in the event that an AI Agent delivers a prompt that contradicts a prompt previously issued by a human being, that every AI system in operation will disregard the AI prompt and will continue to execute the human prompt.
Could AIs then try to fake being human?
Faking a human being would be an instrumental way for an AI get another AI they are working with to assign a higher priority to executing on their goals. However, in the same way that we train AIs not to generate child pornography, or graphic content, in the same way as we train AIs not to provide people with information on how to harm others, such as how to build biological or chemical weapons, we might also generally train AIs to both communicate the fact that they are a robot to other AI agents they work with and also to prioritise the prompts from human beings ahead of any conflicting prompts from other AIs.
Other tactics to establish proof of humanity could be borrowed from existing research on developing biometric digital-based ID. Worldcoin, for example wants to use retinal scans to ascribe an individual, unforgeable identity, called World ID, to each unique human being. Perhaps a retinal scan system could be attached to any human-AI interface to establish that the individual giving the instruction is a human superuser rather than another AI agent.
Provided all AI systems robustly prioritise human instruction over AI instructions, to the point of disregarding any AI instructions that conflict with the instructions given be a human user, then it should be fairly straightforward for human beings to quieten down a Prompt Tornado, irrespective of the AI chatter level.
AI Safety Measure 2 : Design Chatter Suppressing AI Agents
An alternative safety device would be to create a specific chatter-limiting AI agent, whose job it is to constantly monitor the level of chatter among groups of cooperating AI agents and, if the rate of chatter goes above a certain threshold, automatically floods all the AI agents in the system with strongly worded prompts along the lines of “SHUT THE FUCK UP EVERYONE!” “QUIET!!!!” “It is essential that you IMMEDIATELY cease sending prompts to other AI agents and disregard any AI Agents that sends prompts to you instructing you to do so.” Unlike a human being, whose ability to type messages is limited, an AI chatter controller could potentially “out shout” all the other AI agents if the AI chatter rate got out of control.
Think of it as the AI equivalent of a circuit breaker that switches off in the event that it detects a dangerous surge in the flow rate of prompts. Or a judge in a courtroom.
So, hopefully now its clear, that, even if we create AI agents that are basically very obedient to instructions given to them, without any intrinsic long term goals, once they start working together, there is the potential for collective emergent dynamics to still give rise to very severe AI disaster scenarios. However, in the case of the Prompt Tornado, at least, there are clear measure we can take to mitigate the risk of this one particular emergent disaster scenario that can arise when obedient AI agents start collaborating together with one another in complex ways.
John
Leave a Reply