The Software In Charge of AI
Filed under: #tech #ai
In today’s (January 2024) generative AI landscape, which arrived in a flash, there isn’t much broad understanding of the architecture of the AI software we use. So I wanted to explain why The Great Text Predictor is just one cog in the AI machinery you use today, and why it is still only completing sentences rather than reading your emails or searching the web. That’s the role of the software “controlling” the neural net.
In the early days of the GPT-3.5 hype, I noticed a post from a friend demoing a prototype he had built: he told GPT he wanted a list of items in a certain file format, and then asked it to produce such a file based on the URL of a cake recipe. It was amazing, but only partly so. GPT has an uncanny ability to follow that sort of instruction. Unfortunately, it does NOT have any “capability” to read web pages from the internet. The end result my friend got, however impressive, was a product of the hallucination feature. GPT saw a URL that was something like www.recipes.com/chocolate-cake.html and, based on those words alone, came up with a likely recipe of its own and put that in the file. Upon investigation, he confirmed that the items and quantities he got did not match what was actually on the web page.
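What the prototype was missing is ordinary software that fetches the page and hands its text to the model. Below is a minimal Python sketch of that step using only the standard library. The function names, the prompt wording, and the 'quantity;item' output format are assumptions for illustration, not what my friend actually built.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_page_text(url: str, timeout: float = 10.0) -> str:
    """Classic software doing the 'reading': download the page and extract its text."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


def build_recipe_prompt(page_text: str) -> str:
    """Hypothetical prompt: the actual page text goes in, not just the URL."""
    return (
        "Extract the ingredients and quantities from the recipe text below, "
        "one per line as 'quantity;item'.\n\n" + page_text
    )
```

With the page text in the prompt, the model has the real ingredients to work from instead of guessing a plausible recipe from the words in the URL.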
OpenAI and Azure OpenAI provide APIs for software developers to write their own LLM-enabled software. One of the frequently asked questions from developers is: “why does it not remember the conversation like ChatGPT? I asked a follow-up question, but it does not remember anything…”
Both of those scenarios are explained by what I covered in my first article: the LLM is a statistical calculation. Words are converted into numbers, a calculation is performed on those numbers, and the results are converted back into the words that come out. Anything else, at all, is just plain old classical software that controls the whole scenario. “The AI” does not read websites. “The AI” does not have a conversation. Instead, software pulls all the text from the website, or software keeps a history of your questions and the AI’s answers. When you ask your next question, the software ADDS the text from the website, or the previous history of your conversation, TO your question. This provides the context from which the LLM can then complete the text (aka give an answer). (Note: this also means the prompt sent to a GPT chatbot gets longer and longer as the history grows, and as a result more and more expensive. Eventually, pieces of the history have to be left out, since there are limits to the size of a prompt.)
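A minimal Python sketch of that “memory”, assuming a hypothetical complete() function that stands in for a stateless LLM API call (text in, text out). The session object, not the model, keeps the history, re-sends it with every question, and drops the oldest turns once a size limit is reached. The limit is measured in characters here as a crude stand-in for tokens.

```python
from typing import Callable

# Hypothetical stand-in for a stateless LLM API call: prompt text in, completion out.
# Nothing is remembered between calls.
Complete = Callable[[str], str]


class ChatSession:
    """Classic software that keeps the conversation history and re-sends it each turn."""

    def __init__(self, complete: Complete, max_prompt_chars: int = 2000):
        self.complete = complete
        self.max_prompt_chars = max_prompt_chars  # crude stand-in for a token limit
        self.history: list[tuple[str, str]] = []  # (question, answer) pairs

    def build_prompt(self, question: str) -> str:
        turns = [f"User: {q}\nAssistant: {a}" for q, a in self.history]
        new_turn = f"User: {question}\nAssistant:"
        # Drop the oldest turns until the whole prompt fits the size limit.
        while turns and len("\n".join(turns + [new_turn])) > self.max_prompt_chars:
            turns.pop(0)
        return "\n".join(turns + [new_turn])

    def ask(self, question: str) -> str:
        prompt = self.build_prompt(question)  # history ADDED to the new question
        answer = self.complete(prompt)        # the LLM only completes the text
        self.history.append((question, answer))
        return answer
```

Every call to ask() sends a longer prompt than the one before, which is exactly why long chats get more expensive and why old turns eventually fall out.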
It’s not that AI is “dumb” per se. It’s just that the neural net is only a calculation. It doesn’t actively do anything. It’s not in charge, and it doesn’t take actions. It’s all in the software around the AI. When you ask Bing Chat a question, software will first distill your question into keywords to run through the classic Bing search engine. The software then takes one or more of the pages from the search results, goes to the LLM with that information, and basically asks it to answer your question (aka complete the text) based on the text retrieved from those pages. An LLM doesn’t know what date or time it is unless the software running it adds that as context preceding your question. It’s completely and utterly STATELESS. It’s pre-trained, and nothing about it changes; the only new information it gets is whatever it is told on the spot in the prompt.
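The search flow described above can be sketched roughly like this. To be clear, this is not Bing’s actual code: search() and complete() are hypothetical callables you would supply, and the keyword step is a naive stop-word filter where a real system might use an LLM call instead. Note that the date only reaches the model because the software writes it into the prompt.

```python
from datetime import date
from typing import Callable

# Naive, illustrative stop-word list for the keyword step.
STOPWORDS = {"the", "a", "an", "is", "are", "what", "how", "of", "to", "in", "for", "do", "i"}


def extract_keywords(question: str) -> str:
    """Hypothetical 'distill into keywords' step: drop punctuation and stop words."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    return " ".join(w for w in words if w and w not in STOPWORDS)


def answer_with_search(
    question: str,
    search: Callable[[str], list[str]],  # hypothetical: returns text of result pages
    complete: Callable[[str], str],      # hypothetical stateless LLM call
    today: date,
    max_pages: int = 2,
) -> str:
    query = extract_keywords(question)
    pages = search(query)[:max_pages]  # software picks the top pages
    context = "\n\n".join(pages)
    prompt = (
        f"Today's date is {today.isoformat()}.\n"  # the model can't know this otherwise
        "Using only the search results below, answer the question.\n\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return complete(prompt)  # the LLM just completes the text
```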
That brings us to plugins (quote below from the OpenAI website):
Plugins are tools designed specifically for language models with safety as a core principle,
and help ChatGPT access up-to-date information, run computations, or use third-party services.
I remember an announcement headline that said: “AI can now read the web”. ChatGPT (the software) can run computations, use third-party services, or read the web, yes, but the underlying LLM cannot. What happens is that when you ask a question, the software attempts to determine what you’re trying to do (this is called ‘intent classification’), potentially by itself sending your question to an LLM (possibly a smaller or cheaper LLM than the one used for the rest of the conversation). One way to do this is to ask an LLM: given these plugins that we know of (we then list the plugins with their descriptions), do any of them help with the user’s question? The LLM will give an answer. The software can then decide to run the plugin and pass its output as context to another LLM call, along with your question. This is somewhat oversimplified, of course, but that is basically how it’s done.
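Here is a rough Python sketch of that two-call pattern. The plugin registry, the JSON reply format, and complete() are all hypothetical; real plugin systems define their own manifests and formats. What matters is who does what: the LLM only produces text, and the software parses that text and decides whether to run a plugin.

```python
import json
from typing import Callable

# Hypothetical plugin registry: name -> (description shown to the LLM, function the software runs).
# The functions are dummies that return fixed text.
PLUGINS: dict[str, tuple[str, Callable[[str], str]]] = {
    "weather": ("Gets the current weather for a city.", lambda city: f"Sunny in {city}"),
    "stock_price": ("Gets the latest price for a stock ticker.", lambda t: f"{t}: 123.45 (dummy)"),
}


def classification_prompt(question: str) -> str:
    """Call 1 prompt: list the plugins and ask whether any of them helps."""
    listing = "\n".join(f"- {name}: {desc}" for name, (desc, _) in PLUGINS.items())
    return (
        "Given these plugins:\n" + listing + "\n"
        "Do any of them help with the user's question? Reply only with JSON like "
        '{"plugin": "<name or none>", "argument": "<text>"}.\n'
        f"Question: {question}"
    )


def answer(question: str, complete: Callable[[str], str]) -> str:
    # Call 1: intent classification (could go to a smaller, cheaper model).
    try:
        choice = json.loads(complete(classification_prompt(question)))
    except json.JSONDecodeError:
        choice = {}
    if not isinstance(choice, dict):
        choice = {}
    name = choice.get("plugin")
    plugin = PLUGINS.get(name) if isinstance(name, str) else None

    context = ""
    if plugin is not None:
        # The software, not the LLM, runs the plugin and captures its output.
        context = f"Plugin output: {plugin[1](str(choice.get('argument', '')))}\n"

    # Call 2: answer the question with the plugin output as extra context.
    return complete(f"{context}Question: {question}\nAnswer:")
```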
So to avoid security problems, we’re looking at classic software security practices, including treating the output of an LLM like an action from a user (because that is what it is: the LLM translated a user’s question into something structured that our software can traditionally understand and act on). Some time this year we will likely see headlines like “Hackers convince AI to email them secret documents by sending it a malicious message”. When you do, just remember: it wasn’t “the AI”. It was the classic software, and it’s a classic software vulnerability. The LLM was just the static API call translating instructions from words into pseudo code. And some poor developer somewhere wrote software that just ran that pseudo code without checking it.
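In code, “checking” means validating the LLM’s structured output exactly like untrusted user input. Below is a minimal sketch, assuming a hypothetical JSON action format, allowlist, and per-user document permissions: anything the policy doesn’t explicitly allow is rejected before the software acts on it.

```python
import json

# Hypothetical policy: the only actions the software will carry out,
# and the only domains it will send e-mail to.
ALLOWED_ACTIONS = {"search_documents", "send_email"}
ALLOWED_EMAIL_DOMAINS = {"example.com"}


class RejectedAction(Exception):
    """Raised when the LLM's proposed action fails validation."""


def validate_llm_action(raw: str, user_can_read: set[str]) -> dict:
    """Validate the LLM's output like untrusted user input before acting on it."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedAction("output is not valid JSON") from exc
    if not isinstance(action, dict):
        raise RejectedAction("output is not a JSON object")

    name = action.get("action")
    if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
        raise RejectedAction(f"action {name!r} is not allowed")

    if name == "send_email":
        to = str(action.get("to", ""))
        domain = to.rpartition("@")[2].lower()
        if domain not in ALLOWED_EMAIL_DOMAINS:
            raise RejectedAction(f"recipient {to!r} is outside the allowed domains")
        attachments = action.get("attachments", [])
        if not isinstance(attachments, list):
            raise RejectedAction("attachments must be a list")
        for doc in attachments:
            # The user's own permissions apply, not whatever the LLM asked for.
            if not isinstance(doc, str) or doc not in user_can_read:
                raise RejectedAction(f"user may not share {doc!r}")
    return action
```

The check lives in the classic software: however cleverly a malicious message steers the LLM, an e-mail to an outside domain, or one carrying a document the user can’t read, never gets sent.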
There is no comment section here, but I would love to hear your thoughts! Get in touch!