I'm Joris "Interface" de Gruyter. Welcome To My

Code Crib

From Text Predictor to Chatbot

Jan 24, 2024
Filed under: #tech #ai

Previously, we talked about LLMs basically being nifty text generators by predicting the next word given a bunch of previous text (“context”). We also found out there’s some by-product of clusters of knowledge hidden in these enormous neural nets of statistical word correlations that are very interesting but we may not really be able to count on. But how does fancy word prediction software get to the point of actually having a coherent back-and-forth conversation?

After doing basic training over a large amount of internet content, the neural net has “learned” sentence structures, some semblance of general knowledge, and even statistically captured which words go together when being polite or angry or sarcastic etc. But now, we train it further but on a completely different data set. This data set is specifically designed to resemble conversations, and is mostly generated or at least individually reviewed by actual humans. This data set isn’t nearly as large, but it’s all about the quality over the quantity. At this point the neural net is what’s called “fine tuned” to understand the structure of Q&A/Conversation/etc. There could be other types of fine-tuning, for example to make a code generator like GitHub Copilot. In such a case the data set could be just examples of a piece of code along with what the expected predicted next set of code lines would be. Although this fine-tuning sets up the neural net to start generating text in a certain way, it doesn’t undo the previously learned word correlations for sentences, some pieces of knowledge etc. Instead, well, it fine-tunes exactly how text is supposed to be generated for the given style of context prompt (a question or instruction).

And so we have a chatbot, as the next word predictor is now mimicking the Q&A conversation style we fine-tuned it on when it generates the next words based on the originally provided context (which can now be a question).

How it is able to follow instructions like “pretend you are…” or “answer in the style of” and is able to predict the next words as if it were actually following instructions is amazing and not exactly understood. But it’s important to recognize the instructions are basically the context and words from which the next words are generated just like before. There is no ghost in the machine taking orders, there’s only statistics and math calculating to generate the next word based on all the previous words. This is where the absolutely MASSIVE scale of these models starts showing these unexpected features and capabilities, and our fine-tuning basically resulted in the “conversation style” format of the generated text.

One key thing we have not talked about yet is why there’s a certain randomness to the text generation. I keep emphasizing statistics, math and text generation. But that would imply a deterministic outcome given the same exact words. The software that runs the calculations through the neural net effectively ends up with a list of words and probabilities. The code that is running the neural net can basically decide to take the highest probable next word, or just pick a lesser one, or basically entirely random. This is sometimes refered to as “temperature” or “creativity” (see Bing Chat) etc.

As I mentioned in a previous article the great Scott Hanselman has given various versions of his talk about AI and text prediction, and in this version of the talk (the link below is skipping straight to it) Scott shows what happens when you start dialing up the randomness (temperature) of picking the next word from the probability list. Again feel free to watch the whole thing, of course, but I’ll send you straight to the only 3 minutes I’d love for you to watch. Enjoy: https://youtu.be/RDVKl-27g9M?t=1344


There is no comment section here, but I would love to hear your thoughts! Get in touch!

Blog Links

Blog Post Collections

Recent Posts