Orchestration and Function Calling

June 17, 2024 — #tech #ai

In From Text Prediction to Action we talked about the concept of using language models to convert between natural language and structured data such as CSV or JSON. The main goal there is that “translating” natural language to JSON means we can use the JSON as input to traditional software. At the end, I mentioned a few frameworks, including Semantic Kernel). Let’s look into what that means.

Just an extra reminder since there’s several months of time between these blog posts, please go check out the previous article From Text Prediction to Action.

In the previous article we looked into giving an LLM a choice of actions and requesting it to tell us which of those actions are relevant for a given user prompt, and providing any input to those actions (‘send an email’, to: John, CC: Joris). Requiring this answer to come into a structured format (JSON) we can then add some traditional code to actually execute the selected action and pass in the given inputs that the LLM identified.

Now imagine we can make this even more complex, where we want to call multiple actions. And the output from one action could be the input to another. In the email example, we may have to lookup email addresses for the names before we can call the email action (although that in particular could just be handled by the email action itself… it’s typically best - read more reliable and cheaper - to do in traditional software what you can, and not incur any unnecessary dependency on AI calls). The concept of figuring out which actions to take, in what order, and what the parameters should be, is referred to as orchestration or planning (I’ve found those terms seem to be interchangeable). This space of orchestration, and doing it reliably, is a core piece to making the power of LLMs very useful.

Now, choosing and calling one action based on a list of available actions is a task the current top-tier LLMs like GPT4 can handle pretty well. In fact, this is exactly the basic concepts with “plugins” that have been implemented in ChatGPT and Bing Copilot. Furthermore, the OpenAI APIs even support this out of the box so you don’t have to do the work of prompting the LLM on what actions from a list to choose. OpenAI maintains that behind-the-scenes magic and you just provide the data and handle the calls. This feature is called Function Calling.

At an even higher level, frameworks such as LangChain and Semantic Kernel can handle this for you as well, with multiple LLM calls done automatically, AND potentially delegating function calling if the LLM API you’re using actually supports it. Additionally they support more complex orchestration too, although the actually success rate varies greatly by your own function descriptions and the capabilities of the backend LLM you’re using.

So, I want to dive into some code to show you just how SIMPLE this can really be when everything we discussed is completely handled for you. I love Semantic Kernel myself, not just because it’s Microsoft but because it supports C#. (Semantic Kernel also supports Java and Python, and LangChain supports JavaScript and Python).

For this demo, I just created a quick C# Console App called “SKDemo”. I add the “Microsoft.SemanticKernel” NuGet package, which will bring in some dependencies including OpenAI libraries to use Azure OpenAI or OpenAI directly as a backend LLM.

Microsoft.SemanticKernel NuGet package

In the Program.cs file we use the KernelBuilder to create a kernel that uses OpenAI’s GPT4 Turbo. You will need to sign up for an OpenAI account and get an API Key to implement this. You will have to add some credit to your account to be able to use the API. Note that every usage will incur a cost, but you will find some testing and running of this demo will likely be a few cents at the most. If you’re worried, your account’s settings allows you to set a “monthly budget” that will reject requests once you’ve hit the budget. It takes a few minutes for credit to be available to call the API. You will see a 429 error (too many requests - not enough quota) until the balance is available.

using Microsoft.SemanticKernel;

IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOpenAIChatCompletion("gpt-4-turbo", apiKey);

Kernel kernel = kernelBuilder.Build();

Task<string?> result = kernel.InvokePromptAsync<string>("What is the capital of France?");
result.Wait();

Console.WriteLine(result.Result);

This simple example should work, and you should see the answer come up (“The capital of France is Paris”).

Ok, so now let’s get to the meat of the demo. We’ll create a new C# class that has a function. We add a description to the method as well as any parameters. This description will be sent to the LLM to decide to use it or what to pass in, so tweaking these will be crucial in more complex examples.

using System.ComponentModel;
using Microsoft.SemanticKernel;

namespace SKDemo
{
    public class WordPlugins
    {
        [KernelFunction, Description("Calculates the length of a word or piece of text")]
        public string StringLength(
            [Description("word or piece of text to calculate the length for")]string wordInput
            )
        {
            return wordInput.Length.ToString();
        }
    }
}

We also decorate the function as a KernelFunction. Next, we want to change our prompt to ask for the capital but also how long that name of that capital is. We need to tell Semantic Kernel about our new plugin function, but also when we call the prompt we will tell Semantic Kernel to go ahead and automatically invoke any functions that the LLM finds. We call the “AddFromType” method which will search the class for decorated KernelFunction methods.

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

IKernelBuilder kernelBuilder = Kernel.CreateBuilder();
kernelBuilder.AddOpenAIChatCompletion("gpt-4-turbo", apiKey);

kernelBuilder.Plugins.AddFromType<WordPlugins>(); // provide our plugin class with descriptions

Kernel kernel = kernelBuilder.Build();

OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions, // enable the function calling feature
};

Task<string?> result = kernel.InvokePromptAsync<string>("What is the capital of France, and how long is the length of that city's name?", new KernelArguments(settings));
result.Wait();

Console.WriteLine(result.Result);

Now, to make sure our function is actually being called (and not hallucinated) and used in the LLM output you can do a few things. You can put in a breakpoint, of course. You could also skew the result - say by multiplying the length with two. That way you are completely certain that the LLM’s final answer used your function’s output…

The capital of France is Paris, and the length of the city's name is 5 characters.

Let’s remind ourselves that there were several calls to the LLM that happened here. In one call the LLM let semantic kernel which functions it wants to call. SK automatically called those functions, per our settings, and then provided those results back to the LLM. It then formulated the final answer. This is exactly the concept we discussed and tested manually in the previous blog post. This demo shows the abstracted implementation of exactly that.

My final cost for creating this demo with a few test runs, was a full $0.008. This cost will vary based on which model you use as well, and GPT4-Turbo right now is by far the most expensive. This simple demo should easily run on GPT 3.5 Turbo or GPT 4o which cost significantly less.

PRO Tip: You can add the NuGet for Microsoft.Extensions.Logging.Console (or other targets like Application Insights) to get a log of everything that is happening, including tokens used (=metric for your paid usage of the APIs). See Unlock the Power of Telemetry in Semantic Kernel SDK

Simply add:

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

and before you call kernelBuilder.Build() add the following:

using var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
kernelBuilder.Services.AddSingleton(loggerFactory);

This will output to the console but it’ll give you an idea of what is going on and how many tokens you’re using.