Only Two Methods Needed for Agents -- LLM as a "magic function" & LLM do "function calling"

Link

Generally speaking, an agent means the AI model that can interact with the environment.

But let's look into a little deeper. As I used to seen on Twitter, there were two experiments when ChatGPT was newly introduced:

You are playing like a terminal of a computer. When I type commands, you will give me the response.

I'm playing like a terminal of a computer. When you type commands, I will give you the response.

In practice, I find these two ways are basically all an agent needs to work. Let's take a look at the first one -- in a programming way:

The prompt is:

Then we write a (pesudo) code:

The main idea here is that we treat the LLM as a magic function. It can solve a problem with some unknown magic steps. We just need to give it the input and get the output.

You can find the similar idea and tools in LlamaIndex output parser.

Now let's start from the functions. A function is basically composed of three parts:

Input: you give arguments to the function.

Output: the function gives you the result.

Side Effects: the function may change the environment.

Side effect means something other than output will be produced, during the function call. For example, a function may change the value of a global variable, or print something to the screen. This is an important concept for all programming languages except for functional programming languages.

In the context of AI agent, one of the most important side effects may be from hardwares, which can create physical changes in the real world, resulting in the thing we call "robot".

However, since LLM is simply predicting the next token from the input, it doesn't have any side effects. If we want to build an agent which can interact with the environment, we need to use some methods to let the "magic function" to create side effects. OpenAI introduced the API "function calling" to achieve this goal.

Now let's define a function which can create side effects:

Now let's update the prompt:

The (pesudo) code:

It is still a magic function, but now the code make use of the function calling API to call the function send_haiku_to_twitter, so that it indirectly creates a side effect: Twitter's server recieved a new post.

You can find the tools in LlamaIndex function tools.

Now the magic function have the ability to call other functions, the purpose can be more than the indirect side effects.

It can also use the output of the function call, that can help the agent to gather more information, or to make decisions.

For example, the agent can call a function to get the current weather of the city, and then use the weather information to generate the haiku.

Modify the prompt:

The (pesudo) code:

Now we get a simple agent. It is a magic function without input and output, but it can call other functions to make the side effect to send a haiku to Twitter.

Following the concept of "magic function" and "function calling", we can find a suprising fact: a magic function can call another magic function.

This fact enables unlimited possibilities for the agent, because in this way a agent can command other agents, and again, the other agents can command more agents.

The unlimit controling structure is as powerful as the computer program itself -- do anything if the programmer can write the code.

Let's go one step further to the idea of "magic function".

In the programming world, one of the most powerful ablitity of a function is: Recursion.

If a magic function can call itself, what will happen? In theorey, it can do anything. It can be a general magic function which can solve any task, or so called: AGI.

Let's have a try!

Very promising, isn't it? But in practice, it just not work. The recursion is a powerful tool, but also dangerous. It can easily lead to infinite loops, and the AI model may not be able to handle it well.

The AI just dives in deeper and deeper, to find every definition of every word, never to return to solve the task.

This is a little disappointing, but not suprising. The professionalism and the universality can not be achieved at the same time by a simple recursion, at least for now.

Recently I'm working for NebulaGraph's GenAI team, to explore the possibility of AI together with graph database.

In this article I explained the idea of using only "magic function" and "function calling" to build any agent, which is what I learned from our team's work.

Code in this article is pesudo code, and you can use LlamaIndex to implement the idea in practice. Also some of my code is in AgentPath

Welcome to follow our further progress. You can find me on Twitter, Telegram, and GitHub with links in https://yanli.one

©️

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.