
Why Agents Fall Short

Why doesn’t your agent work as well as SWE-agent, even when it uses the same LLM? And why is it so much harder to implement?

If you’ve tried building agents with AI, you might have noticed that while general-purpose agents on the internet seem accurate and simple, implementing one for a more specific use case often gets complicated. The agent’s autonomy gets compromised, and performance suffers.

Agent Knowledge

The answer lies in the agents themselves and in what they know about each other. The more knowledge agents have about each other’s capabilities, the better they can delegate tasks. This matters most when the system needs to backtrack. Delegation might look trivial from a narrow view of the problem, but it’s extremely hard to get right. When it works, solving a big problem reduces to implementing a small piece of it, much like a neural network is easier to reason about layer by layer even though the full model remains a black box.

Recruitment

How can we ensure all agents are fully aware of each other?

First, we need to determine how many agents we need and who will do what. It’s like hiring for a new organization. Decisions are usually based on budget and popular roles (backend engineer, frontend engineer, engineering manager, etc.).

Since budget usually isn’t what limits our choice of agents (though it can be), we can pick a few based on the task at hand. For example, to build an internal tool with agents, we might recruit a frontend engineer agent, a backend engineer agent, a component library document retriever agent, and so on.
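
To make the roster concrete, here’s a minimal sketch in Python. Everything in it is hypothetical: `call_llm` is a stub for whatever LLM client you use, and the agent names and descriptions just mirror the internal-tool example above.

```python
def call_llm(system_prompt: str, user_message: str) -> str:
    """Stub for whatever LLM client you use; returns the model's reply."""
    raise NotImplementedError("wire up your LLM provider here")


# The "recruited" agents: a role description plus a callable per agent.
AGENTS = {
    "frontend": {
        "description": "Writes UI components for the internal tool.",
        "run": lambda task: call_llm("You are a frontend engineer.", task),
    },
    "backend": {
        "description": "Implements API endpoints and data access.",
        "run": lambda task: call_llm("You are a backend engineer.", task),
    },
    "docs_retriever": {
        "description": "Finds usage examples in the component library docs.",
        "run": lambda task: call_llm("Answer from the component library docs.", task),
    },
}
```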

This approach can work well, but in some cases it causes problems. Even to get the happy path working, you have to glue things together and hard-code a few pieces, and you lose the ability to interact with the LLM in natural language. It feels like programming in the pre-LLM era.

Know It All

When we introduce agents to each other, we choose how much information about each agent to share with the rest. Share too little, and the other agents won’t know what to ask for or how to ask for it. Ideally we’d share everything, but context length limits and the needle-in-a-haystack problem usually make that impractical.
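
One common way to do that introduction, sketched below under the same assumptions as the registry above, is to fold each agent’s description into the delegator’s system prompt. How detailed each description can be is exactly the trade-off just described.

```python
def build_delegator_prompt(agents: dict) -> str:
    """Fold every agent's description into one system prompt."""
    lines = ["You can delegate work to these agents:"]
    for name, agent in agents.items():
        lines.append(f"- {name}: {agent['description']}")
    # The DELEGATE(...) format is an arbitrary convention for this sketch.
    lines.append("To delegate, reply exactly: DELEGATE(<agent_name>, <task>)")
    return "\n".join(lines)
```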

This works exactly like human teams. If you’re an intern and your manager asks you to open a ticket with another team, that team might say they don’t handle that issue, use a different platform, or aren’t the right team. It would be much easier if you knew what each team does and how they do it. Knowing the “how” helps a lot, as it enables you to use them in all possible ways.

Humans are good at resolving such issues; LLMs are not. LLMs work with the information they have and often don’t know when they need more. They pattern-match, but they don’t fill in the gaps.

Fill in the Gaps

This is why you need to hard-code certain pieces, like scanning the output for a specific pattern so the task can be routed to another agent. You fill in the gaps for the LLMs.
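
Here’s what that glue can look like, continuing the hypothetical DELEGATE(...) convention from the prompt sketch: a hard-coded regex over the model’s output that routes the task to the named agent. None of this logic lives in the LLM; you wrote it.

```python
import re

# Matches the DELEGATE(<agent_name>, <task>) convention from the prompt sketch.
DELEGATE_RE = re.compile(r"DELEGATE\((\w+),\s*(.+)\)", re.DOTALL)


def route(llm_output: str, agents: dict) -> str | None:
    """Hard-coded glue: find the delegation pattern and forward the task."""
    match = DELEGATE_RE.search(llm_output)
    if match is None:
        return None  # the model didn't ask to delegate
    name, task = match.group(1), match.group(2).strip()
    if name not in agents:
        return None  # the model named an agent that doesn't exist
    return agents[name]["run"](task)
```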

Mono-Agent

You might ask: what’s the ultimate solution to all this? It’s to have just one agent, a mono-agent: a single superhuman employee that knows everything and does everything. There’s no information about other agents to share, because there are no other agents, and the one agent already knows its own capabilities.
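
Below is a minimal sketch of what that can look like. The `RUN:` convention and both stubs are assumptions for illustration, not SWE-agent’s actual interface; the point is the shape of the loop: one conversation, one model, tools instead of teammates.

```python
def call_llm(history: list[tuple[str, str]]) -> str:
    """Stub: send the whole conversation to your LLM, return its reply."""
    raise NotImplementedError("your LLM client goes here")


def run_tool(command: str) -> str:
    """Stub: execute the command in a sandbox, return its output."""
    raise NotImplementedError("your sandboxed execution goes here")


def parse_command(reply: str) -> str | None:
    """Assumed convention: the model prefixes tool calls with 'RUN:'."""
    return reply.removeprefix("RUN:").strip() if reply.startswith("RUN:") else None


def mono_agent(task: str, max_steps: int = 20) -> str:
    history = [("user", task)]
    for _ in range(max_steps):
        reply = call_llm(history)
        history.append(("assistant", reply))
        command = parse_command(reply)
        if command is None:
            return reply  # no tool call: the agent considers itself done
        history.append(("tool", run_tool(command)))
    return "Step limit reached."
```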

You’ve already seen mono-agentic approaches; SWE-agent is one of them. It relies purely on one LLM for all of its knowledge, which is why it navigates tasks so well. But add knowledge it needs beyond what GPT already knows (for example, a custom component library), and you’re back to manually hard-coding agents to fill in the gaps.

This is why anything that depends on outside knowledge takes a lot of effort to build and costs the agent its autonomy. General-purpose agent solutions work well with general-purpose LLMs. For problems that need outside knowledge, you need an LLM fine-tuned on all of that knowledge, so the agents don’t lose their ability to delegate.