Part V
Use, Improve, and Compose Your Agents
Building and improving your agents is a never-ending task. Like software, you can always build more agents or work to make the ones you have better.
After you've worked through Part IV and built your first agent, you've got plenty to do next:
- Execute on the rest of your roadmap
- Extend your roadmap to include all the automatable pieces you didn't initially include
- If you're willing to build them yourself, fill the quality MCP server gaps that remain
Even after all that, there are two ways to really level up your agentic flows: agent performance reviews and agent composability.
Your agents need performance reviews too
When you hire an employee, you expect them to get better over time. In many cases, they won't get better unless you give them feedback.
You should design your agents so that they can work the same way. Doing this effectively gives you a durable advantage over somebody who hasn't. If you use your agent every day and improve it by 1% each time, the compounding adds up fast: after roughly 230 sessions, you'll be working with an agent 10x more efficient than anything someone could recreate in a day.
It's also surprisingly easy to let rather broken behavior fly under the radar for a long time, because agents can be quite good at self-correcting or making efficient logical leaps, but often in undesirable ways! For example, if I give my agent a broken MCP server that is meant to fetch and summarize web URLs, the agent may attempt to use the tool, fail to receive a result, and then decide "I can guess at what content is in the URL based on the URL itself." It will then carry on with its work, and you will never know it skipped that critical step, and has been skipping it every time you run your workflow.
There are several ways to hedge against these pitfalls.
Tactic 1: Just watch the conversation happen
In its most primitive form, you can start to provide feedback by simply watching the Goose chat. As it winds its way through the work, is it doing what you expected? Is it making any wrong turns?
If yes, that's your cue to adjust your prompts. To see those wrong turns as they happen, run the recipe from your terminal and follow along:
#!/bin/bash
# Run the GitHub Sourcer recipe and watch the conversation scroll by in your
# terminal. (CLI syntax varies by Goose version; `goose run --recipe` is what
# current releases document. Check `goose run --help` on yours.)
goose run --recipe ./recipes/sourcer_github.yml

# In a second terminal, follow the raw logs as they are written. The exact
# location varies by platform and version; on Linux it is typically somewhere
# under ~/.local/state/goose/logs/.
tail -f ~/.local/state/goose/logs/cli/*.log
We noticed that the GitHub MCP server was not providing metadata like "this Issue was converted to a new Discussion," and was instead interpreting that scenario as "this Issue was closed as completed." A small prompt tweak (sketched below) helped the agent understand that edge case and make better downstream decisions as a result.
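The exact wording matters less than covering the edge case; the tweak amounted to adding an instruction along these lines to the Sourcer recipe's prompt (this phrasing is illustrative, not the original):

When GitHub reports an Issue as closed, check whether it links to a Discussion. If it does, treat the Issue as "converted to a Discussion" and follow that Discussion for the latest state, rather than treating the thread as resolved and complete.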
Tactic 2: Review raw log files
The chat UI can be difficult to read and parse through. Luckily, Goose outputs all of its raw information into log files on your system. Looked at directly, they are very hard to understand at a glance.
But they are very well-structured, and so it's actually quite easy to get Goose to create you a pretty user interface for analyzing them:
Hey Goose: I have these AI agent log files in ~/.local/state/goose/logs/server that are hard to read. Can you build me a simple web server with a very pretty UI where I can click around and parse through these as a human?
And you'll get back something much better: a simple local web app you can click around in.
After you've finished running through an agent session using Goose, go find those log files and assess how it went.
Or even: write a Goose prompt to do it for you!
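That prompt can be as simple as pointing Goose back at its own logs. Something along these lines works as a starting point (the path is the one from above; adjust it to your setup):

Hey Goose: read today's log files in ~/.local/state/goose/logs/server and review how the session went. Summarize what the agent did, flag any tool calls that failed or were silently skipped, and call out any place where it guessed instead of using a tool.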
Tactic 3: Use a dedicated observability software tool
While getting Goose to create a primitive UI is not a bad way to get started, there are much more robust, well-maintained options out there that can hook into your Goose log files. Langfuse, for example, is an open-source observability platform built specifically for tracing and analyzing LLM applications.
You can apply a tool like that in much the same way you would the raw log files above: review each run, find the wrong turns, and feed the fixes back into your prompts.
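A minimal sketch of that setup, assuming you self-host Langfuse locally with Docker. The environment variable names Goose reads vary by version, so treat the ones below as placeholders and confirm them against the Goose and Langfuse docs:

#!/bin/bash
# Bring up a local Langfuse instance (see https://langfuse.com/docs for the
# current self-hosting instructions).
git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

# Point Goose at it, then start a session so traces begin flowing.
# (These variable names are assumptions; check your Goose version's docs.)
export LANGFUSE_URL=http://localhost:3000
export LANGFUSE_INIT_PROJECT_PUBLIC_KEY=pk-lf-...
export LANGFUSE_INIT_PROJECT_SECRET_KEY=sk-lf-...
goose session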
Composing your agents is the holy grail
In many cases throughout our agent architecture, we're manually clicking around: starting an agent, evaluating its output, starting another agent, and so on. Sometimes this agent-wrangling overhead means a process takes longer than it would have if we had just sat down and done the whole thing manually, end to end.
But this is a temporary state of being. It's necessary while you are still refining your agents and building confidence that they do what you want. You want to stay close to the inputs, the outputs, the scratchpads, and the log files. You want to catch the bugs and tweak the prompts.
Eventually, you'll find that there aren't many bugs left to catch or prompts left to tweak. Every remaining tweak will be a nice-to-have.
You'll get to a point where you are ready to let your Goose fly free.
And that's when composing your agents gets very interesting.
Instead of a recipe for each of GitHub Sourcer, Reddit Sourcer, Discord Sourcer, and so on, we'll eventually compose our agents into a single Sourcer agent. We'll kick off that Sourcer recipe, and a few hours later we'll have hundreds of markdown files summarizing everything that happened across all those sources, meticulously organized to our standards.
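Composition can be as unglamorous as a small orchestration script that runs each sourcer recipe in turn and hands off to a final organizing pass. Here's a sketch under our own conventions: every recipe name other than sourcer_github.yml is hypothetical, the ./sourcing/ output directory is a convention the recipes themselves would enforce, and CLI flags vary by Goose version.

#!/bin/bash
# run_sourcer.sh: compose the individual sourcers into a single Sourcer run.
set -euo pipefail

# Run each sourcer in sequence; each recipe is written so that it drops its
# markdown summaries into ./sourcing/.
for recipe in sourcer_github sourcer_reddit sourcer_discord; do
  goose run --recipe "./recipes/${recipe}.yml"
done

# A final pass that reads ./sourcing/ and organizes everything to our standards.
goose run --recipe ./recipes/sourcer_organize.yml

# How many summaries did we end up with?
ls ./sourcing/*.md | wc -l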
And we can compose again and again, all the way up until the agent boundaries where we need to inject our necessarily human elements.
That's the holy grail: a zero-interventions-needed workflow that you can fire off in the morning, check in at the boundaries to inject your expertise, and ship by lunchtime.
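And if you want to literally fire it off in the morning, a cron entry on the machine running Goose will do it; the script here is the hypothetical orchestrator sketched above:

# Run the composed Sourcer at 7:00 every weekday, logging output for later review.
0 7 * * 1-5  cd /path/to/your/agents && ./run_sourcer.sh >> ./sourcing/cron.log 2>&1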
Putting It All Together →