Summary
Cheatsheet
If you read this handbook end to end, kudos to you. We covered a lot of ground on the way to building agents for our newsletter:
The Human (Tadas): Same Tadas, same expertise, but with lighter Mondays. Now has more time to focus on editorial decisions and crafting narratives that resonate with readers.
Agent (Sourcer): Scours key data sources to catch what humans might miss.
Agent (Organizer): Structures and enriches content, deduplicates and consolidates.
Agent (Drafter): Handles data integration and pulls in stats to support the narrative.
Agent (Polisher): Ensures quality and consistency across every edition.
Agent (Publisher): Manages the technical workflow from draft to published content.
Agent (Sender): Handles email distribution to get the newsletter into readers' inboxes.
We've collated and organized the following principles, tips, and tricks into a few themes below, so you can review and revisit to help you get the most out of your agent-building journey.
1. Wielding Goose Effectively
Goose is unique in its LLM-agnostic, fully configurable, UI-forward approach. You don't have vendor lock-in with a specific model provider. You can use best-of-breed MCP servers instead of relying on native "web search" or "fetch" capabilities. And it has a clean UI, so even nontechnical folks can start taking advantage of AI today.
Use the Claude Code SDK to save on costs. By configuring Claude Code to log in with a Claude Max account (a flat $100 or $200 per month), you can then configure the Claude Code SDK as a model provider in Goose. This gets you Claude Sonnet and Opus at a flat monthly rate. Beware, however, that Goose extensions don't work with the Claude Code SDK, meaning you'll need to manage extensions separately via Claude Code instead of elegantly managing them in your Goose recipes.
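For reference, the switch looks roughly like this in Goose's config.yaml. Treat the provider key (claude-code) and the model name below as assumptions to verify against Goose's provider docs, and note that Claude Code itself must already be installed and logged in to your Claude Max account:

```yaml
# ~/.config/goose/config.yaml (sketch; provider key and model name are assumptions, not verified)
GOOSE_PROVIDER: claude-code   # route Goose's LLM calls through the Claude Code SDK
GOOSE_MODEL: claude-sonnet-4  # or an Opus model, billed under the flat Claude Max plan
```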
Solve "lazy" agents with sequential subagents. If your agent is giving you feedback like, "this is taking me a long time; I'm going to take XYZ shortcut so I can give you an efficient response," a solution is to have it delegate to sequential subagents. Make it break the task apart and delegate each piece down to another agent: that subagent gets many minutes of work done in a single tool call, and then the main agent turns around and does it again with the next subagent.
Solve slow agents with parallelized subagents. The beauty of delegating work to a computer is that you can spin up many copies of it working at once. That tactic doesn't work for everything - such as cases where one slug of work depends on another - but when you find yourself doing something repetitive, say across a set of data entries, your only constraint will be the rate limits of the APIs you are leveraging.
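To make both of the last two tactics concrete, here is a minimal sketch of how the delegation instructions might read inside a main recipe's prompt. The file names, numbering, and wording are illustrative assumptions, not our exact recipe:

```yaml
# Excerpt of a main recipe prompt (illustrative; file names and wording are assumptions)
prompt: |
  Do not try to process everything yourself in one pass.
  1. Split sources.md into one work item per data source.
  2. For items that are independent of each other, dispatch one subagent per item
     so they run in parallel, staying within the rate limits of the APIs involved.
  3. For steps that depend on an earlier step's output (for example, deduplication
     after collection), run sequential subagents: each subagent finishes its chunk
     in a single tool call, then hand the next chunk to a fresh subagent.
  4. Never take shortcuts to "save time"; delegate the work instead.
```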
Break your Goose agents into sub-recipes. While it's possible to make things work with one big "happy path" built as a single Goose recipe, it's much easier to maintain an automation process by splitting it into sub-recipes. You can even run the sub-recipes independently of the main recipe, allowing you to home in on subtle issues and quirks that you might not notice if you only ever run the top-level recipe.
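For example, a top-level recipe can reference its stage recipes roughly like this. This is a sketch assuming Goose's sub-recipe syntax; the recipe names and paths are illustrative:

```yaml
# main-newsletter.yaml (sketch; sub_recipes fields assume Goose's sub-recipe syntax, paths are illustrative)
version: 1.0.0
title: Weekly newsletter run
description: Orchestrates the individual stage recipes end to end
sub_recipes:
  - name: sourcer
    path: ./recipes/sourcer.yaml
  - name: organizer
    path: ./recipes/organizer.yaml
  - name: drafter
    path: ./recipes/drafter.yaml
prompt: |
  Run the sourcer, organizer, and drafter sub-recipes in order,
  passing each stage's output file to the next stage as its input.
```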
Goose's "agentic loop" should eventually abstract away subagents and scratchpads. Parts of our handbook are still fairly technical. We expect the most technical parts - where we discuss subagents, parallelization, scratchpads, and other implementation details in Part 4 and Part 5 - to disappear or get easier with time. But the human-agent translation layer from Part 1 through Part 3 will remain relevant indefinitely.
2. Using MCP Effectively
MCP is a future-proof piece of your stack. When you build on an AI-powered platform that doesn't use MCP, you are locking yourself into that platform. With MCP, by contrast, you're assembling your solution from today's best-of-breed providers, every step of the way. You can build a voice agent using the front-running startup's solution and adopt web search from the frontrunner in that domain. And in a year, when one of them goes under, your workflow will continue to work; you just swap out the MCP server. Or even better: the successful companies will continue to improve their products and MCP servers, so your workflow will "self-improve" without any additional investment on your part.
Only enable the MCP servers you need for each sub-recipe. This is the most reliable way to avoid context poisoning, save LLM tokens, and make tool selection more accurate. There are very few scenarios where you actually need dozens of MCP servers active for a single subagent flow - so don't overload your subagent; give it only the tools it needs to accomplish its job.
If you're one of the thousands of people using the default "fetch" server, consider Pulse Fetch instead. It's a free alternative built to reliably extract clean data from webpages while bypassing anti-bot detection. Optional bring-your-own-API-key upgrades make it particularly robust. We think some of the design principles we're baking into Pulse Fetch are core to how many quality MCP servers will work in the future; give it a look even if you don't yet have a use case for it.
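As a sketch of both points, a sourcing sub-recipe might enable only the single fetch server it needs. The extension fields below follow Goose's stdio-extension format as we understand it, and the package name is a placeholder, so check the Pulse Fetch README for the real install command:

```yaml
# Extensions block of a sourcing sub-recipe (sketch; the package name is a placeholder)
extensions:
  - type: stdio
    name: pulse-fetch                       # replaces the default "fetch" server
    cmd: npx
    args: ["-y", "pulse-fetch-mcp-server"]  # placeholder package name - check the Pulse Fetch docs
    timeout: 300
# Nothing else is enabled: this sub-recipe only reads web pages, so it gets no email,
# publishing, or other tools.
```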
3. Workflow Automation Strategy (Defining The Problem To Solve)
Personal automation agents are already accessible to nontechnical builders. Software engineers capable of tinkering with code are still reaping the lion's share of the AI industry's automation gains. But the user interaction models of agentic framework apps like Goose are steadily expanding who is capable of wielding AI, today.
Don't try to change your workflow to adapt to AI; make AI adapt to your workflow. If you're already using some SaaS, don't try to replace it. If you have a certain way you format reports to hand off to someone else, don't change the formatting to accommodate your AI tool. Meet your workflow where it's at. When you do this, you unlock the ability to iteratively enhance it into full automation, and you de-risk the possibility of neutering your workflow's value through tool and process thrashing.
Don't try to automate a workflow that doesn't exist yet. A common mistake AI builders make is to pursue a flashy demo with impressive technical capabilities that nobody would actually use in practice. It's very easy to fall into this trap, so we suggest focusing where you know there's already value: automate existing workflows; don't create new workflows that'll just slow your team down.
"Agents" are copilots, not replacements. It's a common misconception that an "agent" is designed to be a replacement for some human role. The reality is that agents are best when deployed as copilots to the humans already doing that role. Agents can't be good at everything - partly due to today's technical limitations, and partly because, even in the long term, they will fall short on needs so heavily predicated on human-to-human trust. Slotting them in as autonomous assistants with well-scoped responsibilities is how you get the most value out of them.
Sequence your roadmap so you are getting value at every step of the way. Workflow automation projects can take a long time. It's rarely a feat you will successfully accomplish in a single week. As such, a critical consideration when you are planning your roadmap to automation is: how do I make sure that I have a positive ROI on step 1, and then again on step 2? If you plan such that everything needs to "come together" at the very end, you are likely to drown in new work created by your automation-building without ever getting the chance to complete it.
Benchmark your costs against hiring headcount. There is a difference between replacing a trusted existing employee and replacing an employee yet to be hired. For many companies, a new hire comes in with perhaps a few critical tasks, but otherwise plenty of excess capacity that gets filled over time. Our process for putting out the Weekly Pulse has steps we would love to delegate to a part-time junior employee. Such an employee would easily cost thousands per month, not to mention recruiting, retention, and management costs. So if our API fees for this work stay under a few thousand dollars per month, the cost calculus is a no-brainer.
Human-to-human trust is core to what you can't automate. You don't believe everything you read online. Increasingly, you only trust what certain influencers have to say, or the writing of teams and companies with a storied history of reliable reporting and a background you trust to be free of bias and misaligned incentives. It all boils down to trust: of an individual, or of a brand.
4. Agent Building Strategy (Architecting The Solution)
When deciding what to automate from your workflow, write every step down in excruciating detail. The human brain is extremely capable of multitasking and making a series of complex judgments all at once - and we often forget just how many inputs and decisions go into even the simplest of our regular work tasks. A critical first step is to take excruciatingly detailed notes up front, while you are actually performing the workflow in question, the way you would if you were planning to onboard a very junior employee.
Design agents with clear boundaries of separation. Consider what the inputs and outputs of each one are, and split the work across as many agents as possible while each one still does a meaningful slug of work. This approach creates a number of downstream benefits: iterative value from your automation, easier debugging when problems happen, and a workaround for today's LLM context window limitations.
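One way to make those boundaries explicit is to declare each agent's inputs and outputs as recipe parameters, so each stage reads one artifact and writes another. A minimal sketch, assuming Goose's recipe parameter fields; the file names are illustrative:

```yaml
# organizer.yaml excerpt (sketch; parameter fields assume Goose's recipe format, file names are illustrative)
parameters:
  - key: input_file
    input_type: string
    requirement: required
    description: Raw items collected by the sourcer (e.g. sourced-items.json)
  - key: output_file
    input_type: string
    requirement: optional
    default: organized-items.json
    description: Deduplicated, enriched items handed off to the drafter
```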
There exist more advanced combinations of agents and subagents. The workflows we shared here with our newsletter curation flow are fairly linear: it's a first step, followed by a second step, followed by a third, etc. We recommend you get started by automating a straightforward workflow like this. But in advanced cases, you may have more complex needs. Read Anthropic's post on Building Effective Agents to learn how to adapt - you can extend what you learn in this handbook to start configuring and prompting Goose to execute on those frameworks.
5. Agent Building Implementation
Start by building a happy path for your agent. Don't over-engineer out of the gate, trying to predict the proper subagents and sub-recipes you'll need to accomplish an agent's task. Dig into building a monster prompt and iterate until it mostly works - then get into the optimizations that make it scale properly and stay maintainable in the long run.
Nudge your agents to use a scratchpad. Some agent frameworks may do this by default, but if they don't, having the agent keep a detailed "to-do list" as it goes is a great way to keep it on track and stop it from going off the rails as it progresses through a long-running task.
Don't allow your agent to modify input files. By instructing your agent to avoid modifying its own "inputs" and instead produce net-new "outputs", you can easily "restart" when things go wrong (especially while you are still building your agent), often accompanied by some tweak to your prompting.
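Putting this and the previous tip together, the relevant portion of a recipe prompt might read something like the following sketch. The wording and directory layout are illustrative assumptions, not our production prompt:

```yaml
# Recipe prompt excerpt combining the scratchpad and read-only-inputs tips (wording and paths are illustrative)
prompt: |
  Keep a scratchpad at ./work/todo.md: write your full plan there first,
  and check items off as you complete them.
  Treat everything under ./inputs/ as read-only. Write all results as new
  files under ./outputs/ so a failed run can simply be restarted after a
  prompt tweak, without cleaning up mangled inputs.
```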
"Close the loop" for your agents wherever you can. Try not to rely on yourself to "check the agent's output." Instead, give the agent a set of criteria for what it means to complete the task. In development, that may mean "load up the final page and make sure the feature you implemented works as we intended." In writing, that may mean "check the final draft for X, Y, and Z quality check qualities." And tell your agent to "keep working until all these final checks pass".
Give your agent a regular performance review. Just like humans, agents need recurring evaluation and feedback. You'll identify bugs, edge cases you didn't design your prompts for, subtle breakages in your configuration, and opportunities for improvement. Read conversation history, review agent log files, or use third party observability tools to accomplish this.
Compose your agents after you've been using them a while. When you've been using your individual agents successfully for a few rounds of work, and you're no longer tweaking your prompts and configurations every time you run them, you might be ready to start composing your agents. Turn your agents into subagents of a new main agent - and you're one step closer to that holy grail where you, as the human, are only doing the work you are uniquely qualified to do.
6. Agent Building Limitations
Consider deferring automation where precision is important. LLMs are inherently nondeterministic. Even if one is right 99 times out of 100, is that good enough for your trust to be preserved? When writing our newsletter, it's probably acceptable if we source 99 news items out of a possible 100 out there. But it's not acceptable for us to publish final copy where 1 out of 100 blurbs is incorrect; so don't expect to see us outsourcing the "proofreading" stage of our newsletter any time soon.
Consider deferring automation where there are technical gatekeepers. Not every piece of external software will take kindly to your attempt to insert an agent in your stead. Instacart makes a lot of money running ads in their UI, so they won't want you programmatically dodging them. Using automation on Craigslist is a violation of their ToS, because that's how they combat spam. In our case, our SaaS email provider does not expose a REST API and has indicated to us that automating their web UI is a violation of their ToS. Rather than fight that, we're migrating to an alternative where no such misalignment exists.
Consider deferring automation when there is no good MCP server with which to integrate. The unfortunate reality is that most of the thousands of MCP servers out there today are poorly designed. If you're a product builder, you can remedy this yourself by filling the gaps: we're doing so for several of our SaaS tools. If you're not a builder, then the reality is that you'll want to make sure there's a popular, production-ready MCP server for the system(s) your workflow requires you to integrate with.