JDML

AI Tooling

Block's Goose: an open-source coding agent worth running locally

· 3 min read

Goose is Block's open-source, on-machine AI developer agent. It runs locally, connects to your development tools via Model Context Protocol (MCP), and handles the kind of multi-step engineering tasks that most AI assistants fumble: running terminal commands, reading files, writing and running tests, and iterating until something works. It's genuinely different from a chat interface with a code block.

What makes Goose different

Most AI coding tools are glorified autocomplete with a chat panel. Goose is an agent that operates your development environment. It reads your codebase, runs commands, checks test output, and loops until the task is done or it gets stuck and asks for help. The MCP integration means it can be extended to connect to any tool that has an MCP server, which is a growing list.
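Extension wiring happens in Goose's config file. The fragment below is an illustrative sketch of registering an MCP server as an extension: the `mcp-server-fetch` server and the exact field names are assumptions drawn from common MCP tooling, and the schema may differ between Goose versions, so check the docs for your install.

```yaml
# ~/.config/goose/config.yaml (illustrative sketch -- verify against your version's docs)
extensions:
  fetch:                        # local name for the extension
    enabled: true
    type: stdio                 # talk to the MCP server over stdin/stdout
    cmd: uvx                    # command that launches the server
    args: ["mcp-server-fetch"]  # example MCP server; swap in any server you use
```

The pattern is the point: any tool with an MCP server plugs in the same way, without Goose needing to know about it in advance.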

Performance with recent models

Goose is model-agnostic, and the difference between models is noticeable. Recent OpenAI models have run particularly well, handling the tool-use loops cleanly and recovering from errors without needing to be re-prompted. The combination of a capable recent model with Goose's local environment access covers most of the engineering workflow we'd normally handle manually.
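Switching models is a one-line change. Goose reads its provider and model from configuration or environment variables; the sketch below uses the `GOOSE_PROVIDER` and `GOOSE_MODEL` variables from the Goose docs, with a placeholder model id — substitute whatever your provider offers.

```shell
# Point Goose at a provider/model pair via environment variables.
# The model id below is a placeholder, not a recommendation.
export GOOSE_PROVIDER="openai"
export GOOSE_MODEL="gpt-4o"
# then start a session as usual, e.g.: goose session
```

This is what makes the model comparison above cheap to run yourself: same task, same repo, different `GOOSE_MODEL`.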

Honest caveats

It still requires oversight. An agent that can run arbitrary terminal commands can also make messes that take time to clean up. We only run it in repositories under version control, and we start every session from a clean working tree. Context management over long tasks is imperfect, and the agent occasionally loops on a problem it should ask about. None of this is fatal, but it's worth knowing going in.
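The clean-working-tree habit is easy to automate. This is our own convention, not a Goose feature: a small shell guard that refuses to proceed if `git status --porcelain` reports uncommitted changes, so anything the agent breaks is one `git checkout` away from undone.

```shell
# Pre-session guard (our convention, not part of Goose): only start an
# agent session when the git working tree has no uncommitted changes.
clean_tree() {
  # Prints "clean" and returns 0 when the tree is clean, else "dirty" and 1.
  if [ -z "$(git status --porcelain 2>/dev/null)" ]; then
    echo clean
  else
    echo dirty
    return 1
  fi
}

# Typical wrapper: clean_tree && goose session
```

The `--porcelain` flag gives stable, script-friendly output, which is why we check it rather than parsing the human-readable `git status`.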

Our take

Goose is the most useful local AI engineering tool we've run regularly. Open-source, extensible via MCP, and genuinely useful for the kind of repetitive engineering work that slows you down. The gap between Goose and a senior engineer on an unfamiliar codebase is still large, but the gap between Goose and no AI tooling is larger. Worth running.

Building something in this space? Let's talk.

We spend a lot of time with these tools. If you're trying to figure out which model fits your workload, we're happy to share what we've learned.

Get in touch