Techniques to improve coding agent velocity
April 3, 2026
This post is a grab bag of techniques for improving engineering velocity that I haven't seen discussed much online. Coding agent autonomy is integral to your engineering velocity, and therefore to your company's success. As a friend of mine once said, "we're all just playing Factorio now".
Share transcripts across your team
Context about why certain design decisions were made is often available from past agent transcripts. As your team scales, most of this context is not readily available on your machine. We built an internal tool that uploads past chat transcripts to a shared MCP server and gave our coding agents a tool to query historical conversations.
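The retrieval side of this can be surprisingly simple. Here is a minimal, stdlib-only sketch of the query tool our agents call (the MCP server wiring is omitted, and the JSONL schema shown is illustrative, not our exact format): a keyword-overlap search over one transcript file per session.

```python
import json
import re
from pathlib import Path

def search_transcripts(transcript_dir: str, query: str, top_k: int = 3) -> list[dict]:
    """Naive keyword search over persisted transcripts (one JSONL file per session).

    Each line is assumed to look like {"role": "...", "text": "..."}.
    A real deployment would embed and rank semantically; keyword overlap
    is enough to sketch the idea.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    hits = []
    for path in Path(transcript_dir).glob("*.jsonl"):
        for line in path.read_text().splitlines():
            msg = json.loads(line)
            words = set(re.findall(r"\w+", msg["text"].lower()))
            score = len(terms & words)
            if score:
                hits.append({"session": path.stem, "text": msg["text"], "score": score})
    return sorted(hits, key=lambda h: -h["score"])[:top_k]
```

Exposed as an MCP tool, this is enough for an agent to pull up "what did Sam already try?" on demand.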
Here's an example of this in practice: while Sam and I were pairing on a memory leak bug, Claude Code read his transcripts from earlier in the debugging session to get up to speed.
→ "Based on the activity logs, here's what Sam concluded about the memory leak:

## The Smoking Gun Finding [2026-03-31 23:48:35]
Sam identified that **in a single minute (23:40:00-23:40:03), there were 10 `file_cache` messages = 10 `build()` calls**, but there were NO webhook handler logs, …" (+15 lines)
Agent: Now I have the picture. Sam identified the _draft_refresh_loop as a key source of build() calls, and moved GmailClient to build-once-in-__init__. But the metrics show the leak is still active and actually worse — monotonic growth at 4.5-7.1 MB/min across all post-fix instances. The reason is clear: Sam fixed GmailClient close() in _get_new_message_ids and _process_messages, but CalendarClient has no close() in any of those paths, and _refresh_stale_drafts creates both clients per draft with no close() on either. Let me re-apply my __del__ fix on top of Sam's commits.
Eval: We measured the impact of transcript sharing on a real four-session memory leak investigation. In the final debugging session, 48.6% of the work was spent re-investigating application-level fixes that earlier sessions had already tried and ruled out.
Remove blockers to autonomous action
Here are good reasons for a coding agent to stop looping:
- It's done with the task
- It needs clarification on the scope of the task
Here are bad reasons for stopping:
- It needs the user to unblock it
A general principle I've found helpful is: "if the agent asks me for something it could theoretically know the answer to, make sure it has access to that answer next time." Here is a summary of common preventable reasons the coding agent stops prematurely, and how we've addressed them:
| Reason for stopping | Fix |
|---|---|
| Finishes designing UI and asks me how it looks | Give it Playwright MCP / Chrome DevTools MCP so it can test the UI autonomously |
| Needs access to production resources | Render / Neon MCP |
| Needs to test a "real-world" capability | Give it access to the real world (e.g. give it an email address) |
| Needs permission to do something dangerous (e.g. reading env vars) | Use auto mode; prompt it to never ask for permission |
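On the last row: Claude Code lets you pre-approve tool permissions in `.claude/settings.json`. The fragment below is a sketch from my own setup, not a spec — the exact rule syntax varies by version, so check the current permissions docs before copying it:

```json
{
  "permissions": {
    "allow": [
      "Bash(env:*)",
      "Read(.env)",
      "mcp__playwright__browser_take_screenshot"
    ]
  }
}
```

Allowlisting the handful of commands your agent repeatedly stops to ask about removes most of the "needs permission" interruptions without granting blanket access.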
Here are some examples of these fixes in practice:
Autonomous UI testing with Playwright MCP — instead of asking "how does this look?", the agent takes screenshots and iterates:
→ mcp__playwright__browser_navigate({"url": "http://localhost:3000"})
→ mcp__playwright__browser_take_screenshot({"filename": "mobile-before.png"})
...agent edits CSS...
→ mcp__playwright__browser_take_screenshot({"filename": "mobile-after.png"})
Agent: Much better. The empty space is gone and the email card demo is now visible on mobile. Let me also check tablet and desktop breakpoints.
→ mcp__playwright__browser_take_screenshot({"filename": "desktop-after.png"})
→ mcp__playwright__browser_take_screenshot({"filename": "tablet-after.png"})
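The screenshot-edit-screenshot loop above can be sketched as a small driver. In this sketch the `take_screenshot` and `apply_css_fix` callables are stand-ins for the Playwright MCP calls (`browser_take_screenshot`, the agent's own edits), so the loop logic is shown without needing a live browser:

```python
from typing import Callable

def ui_fix_loop(take_screenshot: Callable[[str], bytes],
                apply_css_fix: Callable[[], None],
                breakpoints: list[str]) -> dict[str, bool]:
    """Capture 'before' shots, apply the fix, then verify every breakpoint changed.

    Returns breakpoint name -> whether the rendered page changed, mirroring how
    the agent checks mobile, tablet, and desktop in turn instead of asking
    "how does this look?".
    """
    before = {bp: take_screenshot(f"{bp}-before.png") for bp in breakpoints}
    apply_css_fix()
    after = {bp: take_screenshot(f"{bp}-after.png") for bp in breakpoints}
    return {bp: before[bp] != after[bp] for bp in breakpoints}
```

In practice the agent goes further than a byte diff: it looks at the screenshots and judges the layout itself, then loops until it's satisfied.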
Production access via Render MCP — instead of asking me to check prod, the agent pulls live metrics and logs directly:
→ mcp__render__select_workspace({"ownerID": "tea-d1vru795pdvs73edkhi0"})
→ mcp__render__get_metrics({"resourceId": "srv-d6s55t1j16oc73eih6i0", "metricTypes": ["memory_usage", "memory_limit"], "startTime": "2026-03-31T00:00:00Z"})
→ {"metrics": [{"type": "memory_usage", "data": [...]}]}
→ mcp__render__list_logs({"serviceId": "...", "filter": "draft_composer"})
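Once the agent can pull metrics itself, confirming a leak is just arithmetic over the samples. A minimal sketch, assuming the `get_metrics` response has been reduced to `(minutes, MB)` pairs, fits a least-squares slope:

```python
def growth_rate_mb_per_min(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of memory usage over time.

    samples: (minutes_since_start, memory_mb) pairs. A steady positive slope
    across restarts is the signature the agent looked for (4.5-7.1 MB/min
    in the investigation above).
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den
```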
Real-world testing via Gmail API — instead of asking me to send a test email, the agent does it itself:
Agent: Watch is renewed. No webhook yet — let me send the test email for you. I'll need to use a session token to access the Gmail API through the control plane.
...agent restarts local server, registers auth session, queries Gmail API...
→ Returns real inbox emails: "wanna hang tmrw?" from htdowling@gmail.com
Agent: There are two test scheduling emails already in your inbox. The webhooks hit the old server and returned 200, but the background task ran on that process.
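Giving the agent an email address means giving it the Gmail API, and the send step boils down to base64url-encoding a MIME message into the `raw` field that `users.messages.send` expects. A stdlib-only sketch of building that payload (the API client and OAuth setup are omitted, and the addresses are made up):

```python
import base64
from email.message import EmailMessage

def gmail_raw_payload(sender: str, to: str, subject: str, body: str) -> dict:
    """Build the {'raw': ...} request body for Gmail's users.messages.send."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    # The Gmail API wants the full RFC 2822 message, base64url-encoded.
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"raw": raw}
```

With credentials in place, the agent can send itself a test email, wait for the webhook, and verify the whole pipeline end to end.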
Auto mode for dangerous operations — instead of asking permission to read env vars, the agent just checks:
Agent: ANTHROPIC_API_KEY is not in .env. Let me check further.
→ Bash: env | grep ANTHROPIC_API_KEY | head -c 30
→ ANTHROPIC_API_KEY=sk-ant-api03...
Agent: It's set in your shell environment (not .env). Now adding the note to CLAUDE.md.
Eval: Across 738 transcripts, 11.4% ended with a preventable stop. Roughly 30% of stops are directly addressable by adding tools.
Automatically auditing prod agent transcripts
Evaluating transcripts as well as final outputs is important, since a coding agent can arrive at a correct output despite hitting legitimate blockers along the way. For example, if you ask an agent to edit a PowerPoint and it's unable to make effective edits to the .pptx XML, it may resort to recreating the deck in HTML and then converting the HTML to .pptx.
To facilitate this kind of evaluation, we persist production transcripts and run coding agents that automatically audit every transcript our product generates, checking whether the agent got blocked or confused. The auditor then produces a list of incidents for us to review.
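In our setup the auditor is itself a coding agent, but the shape of its output is easy to sketch deterministically. The toy audit pass below (error patterns and field names are illustrative) counts errors and tool calls per transcript, scores "confusion", and emits a ranked worklist:

```python
# Illustrative error strings; the real list is product-specific.
ERROR_PATTERNS = ("Shape not found", "Chart not found", "timeout", "fetch failed")

def audit_transcripts(transcripts: dict[str, list[str]]) -> list[dict]:
    """Rank transcripts (id -> list of tool-call result strings) by confusion.

    Confusion here is just errors plus a small penalty per tool call, so a
    transcript that thrashes with many calls and many errors floats to the top
    of the review queue.
    """
    report = []
    for tid, calls in transcripts.items():
        errors = sum(1 for c in calls if any(p in c for p in ERROR_PATTERNS))
        report.append({"id": tid, "tool_calls": len(calls), "errors": errors,
                       "confusion": errors + 0.1 * len(calls)})
    return sorted(report, key=lambda r: -r["confusion"])
```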
You might wonder why we don't close the loop completely and let the auditor improve the product automatically. The answer is that, today, this tends to confuse the agent and produce unhelpful change suggestions. I'd estimate that sometime this year this issue will disappear and fully autonomous improvement will become reliable.
Here's an example of this technique in practice:
2. abe224f — Scanned all 18 subagent transcripts for errors. Found 353 total tool calls, ~38 real errors across categories: "Shape not found" (16x), "Chart not found" (6x), sibling cascades (7x), MCP timeouts (3x), fetch failures (3x), NaN serialization (1x), Aspose bugs (2x).
3. ad33413 — Deep-dived into confusion/wasted effort. Ranked all 18 agents by confusion score. Top offenders: slide 13 (38 tool calls, 7 errors), slide 17 (24 calls, 6 errors), slide 10 (29 calls, 3 errors). Identified 6 systemic issues.
4. acfe8f8 — Investigated the worst agent (a12fc6e, slide 17) specifically. Found it destroyed a doughnut chart by ungrouping it, then had to recreate it as a simpler pie chart.
Eval: To test reproducibility, we independently re-audited 5 of the 18 subagent transcripts and compared to the original audit's labels.
At Fergana Labs, we're helping coding agents to do great knowledge work, like creating and editing presentations. If this sounds interesting to you, please do reach out and say hi to me at henry@ferganalabs.com.