CodeWithLLM-Updates
-

Two years ago, programming models behaved like a genie — you’d ask them for something, and they’d do it technically correctly but with a catch. To combat this, many "harnesses" (wrappers) were devised. Apps like Cursor were pioneers in exploring how to do this effectively.

2026 models have become significantly more obedient, so, as I wrote earlier, the AGENTS.md file is no longer as critical. Another recent example is Vercel, which removed 80% of the specialized tools from its internal text-to-SQL agent, leaving only a single "execute bash" tool in a sandbox (https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools).
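Vercel's post doesn't include their implementation; as an illustration only, the entire tool surface of such an agent can shrink to one function. In this sketch the function name, the cwd-based confinement, and the timeout are my assumptions (a production sandbox would isolate far more than the working directory):

```python
import subprocess

def run_bash(command: str, sandbox_dir: str, timeout: int = 30) -> str:
    """The agent's single tool: run a shell command inside a sandbox
    directory and return combined stdout/stderr for the model to read."""
    result = subprocess.run(
        ["bash", "-c", command],
        cwd=sandbox_dir,   # confine the working directory to the sandbox
        capture_output=True,
        text=True,
        timeout=timeout,   # keep runaway commands from stalling the agent
    )
    return (result.stdout + result.stderr).strip()
```

The model composes whatever it needs (grep, psql, curl) out of bash itself, instead of the harness pre-defining a tool per capability.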

We are learning to simplify the architectures we over-engineered over the past two years, using minimal tools to avoid hindering powerful models.

NxCode Team on AI Agent Operations
https://www.nxcode.io/resources/news/harness-engineering-complete-guide-ai-agent-codex-2026
The guide explains the harness as a "bridle + saddle + reins" for a powerful but uncontrolled "horse" (the model). An example is LangChain, which boosted a coding agent from 52.8% to 66.5% on Terminal Bench without changing the model, purely through middleware (self-verification, loop detection, context mapping).
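The post doesn't detail how LangChain's middleware works; as a hypothetical sketch, loop detection can be as simple as counting repeated tool calls in a sliding window (the class, window size, and threshold here are all illustrative assumptions):

```python
from collections import deque

class LoopDetector:
    """Middleware sketch: flag an agent that keeps issuing the same
    tool call, so the harness can interrupt or inject a nudge."""

    def __init__(self, window: int = 6, threshold: int = 3):
        self.recent = deque(maxlen=window)  # last N (tool, args) calls
        self.threshold = threshold

    def observe(self, tool_name: str, args: str) -> bool:
        """Record a tool call; return True if it looks like a loop."""
        call = (tool_name, args)
        self.recent.append(call)
        return self.recent.count(call) >= self.threshold
```

When `observe` returns True, the harness might truncate the run or append a message like "you've tried this three times; try a different approach."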

Agents fail not because of model quality, but because of a poor harness.

It’s important to add that an ideal harness won't save a weak model.

OpenAI on Harness Engineering
https://openai.com/index/harness-engineering/
They state that in the world of agents, the engineer's role is shifting from "writing code" to "managing the environment," where humans steer the direction and agents execute.

The most important thing now is not just a high-quality model, but the environment:
– A structured docs/ folder as the single source of truth,
– A short AGENTS.md (~100 lines) instead of a massive prompt,
– Mechanical linters + CI that check invariants (architecture rules, naming, file size, etc.),
– A "doc-gardening" agent that automatically fixes outdated documentation.
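OpenAI's post doesn't spell out the checks; a "mechanical linter" can be a tiny script run in CI. This sketch enforces two invented example invariants (the 400-line limit and lowercase naming rule are my illustrations, not OpenAI's rules):

```python
import pathlib

MAX_LINES = 400  # example invariant: keep files small enough for agents

def check_invariants(root: str) -> list[str]:
    """Collect violations of simple, mechanically checkable repo rules."""
    errors = []
    for path in pathlib.Path(root).rglob("*.py"):
        lines = path.read_text().splitlines()
        if len(lines) > MAX_LINES:
            errors.append(f"{path}: {len(lines)} lines (max {MAX_LINES})")
        if not path.stem.islower():
            errors.append(f"{path}: file name should be lowercase")
    return errors
```

A CI job would run this over the repo and fail the build on any violation, so the agent gets deterministic feedback instead of a human reviewer's opinion.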

A single Codex run can last up to 6 hours (often overnight). Therefore, it’s better to have all knowledge contained within the repository (versioned artifacts). No external chats or verbal discussions.

Discussion on HN about Harness Engineering
https://news.ycombinator.com/item?id=46988596
Can Bölük (author of https://github.com/can1357/oh-my-pi) took 16 different LLMs and ran each twice on the same benchmark of fixing real bugs in a React app. He changed only one tool: the file editing format. Instead of apply_patch / str_replace, he introduced Hashline (each line gets a short hash, and the model edits by hash rather than by matching text). From this change alone, 14 of the 16 models improved their results.
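The core idea is easy to sketch. The exact format oh-my-pi uses may differ; this toy version (hash length, separator, and duplicate-line handling are my assumptions) shows why it helps: the model names a stable 6-character tag instead of reproducing a line of text verbatim.

```python
import hashlib

def _tag(line: str) -> str:
    """Short content hash used to address a line."""
    return hashlib.sha1(line.encode()).hexdigest()[:6]

def hashline_view(text: str) -> str:
    """Render a file the way the model would see it: each line
    prefixed with its short hash."""
    return "\n".join(f"{_tag(line)}| {line}" for line in text.splitlines())

def apply_edit(text: str, target_hash: str, replacement: str) -> str:
    """Replace the first line whose short hash matches target_hash."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if _tag(line) == target_hash:
            lines[i] = replacement
            return "\n".join(lines)
    raise ValueError(f"no line with hash {target_hash}")
```

With str_replace, one wrong space in the quoted original text makes the edit fail; with a hash address, the edit either lands exactly or fails loudly.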

The primary skill for a developer now is designing the harness, not writing code manually. Many commenters confirm that Hashline gives agents a significant boost.

A conspiracy theory from the thread: "Companies intentionally keep the best harnesses secret so that token consumption doesn't drop." In recent weeks, Anthropic and Google have been banning custom harnesses; the post's author was even cut off from Gemini mid-benchmark.