CodeWithLLM-Updates
-

Two years ago, programming models behaved like a genie — you’d ask them for something, and they’d do it technically correctly but with a catch. To combat this, many "harnesses" (wrappers) were devised. Apps like Cursor were pioneers in exploring how to do this effectively.

2026 models have become significantly more obedient, so, as I wrote earlier, the AGENTS.md file is no longer as critical. Another recent example is Vercel, which removed 80% of specialized tools from its internal text-to-SQL agent, leaving only a single "execute bash" in a sandbox (https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools).

We are learning to simplify the architectures we over-engineered over the past two years, using minimal tools to avoid hindering powerful models.

NxCode Team on AI Agent Operations
https://www.nxcode.io/resources/news/harness-engineering-complete-guide-ai-agent-codex-2026
Explains the harness as a "bridle + saddle + reins" for a powerful but uncontrolled "horse" (the model). An example is LangChain, which boosted a coding agent from 52.8% to 66.5% on Terminal Bench without changing the model—only through middleware (self-verification, loop detection, context mapping).

Agents fail not because of model quality, but because of a poor harness.

It’s important to add that an ideal harness won't save a weak model.

OpenAI on Harness Engineering
https://openai.com/index/harness-engineering/
They state that in the world of agents, the engineer's role is shifting from "writing code" to "managing the environment," where humans steer the direction and agents execute.

The most important thing now is not just a high-quality model, but the environment:
– A structured docs/ folder as the single source of truth,
– A short AGENTS.md (~100 lines) instead of a massive prompt,
– Mechanical linters + CI that check invariants (architecture rules, naming, file size, etc.),
– A "doc-gardening" agent that automatically fixes outdated documentation.
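One concrete shape such a "mechanical invariant" check can take is a tiny script run in CI. This is a hedged sketch, not anything from OpenAI's post: the 300-line limit, the src/ root, and the *.py glob are all illustrative assumptions.

```python
import pathlib
import sys

MAX_LINES = 300  # assumed project rule, enforced mechanically rather than by prompt


def check_file_sizes(root: str) -> list[str]:
    """Return a violation message for every Python file over the line limit."""
    violations = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        n = len(path.read_text(encoding="utf-8").splitlines())
        if n > MAX_LINES:
            violations.append(f"{path}: {n} lines (limit {MAX_LINES})")
    return violations


if __name__ == "__main__":
    # Non-zero exit fails the CI job, so the invariant is checked on every push.
    problems = check_file_sizes("src")
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)
```

The point of keeping it this dumb is that the agent cannot argue with it: the rule either holds or the build is red.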

A single Codex run can last up to 6 hours (often overnight). Therefore, it’s better to have all knowledge contained within the repository (versioned artifacts). No external chats or verbal discussions.

Discussion on HN about Harness Engineering
https://news.ycombinator.com/item?id=46988596
Can Bölük (author of https://github.com/can1357/oh-my-pi) took 16 different LLM models and ran them twice on the same benchmark for fixing real bugs in a React app. He changed only one tool—the file editing format. Instead of apply_patch / str_replace, he introduced Hashline (each line gets a short hash, and the model edits by hash rather than text). From this change alone, 14 out of 16 models improved their results.
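The Hashline idea can be sketched roughly like this. This is a minimal illustration of the concept, not the oh-my-pi implementation; the hash width and the `hash| line` rendering are my assumptions.

```python
import hashlib


def line_hash(idx: int, line: str, width: int = 6) -> str:
    # Hash the index together with the content so identical lines get distinct ids.
    return hashlib.sha1(f"{idx}:{line}".encode()).hexdigest()[:width]


def render(text: str) -> str:
    # What the model sees: every line prefixed with its short hash.
    return "\n".join(
        f"{line_hash(i, line)}| {line}" for i, line in enumerate(text.splitlines())
    )


def edit_by_hash(text: str, target: str, replacement: str) -> str:
    # The model addresses a line by hash instead of reproducing its exact text,
    # which sidesteps the whitespace/escaping mismatches that break str_replace.
    lines = text.splitlines()
    return "\n".join(
        replacement if line_hash(i, line) == target else line
        for i, line in enumerate(lines)
    )
```

A stale hash simply fails to match (the edit is a no-op), which gives the agent an unambiguous signal that its view of the file is outdated.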

The primary skill for a software developer now is designing the harness, not writing code by hand. Many commenters confirm that Hashline gives agents a significant boost.

Conspiracy theory: "Companies intentionally keep the best harnesses secret to avoid decreasing token consumption." In recent weeks, Anthropic and Google have been banning custom harnesses; even the post's author was cut off from Gemini during his benchmark.

Some people are already tired of increasingly heavy tools like Claude Code or Cursor, where more and more features are unnecessary, prompts are massive, and everything is hidden.

Pi Agent
https://shittycodingagent.ai/ https://pi.dev/
A super-minimalist open-source AI coding agent for the terminal — just 4 basic tools: read, write, edit, bash. Everything else is handled via extensions. It works as a CLI, headless, RPC, or SDK — which is why Pi is "under the hood" of OpenClaw.

Tree-based sessions — you can branch out, go back, and export to HTML. Full transparency — you can see everything that is happening.

Pi allows connecting various LLM providers. Settings are stored in ~/.pi/agent/ (globally) or .pi/ (locally in the project). Key files: settings.json for general parameters and files like SYSTEM.md for custom prompts. Authentication can be done in two ways: via subscription (OAuth/login) or via API key.

https://www.youtube.com/watch?v=boSPk_Ig4gU

You can set up and use Pi Coding Agent locally for free via Ollama.

How the author built it
https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
https://news.ycombinator.com/item?id=46844822
Pi ships without built-in planning modes, background bash, sub-agents, or MCP. The agent avoids hidden injections from other harnesses, ensuring full observability of interactions, and it avoids the frequent prompt/tool changes (unlike Claude Code) that break workflows.

5–10× longer context windows thanks to the minimal prompt, with the ability to change the model mid-session.

It works with unlimited access to the file system and commands, recognizing that guardrails are often ineffective and productive work requires full capabilities. The "YOLO mode" scares Hacker News commenters: risks of exfiltration, prompt injection, accidental database deletion, etc. Some suggest chroot / containers / VMs, while others argue that sandboxing in Codex is "security theater."

https://news.ycombinator.com/item?id=47143754
Users write that Pi provides a "level of control not seen before." The RPC/headless mode is great for integrations. There is an ecosystem of forks and extensions — the "oh-my-pi" project (https://github.com/can1357/oh-my-pi) is a notable "batteries-included" version, though it is said to often break tools after updates.

Possible Anthropic ban: there are warnings about the risk of account suspension for using alternative clients (similar to OpenCode).

OpenAI is actively trying to seize the initiative from Claude Code, investing heavily in this effort.

Codex remains free for another month
https://openai.com/codex/
An extension of the original limited-time promo from February 2, 2026. Following the release of the Windows version of the Codex app, the promotion has been extended by another month; free ChatGPT accounts can now generate code until April 2. Plus accounts receive double limits.

Codex app for Windows and GPT‑5.4
https://openai.com/index/introducing-gpt-5-4/
OpenAI has finally introduced the Windows version of the Codex app along with GPT‑5.4, a new model that combines the coding capabilities of GPT-5.3-Codex with powerful reasoning. As usual, the model is more token-efficient, faster in iterations, and more proactive.

https://www.youtube.com/watch?v=8hNcRChDrNk

A specialized WinUI App skill has been added for Windows developers. You can now select different terminals and switch to WSL.

Starting from version 26.305, a fast mode has been introduced where GPT-5.4 operates 1.5 times faster while maintaining the same level of intelligence.

On the downside, the "Default open destination" list cannot be edited.

Reports suggest GPT-5.4 can view screenshots, control the mouse and keyboard, and run Playwright in Interactive mode for real-time visual debugging.

WebSocket Mode
https://developers.openai.com/api/docs/guides/websocket-mode/
This is a persistent connection for the Responses API, specifically designed for long agentic workflows with numerous tool calls (agentic coding, automation, orchestration). For coding agents, it significantly reduces iteration latency, offering up to 40% faster execution with 20+ tool calls.

The mode is built into the Codex App (macOS/Windows). In Codex-Spark, this mode is enabled by default. For other models, you need to add responses_websockets_v2 = true to the ~/.codex/config.toml configuration file (CLI version v0.110 will display an "Under-development features" warning).
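As described above, enabling the mode for other models is a one-line config change (the flag name and file path are taken from the docs; the surrounding file contents are up to your setup):

```toml
# ~/.codex/config.toml
responses_websockets_v2 = true
```

On CLI v0.110 this triggers the "Under-development features" warning mentioned above.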

A year ago, Cursor was the most famous AI-oriented code editor, but competition has significantly increased since then.

They launched their own CLI — adding Plan and Ask modes, sub-agents, skills, image generation, built-in Mermaid ASCII diagrams, and keyboard shortcuts over the winter.

Cursor Cloud Agents with Computer Use
https://forum.cursor.com/t/cloud-agents-with-computer-use/152829
https://cursor.com/blog/third-era
Agents now run the created software in their own VM (a full-fledged computer), test changes, and generate PRs with screenshots and logs. They can record short demo videos. You can connect to the agent's VM and watch the process.

https://www.youtube.com/watch?v=tMflcZHo2zI

Recorded in the new Cursor office, it is a deep dive into the latest major update, which they call the "third era" of Cursor: the first was simple AI completions in the editor, the second was local agents, and the third is full cloud agents with their own computers. They are moving towards becoming an agentic platform.

Cursor in Zed and JetBrains
https://forum.cursor.com/t/cursor-is-now-available-in-jetbrains-ides/153584
Added support for the Agent Client Protocol (ACP), meaning you can now use your Cursor subscription and agent in IDEs that support it, such as IntelliJ IDEA, PyCharm, and WebStorm.

Zed AI is for adults only
https://zed.dev/blog/terms-update
Among other changes, Zed introduced an 18+ restriction that applies to the "Service" — the cloud SaaS part: account creation and AI features (Zed Pro, edit prediction, etc.).

In a Hacker News thread, they explained that allowing users under 18 would require verifying parental consent, maintaining separate data storage/processing policies, and implementing an age-gate system. It was simply easier to prohibit it.

JetBrains Air
https://air.dev/changelog
JetBrains is developing Air as an Agentic Development Environment, which looks very much like a response to the OpenAI Codex app. It is available via a JetBrains AI Pro/Ultimate subscription. Currently, a Preview version is available for macOS, while Windows and Linux versions are under development.

It started as a wrapper for Codex and Claude. On March 5, Gemini CLI and Junie were added. You can now choose between different agents depending on the task or combine them — one agent can verify the work of another.

You can use a ChatGPT subscription (in which case only Codex will be available). Login via Claude Pro, Max, and Team has been discontinued due to Anthropic's new usage policy — API keys must be added.

T3 Code
https://t3.codes/
For some reason, Theo decided to be a developer in addition to being a vlogger — so far, it's a buggy wrapper for Codex (Claude Code to follow) with minimal description and documentation. Why anyone would use this instead of the original Codex app is unclear to me.

Leanstral Model
https://mistral.ai/news/leanstral
Mistral AI introduces Leanstral — an open-source code agent for the Lean 4 programming language (which is also an interactive theorem prover). This model, with 6B active parameters in a sparse architecture, is trained not only to perform tasks but also to formally prove the correctness of implementations. This makes it a powerful tool for code verification.

Available for free in Mistral Vibe (https://mistral.ai/products/vibe) via the API as labs-leanstral-2603, and for download for on-premise hosting and integration with lean-lsp-mcp. It is a first step toward a future where formal verification becomes commonplace and human review is no longer a bottleneck.

HN Reaction
https://news.ycombinator.com/item?id=47404796
Enthusiasts see a future in "executable specs," where an agent writes code plus proofs, making regressions impossible. Skeptics counter that proofs only guarantee validity, not that you proved exactly what you intended, and that for ordinary projects (non-mathematical, non-critical software) this is currently overkill.