CodeWithLLM-Updates
-

Two years ago, programming models behaved like a genie — you’d ask them for something, and they’d do it technically correctly but with a catch. To combat this, many "harnesses" (wrappers) were devised. Apps like Cursor were pioneers in exploring how to do this effectively.

2026 models have become significantly more obedient, so, as I wrote earlier, the AGENTS.md file is no longer as critical. Another recent example is Vercel, which removed 80% of specialized tools from its internal text-to-SQL agent, leaving only a single "execute bash" in a sandbox (https://vercel.com/blog/we-removed-80-percent-of-our-agents-tools).

We are learning to simplify the architectures we over-engineered over the past two years, using minimal tools to avoid hindering powerful models.

NxCode Team on AI Agent Operations
https://www.nxcode.io/resources/news/harness-engineering-complete-guide-ai-agent-codex-2026
Explains the harness as a "bridle + saddle + reins" for a powerful but uncontrolled "horse" (the model). An example is LangChain, which boosted a coding agent from 52.8% to 66.5% on Terminal Bench without changing the model—only through middleware (self-verification, loop detection, context mapping).

Agents fail not because of model quality, but because of a poor harness.

It’s important to add that an ideal harness won't save a weak model.

OpenAI on Harness Engineering
https://openai.com/index/harness-engineering/
They state that in the world of agents, the engineer's role is shifting from "writing code" to "managing the environment," where humans steer the direction and agents execute.

The most important thing now is not just a high-quality model, but the environment:
– A structured docs/ folder as the single source of truth,
– A short AGENTS.md (~100 lines) instead of a massive prompt,
– Mechanical linters + CI that check invariants (architecture rules, naming, file size, etc.),
– A "doc-gardening" agent that automatically fixes outdated documentation.
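A "mechanical invariant" of this kind can be sketched in a few lines. This is an illustrative toy, not OpenAI's actual tooling; the 300-line budget and the `*.py` glob are hypothetical examples:

```python
"""Toy CI invariant check: flag source files that exceed a line budget.
The budget and file pattern are hypothetical, for illustration only."""
from pathlib import Path

MAX_LINES = 300  # hypothetical per-file budget


def oversized(root: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return paths of Python files under `root` longer than `max_lines`."""
    bad = []
    for path in sorted(Path(root).rglob("*.py")):
        with path.open(encoding="utf-8") as f:
            if sum(1 for _ in f) > max_lines:
                bad.append(str(path))
    return bad

# In CI you would run this and exit non-zero when the list is non-empty,
# so the agent gets a hard, mechanical signal instead of a prose rule.
```

The point is that the rule is enforced by a dumb script, not by hoping the model remembers a line in the prompt.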

A single Codex run can last up to 6 hours (often overnight). Therefore, it’s better to have all knowledge contained within the repository (versioned artifacts). No external chats or verbal discussions.

Discussion on HN about Harness Engineering
https://news.ycombinator.com/item?id=46988596
Can Bölük (author of https://github.com/can1357/oh-my-pi) took 16 different LLM models and ran them twice on the same benchmark for fixing real bugs in a React app. He changed only one tool—the file editing format. Instead of apply_patch / str_replace, he introduced Hashline (each line gets a short hash, and the model edits by hash rather than text). From this change alone, 14 out of 16 models improved their results.
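The idea can be sketched as follows. This is an illustrative toy, not the actual Hashline format from oh-my-pi; the hashing scheme and tag length here are guesses:

```python
"""Toy sketch of hash-addressed line editing: the model edits a line by
citing its short hash instead of reproducing the exact text to replace.
Hashing (index, content) makes duplicate lines individually addressable
and makes stale hashes fail loudly if the file has changed."""
import hashlib


def _tag(index: int, line: str) -> str:
    """Short content hash for one line (scheme is hypothetical)."""
    return hashlib.sha1(f"{index}:{line}".encode()).hexdigest()[:6]


def hashline_view(text: str) -> str:
    """Render a file with each line prefixed by its hash tag."""
    return "\n".join(
        f"{_tag(i, line)}| {line}" for i, line in enumerate(text.splitlines())
    )


def apply_edit(text: str, target_tag: str, new_line: str) -> str:
    """Replace the line whose tag matches; reject stale tags."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if _tag(i, line) == target_tag:
            lines[i] = new_line
            return "\n".join(lines)
    raise ValueError("stale tag: file changed since the model last read it")
```

Compared with `str_replace`-style edits, the model no longer has to reproduce a byte-exact snippet of the old text, which is exactly the failure mode the benchmark exposed.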

The primary skill for a developer now is designing the harness, not writing code manually. Many confirm that Hashline gives agents a significant boost.

Conspiracy theory: "Companies intentionally keep the best harnesses secret so that token consumption doesn't drop." In recent weeks, Anthropic and Google have been banning custom harnesses; even the post's author was cut off from Gemini during his benchmark.

Some people are already tired of increasingly heavy tools like Claude Code or Cursor, where more and more features are unnecessary, prompts are massive, and everything is hidden.

Pi Agent
https://shittycodingagent.ai/ https://pi.dev/
A super-minimalist open-source AI coding agent for the terminal — just 4 basic tools: read, write, edit, bash. Everything else is handled via extensions. It works as a CLI, headless, RPC, or SDK — which is why Pi is "under the hood" of OpenClaw.

Tree-based sessions — you can branch out, go back, and export to HTML. Full transparency — you can see everything that is happening.

Pi allows connecting various LLM providers. Settings are stored in ~/.pi/agent/ (globally) or .pi/ (locally in the project). Key files: settings.json for general parameters and files like SYSTEM.md for custom prompts. Authentication can be done in two ways: via subscription (OAuth/login) or via API key.
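A plausible layout, using only the paths and files mentioned above (the exact set of files may differ between Pi versions):

```
~/.pi/agent/
├── settings.json   # general parameters (provider, model, auth)
└── SYSTEM.md       # custom system prompt

.pi/                # project-local overrides
└── settings.json
```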

https://www.youtube.com/watch?v=boSPk_Ig4gU

You can set up and use Pi Coding Agent locally for free via Ollama.

How the author built it
https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
https://news.ycombinator.com/item?id=46844822
Pi ships without built-in planning modes, background bash, sub-agents, or MCP. The agent avoids hidden injections from other harnesses, ensuring full observability of interactions. It also avoids the frequent prompt/tool changes (unlike Claude Code) that break workflows.

5–10× longer context windows thanks to the minimal prompt, with the ability to change the model mid-session.

It works with unlimited access to the file system and commands, recognizing that guardrails are often ineffective and productive work requires full capabilities. The "YOLO mode" scares Hacker News commenters: risks of exfiltration, prompt injection, accidental database deletion, etc. Some suggest chroot / containers / VMs, while others argue that sandboxing in Codex is "security theater."

https://news.ycombinator.com/item?id=47143754
Users write that Pi provides a "level of control not seen before." The RPC/headless mode is great for integrations. There is an ecosystem of forks and extensions — the "oh-my-pi" project (https://github.com/can1357/oh-my-pi) is a notable "batteries-included" version, though it is said to often break tools after updates.

Possible Anthropic ban: there are warnings about the risk of account suspension for using alternative clients (similar to OpenCode).

OpenAI is actively trying to seize the initiative from Claude Code, investing heavily in this effort.

Codex remains free for another month
https://openai.com/codex/
An extension of the original limited-time promo from February 2, 2026. Following the release of the Windows version of the Codex app, the promotion has been extended by another month; free ChatGPT accounts can now generate code until April 2. Plus accounts receive double limits.

Codex app for Windows and GPT‑5.4
https://openai.com/index/introducing-gpt-5-4/
OpenAI has finally introduced the Windows version of the Codex app and GPT‑5.4, a new model that combines the coding capabilities of GPT-5.3-Codex with powerful reasoning. As usual, the model is more token-efficient, faster in iterations, and more proactive.

https://www.youtube.com/watch?v=8hNcRChDrNk

A specialized WinUI App skill has been added for Windows developers. You can now select different terminals and switch to WSL.

Starting from version 26.305, a fast mode has been introduced where GPT-5.4 operates 1.5 times faster while maintaining the same level of intelligence.

On the downside, the "Default open destination" list cannot be edited.

Reports suggest GPT-5.4 can view screenshots, control the mouse and keyboard, and run Playwright in Interactive mode for real-time visual debugging.

WebSocket Mode
https://developers.openai.com/api/docs/guides/websocket-mode/
This is a persistent connection for the Responses API, specifically designed for long agentic workflows with numerous tool calls (agentic coding, automation, orchestration). For coding agents, it significantly reduces iteration latency, offering up to 40% faster execution with 20+ tool calls.

The mode is built into the Codex App (macOS/Windows). In Codex-Spark, this mode is enabled by default. For other models, you need to add responses_websockets_v2 = true to the ~/.codex/config.toml configuration file (CLI version v0.110 will display an "Under-development features" warning).
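For reference, enabling it for other models comes down to one line (the surrounding file may contain other keys):

```toml
# ~/.codex/config.toml
responses_websockets_v2 = true
```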

A year ago, Cursor was the most famous AI-oriented code editor, but competition has significantly increased since then.

Cursor launched its own CLI over the winter, adding Plan and Ask modes, sub-agents, skills, image generation, built-in Mermaid ASCII diagrams, and keyboard shortcuts.

Cursor Cloud Agents with Computer Use
https://forum.cursor.com/t/cloud-agents-with-computer-use/152829
https://cursor.com/blog/third-era
Agents now run the created software in their own VM (a full-fledged computer), test changes, and generate PRs with screenshots and logs. They can record short demo videos. You can connect to the agent's VM and watch the process.

https://www.youtube.com/watch?v=tMflcZHo2zI

Recorded right in the new Cursor office: a deep dive into the latest major update, which they call the "third era" of Cursor. The first era was simple AI completions in the editor, the second was local agents, and the third is full cloud agents with their own computer. They are moving towards becoming an agentic platform.

Cursor in Zed and JetBrains
https://forum.cursor.com/t/cursor-is-now-available-in-jetbrains-ides/153584
Added support for the Agent Client Protocol (ACP), meaning you can now use your Cursor subscription and agent in IDEs that support it, such as IntelliJ IDEA, PyCharm, and WebStorm.

Zed AI is for adults only
https://zed.dev/blog/terms-update
Among other changes, Zed introduced an 18+ restriction that applies to the "Service" — the cloud SaaS part: account creation and AI features (Zed Pro, edit prediction, etc.).

In a Hacker News thread, they explained that allowing users under 18 would require verifying parental consent, maintaining separate data storage/processing policies, and implementing an age-gate system. It was simply easier to prohibit it.

JetBrains Air
https://air.dev/changelog
JetBrains is developing Air as an Agentic Development Environment, which is very similar to a response to the OpenAI Codex app — available via JetBrains AI Pro/Ultimate subscription. Currently, a Preview version is available for macOS, while Windows and Linux versions are under development.

It started as a wrapper for Codex and Claude. On March 5, Gemini CLI and Junie were added. You can now choose between different agents depending on the task or combine them — one agent can verify the work of another.

You can use a ChatGPT subscription (in which case only Codex will be available). Login via Claude Pro, Max, and Team has been discontinued due to Anthropic's new usage policy — API keys must be added.

T3 Code
https://t3.codes/
For some reason, Theo decided to be a developer in addition to being a vlogger — so far, it's a buggy wrapper for Codex (Claude Code to follow) with minimal description and documentation. Why anyone would use this instead of the original Codex app is unclear to me.

Leanstral Model
https://mistral.ai/news/leanstral
Mistral AI introduces Leanstral — an open-source code agent for the Lean 4 programming language (which is also an interactive theorem prover). This model, with 6B active parameters in a sparse architecture, is trained not only to perform tasks but also to formally prove the correctness of implementations. This makes it a powerful tool for code verification.

Available for free in Mistral Vibe https://mistral.ai/products/vibe (via API labs-leanstral-2603) and for download for on-premise hosting and integration with lean-lsp-mcp. This is a first step toward a future where formal verification becomes commonplace and human review is no longer a bottleneck.
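To illustrate what "code plus proof" means in Lean 4 (a toy example, unrelated to anything Leanstral actually generates):

```lean
-- A tiny "executable spec": an implementation together with a theorem
-- that pins down its behavior. `omega` discharges the linear arithmetic.
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

An agent that emits both the `def` and the `theorem` cannot silently regress: the proof stops compiling the moment the implementation changes behavior.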

HN Reaction
https://news.ycombinator.com/item?id=47404796
Enthusiasts see a future in "executable specs" where an agent writes code + proofs, making regressions impossible. Skeptics remind that proofs only guarantee validity, not that you proved exactly what you intended, and for ordinary projects (non-mathematical/critical software), this is currently "overkill".

Nvidia Nemotron 3 Super
https://build.nvidia.com/nvidia/nemotron-3-super-120b-a12b
Nvidia presented its new model, Nemotron 3 Super: an open hybrid Mamba-Transformer MoE model with 120B total / 12B active parameters and a 1M token context. Currently available for free in Kilo Code: https://blog.kilo.ai/p/nvidia-nemotron-3-super-launch

The Hacker News post about the release gained only 13 points and 2 comments; generally, no one seems to care. Nvidia took a long time to develop this, and Qwen 3.5 has already "caught up and surpassed" many competitors.

Cursor Model Update
https://forum.cursor.com/t/introducing-composer-2/155288
https://cursor.com/blog/composer-2
Composer is Cursor's own coding LLM, which delivers good results on simple tasks. Version 2 was specifically trained on long coding tasks using reinforcement learning. The model is quite affordable, with both standard and fast variants available.

and this is Kimi K2.5
https://news.ycombinator.com/item?id=47452404
Users noticed that Cursor Composer 2 is based on the Chinese open-weight model Kimi K2.5 from Moonshot AI, rather than being a completely in-house development by Cursor from scratch.

The Kimi K2.5 model has a specific modified MIT license. It requires mandatory disclosure of the name "Kimi K2.5" in the interface if the company's revenue exceeds $20 million per month. Later, representatives from Moonshot and Cursor confirmed an official partnership between them. Cursor accesses Kimi through the inference provider Fireworks AI.

Cursor Interface Update
https://forum.cursor.com/t/what-is-cursor-glass/155327
https://cursor.com/glass
Glass is a completely new interface currently in early access, based on an agent command center paradigm. Some users are already complaining that the update "forcefully" installs Glass without a way to switch back yet.

https://www.youtube.com/watch?v=stRhZIrwa-w

Now agents are managed in a single space: project threads, parallel sessions, plugin marketplace, built-in browser+terminal, one-click Git, Shift+Tab planning with Mermaid diagrams, and todos.

It's a good step to stay competitive. Of course, there's a lack of original ideas, as the name sounds like an Apple interface and the look mimics the Codex app. However, a bigger problem now is that it’s no longer easy to create or open files manually. Consequently, Cursor is losing its status as an "AI IDE" where one could still write code directly (an editor for humans).