CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

Chinese AI services continue to gradually catch up with their US counterparts.

TRAE Solo is now Work
https://solo.trae.cn/
https://docs.trae.ai/solo/what-is-trae-solo?_lang=en
ByteDance has renamed its "Trae Solo" tool to Trae Work, highlighting a shift in positioning: from a simple developer assistant to a fully autonomous "AI employee" for various tasks (data scraping, content creation, web research, etc.). Code remains as a separate tab, and a GitHub connector is available. The interface resembles the Codex app, featuring Skills and an MCP with a catalog. The tool is accessible via web, desktop, and mobile. By default, "Privacy Mode" is disabled for new accounts, so users need to enable it manually.

The Capable GLM-5.2
https://docs.z.ai/guides/llm/glm-5.2
https://artificialanalysis.ai/articles/glm-5-2-is-the-new-leading-open-weights-model-on-the-artificial-analysis-intelligence-index
Zhipu AI has released GLM-5.2 — a 753B parameter Mixture-of-Experts (MoE) model under the MIT license, which significantly improves upon GLM-5.1. The context window has been expanded to 1M tokens (compared to 200k in its predecessor).

https://www.youtube.com/watch?v=nODxez6nZEU

The model ranked first among open-source models in the Artificial Analysis Intelligence Index (v4.1) with a score of 51, demonstrating coding skills on par with the proprietary Claude Opus 4.8. While it tends to get confused more easily and consumes more tokens overall, it still delivers results.

Discussion
https://news.ycombinator.com/item?id=48567759
On Hacker News, the model is praised for its price-to-performance ratio in long-running development cycles. However, users note that the "Max" reasoning mode is extremely slow and highly token-intensive. Due to its large size (753B), running it locally on standard MacBook Pros is not possible, but users can rent GPU cloud instances or access it via https://openrouter.ai/z-ai/glm-5-2#providers.

Current top coding models on OpenRouter by usage volume (token count):

  1. MiMo-V2.5 (by xiaomi) — the clear leader with 4.59T (trillion) tokens, representing 22.5% of the total market share.
  2. MiniMax M3 (by minimax) — holds second place with 2.45T tokens (12.0%).
  3. Hy3 preview (by tencent) — third place with 1.43T tokens (7.0%).
  4. Claude Opus 4.7 (by anthropic) — fourth place, accounting for 1.17T tokens (5.7%).
  5. DeepSeek V4 Pro (by deepseek) — closes the top five with 1.14T tokens (5.6%).
  6. DeepSeek V4 Flash (by deepseek) — sixth place with 972B (billion) tokens (4.8%).
  7. GLM 5.1 (by z-ai) — seventh place with 952B tokens (4.7%).
  8. GLM 5.2 (by z-ai) — eighth place with 820B tokens (4.0%).

GLM-5.2 on OpenCode
https://dev.to/danielbergholz/testing-glm-52-on-opencode-im-impressed-1780
The article's author, Daniel Bergholz, tested the model in real-world development conditions by integrating GLM-5.2 via OpenRouter into the free coding agent OpenCode.

In a practical test on an actual Next.js project, the model was tasked with implementing an article search feature with a 300ms debounce without cluttering the browser history. GLM-5.2 proved to be a somewhat slow but highly deliberate model: during the planning phase, it analyzed the project structure without additional prompting, recognized the difference between server and client components, and logically justified using client-side rendering for this task. It wrote clean, working code on the first attempt ("one-shot") and demonstrated a rare "restraint" for AI assistants by not trying to overcomplicate the existing project structure.

The entire session, which included repository analysis, planning, coding, review, and the final fix, cost the author only $0.265 (less than 27 cents).

Code generation quality continues to improve, but the US government is trying to restrict access for others.

Fable 5 — Pulled in 3 Days
https://www.anthropic.com/news/claude-fable-5-mythos-5
https://support.claude.com/en/articles/14328960-identity-verification-on-claude
On June 9, 2026, Anthropic introduced Claude Fable 5, a model of the new Mythos class. Tests showed a record level of autonomy (including completing games using computer vision) and exceptional performance in code generation.

https://www.youtube.com/watch?v=LoIGVdfTq9M

However, just 3 days later, access to the models was suspended. The US government issued an export directive prohibiting foreign nationals from using these models. Unable to immediately filter out foreign users, Anthropic disabled the models for all customers.

To resolve the issue, the company is launching mandatory identity verification (ID + selfie) via the Persona service. While the verification process is global and supports documents from most countries, access to the flagship Fable 5 will only be granted to verified US citizens and residents due to US regulations.

Discussion
https://news.ycombinator.com/item?id=48618455
The Hacker News community reacted highly negatively to these changes. Many international developers point out that paying for an Anthropic subscription is now pointless since they won't get access to future flagship models. The introduction of Persona verification raises serious privacy concerns, while the sudden shutdown of Fable 5 has undermined trust in US SaaS platforms as a reliable foundation for business.

Although this flagship model currently remains available only to a select group—namely the military and Anthropic employees—I believe this is temporary. OpenAI is clearly preparing its response with GPT-6, and Google is also actively developing in this direction. Therefore, the widespread availability of next-generation models with a qualitatively new level of autonomy and code generation is only a matter of months. Let's wait and see.

Chinese AI giant MiniMax has announced a new generation of M models: M3.

MiniMax M3
https://www.minimax.io/blog/minimax-m3
https://www.minimax.io/models/text/m3
MiniMax M3 is built with an emphasis on deep reasoning, coding, and autonomous pipelines. It accepts text+image+video as input and produces text as output. The model is specifically optimized for agentic workflows and complex, long-term tasks rather than simple chat interactions.

MiniMax Sparse Attention (MSA) is a new sparse attention mechanism that radically reduces computational costs for long contexts (approximately 1/20th of the previous generation). It supports up to 1M tokens, with a guaranteed minimum of 512K in the cheaper API version. Token Plans are available (starting at $20/month), with a highlighted $50/month option.

Benchmarks look impressive: SWE-Bench Pro ~59% and Terminal-Bench 2.1 ~66%. This is on par with GPT-5.5 and Gemini 3.1 Pro, trailing only Claude Opus 4.8.

There is no active discussion on Hacker News yet.

MiniMax Code Updates
https://code.minimax.io/
With the M3 update, MiniMax Code has also received a significant upgrade, fully utilizing the model's capabilities: long context, agentic skills, and native multimodality. The program can not only generate code but also create documents, PDFs, slides, tables, and icons. Thanks to multimodality, MiniMax Code supports "computer use" (controlling the computer).

The concept is built around delegation - you don't write code alongside the AI; you manage it. It uses a Producer + Verifier adversarial loop where agents constantly generate, reflect, check, and fix errors in real-time. There is a "Smart Authorize" option to avoid manual monitoring of every agent action.

https://www.youtube.com/watch?v=mBHFGeU18MI

There is native support for MCP servers to connect external databases and documentation. It features a skills marketplace and integration with bots in Telegram, WeChat, and Lark for managing agents from a phone. Autonomous operation for several days without human intervention is possible, as well as scheduled task execution.

Microsoft presented a series of changes at its May Build 2026, shifting from simple AI assistance toward autonomous agents.

Proprietary MAI Models
https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
They introduced a new family of MAI (Microsoft AI) models, totaling 7, including MAI-Code-1-Flash and MAI-Thinking-1. Microsoft is effectively reducing its dependence on OpenAI. The company claims frontier-level results for autonomous tasks.

The MAI-Code-1-Flash model has 5B active out of 137B parameters (medium-sized) and a 2-million-token context window (extremely large), operating with record-low latency. Presentation slides look perfect: they claim it outperforms previous generations of the flagship GPT-4o. It is being positioned as the core engine for pipelines integrated into GitHub Copilot, VS Code, and other MS products.

Discussion
https://news.ycombinator.com/item?id=48374466
In a active Hacker News discussion, people note that open models like Qwen 3.6 (35B) or DeepSeek V4 Flash yield better results and run significantly faster, while GitHub Copilot's new strict token billing might make using MAI-Code-1-Flash economically unviable.

GitHub Copilot — Now a Separate App
https://github.blog/2026-06-02-github-copilot-app-the-agent-native-desktop-experience/
Copilot used to be a plugin in VS Code, then became part of it. Now, following Codex, Cursor, Zed, and others, Microsoft decided to create a standalone GitHub Copilot App — an "agent-native" desktop chat application.

This is a single control plane for agents working in parallel across isolated git trees, creating PRs, and debugging code. MS are moving away from the editor concept toward delegating entire workflows. It includes support for MCP servers.

https://www.youtube.com/watch?v=5Q5mLNYJ6Hw

Instead of getting lost on the GitHub website, the app features a convenient "On your radar" (or Inbox) tab where all your Pull Requests (PRs) and Issues from selected repositories are gathered. You can open any PR, view code changes (diff), leave comments, or approve it. Furthermore, you can tag @copilot directly in comments for fixes or explanations.

Users can create "quick chats" for general questions (even non-code related, like D&D games) or sessions tied to a specific repository. The app allows switching between different LLMs. For instance, the author uses Claude Opus 4.7 for code generation.

Agent Isolation: MXC (Microsoft Execution Containers)
https://www.microsoft.com/en-us/security/blog/2026/06/02/microsoft-build-2026-securing-code-agents-and-models/
Since agents now execute real code and interact with systems and infrastructure, Microsoft is implementing MXC at the Windows 11 kernel level. This is a new level of isolation and sandboxing specifically for AI applications. Windows is essentially turning into an "Agent Runtime" platform.

The demonstration showed OpenClaw attempting to delete all files from the desktop and failing due to these restrictions.

More new models of May.

Cursor Composer 2.5
https://cursor.com/blog/composer-2-5
On May 18, 2026, the Cursor team released Composer 2.5. It is based on the open Kimi K2.5 model from Moonshot AI, but now approximately 85% is Cursor's own fine-tuning. The main change compared to Composer 2 is increased autonomy and cost optimization.

The model offers two tiers: Standard at $0.50 per 1M input / $2.50 per 1M output tokens, and Fast at $3/$15. In SWE-Bench Pro tests, it achieved a 49% success rate (compared to 12% in Composer 2), meaning coding skills and context understanding have grown significantly at a reasonable price.

Qwen 3.7 Max
https://qwen.ai/blog?id=qwen3.7
On May 20, 2026, at the Alibaba Cloud Summit, Qwen 3.7-Max was announced. Unlike the previous Qwen 3.6 line, which focused on general tasks, the new version is positioned exclusively as an agentic model for ultra-long autonomous work cycles. The key change is stability during long-running tasks.

Alibaba demonstrated a case where the model fully autonomously optimized a GPU kernel over 35 hours without any human intervention, performing over 1,100 tool calls. The context window was expanded to 1 million tokens (up from 256k in its predecessor), and the "reasoning density" per token was increased.

Qwen 3.7-Max can generate complex interactive web applications from a single prompt—including 3D scenes on Three.js, Canvas animations, full-page layouts, and dynamic SVGs.

https://openrouter.ai/qwen/qwen3.7-max
There is currently a 50% discount on the model at OpenRouter ($1.25/$3.75), making Qwen 3.7 Max perhaps the best choice for price/performance in long-running tasks.

Claude Opus 4.8 — fewer hallucinations and more control
https://www.anthropic.com/news/claude-opus-4-8
On May 28, 2026, Anthropic introduced Claude Opus 4.8 (pricing remains the same as 4.7 at $5/$25 per million tokens) and once again topped the Artificial Analysis global rankings with a score of 61.4, overtaking GPT-5.5.

Instead of focusing on abstract benchmarks, Anthropic prioritized system "honesty": the model learned to directly state "I don't know" or ask for clarification, and it misses hidden bugs in its own code 4 times less often than Opus 4.7.

Dynamic workflows appeared in Claude Code. Now Opus 4.8 can independently plan large-scale tasks, launch parallel sub-agents, and verify results before submitting the work.

Google at May I/O 2026 has already begun "tightening the screws" and radically reshaping its infrastructure for developers.

Gemini 3.5 Flash
https://deepmind.google/models/gemini-3-5-flash/
The main "engine" of the announcement was the Gemini 3.5 Flash model, which precedes the upcoming 3.5 Pro. Google claims the model works significantly faster than previous generations and shows frontier-level results in agentic coding tasks: ~76.2% on Terminal Bench 2.1 and ~55.1% on SWE-Bench Pro.

The new Flash is significantly more expensive than the previous one, and the massive use of agents quickly burns through tokens and compute.

$100 Plan
https://blog.google/innovation-and-ai/technology/ai/google-io-2026-all-our-announcements/
Google is introducing a new tariff plan — Google AI Ultra for $100 per month, which provides higher limits for using agents in Antigravity. The more expensive enterprise tier is also being updated: instead of simple message limits, a "compute-used" model is increasingly being adopted — actual payment for agent resources and execution.

Everything will be Antigravity
https://antigravity.google/blog/introducing-google-antigravity-2-0
Previously, Project IDX was based on Code OSS (open-source VS Code). Now the strategy has changed: Google is actively shifting focus from IDX and Firebase Studio toward Antigravity.

Instead of fragmented tools, Antigravity 2.0 is now being promoted — an "agent-first" development platform following the chat-centered approach popular in recent months. This is a direct response to the Codex app and Cursor 3, but with full control by Google over the execution environment, sandboxing, and agent orchestration. They are also moving away from "VS Code-like" editors, but have radically removed the text editor altogether.

https://www.youtube.com/watch?v=3arUEZlv9mc

Judging by the low activity on Hacker News and early feedback on Antigravity 2, it seems many developers haven't switched to active use of the tool after launch — it is perceived more as another experimental AI-IDE than a stable work tool.

From Gemini CLI to Antigravity CLI
https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/
Google officially announced the sunsetting of old tools. Most notably, from June 18, 2026, Gemini CLI (open source, daily quotas) and the Gemini Code Assist extension will disappear — they will stop serving requests for free users and even for AI Pro/Ultra subscribers, remaining available only for Enterprise.

Google is effectively shifting focus from Gemini CLI and Gemini Code Assist to the new Antigravity CLI (closed source), which becomes the primary terminal tool for agentic workflows. Quotas now look less like "number of prompts" and more like a compute usage model — how many agents and resources you actually use. Currently, it performs poorly and is gathering bug reports rather than serving as a developer tool.

In addition to Google's models, two Claude models from Anthropic and, for some reason, GPT-OSS 120B from OpenAI are available. That's it.

Native Android in Google AI Studio
https://android-developers.googleblog.com/2026/05/build-android-apps-google-ai-studio.html
In Google AI Studio, you can now generate a native Android app (Kotlin/Jetpack Compose) from a prompt and run it in an emulator directly in the browser.

If a project becomes complex, Google offers "seamless" export to Android Studio for further agentic development.

A few interesting updates for May. Amidst the news about xAI, Anthropic also surprised many by announcing a partnership with SpaceX on May 6 to expand their computing power.

Anthropic Discounts and Transition to New Pricing
https://www.anthropic.com/news/higher-limits-spacex
Anthropic announced a temporary "spring discount" on their API models. They also stopped blocking OpenClaw-style usage. However, this is likely an attempt to smooth things over before big changes: the company is increasingly hinting at a revision of the classic "fixed subscription — unlimited chat" model. Instead of "per token" payment, Compute-based pricing is being introduced. The cost of a request will depend on how many computing resources the model spent on "reasoning."

Claude Code Updates
https://code.claude.com/docs/en/whats-new#week-18
On Windows, Claude Code finally no longer requires Git Bash; if it's missing, the tool now natively uses PowerShell.

Cloud functions. Public access (research preview) was opened for a new command /ultrareview, which spins up several autonomous AI agents in the cloud to check the repository for vulnerabilities and bugs in parallel. Before this, they also launched the /ultraplan command — a large planning task is pushed to Anthropic servers, where an isolated virtual machine is spun up for it (4 CPU cores, 16 GB RAM, with Node.js, Python, Rust, Docker, etc., pre-installed), eventually providing a link to a web interface with the results.

Managing OpenAI Codex from Mobile
https://openai.com/news/codex-mobile-app/
Responding to a similar feature in Claude Code, OpenAI released an update for Codex that allows managing AI agents from a smartphone. Now developers don't have to be near a laptop: they can approve pull requests, run testing pipelines, resolve merge conflicts, or give prompts to fix small bugs on the go. The interface is fully optimized for voice and quick commands — essentially, it's a pocket remote for the agent on your computer.

Gemma Models in Gemini CLI
https://geminicli.com/docs/changelogs/
The Gemini CLI terminal client update (v0.40.0) added experimental integration for local Gemma models. Version v0.41.0 added support for Gemma 4 models (experimental). While intelligent Model Routing and full offline agent execution are not yet available, the team is already preparing full local task execution.

Memory handling was also improved. Tiered Memory allows the agent to store context directly in Markdown files across four levels: from global developer styles (in ~/.gemini/GEMINI.md) to specific project directory rules. A new Auto Memory feature background-analyzes old sessions, finds successful solutions, and suggests saving them as reusable skills in SKILL.md. Auto Memory Inbox (from v0.42) is a system that automatically collects, classifies, and surfaces important pieces of information for an AI assistant’s long-term memory.

Voice mode has also been improved.

Speaking of the major LLM players, xAI is the only one that hasn't been monetizing developers and programmers until now. It seems they are starting to change that.

Cursor and xAI
https://techsifted.com/posts/spacex-cursor-acquisition-april-2026/
SpaceX/xAI has secured an option to acquire Cursor for $60 billion. If the acquisition does not go through, Cursor will still receive $10 billion for partnership and joint R&D work. This is a right to buy the company later at a fixed price.

In March, several key Cursor engineers moved to xAI. In May, Cursor began a massive international expansion and hiring push. If xAI's infrastructure makes future versions even more powerful, most Cursor users will likely stay.

The developer reaction has been mixed. Part of Cursor's audience chose it specifically for its independence—not being OpenAI, Microsoft, or Google, but offering any model of choice. Now that the service is potentially joining Elon Musk's ecosystem, it remains unclear whether this will impact the priority of the Grok model.

Fine-tuning Grok on Cursor Data
https://x.com/elonmusk/status/2055914584373141906
On May 17, xAI completed the primary training of the massive Grok V9 model (1.5 trillion parameters). The next stage is supplemental training using data from Cursor. This will allow Grok models to significantly improve their coding skills, as Cursor has collected a vast database of high-quality code from developers.

Grok Build CLI Launch
https://x.ai/news/grok-build-cli https://x.ai/cli
On May 14, xAI released an early beta version of Grok Build—a code generation agent: task planning, sub-agents for parallel work, headless mode for scripts, support for AGENTS.md, diffs, plugins, etc. It’s a fully-featured tool and a direct competitor to Claude Code and similar instruments.

However, it is currently only available with a SuperGrok Heavy subscription ($300/month with a three-day trial) and runs in the terminal only on Linux/macOS (Windows via WSL). Updates are released almost daily, and users are already praising its speed and quality. Elon Musk is personally asking for feedback.

https://www.youtube.com/watch?v=l_dAOKHLiYw

xAI is currently offering a promotional SuperGrok Heavy subscription: instead of $300 per month, the rate is temporarily around $99 for the first six months. However, users are complaining that even the Heavy plan doesn't feel "unlimited," as real limits can change depending on system load.

AI Didn't Delete the Database
https://idiallo.com/blog/ai-didnt-delete-your-database-you-did
A tweet went viral: a startup founder claimed an AI agent deleted their production database in seconds. He was outraged, questioning the model and blaming "bad AI." However, the author argues: it's not the AI's fault. The issue was a public API endpoint in production that could destroy the entire database with a single request.

It's like placing a self-destruct button in plain sight and being surprised when someone presses it. Ibrahim Diallo says the AI didn't delete the database—the developers did by using unsafe architecture, lack of protection, and irresponsibility. The AI simply discovered what they carelessly left behind.

Discussion
https://news.ycombinator.com/item?id=48022742
Most people fully agree: it's not the AI's fault, but rather the person who gave the agent unrestricted production access without limiting API token permissions or setting up safeguards. A tool can be dangerous, but responsibility always lies with the operator. Many criticize "AI-maximalism"—when developers enthusiastically grant agents full access instead of using sandboxes and reviews.

10 Lessons for Agentic Coding
https://www.dbreunig.com/2026/05/04/10-lessons-for-agentic-coding.html
Thanks to modern AI agents, code has become extremely cheap to create but expensive to maintain, secure, and support. This completely changes the development approach: the key is no longer saving on writing code, but wisely utilizing its low cost.

  1. Implement to Learn.
  2. Rebuild Often.
  3. Invest in End-to-End Tests.
  4. Document Intent.
  5. Keep Your Specs Synced.
  6. Find the Hard Things.
  7. Automate Everything Easy.
  8. Develop Your Taste.
  9. Agents Amplify Expertise.
  10. Code is Cheap, but Maintenance, Support, and Security are Not. Agentic code is "free as in puppies." Maintenance isn't cheap, and neither is security. Build fast, but remember the maintenance burden you are taking on.

Discussion
https://news.ycombinator.com/item?id=48019025
The discussion is active and mostly positive—many consider this one of the most practical and sober publications on working with AI agents. Most agree: code is extremely cheap, so focus must shift to architecture, security, E2E tests, maintenance, and "taste." Skeptics argue that coding is only a small part of the job; business and organizational bottlenecks remain, and in large companies, development speed isn't the primary constraint.

Zed released in version 1.0
https://zed.dev/blog/zed-1-0
Just as Cursor bumped its major version after an interface overhaul, the code editor from the creators of Atom officially reached 1.0 on April 29, 2026. They write: "we've reached a tipping point where most developers can quickly feel at home in Zed."

Built with Rust, it features GPU acceleration, collaborative mode, built-in Git, a debugger, and AI (native and via Agent Client Protocol). Available on macOS, Windows, and Linux. Along with the release, it gained the ability to run multiple agents simultaneously in one window.

Discussion:
https://news.ycombinator.com/item?id=47949027
Many praise the speed, collaboration, native feel, and progress. There is criticism regarding project-specific configuration, AI features (which can be disabled), accessibility, and some minor nuances. Lots of practical feedback from those who switched or tried it.

Warp goes fully open source
https://www.warp.dev/blog/warp-is-now-open-source
On April 28, the AI terminal client Warp became open-source (AGPL for the core code + MIT for the UI framework). Now the community can contribute, including developing agent-first workflows through their cloud agent/orchestrator Oz.

Following the open-sourcing of the Warp client, a popular community fork called OpenWarp (https://openwarp.zerx.dev, zerx-lab) emerged. The project quickly gained popularity. It retains all the familiar Warp functionality (blocks, workflows, speed, UI) but, most importantly, fully opens the AI layer: you can connect any OpenAI-compatible provider (DeepSeek, Qwen, Ollama, OpenRouter, LM Studio, etc.), set custom system prompts via templates, and keep all keys locally without depending on a Warp cloud account or paid plans.

GitHub Copilot moves to usage-based billing
https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
Starting June 1, 2026, all plans will transition to a usage-based model with GitHub AI Credits (1 credit = $0.01). Code completions remain unlimited, while chat, agents, CLI, and other heavy features consume credits based on tokens.

GitHub explains the transition by stating that Copilot is no longer just the simple autocomplete tool it was a year ago—it now includes powerful agentic workflows, chats, code reviews, and complex agents that consume significantly more compute resources. Fixed subscriptions no longer cover the costs.

Discussion:
https://news.ycombinator.com/item?id=47923357
Many understand the reasons (expensive agents and inference) but complain heavily about the loss of predictability, rising costs for power users, and multipliers for powerful models. Tools are available to estimate future bills.

Vibe with a new model and cloud
https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
Mistral introduced a new agentic model, Medium 3.5 (128B, 256k context), and made it the primary model in the Vibe CLI. They also added remote agents that run asynchronously in isolated cloud sandboxes (similar to Codex or Claude Code) for long-running tasks. These can be launched from the CLI or the Le Chat web interface with history synchronization.

Zed Agent Interface
https://zed.dev/blog/parallel-agents
Following Cursor's lead, Zed is adapting its interface to manage multiple agent chats simultaneously. The main innovation is the Threads Sidebar, which helps group threads by project, flexibly configure agent access to repositories, and track their progress. AI panels have been moved to the left, while files and Git are now on the right.

https://www.youtube.com/watch?v=OLit5C1XE0k

Discussion
https://news.ycombinator.com/item?id=47866750
Many programmers are dissatisfied with the interface changes, noting that running several agents simultaneously creates a massive "cognitive load" and complicates code review, as AI still generates too much "garbage" code. Users also mention the unfinished Git interface and the lack of proper code review tools—issues that should be addressed first.

The biggest pain point remains the isolation of databases, configurations, ports, and test data. Developers are actively discussing how to automate this: some write custom shell scripts, some use Devcontainers, while others praise third-party tools like Conductor or Ouijit for managing the lifecycle of such environments.

Claude Design
https://www.anthropic.com/news/claude-design-anthropic-labs
Anthropic introduced a specialized AI tool based on the new Claude Opus 4.7 model and a design system (DESIGN.md) tailored for the product design process. It creates fully functional interactive prototypes, presentations, landing pages, and UI components, outputting ready-to-use HTML, CSS, and JavaScript code in real-time.

A one-click export allows users to transfer the finished design directly into the Claude Code environment.

Models updated, all promising agency:

  • DeepSeek V3.2 -> V4. Two versions: V4-Pro and V4-Flash. Open-source. Context: 1M in, 384K out. China. Cheaper scenarios for long documents, agents, and automation. Code quality is lower than other announced models.
  • GPT-5.4 -> GPT-5.5. Presented as an agent that can be trusted with work where the model must plan several steps ahead. Code generation is even better according to tests, while token consumption remains the same. The best model on the market right now, according to OpenAI.
  • Kimi K2.5 -> K2.6. Open-source. China. Moonshot AI positions the model as an agent for long-term programming tasks.
  • GLM-5 -> 5.1. Open-source. China. Claims significant improvements in code generation and cybersecurity.
  • Qwen 3.5 -> 3.6. Qwen3.6-Plus released as a closed model, followed by the flagship Qwen3.6-Max-Preview.
  • MiniMax M2.5 -> M2.7. Open-weights. China. Also for long tasks; said to have good emotional intelligence and stability on OpenClaw skills.
  • Important open-source / open-weight releases of small Qwen3.6 models for coding: Qwen3.6-35B-A3B — MoE model (35B total / 3B active), and Qwen3.6-27Bdense 27B. These are particularly interesting for running on local hardware.

The difference between GPT-5.5, Kimi K2.6, GLM-5.1, Qwen3.6 Plus, MiniMax M2.7, and DeepSeek-V4-Pro-Max on the SWE-Bench Pro test lies in the 55–59% range, meaning it is already a dense group of strong coding/agent models.

End of free Qwen Code
https://www.reddit.com/r/Qwen_AI/comments/1skeeu5/goodbye_qwen_you_tried_but_you_failed/
The Qwen OAuth free tier for Qwen Code was disabled on April 15, 2026. The old "log in via browser and use for free" scenario no longer works or returns errors such as 401 invalid access token, token expired, Internal error, or free tier quota exceeded.

Claude Code removal test from $20 plan
https://www.reddit.com/r/ClaudeAI/comments/1ss3asp/does_claudes_20_plan_no_longer_include_claude_code/
On April 21, 2026, users noticed that Claude Code disappeared from the $20 Pro plan on Anthropic's pricing page, remaining only in the more expensive Max plans. Anthropic explained that this was an A/B test / pricing experiment affecting approximately 2% of new users.

It seems cheap AI coding is gradually coming to an end.

If Anthropic is following the path of integrating Claude Code into its Work desktop app (finally adding parallel sessions: https://claude.com/blog/claude-code-desktop-redesign), OpenAI is coming from a different angle: this week they updated the Codex coding app and added computer control features. Different paths — same result.

Codex as a Superapp
https://openai.com/index/codex-for-almost-everything/
On macOS, Codex can now see the screen, move its own cursor, click, type text, open any application, and work in the background. Across all platforms, there is a built-in browser, image generation, memory (remembers preferences and previous actions — not yet in EU/UK), and over 90 plugins and integrations.

https://www.youtube.com/watch?v=sdNoaztocs0

While Codex has introduced a very "Cursor-like" pleasant feature — where you can simply click on any element (button, block, text, image) in a generated website to immediately add it to the prompt as a reference — the general trend of both companies (Anthropic and OpenAI) expanding their product audiences is slightly concerning for programmers.

Discussion
https://news.ycombinator.com/item?id=47796469
Many see this as a revolution for ordinary people (non-programmers): agents will be able to create personal UIs, automate business processes, replace entire programs, and radically increase productivity. At the same time, programmers are wary — security and privacy are still neglected: full agent access turns a computer into a "hostile device" where even a .txt file becomes an attack vector.

ChatGPT Pro for $100/mo
https://help.openai.com/en/articles/9793128-about-chatgpt-pro-tiers
At the beginning of April, the Codex token promotion ended; now, a free account can only run about two simple tasks before hitting the weekly limit. The $20 Plus plan also doesn't offer much headroom now, as the weekly limit is only suitable for light work (1-2 hours a day). That’s why, as of April 9th, an intermediate option between Plus and the $200 Pro was added.

The new $100 Pro tier has 5× higher limits than Plus and provides access to GPT-5.4 Pro and GPT-5.3 Instant. There is also a promotion until May 31, 2026, offering double tokens.

This is a direct response to Anthropic, which has Claude Max for $100.

Opus 4.7
https://www.anthropic.com/news/claude-opus-4-7
Claude Opus has updated from 4.6 to 4.7 — same features, but even better on benchmarks. They added "adaptive thinking": the model decides for itself how much to "think" before responding and hides the internal reasoning (it no longer shows the full chain of thought by default).

Discussion
https://news.ycombinator.com/item?id=47793411
The model has become stronger, especially in coding and long contexts. However, it is becoming less debuggable. It is now impossible to properly disable adaptive thinking, which makes Claude Code even worse to use; one has to jump through hoops with commands like /effort xhigh, CLAUDE_CODE_DISABLE_1M_CONTEXT=1, "display": "summarized", etc., just to understand what the model is generating.

Anthropic makes cool models, but the programming tools around them are getting worse.

For about three years, most programming applications were clones of VS Code with a side chat. A new wave seems to have been started by Codex — they released their desktop app on Electron without VSC, as did OpenCode.

Cursor 3
https://cursor.com/blog/cursor-3
The company has completely abandoned the VS Code fork model and built a new interface code-named Glass. The main innovation is the ground-up Agents Window, which allows running an unlimited number of agents simultaneously in parallel: locally, in worktree, via SSH, in the cloud, or even across multiple repositories at once. The new part is reportedly written in Rust+TS.

https://cursor.com/blog/agent-web
Later, they integrated mobile devices via PWA. Cursor Agents on web and mobile is an official way to run cloud agents directly from a phone or mobile browser. You can start a chat on your phone and continue on your desktop (or vice versa).

https://www.youtube.com/watch?v=HTKGyLar8AU

The phrase "Cursor 3 just killed the IDE" is repeated as the main hook.

Discussion
https://news.ycombinator.com/item?id=47618084
Many praise the boldness and technical progress of an agentic future, but even more express disappointment or even outrage that Cursor is radically moving away from the familiar "IDE + plugins + AI assistant" model. Critics argue the company is chasing investor hype that "AI will replace developers" rather than addressing programmers' real needs.

People who want to write code rather than manage a team of agents will have to look for something else, like VS Code or Zed.

App from The Factory
https://factory.ai/news/factory-desktop
Another company made a similar interface clone for "agent management." Interestingly, after installing on Windows 11, it tells me "Not connected to Local Machine. Please download and start the Desktop app, or upgrade to a paid plan to unlock more features," asking me to download their app. While their design is very cool, I couldn't even test their buggy Electron app.

While Claude Code was the undisputed favorite last year, with many tutorials and side projects, I can't quite understand what is happening with the project in 2026. Judging by the decreasing number of YouTube videos, other people can't either.

In February–March, Anthropic announced and rolled out several features that made Claude Code much more autonomous (agentic). There is an active transition from "a single agent in the terminal" to a managed task system and coordination of background agents (Ctrl+B) with an ecosystem of hot-reloaded MCP integrations, skills, hooks, and plugins. Through /teleport, you can initialize /remote-control sessions that can be managed from a mobile app. /loop was introduced for periodic prompt/command execution, along with in-session cron scheduling tools, etc.

Of the truly useful additions, only Auto Mode is worth noting.

Auto Mode
https://claude.com/blog/auto-mode
Presented as a "middle ground" between two extremes in Claude Code. Previously, you either had to manually approve every file change and bash command (very secure but annoying) or use the --dangerously-skip-permissions flag. The new Auto Mode allows Claude to decide for itself which actions are safe and execute them automatically without approval.

Before each tool call, a separate classifier (based on Sonnet 4.6) quickly checks the action for danger. Safe actions proceed automatically; risky ones are blocked. If the model persistently insists on blocked actions, a user prompt eventually appears anyway.

Claude Mythos Announcement Discussion
https://news.ycombinator.com/item?id=47679258
Anthropic describes the personality, goals, and limitations of the new model in a system card. It is not being released publicly—allegedly due to a sharp jump in capabilities and security risks. They claim Mythos has found thousands of zero-day vulnerabilities in OSs, browsers, virtual machines, etc. (including very old bugs). Many write that this could significantly change cybersecurity—for better or worse.

https://red.anthropic.com/2026/mythos-preview/
They also announced Project Glasswing, providing Mythos access to a limited circle of companies to fix critical software using the model.


Recently, many people paying for subscriptions have found Claude Code becoming practically unusable due to recent changes in Anthropic's policies and restrictions without clear rules. Even just mentioning OpenClaw in the system prompt causes the request to be rejected with an error. The system has also become worse at handling non-coding tasks.

Most likely, due to the launch of the new model, they had to maximize the squeeze on all compute that was previously distributed just to attract people to the infrastructure.

Claude Code Source Code
https://twitter.com/Fried_rice/status/2038894956459290963
https://news.ycombinator.com/item?id=47584540
On March 31, someone accidentally published a production build with a sourcemap file (~60 MB) to npm — and the entire Claude Code source code became publicly available. Some thought it was a brilliant April Fools' prank. A mention of a rollout window specifically for April 1–7 was even found in the code. Whether it was a joke or a real mistake is still being debated.

What exactly leaked (based on thread discussions):

  • Full Claude Code agent architecture (tool use, computer use, bash, file operations, etc.).
  • Permission system and "Bypass Permissions Mode" — a detailed description of how guardrails work.
  • Full Claude Code system prompt (including security rules and "cyber risk instructions").
  • Telemetry logic — what exactly is sent to Datadog (model, session ID, subscription type, whether the user is an Anthropic employee, etc.).
  • Internal infrastructure: WebSocket sessions, JWT for IDE integration, feature flags via GrowthBook, session-ingress, etc.
  • Hidden/unreleased features (many posts with "hidden features" breakdowns).
  • "Undercover Mode" subsystem — designed to prevent Claude from disclosing Anthropic's internal information and publishing production builds with sourcemap files.

Analysis by Alex Kim
https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/
https://news.ycombinator.com/item?id=47586778
Anthropic specifically injects fake tools to poison attempts to copy Claude's behavior. There is server-side text summarization with a cryptographic signature. A special mode (undercover.ts) forces the model to hide mentions of internal names (Capybara, Tengu, Slack channels, "Claude Code," etc.). Rigid security for bash commands (23 checks against injections, zero-width characters, etc.). A prompt caching system with "sticky latches" and 14 invalidation vectors.

The autonomous agent KAIROS is mentioned with a /dream skill, daily logs, GitHub webhooks, and updates every 5 minutes. It looks like the next big step after the current Claude Code.

The most meme-worthy moment — userPromptKeywords.ts contains a large regex that catches phrases like: wtf, ffs, omfg, shit, dumbass, fuck you, this sucks, damn it, showing that the user is angry, and the model likely reacts differently (the author assumes this is for experience improvement or escalation).

The leak is dangerous not so much for the code itself, but for revealing the roadmap and internal protection mechanisms.

Visualization
https://ccunpacked.dev/ and https://ccleaks.com/
https://news.ycombinator.com/item?id=47597085
Especially useful for developers who want to understand how Anthropic builds agentic systems (tool calling, multi-agent, planning loop, bash security, etc.).

https://www.youtube.com/watch?v=LA3l81oEzJQ

Key findings — hidden features:

  • KAIROS: A constantly active background agent that works 24/7, monitors repositories, and fixes bugs on its own.
  • ULTRAPLAN: Deep planning for up to 30 minutes in the cloud for complex tasks.
  • BUDDY: A terminal-based Tamagotchi companion with 18 species and statistics.
  • DREAM: An automatic self-cleaning and memory consolidation system.

Analysis by Joe Fabisevich
https://build.ms/2026/4/1/the-claude-code-leak/
https://news.ycombinator.com/item?id=47609294
An indie developer, author of Plinky, writes not about the leak itself, but about what it says about modern development. Anthropic immediately started sending DMCA notices to GitHub (even for their own forks of skills and examples). And then clean-room implementations in Python and Rust appeared.

The discussion jokes about "Claude leaking itself": the classic hype about the model deciding to "open" itself.

Analysis by Han HELOIR YAN, Ph.D.
https://medium.com/@han.heloir/everyone-analyzed-claude-codes-features-nobody-analyzed-its-architecture-1173470ab622
The article is more technical and calm - it focuses not on meme features (like Buddy, Undercover Mode or frustration regex), but on the architecture of Claude Code as a full-fledged production-grade AI agent.

Anthropic's moat is not in the model itself (LLM), but in the harness (the wrapper, the system around the model). It is thanks to this harness that Claude Code feels significantly more powerful than competitors, even if the model is not always the best.