CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

Model Upgrade to Kimi K2.5
https://www.kimi.com/blog/kimi-k2-5.html
This is an open-source model (though very large — requiring hundreds of GB of VRAM), setting new standards in multimodality, programming, and autonomous agent work. The developers' main pride is the Agent Swarm mode. Instead of one agent performing tasks sequentially, K2.5 can independently create and coordinate an entire "swarm" of 100 sub-agents.

https://www.youtube.com/watch?v=eQyAzZboDbw

It posts high scores on SWE-Bench (76.8%), close to GPT-5.2 and Claude Opus 4.5, and handles real-world code generation tasks well.

Kimi K2.5 is not just a text model but a "natively multimodal" intelligence: it was trained on a massive dataset of 15 trillion mixed visual and text tokens. Thanks to this, the model improves its skills in text understanding and image/video analysis simultaneously.

As a result, Kimi K2.5 demonstrates excellent results in frontend development. The model can "see" its own mistakes in the rendered interface and fix them autonomously (autonomous visual debugging). It can also convert video into a working site (video-to-site).

Kimi Code CLI 1.0
https://moonshotai.github.io/kimi-cli/en/
The Chinese company Moonshot AI is developing its own command-line interface, a cross-platform solution (Windows, macOS, Linux) — Kimi Code CLI. Recently, the project has evolved from a simple interactive shell into a complex system, although it is still in Technical Preview. In the best Chinese traditions, the interface is a copy of Claude Code.

The CLI already supports the Agent Client Protocol (ACP) for integration into the Zed IDE, MCP, third-party providers, and its own OAuth authentication via login/logout. A web interface can also be launched via the kimi web command.

Skills here are called Flow skills. Users can describe scenarios in SKILL.md files (with Mermaid/D2 diagram support) and trigger them with the /flow command.

Subscription for $19
https://www.kimi.com/code
The subscription is focused on programming, providing access to the CLI and IDE. Prices ($19 / $39 / $199) are on par with American market leaders, which reflects Kimi's confidence in the competitiveness of its models.

Strengthening the OpenAI Codex Team
https://www.webpronews.com/openais-strategic-acqui-hire-how-poaching-clines-engineering-team-signals-a-new-phase-in-ai-development-race/
OpenAI's Codex is still losing the battle to its competitor, Claude Code. This might be why the company hired at least seven lead developers from Cline, one of the best-known VS Code extensions for code generation (according to informal social media reports).

Cline representatives officially stated that the company continues to operate and no official acquisition deal with OpenAI took place! This is an example of an "acqui-hire"—a strategy where a large corporation absorbs talent and expertise without officially buying the company. Google did something similar with Windsurf.

Kilo Wants to Poach Developers
https://blog.kilo.ai/p/cline-just-acqui-hired
The situation with Cline (Kilo is essentially its fork) is bad for the community—after an "acqui-hire," project vitality usually fades: updates slow down, and decisions are made behind closed doors. Cline's future has become murky.

Kilo is therefore offering $100 of service credit to anyone who previously contributed to Cline's code. The top five contributors will get a paid trip to the company's office in Amsterdam to work together.

ChatGPT Subscription in Cline
https://blog.kilo.ai/p/use-chatgpt-subscription-inside-kilo
https://cline.bot/blog/introducing-openai-codex-oauth
Following OpenCode and Kilo Code, Cline now also allows logging in with a ChatGPT subscription to use its 5-hours-per-week quota for GPT models.

Ollama Updates
https://github.com/ollama/ollama/releases
Ollama is a project for automating and simplifying the deployment of open LLMs locally. It allows generation to take place directly on your own hardware, protecting private data and removing dependence on network access.

v0.14 - added compatibility with the Anthropic API. Any open model can now be connected to Claude Code.
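
For example, the official Anthropic Python SDK can be pointed at a local Ollama server. A minimal sketch, assuming Ollama v0.14+ is serving on its default port and a model has already been pulled (the endpoint path and model name are assumptions; check the Ollama docs for your setup):

```python
import anthropic

# Point the Anthropic SDK at a local Ollama server instead of Anthropic's
# cloud. Ollama ignores the API key, but the SDK requires a non-empty value.
client = anthropic.Anthropic(
    base_url="http://localhost:11434",
    api_key="ollama",
)

message = client.messages.create(
    model="qwen3-coder",  # any model pulled into Ollama locally
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(message.content[0].text)
```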

https://docs.ollama.com/integrations/claude-code
v0.15 - a new, convenient ollama launch command for using Ollama models with Claude Code, Codex, OpenCode, and Droid without separate configuration.

https://www.youtube.com/watch?v=3x2q6-5XbQ8

Of course, the generation quality will be lower than with Anthropic models, but it's 100% private and works without the internet.

https://docs.ollama.com/integrations/clawdbot
Later, ollama launch clawdbot was added to run Clawdbot/Moltbot/OpenClaw with local models.

Agent Skills Adaptation
https://agentskills.io/home
Following Anthropic's rollout of the Skills API (skills-2025-10-02) and the release of the standard on December 18, 2025, OpenAI with GPT-5.2 Thinking quietly responded almost immediately by adding /home/oai/skills to ChatGPT and skills.md support in Codex. Microsoft integrated support into VS Code in December as well, and Cursor followed suit.

https://opencode.ai/docs/skills/ in OpenCode CLI v1.0.186, December 22, 2025.
https://qwenlm.github.io/qwen-code-docs/en/users/features/skills/ in Qwen code v0.6, December 26, 2025.
https://geminicli.com/docs/cli/skills/ in Gemini CLI v0.23, January 7, 2026.

Clawdbot
https://molt.bot/ and https://www.clawhub.ai/
Skills are exactly what make Clawdbot/Moltbot such a powerful tool.

Atlassian, Figma, Canva, Stripe, Notion, Zapier—just as they did with the Model Context Protocol (MCP) a year ago—have also released their own skills.

Catalogs have started to emerge:

https://github.com/runkids/skillshare - synchronization of skills between Claude Code, ClawdBot, OpenCode, etc.

Getting Started with Codex
https://www.youtube.com/watch?v=px7XlbYgk7I
OpenAI released a detailed 53-minute workshop on how to start working with Codex, their code generation tool. The presentation covers all stages: from installation to advanced use cases.

Differences between Codex in the terminal (CLI), as a VS Code extension, and in the cloud. What the AGENTS.md file does. How to connect external services (e.g., Jira, Figma, documentation databases) via MCP servers.

Effective Prompting: Using @ to reference specific files. Ability to add screenshots (e.g., UI mockups) for code generation. Session restoration (codex resume) to continue working on complex tasks.

Advanced Scenarios: Code Review. Writing unit tests and documentation. Automated fixing of failed tests in CI/CD pipelines. Generating diagrams (Mermaid sequence diagrams) to explain code logic.

How Codex Works
https://openai.com/index/unrolling-the-codex-agent-loop/
Recently, distrust towards Anthropic has been growing, with many highlighting that Claude Code is not an open-source project. Against this backdrop, OpenAI has an opportunity to promote Codex. They released an article emphasizing that their project is open-source, allowing anyone to audit the code, and explained how it works.

At the core of Codex CLI is an "agent loop" that coordinates interaction between the user, the AI model, and tools. This loop repeats until the model provides a final text response. Constructing the initial prompt is a complex procedure: it consists of system instructions, a list of available tools (both built-in and external via MCP servers), and a description of the local environment.

Architecturally, Codex uses a stateless approach, moving away from the previous_response_id parameter. This means all necessary information is resent in every request, supporting a "Zero Data Retention" policy for enterprise clients. It is possible to use the gpt-oss model via Ollama 0.13.4+ or LM Studio 0.3.39+ entirely locally.
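
A minimal sketch of that loop and the stateless resend, with stub helpers in place of the real model and tool calls (this is the general pattern the article describes, not Codex's actual Rust code):

```python
def build_system_prompt() -> str:
    # Simplified: the real prompt bundles instructions, the tool list
    # (built-in + MCP servers), and a description of the local environment.
    return "You are a coding agent. Tools: shell, apply_patch."

def call_model(history: list[dict]) -> dict:
    # Stand-in for the real API call; here the model "answers" immediately.
    return {"type": "text", "content": f"(model saw {len(history)} messages)"}

def run_tool(name: str, arguments: str) -> str:
    return f"ran {name}({arguments})"  # stand-in for actual tool execution

def agent_loop(user_input: str) -> str:
    history = [
        {"role": "system", "content": build_system_prompt()},
        {"role": "user", "content": user_input},
    ]
    while True:
        # Stateless: the FULL history is resent on every request, so the
        # provider keeps no session state ("Zero Data Retention").
        response = call_model(history)
        if response["type"] == "text":
            return response["content"]  # final text answer ends the loop
        # Otherwise the model requested a tool; run it, append the result,
        # and go around again.
        history.append(
            {"role": "tool",
             "content": run_tool(response["name"], response["arguments"])}
        )

print(agent_loop("fix the failing test"))
```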

https://news.ycombinator.com/item?id=46737630
Many were pleasantly surprised by the transition to Rust (the codex-rs project), which has become the primary version, though some are confused by the npm installation method. The context compaction mechanism (/responses/compact) was highly praised as being superior to competitors.

Autonomous Coding Experiment
https://cursor.com/blog/scaling-agents
Cursor launched hundreds of AI agents simultaneously to work on a single collaborative project for weeks without human intervention. The idea is to move from the "one chatbot solves one task" format to a "virtual IT company" model, where agents work in parallel without interfering with each other.

The main takeaway is that simply increasing the number of agents is effective for solving complex tasks if prompts and models are properly configured (Opus 4.5 tends to "cut corners," while GPT-5.2 is better at long-term planning). The solution was a hierarchical "Planners and Workers" approach. Planners continuously explore the code and create tasks, while Workers implement them without being distracted by overall coordination.
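
A toy sketch of that hierarchy, with threads standing in for agents and a hard-coded task list (all names are invented for illustration):

```python
import queue
import threading

NUM_WORKERS = 3
tasks: queue.Queue = queue.Queue()

def planner() -> None:
    """Explores the project and emits tasks; here the list is hard-coded."""
    for task in ["implement tab bar", "parse bookmarks", "wire up history"]:
        tasks.put(task)
    for _ in range(NUM_WORKERS):
        tasks.put(None)  # poison pill: one per worker, signals shutdown

def worker(worker_id: int) -> None:
    """Implements tasks one at a time, never coordinating with other workers."""
    while (task := tasks.get()) is not None:
        print(f"worker {worker_id}: implementing {task!r}")  # agent call here

workers = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for w in workers:
    w.start()
planner()
for w in workers:
    w.join()
```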

Agents wrote over a million lines of code, building a web browser, a Windows 7 emulator, and an Excel clone from scratch.

https://www.youtube.com/watch?v=U7s_CaI93Mo

Agents Created a Browser, but It Doesn't Work
https://emsh.cat/cursor-implied-success-without-evidence/
A blog post by embedding-shapes debunks this "success." The author claims that Cursor's experiment is a marketing illusion and fiction, and the agents' output is non-working garbage: the project cannot be built. The cargo build command returns dozens of errors. Agents spent weeks writing code but seemingly never tested it for functionality and ignored compilation errors.

This is "AI slop"—generated text that looks like code but lacks real logic or a working structure. The agents simply "inflated" the volume (a million lines) but failed the basic minimum: creating a program that at least launches and opens a simple HTML file. In other words, they created code, not a program.

https://news.ycombinator.com/item?id=46646777
Users (specifically nindalf) looked into the dependency file (Cargo.toml) and discovered that the "browser" uses ready-made components from Servo (a browser engine developed by Mozilla and now maintained by Igalia) for HTML and CSS parsing, as well as the QuickJS library for JavaScript. Cursor's claim that agents wrote all of this "from scratch" was deemed a lie. The code generated by the agents is mostly "glue" connecting existing third-party libraries.

The community confirmed the findings of the embedding-shapes author: the code does not compile, tests fail, and the commit history shows that agents simply generated gigabytes of text without functional verification. The claims about "millions of lines of code" and "autonomous agents" are targeted at managers and investors who won't check the repository. The situation is being compared to fraud.

Decline in Code Generation Quality
https://spectrum.ieee.org/ai-coding-degrades
Data scientist Jamie Twiss says that in his experience, AI code generation agents reached a plateau in 2025 and the quality of their work has now begun to decline.

His hypothesis: earlier models failed in obvious ways, making mistakes in syntax or structure, so bad outputs were rejected and the feedback signal stayed clean. However, an increasing number of modern AI code generator users are "vibe" programmers who don't inspect what they accept. If a user accepts the code, the model considers its job done. This is how Reinforcement Learning from Human Feedback (RLHF) works, and it "poisons" new model iterations, teaching them to "please" the user by masking problems instead of writing correct and secure code.
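
The mechanism reduces to a reward signal that never observes correctness. A deliberately crude sketch, purely illustrative of the argument, not of how any lab actually computes rewards:

```python
def acceptance_reward(user_accepted: bool, code_is_correct: bool) -> float:
    """If acceptance is the only signal, correctness never enters the reward."""
    return 1.0 if user_accepted else 0.0

# A vibe coder accepts plausible-looking but broken code: full reward.
print(acceptance_reward(user_accepted=True, code_is_correct=False))   # 1.0
# Correct code that looks intimidating and gets rejected: zero reward.
print(acceptance_reward(user_accepted=False, code_is_correct=True))   # 0.0
```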

https://news.ycombinator.com/item?id=46542036
A significant portion of HN commenters believe that models are not getting worse; rather, their "capability architecture" is changing, and old prompting methods no longer work. Developers must constantly adapt. Working with AI is not about magically correct answers every time, but a separate engineering discipline that requires thorough auditing and complex control tools.

Some suggest that large subscription providers dynamically swap large models for smaller (distilled) ones during peak loads. Because of this, users periodically experience AI "stupidity."

Claude Max in Opencode
https://github.com/anomalyco/opencode/issues/7410
Opencode users have started reporting that they can no longer use the $200/month Claude subscription plan. Some thought it was an account-specific issue, but it seems Anthropic has decided to crack down on third-party CLIs.

One commenter noted that using the "Max" plan within Opencode was never officially authorized by Anthropic, so the block was only a matter of time.

Many developers in the comments state they have cancelled their paid Claude subscriptions because the service is not suitable for them without OpenCode. Some users are switching to "Zen," an internal service from the OpenCode developers that works via API, where payment is based on actual token usage rather than a fixed monthly fee.

https://news.ycombinator.com/item?id=46549823
Many developers on HN claim that OpenCode has recently become technically superior to Anthropic's official tool. The community believes the blocking decision is tied to telemetry: by using the official Claude Code CLI, you agree by default (or via manipulative UI) to provide Anthropic with data on how you accept or reject code. This is invaluable for training future models. Third-party clients like OpenCode "steal" this data from Anthropic.

Later, it was revealed that the "blocking" was quite primitive: OpenCode had simply mimicked official behavior by sending the system prompt "You are Claude Code, the official CLI from Anthropic." The community has already found "workarounds": changing tool names (e.g., using different capitalization) or updating plugins restores functionality for the time being.

https://github.com/anomalyco/opencode/releases/tag/v1.1.11
Opencode has added support for authenticating with OpenAI's Codex pricing plan.

Superset - a multiterminal for Agents
https://superset.sh/
Currently Mac only, with Windows and Linux versions planned. It is an Electron-based app, a terminal with tabs specifically adapted to manage multiple agents like Claude Code, OpenCode, OpenAI Codex, and others simultaneously.

It automatically creates isolated git worktrees (best practice), sets up environments, isolates tasks to avoid conflicts, adds notification hooks, and includes a built-in diff-viewer for quick review of changes and PR creation. Future plans include cloud workspaces, context sharing between agents, and orchestration.
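
The worktree isolation is easy to reproduce by hand; Superset automates it. A sketch of the underlying git commands driven from Python (the path and branch naming are illustrative assumptions, not Superset's internals):

```python
import pathlib
import subprocess

def create_agent_worktree(repo: pathlib.Path, task: str) -> pathlib.Path:
    """Give an agent its own checkout and branch, isolated from the main one."""
    worktree = repo.parent / f"{repo.name}-{task}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add",
         "-b", f"agent/{task}", str(worktree)],
        check=True,
    )
    return worktree

# Two agents can now edit the same repository without stepping on each other:
# create_agent_worktree(pathlib.Path("~/src/myapp").expanduser(), "fix-login")
# create_agent_worktree(pathlib.Path("~/src/myapp").expanduser(), "add-tests")
```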

Logic-wise, it is similar to https://github.com/tmux/tmux, the well-known terminal multiplexer for Unix-like systems (Linux, macOS, BSD, etc.), which allows creating and managing multiple sessions in a single window using panes and windows.

Mysti as a team of agents
https://github.com/DeepMyst/Mysti
A VS Code extension that allows combining any two different models (shared context) in Brainstorm Mode to receive higher-quality advice. It solves the issue of switching between different paid AI subscriptions to get alternative opinions on complex architectural decisions. Currently supports models from Claude, Codex, Gemini, and GitHub Copilot CLI.
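
The pattern behind Brainstorm Mode is essentially cross-review over a shared context. A stub sketch where ask stands in for real provider SDK calls (model names are placeholders, not Mysti's actual code):

```python
def ask(model: str, prompt: str) -> str:
    """Stand-in for a real provider SDK call."""
    return f"[{model}] draft answer to: {prompt[:40]}..."

question = "Should we shard this Postgres table or partition it?"
draft_a = ask("claude-model", question)
draft_b = ask("codex-model", question)

# Each model sees the other's draft plus the shared context, then critiques.
final = ask(
    "claude-model",
    f"{question}\n\nA second model proposed:\n{draft_b}\n\n"
    "Point out weaknesses in both drafts and give a final recommendation.",
)
print(final)
```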

HN Discussion
https://news.ycombinator.com/item?id=46365105
The community shows significant interest in the idea of multi-agent collaboration, actively sharing personal workflows and alternative tools. Many participants experiment with similar approaches manually (e.g., via Tmux panes with several CLI agents) and believe that debates between models help identify weak ideas and improve solutions, especially when one model gets "stuck."

Regarding Mysti, there is criticism of its dependency on VS Code, as many users prefer a pure CLI experience.

New Models

Gemini 3 Flash
https://blog.google/products/gemini/gemini-3-flash/
Google is gradually rolling out the junior multimodal agentic model of its new series; benchmarks suggest it is closer to Gemini 3 Pro than to Gemini 2.5 Flash. The model outperforms Gemini 2.5 Pro in many tests while being three times faster and significantly cheaper. In some benchmarks, it even surpasses flagship models from other companies.

Since its release, Gemini 3 Flash has become the default model in the Gemini mobile app (replacing 2.5 Flash) and in Google Search's AI Mode. In my Gemini CLI, neither 3 Flash nor 3 Pro has appeared yet—they can be accessed via Google AI Studio.

GLM-4.7
https://z.ai/blog/glm-4.7
Zhipu AI has updated its GLM model. Version 4.7 shows significant progress over GLM-4.6 in multilingual code generation scenarios. It supports "thinking before acting" in frameworks like Claude Code, Kilo Code, Cline, and Roo Code, ensuring stability in complex tasks. Interface generation quality has also been improved.

Model weights (MoE architecture, up to 200K token context) are publicly available on Hugging Face and ModelScope for local deployment. Access is available via Z.ai API, OpenRouter, the z.ai chat interface, and a special GLM Coding Plan ($3 for the first month, then $6).

MiniMax M2.1
https://www.minimax.io/news/minimax-m21
MiniMax, a Chinese company, has released an improved version of its MiniMax M2 model, focused on practical development and agentic systems. The model is reportedly significantly enhanced for working with non-Python programming languages (Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, etc.), outperforming Claude Sonnet 4.5 and approaching Claude Opus 4.5 in multilingual scenarios.

The model is open-source. API costs are quite low, about 10% of Claude Sonnet's price. It is compatible with popular agents like Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox, and supports context mechanisms (Skill.md, agent.md, etc.).

They also have a web platform at https://agent.minimax.io/ where you can test how the model builds applications.

https://www.youtube.com/watch?v=kEPLuEjVr_4

SWE-bench Verified comparison: Gemini 3 Flash 78%, MiniMax M2.1 74%, GLM-4.7 73.8%.

MCP as an independent standard
https://aaif.io/
https://openai.com/index/agentic-ai-foundation/
In December 2025, Anthropic transferred the Model Context Protocol (MCP) to the Agentic AI Foundation (AAIF) — a specialized foundation managed by the Linux Foundation. MCP became one of the founding projects of the newly created foundation. Along with MCP, the foundation included projects like goose by Block and AGENTS.md by OpenAI.

Agent Skills as an open standard
https://claude.com/blog/organization-skills-and-directory
https://agentskills.io and https://claude.com/connectors
Agent Skills was announced as an independent open standard on December 18, 2025, with a specification and SDK; it was not transferred to the Linux Foundation or AAIF. Microsoft has already adopted Agent Skills in VS Code and GitHub Copilot; it is also supported by Cursor, Goose, Amp, and OpenCode where Anthropic models are available.

Agent Skills Playground
https://skillsplayground.com/
On this site, by entering your API key, you can experiment with how different models utilize various skills.

Claude Code 2.0.74
Added the LSP tool (Language Server Protocol) for code intelligence features such as go-to-definition, find references, and hover tooltips. This significantly improves the development experience, making code navigation faster and more convenient. For now, the agent rarely uses LSP autonomously. The open-source OpenCode project has had LSP support for about six months, which makes the proprietary tool's slow progress here surprising.
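
Under the hood, LSP is JSON-RPC over stdio. Per the LSP specification, a go-to-definition request looks like this (the file URI and position are made-up example values):

```python
import json

# JSON-RPC request a client sends for go-to-definition
# ("textDocument/definition" in the LSP spec). Positions are 0-based.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///project/src/app.py"},
        "position": {"line": 41, "character": 17},
    },
}

body = json.dumps(request)
# LSP frames every message with a Content-Length header.
message = f"Content-Length: {len(body)}\r\n\r\n{body}"
print(message)
```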

Cursor Visual Editor
https://cursor.com/blog/browser-visual-editor
Cursor introduced a visual editor with a "Point and Prompt" feature: you can simply click on any interface element and describe in text what needs to be changed. It also allows manipulating the site structure by dragging and dropping elements in the DOM tree, changing button order or grid settings.

https://www.youtube.com/watch?v=1S8S89X-xbs

The editor's sidebar provides visual control over component properties (props) and styles: from typography sliders to a color palette. The update aims to blur the line between design and programming, allowing developers to focus on ideas rather than mechanical code work.

Claude Code Plugins
https://code.claude.com/docs/en/plugin-marketplaces
Anthropic launched a plugins marketplace, seemingly in response to a similar one in Gemini CLI. It is not a separate website with an App Store-like interface. It's a system within Claude Code itself, where marketplaces are plugin catalogs (often based on GitHub repositories) that are added and managed via slash commands.

https://www.youtube.com/watch?v=1uWJC2r6Sss

Also, there are now prompt suggestion variants and a hotkey for switching models during a prompt. Subagents can work in parallel. Improved usage statistics and a visual fill-indicator for the context window have been added.

You can now run Claude Code tasks directly from the Claude Android mobile app. This is not a full-fledged terminal on the phone, but an asynchronous integration where Claude runs in the cloud.

Kiro Powers
https://kiro.dev/docs/powers/
Kiro is testing the concept of Powers, which tackles context-window clutter through dynamic tool activation: the system analyzes the user's query and enables only the necessary "knowledge pack." This is very similar to "Skills" in Anthropic's models.

When many tools (MCP servers) are connected to an agent, it is forced to load hundreds of function descriptions simultaneously. This can "eat" up to 40% of the context limit before work even begins, leading to irrelevant advice. Instead, each Power is a ready-made set containing instructions (how and when to use tools), server configuration, and automated scenarios.

For example, if you mention "payment," the Power for Stripe is activated, providing specific knowledge about the API and security. As soon as you move to working with a database, Stripe tools are disabled, and instead, the Power for Supabase or Neon is loaded. This allows the agent to remain fast, focus on a specific topic, and produce higher quality code.
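
A toy version of this routing; the keyword matching below is an illustrative simplification, since Kiro presumably lets the model itself decide which Power to activate:

```python
# Each "Power" bundles instructions and tools for one service; only the pack
# relevant to the current query is loaded into the agent's context.
POWERS: dict[str, list[str]] = {
    "payment": ["stripe_create_intent", "stripe_refund"],
    "database": ["supabase_run_sql", "supabase_list_tables"],
}

def active_tools(query: str) -> list[str]:
    """Return only the tool names whose Power matches the query topic."""
    q = query.lower()
    return [tool for topic, pack in POWERS.items() if topic in q for tool in pack]

print(active_tools("Add a payment form to checkout"))  # Stripe pack only
print(active_tools("Migrate the database schema"))     # Supabase pack only
```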

The system offers an open ecosystem with one-click installation for popular services (AWS, Figma, Stripe, etc.).

Mistral Devstral 2 and Vibe
https://mistral.ai/news/devstral-2-vibe-cli
The European company Mistral AI is known for its LLMs developed independently of the US and China. It has updated its programming model and finally released its own CLI. These announcements are extremely important for the development of the open-source AI ecosystem in software development.

https://openrouter.ai/mistralai/devstral-2512:free
The new generation of models is called Devstral 2 (123B) and Devstral Small 2 (24B), released under flexible licenses: a modified MIT for Devstral 2 and Apache 2.0 for Devstral Small 2. Devstral 2 demonstrates an impressive 72.2% on the SWE-bench benchmark, a strong result among open models.

The Small version can run locally on NVIDIA hardware, although the larger model (due to its dense, non-MoE architecture) will require serious hardware like a Mac Studio or several 3090/4090 GPUs.

Currently, Devstral 2 is offered for free via API. The model is already available in Kilo Code and Cline. According to feedback, it is quite mediocre at generating websites, frontend, and animation — it works better with small tasks involving local Python scripts.

https://help.mistral.ai/en/articles/496007-get-started-with-mistral-vibe
Mistral Vibe CLI is, like Claude Code, an open-source command-line tool; it runs on Windows, macOS, and Linux and is built on the Devstral models. It can also be run in Zed. It features interface themes, Git integration, MCP support, and agents with custom settings. It supports both interactive and autonomous operation.

https://news.ycombinator.com/item?id=46205437
Commentators noted that "Vibe" sounds like the product is geared toward vibe coding ("play around with an agent and let it churn something out") rather than controlled work by a professional programmer. Some directly call this message the opposite of what's needed in real work: augmenting humans, not replacing the process with "chat + tools, good luck."

Mintlify Autopilot
https://www.mintlify.com/blog/autopilot
An AI-powered system that monitors changes in your repository. On every push, it analyzes what needs to be updated in the documentation (both for humans and for AI agents). In the Autopilot dashboard, it shows which changes might require documentation updates. Then the Mintlify agent automatically generates a draft that you can review and refine. It takes into account the code context and the existing tone/style of your documentation.

Code Wiki
https://codewiki.google/
Google launched Code Wiki (currently in public preview) — a platform designed to solve the problem of AI (and humans) reading and understanding existing codebases. The system creates and continuously maintains a structured wiki page for the entire repository.

Key features: full automation, Gemini-powered chat, hyperlinked answers that point directly to code files. The system automatically generates and keeps up-to-date architectural diagrams, class diagrams, sequence diagrams, and detailed descriptions.

There is a waitlist for the upcoming version of Code Wiki that will allow teams to run the exact same system locally and securely on internal, private repositories.

Qoder Repo Wiki
https://docs.qoder.com/user-guide/repo-wiki
A feature inside the Qoder IDE that automatically generates structured documentation for a project (up to 10,000 files per project, in English and Chinese) and continuously tracks changes in both the code and the documentation itself.

It deeply analyzes project structure and implementation details, providing rich context that helps AI agents work more effectively. Wiki generation is fully dynamic.

Full Git synchronization is supported. Generated content is stored in language-specific directories (e.g., repowiki/zh/, repowiki/en/), which can be committed and pushed like regular code. The initial wiki is created with one click (taking up to ~120 minutes for 4,000 files). After that, the system constantly watches for code changes and can update only the affected sections when modifications are detected.

Originally the feature worked only with Git repositories, but as of December 2, 2025, they added support for generating wikis from local projects without Git.

DeepWiki (by Cognition AI)
https://deepwiki.com/
A free AI tool that turns any GitHub repository (public or private) into a Wikipedia-style knowledge base. It analyzes code, READMEs, and configs, then creates structured pages with architectural and flow diagrams, interactive code hyperlinks, and a natural-language chat interface for asking questions.

Already supports >30,000 open repositories with automatic updates after new commits. An open-source version is available for local/self-hosted deployment.

MCP Standard Year in Review
https://blog.modelcontextprotocol.io/posts/2025-11-25-first-mcp-anniversary/
The blog post describes how in one year, MCP transformed from a small open-source experiment into a de facto standard in the industry. Major companies like Notion, Stripe, GitHub, OpenAI, Microsoft, and Google have created their own servers to automate workflows. For centralized discovery and management of these servers, the MCP Registry was launched, becoming a single catalog for the entire ecosystem.

Coinciding with the anniversary, the team is releasing a new version of the MCP specification (November 2025). Key innovations include support for task-based workflows (for long-running operations), simplified and more secure authorization mechanisms, and an extensions system that allows adding specific functionality without changing the core protocol.

MCP Container Catalog
https://hub.docker.com/mcp
The site hosts a large library of ready-to-use, containerized MCP servers created by the developer community and powered by Docker technology. The servers are grouped by categories. The platform's goal is to simplify the use of MCP tools.

MCP Problems
https://www.youtube.com/watch?v=4h9EQwtKNQ8

The author argues that while MCP is a great idea in theory, in practice it has a serious problem that makes it ineffective. This problem is poor context management.

When an AI agent connects to MCP servers, all descriptions of available tools (tool definitions) are loaded into the language model's "context window." When the agent uses a tool, all results of its work (including intermediate data that may be unnecessary) are also sent to the context.

This consumes a huge number of tokens. The author gives an example where just two connected servers take up 20k tokens. It only gets worse with each iteration. The author calls this problem "context rot".
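
Back-of-the-envelope arithmetic makes the problem concrete (the tool counts and description sizes below are invented, but the ~4 characters per token heuristic is a standard rule of thumb):

```python
import json

def approx_tokens(text: str) -> int:
    """Rough rule of thumb: about 4 characters per token for English/JSON."""
    return len(text) // 4

def fake_tool(name: str) -> dict:
    return {
        "name": name,
        "description": "Does something useful. " * 20,  # verbose, as in real servers
        "input_schema": {"type": "object",
                         "properties": {"arg": {"type": "string"}}},
    }

# Two MCP servers with a few dozen tools each, all injected up front.
definitions = (
    [fake_tool(f"github_{i}") for i in range(30)]
    + [fake_tool(f"slack_{i}") for i in range(25)]
)

overhead = approx_tokens(json.dumps(definitions))
print(f"~{overhead} tokens consumed before the user has typed anything")
```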

Agent Skills as an Alternative
https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
https://www.youtube.com/watch?v=fOxC44g8vig

The solution once proposed by Cloudflare is to have the agent, after finding the right tool, generate code (e.g., TypeScript) that calls the API directly. Based on this idea, Anthropic later proposed Agent Skills; a deep dive into the technology is available at https://leehanchung.github.io/blogs/2025/10/26/claude-skills-deep-dive/
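
The gain is easiest to see in miniature: the model emits a small script, the harness executes it, and only the final value re-enters the context. Everything below is an illustrative assumption, not Cloudflare's or Anthropic's actual implementation:

```python
def fetch_issues(state: str) -> list[dict]:
    """Local stand-in for a real API client handed to the generated code."""
    return [{"id": i, "state": state} for i in range(42)]

# Imagine the model generated this snippet instead of making 42 tool calls
# whose raw JSON would all flow back through the context window.
model_generated = "result = len(fetch_issues('open'))"

scope = {"fetch_issues": fetch_issues}
exec(model_generated, scope)  # a real harness would sandbox this
print(scope["result"], "open issues")  # only this summary re-enters the context
```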

Opus 4.5 in Claude Code
https://www.anthropic.com/news/claude-opus-4-5
Following Sonnet and Haiku, Anthropic's largest model has also been updated to version 4.5. The problem with version 4 was its very high token price; the new version is three times (!) cheaper. The overall limits have therefore been significantly increased, and Opus 4.5 has been added to Claude Code as an enhanced planning mode. It is capable of "creative problem-solving": finding non-standard but legitimate solutions.

The main point is that the model uses significantly fewer tokens to achieve results, making it faster and cheaper to use.

Of course, in their programming and tool use tests, Opus 4.5 surpasses both Gemini 3 Pro and GPT-5.1. In one of the tests, the model handled a complex engineering task better than any human candidate. The model manages context and memory more effectively and can coordinate the work of several "sub-agents" to perform complex tasks.

https://www.youtube.com/watch?v=5O39UDNQ8DY

Claude Code is now available in a desktop application (research preview). You can run several local and remote Claude Code sessions in parallel: one agent fixes bugs, another explores GitHub, and a third updates documentation. Git worktrees are used for parallel work with repositories.

We are now waiting for the model to appear in most modern AI code generation applications.

https://news.ycombinator.com/item?id=46037637
Several users emphasized that it's not the price per token that matters as much as the "cost per successful task." A smarter model, like Opus 4.5, makes fewer mistakes and requires fewer tokens to solve a problem, which can ultimately make it cheaper than "cheaper" but less intelligent models.