CodeWithLLM-Updates
-

Cursor 2.0
https://cursor.com/changelog/2-0
https://cursor.com/blog/composer
A significant update to one of the main AI coding tools. Cursor has decided to answer Windsurf (which, incidentally, just updated its SWE model to version 1.5) by building its own model specialized for software development. They named it "Composer" and claim it is 4 times faster than models of similar intelligence, though I suspect the real motivation is to pay less to external providers.

The main novelty is the ability to run up to eight agents simultaneously (Multi-Agents), along with a new interface for managing them. Each agent operates in an isolated copy of the code, preventing conflicts. Voice control for agents has also been added.

https://www.youtube.com/watch?v=Q7NXyjIW88E

Browser and isolated terminal (sandbox) features have exited beta. Enterprise clients received extended security control, including isolated terminal settings and an audit log to track administrator actions.

https://news.ycombinator.com/item?id=45748725
Community reaction is mixed but very active, with a clear division between supporters and skeptics. Supporters emphasize that the overall experience is unparalleled — it keeps you focused and in the development flow — and call Cursor the only AI agent that feels like a serious product rather than a prototype. The new Composer model is praised for its exceptional speed.

Some complain that requests "hang" or the program crashes, especially on Windows. Several commentators noted that due to reliability issues, they switched to Claude Code, which proved to be "faster and 100% reliable."

There is also skepticism about lack of transparency: the company is criticized for vague graphs without specific model names and for using an internal, closed benchmark (Cursor Bench) to evaluate performance. Many want to know exactly what model underpins Composer (whether it's a fine-tuned open model), but developers evade a direct answer.

ForrestKnight on AI Coding
A guide on how to effectively and professionally use AI for writing code, as experienced developers do.

https://www.youtube.com/watch?v=5fhcklZe-qE

For complex planning, use more powerful models, and for code generation, use faster and cheaper ones. Do not switch models unnecessarily within the same conversation.

AI can quickly analyze other people's code or libraries, explain architecture, and draw component interaction diagrams.

  1. Preparation. At the beginning of the work, use AI to analyze the entire project and build a context description for it. Create files with rules (global for all projects and specific to a particular one). Specify your technology stack there (e.g., TypeScript, PostgreSQL), standards, branch naming conventions, etc.
  2. Specificity. At the start of a new chat, indicate which files need to be changed and which code to pay attention to. Write in detail, for example, "Add a boolean field editable to the users table, expose it via the API, and on the frontend, show the button only if this field is true." Add logs, and error screenshots.
  3. Manage. AI first creates a detailed step-by-step implementation plan. You review, correct, and only then give the command to generate code. You cannot blindly trust its choices.
  4. Edit. Analyze the generated code. It is necessary and possible to manually edit and refine it to a high quality. Ask why AI chose a particular solution and what the risks are.
  5. Team of Agents. You can launch one agent for writing code, a second for writing tests, and a third for reviewing the first agent's code.
  6. Git. You can give Git commands in natural language, such as "create a branch for the release and move the bug fixes there."
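The rules file from step 1 might look something like this — a hypothetical example; the exact filename and format depend on the tool (e.g. .cursorrules or AGENTS.md), and every line here is invented for illustration:

```
# Project rules (example)
Stack: TypeScript (strict mode), Node 20, PostgreSQL 15
Style: ESLint + Prettier defaults; no default exports
Branches: feature/<ticket-id>-<short-slug>
Tests: every new API endpoint needs an integration test
Never commit directly to main; always open a PR
```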

Kimi CLI
https://github.com/MoonshotAI/kimi-cli
https://www.kimi.com/coding/docs/kimi-cli.html
A new terminal coding agent from China's Moonshot AI, written in Python and currently in technical preview. Only the Kimi or Moonshot API platforms can be used as providers. https://www.kimi.com/coding/docs/ - there are pricing plans with musical names at 49 / 99 / 199 yuan per month.

Interestingly, similar to Warp, you can switch between the agent and a regular terminal. It supports ACP (Agent Client Protocol), meaning it can work inside Zed (which, by the way, finally released a Windows version). But Kimi CLI itself does not support Windows — only macOS and Linux for now.

Cline CLI
https://docs.cline.bot/cline-cli/overview
https://cline.ghost.io/cline-cli-return-to-the-primitives/
Cline CLI Preview is presented as a fundamental "primitive" built on the single agent loop of Cline Core — the same engine that powers the well-known VS Code extension. It is independent of model, platform, and runtime environment. This is basic infrastructure upon which developers can build their own interfaces and automated processes.

Instead of developing complex mechanisms (state management, request routing, logging) from scratch, teams can use Cline as a ready-made foundation. Like Kimi CLI, it currently supports only macOS and Linux.

Claude Code on the Web
https://www.anthropic.com/news/claude-code-on-the-web
A response to the popularity of Google Jules. The online service allows delegating several tasks to Claude Code in parallel from the browser. A new interface is also available as an early version in the mobile app for iOS. Currently in beta testing and available for Pro and Max plans.

Users can connect their GitHub repositories, describe tasks, after which the system will autonomously write code, tracking progress in real-time and automatically creating pull requests. Each task is executed in an isolated environment ("sandbox") to protect code and data.

https://www.youtube.com/watch?v=hmKRlgEdau4

Claude Haiku 4.5
https://www.anthropic.com/news/claude-haiku-4-5
The updated Haiku model, known for being fast and cheap, now matches the coding performance of the previous-generation Sonnet 4, while being twice as fast (160-220 tokens/sec) and roughly a third of the cost.

A common architectural pattern will be to use a smarter model (e.g., Sonnet 4.5) as an "orchestrator" that breaks a complex problem into smaller subtasks, which are then executed in parallel by a "team" of several Haiku 4.5 instances.
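The orchestrator/worker pattern can be sketched roughly like this. Both model calls are stubbed out: planSubtasks stands in for a call to the smarter planning model and runWorker for a call to the fast, cheap one; the function names and the fixed decomposition are invented for illustration, and a real implementation would call the provider's API in both places.

```typescript
// Minimal sketch of the orchestrator/worker pattern (stubbed model calls).

type Subtask = { id: number; prompt: string };

// Stand-in for asking the "orchestrator" model to decompose the task.
async function planSubtasks(task: string): Promise<Subtask[]> {
  const steps = ["implement the function", "write unit tests", "write the docs"];
  return steps.map((step, i) => ({ id: i, prompt: `${task}: ${step}` }));
}

// Stand-in for a single call to the cheap worker model.
async function runWorker(sub: Subtask): Promise<string> {
  return `done: ${sub.prompt}`;
}

// The subtasks fan out in parallel; the orchestrator gathers the results.
async function orchestrate(task: string): Promise<string[]> {
  const subtasks = await planSubtasks(task);
  return Promise.all(subtasks.map(runWorker));
}
```

The point of the pattern is in the last line: the expensive model runs once to plan, while the cheap model handles the fan-out.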

Haiku 4.5 appears to make code changes significantly more accurately compared to GPT-5 models.


Skills for Claude Models
https://www.anthropic.com/news/skills
https://simonwillison.net/2025/Oct/16/claude-skills/
Essentially, "Agent Skills" are a folder containing onboarding material, instructions, resources, and executable code. This allows Claude to be trained for specialized tasks, such as working with internal APIs or adhering to coding standards. Skills are integrated into all Claude products, and a new /v1/skills API endpoint has appeared for managing them. In Claude Code, they can be installed as plugins from the marketplace or manually, by adding them to the ~/.claude/skills folder.

Simon Willison believes the new feature is a huge breakthrough, potentially more important than the MCP protocol. Unlike MCP, which is a complex protocol, a Skill is just a folder with a Markdown file containing instructions and optional scripts. This approach doesn't invent new standards but relies on the existing ability of LLM agents to read files and execute code, making it incredibly flexible and intuitive. Since they are simple files, they are easy to create and share.

https://www.youtube.com/watch?v=kHg1TfSNSFI

Compared to MCP, Skills have a key advantage in token efficiency: instead of loading thousands of tokens to describe tools, the model reads only a brief description of the skill, and loads the full instructions only when needed.

https://news.ycombinator.com/item?id=45607117
https://news.ycombinator.com/item?id=45619537
Many commentators note that Skills are essentially just a way to dynamically add instructions to the model's context when needed. Their proponents say that this simplicity is precisely the genius. Skills represent a new paradigm for organizing and dynamically assembling context. Everyone generally agrees that this is a more successful and lightweight alternative to MCP, which saves us from context overload and consuming thousands of tokens.

Users have noticed that Skills are essentially a formalization of an existing AGENTS.md (or CLAUDE.md) pattern, where instructions for an agent are collected in one file, telling it where to look when something is needed. But Skills make this process more standardized, organized, and scalable. The LLM knows the standard and can help in generating a Skill.

Evaluating AI Assistants
https://www.youtube.com/watch?v=tCGju2JB5Fw

Three developers (Wes, Scott, and CJ) discuss and rank various tools, sharing their own experiences, evaluating interface usability, the quality of generated code, and the unique capabilities of each tool.

Services such as Replit and Lovable received specific criticism for their aggressive and sometimes opaque marketing strategy involving influencers. For serious development, CLI tools or IDEs are more suitable, while browser-based solutions are ideal for quick experiments.

Ultimately, Claude Code, Open Code, and ChatGPT received S-tier. Claude Code is praised for its ability to strictly follow instructions and plan work, Open Code — for its openness and the ability to use custom API keys, and ChatGPT remains indispensable for quick queries without the context of the entire project. Most other tools were rated as average — they are useful but do not offer unique advantages.


Vibe Coding Ranking
https://www.youtube.com/watch?v=ebacH8tdXug

This video is a humorous response to the previous one, and the author immediately warns that his ranking should not be taken seriously. Theo examines tools not by technical capabilities but by so-called "vibe coding." The main priority is how much it allows you to create something without looking at the code or understanding technical details.

The author jokes that "true vibe coders" avoid seeing code. Therefore, Cursor, VS Code Copilot, Open Code, and Codex receive the lowest rating because they are assistants for real developers who require active participation, writing, and reviewing code. They destroy the "vibe."

The highest rating went to the platform that abstracts away code the most: V0 from Vercel. It has a simple interface, replaces technical terms (e.g., "fork" with "duplicate"), and offers powerful integrations that can be configured in a few clicks without any knowledge of APIs.

Surprisingly, Claude Code received an A-tier for its ability to perform tasks autonomously, hiding the technical implementation from the user.

Almost all modern AI coding tools have added the Claude Sonnet 4.5 model.

Cursor 1.7
https://cursor.com/changelog/1-7
Responding to Kiro and GitHub Spec Kit, Cursor has redesigned its Planning mode; it now creates descriptions, plans, and task lists before starting work.

https://www.youtube.com/watch?v=WInPBmCK3l4

The terminal finally runs commands in a separate sandbox, and interaction with Windows PowerShell has been fixed. The agent can also open a browser and take screenshots, and has learned to read images from disk. The OS taskbar now shows a list of agents and what they are doing.

Kiro v0.3.0
https://kiro.dev/changelog/spec-mvp-tasks-intelligent-diagnostics-and-ai-commit-messages/
Kiro has finally replaced the separate limits for its two modes with unified credits that count everywhere — it now works like Windsurf. Sonnet 4.5 has been added, though oddly its rate multiplier is the same as Sonnet 4's; only Auto mode costs 1 credit. They still haven't made it possible to drag files or folders into the chat area to reference them as context — only via hashtag.

Codex Github Action
https://github.com/openai/codex-action
OpenAI announced at DevDay 2025 that Codex has exited beta — it is now stable, with enhanced capabilities. There is a Codex GitHub Action, a built-in widget gallery, and MCP support. A Codex SDK is available for integration.

OpenAI is also transforming ChatGPT into an "operating system" for AI agents. You can now write your own applications and agents inside ChatGPT, connect payments, authorization, and metrics.

Gemini CLI Extensions
https://blog.google/technology/developers/gemini-cli-extensions/
Google has launched a separate website for Gemini CLI https://geminicli.com/ with a documentation section. Extensions for Gemini CLI are a new feature that allows developers to customize and connect various programs, integrating services like Dynatrace, Figma, Stripe, Snyk, and others.

The system is open, allowing anyone to create their own extensions, and Google has already released a set for integration with its products (Google Cloud, Firebase, Flutter).

Jules CLI
https://jules.google/docs/changelog/
The Jules cloud agent has a rather imperfect web interface that makes it hard to tell what it is currently doing — but it grants as many as 15 tasks a day without paying for tokens. It can now be installed locally with npm install -g @google/jules; run jules help for the full command list. Windows is not supported.

The CLI allows you to create tasks, view active sessions (jules remote list), and monitor from the terminal in a convenient visual format. It supports scripting by combining with utilities such as gh, jq, or cat.

There is an option to take code from an active Jules session and apply it to a local machine for immediate testing of changes without waiting for a commit to GitHub.

ALSO

  • From September 30, 2025, Jules can learn from interaction: save settings, prompts, and corrections.
  • From September 29, 2025, you can precisely specify to Jules which files to work with for any task.
  • From September 23, 2025, Jules can read and respond to comments in pull requests.

Code Mode
https://blog.cloudflare.com/code-mode/
A new approach called "Code Mode" improves AI's interaction with external tools. Instead of forcing Large Language Models (LLMs) to directly "call tools" via the MCP protocol, which is unnatural for them, the idea is to have them write TypeScript code that accesses those tools through an API.

The system automatically converts tools available via the MCP protocol into a clear TypeScript API with documentation. The AI-generated code is executed in a secure isolated "sandbox." The MCP protocol itself remains important, as it provides a standardized way to connect to services, obtain their descriptions, and securely authorize, allowing the system to manage access without directly involving the AI.

This method is much more effective because LLMs are trained on vast arrays of real code and are better at writing it than using specialized, artificially created commands.

The technological basis for this "sandbox" is the Cloudflare Workers platform, which uses lightweight and extremely fast V8 isolates instead of slow containers. This ensures high efficiency and security: the code is completely isolated from the internet and can only interact with permitted tools.
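The contrast can be illustrated as follows. The Tools interface stands in for the TypeScript API that Code Mode would generate from an MCP server's tool descriptions; all tool names and arguments here are invented for the sketch.

```typescript
// Direct tool calling: the model emits one JSON blob per call, and every
// intermediate result must round-trip through the model's context.
const exampleToolCall = {
  name: "searchIssues",
  arguments: { repo: "acme/app", query: "login bug" },
};

// Code Mode: the model writes ordinary code against a typed API, so the
// loop and the chaining run inside the sandbox, not inside the model.
interface Tools {
  searchIssues(repo: string, query: string): Promise<{ id: number; title: string }[]>;
  addLabel(repo: string, issue: number, label: string): Promise<void>;
}

async function triage(tools: Tools): Promise<number[]> {
  const issues = await tools.searchIssues("acme/app", "login bug");
  const labeled: number[] = [];
  for (const issue of issues) {
    await tools.addLabel("acme/app", issue.id, "auth");
    labeled.push(issue.id);
  }
  return labeled;
}
```

With direct tool calling, each searchIssues result and each addLabel call would cost a full model round trip; in Code Mode the whole triage loop executes once in the sandbox.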