CodeWithLLM-Updates
-

Evaluating AI Assistants
https://www.youtube.com/watch?v=tCGju2JB5Fw

Three developers (Wes, Scott, and CJ) discuss and rank various tools, sharing their own experiences, evaluating interface usability, the quality of generated code, and the unique capabilities of each tool.

Services such as Replit and Lovable received specific criticism for their aggressive and sometimes opaque influencer marketing. For serious development, CLI tools or IDEs are more suitable, while browser-based solutions are ideal for quick experiments.

Ultimately, Claude Code, Open Code, and ChatGPT received S-tier. Claude Code is praised for its ability to strictly follow instructions and plan work, Open Code for its openness and support for custom API keys, and ChatGPT remains indispensable for quick queries that do not need the context of the entire project. Most other tools were rated as average: they are useful but do not offer unique advantages.


Vibe Coding Ranking
https://www.youtube.com/watch?v=ebacH8tdXug

This video is a humorous response to the previous one, and the author immediately warns that his ranking should not be taken seriously. Theo ranks tools not by technical capabilities but by so-called "vibe coding." The main criterion is how far a tool lets you create something without looking at the code or understanding technical details.

The author jokes that "true vibe coders" avoid seeing code. Therefore, Cursor, VS Code Copilot, Open Code, and Codex receive the lowest rating because they are assistants for real developers and require active participation: writing and reviewing code. They destroy the "vibe."

The highest rating went to the platform that hides code the most, V0 from Vercel. It has a simple interface, replaces technical terms (e.g., "fork" with "duplicate"), and offers powerful integrations that can be configured with a few clicks without any knowledge of APIs.

Surprisingly, Claude Code received an A-tier for its ability to perform tasks autonomously, hiding the technical implementation from the user.

Almost all modern AI coding tools have added the Claude Sonnet 4.5 model.

Cursor 1.7
https://cursor.com/changelog/1-7
Responding to Kiro and GitHub Spec Kit, Cursor has redesigned its Planning mode; it now creates descriptions, plans, and task lists before starting work.

https://www.youtube.com/watch?v=WInPBmCK3l4

The terminal finally runs commands in a separate sandbox, and interaction with Windows PowerShell has been fixed. The agent can also open a browser and take screenshots, and has learned to read images from disk. The OS taskbar now shows a list of agents and what they are doing.

Kiro v0.3.0
https://kiro.dev/changelog/spec-mvp-tasks-intelligent-diagnostics-and-ai-commit-messages/
Kiro has finally replaced the separate limits for its two modes with unified credits that count everywhere, so it now works like Windsurf. Sonnet 4.5 has been added, but strangely its credit coefficient is the same as Sonnet 4's; only Auto mode costs 1 credit. It is still not possible to drag files or folders into the chat area to reference them as context; that works only via hashtag.

Codex GitHub Action
https://github.com/openai/codex-action
OpenAI announced at DevDay 2025 that Codex has exited beta and is now stable, with enhanced capabilities. There is a Codex GitHub Action, a built-in widget gallery, and MCP support. A Codex SDK is available for integration.

OpenAI is also transforming ChatGPT into an "operating system" for AI agents. You can now write your own applications and agents inside ChatGPT, connect payments, authorization, and metrics.

Gemini CLI Extensions
https://blog.google/technology/developers/gemini-cli-extensions/
Google has launched a separate website for Gemini CLI https://geminicli.com/ with a documentation section. Extensions for Gemini CLI are a new feature that allows developers to customize and connect various programs, integrating services like Dynatrace, Figma, Stripe, Snyk, and others.

The system is open, allowing anyone to create their own extensions, and Google has already released a set for integration with its products (Google Cloud, Firebase, Flutter).

Jules CLI
https://jules.google/docs/changelog/
The Jules cloud agent has a rather rough web interface that makes it unclear what the agent is currently doing, but it offers as many as 15 tasks a day without paying for tokens. You can now install the CLI locally with npm install -g @google/jules and list all commands with jules help. Windows is not supported.

The CLI allows you to create tasks, view active sessions (jules remote list), and monitor them from the terminal in a convenient visual format. It supports scripting in combination with utilities such as gh, jq, or cat.

There is an option to take code from an active Jules session and apply it to a local machine for immediate testing of changes without waiting for a commit to GitHub.

ALSO

  • Since September 30, 2025, Jules can learn from interaction: it saves settings, prompts, and corrections.
  • Since September 29, 2025, you can tell Jules exactly which files to work with for any task.
  • Since September 23, 2025, Jules can read and respond to comments in pull requests.

Code Mode
https://blog.cloudflare.com/code-mode/
A new approach called "Code Mode" changes how AI interacts with external tools. Instead of having large language models (LLMs) "call tools" directly via MCP, which is unnatural for them, Cloudflare proposes asking them to write TypeScript code that accesses those tools through an API.

The system automatically converts tools available via the MCP protocol into a clear TypeScript API with documentation. The AI-generated code is executed in a secure isolated "sandbox." The MCP protocol itself remains important, as it provides a standardized way to connect to services, obtain their descriptions, and securely authorize, allowing the system to manage access without directly involving the AI.
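To make the conversion step more concrete, here is a minimal sketch of turning a tool description into a documented TypeScript signature that a model could be shown. It is not Cloudflare's implementation. The ToolDescription shape is a simplified, hypothetical stand-in for what an MCP server reports (real servers describe inputs with JSON Schema), and the names renderTool, renderApi, IssueTracker, and searchIssues are invented for illustration.

```typescript
// Hypothetical, simplified tool description. Real MCP servers describe
// inputs with JSON Schema; this flat shape is an assumption for the sketch.
interface ToolParam {
  type: "string" | "number" | "boolean";
  description?: string;
  required?: boolean;
}

interface ToolDescription {
  name: string;
  description: string;
  params: Record<string, ToolParam>;
}

// Render one tool as a documented TypeScript method signature.
function renderTool(tool: ToolDescription): string {
  const fields = Object.keys(tool.params)
    .map((key) => {
      const p = tool.params[key];
      const doc = p.description ? `    /** ${p.description} */\n` : "";
      // Optional parameters get a "?" after their name.
      return `${doc}    ${key}${p.required ? "" : "?"}: ${p.type};`;
    })
    .join("\n");
  return [
    `  /** ${tool.description} */`,
    `  ${tool.name}(input: {`,
    fields,
    `  }): Promise<unknown>;`,
  ].join("\n");
}

// Render a whole tool list as one interface that can be placed in a prompt.
function renderApi(interfaceName: string, tools: ToolDescription[]): string {
  return `interface ${interfaceName} {\n${tools.map(renderTool).join("\n\n")}\n}`;
}

// Invented example tool, used only to exercise the functions above.
const demoTools: ToolDescription[] = [
  {
    name: "searchIssues",
    description: "Search issues by free-text query.",
    params: {
      query: { type: "string", description: "Text to search for", required: true },
      limit: { type: "number" },
    },
  },
];

const apiSource = renderApi("IssueTracker", demoTools);
console.log(apiSource);
```

The model then sees an ordinary typed API with doc comments instead of a special tool-call format, and the code it writes is checked against signatures like these.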

According to the post, this method works much better because LLMs are trained on vast amounts of real code and are better at writing it than at using specialized, artificially created commands.

The technological basis for this "sandbox" is the Cloudflare Workers platform, which uses lightweight and extremely fast V8 isolates instead of slow containers. This ensures high efficiency and security: the code is completely isolated from the internet and can only interact with permitted tools.
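The "only permitted tools" idea can be sketched as follows. This shows the access model only, not real isolation: an actual sandbox relies on V8 isolates, which plain TypeScript cannot reproduce. All names here (allTools, permit, call, generatedScript) are hypothetical, and the tools are synchronous stubs to keep the sketch short, whereas real tool calls would be asynchronous.

```typescript
// A tool maps a JSON-like input to a JSON-like output.
type Tool = (input: Record<string, unknown>) => unknown;
type Bindings = Record<string, Tool>;

// Hypothetical host-side tools. Credentials would live here, on the host,
// and never inside the generated script.
const allTools: Bindings = {
  getWeather: (input) => ({ city: input["city"], tempC: 21 }),
  sendEmail: (input) => ({ sent: true, to: input["to"] }),
  deleteRepo: () => {
    throw new Error("destructive tool");
  },
};

// Build the bindings for one run from an allow-list of tool names.
function permit(tools: Bindings, allowed: string[]): Bindings {
  const bindings: Bindings = {};
  for (const name of allowed) {
    const tool = tools[name];
    if (tool !== undefined) bindings[name] = tool;
  }
  return Object.freeze(bindings);
}

// The only way for the script to reach a tool; unknown names are rejected.
function call(tools: Bindings, name: string, input: Record<string, unknown>): unknown {
  const tool = tools[name];
  if (tool === undefined) throw new Error(`tool not permitted: ${name}`);
  return tool(input);
}

// Stand-in for model-written code: it chains two tools in one script,
// with no model round trip between the calls.
function generatedScript(tools: Bindings): unknown {
  const weather = call(tools, "getWeather", { city: "Kyiv" }) as { tempC: number };
  return call(tools, "sendEmail", {
    to: "team@example.com",
    body: `It is ${weather.tempC} C`,
  });
}

const bindings = permit(allTools, ["getWeather", "sendEmail"]);
const result = generatedScript(bindings);
console.log(result);
console.log("deleteRepo" in bindings);
```

The script receives nothing except the bindings object, so a tool left off the allow-list simply does not exist from its point of view.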