CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

MiniMax M2 and Agent
https://www.minimax.io/news/minimax-m2
MiniMax introduced a new model, M2, and a product built on it: MiniMax Agent. The model is designed specifically for coding agents: it can plan steps and use tools (browser, code interpreter, etc.). It has 229 billion parameters (10 billion active) and a 200K-token context window.

The main idea is to find a balance between high performance, low price, and high speed. The model is fully open source.

https://www.youtube.com/watch?v=IcE-K709QK4

Beyond the official information, practical tests and reviews confirm that MiniMax M2 is an extremely strong model, one of the best open-source models for programming to date. In testing it built an operating-system simulation with working applications such as Paint and a terminal, and generated creative websites with unique styles and interactive elements.

At the same time, M2 showed ethical guardrails, refusing to build a site on a fraudulent topic, and it failed an overly complex task (a PC-assembly simulator), which marks its current limits.

https://agent.minimax.io/
MiniMax Agent online has two modes:

  • Lightning Mode: for quick, simple tasks (answering questions, light coding).
  • Pro Mode: for complex, long-running tasks (deep research, software development, report creation).

You can only log in via Google. There is integration with Supabase and an MCP catalog, and there are apps for iOS and Android.

Pro Mode is temporarily free, and the API is also temporarily free (until November 7). I did not find anything on the website about code privacy control.

GitHub Universe 25
https://github.com/events/universe/recap
https://github.blog/news-insights/company-news/welcome-home-agents/
Announced Agent HQ - a forthcoming open platform that will let developers manage, track, and customize AI agents (from OpenAI, Google, Anthropic, and others) in one place. Mission Control is a unified interface across GitHub, Mobile, the CLI, and VS Code for managing agent operations.

GitHub Copilot received workflow-integration updates. It can now be assigned tasks from Slack, Microsoft Teams, and other tools, and it will use the discussion context to do the work.

https://github.blog/changelog/2025-10-28-custom-agents-for-github-copilot/
https://github.blog/changelog/2025-10-28-github-copilot-cli-use-custom-agents-and-delegate-to-copilot-coding-agent/
Custom agents can be defined using a Markdown configuration file in the .github/agents folder of your repository. These allow you to define agent "personas" by specifying instructions, tool selections, and Model Context Protocol (MCP) servers. Configured agents can be invoked from the Copilot CLI using the /agent command.
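For illustration, a custom agent definition in .github/agents might look like the sketch below. The field names and values are assumptions based on the changelog description, not the exact schema; check GitHub's docs before relying on them:

```markdown
---
name: test-writer
description: Writes and updates unit tests for changed files
tools: ["read", "edit", "shell"]
---

You are a testing specialist. For each change, add or update unit tests,
following the repository's existing test framework and naming conventions.
```

An agent defined this way would then presumably be selectable via the /agent command mentioned above.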

https://github.blog/changelog/2025-10-28-new-public-preview-features-in-copilot-code-review-ai-reviews-that-see-the-full-picture/
Also introduced is an "agent-powered" code review, where Copilot, in combination with CodeQL, automatically finds and fixes security vulnerabilities. For teams, GitHub Code Quality is a new feature for analyzing code quality, reliability, and maintainability across the entire organization.

For VS Code, a new Plan Mode has been announced, which allows creating a step-by-step plan for task implementation before writing code. Finally, there is support for the AGENTS.md context definition standard.

Cursor 2.0
https://cursor.com/changelog/2-0
https://cursor.com/blog/composer
A significant update to one of the main AI coding tools. Cursor decided to answer Windsurf (which, by the way, updated its SWE model to SWE-1.5) and built its own model specialized for software development. They named it "Composer" and claim it is 4 times faster than models of similar intelligence, but I think this is mainly so they can pay less to external providers.

The main novelty is the ability to run up to eight agents simultaneously (Multi-Agents) and a new interface for managing these agents. Each operates in an isolated copy of the code, preventing conflicts. A voice mode for agent control has appeared.

https://www.youtube.com/watch?v=Q7NXyjIW88E

Browser and isolated terminal (sandbox) features have exited beta. Enterprise clients received extended security control, including isolated terminal settings and an audit log to track administrator actions.

https://news.ycombinator.com/item?id=45748725
Community reaction is mixed but very active, with a clear split between supporters and skeptics. Supporters say the overall experience ("flow") is unmatched because it keeps you focused and in the development flow, and they call Cursor the only AI agent that feels like a serious product rather than a prototype. The new Composer model is praised for its exceptional speed.

Some complain that requests hang or the program crashes, especially on Windows. Several commenters noted that reliability issues pushed them to Claude Code, which proved "faster and 100% reliable."

There is also skepticism about lack of transparency: the company is criticized for vague graphs without specific model names and for using an internal, closed benchmark (Cursor Bench) to evaluate performance. Many want to know exactly what model underpins Composer (whether it's a fine-tuned open model), but developers evade a direct answer.

ForrestKnight on AI Coding
A guide on how to effectively and professionally use AI for writing code, as experienced developers do.

https://www.youtube.com/watch?v=5fhcklZe-qE

For complex planning, use more powerful models, and for code generation, use faster and cheaper ones. Do not switch models unnecessarily within the same conversation.

AI can quickly analyze other people's code or libraries, explain architecture, and draw component interaction diagrams.

  1. Preparation. At the beginning of the work, use AI to analyze the entire project and build a context description for it. Create files with rules (global for all projects and specific to a particular one). Specify your technology stack there (e.g., TypeScript, PostgreSQL), standards, branch naming conventions, etc.
  2. Specificity. At the start of a new chat, point to the files that need changing and the code to pay attention to. Write in detail, for example: "Add a boolean field editable to the users table, expose it via the API, and on the frontend show the button only if this field is true." Attach logs and error screenshots.
  3. Manage. AI first creates a detailed step-by-step implementation plan. You review, correct, and only then give the command to generate code. You cannot blindly trust its choices.
  4. Edit. Analyze the generated code. It is necessary and possible to manually edit and refine it to a high quality. Ask why AI chose a particular solution and what the risks are.
  5. Team of Agents. You can launch one agent for writing code, a second for writing tests, and a third for reviewing the first agent's code.
  6. Git. You can give Git commands in natural language, such as "create a branch for the release and move the bug fixes there."
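A rules file like the one described in step 1 might look like this (a hypothetical sketch; the stack and conventions shown are examples, not a prescribed format):

```markdown
# Project rules
- Stack: TypeScript, Node 20, PostgreSQL 16
- Branch naming: feature/<ticket-id>-short-description
- All DB access goes through the repository layer; never inline SQL in handlers
- New endpoints require an integration test
- Prefer small, reviewable diffs; explain non-obvious choices in comments
```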

Kimi CLI
https://github.com/MoonshotAI/kimi-cli
https://www.kimi.com/coding/docs/kimi-cli.html
A new terminal coding agent from China's Moonshot AI. Written in Python. Currently in technical preview. Only the Kimi or Moonshot API platforms can be used as providers. https://www.kimi.com/coding/docs/ - there are pricing plans with musical names at 49 / 99 / 199 yuan per month.

Interestingly: similar to Warp, you can switch between the agent and a regular terminal. It supports ACP (the Agent Client Protocol), meaning it can work inside Zed (which, by the way, finally released a Windows version). But Kimi CLI itself does not support Windows, only macOS and Linux for now.

Cline CLI
https://docs.cline.bot/cline-cli/overview
https://cline.ghost.io/cline-cli-return-to-the-primitives/
Cline CLI Preview is presented as a fundamental "primitive" that runs the single agent loop of Cline Core (the same loop that powers the well-known VS Code extension). It is independent of model, platform, and runtime environment: basic infrastructure on which developers can build their own interfaces and automated processes.

Instead of developing complex mechanisms (state management, request routing, logging) from scratch, teams can use Cline as a ready-made foundation. Also currently only macOS and Linux.

Claude Code on the Web
https://www.anthropic.com/news/claude-code-on-the-web
A response to the popularity of Google Jules. The online service allows delegating several tasks to Claude Code in parallel from the browser. A new interface is also available as an early version in the mobile app for iOS. Currently in beta testing and available for Pro and Max plans.

Users can connect their GitHub repositories, describe tasks, after which the system will autonomously write code, tracking progress in real-time and automatically creating pull requests. Each task is executed in an isolated environment ("sandbox") to protect code and data.

https://www.youtube.com/watch?v=hmKRlgEdau4

Claude Haiku 4.5
https://www.anthropic.com/news/claude-haiku-4-5
The updated Haiku model, known for being fast and cheap, now matches the code generation performance of the previous-gen Sonnet 4, while being twice as fast (160-220 tokens/sec) and three times less expensive.

Most will use an architectural pattern: a smarter model (e.g., Sonnet 4.5) acts as an "orchestrator" that breaks a complex problem into smaller subtasks, and these subtasks are then executed in parallel by a "team" of several Haiku 4.5 instances.
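That orchestrator/worker split can be sketched in a few lines. The model calls below are stubbed with plain functions; in a real setup each would be an API call (the planner on a Sonnet-class model, the workers on Haiku 4.5), and the subtask strings are invented for illustration:

```typescript
// Orchestrator (smarter model): break the problem into subtasks.
async function plan(problem: string): Promise<string[]> {
  return [`analyze: ${problem}`, `implement: ${problem}`, `test: ${problem}`];
}

// Worker (fast, cheap model): solve a single subtask.
async function worker(subtask: string): Promise<string> {
  return `done(${subtask})`;
}

// Fan the subtasks out in parallel across the "team" of workers.
async function solve(problem: string): Promise<string[]> {
  const subtasks = await plan(problem);
  return Promise.all(subtasks.map(worker));
}

solve("add pagination to the users endpoint").then(results => {
  console.log(results.length); // 3 subtasks completed
});
```

The pattern trades a little planning latency for cheap, parallel execution of the bulk of the work.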

Haiku 4.5 appears to make code changes significantly more accurately compared to GPT-5 models.


Skills for Claude Models
https://www.anthropic.com/news/skills
https://simonwillison.net/2025/Oct/16/claude-skills/
Essentially, "Agent Skills" are a folder containing onboarding, instructions, resources, and executable code. This allows Claude to be trained for specialized tasks, such as working with internal APIs, or adhering to coding standards. Integrated into all Claude products, a new /v1/skills API endpoint has appeared for management. In Claude Code, they can be installed as plugins from the marketplace or manually by adding them to the ~/.claude/skills folder.
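As an illustration, a minimal skill folder might contain a SKILL.md like the one below. The name and description frontmatter fields follow Anthropic's published examples; everything else here (the API, paths, and helper script) is hypothetical:

```markdown
---
name: internal-billing-api
description: Use when the task involves Acme's internal billing API
---

# Internal billing API

Authenticate with the token from the ACME_TOKEN environment variable and page
through results 100 at a time. For one-off lookups, run the bundled
scripts/fetch_invoice.py helper.
```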

Simon Willison believes the new feature is a huge breakthrough, potentially more important than the MCP protocol. Unlike MCP, which is a complex protocol, a Skill is just a folder with a Markdown file containing instructions and optional scripts. This approach doesn't invent new standards but relies on the existing ability of LLM agents to read files and execute code, making it incredibly flexible and intuitive. Since they are simple files, they are easy to create and share.

https://www.youtube.com/watch?v=kHg1TfSNSFI

Compared to MCP, Skills have a key advantage in token efficiency: instead of loading thousands of tokens to describe tools, the model reads only a brief description of the skill, and loads the full instructions only when needed.
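The idea is easy to picture in code. This sketch (with invented skill names and one-line stand-ins for the real instruction files) keeps only the short descriptions resident and loads a skill's full body on demand:

```typescript
type Skill = { name: string; description: string; load: () => string };

const skills: Skill[] = [
  {
    name: "internal-api",
    description: "How to call the internal REST API", // a few tokens, always in context
    load: () => "Full multi-page instructions, loaded only when needed...",
  },
  {
    name: "code-style",
    description: "House TypeScript style rules",
    load: () => "Complete style guide text...",
  },
];

// The model initially sees only this compact index.
const index = skills.map(s => `${s.name}: ${s.description}`).join("\n");

// Only when a task matches a skill is its full body pulled into context.
function expand(task: string): string[] {
  return skills.filter(s => task.includes(s.name)).map(s => s.load());
}

console.log(index.split("\n").length); // 2 skills indexed
console.log(expand("use the internal-api skill").length); // 1 full body loaded
```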

https://news.ycombinator.com/item?id=45607117
https://news.ycombinator.com/item?id=45619537
Many commentators note that Skills are essentially just a way to dynamically add instructions to the model's context when needed. Their proponents say that this simplicity is precisely the genius. Skills represent a new paradigm for organizing and dynamically assembling context. Everyone generally agrees that this is a more successful and lightweight alternative to MCP, which saves us from context overload and consuming thousands of tokens.

Users have noticed that Skills are essentially a formalization of an existing AGENTS.md (or CLAUDE.md) pattern, where instructions for an agent are collected in one file, telling it where to look when something is needed. But Skills make this process more standardized, organized, and scalable. The LLM knows the standard and can help in generating a Skill.

Evaluating AI Assistants
https://www.youtube.com/watch?v=tCGju2JB5Fw

Three developers (Wes, Scott, and CJ) discuss and rank various tools, sharing their own experiences, evaluating interface usability, the quality of generated code, and the unique capabilities of each tool.

Services such as Replit and Lovable received specific criticism for their aggressive and sometimes opaque marketing strategy involving influencers. For serious development, CLI tools or IDEs are more suitable, while browser-based solutions are ideal for quick experiments.

Ultimately, Claude Code, OpenCode, and ChatGPT received S-tier. Claude Code is praised for strictly following instructions and planning its work, OpenCode for its openness and support for custom API keys, and ChatGPT remains indispensable for quick queries that don't need whole-project context. Most other tools were rated average: useful, but without unique advantages.


Vibe Coding Ranking
https://www.youtube.com/watch?v=ebacH8tdXug

This video is a humorous response to the previous one, and the author immediately warns that his ranking should not be taken seriously. Theo examines tools not by technical capabilities but by so-called "vibe coding." The main priority is how much it allows you to create something without looking at the code or understanding technical details.

The author jokes that "true vibe coders" avoid seeing code. Cursor, VS Code Copilot, OpenCode, and Codex therefore receive the lowest rating: they are assistants for real developers, demanding active participation, writing, and code review. They destroy the "vibe."

The highest rating went to the platform that abstracts away from code the most, and that is v0 from Vercel: it has a simple interface, replaces technical terms (e.g., "fork" becomes "duplicate"), and offers powerful integrations that can be configured in a few clicks without any knowledge of APIs.

Surprisingly, Claude Code received an A-tier for its ability to perform tasks autonomously, hiding the technical implementation from the user.

Almost all modern AI coding tools have added the Claude Sonnet 4.5 model.

Cursor 1.7
https://cursor.com/changelog/1-7
Responding to Kiro and GitHub Spec Kit, Cursor has redesigned its Planning mode; it now creates descriptions, plans, and task lists before starting work.

https://www.youtube.com/watch?v=WInPBmCK3l4

The terminal finally runs commands in a separate sandbox, and Windows PowerShell interaction has been fixed. The agent can also open a Browser and take screenshots, and has learned to read images from disk. The OS taskbar now shows a list of agents and what they are doing.

Kiro v0.3.0
https://kiro.dev/changelog/spec-mvp-tasks-intelligent-diagnostics-and-ai-commit-messages/
Kiro has finally replaced the separate limits for its two modes with unified credits that count everywhere; it now works like Windsurf. Sonnet 4.5 has been added, though strangely at the same coefficient as Sonnet 4; only Auto mode costs 1 credit. They still haven't made it possible to drag files or folders into the chat area to reference them as context - only via hashtag.

Codex Github Action
https://github.com/openai/codex-action
OpenAI announced at DevDay 2025 that Codex has exited beta and is now stable, with enhanced capabilities. There is a Codex GitHub Action, a built-in widget gallery, and MCP support. A Codex SDK is available for integration.

OpenAI is also transforming ChatGPT into an "operating system" for AI agents. You can now write your own applications and agents inside ChatGPT, connect payments, authorization, and metrics.

Gemini CLI Extensions
https://blog.google/technology/developers/gemini-cli-extensions/
Google has launched a separate website for Gemini CLI https://geminicli.com/ with a documentation section. Extensions for Gemini CLI are a new feature that allows developers to customize and connect various programs, integrating services like Dynatrace, Figma, Stripe, Snyk, and others.

The system is open, allowing anyone to create their own extensions, and Google has already released a set for integration with its products (Google Cloud, Firebase, Flutter).

Jules CLI
https://jules.google/docs/changelog/
The Jules cloud agent has a rather imperfect web interface from which it is unclear what the agent is currently doing, but it gives as many as 15 tasks a day without paying for tokens. You can now install it locally with npm install -g @google/jules; run jules help for the full command list. Windows is not supported.

The CLI allows you to create tasks, view active sessions (jules remote list), and monitor from the terminal in a convenient visual format. It supports scripting by combining with utilities such as gh, jq, or cat.

There is an option to take code from an active Jules session and apply it to a local machine for immediate testing of changes without waiting for a commit to GitHub.

ALSO

  • From September 30, 2025, Jules can learn from interaction: save settings, prompts, and corrections.
  • From September 29, 2025, you can precisely specify to Jules which files to work with for any task.
  • From September 23, 2025, Jules can read and respond to comments in pull requests.

Code Mode
https://blog.cloudflare.com/code-mode/
A new approach called "Code Mode" improves AI's interaction with external tools. Instead of forcing Large Language Models (LLMs) to directly "call tools" via the MCP protocol, which is unnatural for them, it's proposed to ask them to write TypeScript code that accesses these tools via an API.

The system automatically converts tools available via the MCP protocol into a clear TypeScript API with documentation. The AI-generated code is executed in a secure isolated "sandbox." The MCP protocol itself remains important, as it provides a standardized way to connect to services, obtain their descriptions, and securely authorize, allowing the system to manage access without directly involving the AI.

This method is much more effective because LLMs are trained on vast arrays of real code and are better at writing it than using specialized, artificially created commands.

The technological basis for this "sandbox" is the Cloudflare Workers platform, which uses lightweight and extremely fast V8 isolates instead of slow containers. This ensures high efficiency and security: the code is completely isolated from the internet and can only interact with permitted tools.
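A minimal sketch of the idea, with a hand-written stub standing in for the TypeScript API that the system would actually generate from MCP tool schemas (the method names and the triage task are invented for illustration):

```typescript
// What the generator would produce from an MCP server's tool descriptions:
interface GeneratedApi {
  searchIssues(query: string): Promise<{ id: number; title: string }[]>;
  addComment(issueId: number, body: string): Promise<void>;
}

// Sandbox host side: the binding forwards calls to real MCP servers.
// Stubbed here so the sketch is self-contained.
const api: GeneratedApi = {
  async searchIssues(query) {
    return [{ id: 42, title: `stub result for "${query}"` }];
  },
  async addComment() {
    /* would forward the comment over MCP */
  },
};

// This is the kind of code the LLM writes: plain control flow, loops, and
// intermediate variables - awkward to express as individual tool calls.
async function triage(): Promise<number> {
  const issues = await api.searchIssues("label:bug state:open");
  for (const issue of issues) {
    await api.addComment(issue.id, "Triaged automatically.");
  }
  return issues.length;
}

triage().then(n => console.log(n)); // number of issues commented
```

Because the sandbox only exposes the generated API, the code can chain many tool interactions locally without round-tripping every step through the model.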

Claude Code 2.0.0
https://www.npmjs.com/package/@anthropic-ai/claude-code
Redesigned UI. Ctrl-R to search history. Tab to switch thinking modes (persists between sessions). /usage command to view plan limits. /rewind command to revert code changes - native checkpoints finally.

Added a native VS Code extension. Claude Code SDK is now called Claude Agent SDK and allows others to create their own intelligent "agents" based on this technology.

Claude Sonnet 4.5
https://www.anthropic.com/news/claude-sonnet-4-5
They claim it's the best model for programming today. Safety is specifically emphasized: the model has become "more obedient" and less prone to errors or harmful actions. Importantly, this new, more powerful version is already available to everyone at the same price as the previous one.

AI Programming University
https://cline.bot/learn
Cline also offers a resource for beginners in this field. The course consists of two main modules that sequentially introduce key concepts.

The first module, "Prompting", focuses on communicating with AI assistants: writing clear, specific prompts, refining them step by step, and using system prompts to configure coding style, security, and workflows.

The second module, "LLM Fundamentals", delves into the technical aspects of AI. It explains what Large Language Models (LLMs) are, how they process and generate code. It teaches how to evaluate and select the best model for specific tasks, considering criteria such as speed, cost, accuracy, and privacy.

AI Foundations
https://www.youtube.com/playlist?list=PLuI2ZfvGpzwCEXrl_K6bW5OqNpZq3HkMa
After further simplifying their website design, Cursor has now published a series of "for beginners" videos on their YouTube channel called "AI Foundations" – seems they are looking for new clients.

Topics of each video:

  1. "How AI Models Work": This video explains the basic principles of how artificial intelligence models function, for example, how they process information and generate responses.
  2. "Hallucinations": This section discusses the phenomenon of "hallucinations" in AI — when a model generates confident but untrue or fabricated information that is not based on the input data.
  3. "Tokens & Pricing": The video is dedicated to tokens — units into which text is broken down for processing by the model. It also explains how the number of tokens affects the cost of using AI models via API.
  4. "Context": It talks about the importance of context for AI models. This likely refers to the "context window" — the amount of information (previous messages) that a model can remember during a conversation to provide relevant responses.
  5. "Tool Calling": This video explains how modern AI models can interact with external tools and APIs (e.g., search engines, calculators, databases) to perform more complex tasks.
  6. "Agents": The last video reveals the concept of AI agents — autonomous systems that can independently plan and execute a sequence of actions to achieve a set goal, using various tools.

GitHub Copilot CLI
https://github.com/github/copilot-cli
https://github.blog/changelog/2025-09-25-github-copilot-cli-is-now-in-public-preview/
GitHub Copilot can now also be used from the terminal. The CLI is currently in Public Preview. Supports Linux, macOS, and Windows experimentally (via PowerShell v6+).

By default, it uses the Claude Sonnet 4 model, but also supports GPT-5 via the environment variable COPILOT_MODEL=gpt-5; there is no slash command to switch the model. AI response blocks can be expanded using Ctrl+r.

It uses the GitHub MCP server by default. Provides access to repositories, issues, and pull requests using agent queries.

https://www.youtube.com/watch?v=7tjmA_0pl2c

GitHub Spec Kit and Copilot CLI can be used together. The video shows how to create custom slash commands using additional instruction files.

llmswap
https://github.com/sreenathmmenon/llmswap
A universal tool (SDK and CLI) for developers, allowing easy interaction with various AI providers (OpenAI/GPT-4o, Claude, Gemini, Groq, etc.) directly from the terminal.

This is a Python project; install it with pip install llmswap.

Directly from the terminal you can:

  • Generate code and commands: llmswap generate "sort files by size" outputs du -sh * | sort -hr.
  • Create functions and entire scripts: llmswap generate "Python function to read JSON" generates ready-to-use code with error handling.
  • Perform code reviews, debug errors, and analyze logs.

Use cases and examples
https://sreenathmenon.com/blog/2025-09-04-stopped-alt-tabbing-chatgpt-while-coding/
The author emphasizes the main advantage: llmswap brings AI directly into the terminal. Instead of constantly switching between a code editor, a browser with ChatGPT, a search engine, and other assistants, you get answers instantly without losing focus.

Killer feature: integration with Vim (and other editors). Vim's :r !command reads the output of any console command into the current buffer. llmswap makes this incredibly useful: it can insert generated code snippets right where the cursor is.
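For example, from within Vim (the prompt here is illustrative):

```
:r !llmswap generate "Python function to read JSON"
```

This runs the command and inserts its output below the cursor line, so the generated function lands directly in the file you are editing.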

The tool can be part of the workflow for:

  • DevOps: Generating docker-compose.yml, Kubernetes configurations, systemd services.
  • Database Administration: Creating complex SQL queries (find duplicate emails in table), commands for MongoDB.
  • Log Analysis: Creating commands for awk, grep, zgrep to analyze regular and archived logs.
  • and much more.