CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

Why LLMs can't really build software
https://news.ycombinator.com/item?id=44900116
More than 500 comments. The central idea: LLMs have no abstract "mental model" of what they are building — they only work with text. They do not "understand" code, but merely mimic writing it. Many commentators emphasize that the most valuable part of their work happens before writing code: 95% of it is identifying non-obvious dependencies, hidden requirements, or potential problems at the intersection of business logic and technology.

Participants in the discussion agree that LLMs can be useful, but only as a tool in the hands of an experienced specialist, because the main responsibility and control always remain with a human. Unlike traditional tools, LLMs are non-deterministic, which makes them unreliable for complex tasks. Often, fixing errors in such projects takes more time than writing the code manually.


AI Coding Sucks
https://www.youtube.com/watch?v=0ZUkQF6boNg
An already fairly well-known video in which the developer behind Coding Garden vents his frustration at what programming has become. As a result, he decided to take a month-long break from all AI tools to rediscover the joy of his work.

The key reason for his dissatisfaction lies in the fundamental difference between programming and working with AI. Programming is a logical, predictable, and knowable system where the same actions always lead to the same result. AI, on the other hand, is unpredictable.

He used to enjoy programming for the sense of achievement after solving a complex problem or fixing a bug, that "look how capable I am" feeling. Now his work has turned into a constant argument with large language models (LLMs), which often generate something other than what is needed.

The same query to the model can give a different answer each time. This lack of stability makes it impossible to build reliable workflows and contradicts the very nature of programming. It drains the joy from the process, replacing it with irritation.

He lists the numerous advanced methods he tried in order to make AI more manageable: creating detailed instruction files, step-by-step task planning, using agents, and forcing the AI to write tests for self-verification. But the models still ignore rules, bypass problems (e.g., by deleting failing tests), and do not deliver reliable results.

In the end, the author pushes back on the idea that developers who don't use AI are "falling behind": these tools can be learned quickly, while fundamental skills matter more and are acquired slowly, through experience.

He advises beginners to learn programming without AI.

Comments under the video show agreement with the author. Many developers felt relief at seeing that their frustration is a widespread phenomenon, not a personal problem. They compare working with AI to managing an overconfident but incompetent junior: such an "assistant" insists it has understood everything, but in practice it doesn't listen and does whatever it wants. Corrected errors reappear, the tool ignores the rules it was given, and its code cannot be relied upon.

Many commentators worry that beginners who rely on AI will never learn to program properly. They compare it to mindlessly copying code from Stack Overflow, only on a far larger scale. Beginners do not develop fundamental problem-solving skills, which in the long run makes them weaker specialists.

DPAI Arena
https://dpaia.dev/ https://github.com/dpaia
JetBrains has introduced Developer Productivity AI Arena (DPAI Arena) — another "first" open platform that evaluates the effectiveness of AI agents in code generation. To ensure neutrality and independence, JetBrains plans to transfer the project under the management of the Linux Foundation.

The company believes that existing testing methods are outdated and only evaluate language models, not full-fledged AI agents (although https://www.swebench.com/ exists). The platform aims to create a unified, trusted ecosystem for the entire industry. Currently, the site only features tests for a few CLIs, with Codex outperforming Claude Code.

A key feature of DPAI Arena is its "multi-track" architecture, which simulates real-world developer tasks. Instead of a single bug-fixing test, the platform includes separate tracks for analyzing pull requests, writing unit tests, updating dependencies, and checking compliance with coding standards.

Athas Code Editor
https://athas.dev/
Since the end of May 2025, a lightweight, free, and open-source code editor has been under development. It is not a VS Code fork, but a new project built "from scratch" using Tauri, targeting all three platforms (Windows, Linux, macOS) simultaneously, unlike Zed ;)

Currently at an early stage, but if everything goes according to the roadmap, it could turn out to be very interesting. The idea is "Vim-first, AI-enhanced, Git-integrated". Git integration is already implemented; Vim mode will follow. It aims to be 100% customizable, with support for themes, language servers, and plugins.

An interview with the project's author, Mehmed Ozgul, a 23-year-old developer from Turkey:
https://www.youtube.com/watch?v=Aq-VW3Ugtpo

The main goal is to create a unified, minimalist, and fast environment for developers, integrating tools that would normally require running several separate applications. Basic Git functionality and a viewer for SQLite database content are already implemented.

Athas does not just have its own AI chat; it integrates with existing CLIs, such as claude-code, meaning it "intercepts" the AI assistant call from the built-in terminal and displays the response in a convenient graphical interface. This allows using familiar tools directly within the editor without opening a separate terminal.

https://github.com/athasdev/athas/blob/master/CONTRIBUTING.md
You can join the project via GitHub and influence its future.

Cerebras GLM 4.6
https://inference-docs.cerebras.ai/support/change-log
Cerebras announced the replacement of the Qwen3 Coder 480B model with the new GLM 4.6, which also applies to the Cerebras Code subscription ($50 or $200/month). The model is suitable for fast UI iterations and refactoring.

  • GLM 4.6 runs at 1000 tokens/second - fast, but still roughly half the speed of Qwen3 Coder
  • Code quality approaches Claude Sonnet 4.5, making it competitive, but it is easily confused by complex tasks
  • Fewer tool-call errors than Qwen3, but it sometimes switches to Chinese or cuts off mid-response

https://news.ycombinator.com/item?id=45852751
The discussion concluded that the replacement makes sense for Cerebras (GLM 4.6 is an open model with a clear roadmap), but for users, it's a sidestep rather than a step forward. Qwen3 was a better choice for many tasks.

Claude Code Resources
https://github.com/jmckinley/claude-code-resources
In this repository, jmckinley has collected various guides on how to provide better context to Claude Code.

From his perspective, what truly matters:

  • CLAUDE.md - AI context for your project (most important!)
  • Context management - Keep conversations focused (stay below ~80% of the context window)
  • Planning is paramount - Think before generating code
  • Git safety - Feature branches + checkpoints

There are examples of agent configurations: tests, security, and code review.
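For illustration, here is a minimal sketch of what a project's CLAUDE.md might contain; the project name, stack, commands, and conventions below are hypothetical, not taken from the repository:

```markdown
# CLAUDE.md (hypothetical example)

## Project
acme-shop: a TypeScript/Node.js 20 service with a PostgreSQL 16 database and a React frontend.

## Commands
- `npm run dev` - start the dev server
- `npm test` - run unit tests (run before every commit)

## Conventions
- Feature branches: `feature/<ticket-id>-short-name`
- Never edit files under `src/generated/` - they are produced by codegen
- Prefer explicit types over `any`; keep functions small
```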

MiniMax M2 and Agent
https://www.minimax.io/news/minimax-m2
MiniMax introduced its new M2 model and a product built on it, MiniMax Agent. The model is designed specifically for coding agents: it can plan steps and use tools (browser, code interpreter, etc.). It has 229 billion parameters (10 billion active) and a 200K-token context window.

The main idea is to find a balance between high performance, low price, and high speed. The model is fully open source.

https://www.youtube.com/watch?v=dHg6VrDjuMQ

Beyond the official information, practical tests and reviews confirm that MiniMax M2 is an extremely capable model, one of the best open-source models for programming to date. It handled building an operating-system simulation with working applications such as Paint and a terminal, and it generated creative websites with unique styles and interactive elements.

At the same time, M2 showed ethical guardrails, refusing to build a site on a fraudulent topic, and it failed on an overly complex task (a PC-assembly simulator), which indicates its current limits.

https://agent.minimax.io/
The online MiniMax Agent has two modes: Lightning Mode for quick, simple tasks (answering questions, light coding) and Pro Mode for complex, long-running tasks (deep research, software development, report creation). You can only log in via Google. There is Supabase integration and an MCP catalog, plus iOS and Android apps.

Pro Mode is temporarily free, and the API is also temporarily free (until November 7). I did not find anything on the website about code privacy control.

GitHub Universe 25
https://github.com/events/universe/recap
https://github.blog/news-insights/company-news/welcome-home-agents/
GitHub announced Agent HQ - a future open platform that will let developers manage, track, and customize AI agents from any vendor (OpenAI, Google, Anthropic, and others) in one place. Mission Control is a unified interface across GitHub, Mobile, the CLI, and VS Code for managing agent operations.

GitHub Copilot received deeper workflow integrations. It can now be assigned tasks from Slack, Microsoft Teams, and other tools, and it will use the discussion context to perform the work.

https://github.blog/changelog/2025-10-28-custom-agents-for-github-copilot/
https://github.blog/changelog/2025-10-28-github-copilot-cli-use-custom-agents-and-delegate-to-copilot-coding-agent/
Custom agents can be defined using a Markdown configuration file in the .github/agents folder of your repository. These allow you to define agent "personas" by specifying instructions, tool selections, and Model Context Protocol (MCP) servers. Configured agents can be invoked from the Copilot CLI using the /agent command.
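As a rough illustration only (the file name, frontmatter fields, and persona below are guesses, not taken from GitHub's documentation), such an agent definition might look something like this:

```markdown
<!-- .github/agents/test-writer.md (hypothetical example) -->
---
name: test-writer
description: Writes and maintains unit tests for this repository
tools: ["read", "edit", "terminal"]
---
You are a testing specialist. When given a change, add or update the unit tests,
follow the existing test naming conventions, and never modify production code.
```

Presumably it could then be invoked from the Copilot CLI with /agent test-writer (the agent name here is made up).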

https://github.blog/changelog/2025-10-28-new-public-preview-features-in-copilot-code-review-ai-reviews-that-see-the-full-picture/
Also introduced is an "agent-powered" code review, where Copilot, in combination with CodeQL, automatically finds and fixes security vulnerabilities. For teams, GitHub Code Quality is a new feature for analyzing code quality, reliability, and maintainability across the entire organization.

For VS Code, a new Plan Mode has been announced, which allows creating a step-by-step plan for task implementation before writing code. Finally, there is support for the AGENTS.md context definition standard.

Cursor 2.0
https://cursor.com/changelog/2-0
https://cursor.com/blog/composer
A significant update to one of the main AI coding tools. Cursor decided to respond to Windsurf (which, by the way, also updated its SWE model to 1.5) and likewise built its own model specifically for software development. They named it "Composer" and claim it is 4 times faster than models of similar intelligence, but I think this is mainly about paying less to external providers.

The main novelty is the ability to run up to eight agents simultaneously (Multi-Agents) and a new interface for managing these agents. Each operates in an isolated copy of the code, preventing conflicts. A voice mode for agent control has appeared.

https://www.youtube.com/watch?v=Q7NXyjIW88E

Browser and isolated terminal (sandbox) features have exited beta. Enterprise clients received extended security control, including isolated terminal settings and an audit log to track administrator actions.

https://news.ycombinator.com/item?id=45748725
Community reaction is mixed but very active, with a clear split between supporters and skeptics. Supporters emphasize that the overall experience is unparalleled, letting them stay focused and in the development flow, and call Cursor the only AI agent that feels like a serious product rather than a prototype. The new Composer model is praised for its exceptional speed.

Some complain that requests "hang" or the program crashes, especially on Windows. Several commentators noted that due to reliability issues, they switched to Claude Code, which proved to be "faster and 100% reliable."

There is also skepticism about the lack of transparency: the company is criticized for vague charts without specific model names and for using an internal, closed benchmark (Cursor Bench) to evaluate performance. Many want to know exactly which model underpins Composer (whether it is a fine-tuned open model), but the developers evade a direct answer.

ForrestKnight on AI Coding
A guide on how to effectively and professionally use AI for writing code, as experienced developers do.

https://www.youtube.com/watch?v=5fhcklZe-qE

For complex planning, use more powerful models, and for code generation, use faster and cheaper ones. Do not switch models unnecessarily within the same conversation.

AI can quickly analyze other people's code or libraries, explain architecture, and draw component interaction diagrams.

  1. Preparation. At the start of the work, use AI to analyze the entire project and build a context description for it. Create rules files (global for all projects and specific to a particular one) specifying your technology stack (e.g., TypeScript, PostgreSQL), standards, branch naming conventions, and so on.
  2. Specificity. At the start of a new chat, indicate which files need to be changed and which code to pay attention to. Write in detail, for example: "Add a boolean field editable to the users table, expose it via the API, and on the frontend show the button only if this field is true." Attach logs and error screenshots.
  3. Manage. Have the AI first create a detailed step-by-step implementation plan. Review and correct it, and only then give the command to generate code. You cannot blindly trust its choices.
  4. Edit. Analyze the generated code; it can and should be manually edited and refined to a high standard. Ask why the AI chose a particular solution and what the risks are.
  5. Team of Agents. You can launch one agent to write code, a second to write tests, and a third to review the first agent's code.
  6. Git. You can give Git commands in natural language, such as "create a branch for the release and move the bug fixes there."

Kimi CLI
https://github.com/MoonshotAI/kimi-cli
https://www.kimi.com/coding/docs/kimi-cli.html
A new terminal coding agent from China's Moonshot AI. Written in Python. Currently in technical preview. Only the Kimi or Moonshot API platforms can be used as providers. https://www.kimi.com/coding/docs/ - there are subscription plans with musical names at 49 / 99 / 199 yuan per month.

Interestingly, similar to Warp, you can switch between the agent and a regular terminal. It supports ACP (Agent Client Protocol), meaning it can work inside Zed (which, by the way, finally released a Windows version). But Kimi CLI itself does not yet support Windows, only macOS and Linux for now.

Cline CLI
https://docs.cline.bot/cline-cli/overview
https://cline.ghost.io/cline-cli-return-to-the-primitives/
Cline CLI Preview is presented as a fundamental "primitive" that runs on the same single agent loop as Cline Core (the engine behind the well-known extension). It is independent of model, platform, or runtime environment. This is basic infrastructure upon which developers can build their own interfaces and automated processes.

Instead of developing complex mechanisms (state management, request routing, logging) from scratch, teams can use Cline as a ready-made foundation. It, too, currently supports only macOS and Linux.

Claude Code on the Web
https://www.anthropic.com/news/claude-code-on-the-web
A response to the popularity of Google Jules. The online service allows delegating several tasks to Claude Code in parallel from the browser. A new interface is also available as an early version in the mobile app for iOS. Currently in beta testing and available for Pro and Max plans.

Users can connect their GitHub repositories, describe tasks, after which the system will autonomously write code, tracking progress in real-time and automatically creating pull requests. Each task is executed in an isolated environment ("sandbox") to protect code and data.

https://www.youtube.com/watch?v=hmKRlgEdau4

Claude Haiku 4.5
https://www.anthropic.com/news/claude-haiku-4-5
The updated Haiku model, known for being fast and cheap, now matches the code-generation performance of the previous-generation Sonnet 4 while being twice as fast (160-220 tokens/sec) and a third of the price.

Most will use it in an architectural pattern: a smarter model (e.g., Sonnet 4.5) acts as an "orchestrator" that breaks a complex problem into smaller subtasks, which are then executed in parallel by a "team" of several Haiku 4.5 instances.
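A minimal sketch of that orchestrator pattern using the Anthropic TypeScript SDK; the model IDs, prompt wording, and way of splitting subtasks are assumptions for illustration, not an official recipe:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Model IDs are assumptions - check the current model list before use.
const ORCHESTRATOR = "claude-sonnet-4-5";
const WORKER = "claude-haiku-4-5";

// Send one prompt to one model and return the text of the reply.
async function ask(model: string, prompt: string): Promise<string> {
  const msg = await client.messages.create({
    model,
    max_tokens: 2048,
    messages: [{ role: "user", content: prompt }],
  });
  return msg.content
    .map((block) => (block.type === "text" ? block.text : ""))
    .join("");
}

// The orchestrator splits the task, a "team" of Haiku instances works in
// parallel, then the orchestrator merges the partial results.
async function solve(task: string): Promise<string> {
  const plan = await ask(
    ORCHESTRATOR,
    `Split this task into 3-5 independent subtasks, one per line:\n${task}`
  );
  const subtasks = plan.split("\n").filter((line) => line.trim().length > 0);

  const results = await Promise.all(
    subtasks.map((sub) => ask(WORKER, `Complete this subtask:\n${sub}`))
  );

  return ask(
    ORCHESTRATOR,
    `Combine these partial results into one coherent answer:\n${results.join("\n---\n")}`
  );
}
```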

Haiku 4.5 appears to make code changes significantly more accurately compared to GPT-5 models.


Skills for Claude Models
https://www.anthropic.com/news/skills
https://simonwillison.net/2025/Oct/16/claude-skills/
Essentially, "Agent Skills" are a folder containing onboarding, instructions, resources, and executable code. This allows Claude to be trained for specialized tasks, such as working with internal APIs, or adhering to coding standards. Integrated into all Claude products, a new /v1/skills API endpoint has appeared for management. In Claude Code, they can be installed as plugins from the marketplace or manually by adding them to the ~/.claude/skills folder.

Simon Willison believes the new feature is a huge breakthrough, potentially more important than the MCP protocol. Unlike MCP, which is a complex protocol, a Skill is just a folder with a Markdown file containing instructions and optional scripts. This approach doesn't invent new standards but relies on the existing ability of LLM agents to read files and execute code, making it incredibly flexible and intuitive. Since they are simple files, they are easy to create and share.

https://www.youtube.com/watch?v=kHg1TfSNSFI

Compared to MCP, Skills have a key advantage in token efficiency: instead of loading thousands of tokens to describe tools, the model reads only a brief description of the skill, and loads the full instructions only when needed.
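For illustration, a guess at what a minimal skill might contain; the skill name, frontmatter fields, and instructions are hypothetical, and only the general shape (a folder with a Markdown file plus optional scripts, discovered via a short description) comes from the announcement:

```markdown
<!-- ~/.claude/skills/db-migrations/SKILL.md (hypothetical example) -->
---
name: db-migrations
description: How to create, review, and roll back database migrations in this project
---
# Database migrations

1. Create a migration with `npm run migrate:new <name>` (the script lives in scripts/migrate.js).
2. Every migration must ship with a matching rollback.
3. Before approving changes, run through checklist.md in this folder.
```

Only the short description in the frontmatter would sit in the context by default; the body and any bundled scripts are read only when the skill is actually used.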

https://news.ycombinator.com/item?id=45607117
https://news.ycombinator.com/item?id=45619537
Many commentators note that Skills are essentially just a way to dynamically add instructions to the model's context when needed. Their proponents say that this simplicity is precisely the genius. Skills represent a new paradigm for organizing and dynamically assembling context. Everyone generally agrees that this is a more successful and lightweight alternative to MCP, which saves us from context overload and consuming thousands of tokens.

Users have noticed that Skills are essentially a formalization of an existing AGENTS.md (or CLAUDE.md) pattern, where instructions for an agent are collected in one file, telling it where to look when something is needed. But Skills make this process more standardized, organized, and scalable. The LLM knows the standard and can help in generating a Skill.

Evaluating AI Assistants
https://www.youtube.com/watch?v=tCGju2JB5Fw

Three developers (Wes, Scott, and CJ) discuss and rank various tools, sharing their own experiences, evaluating interface usability, the quality of generated code, and the unique capabilities of each tool.

Services such as Replit and Lovable received specific criticism for their aggressive and sometimes opaque marketing strategy involving influencers. For serious development, CLI tools or IDEs are more suitable, while browser-based solutions are ideal for quick experiments.

Ultimately, Claude Code, Open Code, and ChatGPT received S-tier. Claude Code is praised for its ability to strictly follow instructions and plan work, Open Code — for its openness and the ability to use custom API keys, and ChatGPT remains indispensable for quick queries without the context of the entire project. Most other tools were rated as average — they are useful but do not offer unique advantages.


Vibe Coding Ranking
https://www.youtube.com/watch?v=ebacH8tdXug

This video is a humorous response to the previous one, and the author immediately warns that his ranking should not be taken seriously. Theo examines tools not by technical capabilities but by so-called "vibe coding." The main priority is how much it allows you to create something without looking at the code or understanding technical details.

The author jokes that "true vibe coders" avoid seeing code. Therefore, Cursor, VS Code Copilot, Open Code, and Codex receive the lowest rating because they are assistants for real developers who require active participation, writing, and reviewing code. They destroy the "vibe."

The highest rating went to the platform that abstracts away code the most: V0 from Vercel. It has a simple interface, replaces technical terms (e.g., "fork" becomes "duplicate"), and offers powerful integrations that can be configured in a few clicks without any knowledge of APIs.

Surprisingly, Claude Code received an A-tier for its ability to perform tasks autonomously, hiding the technical implementation from the user.

Almost all modern AI coding tools have added the Claude Sonnet 4.5 model.

Cursor 1.7
https://cursor.com/changelog/1-7
Responding to Kiro and GitHub Spec Kit, Cursor has redesigned its Planning mode; it now creates descriptions, plans, and task lists before starting work.

https://www.youtube.com/watch?v=WInPBmCK3l4

The terminal finally runs commands in a separate sandbox, and Windows PowerShell interaction has been fixed. The agent can also open a Browser and take screenshots, and has learned to read images from disk. The OS taskbar now shows a list of agents and what they are doing.

Kiro v0.3.0
https://kiro.dev/changelog/spec-mvp-tasks-intelligent-diagnostics-and-ai-commit-messages/
Kiro has finally replaced separate limits for its two modes with unified credits that count everywhere, so it now works like Windsurf. Sonnet 4.5 has been added, though oddly its credit multiplier is the same as Sonnet 4; only Auto mode costs 1 credit. You still cannot drag files or folders into the chat area to reference them as context, only add them via hashtag.

Codex Github Action
https://github.com/openai/codex-action
OpenAI announced at DevDay 2025 that Codex has exited beta and is now stable, with enhanced capabilities. There is a Codex GitHub Action, a built-in widget gallery, and MCP support. A Codex SDK is available for integration.

OpenAI is also transforming ChatGPT into an "operating system" for AI agents. You can now write your own applications and agents inside ChatGPT, connect payments, authorization, and metrics.

Gemini CLI Extensions
https://blog.google/technology/developers/gemini-cli-extensions/
Google has launched a separate website for Gemini CLI https://geminicli.com/ with a documentation section. Extensions for Gemini CLI are a new feature that allows developers to customize and connect various programs, integrating services like Dynatrace, Figma, Stripe, Snyk, and others.

The system is open, allowing anyone to create their own extensions, and Google has already released a set for integration with its products (Google Cloud, Firebase, Flutter).

Jules CLI
https://jules.google/docs/changelog/
The Jules cloud agent has a rather rough web interface from which it is unclear what it is currently doing - but it allows as many as 15 tasks a day without paying for tokens. Now you can install it locally with npm install -g @google/jules; run jules help to see all commands. Windows is not supported.

The CLI lets you create tasks, view active sessions (jules remote list), and monitor them from the terminal in a convenient visual format. It supports scripting when combined with utilities such as gh, jq, or cat.

There is an option to take code from an active Jules session and apply it to a local machine for immediate testing of changes without waiting for a commit to GitHub.

ALSO

  • From September 30, 2025, Jules can learn from interaction: save settings, prompts, and corrections.
  • From September 29, 2025, you can precisely specify to Jules which files to work with for any task.
  • From September 23, 2025, Jules can read and respond to comments in pull requests.

Code Mode
https://blog.cloudflare.com/code-mode/
A new approach called "Code Mode" improves AI's interaction with external tools. Instead of forcing large language models (LLMs) to "call tools" directly via the MCP protocol, which is unnatural for them, the proposal is to have them write TypeScript code that accesses these tools via an API.

The system automatically converts tools available via the MCP protocol into a clear TypeScript API with documentation. The AI-generated code is executed in a secure isolated "sandbox." The MCP protocol itself remains important, as it provides a standardized way to connect to services, obtain their descriptions, and securely authorize, allowing the system to manage access without directly involving the AI.

This method is much more effective because LLMs are trained on vast arrays of real code and are better at writing it than using specialized, artificially created commands.
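To make the idea concrete, here is a rough sketch of what such model-written TypeScript might look like, assuming the MCP tools were exposed as a typed API; the module path, function names, and data shapes are invented for illustration:

```typescript
// Hypothetical API generated from MCP tool descriptions (all names invented).
import { github, slack } from "./codemode-api";

export async function reportOpenBugs(repo: string): Promise<void> {
  // Each "tool call" becomes an ordinary typed function call.
  const issues = await github.listIssues({ repo, state: "open", label: "bug" });

  // Intermediate data stays inside the sandbox instead of passing
  // through the model's context window at every step.
  const summary = issues
    .map((issue) => `#${issue.number} ${issue.title}`)
    .join("\n");

  await slack.postMessage({
    channel: "#bugs",
    text: `Open bugs in ${repo}:\n${summary}`,
  });
}
```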

The technological basis for this "sandbox" is the Cloudflare Workers platform, which uses lightweight and extremely fast V8 isolates instead of slow containers. This ensures high efficiency and security: the code is completely isolated from the internet and can only interact with permitted tools.