CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

Amazon Kiro v0.2
https://kiro.dev/pricing/
Kiro IDE is still in preview, but paid subscriptions have already launched. Perhaps this is because v0.1 gave away a lot of free Sonnet 4 tokens; registration is now closed, and new users go on a waiting list.

Only 50 vibe requests per month will remain free; for now, all new users get 100 vibe and spec requests, which must be used within 2 weeks. This is very unfortunate: my simple test project burned through 40 spec requests in just 2 hours. And 125 spec requests now cost $20/month.

Furthermore, compared to Cursor, there is almost no functionality or variety of models here yet, and it feels 2-3 times slower. I don't know who would pay so much for this now.

LLMs Won't Replace Programmers
https://zed.dev/blog/why-llms-cant-build-software
A provocative post by Conrad Irwin from Zed. He argues that LLMs are quite good at generating code, updating it when problems are found, and running tests. But human engineers don't work like that: they build mental models of what the code does and what it should do, and then make changes based on those models.

AI agents do not create a mental model of the project and simply generate code that resembles the best code they have learned - which is why they get confused, lose context, invent unnecessary details, assume that the code they wrote works, cannot decide what to do when something doesn't work, and can simply delete everything and start over without a deeper understanding of the problem.

The author doubts this can be fixed. LLMs are suitable for simple tasks with clear requirements, which they can complete "in one shot" and verify with a test. A human engineer remains the "driver", responsible for making sure the code truly does what is required of it.

Discussion
https://news.ycombinator.com/item?id=44900116
Comments show very polarized views, but most support the author's main thesis. LLMs focus on textual patterns, not deep understanding; they can "hack" tests to pass them instead of fixing real problems.

Some believe that well-tuned LLMs can perform at the level of or even better than a junior developer for certain specific, simple, isolated tasks.

Many people complain that using LLMs in IT requires constant, thorough supervision of the agent with extremely detailed "guardrails". The valued skills are shifting toward problem formulation and working around LLM shortcomings at a high level. For many, this "manual supervision" is more tiring than classically writing the code themselves.

There is a strong split in opinions on whether LLM technology continues to develop rapidly or has reached a plateau and requires new architectural breakthroughs.

AGENTS.md
https://agents.md/
A README equivalent for agents: finally, a standard name for the markdown file where we write context and additional instructions for an AI coding agent. It is a plain text file. It will also be useful for human developers, because most people barely write anything in their READMEs anymore, but that won't work here: LLMs need more detail and precision.

Previously, if I used several AI coding tools in a repository, I had to have a separate file for each, and also think about how to synchronize them. Now, only AGENTS.md in the project root is sufficient. It is possible to create separate files for subdirectories.
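For illustration, a minimal AGENTS.md might look like this (the project details and commands are invented placeholders):

```markdown
# AGENTS.md

## Project overview
TypeScript monorepo; packages live in packages/*.

## Build and test
- Install: pnpm install
- Test: pnpm test (run before every commit)

## Conventions
- Strict TypeScript, no any.
- Never commit directly to main.
```

The point is exactly this level of explicitness: commands the agent can run verbatim and rules it must not break.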

Already supported: OpenAI Codex, Amp, Jules by Google, Cursor, Factory, RooCode, Aider (via config), Gemini CLI (via config), Kilo Code, OpenCode, Phoenix, Zed.

As the quality of generated code increases, development moves from manual writing to task organization and AI agent management. The human role transforms into a "project manager" who sets high-level goals, breaks them down, and guides agents that are stuck or need feedback.

There is a growing demand for mobile applications to "code on the go".

OMNARA
https://omnara.com/
A mobile iOS client on top of AI agents (Claude Code, Cursor, GitHub Copilot, etc.). It lets you launch agents and watch them work from anywhere, even from the forest or the beach. Instant notifications when agents need help, with the ability to respond on the spot.

You need to install the Omnara SDK or wrapper into your working environment.

While they are backed by the Y Combinator incubator, the service is free. Their business model looks quite fragile, though: Anthropic (Claude's developer) could add similar functionality to its own apps, or someone could build a Telegram bot that does the same.

Who currently offers TOP models for free
https://github.com/inmve/free-ai-coding
This repository contains a table of available projects that provide free, limited access to the best models. For example, Rovo Dev CLI gives 5M tokens per day on Claude Sonnet 4. Gemini CLI provides 100 requests per day on the Gemini 2.5 Pro model. It also states whether a credit card number is required.

Unfortunately, the author does not specify whether we automatically grant permission to these companies and projects to copy our code.

Article "Code with AI on a Minimal Budget/Free"
https://wuu73.org/blog/aiguide1.html
https://wuu73.org/blog/aiguide2.html
https://wuu73.org/blog/aiguide3.html
The author advocates a "Free AI Buffet" strategy – a set of tools that combines the best free versions of large models (Claude 4 on Poe.com, GLM 4.5, Kimi K2, Qwen3, Gemini 2.5 Pro via AI Studio, Grok, Deepseek, etc.). Each model has its advantages: one is fast, another is better at planning, another at testing, and a third at code generation.

He uses Cline, though Roo Code also works. For planning and brainstorming, he turns to the smartest free models on the web (Gemini 2.5 Pro, o4-mini, Claude 4, GPT-5, o3).

His workflow:

  1. Launch AI Code Prep GUI (https://wuu73.org/aicp/) → prepare context.
  2. Pose a query in a web chat (Gemini, o4-mini, Claude) – planning changes.
  3. Generate a prompt for the Cline agent → copy the prompt.
  4. Transfer to Cline (GPT-4.1) – execution.
  5. If generation difficulties arise, change models in Cline.

Discussion https://news.ycombinator.com/item?id=44850913. Many users questioned the stated "freeness," pointing out that it is achieved at the cost of sharing personal data and code with companies for model training. Some consider this an acceptable price for access to advanced technologies, especially for those who cannot afford paid subscriptions.

People actively shared their favorite tools: aider, slupe, SelectToSearch, LM Studio, Continue.dev, Zed, OpenRouter, Chutes.ai, Cherry.ai, Rovodev CLI, CodeWebChat, Windsurf, Amazon Q Dev.

Google Jules
https://jules.google/docs/usage-limits
On August 6, Google Jules officially moved out of beta after two months of testing. Along with this, Google introduced usage limits and pricing plans, and upgraded the underlying model to Gemini 2.5 Thinking.

The default Jules environment now comes with Playwright for frontend testing: it can run tests on web applications and return results as screenshots. Jules also supports image inputs via public URLs.

Since August 15, the Jules VM has received important upgrades: disk space has been increased to 20GB, and the ability to save current progress to the repository at any time has been added.

On the intelligence side, Jules has become more proactive:

  • the agent can ask clarifying questions before executing tasks,
  • it can independently search the web for documentation, code snippets, or other relevant content.

A major addition is the new critic co-agent. Unlike a simple linter or test suite, it reviews generated code for hidden bugs, missed edge cases, and inefficiencies, then feeds that feedback back into Jules.

1M Context for Sonnet 4
https://www.anthropic.com/news/1m-context
The most popular model for code generation, Claude Sonnet 4, now supports up to 1 million tokens of context, a 5x increase that will help Claude Code understand larger codebases. Also, for requests over 200k tokens, input and output prices increase.
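As a sketch of what this looks like at the API level: the larger window is opted into via a beta header on a normal Messages API request. The header value and model id below are assumptions based on the announcement and may change; the key is a placeholder.

```python
import json

def build_request(messages, max_tokens=1024):
    """Build a raw Anthropic Messages API request with the 1M-context
    beta enabled. Header value and model id are assumptions."""
    headers = {
        "x-api-key": "YOUR_API_KEY",                 # placeholder
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "context-1m-2025-08-07",   # assumed beta flag
        "content-type": "application/json",
    }
    body = {
        "model": "claude-sonnet-4-20250514",         # assumed model id
        "max_tokens": max_tokens,
        "messages": messages,
    }
    # POST to https://api.anthropic.com/v1/messages
    return headers, json.dumps(body)
```

Note the pricing caveat from the announcement: once the input crosses 200k tokens, a higher per-token rate applies to the whole request.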

Talks from Code w/ Claude
https://www.youtube.com/watch?v=gv0WHhKelSE

Cal Rueb from Anthropic explains how best to work with Claude Code. He is one of its main developers, working on prompting, system prompts, tool descriptions, and their evaluation.

Unlike other approaches, Claude Code does not index or embed the entire codebase. Instead, it explores and understands it like a human, using "agentic search" (tools like glob, grep, find). It can iteratively refine its search queries.
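The "agentic search" loop can be imagined with a toy sketch (my illustration, not Anthropic's implementation): grep the tree for a term, look at how many files matched, and refine the pattern accordingly.

```python
import re
from pathlib import Path

def agentic_search(root, query, max_rounds=3):
    """Toy sketch of iterative codebase exploration: search for a term,
    then widen or narrow the pattern based on how many files matched."""
    pattern = re.compile(re.escape(query))
    for _ in range(max_rounds):
        hits = [p for p in Path(root).rglob("*.py")
                if pattern.search(p.read_text(errors="ignore"))]
        if 0 < len(hits) <= 10:       # focused result set: stop refining
            return hits
        if not hits:                  # too narrow: fall back to a looser match
            pattern = re.compile(re.escape(query.split("_")[0]))
        else:                         # too broad: require a definition site
            pattern = re.compile(rf"def {re.escape(query)}")
    return hits
```

The real agent does this with its glob/grep tools and its own judgment, but the shape is the same: query, inspect, requery.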

Best practices:

  • place CLAUDE.md in the working directory or your home directory for important context;
  • when needed, use /clear or /compact to reset or summarize the working chat context;
  • configure automatic confirmation for safe commands and use Shift+Tab to auto-accept actions;
  • discuss the plan of action with the agent and confirm it before execution;
  • use screenshots.

Clarification from the Q&A: you cannot have multiple CLAUDE.md files in one directory, but you can reference other files from it using the @-syntax.
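For illustration, a minimal CLAUDE.md that pulls in other files via that @-syntax (all file names and commands are placeholders):

```markdown
# CLAUDE.md

Build: make build. Test: make test (run after every change).

## Conventions
- Prefer small, focused commits.

## Additional context
@docs/architecture.md
@docs/style-guide.md
```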

Advanced techniques: run 2-4 instances of Claude Code simultaneously for parallel work; quickly stop the agent on incorrect actions with Escape (double-tapping lets you return to an earlier point in the conversation and correct the instructions); headless use, for example in GitHub Actions.

Clarification from Q&A: Currently, the best way to share context between multiple agents is by writing information into a shared Markdown file.


Orta Therox on using Claude Code
https://blog.puzzmo.com/posts/2025/07/30/six-weeks-of-claude-code/
Compares it to the moment photography arrived and replaced manual drawing. Claude Code allows experimenting with ideas that would previously have been too time-consuming. It has drastically simplified the process of prototyping games.

Lists numerous large, labor-intensive tasks that he completed single-handedly in 6 weeks – things that can now be done as background "side projects".

Tips for success: monorepos, so the agent sees the entire project's code; mainstream, mature, well-documented technologies and frameworks; standard CRUD applications; codebases that are not too large or old.

Advises not to agonize over choosing between models (Sonnet, Opus, Gemini CLI, etc.): he successfully uses Claude Code with standard settings. If the agent "gets stuck", it usually means the user formulated the task poorly, not that the tool is at fault.

The post sparked lively discussion on https://news.ycombinator.com/item?id=44746621 - recently, many developers have expressed surprise and admiration for Claude Code's capabilities, calling it a "game-changer" and a "gift from heaven." The tool removes the burden of remembering details, routine work, and the fear of a blank page.

Comparison. Some consider Amp better than Claude Code. Gemini CLI is generally rated as significantly worse. Cursor has mixed reviews. GitHub Copilot is primarily used for autocomplete, and its agentic part is considered less useful.

Tips. The agent works great with the terminal, so it can be used to perform Git operations, Docker commands, run linters, compilers, testing, and other CLI tools. Encourage the agent to write tests (TDD) and use them to validate changes. It's possible to configure specialized sub-agents to check code for standard compliance, excessive engineering, etc. Even possible to use Claude Code SDK for integration into CI/CD or other automated processes.

People express serious concerns about the use of such tools by "juniors" who might generate slow, unsafe, or simply terrible code without understanding it. Companies might replace them with "seniors" using AI tools, potentially creating a "gap" in the training of future generations of developers. Some believe this could lead to mass unemployment in countries with cheaper labor.

Concerns are also raised about the high cost of Claude Max subscriptions and their inaccessibility to a significant portion of the global developer community.

Cursor Agent CLI
https://cursor.com/blog/cli
When Anthropic chose the path of building a terminal-based programming agent, and Claude Code turned out to be successful and genuinely useful, a trend emerged of creating similar coding assistants.

OpenAI answered with Codex, which was not as successful at first but began working much better with the release of GPT-5. Google made its own version, which the Chinese Qwen team then copied. And so on.

https://www.youtube.com/watch?v=Nj1MZxYhQIs

So Cursor couldn't resist and, in addition to its VS Code fork, launched a CLI, currently in beta. For some reason the GPT-5 model works so-so in Cursor IDE right now, but it works well in the CLI. You can also choose any model available at your subscription level, unlike Claude Code, which is tied to Anthropic models only.

Cursor copied the bad parts too: on Windows it only runs via WSL, the built-in Linux. And although their marketing explicitly says "available everywhere", Windows users (a mere 73% of desktop users) cannot run it natively.

New Models from OpenAI

GPT-OSS for Self-Deployment
https://openai.com/index/introducing-gpt-oss/
OpenAI, a company known for its powerful but mostly closed GPT models, recently introduced a new family of models with open weights called GPT-OSS (Open Source Software). These can be run locally or on your own infrastructure.

This is a significant step, as GPT-OSS is OpenAI's first "open" release since GPT-2. The models have an MoE architecture. They are designed for complex reasoning tasks, agentic tasks (Python code execution and web browsing), and general-purpose use.

The models are deployed by many providers.
https://openrouter.ai/openai/gpt-oss-120b
After the release of GPT-OSS, the developer community began testing its capabilities. And although OpenAI's charts make these models look very strong, in code generation they show average results, especially the smaller 20b one. Perhaps fine-tuned versions will appear in the near future.


GPT-5 for Beautiful Interfaces
https://openai.com/index/introducing-gpt-5/
The new flagship model, and a new benchmark for the others. It can regulate its reasoning effort and work on code agentically for long stretches.
https://cursor.com/blog/gpt-5
At the presentation, one of the Cursor developers said that this is now the main model for their application and can be directly used in your work, not just for test projects.

https://www.youtube.com/watch?v=BUDmHYI6e3g

https://openai.com/index/introducing-gpt-5-for-developers/
It demonstrates significant improvements in front-end generation and understanding large repositories. It more often creates more beautiful and responsive websites, applications, and games. The model is currently available in almost all major AI programming products: Cursor, Windsurf, Vercel, JetBrains, Factory, Lovable, Gitlab, Augment Code, GitHub, Cognition.

It should become a better assistant for programmers: less annoying and less reluctant to work.

GLM-4.5
https://z.ai/blog/glm-4.5
The new agentic model GLM-4.5, developed by the Chinese company Z.ai (formerly Zhipu AI), significantly improves the capabilities of complex code generation presented in the previous version of GLM-4. The model can now create complex things such as interactive mini-games and physics simulations in HTML, SVG, Python, and other formats.

It is a 355b MoE model with 32b active parameters. It shows very strong results on real engineering problems and interface building, and demonstrates reliable multi-stage tool use and dynamic code generation – surpassing the previous leaders among open models, Kimi K2 and Qwen3-Coder.

On the chat https://chat.z.ai/ there is a separate button for creating full-stack applications. The announcement shows a Pokémon catalog as an example. The chat is free; API token prices are cheaper compared to the average.

horizon-alpha horizon-beta
https://openrouter.ai/openrouter/horizon-beta
This week, an unknown provider https://openrouter.ai/provider/stealth released the Horizon model for testing, first the alpha, and now the beta version. Usage is free. The model generates interfaces well and rose to 4th place in the "Programming" ranking on OpenRouter.

Those who studied the tokenizer note that it resembles the one used by Qwen; it might be the future Qwen4.

Charm Crush CLI
https://charm.land/
https://github.com/charmbracelet/crush
Another intelligent coding assistant that integrates into the terminal. For models that support such a feature, "Agent" mode is enabled. It's possible to switch models at any time between responses. Chats are saved as sessions and can be revisited.

Via catwalk it works with models from Anthropic, OpenAI, Groq, OpenRouter, Google Gemini, AWS Bedrock, and Azure OpenAI, and you can add your own via OpenAI/Anthropic-compatible APIs. It currently does not support subscription plans (from Anthropic, for example) – only API keys with pay-per-token billing.

It uses LSP (Language Server Protocol) for code understanding, similar to IDEs (syntax highlighting, autocompletion, etc.), and shows diffs. Supports MCP.

https://www.youtube.com/watch?v=3crlgrd86r0

The main advantage is the interface, which is top-notch here: convenient, and it's always clear what is currently happening. It even renders correctly when the terminal window is resized, which many other tools fail at.

It's great that Ctrl+p always opens the menu. But I miss a hotkey to "regenerate response."

Lately, I've been seeing more and more projects on GitHub that help launch multiple versions of Claude Code and coordinate their work.

Crystal - Multi-Session Claude Code Manager
https://github.com/stravu/crystal
Crystal is an independent project by stravu – a desktop application (built with the Electron framework) that allows you to work with many Claude Code instances simultaneously. Each agent session runs in an isolated Git worktree. This ensures that changes made by the AI do not affect the main code until the developer decides to integrate them.

https://www.youtube.com/watch?v=vGwxhBR81zY

Crystal works with any technology stack (Python, TypeScript, Electron, etc.) and integrates with existing projects. It's built around Claude Code because the developer considers it the best and most cost-effective agent on the market, especially for intensive token usage. Using Crystal significantly speeds up the development process – Crystal itself was created in 2-3 weeks.

Claude Code Subagents
https://docs.anthropic.com/en/docs/claude-code/sub-agents
In response to requests, Anthropic itself has added the functionality to run multiple sub-agents. You can create agents for code security review, test generation, database migration, etc. – configured using simple Markdown files that can be stored globally or at the project level. Anthropic recommends generating initial versions of sub-agents using Claude itself.

Each sub-agent works with its own isolated context and set of tools. Instead of one universal AI assistant, developers get a team of specialized agents, each with their own expertise.
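For illustration, a sub-agent definition is a Markdown file with YAML frontmatter, stored for example at .claude/agents/code-reviewer.md at the project level (this reviewer sketch follows the linked docs; the prompt text is my own):

```markdown
---
name: code-reviewer
description: Reviews code changes for security issues, missing tests,
  and over-engineering. Use after any significant edit.
tools: Read, Grep, Glob
---

You are a strict code reviewer. For each change, check for:
- security problems (injection, secrets committed to the repo),
- missing or weakened tests,
- unnecessary complexity.
Report findings as a prioritized list.
```

The frontmatter tells Claude Code when to delegate to this agent and which tools it may use; the body becomes its system prompt.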

Qwen3-Coder
https://qwenlm.github.io/blog/qwen3-coder/
The Chinese Qwen team, behind the development of advanced AI models, announced the release of Qwen3-Coder. The Qwen3-Coder-480B-A35B-Instruct model uses a Mixture-of-Experts architecture with 480 billion parameters (of which 35 billion are active), supports a context window of up to 256k tokens out-of-the-box, and can be extended to 1 million tokens. Other sizes are expected to be released.

During the post-training phase, the Qwen team scaled up reinforcement learning for code (Code RL), focusing on real-world tasks where execution success is easily verifiable. Additionally, they introduced Long-Horizon Reinforcement Learning (Long-Horizon RL or Agent RL) to teach the model to solve complex engineering problems, such as SWE-Bench, through multi-step interaction with the environment, including planning, tool use, and feedback acquisition.

The model can integrate with Claude Code and Cline.

For interacting with Qwen3-Coder, the developers introduced a command-line tool, Qwen Code – essentially a Chinese fork of Gemini CLI.

We get performance at the level of Claude 4 Sonnet, only significantly cheaper.
https://openrouter.ai/qwen/qwen3-coder

Zed without AI
https://zed.dev/blog/disable-ai-features
Finally – it's getting annoying that AI is getting into all coding tools :) Now Zed allows you to completely disable these unnecessary features via the settings.json file.
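Per the linked post, it comes down to a single setting (Zed's settings.json allows comments):

```json
// ~/.config/zed/settings.json
{
  "disable_ai": true
}
```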

Many companies prohibit the use of AI tools. Some professionals have ethical, philosophical, or environmental reservations about using AI and prefer full control over their code and workflow without unwanted suggestions or AI interference.

GitHub Spark
https://github.blog/changelog/2025-07-23-github-spark-in-public-preview-for-copilot-pro-subscribers/
MS has released GitHub Spark, which allows you to turn an idea into a ready, deployed application in a matter of minutes. It creates a full repository on GitHub with GitHub Actions and Dependabot, ensuring full synchronization.

Spark will independently build a complete application, including both frontend and backend. All this works on AI models such as Claude Sonnet 4. Hosting, deployment, and integration with GitHub authorization – everything is included "out of the box". It is possible to integrate requests for models from OpenAI, Meta, DeepSeek, and others, without the need to manage API keys.

Currently available in public preview for Copilot Pro+ subscribers.

Terragon
https://www.terragonlabs.com/
Claude Code in the cloud, which works as parallel background agents in a separate sandbox, but with access to GitHub repositories. Start or manage tasks from the web dashboard, Terry CLI, GitHub comments, or your phone — work from anywhere.

AWS Kiro v0.1.0 (preview)
https://kiro.dev/
A new "agentic IDE" (like Cursor and Windsurf) from Amazon Web Services (AWS), presented on July 14, 2025. Built on Visual Studio Code. Pleasant visual theme. Supports MCP. Currently offers only two models: Claude Sonnet 4.0 and 3.7.

Available for free in preview, and includes limits that allow you to try the product without interruptions. In the future, various pricing plans are planned: Free, Pro, and Pro+.

https://www.youtube.com/watch?v=Z9fUPyowRLI

The key feature is a mechanism that helps transition from rapid prototyping ("vibe coding") to creating full-fledged, production-ready code ("viable code"). Kiro creates detailed structured artifacts: requirements documents (requirements.md), design documents (design.md), and task lists (tasks.md).

There are also Hooks, which configure event-driven automation, such as updating documentation or generating tests when a file is saved. And Steering, which allows defining description files to guide agent behavior, adding context, standards, and desired workflows.


Kiro's developer, NathanKP, actively interacts with users in the discussion https://news.ycombinator.com/item?id=44560662. Many people express disappointment that Kiro is another VS Code fork – reporting high CPU and RAM consumption.

Some believe that Kiro is "late to the party," as the market has already shifted to terminal agents like Claude Code. I also thought it would be a console agent.

Currently, for me (compared to Cursor and Windsurf), it runs quite slowly. Also, I haven't found checkpoints: you can only undo individual agent edits or the last batch.

Grok 4
https://openrouter.ai/x-ai/grok-4
Elon Musk's xAI company has released the next version of its large language model. In their presentation they used a classic statistics-manipulation trick, cropping an axis to make the difference look much more significant. Possibly other parts of the model's description are not straightforward either. But that doesn't negate the fact that both the model and its agentic mode with 4 competing agents show outstanding results (and a matching price tag :).

The context window size of 256k tokens is average by current standards. It handles confusing questions and planning very well. Judging by reviews, in practical programming tests it shows average results. It appeared in Cursor's list of available models, but not yet in Windsurf.

Kimi K2
https://moonshotai.github.io/Kimi-K2/
Unexpectedly, the bigger piece of programming news was a new open model from Moonshot AI. It is not a reasoning ("thinking") model, just a Mixture-of-Experts (MoE) one, but it can work as an agent because it knows how to use tools and functions.

The model can be connected via https://github.com/sst/opencode or any VS Code plugins compatible with OpenRouter. It is also hosted on Groq.
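Since K2's selling point is tool use, a minimal sketch of calling it through OpenRouter's OpenAI-compatible API, advertising one tool the model may invoke (the model slug matches OpenRouter's listing at the time of writing; the tool and key are made-up placeholders):

```python
import json

def kimi_request(prompt):
    """Build an OpenAI-compatible chat request for Kimi K2 via OpenRouter,
    advertising one example tool the model may call."""
    headers = {
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",  # placeholder key
        "Content-Type": "application/json",
    }
    body = {
        "model": "moonshotai/kimi-k2",
        "messages": [{"role": "user", "content": prompt}],
        # Standard OpenAI-style tool schema; K2 decides when to call it.
        "tools": [{
            "type": "function",
            "function": {
                "name": "run_tests",
                "description": "Run the project's test suite",
                "parameters": {"type": "object", "properties": {}},
            },
        }],
    }
    # POST to https://openrouter.ai/api/v1/chat/completions
    return headers, json.dumps(body)
```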

Context window: 131K. In tasks where planning isn't needed, just code generation, the model outperforms Claude Sonnet 4. Several providers host it on OpenRouter.