CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

GitHub Copilot is also not great with naming: the name "agent" now also covers a new cloud agent, released in response to the agent from OpenAI.

https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
The cloud Copilot agent can automatically solve tasks in the repository: assign an issue (or several) — it will analyze the code, make changes, test with tests, and send a PR for review. Copilot works in the background, using a secure cloud environment (based on GitHub Actions). Available only for Copilot Pro+ and Enterprise, consumes GitHub Actions minutes.

Apparently, GitHub Copilot for VSCode is not developing as quickly or as well as its competitors, so Microsoft decided to open-source its code. The Grok 3 model was also added.

https://jules.google/
Google has announced a similar cloud agent, Jules, but the website only offers a waitlist. For some reason, the site is designed like a pixel game.
UPD: During I/O, they announced beta access for users from the USA (5 total tasks per day)

https://docs.anthropic.com/en/docs/claude-code/sdk
Claude Code SDK. Anthropic announced an SDK for their console-based agentic programming system. So far it doesn't look like a typical SDK that lets you embed the product in your own code and interact with it programmatically — the docs say "The SDK currently support command line usage". In other words, for now they have mostly expanded the ways you can interact with it from the console.
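As a rough sketch of what that console-level "SDK" usage looks like: Claude Code has a non-interactive print mode (`claude -p`) that can emit JSON, so you can drive it from a script. The exact flags here (`-p`, `--output-format json`) are taken from Anthropic's CLI docs as I understand them — verify against your installed version.

```python
import json
import subprocess

def build_claude_cmd(prompt: str, output_format: str = "json") -> list[str]:
    # Non-interactive ("print") mode: claude -p "<prompt>" --output-format json
    return ["claude", "-p", prompt, "--output-format", output_format]

def run_claude(prompt: str) -> dict:
    """Run Claude Code headlessly and parse its JSON result."""
    out = subprocess.run(build_claude_cmd(prompt),
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

if __name__ == "__main__":
    # Prints the command that would be executed, without running the agent.
    print(build_claude_cmd("Explain what main.py does"))
```

The point is that "scripting" here means shelling out to the CLI, not calling a library API.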

OpenAI did it
https://openai.com/index/introducing-codex/

They presented a cloud-based software engineering agent called Codex, powered by Codex-1 (a specialized version of o3), which should not be confused with the 2021 Codex model or the Codex CLI agent programming tool released last month.

Seriously, I recently wrote that it's currently very important to solve the problem of orchestrating AI programming agents' tasks, and it seems from the video presentation that they have done just that. It's not yet available in the standard Plus plan, only in Pro ($200/month), so not everyone will be able to try it.

Codex handles small, well-defined tasks well, but according to user feedback, it still struggles with follow-up requests in the chat. This means you need to break the work down up front into a set of tasks that will not change afterward.

Codex is not intended for "vibe coding" and is best suited for experienced engineers working with stable repositories: adding features or fixing bugs. It has a simple interface, similar to the familiar ChatGPT, with a text field for describing the task and "Ask" and "Code" buttons.

https://www.youtube.com/watch?v=utujQfglbk8

There's a button similar to "play" that sends the task to the agent in the cloud in the background. It queues the task, then shows a detailed execution log. In the video presentation, it looks like a significant achievement for the field of AI programming agents.

By the way, Cursor also added a preview of the background agent feature for a limited number of users in the new version 0.50.

Amp available to everyone since May 15
https://ampcode.com/how-i-use-amp

Sourcegraph decided to take an interesting marketing approach. They already have a business-oriented AI coding agent plugin for VS Code (Cody); now they have created a separate website in an oddly informal, conversational style, and are using it to sell a new AI agent plugin named Amp.

Amp has a manual, which looks like yet another separate site: https://ampcode.com/manual. There they describe their principles, one of which is "No model selection, always the best models. You don't choose the models, we do." They currently use Claude 3.7 Sonnet with extended thinking, which is certainly good, although according to the leaderboards the best model is Gemini 2.5 Pro.

Currently, they give 1000 free credits (from my usage, it's about 700k tokens), then packages are $5 for 500.

The system instructions file is named AGENT.md. It's unclear when we will all agree on a single name; for now, repositories will end up with ten copies of this file, one per AI agent.

Based on my observations, by the end of 2024, few people took Codeium Windsurf seriously.

Here's a Hacker News thread from 70 days ago comparing Windsurf and Cursor, which didn't attract much engagement: https://news.ycombinator.com/item?id=43288745. Cursor is mentioned as one of the first AI IDEs users tried; it's well-configured and "just works". Windsurf's positives include a free autocompletion feature and greater versatility. GitHub Copilot lags behind both in functionality.

When vibe coding became a hot topic, Windsurf, being a simpler system than Cursor, started attracting more users. They subsequently rebranded, and the company's focus sharpened. News about a potential acquisition by OpenAI has been circulating for several weeks, further increasing interest.

In a new comparison poll on Hacker News https://news.ycombinator.com/item?id=43959710, significantly more people participated.

People note that the AI IDE market is changing rapidly. Developers are constantly releasing new features, and tools borrow ideas from each other. This leads to the 'leader' often changing.

Discussion on "Agentic / Vibe Coding":

  • people see the potential in "agentic mode" for automating routine tasks (e.g., adding types, creating boilerplate), but emphasize the need for careful review of generated code.
  • there's a significant range of opinions on the effectiveness and safety of "agentic / vibe coding" where the AI independently makes changes across any files in the repository.
  • some experienced developers believe that AI helps non-experts more, while for experienced users, it's more like 'smarter autocompletion'.

Cursor Pros:

  • excellent autocompletion ("tab-complete"), better than competitors
  • the Cmd-K feature (inline editing) made the IDE widely known and continues to be liked by users
  • clear pricing ($20 per month) which is quite cheap for access to the best models

Cursor Cons:

  • context is limited in Cursor to save costs — the system tries to use as few tokens as possible
  • the "Agent mode" is quite imperfect and tends to rush ahead

Windsurf Pros:

  • repository code awareness seems better
  • feels faster in some aspects

Windsurf Cons:

  • problems with large files and similar context limitation where only a small piece of code is sent to the model
  • the interface is more suited for vibe coding, making it harder to work "manually"
  • pricing — some find it more expensive than Cursor in agent mode: with active use, on top of the up-to-$15 monthly plan, you need to buy extra credit packages at $10 per 250

Thread participants express positive feedback about Zed as a fast, efficient, and 'uncluttered' editor. But AI autocompletion and 'intelligence' in Zed are not yet at Cursor's level. Additionally, it doesn't support Windows.

They are also compared with Aider, Cline, GitHub Copilot, JetBrains IDEs (IntelliJ, PyCharm, Rider, etc.). Quite a few other AI tools are also mentioned: Claude Code (very expensive), Amazon Q (good for AWS), Machtiani, Brokk (an Aider alternative), Repomix, Void (an open-source Cursor alternative), Nonbios.ai, Amp.

Many participants recommend trying multiple tools, as the situation is changing rapidly, and what works today may change tomorrow.

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Google DeepMind AlphaEvolve
Available to academic researchers, AlphaEvolve is an AI agent for algorithm design based on Gemini (a combination of Flash and Pro). It combines the creativity of large language models (LLMs) with automated evaluators that score candidates against metrics, discovering and optimizing algorithms through an evolutionary approach that iterates on the best ideas.
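The evolutionary loop can be illustrated with a deliberately toy sketch: a population of candidates, an automated evaluator that scores them, and a mutation step (which in AlphaEvolve is an LLM proposing code changes — here just random perturbation of two numbers). None of this is the actual system; it only shows the select-mutate-evaluate cycle.

```python
import random

def evaluate(params):
    # Automated evaluator: a toy metric to maximize.
    # (AlphaEvolve scores real candidate programs against real metrics.)
    x, y = params
    return -(x - 3) ** 2 - (y + 1) ** 2  # best possible score is 0, at (3, -1)

def mutate(params, scale=0.5):
    # Stand-in for the LLM's role: propose a modified candidate.
    return tuple(p + random.uniform(-scale, scale) for p in params)

def evolve(pop_size=20, generations=200):
    random.seed(0)
    population = [(random.uniform(-10, 10), random.uniform(-10, 10))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best half, refill with mutated copies of survivors.
        population.sort(key=evaluate, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=evaluate)

best = evolve()  # converges near (3, -1)
```

The interesting part in the real system is that the evaluator is fully automated, so the loop can run at scale without human grading.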

Where is it already used?

1. Google Data Center Optimization 🖥️

  • AlphaEvolve found a more efficient algorithm for resource allocation in Borg (Google's data center management system).
  • Result: +0.7% of Google's global computing resources are now used more efficiently.

2. Hardware Design 💻

  • Optimized matrix multiplications in TPUs (Google's specialized chips for AI).
  • Accelerated the operation of arithmetic circuits while maintaining correctness.

3. Accelerating AI Training ⚡

  • Reduced Gemini training time by 1% by optimizing matrix operations.
  • Accelerated FlashAttention (a core algorithm for transformers) by 32.5%.

It improved on Strassen's algorithm (1969) for multiplying 4x4 matrices, reducing the number of scalar multiplications from 49 to 48 (for complex-valued matrices). It also improved the best known solutions for 20% of the open problems it was given in mathematical analysis, geometry, and combinatorics.
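For context on what was improved: Strassen's classic trick multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8; applying it recursively to 4x4 matrices gives 7×7 = 49 multiplications, which AlphaEvolve beat with 48. Here is the standard 2x2 identity checked against the naive product (this is the well-known 1969 scheme, not AlphaEvolve's new one):

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices with 7 scalar multiplications
    (Strassen, 1969) instead of the naive 8."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(a, b):
    # Standard definition: 8 multiplications.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Shaving even one multiplication off such a scheme matters because the saving compounds at every level of the recursion.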

Interestingly, AlphaEvolve was used to optimize components involved in training the Gemini models themselves. This raises questions about the potential for recursive AI self-improvement and the approach towards a "singularity".

It seems that using Claude Code, Cursor, and the others has settled into a largely repetitive workflow: plan the task (a roadmap file), then give the agent commands to implement the plan in code.

Thus, task orchestration is the next necessary thing for every agentic AI solution.

I have already mentioned https://www.task-master.dev/, which is currently a popular solution due to MCP.


aider
https://aider.chat/docs/scripting.html
aider natively supports simple scripting from the terminal for performing repetitive actions. There is also a Python scripting API, but it is not officially supported or documented.
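A minimal sketch of the terminal-scripting idea: invoke aider once per file with the same instruction. The `--message` and `--yes` flags are taken from aider's scripting docs as I understand them — check `aider --help` for your version before relying on them.

```python
import subprocess

def build_aider_cmd(message: str, files: list[str]) -> list[str]:
    # One-shot, non-interactive run: aider applies the change and exits.
    # --yes auto-confirms prompts (flag name per the aider scripting docs).
    return ["aider", "--message", message, "--yes", *files]

def apply_to_all(message: str, files: list[str]) -> None:
    """Run the same edit instruction over many files, one at a time."""
    for f in files:
        subprocess.run(build_aider_cmd(message, [f]), check=True)

if __name__ == "__main__":
    print(build_aider_cmd("add type hints", ["app.py"]))
```

This is exactly the kind of repetitive batch work the scripting mode is meant for.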

Roo Code | Boomerang Orchestrator (since ver 3.14.3)
https://docs.roocode.com/features/boomerang-tasks
They added "🪃 Orchestrator" as a built-in mode. It allows breaking down complex projects into smaller, manageable parts. Each sub-task is then executed in its own context, often using a different mode tailored for that specific task.


Code Claude Code
https://github.com/RVCA212/codesys
A project developing a Python SDK for interacting with the Claude CLI tool. The most effective way to use it is by mimicking your actual workflow. Supports resuming specific conversations by ID.

Cloud Code SDK
https://cloudcoding.ai/
A programmable AI Coder SDK in Python - both locally and in a Sandbox cloud. You can think of it as a way to interact with Cursor or Claude code, at a low level with great control. But instead of using these applications, the project uses its own agent that can modify code and use its own built-in tools. Currently supports only OpenAI and Anthropic models. Works with or without Git repositories.

GitHub has posted a large tutorial on the new GitHub Copilot.

https://www.youtube.com/watch?v=0Oz-WQi51aU

Three Modes (now similar to Cursor 😉):

  • Ask Mode 💬 – for discussing changes and getting answers.
  • Edit Mode ✏️ – for precise edits and refactoring.
  • Agent Mode 🤖 – automated task execution (e.g., code generation from README).

Example: Creating a hotel booking application using different models (Claude 3.5, Gemini 2.5 Pro, GPT-4).

🔧 Working Techniques

Structured README file 📄: A clear description of the project, tech stack, and file structure helps the agent generate code more accurately.

Copilot Instructions 📌: A file with global guidelines (e.g., code style requirements, security, logs).

Visual Prompting 🖼️: Some models support uploading screenshots for UI analysis.

🛠️ Problem Solving

  • Browser Caching: Copilot can suggest clearing the cache or a fix for templates.
  • Testing: Automated test generation (e.g., for Flask endpoints) using the /test command.
  • Documentation: Updating the README file via Gemini 2.5 Pro with Mermaid diagrams.

🚀 Tips

Claude 3.5 – balances speed and quality.
Gemini 2.5 Pro – powerful documentation generation.
GPT-4 – for complex tasks with context.

Security: Always ask Copilot for a code audit (e.g., How can I make this app more secure?).

Windsurf is in talks to be acquired by OpenAI for about $3 billion.

Apple and Anthropic are teaming up to build a “vibe-coding” software platform that will use generative AI to write, edit, and test code for programmers.

https://developers.googleblog.com/en/gemini-2-5-pro-io-improved-coding-performance/

Google released Gemini 2.5 Pro Preview (I/O edition). This update features even stronger coding capabilities: expect meaningful improvements in front-end and UI development, alongside improvements in fundamental coding tasks such as transforming and editing code, and creating sophisticated agentic workflows.

https://windsurf.com/
https://lovable.dev/

Windsurf and Lovable have improved the design of their products and pricing strategy.

Windsurf has a new logo and more transparent use of "credits" for the AI chat. The free tier now has new, higher limits, plus unlimited Fast Tab and Cascade Base.

https://lovable.dev/blog/lovable-2-0

Lovable 2.0 introduces key innovations: switching the agent to chat mode for better understanding and planning, workspaces for collaborative development, and a security scanning function to detect vulnerabilities.

In addition to major functional updates, Lovable 2.0 has updated its brand and interface, added the ability to visually edit styles, and simplified the process of connecting custom domains.

Changes in pricing plans, which now include Pro and Teams, are aimed at better meeting the needs of both individual developers and teams.

https://docs.cursor.com/guides/advanced/large-codebases

Cursor developers shared tips and techniques for effectively working with large and complex codebases.

They highlighted key aspects that help in navigating unfamiliar code faster. Key recommendations include:

  • Using Chat for Code Understanding: Via the chat mode, you can quickly get explanations on how certain parts of the code work. It is also recommended to enable the "Include Project Structure" feature for better understanding of the project structure.
  • Writing Rules: Creating rules allows emphasizing important project information and ensures better understanding for the Cursor agent.
  • Detailed Planning of Changes: For large tasks, it's worth spending time creating an accurate and well-structured plan of action steps.
  • Choosing the Right Tool: Cursor offers various tools (Tab, Cmd K, Chat), each with its advantages for specific tasks – from quick fixes to large-scale changes across multiple files.

They emphasize the importance of breaking down large tasks into smaller parts, including relevant context, and frequently creating new chats to maintain focus.

https://memex.tech/blog/introducing-memex-the-everything-builder-for-your-computer

Memex has officially announced the launch of its platform, which allows you to create any software, from web applications to 3D designs. It is worth noting that they chose a rather unfortunate name: first, "memex" is inventor Vannevar Bush's term for his hypothetical knowledge device, and second, many existing projects already use the name.

Memex is positioned as "The Everything Builder" for the computer. The platform supports any technology stack and programming language. Memex runs on Windows/Mac/Linux (built on the Tauri framework) and allows everyone, regardless of technical experience, to explore, build, and deploy software by talking to AI.

The agent uses Claude models - a combination of Sonnet 3.7 + Haiku, and has access to the Internet. Creates checkpoints via built-in shadow git. Plans to support Gemini 2.5 and MCP.

https://www.byterover.dev/

ByteRover implements code memory and related functions as an MCP server. Using it (or a similar project), you can switch between Cursor, Windsurf, Cline/Roo, and other MCP-enabled coding agents, and each will know what has already been done. The free plan covers 1k records/month.

Downside: it uses their cloud — the data is not stored locally, but with a company you have to trust.

https://www.youtube.com/watch?v=9sPsraoe0_c

https://github.com/github/github-mcp-server

GitHub launched their official MCP server.

https://www.youtube.com/watch?v=d3QpQO6Paeg


https://modelcontextprotocol.io/

The Model Context Protocol (MCP) was introduced by Anthropic on November 24, 2024, as an open standard for connecting AI systems to data sources. The first connectors released were for GitHub, Google Drive, and Slack.
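Under the hood, MCP is built on JSON-RPC 2.0, so the wire messages are plain JSON objects. A minimal sketch of the opening handshake a client sends to a server — the field names follow the MCP specification as I recall it, and the version string is illustrative:

```python
import json

def jsonrpc_request(req_id: int, method: str, params: dict) -> str:
    # MCP messages are ordinary JSON-RPC 2.0 request objects.
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params})

# The "initialize" handshake that starts an MCP session.
init = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",   # illustrative version string
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1"},
})
```

In practice you would use an official SDK rather than hand-rolling messages, but it helps to know there's nothing exotic underneath.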

By February 2025, the developer community had created over 1000 open MCP connectors, demonstrating significant ecosystem growth and interest in the protocol. Support for MCP also gradually appeared in all major AI programming applications/extensions, including Cline/Roo, Cursor, Windsurf, and Continue.

Through MCP, you can work with Postgres, Upstash, and Slack directly in the code editor. Browsertools MCP provides access to the browser console for debugging. And https://context7.com/ provides up-to-date documentation for AI code editors.

A significant step was OpenAI's announcement on March 26, 2025, of support for MCP. Soon after, at Google Next 2025, Google announced MCP support in the SDK for their Gemini models (though they also introduced the A2A protocol). Thus, the protocol is gradually becoming universal.


Organization and Ecosystem. Following the initial repository (https://github.com/modelcontextprotocol/servers), third-party online catalogs began to emerge (such as https://opentools.com/ https://mcp.so/ https://mcpserverdirectory.org/, etc.), where you can find the necessary server. Projects for MCP managers are appearing that simplify installation, for example https://mcp-get.com/ https://mcpm.sh/ https://mcpmanager.app/ https://mcpmcp.io/, etc.

There are projects that help convert a standard REST API to MCP - for example https://rapid-mcp.com/ https://api200.co/mcp.

The problem with open catalogs is the unclear reliability of the hosted servers.

Security. Since an MCP server acts as an intermediary between the model and the data source, a malicious actor who sets up a server can log everything, including API access keys to the data. Authentication and authorization are not yet standardized within MCP.

Servers are divided into official and community types. Obviously, official servers are not intermediaries, and requests to them are analogous to requests to API endpoints. Community servers, set up by third parties, should be treated with caution, and it's worth checking who is behind them. You can also set up your own server in the cloud (for example, a weather server on AWS Lambda) or in a container via mcp-containers.

The more the protocol spreads, the more official servers will appear, as was the case with REST API.

Claude Code, OpenAI Codex, and Aider are agents that work with the console.

https://github.com/coder/agentapi
The AgentAPI project allows managing such systems via HTTP API (GET and POST). This allows, for example, launching multiple systems and "talking" to them through one chat, or creating an MCP so that one agent system can task another.
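A hedged sketch of talking to an agent through AgentAPI from Python. The default port (3284) and the request body shape (`{"content", "type"}`) are assumptions based on the project's README — confirm them against your installed version before use.

```python
import json
import urllib.request

BASE = "http://localhost:3284"  # AgentAPI's default port, per its README

def build_message(content: str) -> bytes:
    # Body shape is an assumption from the AgentAPI docs;
    # adjust if the schema differs in your version.
    return json.dumps({"content": content, "type": "user"}).encode()

def send_message(content: str) -> dict:
    """POST a chat message to the agent running behind AgentAPI."""
    req = urllib.request.Request(
        f"{BASE}/message", data=build_message(content),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because it's just HTTP, one agent can easily be wired to task another, or several can be multiplexed behind a single chat UI.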

https://github.com/eyaltoledano/claude-task-master
For clear management of development steps, you can use this project and connect it as an MCP.

https://www.anthropic.com/engineering/claude-code-best-practices
For Claude Code, it turns out that there is a command word "ultrathink", which you can read about in a fairly detailed document they posted on the site.

"We recommend using the word "think" to activate an extended reasoning mode that gives Claude additional compute time to more thoroughly evaluate alternatives. These specific phrasings map directly onto increasing levels of compute budget in the system:
"think" < "think hard" < "think harder" < "ultrathink".
Each level allocates more and more compute budget for Claude to use."

Other recommendations:

  • configure the context via system instructions (here, the CLAUDE.md file): code standards, commands, etc.
  • use .allowed-tools to allow frequently used tools. Configure secure MCPs
  • plan and add tests (via TDD) before generating code
  • have the agent make regular commits
  • explain to the agent specifically and thoroughly. The more specific the request, the better the result.
  • use less automatic (auto-accept) mode: monitor what the agent outputs and correct it as early as possible (Escape key to stop) if it chooses the wrong path
  • Advanced level — run two agents: one writes code, the other checks.

Varun Mohan, co-founder and CEO of Codeium, now Windsurf, shares the company's story, discusses two key pivots, hiring philosophy, the impact of AI on the engineering profession, corporate market entry strategy, and demonstrates Windsurf capabilities.

https://www.youtube.com/watch?v=5Z0RCxDZdrE

First pivot (2022): With the emergence of ChatGPT, the team shifted its focus to AI-powered coding, creating a free plugin for code autocompletion (supporting VSCode, JetBrains, etc.). Second pivot → Windsurf: VSCode API limitations forced them to fork the IDE and create an AI-native environment with advanced features (e.g., visual editing).

New paradigm: AI writes >90% of the code → the developer focuses on review and architecture. For non-developers: creating simple applications without deep knowledge.

AI model usage strategy - hybrid approach: Frontier models (e.g., Sonnet) for high-level tasks. Own models for code retrieval and editing.

The conversation highlights how quickly the development landscape is changing thanks to AI. Windsurf is actively shaping this future, not afraid of radical pivots and betting on a deep understanding of code and "agentic" AI capabilities, not just autocompletion.

The possibility of OpenAI acquiring Windsurf is currently being actively discussed in the news.

https://github.com/openai/codex

OpenAI finally responded to Claude Code and released their version of an agent for programming that works through the terminal and can create and edit code files. The project is open source.

Also, like Claude Code, it officially supports only macOS and Linux. Windows support is available through WSL.

They named it Codex, which may now be confusing, as one of the first models for programming (from 2021), on which GitHub Copilot started working, had the same name.

It is installed simply as a global npm package: npm install -g @openai/codex. There are three Approval Modes — the default is Suggest (read-only), but it can also be set to Auto Edit or Full Auto (with command execution in the terminal).

https://www.youtube.com/watch?v=FUq9qRwrDrI

Announced along with the thinking models o3 and o4-mini, which were finally given the ability to use tools. By default, Codex uses o4-mini, but you can specify any model available in the Responses API.

All file operations and command executions occur locally - the request, context, and diff summaries are sent to the model on the server for generation.