CodeWithLLM-Updates
-

Atlassian Rovo Dev CLI
https://community.atlassian.com/forums/Rovo-Dev-AI-Agents-Beta-articles/Introducing-Rovo-Dev-CLI-AI-Powered-Development-in-your-terminal/ba-p/3043623
Rovo Dev is a new AI tool from Atlassian (the makers of Jira, Bitbucket, and Confluence) and their answer to Anthropic's Claude Code.

https://www.youtube.com/watch?v=MjOJE7WbvfE

You can download the CLI from the official Atlassian blog post or the community page; a free Atlassian account is required. On Windows, installation goes through PowerShell (x86-64 build), and the tool runs natively (unlike Claude Code, which requires WSL). Authentication is done via the acli.exe rodev login command (you enter your email and an API key).

The demo showed how the agent can analyze code (for example, a "prisoner's dilemma" simulator) and perform tasks like "optimize performance".

Key advantage: during the beta, Atlassian provides 20 million free tokens daily. That compares favorably with Claude Code, which can easily cost $5-10 per use.

Google Agent Update
https://jules.google/docs/changelog/

  • If you added a setup script, Jules now runs it serially.
  • Context. Jules reads and uses AGENTS.md if it exists in your repository.
  • More testing. Jules writes and runs tests more often on its own.
  • Agent punting significantly reduced. Strengthened the loop so Jules keeps moving forward.

Lecture by Andrej Karpathy

https://www.youtube.com/watch?v=LCEmiRjPEtQ

1. Three Eras of Software

  • Software 1.0 👨‍💻: Traditional code (e.g., C++, Python) written by humans.
  • Software 2.0 🧠: Neural networks (model weights) trained on data.
  • Software 3.0 🤖: LLMs (ChatGPT etc.), where programs are natural language prompts (English).

"We are now programming computers in English - it's insane!"

2. Is LLM the New OS?

  • Analogy with Operating Systems:
    • LLM = CPU 🖥️
    • Context window = RAM
    • Multimodality and tools = APIs
  • 1960s of AI: Currently LLMs are expensive and run in the cloud (like mainframes), but will soon be local (like PCs).
  • LLM Problems: Hallucinations, "spiky intelligence" (super smart in one area, dumb in another), vulnerabilities.

3. How to Work with LLMs?

  • Partial Autonomy:
    • Examples: Cursor (AI code editor), Perplexity (AI search).
    • "Autonomy slider": From suggestions to a full agent.
    • GUI is critical: Visualizing changes speeds up verification.
  • Best Practices:
    • Clear prompts → fewer errors.
    • Keep the AI "on a leash" (too large changes are hard to verify).

4. Vibe Coding — Programming for Everyone

  • Now everyone can code just by describing tasks in English.
  • Example: Karpathy built an iOS app in a day without knowing Swift. But deployment and DevOps still require manual work (and that's a pain 😅).

5. The Future: Infrastructure for Agents

  • Need "Documents for AI":
    • llms.txt instead of robots.txt — instructions for LLMs.
    • Markdown documentation (like Vercel and Stripe).
    • GitHub → Ingest (converts repository to text for LLM). https://gitingest.com/

6. Summary

  • We are in the 1960s of AI — everything is just starting.
  • The transition from assistive tools to fully autonomous agents will take years (don't believe the 2025 hype).
  • Iron Man suit analogy: Currently augmenting humans, but moving towards full autonomy. The future belongs to a hybrid of humans and AI!

Discussion on HN
https://news.ycombinator.com/item?id=44314423

LLM as a new programming paradigm

Arguments "FOR":

  • "English — the new programming language": This is a fundamental shift from deterministic, formal languages to probabilistic ones, allowing non-programmers to create software ("vibe coding").
  • A new tool: LLM is another tool in the developer's arsenal that complements, rather than replaces, existing approaches.
  • Dealing with uncertainty: Programmers have always dealt with non-determinism (API responses, user input), so working with LLMs is just an extension of this practice.

Arguments "AGAINST" (Skepticism):

  • Formal languages are an advantage, not a drawback: They provide precision, reliability, and verifiability, which are the foundation of engineering. Abandoning them in favor of natural language is a step back towards "magical thinking."
  • LLM non-determinism is dangerous: Unlike an API error, an LLM can produce "garbage that looks like gold" — a plausible but completely incorrect answer that is difficult to detect.
  • Hype vs. reality: Many believe that the capabilities of LLMs are greatly exaggerated, comparing the current excitement to the "cryptocurrency bubble."

The developer's role is transforming:

  • From writing code line by line to "context wrangling" and prompt engineering.
  • The human becomes the verifier and curator — the one who provides a fast "feedback loop" (generation → verification → correction).
  • Some fear that this will devalue the profession, turning engineers into "AI QA testers." Others see it as an opportunity for experts from other fields to create their own tools.

Practical tools and challenges

  • Structured Outputs: Using JSON mode is a "superpower" that makes LLM output predictable and suitable for programmatic processing. This is often an underestimated tool.
  • Determinism vs. Chaos: LLMs are not entirely random. At temperature 0, they are deterministic but "chaotic" (small changes in input can lead to large changes in output).
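The "Structured Outputs" point is easy to make concrete. Below is a minimal Python sketch of the consuming side, assuming the model was asked to reply with a single JSON object; the function name and the fence-stripping logic are illustrative, not any particular SDK's API:

```python
import json

def parse_structured_output(raw: str) -> dict:
    """Parse an LLM response that was requested in JSON mode.

    Models sometimes wrap JSON in a markdown fence, so strip that first.
    Raises ValueError on anything that still isn't valid JSON, so the
    caller can retry the request instead of propagating garbage.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening ```json line and the closing ``` line.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"model did not return valid JSON: {e}") from e

# Typical "JSON mode" replies, with and without a markdown fence:
print(parse_structured_output('{"sentiment": "positive", "score": 0.9}'))
print(parse_structured_output('```json\n{"sentiment": "negative"}\n```'))
```

The point of the hard failure is exactly the "garbage that looks like gold" problem above: a parse error is detectable, a plausible wrong answer is not.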

Claude Code's lack of transparency around token usage is a problem.

https://github.com/Maciek-roboblog/Claude-Code-Usage-Monitor
A project that solves this problem by providing a clear picture of token and time usage. It uses local Claude Code logs (~/.claude/projects/*/*.jsonl).

Data is refreshed every 3 seconds, and the tool estimates when tokens will run out based on the current usage rate. It works with the Pro, Max5, and Max20 plans, can automatically detect your current plan, and lets you set your own reset time and time zone for limits.
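The depletion estimate is essentially a burn-rate calculation. A rough Python sketch of the idea (the log schema with "ts" and "tokens" fields is invented for illustration; the real Claude Code .jsonl entries have a different shape):

```python
import json
import tempfile
import time
from pathlib import Path

def burn_rate(log_dir: Path, window_s: float = 3600.0) -> float:
    """Tokens per second over the last window_s, summed across *.jsonl logs."""
    now = time.time()
    total = 0
    for log in log_dir.glob("*.jsonl"):
        for line in log.read_text().splitlines():
            entry = json.loads(line)  # one JSON object per line
            if now - entry["ts"] <= window_s:
                total += entry["tokens"]
    return total / window_s

def seconds_until_empty(remaining_tokens: int, rate: float) -> float:
    """How long the remaining budget lasts at the current rate."""
    return float("inf") if rate == 0 else remaining_tokens / rate

# Demo with synthetic log entries: 10 requests of 500 tokens, one per minute.
d = Path(tempfile.mkdtemp())
now = time.time()
(d / "session.jsonl").write_text(
    "\n".join(json.dumps({"ts": now - i * 60, "tokens": 500}) for i in range(10))
)
rate = burn_rate(d)  # 5000 tokens over the last hour
print(seconds_until_empty(1_000_000, rate))
```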

In the future, the author plans to use DuckDB for more complex log analysis.

Windsurf Wave 10
The presentation of this update was spread over several days, which allowed for more blog posts and YouTube videos. The same was done for Wave 8, and it probably works well from a marketing perspective.

https://windsurf.com/blog/windsurf-wave-10-planning-mode
A "Planning Mode" button has appeared - a logical and obvious step (a response to the Task Master MCP, https://www.task-master.dev/). First, a "thinking" model breaks the task into subtasks; then the coder (a simpler model) doesn't get confused about what to do, but follows the steps (which is also cheaper for them). It works well with memory.
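The planner/executor split can be sketched in a few lines. Everything here is a stand-in (the two call_* functions stub out real model calls); it only illustrates why one expensive planning call plus many cheap execution calls is attractive:

```python
# Sketch of the planner/executor split behind a "planning mode".
# call_thinking_model / call_coder_model are hypothetical stand-ins
# for real API calls, not any vendor's actual interface.

def call_thinking_model(task: str) -> list[str]:
    # An expensive "thinking" model decomposes the task; stubbed here.
    return [
        f"step 1: outline {task}",
        f"step 2: implement {task}",
        f"step 3: test {task}",
    ]

def call_coder_model(step: str, context: list[str]) -> str:
    # A cheaper coder model executes one concrete step at a time,
    # seeing only the plan step and the results so far.
    return f"done: {step}"

def run_with_plan(task: str) -> list[str]:
    plan = call_thinking_model(task)   # one expensive planning call
    results: list[str] = []
    for step in plan:                  # many cheap execution calls
        results.append(call_coder_model(step, results))
    return results

print(run_with_plan("add pagination"))
```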

https://www.youtube.com/watch?v=BmRJ_yH6BpU

https://windsurf.com/blog/windsurf-wave-10-browser
Browser access used to go through MCPs; now it has been redesigned into a single button that launches a managed Chromium instance, so the chat can directly see what's happening in it.

https://windsurf.com/blog/windsurf-wave-10-ux-enterprise
One cluster has been launched in Europe. Now the PCW metric is used not only for autocompletions in the editor but also to assess how the chat agents are performing.

Percentage of Code Written (PCW)
https://windsurf.com/blog/percentage-code-written
PCW is the percentage of code written with the help of AI tools. It helps assess the real benefit of AI in development and avoids metric inflation (unlike competitors' "adoption rate"). Only code that makes it into a commit is counted (unsuccessful edits are not), and the metric does not account for architecture, debugging, or reviews.

  • W — bytes of code from Windsurf (Tab, Cascade).
  • D — bytes of code written manually.
  • PCW = (100 × W) / (W + D).
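The formula as a tiny Python helper (a direct transcription of the definition above, not Windsurf's actual implementation):

```python
def pcw(w_bytes: int, d_bytes: int) -> float:
    """Percentage of Code Written: share of committed bytes produced by AI.

    w_bytes -- bytes of code from Windsurf features (Tab, Cascade)
    d_bytes -- bytes of code typed manually
    Per the definition, only code that survives into a commit counts.
    """
    if w_bytes + d_bytes == 0:
        return 0.0
    return 100 * w_bytes / (w_bytes + d_bytes)

print(pcw(700, 300))  # → 70.0
```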

The Cursor team continues its marketing adventures on YouTube – they went to Anthropic to also appear on their channel:

https://www.youtube.com/watch?v=BGgsoIgbT_Y

Cursor is growing rapidly thanks to AI integration, especially Claude models. In a year, the company reached $300 million in revenue, and millions of developers started using their tools. Initially, AI only helped with auto-completion or editing a single file, but thanks to Claude 3.5 Sonnet, more complex features appeared, such as multi-file changes or a background agent that performs tasks in parallel. The Cursor team itself uses its product for development, which allows for quickly testing ideas and discarding non-working solutions.

However, working with large codebases remains a challenge – AI often doesn't understand internal nuances like DSLs or non-obvious rules that are passed on verbally. Therefore, code review remains a key step, even if AI writes most of it. In the future, approaches like pseudocode for concise description of changes or integration with other systems (e.g., Slack) might be possible, allowing AI to consider context. At the same time, Claude 3.5 Sonnet and newer versions have significantly improved the quality of generated code.

AI will not replace developers, but it will change their role. It already allows even non-experts (e.g., sales department employees) to create simple tools, and engineers to focus on architecture and UX. By 2027, almost 100% of code will be created with AI assistance, but understanding context will remain a key skill.

At WWDC, Apple showed more of Xcode 26 and its AI-assisted programming capabilities.

https://developer.apple.com/documentation/xcode/writing-code-with-intelligence-in-xcode
You can choose ChatGPT (for some reason they write exactly that, mentioning neither the model nor OpenAI) or add another model provider, either from the internet or running locally.

  • Code understanding: "What does this code do?" → Xcode provides a detailed answer.
  • Code generation and correction: ask it to add properties, create a list, change the interface, etc. Example: "Create a table with all properties of the object."
  • Automatic application of changes: enable "Automatically Apply Changes", or review proposed edits manually.
  • Error fixes: Xcode suggests fixes for compilation errors.

You can revert changes through "History", but a Git repository is required.

It looks like this:
https://www.youtube.com/watch?v=OV38tVwySE0

The functionality is quite similar to most "add ChatGPT" plugins for VS Code from 2023, but the visual design is certainly much better.

Cursor's marketing tribulations continue: they renamed version 0.51 to 1.0.0 so that "people understand" it's a real program, and posted a description on their YouTube channel.

https://www.cursor.com/changelog/1-0
They continue to roll out the background agent, but you need to disable "privacy mode" (!) to start sharing your code with Cursor and allow them to collect telemetry. Those who don't want this cannot use it yet.

They also added BugBot for GitHub which also works in the background. They reworked memory mode between chats again (Windsurf has been doing this for a long time), and finally added support for markdown tables and Mermaid diagrams.

Like most other AI coding tools, they have compiled an MCP catalog, which is at https://docs.cursor.com/tools - currently 8 verified ones.


Discussion on HN
https://news.ycombinator.com/item?id=44185256
A significant part of the discussion compares it with Claude Code from Anthropic. Many users, especially those who paid significant amounts for Cursor Pro (e.g., $100-$800/month), switched to Claude Code (on the $100 or $200/month plans) and report a noticeably better agent experience: fewer tool-call errors, less premature task completion, and fewer problems applying changes. Just a day before the discussion, Claude Code was added to Anthropic's $20/month Pro plan, making it significantly more accessible.

There is an opinion that the current prices for AI tools (including Cursor) are subsidized by VC money, and the companies are not yet profitable.

Advantages of Claude Code (according to some users):

  • Better agent performance, fewer errors. High productivity with parallel sessions.
  • Works well with the command line (e.g., can connect via SSH and execute commands, asking for permission).
  • Considered "smarter" due to a different system prompt and behavior compared to using the same Claude model through Cursor.

Disadvantages of Claude Code and Advantages of Cursor:

  • Claude Code: Can quickly "burn" tokens, sometimes makes strange errors. Lacks some Cursor features like "checkpoints" for rolling back changes (although there are workarounds).
  • Cursor: Users note the fast "Tab" autocompletion for small changes. The basic Pro plan ($20/month) is (was?) more affordable.

Other tools and approaches:

  • Aider: Mentioned as a more "precise tool," integrates better with git (makes commits, which Cursor/Claude Code don't do by default), more controlled.
  • Zed: Some users are switching to Zed due to better performance compared to Cursor.

https://www.youtube.com/@cursor_ai
Cursor probably ran out of user growth, because instead of making their product good, they created a YouTube channel and started talking about how good their product is.

Currently two videos. An announcement and a discussion about their model.

https://www.youtube.com/watch?v=sLaxGAL_Pl0

Key points:

  • The goal of Cursor is to create an AI assistant for developers that understands code better than a human.
  • Approach: They train models on huge amounts of data (including private repositories). They apply "curriculum learning" - from simple to complex.
  • Results: Cursor's models outperform Copilot and ChatGPT in code understanding tests. They can edit code, not just generate it (e.g., make changes based on instructions).
  • Features: "Code infilling" - predicting missing parts of the code. "Long-range dependencies" - understanding connections in large files.

Mistral Agents API
https://mistral.ai/news/agents-api

Mistral AI introduces the Agents API, a tool for building autonomous AI agents that perform actions (code execution, web search, image generation), maintain context between requests, and coordinate with each other.

An example provided is a Developer Assistant - integration with GitHub.

https://www.youtube.com/watch?v=1Tt9Fq1pUPQ

Factory Droids (bad name from a search perspective)
https://www.factory.ai/news/ga
Startup Factory announced the general availability of its Droids platform: yet another "world's first" set of autonomous agents for the full Software Development Life Cycle (SDLC).

They do everything that background AI coders currently do:

  1. Autonomous development — create production-ready features based on specs or requests. Automatically prioritize and assign tickets.
  2. Incident resolution — analyze alerts, find root causes, and fix bugs. Perform context-aware PR reviews.
  3. Deep code analysis — search for answers in the codebase, documentation, and the internet.

The interface can be seen in the video:
https://www.youtube.com/watch?v=GkFd3d8suLM

Costs $40+$10 per month.