CodeWithLLM-Updates
-

GitHub Copilot is not great at naming either: the name "agent" now also covers a cloud agent, released in response to the agent from OpenAI.

https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
The cloud Copilot agent can solve tasks in a repository automatically: assign it an issue (or several), and it will analyze the code, make changes, run the tests, and open a PR for review. Copilot works in the background in a secure cloud environment based on GitHub Actions. It is available only for Copilot Pro+ and Enterprise and consumes GitHub Actions minutes.

Apparently, GitHub Copilot for VS Code is not developing as quickly or as well as its competitors, so MS decided to open-source its code. The Grok 3 model was also added.

https://jules.google/
Google has also announced a similar cloud agent, Jules, but the website only offers a waitlist. Also, for some reason, the design looks like a pixel-art game.
UPD: During I/O, they announced beta access for users from the USA (5 total tasks per day)

https://docs.anthropic.com/en/docs/claude-code/sdk
Claude Code SDK. Anthropic announced an SDK for its console-based agentic coding tool. In fact, it doesn't look like a usual SDK, where we connect a product to our own code and interact with it. More precisely, it doesn't look like that yet: it says "The SDK currently support command line usage". That is, they have rather expanded the ways of interacting with it from the console.

OpenAI did it
https://openai.com/index/introducing-codex/

They presented a cloud-based software engineering agent called Codex, powered by Codex-1 (a specialized version of o3), which should not be confused with the 2021 Codex model or the Codex CLI agent programming tool released last month.

Seriously, I recently wrote that it's currently very important to solve the problem of orchestrating AI programming agents' tasks, and it seems from the video presentation that they have done just that. It's not yet available in the standard Plus plan, only in Pro ($200/month), so not everyone will be able to try it.

Codex handles small, well-defined tasks well, but according to user feedback, it so far struggles with follow-up requests in the chat. This means you first need to break the work down into a set of tasks that will not change afterward.

Codex is not intended for "vibe coding" and is best suited for experienced engineers working with stable repositories: adding features or fixing bugs. It has a simple interface, similar to the familiar ChatGPT, with a text field for describing the task and "Ask" and "Code" buttons.

https://www.youtube.com/watch?v=utujQfglbk8

There's a button similar to "play" that sends the task to the agent in the cloud in the background. It queues the task, then shows a detailed execution log. In the video presentation, it looks like a significant achievement for the field of AI programming agents.

By the way, Cursor also added a preview of the background agent feature for a limited number of users in the new version 0.50.

Amp available to everyone since May 15
https://ampcode.com/how-i-use-amp

Sourcegraph decided to take an interesting marketing approach. They already have an AI coding agent plugin for VS Code (Cody), positioned for business. Now they have created a separate website in a strange, informal, conversational style, and this is how they are selling the AI agent plugin they named Amp.

It has a manual, which already looks like a different site: https://ampcode.com/manual. There they write about principles, one of which is "No model selection, always the best models. You don't choose the models, we do". They currently use Claude 3.7 Sonnet with extended thinking, which is certainly good, but according to the leaderboards, the best is Gemini 2.5 Pro.

Currently, they give 1000 free credits (from my usage, that is about 700k tokens); after that, packages cost $5 for 500 credits.
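To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It only uses the figures above; the tokens-per-credit ratio is my own usage estimate, not an official rate, and the function names are made up for this sketch.

```python
# Figures taken from the post; the token estimate is a personal observation.
FREE_CREDITS = 1000
TOKENS_FOR_FREE_CREDITS = 700_000  # assumption: ~700k tokens per 1000 credits
PACKAGE_PRICE_USD = 5.0
PACKAGE_CREDITS = 500


def tokens_per_credit() -> float:
    """Approximate number of tokens one credit buys."""
    return TOKENS_FOR_FREE_CREDITS / FREE_CREDITS


def usd_per_credit() -> float:
    """Price of a single credit when bought in a package."""
    return PACKAGE_PRICE_USD / PACKAGE_CREDITS


def usd_per_million_tokens() -> float:
    """Approximate cost of one million tokens at the package price."""
    credits_needed = 1_000_000 / tokens_per_credit()
    return credits_needed * usd_per_credit()


if __name__ == "__main__":
    print(f"~{tokens_per_credit():.0f} tokens per credit")
    print(f"~${usd_per_million_tokens():.2f} per million tokens")
```

Under these assumptions one credit is about 700 tokens and costs one cent, so a million tokens comes out at roughly $14.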

The system instructions file here is AGENT.md. It's unclear when we will all agree on one name; for now, repositories will hold 10 copies, one for each AI agent.

Based on my observations, at the end of 2024 few people took Codeium Windsurf seriously.

Here's a Hacker News thread from 70 days ago comparing Windsurf and Cursor, which didn't attract much engagement: https://news.ycombinator.com/item?id=43288745. Cursor is mentioned as one of the first AI IDEs users tried; it's well-configured and 'just works'. Windsurf's positives include a free autocompletion feature and greater versatility. GitHub Copilot lags behind Cursor and Windsurf in functionality.

When the topic of vibe coding came up, Windsurf, being a simpler system compared to Cursor, started attracting more users. Subsequently, they rebranded, and the company's focus improved. News about a potential acquisition by OpenAI has been circulating for several weeks, further increasing interest.

In a new comparison poll on Hacker News, https://news.ycombinator.com/item?id=43959710, significantly more people participated.

People note that the AI IDE market is changing rapidly. Developers are constantly releasing new features, and tools borrow ideas from each other. This leads to the 'leader' often changing.

Discussion on "Agentic / Vibe Coding":

  • people see the potential in "agentic mode" for automating routine tasks (e.g., adding types, creating boilerplate), but emphasize the need for careful review of generated code.
  • there's a significant range of opinions on the effectiveness and safety of "agentic / vibe coding" where the AI independently makes changes across any files in the repository.
  • some experienced developers believe that AI helps non-experts more, while for experienced users, it's more like 'smarter autocompletion'.

Cursor Pros:

  • excellent autocompletion ("tab-complete"), better than competitors
  • the Cmd-K feature (inline editing) made the IDE widely known and continues to be liked by users
  • clear pricing ($20 per month) which is quite cheap for access to the best models

Cursor Cons:

  • context is limited to save costs: the system tries to use as few tokens as possible
  • the "Agent mode" is quite imperfect and too eager to "jump" ahead

Windsurf Pros:

  • repository code awareness seems better
  • feels faster in some aspects

Windsurf Cons:

  • problems with large files and a similar context limitation, where only a small piece of code is sent to the model
  • the interface is more suited for vibe coding, making it harder to work "manually"
  • pricing: some find it more expensive than Cursor in agent mode, because with active use, on top of the subscription of up to $15 per month, you need to buy credit packages at $10 per 250 credits

Thread participants express positive feedback about Zed as a fast, efficient, and 'uncluttered' editor. But AI autocompletion and 'intelligence' in Zed are not yet at Cursor's level. Additionally, it doesn't support Windows.

They are also compared with Aider, Cline, GitHub Copilot, JetBrains IDEs (IntelliJ, PyCharm, Rider, etc.). Quite a few other AI tools are also mentioned: Claude Code (very expensive), Amazon Q (good for AWS), Machtiani, Brokk (an Aider alternative), Repomix, Void (an open-source Cursor alternative), Nonbios.ai, Amp.

Many participants recommend trying multiple tools, as the situation is changing rapidly, and what works today may change tomorrow.

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Google DeepMind AlphaEvolve
An AI agent for algorithm design, available to academic researchers. It is based on Gemini (a combination of Flash and Pro) and combines the creativity of large language models (LLMs) with automated evaluators that use metrics to discover and optimize algorithms. It uses an evolutionary approach to improve the best ideas.
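The general shape of such an evolutionary loop can be shown with a toy sketch. This is not AlphaEvolve's implementation: here the "candidates" are just two coefficients, the proposal step is a random perturbation standing in for an LLM, and the evaluator is a simple error metric. All names and parameters are made up for illustration.

```python
import random


def evaluate(candidate):
    """Automated evaluator (higher is better).

    Scores how closely coefficients (a, b) reproduce y = 3x + 2
    on a fixed set of sample points.
    """
    a, b = candidate
    error = sum((a * x + b - (3 * x + 2)) ** 2 for x in range(-5, 6))
    return -error


def propose(parent, rng):
    """Stand-in for the LLM: randomly perturbs a promising parent."""
    return tuple(value + rng.gauss(0, 0.5) for value in parent)


def evolve(generations=200, population_size=20, survivors=5, seed=0):
    """Keeps the best candidates each round and derives new ones from them."""
    rng = random.Random(seed)
    population = [
        (rng.uniform(-10, 10), rng.uniform(-10, 10))
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        best = population[:survivors]  # survivors are kept unchanged
        children = [
            propose(rng.choice(best), rng)
            for _ in range(population_size - survivors)
        ]
        population = best + children
    return max(population, key=evaluate)


if __name__ == "__main__":
    winner = evolve()
    print("best candidate:", winner, "score:", evaluate(winner))
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next; the evaluator alone decides what counts as progress.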

Where is it already used?

1. Google Data Center Optimization 🖥️

  • AlphaEvolve found a more efficient algorithm for resource allocation in Borg (Google's data center management system).
  • Result: on average, 0.7% of Google's global computing resources are recovered and can be used for other work.

2. Hardware Design 💻

  • Optimized matrix multiplications in TPUs (Google's specialized chips for AI).
  • Accelerated the operation of arithmetic circuits while maintaining correctness.

3. Accelerating AI Training ⚡

  • Reduced Gemini training time by 1% by optimizing matrix operations.
  • Accelerated FlashAttention (a core algorithm for transformers) by 32.5%.

It also improved on Strassen's algorithm (1969) for multiplying 4x4 matrices, reducing the number of operations, and improved the best known solutions for 20% of the open problems it was tested on in mathematical analysis, geometry, and combinatorics.

Interestingly, AlphaEvolve was used to optimize components involved in training the Gemini models themselves. This raises questions about the potential for recursive AI self-improvement and the approach towards a "singularity".

It seems that working with Claude Code, Cursor, and others has become largely repetitive. The workflow usually involves planning the task (a roadmap file), then giving commands to the agent to implement the plan in code.

Thus, task orchestration is the next necessary thing for every agentic AI solution.

I have already mentioned https://www.task-master.dev/, which is currently a popular solution thanks to its MCP support.


aider
https://aider.chat/docs/scripting.html
aider natively supports simple scripting from the terminal for repetitive actions. There is also a Python API for scripting, but it is not officially supported or documented.
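As a minimal sketch of the terminal route, the snippet below assembles a one-shot aider invocation using the `--message` and `--yes` options described on the scripting page. The helper names (`build_aider_command`, `run_aider`) are my own, and actually running the command assumes aider is installed and an API key is configured.

```python
import shlex
import subprocess


def build_aider_command(message, files, auto_confirm=True):
    """Builds the argument list for a single non-interactive aider run."""
    cmd = ["aider", "--message", message]
    if auto_confirm:
        cmd.append("--yes")  # answer "yes" to every confirmation prompt
    cmd.extend(files)
    return cmd


def run_aider(message, files):
    """Runs aider once; needs aider on PATH and a configured API key."""
    return subprocess.run(build_aider_command(message, files), check=True)


if __name__ == "__main__":
    # Only prints the command, so it is safe to try without aider installed.
    command = build_aider_command("add docstrings to all functions", ["app.py"])
    print(shlex.join(command))
```

Looping `run_aider` over a list of files or instructions is then enough to repeat the same action across a project.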

Roo Code | Boomerang Orchestrator (since ver 3.14.3)
https://docs.roocode.com/features/boomerang-tasks
They added "🪃 Orchestrator" as a built-in mode. It allows breaking down complex projects into smaller, manageable parts. Each sub-task is then executed in its own context, often using a different mode tailored for that specific task.
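The idea of per-task contexts can be sketched roughly as follows. This is not Roo Code's implementation: the mode names are illustrative, and `fake_agent` is a hypothetical stand-in for a real LLM call.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    description: str
    mode: str  # illustrative mode name, e.g. "architect", "code", "debug"


def fake_agent(mode, context):
    """Hypothetical stand-in for an LLM agent working in the given mode."""
    return f"[{mode}] done: {context[-1]}"


def run_subtask(subtask, agent=fake_agent):
    """Runs one subtask in a fresh context that shares nothing with siblings."""
    context = [subtask.description]
    return agent(subtask.mode, context)


def orchestrate(subtasks, agent=fake_agent):
    """Parent task: delegates each part and keeps only the returned summaries."""
    return [run_subtask(subtask, agent) for subtask in subtasks]


if __name__ == "__main__":
    plan = [
        Subtask("design the database schema", "architect"),
        Subtask("implement the booking endpoint", "code"),
        Subtask("fix the failing date-parsing test", "debug"),
    ]
    for summary in orchestrate(plan):
        print(summary)
```

The point of the design is that the parent only sees short summaries, so its own context stays small no matter how much each sub-task reads or writes.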


Code Claude Code
https://github.com/RVCA212/codesys
A project developing a Python SDK for interacting with the Claude CLI tool. The most effective way to use it is by mimicking your actual workflow. Supports resuming specific conversations by ID.

Cloud Code SDK
https://cloudcoding.ai/
A programmable AI Coder SDK in Python that runs both locally and in a cloud sandbox. You can think of it as a way to interact with Cursor or Claude Code at a low level, with a lot of control. But instead of using these applications, the project uses its own agent, which can modify code and use its own built-in tools. It currently supports only OpenAI and Anthropic models and works with or without Git repositories.

GitHub has posted a large tutorial on the new GitHub Copilot.

https://www.youtube.com/watch?v=0Oz-WQi51aU

Three Modes (now similar to Cursor 😉):

  • Ask Mode 💬 – for discussing changes and getting answers.
  • Edit Mode ✏️ – for precise edits and refactoring.
  • Agent Mode 🤖 – automated task execution (e.g., code generation from README).

Example: Creating a hotel booking application using different models (Claude 3.5, Gemini 2.5 Pro, GPT-4).

🔧 Working Techniques

Structured README file 📄: A clear description of the project, tech stack, and file structure helps the agent generate code more accurately.

Copilot Instructions 📌: A file with global guidelines (e.g., code style requirements, security, logs).

Visual Prompting 🖼️: Some models support uploading screenshots for UI analysis.

🛠️ Problem Solving

  • Browser Caching: Copilot can suggest clearing the cache or a fix for templates.
  • Testing: Automated test generation (e.g., for Flask endpoints) using the /test command.
  • Documentation: Updating the README file via Gemini 2.5 Pro with Mermaid diagrams.

🚀 Tips

Claude 3.5 – balances speed and quality.
Gemini 2.5 Pro – powerful documentation generation.
GPT-4 – for complex tasks with context.

Security: Always ask Copilot for a code audit (e.g., How can I make this app more secure?).

Windsurf is in talks to be acquired by OpenAI for about $3 billion.

Apple and Anthropic are teaming up to build a “vibe-coding” software platform that will use generative AI to write, edit, and test code for programmers.

https://developers.googleblog.com/en/gemini-2-5-pro-io-improved-coding-performance/

Google released Gemini 2.5 Pro Preview (I/O edition). This update features even stronger coding capabilities. Expect meaningful improvements for front-end and UI development, alongside improvements in fundamental coding tasks such as transforming and editing code, and creating sophisticated agentic workflows.

https://windsurf.com/
https://lovable.dev/

Windsurf and Lovable have improved the design of their products and pricing strategy.

Windsurf has a new logo and more transparent use of "credits" for the AI chat. The free tier now has new, higher limits, plus unlimited Fast Tab and Cascade Base.

https://lovable.dev/blog/lovable-2-0

Lovable 2.0 introduces key innovations: switching the agent to chat mode for better understanding and planning, workspaces for collaborative development, and a security scanning function to detect vulnerabilities.

In addition to major functional updates, Lovable 2.0 has updated its brand and interface, added the ability to visually edit styles, and simplified the process of connecting custom domains.

Changes in pricing plans, which now include Pro and Teams, are aimed at better meeting the needs of both individual developers and teams.