CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

GitHub Copilot is also not great with naming: the name "agent" now also covers a new cloud agent, released in response to the agent from OpenAI.

https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/
The cloud Copilot agent can automatically solve tasks in the repository: assign an issue (or several) — it will analyze the code, make changes, test with tests, and send a PR for review. Copilot works in the background, using a secure cloud environment (based on GitHub Actions). Available only for Copilot Pro+ and Enterprise, consumes GitHub Actions minutes.

Apparently, GitHub Copilot for VSCode is not developing as quickly or as well as its competitors, so Microsoft decided to open-source its code. The Grok 3 model was also added.

https://jules.google/
Google has announced a similar cloud agent, Jules, but the website only offers a waitlist. For some reason, the site is designed like a pixel game.
UPD: During I/O, they announced beta access for users from the USA (5 total tasks per day)

https://docs.anthropic.com/en/docs/claude-code/sdk
Claude Code SDK. Anthropic announced an SDK for their console-based agentic programming system. So far it doesn't look like a typical SDK that lets you embed the product in your own code and interact with it programmatically — the docs say "The SDK currently support command line usage". In other words, for now they have mostly expanded the ways you can interact with it from the console.
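As a rough sketch of what that console-level "SDK" usage looks like: Claude Code has a non-interactive print mode (`claude -p`) that can emit JSON, so you can drive it from a script. The exact flags here (`-p`, `--output-format json`) are taken from Anthropic's CLI docs as I understand them — verify against your installed version.

```python
import json
import subprocess

def build_claude_cmd(prompt: str, output_format: str = "json") -> list[str]:
    # Non-interactive ("print") mode: claude -p "<prompt>" --output-format json
    return ["claude", "-p", prompt, "--output-format", output_format]

def run_claude(prompt: str) -> dict:
    """Run Claude Code headlessly and parse its JSON result."""
    out = subprocess.run(build_claude_cmd(prompt),
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

if __name__ == "__main__":
    # Prints the command that would be executed, without running the agent.
    print(build_claude_cmd("Explain what main.py does"))
```

The point is that "scripting" here means shelling out to the CLI, not calling a library API.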

OpenAI did it
https://openai.com/index/introducing-codex/

They presented a cloud-based software engineering agent called Codex, powered by Codex-1 (a specialized version of o3), which should not be confused with the 2021 Codex model or the Codex CLI agent programming tool released last month.

Seriously, I recently wrote that it's currently very important to solve the problem of orchestrating AI programming agents' tasks, and it seems from the video presentation that they have done just that. It's not yet available in the standard Plus plan, only in Pro ($200/month), so not everyone will be able to try it.

Codex handles small, well-defined tasks well, but according to user feedback, it still struggles with follow-up requests in the chat. This means you need to break the work down up front into a set of tasks that will not change afterward.

Codex is not intended for "vibe coding" and is best suited for experienced engineers working with stable repositories: adding features or fixing bugs. It has a simple interface, similar to the familiar ChatGPT, with a text field for describing the task and "Ask" and "Code" buttons.

https://www.youtube.com/watch?v=utujQfglbk8

There's a button similar to "play" that sends the task to the agent in the cloud in the background. It queues the task, then shows a detailed execution log. In the video presentation, it looks like a significant achievement for the field of AI programming agents.

By the way, Cursor also added a preview of the background agent feature for a limited number of users in the new version 0.50.

Amp available to everyone since May 15
https://ampcode.com/how-i-use-amp

Sourcegraph decided to take an interesting marketing approach. They already have a business-oriented AI coding agent plugin for VS Code (Cody); now they have created a separate website in an oddly informal, conversational style, and are using it to sell a new AI agent plugin named Amp.

Amp has a manual, which looks like yet another separate site: https://ampcode.com/manual. There they describe their principles, one of which is "No model selection, always the best models. You don't choose the models, we do." They currently use Claude 3.7 Sonnet with extended thinking, which is certainly good, although according to the leaderboards the best model is Gemini 2.5 Pro.

Currently, they give 1000 free credits (from my usage, it's about 700k tokens), then packages are $5 for 500.

The system instructions file is named AGENT.md. It's unclear when we will all agree on a single name; for now, repositories will end up with ten copies of this file, one per AI agent.

Based on my observations, by the end of 2024, few people took Codeium Windsurf seriously.

Here's a Hacker News thread from 70 days ago comparing Windsurf and Cursor, which didn't attract much engagement: https://news.ycombinator.com/item?id=43288745. Cursor is mentioned as one of the first AI IDEs users tried; it's well-configured and "just works". Windsurf's positives include a free autocompletion feature and greater versatility. GitHub Copilot lags behind both in functionality.

When vibe coding became a hot topic, Windsurf, being a simpler system than Cursor, started attracting more users. They subsequently rebranded, and the company's focus sharpened. News about a potential acquisition by OpenAI has been circulating for several weeks, further increasing interest.

In a new comparison poll on Hacker News https://news.ycombinator.com/item?id=43959710, significantly more people participated.

People note that the AI IDE market is changing rapidly. Developers are constantly releasing new features, and tools borrow ideas from each other. This leads to the 'leader' often changing.

Discussion on "Agentic / Vibe Coding":

  • people see the potential in "agentic mode" for automating routine tasks (e.g., adding types, creating boilerplate), but emphasize the need for careful review of generated code.
  • there's a significant range of opinions on the effectiveness and safety of "agentic / vibe coding" where the AI independently makes changes across any files in the repository.
  • some experienced developers believe that AI helps non-experts more, while for experienced users, it's more like 'smarter autocompletion'.

Cursor Pros:

  • excellent autocompletion ("tab-complete"), better than competitors
  • the Cmd-K feature (inline editing) made the IDE widely known and continues to be liked by users
  • clear pricing ($20 per month) which is quite cheap for access to the best models

Cursor Cons:

  • context is limited in Cursor to save costs — the system tries to use as few tokens as possible
  • the "Agent mode" is quite imperfect and tends to rush ahead

Windsurf Pros:

  • repository code awareness seems better
  • feels faster in some aspects

Windsurf Cons:

  • problems with large files and similar context limitation where only a small piece of code is sent to the model
  • the interface is more suited for vibe coding, making it harder to work "manually"
  • pricing — some find it more expensive than Cursor in agent mode: with active use, on top of the up-to-$15 monthly plan, you need to buy extra credit packages at $10 per 250

Thread participants express positive feedback about Zed as a fast, efficient, and 'uncluttered' editor. But AI autocompletion and 'intelligence' in Zed are not yet at Cursor's level. Additionally, it doesn't support Windows.

They are also compared with Aider, Cline, GitHub Copilot, JetBrains IDEs (IntelliJ, PyCharm, Rider, etc.). Quite a few other AI tools are also mentioned: Claude Code (very expensive), Amazon Q (good for AWS), Machtiani, Brokk (an Aider alternative), Repomix, Void (an open-source Cursor alternative), Nonbios.ai, Amp.

Many participants recommend trying multiple tools, as the situation is changing rapidly, and what works today may change tomorrow.

https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Google DeepMind AlphaEvolve
Available to academic researchers, AlphaEvolve is an AI agent for algorithm design based on Gemini (a combination of Flash and Pro). It combines the creativity of large language models (LLMs) with automated evaluators that score candidates against metrics, discovering and optimizing algorithms through an evolutionary approach that iterates on the best ideas.
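The evolutionary loop can be illustrated with a deliberately toy sketch: a population of candidates, an automated evaluator that scores them, and a mutation step (which in AlphaEvolve is an LLM proposing code changes — here just random perturbation of two numbers). None of this is the actual system; it only shows the select-mutate-evaluate cycle.

```python
import random

def evaluate(params):
    # Automated evaluator: a toy metric to maximize.
    # (AlphaEvolve scores real candidate programs against real metrics.)
    x, y = params
    return -(x - 3) ** 2 - (y + 1) ** 2  # best possible score is 0, at (3, -1)

def mutate(params, scale=0.5):
    # Stand-in for the LLM's role: propose a modified candidate.
    return tuple(p + random.uniform(-scale, scale) for p in params)

def evolve(pop_size=20, generations=200):
    random.seed(0)
    population = [(random.uniform(-10, 10), random.uniform(-10, 10))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best half, refill with mutated copies of survivors.
        population.sort(key=evaluate, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=evaluate)

best = evolve()  # converges near (3, -1)
```

The interesting part in the real system is that the evaluator is fully automated, so the loop can run at scale without human grading.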

Where is it already used?

1. Google Data Center Optimization 🖥️

  • AlphaEvolve found a more efficient algorithm for resource allocation in Borg (Google's data center management system).
  • Result: +0.7% of Google's global computing resources are now used more efficiently.

2. Hardware Design 💻

  • Optimized matrix multiplications in TPUs (Google's specialized chips for AI).
  • Accelerated the operation of arithmetic circuits while maintaining correctness.

3. Accelerating AI Training ⚡

  • Reduced Gemini training time by 1% by optimizing matrix operations.
  • Accelerated FlashAttention (a core algorithm for transformers) by 32.5%.

It improved on Strassen's algorithm (1969) for multiplying 4x4 matrices, reducing the number of scalar multiplications from 49 to 48 (for complex-valued matrices). It also improved the best known solutions for 20% of the open problems it was given in mathematical analysis, geometry, and combinatorics.
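For context on what was improved: Strassen's classic trick multiplies two 2x2 matrices with 7 scalar multiplications instead of the naive 8; applying it recursively to 4x4 matrices gives 7×7 = 49 multiplications, which AlphaEvolve beat with 48. Here is the standard 2x2 identity checked against the naive product (this is the well-known 1969 scheme, not AlphaEvolve's new one):

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices with 7 scalar multiplications
    (Strassen, 1969) instead of the naive 8."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(a, b):
    # Standard definition: 8 multiplications.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Shaving even one multiplication off such a scheme matters because the saving compounds at every level of the recursion.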

Interestingly, AlphaEvolve was used to optimize components involved in training the Gemini models themselves. This raises questions about the potential for recursive AI self-improvement and the approach towards a "singularity".

It seems that using Claude Code, Cursor, and the others has settled into a largely repetitive workflow: plan the task (a roadmap file), then give the agent commands to implement the plan in code.

Thus, task orchestration is the next necessary thing for every agentic AI solution.

I have already mentioned https://www.task-master.dev/, which is currently a popular solution due to MCP.


aider
https://aider.chat/docs/scripting.html
aider natively supports simple scripting from the terminal for performing repetitive actions. There is also a Python scripting API, but it is not officially supported or documented.
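A minimal sketch of the terminal-scripting idea: invoke aider once per file with the same instruction. The `--message` and `--yes` flags are taken from aider's scripting docs as I understand them — check `aider --help` for your version before relying on them.

```python
import subprocess

def build_aider_cmd(message: str, files: list[str]) -> list[str]:
    # One-shot, non-interactive run: aider applies the change and exits.
    # --yes auto-confirms prompts (flag name per the aider scripting docs).
    return ["aider", "--message", message, "--yes", *files]

def apply_to_all(message: str, files: list[str]) -> None:
    """Run the same edit instruction over many files, one at a time."""
    for f in files:
        subprocess.run(build_aider_cmd(message, [f]), check=True)

if __name__ == "__main__":
    print(build_aider_cmd("add type hints", ["app.py"]))
```

This is exactly the kind of repetitive batch work the scripting mode is meant for.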

Roo Code | Boomerang Orchestrator (since ver 3.14.3)
https://docs.roocode.com/features/boomerang-tasks
They added "🪃 Orchestrator" as a built-in mode. It allows breaking down complex projects into smaller, manageable parts. Each sub-task is then executed in its own context, often using a different mode tailored for that specific task.


Code Claude Code
https://github.com/RVCA212/codesys
A project developing a Python SDK for interacting with the Claude CLI tool. The most effective way to use it is by mimicking your actual workflow. Supports resuming specific conversations by ID.

Cloud Code SDK
https://cloudcoding.ai/
A programmable AI Coder SDK in Python - both locally and in a Sandbox cloud. You can think of it as a way to interact with Cursor or Claude code, at a low level with great control. But instead of using these applications, the project uses its own agent that can modify code and use its own built-in tools. Currently supports only OpenAI and Anthropic models. Works with or without Git repositories.

GitHub has posted a large tutorial on the new GitHub Copilot.

https://www.youtube.com/watch?v=0Oz-WQi51aU

Three Modes (now similar to Cursor 😉):

  • Ask Mode 💬 – for discussing changes and getting answers.
  • Edit Mode ✏️ – for precise edits and refactoring.
  • Agent Mode 🤖 – automated task execution (e.g., code generation from README).

Example: Creating a hotel booking application using different models (Claude 3.5, Gemini 2.5 Pro, GPT-4).

🔧 Working Techniques

Structured README file 📄: A clear description of the project, tech stack, and file structure helps the agent generate code more accurately.

Copilot Instructions 📌: A file with global guidelines (e.g., code style requirements, security, logs).

Visual Prompting 🖼️: Some models support uploading screenshots for UI analysis.

🛠️ Problem Solving

  • Browser Caching: Copilot can suggest clearing the cache or a fix for templates.
  • Testing: Automated test generation (e.g., for Flask endpoints) using the /test command.
  • Documentation: Updating the README file via Gemini 2.5 Pro with Mermaid diagrams.

🚀 Tips

Claude 3.5 – balances speed and quality.
Gemini 2.5 Pro – powerful documentation generation.
GPT-4 – for complex tasks with context.

Security: Always ask Copilot for a code audit (e.g., How can I make this app more secure?).

Windsurf is in talks to be acquired by OpenAI for about $3 billion.

Apple and Anthropic are teaming up to build a “vibe-coding” software platform that will use generative AI to write, edit, and test code for programmers.

https://developers.googleblog.com/en/gemini-2-5-pro-io-improved-coding-performance/

Google released Gemini 2.5 Pro Preview (I/O edition). This update features even stronger coding capabilities: expect meaningful improvements in front-end and UI development, alongside improvements in fundamental coding tasks such as transforming and editing code, and creating sophisticated agentic workflows.

https://windsurf.com/
https://lovable.dev/

Windsurf and Lovable have improved the design of their products and pricing strategy.

Windsurf has a new logo and more transparent use of "credits" for the AI chat. The free tier now has new, higher limits, plus unlimited Fast Tab and Cascade Base.

https://lovable.dev/blog/lovable-2-0

Lovable 2.0 introduces key innovations: switching the agent to chat mode for better understanding and planning, workspaces for collaborative development, and a security scanning function to detect vulnerabilities.

In addition to major functional updates, Lovable 2.0 has updated its brand and interface, added the ability to visually edit styles, and simplified the process of connecting custom domains.

Changes in pricing plans, which now include Pro and Teams, are aimed at better meeting the needs of both individual developers and teams.

https://docs.cursor.com/guides/advanced/large-codebases

Cursor developers shared tips and techniques for effectively working with large and complex codebases.

They highlighted key aspects that help in navigating unfamiliar code faster. Key recommendations include:

  • Using Chat for Code Understanding: Via the chat mode, you can quickly get explanations on how certain parts of the code work. It is also recommended to enable the "Include Project Structure" feature for better understanding of the project structure.
  • Writing Rules: Creating rules allows emphasizing important project information and ensures better understanding for the Cursor agent.
  • Detailed Planning of Changes: For large tasks, it's worth spending time creating an accurate and well-structured plan of action steps.
  • Choosing the Right Tool: Cursor offers various tools (Tab, Cmd K, Chat), each with its advantages for specific tasks – from quick fixes to large-scale changes across multiple files.

They emphasize the importance of breaking down large tasks into smaller parts, including relevant context, and frequently creating new chats to maintain focus.

https://memex.tech/blog/introducing-memex-the-everything-builder-for-your-computer

Memex has officially announced the launch of its platform, which allows you to create any software, from web applications to 3D designs. It is worth noting that they chose a rather unfortunate name: first, "memex" is inventor Vannevar Bush's term for his hypothetical knowledge device, and second, many existing projects already use the name.

Memex is positioned as "The Everything Builder" for the computer. The platform supports any technology stack and programming language. Memex runs on Windows/Mac/Linux (built on the Tauri framework) and allows everyone, regardless of technical experience, to explore, build, and deploy software by talking to AI.

The agent uses Claude models - a combination of Sonnet 3.7 + Haiku, and has access to the Internet. Creates checkpoints via built-in shadow git. Plans to support Gemini 2.5 and MCP.

https://www.byterover.dev/

ByteRover implements code memory and related functions as an MCP server. Using it (or a similar project), you can switch between Cursor, Windsurf, Cline/Roo, and other MCP-enabled coding agents, and each will know what has already been done. The free plan covers 1k records/month.

Downside: it uses their cloud — the data is not stored locally, but with a company you have to trust.

https://www.youtube.com/watch?v=9sPsraoe0_c

https://github.com/github/github-mcp-server

GitHub launched their official MCP server.

https://www.youtube.com/watch?v=d3QpQO6Paeg


https://modelcontextprotocol.io/

The Model Context Protocol (MCP) was introduced by Anthropic on November 24, 2024, as an open standard for connecting AI systems to data sources. The first connectors released were for GitHub, Google Drive, and Slack.
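Under the hood, MCP is built on JSON-RPC 2.0, so the wire messages are plain JSON objects. A minimal sketch of the opening handshake a client sends to a server — the field names follow the MCP specification as I recall it, and the version string is illustrative:

```python
import json

def jsonrpc_request(req_id: int, method: str, params: dict) -> str:
    # MCP messages are ordinary JSON-RPC 2.0 request objects.
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params})

# The "initialize" handshake that starts an MCP session.
init = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",   # illustrative version string
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1"},
})
```

In practice you would use an official SDK rather than hand-rolling messages, but it helps to know there's nothing exotic underneath.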

By February 2025, the developer community had created over 1000 open MCP connectors, demonstrating significant ecosystem growth and interest in the protocol. Support for MCP also gradually appeared in all major AI programming applications/extensions, including Cline/Roo, Cursor, Windsurf, and Continue.

Through MCP, you can work with Postgres, Upstash, and Slack directly in the code editor. Browsertools MCP provides access to the browser console for debugging. And https://context7.com/ provides up-to-date documentation for AI code editors.

A significant step was OpenAI's announcement on March 26, 2025, of support for MCP. Soon after, at Google Next 2025, Google announced MCP support in the SDK for their Gemini models (though they also introduced the A2A protocol). Thus, the protocol is gradually becoming universal.


Organization and Ecosystem. Following the initial repository (https://github.com/modelcontextprotocol/servers), third-party online catalogs began to emerge (such as https://opentools.com/ https://mcp.so/ https://mcpserverdirectory.org/, etc.), where you can find the necessary server. Projects for MCP managers are appearing that simplify installation, for example https://mcp-get.com/ https://mcpm.sh/ https://mcpmanager.app/ https://mcpmcp.io/, etc.

There are projects that help convert a standard REST API to MCP - for example https://rapid-mcp.com/ https://api200.co/mcp.

The problem with open catalogs is the unclear reliability of the hosted servers.

Security. Since an MCP server acts as an intermediary between the model and the data source, a malicious actor who sets up a server can log everything, including API access keys to the data. Authentication and authorization are not yet standardized within MCP.

Servers are divided into official and community types. Obviously, official servers are not intermediaries, and requests to them are analogous to requests to API endpoints. Community servers, set up by third parties, should be treated with caution, and it's worth checking who is behind them. You can also set up your own server in the cloud (for example, a weather server on AWS Lambda) or in a container via mcp-containers.

The more the protocol spreads, the more official servers will appear, as was the case with REST API.

Claude Code, OpenAI Codex, and Aider are agents that work with the console.

https://github.com/coder/agentapi
The AgentAPI project allows managing such systems via HTTP API (GET and POST). This allows, for example, launching multiple systems and "talking" to them through one chat, or creating an MCP so that one agent system can task another.
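A hedged sketch of talking to an agent through AgentAPI from Python. The default port (3284) and the request body shape (`{"content", "type"}`) are assumptions based on the project's README — confirm them against your installed version before use.

```python
import json
import urllib.request

BASE = "http://localhost:3284"  # AgentAPI's default port, per its README

def build_message(content: str) -> bytes:
    # Body shape is an assumption from the AgentAPI docs;
    # adjust if the schema differs in your version.
    return json.dumps({"content": content, "type": "user"}).encode()

def send_message(content: str) -> dict:
    """POST a chat message to the agent running behind AgentAPI."""
    req = urllib.request.Request(
        f"{BASE}/message", data=build_message(content),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because it's just HTTP, one agent can easily be wired to task another, or several can be multiplexed behind a single chat UI.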

https://github.com/eyaltoledano/claude-task-master
For clear management of development steps, you can use this project and connect it as an MCP.

https://www.anthropic.com/engineering/claude-code-best-practices
For Claude Code, it turns out that there is a command word "ultrathink", which you can read about in a fairly detailed document they posted on the site.

"We recommend using the word "think" to activate an extended reasoning mode that gives Claude additional compute time to more thoroughly evaluate alternatives. These specific phrasings map directly onto increasing levels of compute budget in the system:
"think" < "think hard" < "think harder" < "ultrathink".
Each level allocates more and more compute budget for Claude to use."

Other recommendations:

  • configure the context via system instructions (here, the CLAUDE.md file): code standards, commands, etc.
  • use .allowed-tools to allow frequently used tools. Configure secure MCPs
  • plan and add tests (via TDD) before generating code
  • have the agent make regular commits
  • explain to the agent specifically and thoroughly. The more specific the request, the better the result.
  • use less automatic (auto-accept) mode: monitor what the agent outputs and correct it as early as possible (Escape key to stop) if it chooses the wrong path
  • Advanced level — run two agents: one writes code, the other checks.

Varun Mohan, co-founder and CEO of Codeium, now Windsurf, shares the company's story, discusses two key pivots, hiring philosophy, the impact of AI on the engineering profession, corporate market entry strategy, and demonstrates Windsurf capabilities.

https://www.youtube.com/watch?v=5Z0RCxDZdrE

First pivot (2022): With the emergence of ChatGPT, the team shifted its focus to AI-powered coding, creating a free plugin for code autocompletion (supporting VSCode, JetBrains, etc.). Second pivot → Windsurf: VSCode API limitations forced them to fork the IDE and create an AI-native environment with advanced features (e.g., visual editing).

New paradigm: AI writes >90% of the code → the developer focuses on review and architecture. For non-developers: creating simple applications without deep knowledge.

AI model usage strategy - hybrid approach: Frontier models (e.g., Sonnet) for high-level tasks. Own models for code retrieval and editing.

The conversation highlights how quickly the development landscape is changing thanks to AI. Windsurf is actively shaping this future, not afraid of radical pivots and betting on a deep understanding of code and "agentic" AI capabilities, not just autocompletion.

The possibility of OpenAI acquiring Windsurf is currently being actively discussed in the news.

https://github.com/openai/codex

OpenAI finally responded to Claude Code and released their version of an agent for programming that works through the terminal and can create and edit code files. The project is open source.

Also, like Claude Code, it officially supports only macOS and Linux. Windows support is available through WSL.

They named it Codex, which may now be confusing, as one of the first models for programming (from 2021), on which GitHub Copilot started working, had the same name.

It is installed simply as a global npm package: npm install -g @openai/codex. There are three Approval Modes — the default is Suggest (read-only), but it can also be set to Auto Edit or Full Auto (with command execution in the terminal).

https://www.youtube.com/watch?v=FUq9qRwrDrI

Announced along with the thinking models o3 and o4-mini, which were finally given the ability to use tools. By default, Codex uses o4-mini, but you can specify any model available in the Responses API.

All file operations and command executions occur locally - the request, context, and diff summaries are sent to the model on the server for generation.