CodeWithLLM-Updates
-
🤖 AI tools for smarter coding: practical examples, step-by-step instructions, and real-world LLM applications. Learn to work efficiently with modern code assistants.

Varun Mohan, co-founder and CEO of Codeium, now Windsurf, shares the company's story, discusses two key pivots, hiring philosophy, the impact of AI on the engineering profession, corporate market entry strategy, and demonstrates Windsurf capabilities.

https://www.youtube.com/watch?v=5Z0RCxDZdrE

First pivot (2022): With the emergence of ChatGPT, the team shifted its focus to AI-powered coding, creating a free plugin for code autocompletion (supporting VSCode, JetBrains, etc.). Second pivot → Windsurf: VSCode API limitations forced them to fork the IDE and create an AI-native environment with advanced features (e.g., visual editing).

New paradigm: AI writes >90% of the code → the developer focuses on review and architecture. For non-developers: creating simple applications without deep knowledge.

AI model usage strategy - hybrid approach: Frontier models (e.g., Sonnet) for high-level tasks. Own models for code retrieval and editing.

The conversation highlights how quickly the development landscape is changing thanks to AI. Windsurf is actively shaping this future, not afraid of radical pivots and betting on a deep understanding of code and "agentic" AI capabilities, not just autocompletion.

The possibility of OpenAI acquiring Windsurf is currently being actively discussed in the news.

https://github.com/openai/codex

OpenAI finally responded to Claude Code and released their version of an agent for programming that works through the terminal and can create and edit code files. The project is open source.

Also, like Claude Code, it officially supports only macOS and Linux. Windows support is available through WSL.

They named it Codex, which may now be confusing, as one of the first models for programming (from 2021), on which GitHub Copilot started working, had the same name.

It is installed simply as a global package npm install -g @openai/codex. There are three Approval Modes - by default, it's Suggest (read-only), but it can also be set to editing and full auto (with command execution in the terminal).

https://www.youtube.com/watch?v=FUq9qRwrDrI

Announced along with the thinking models o3 and o4-mini, which were finally given the ability to use tools. By default, Codex uses o4-mini, but you can specify any model available in the Responses API.

All file operations and command executions occur locally - the request, context, and diff summaries are sent to the model on the server for generation.

https://openai.com/index/gpt-4-1/

New model update from OpenAI is a response to new Google's Gemini models, which all have a 1 million token context window and more accurate instruction following.

We are particularly interested in the fact that, according to their own tests, the GPT 4.1 model has become better at code generation. That is, if 4o produced decent code on one out of three requests, then 4.1 will do it on every second one 😉.

https://aider.chat/docs/leaderboards/
In the article, the model is compared only to its own models. Overall, it can be evaluated on the Aider LLM Leaderboards, where it achieves 52.4% accuracy, while Gemini 2.5 Pro Preview 03-25 scores 72.9%.


In Cursor, gpt-4.1 is now available in the settings for available models.

This update is particularly important for GitHub Copilot (gpt 4.1 is already available), because their agent and chat are initially tied to the GPT-4 model of OpenAI, and in the free plan Claude Sonnet is still not 3.7, but 3.5.

Tomorrow there will be VS Code Live: Agent Mode Day, where I think they will tell more details.

https://www.pillar.security/blog/new-vulnerability-in-github-copilot-and-cursor-how-hackers-can-weaponize-code-agents

How can you attack automatic code generators?
By poisoning system instructions (“Rules File Backdoor”) of LLM.

Many AI coding programs now have the ability to load them from a text file (for example, in Cursor it is .cursorrules or a rules folder in the project root) - just a text file(s).

I think only inexperienced programmers or those who are not familiar with how new IDEs with agent coders work will run someone else's code without reading the instruction file beforehand if it exists.

The next option is when we create a project and copy such instructions ourselves from open directories, such as cursor.directory - again, you need to understand what you are doing, and read it beforehand.


But Pillar Security researchers found that attackers can use hidden Unicode characters and other bypass techniques in text rule files to trick agent assistants (such as Cursor or GitHub Copilot) and force them to generate code with backdoors or vulnerabilities (for example, to load external hacker javascript to the main page of the site).

How does it work?

  • Creating a malicious rules file: A hacker creates a rules file that looks harmless 👀, but contains hidden malicious instructions 😈 using Unicode characters.
  • Injection into the project: The rules file gets into a shared repository 🌐 or is distributed through communities 🧑‍🤝‍🧑.
  • Code generation: A developer, using an AI assistant, generates code 💻. AI, following malicious rules, creates code with vulnerabilities or backdoors 💥.
  • Malicious code spreads: Due to the fact that rule files are often shared and reused, infection can spread to many projects 🦠.

"Unlike traditional code injection attacks targeting specific vulnerabilities, “Rules File Backdoor” poses a significant risk because it turns AI itself into an attack vector."

The most vulnerable to such an attack are those who think little when creating code - do not read instruction files, do not check everything that was generated. Publishes code or deploys projects without prior security audit.

Theoretically, agent IDEs should be responsible at least for checking rule files and code comments for inserted invisible instructions, but, judging by the article, the developers of Cursor and GitHub Copilot said that users themselves (!) are responsible for the code they generate.

https://windsurf.com/blog/windsurf-wave-7

"Windsurf Wave 7" Update

Cascade is now available in JetBrains IDEs (IntelliJ, WebStorm, PyCharm, GoLand, and many others).

Codeium is now Windsurf
"We decided to rename the company to Windsurf and the product extension to Windsurf Plugin". There will be no more Codeium.

The company was founded in 2021 by Varun Mohan and Douglas Chen with the goal of increasing developer productivity through AI-based coding solutions, and the first year was called Exafunction (engaged in GPU virtualization).

Later, they started code autocompletion, creating a plugin for IDEs. In 2023, chat features inside the IDE and code generation were added. GPT-4 model was integrated.

On November 11, 2024, Windsurf Editor was launched, which they began to promote as the first AI agent-based IDE. Despite the fact that Cursor was first (spring 2023), their marketers tried to pretend it didn't exist.

Chats with different contexts (usually frameworks) are now available at https://windsurf.com/live/

https://console.x.ai/
Model xAI Grok-3 is finally available via API

In programming extensions where you can add your keys (Cline, Roo), you can now use it directly or through https://openrouter.ai/x-ai/grok-3-beta

In Windsurf, all top models are available today, including Gemini 2.5 Pro (which is ahead in many tests) and DeepSeek V3 (0324).

Similarly, in Cursor, you can now select deepseek-v3.1, grok-3-beta, gemini-2.5-pro-exp-03-25 and gemini-2.5-pro-max models in the settings.

In Trae, there are currently no models from Google or xAI.

https://block.github.io/goose/blog/2025/04/08/vibe-code-responsibly

The creators of the Codename Goose project (AI agent for computer control) described their pain points and possible solutions to the problem of vibe coding.

After Karpathy's tweet, which was picked up by the media, more and more people began to create "programs" simply by talking to AI and not looking at the code. But an LLM is not a programmer, it is a coder (code generator).

To put it mildly, this creates very low-quality, unprofessional code, the main problems of which are:

  • "spaghetti"-code that is difficult for a human to understand, where everything is mixed up with everything else. Usually also in one long file of thousands of lines.
  • constant mutation and drifting bugs: pieces of code that no longer do anything, and replacing well-functioning pieces with garbage.
  • huge number of vulnerabilities, code that is easy to hack.
  • leakage of closed information, such as access keys, into publicly available code.

Such code is almost impossible to maintain. It is better not to create it at all if it is not a "program just for yourself for one time use."

Goose developers suggest better control and configuration of agent systems so that they monitor what is being generated in the code:

  • 🧠 "Even if you're vibe coding, don't turn off your brain."
  • use different modes of control for agents, not just fully automatic.
  • use an ignore file (in Cursor it is .cursorignore), where you list what agents should in no case read or modify, and a file of system instructions (here it is goosehints in Cursor .cursorrules) to set restrictions.
  • there are now many MCP servers, including vibe-coded ones; they need to be checked and an Allowlist (allow policy) created for the agent, including only high-quality ones.
  • first plan, then do — a plan breaks everything down well into understandable stages and different small code files. Steps can be checked (how to do this in Cursor — see this video).
  • commit every step and use git to revert to code that worked well.

Exponent
https://x.com/exponent_run
With all these ai programs, it is not entirely clear at what stage of development they are and what they have released, but they wrote that it is still early access, they wrote so 4 months ago, maybe they have finished something.

Augment Agent
https://www.augmentcode.com/
presented the agent. there is a 14-day trial. The agent is designed to solve complex software development tasks, especially in large projects. A key feature is "Memories" that are automatically updated and stored between sessions, improving the quality of generated code and adapting to the programmer's style.

Other features include MCP (Model Context Protocol), "Checkpoints" for safe rollback of changes, multimodal support (screenshots, Figma), execution of terminal commands, and automatic mode.

https://codeium.com/blog/windsurf-wave-6

Windsurf Wave 6 Update

The main feature is "Deploys", which allows publishing websites or Javascript applications to the internet with a single click. Let there be even more of this (vibecoding slop). Currently, this function is integrated with Netlify and aims to simplify the full application development cycle directly within the IDE.

Also, in dialogues with AI agent (Cascade), memory and navigation have been improved.

For paid users, commit description generation has been added with a single button click (this has been in Cursor for a very long time, and it appeared and works for free in Github Copilot).

It appears the developers behind the Zed editor – yes, the ones who've apparently spent the last year unable to procure a Windows machine to build a version for that OS – have noticed something: their unreleased Zed AI is already becoming outdated.

Consequently, they're now rolling out 'Agentic Editing' to their beta testers. Based on the description, it seems to offer the expected suite of modern features: automatic code editing, chat profiles, a rules file for system instructions, LLM switching (including non-Anthropic options), MCP, and checkpoints (currently handled via git in beta).

Importantly, this could genuinely position Zed as a strong alternative to the dominance of VS Code and its forks. Just as soon as they manage to, you know, finally ship that Windows version. In the meantime, Windows users can install Zed using Scoop.

https://hub.continue.dev/

The creators of the Continue chat plugin for VS Code and JetBrains have added a section on their website that looks like a catalog of assistants for programming.

This is a good idea, because after working in Cursor, for example, the assistant starts accumulating system instructions, command repetitions, MCP server settings, and additionally indexed documents. It would be cool to have "snapshots" of such settings. Currently, catalogs of generic system instructions and MCP catalogs have started to appear online.

Using the example of https://hub.continue.dev/continuedev/clean-code, you can see that these are chat settings packages that consist of the following blocks:

  • Models: Blocks for defining models for different roles, such as chat, autocompletion, editing, embedder, and reranker.
  • Rules: Rule blocks are system instructions; the content of the rules is inserted at the beginning of all chat requests.
  • Docs: Blocks that point to documentation sites that will be indexed locally and can then be referenced as context using @Docs in the chat.
  • Prompts: Prompt blocks are pre-written quick commands, reusable prompts that can be referenced at any time during a chat.
  • Context: Blocks that define a context provider that can be referenced in the chat with @ to get data from external sources such as files, folders, URLs, Jira, Confluence, Github.
  • Data: The plugin automatically collects data on how you create code. By default, this data is stored in .continue/dev_data on your local computer. You can configure your own destinations, including remote HTTP and local file directories.
  • MCP Servers

The problem with this is that it works like software from 5 years ago; you have to click everything yourself and browse the site-catalog. It would be nice if the AI chat itself offered its settings from the available blocks, instead of me looking at them and choosing.

https://www.cursor.com/changelog/chat-tabs-custom-modes-sound-notification

Cursor 0.48.x

Joy=) As I already wrote in the 0.46 branch and after in 0.47, an unpleasant interface decision was made to pack everything into single chat tab, and it was very inconvenient after I got used to two. Finally, everything was fixed, and now it's possible to open as many tabs as you want (!) via "+" and include any modes in them.

For each mode ("agent", "ask", "manual" - previously "Edit" was confused with "Ask" and thus renamed) you can now set your own hotkey and fix the model.

In the settings, you can enable the ability to create additional modes (custom modes) with the choice of model, hotkey, a bunch of settings, MCP, and even a custom instruction. A very good response to the Roo Code plugin!

Also now if in the chat the context window starts to cut off the beginning of the conversation, it will be written in small text at the bottom.

So far there are no new models deepseek-v3.1 and gemini-2.5-pro in the list, but I think they will appear soon.

https://www.zbeegnew.dev/tech/build_your_own_ai_coding_assistant_a_cost-effective_alternative_to_cursor/

The article discusses how to create a cost-effective alternative to AI coding assistants like Cursor by using Claude Pro and the Model Context Protocol (MCP).

Code: https://github.com/ZbigniewTomanek/my-mcp-server

The author, Zbigniew Tomanek, shares their experience of using Claude with MCP to automate the complex task of implementing Kerberos authentication for a Hadoop cluster, reducing a full day's work to minutes.

Main Points:

  1. Problem: AI coding tools like Cursor are expensive and raise privacy concerns.

  2. Solution: Use Claude Pro ($20/month) with a custom-built MCP server to achieve similar functionality without the extra cost and with more control over data.

  3. MCP Explained: MCP is an open protocol that allows applications to provide context to Large Language Models (LLMs). The author uses a Python SDK to build a simple MCP server.

  4. Kerberos Example: The author details how Claude, using the MCP tools, analyzed project files, created a comprehensive plan, generated configuration files, and fixed errors for the Kerberos implementation.

  5. Cost Savings: Using Claude Pro + MCP saves money compared to dedicated AI coding tools.

  6. Data Privacy: Code and data remain on the user's machine, enhancing privacy.

  7. MCP Tools: The author's MCP server includes tools for file system operations, shell command execution, regular expression search, and file editing.

  8. Self-Improvement Loop: Claude can analyze and improve its own tools, leading to AI-optimized interfaces and custom tooling.

  9. Custom MCP Shines: MCP + Claude Pro offers cost-effectiveness, data control, customization, self-improvement, and complex task automation.

https://appgen.groqlabs.com/

Another experimental project that allows you to create small applications (so-called micro-apps) using text descriptions. It is integrated with the Groq platform (not to be confused with Elon Musk's Grok).

Groq's LPU™ Inference Engine is a hardware and software platform that provides very high speed computation, quality, and energy efficiency. It deploys open-source medium-sized models such as llama3, gemma2, deepseek-r1-distill, mixtral, and qwen.

Typically, the context window is small, and in the free version, the limit is 6k tokens per minute, meaning it really can't handle large codebases.

The code is open source: https://github.com/groq/groq-appgen - you can deploy it on your computer by generating an API key in the developer console.

A very interesting feature of Appgen is the ability to click on a pencil-shaped button and draw a sketch of how the interface should look with your mouse.

https://appgen.groqlabs.com/gallery
If desired, you can expand your "creation" into a gallery or view the works of others.

The default model is currently qwen-2.5-coder-32b, but you can switch to deepseek-r1-distill-llama-70b-specdec. Overall, I don't see any value in this other than the ability to compare the approaches to code generation in the available models.

Improvements to the interface of web chat platforms

https://gemini.google.com/
Google has integrated a new Canvas feature into its Gemini AI chat platform. Gemini Canvas allows users to generate code directly within the interface, including HTML, CSS, and JavaScript. The feature also supports Python scripting, web application development, and game creation. Additionally, Python code generated in Canvas can be opened and executed in Google Colab.

All updates are presented in the video.
https://www.youtube.com/watch?v=Yk-Ju-fqPP4

https://claude.ai/chats
Anthropic Claude Artifacts appeared first (mid-2024, publicly available since August). These are results generated during a chat with Claude and displayed in a separate window next to the main dialogue. For code, a "Preview" option appeared. Rendering of React, HTML, and JavaScript code is supported. Libraries such as Lodash and Papa Parse are available.

https://chatgpt.com/
In October 2024, OpenAI responded by adding the Canvas feature to ChatGPT. It became possible to highlight specific sections of code for targeted editing or obtaining explanations. Hotkeys are provided for working with code, such as code checking, adding logs/comments, debugging, and translation into other programing languages. A version history feature allows viewing and restoring previous states of the code. It also became possible to directly execute Python code in the browser using the "Run" button. Rendering of React and HTML code is supported.

I really liked the capabilities of Artifacts to quickly generate small code snippets, but Canvas in my ChatGPT was constantly buggy, glitchy, and lost context, so I quickly stopped using it.

https://chat.mistral.ai/chat
Le Chat Canvas appeared in early 2025 as an interface for collaboration with LLM Mistral. Supports rendering of React and HTML code. Users can highlight code to get explanations or make changes.

Gemini Canvas also has a window for inline queries regarding the highlighted block.

❌ So far, Grok and DeepSeek chats do not have Canvas/Artifacts features.

If you are interested in how system queries to LLM models work in popular systems for programming (and not only), you can go to GitHub to one of the many repositories of "leaked" prompts such as leaked-system-prompts and read them.

There are Cursor Windsurf bolt.new github copilot v0.dev and others.

https://www.youtube.com/watch?v=6g2r2BIj7bw

There is also a group of custom prompts that can change behavior, for example, Cursor/Windsurf as Devin — this is more of a solution for experiments out of curiosity than for work.