2026-05-29 21:29 - CodeWithLLM

More new models of May.

Cursor Composer 2.5
https://cursor.com/blog/composer-2-5
On May 18, 2026, the Cursor team released Composer 2.5. It is based on the open Kimi K2.5 model from Moonshot AI, but now approximately 85% is Cursor's own fine-tuning. The main change compared to Composer 2 is increased autonomy and cost optimization.

The model offers two tiers: Standard at $0.50 per 1M input / $2.50 per 1M output tokens, and Fast at $3/$15. In SWE-Bench Pro tests, it achieved a 49% success rate (compared to 12% in Composer 2), meaning coding skills and context understanding have grown significantly at a reasonable price.

Qwen 3.7 Max
https://qwen.ai/blog?id=qwen3.7
On May 20, 2026, at the Alibaba Cloud Summit, Qwen 3.7-Max was announced. Unlike the previous Qwen 3.6 line, which focused on general tasks, the new version is positioned exclusively as an agentic model for ultra-long autonomous work cycles. The key change is stability during long-running tasks.

Alibaba demonstrated a case where the model fully autonomously optimized a GPU kernel over 35 hours without any human intervention, performing over 1,100 tool calls. The context window was expanded to 1 million tokens (up from 256k in its predecessor), and the "reasoning density" per token was increased.

Qwen 3.7-Max can generate complex interactive web applications from a single prompt—including 3D scenes on Three.js, Canvas animations, full-page layouts, and dynamic SVGs.

https://openrouter.ai/qwen/qwen3.7-max
There is currently a 50% discount on the model at OpenRouter ($1.25/$3.75), making Qwen 3.7 Max perhaps the best choice for price/performance in long-running tasks.

Claude Opus 4.8 — fewer hallucinations and more control
https://www.anthropic.com/news/claude-opus-4-8
On May 28, 2026, Anthropic introduced Claude Opus 4.8 (pricing remains the same as 4.7 at $5/$25 per million tokens) and once again topped the Artificial Analysis global rankings with a score of 61.4, overtaking GPT-5.5.

Instead of focusing on abstract benchmarks, Anthropic prioritized system "honesty": the model learned to directly state "I don't know" or ask for clarification, and it misses hidden bugs in its own code 4 times less often than Opus 4.7.

Dynamic workflows appeared in Claude Code. Now Opus 4.8 can independently plan large-scale tasks, launch parallel sub-agents, and verify results before submitting the work.

#cursor #qwen #claude #claudecode #newllmmodel

2026

2025

2024