June 2026

At Build 2026 in May, Microsoft presented a series of changes that shift its tooling from simple AI assistance toward autonomous agents.

Proprietary MAI Models
https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
Microsoft introduced a new family of seven MAI (Microsoft AI) models, including MAI-Code-1-Flash and MAI-Thinking-1. With them, the company is effectively reducing its dependence on OpenAI, and it claims frontier-level results on autonomous tasks.

MAI-Code-1-Flash is a medium-sized model with 5B active parameters out of 137B total and an extremely large 2-million-token context window; Microsoft says it operates with record-low latency. The presentation slides look perfect: they claim it outperforms the previous-generation flagship GPT-4o. It is being positioned as the core engine for pipelines integrated into GitHub Copilot, VS Code, and other Microsoft products.

Discussion
https://news.ycombinator.com/item?id=48374466
In an active Hacker News discussion, people note that open models like Qwen 3.6 (35B) or DeepSeek V4 Flash yield better results and run significantly faster, while GitHub Copilot's new strict token billing might make MAI-Code-1-Flash economically unviable.

GitHub Copilot — Now a Separate App
https://github.blog/2026-06-02-github-copilot-app-the-agent-native-desktop-experience/
Copilot started as a VS Code plugin and later became part of the editor. Now, following Codex, Cursor, Zed, and others, Microsoft has created a standalone GitHub Copilot App, an "agent-native" desktop chat application.

The app is a single control plane for agents that work in parallel across isolated git trees, create PRs, and debug code. Microsoft is moving away from the editor concept toward delegating entire workflows. The app also supports MCP servers.
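The announcement does not say how the app keeps agents' checkouts apart. One plausible mechanism is git's own worktree feature, where each agent gets its own directory and branch backed by the same repository. The sketch below only builds the commands and does not run them; the agent names, the agent/<name> branch scheme, and the sibling-directory layout are all illustrative assumptions.

```python
from pathlib import Path


def worktree_command(repo: Path, agent: str, base: str = "main") -> list[str]:
    """Build the git command that gives one agent its own checkout and branch.

    The branch naming scheme (agent/<name>) and the sibling-directory layout
    are illustrative assumptions, not the app's documented behavior.
    """
    target = repo.parent / f"{repo.name}-{agent}"
    return [
        "git", "-C", repo.as_posix(),
        "worktree", "add",
        "-b", f"agent/{agent}",   # create a new branch for this agent
        target.as_posix(),        # directory for the separate checkout
        base,                     # commit or branch to start from
    ]


# Two hypothetical agents working on the same repository in parallel.
commands = [
    worktree_command(Path("/work/shop"), name)
    for name in ("fix-login", "bump-deps")
]
for cmd in commands:
    print(" ".join(cmd))
```

Because every worktree has its own working directory and branch, one agent's uncommitted edits never appear in another agent's checkout, which is what makes parallel runs safe to merge later through ordinary PRs.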

https://www.youtube.com/watch?v=5Q5mLNYJ6Hw

Instead of getting lost on the GitHub website, the app features a convenient "On your radar" (or Inbox) tab where all your Pull Requests (PRs) and Issues from selected repositories are gathered. You can open any PR, view code changes (diff), leave comments, or approve it. Furthermore, you can tag @copilot directly in comments for fixes or explanations.

Users can create "quick chats" for general questions (even non-code related, like D&D games) or sessions tied to a specific repository. The app allows switching between different LLMs. For instance, the author uses Claude Opus 4.7 for code generation.

Agent Isolation: MXC (Microsoft Execution Containers)
https://www.microsoft.com/en-us/security/blog/2026/06/02/microsoft-build-2026-securing-code-agents-and-models/
Since agents now execute real code and interact with systems and infrastructure, Microsoft is implementing MXC at the Windows 11 kernel level. This is a new level of isolation and sandboxing specifically for AI applications. Windows is essentially turning into an "Agent Runtime" platform.

The demonstration showed OpenClaw attempting to delete all files from the desktop and failing due to these restrictions.

#githubcopilot #mai #agenticcoding

Chinese AI giant MiniMax has announced a new generation of M models: M3.

MiniMax M3
https://www.minimax.io/blog/minimax-m3
https://www.minimax.io/models/text/m3
MiniMax M3 is built with an emphasis on deep reasoning, coding, and autonomous pipelines. It accepts text+image+video as input and produces text as output. The model is specifically optimized for agentic workflows and complex, long-term tasks rather than simple chat interactions.

MiniMax Sparse Attention (MSA) is a new sparse attention mechanism that radically reduces computational costs for long contexts (approximately 1/20th of the previous generation). It supports up to 1M tokens, with a guaranteed minimum of 512K in the cheaper API version. Token Plans are available (starting at $20/month), with a highlighted $50/month option.

Benchmarks look impressive: SWE-Bench Pro ~59% and Terminal-Bench 2.1 ~66%. This is on par with GPT-5.5 and Gemini 3.1 Pro, trailing only Claude Opus 4.8.

There is no active discussion on Hacker News yet.

MiniMax Code Updates
https://code.minimax.io/
With the M3 update, MiniMax Code has also received a significant upgrade, fully utilizing the model's capabilities: long context, agentic skills, and native multimodality. The program can not only generate code but also create documents, PDFs, slides, tables, and icons. Thanks to multimodality, MiniMax Code supports "computer use" (controlling the computer).

The concept is built around delegation: you don't write code alongside the AI; you manage it. It uses a Producer + Verifier adversarial loop in which agents constantly generate, reflect, check, and fix errors in real time. There is a "Smart Authorize" option to avoid manually monitoring every agent action.
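MiniMax has not published how its Producer + Verifier loop is implemented, so the following is only a minimal sketch of the general pattern: one component proposes a draft, another checks it, and the feedback goes into the next attempt. The function names, the Verdict type, and the toy producer and verifier are hypothetical stand-ins for what would be model calls in a real system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def produce_and_verify(
    produce: Callable[[str, str], str],
    verify: Callable[[str], Verdict],
    task: str,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Alternate producer and verifier until the verifier accepts a draft.

    Returns the accepted draft and the number of rounds used; raises if the
    round budget runs out.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        draft = produce(task, feedback)
        verdict = verify(draft)
        if verdict.ok:
            return draft, round_no
        feedback = verdict.feedback  # the next attempt sees what went wrong
    raise RuntimeError(f"no accepted draft after {max_rounds} rounds")


# Toy stand-ins: the first draft is buggy, the second reacts to feedback.
def toy_producer(task: str, feedback: str) -> str:
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_verifier(draft: str) -> Verdict:
    scope: dict = {}
    exec(draft, scope)  # acceptable here only because the drafts are hard-coded
    if scope["add"](2, 3) == 5:
        return Verdict(ok=True)
    return Verdict(ok=False, feedback="add(2, 3) should be 5")


draft, rounds = produce_and_verify(toy_producer, toy_verifier, "write add()")
print(rounds, draft)
```

The round budget matters in practice: without it, a producer that never satisfies the verifier would loop forever and keep spending tokens.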

https://www.youtube.com/watch?v=mBHFGeU18MI

There is native support for MCP servers to connect external databases and documentation. It features a skills marketplace and integration with bots in Telegram, WeChat, and Lark for managing agents from a phone. Autonomous operation for several days without human intervention is possible, as well as scheduled task execution.

#minimax #agenticcoding #desktopapp

Code generation quality continues to improve, but the US government is trying to restrict access for others.

Fable 5 — Pulled in 3 Days
https://www.anthropic.com/news/claude-fable-5-mythos-5
https://support.claude.com/en/articles/14328960-identity-verification-on-claude
On June 9, 2026, Anthropic introduced Claude Fable 5, a model of the new Mythos class. Tests showed a record level of autonomy (including completing games using computer vision) and exceptional performance in code generation.

https://www.youtube.com/watch?v=LoIGVdfTq9M

However, just 3 days later, access to the models was suspended. The US government issued an export directive prohibiting foreign nationals from using these models. Unable to immediately filter out foreign users, Anthropic disabled the models for all customers.

To resolve the issue, the company is launching mandatory identity verification (ID + selfie) via the Persona service. While the verification process is global and supports documents from most countries, access to the flagship Fable 5 will only be granted to verified US citizens and residents due to US regulations.

Discussion
https://news.ycombinator.com/item?id=48618455
The Hacker News community reacted very negatively to these changes. Many international developers point out that paying for an Anthropic subscription is now pointless, since they won't get access to future flagship models. The introduction of Persona verification raises serious privacy concerns, and the sudden shutdown of Fable 5 has undermined trust in US SaaS platforms as a reliable foundation for business.

Although this flagship model currently remains available only to a select group (namely the military and Anthropic employees), I believe this is temporary. OpenAI is clearly preparing its response with GPT-6, and Google is also actively working in this direction. Widespread availability of next-generation models with a qualitatively new level of autonomy and code generation is therefore only a matter of months. Let's wait and see.

#anthropic #fable5

Chinese AI services continue to gradually catch up with their US counterparts.

TRAE Solo is now Work
https://solo.trae.cn/
https://docs.trae.ai/solo/what-is-trae-solo?_lang=en
ByteDance has renamed its "Trae Solo" tool to Trae Work, highlighting a shift in positioning: from a simple developer assistant to a fully autonomous "AI employee" for various tasks (data scraping, content creation, web research, etc.). Coding remains available as a separate tab, and a GitHub connector is provided. The interface resembles the Codex app, featuring Skills and MCP support with a catalog. The tool is accessible via web, desktop, and mobile. "Privacy Mode" is disabled by default for new accounts, so users need to enable it manually.

The Capable GLM-5.2
https://docs.z.ai/guides/llm/glm-5.2
https://artificialanalysis.ai/articles/glm-5-2-is-the-new-leading-open-weights-model-on-the-artificial-analysis-intelligence-index
Zhipu AI has released GLM-5.2 — a 753B parameter Mixture-of-Experts (MoE) model under the MIT license, which significantly improves upon GLM-5.1. The context window has been expanded to 1M tokens (compared to 200k in its predecessor).

https://www.youtube.com/watch?v=nODxez6nZEU

The model ranked first among open-weights models in the Artificial Analysis Intelligence Index (v4.1) with a score of 51, demonstrating coding skills on par with the proprietary Claude Opus 4.8. It tends to get confused more easily and consumes more tokens overall, but it still delivers results.

Discussion
https://news.ycombinator.com/item?id=48567759
On Hacker News, the model is praised for its price-to-performance ratio in long-running development cycles. However, users note that the "Max" reasoning mode is extremely slow and highly token-intensive. Because of its size (753B parameters), running it locally on standard MacBook Pros is not possible, but users can rent GPU cloud instances or access it via OpenRouter: https://openrouter.ai/z-ai/glm-5-2#providers
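A quick back-of-the-envelope calculation shows why local use is out of reach. The sketch below estimates the memory needed for the weights alone, assuming every one of the 753B parameters must be resident (in a Mixture-of-Experts model all experts are normally loaded, even though only some are active per token) and ignoring the KV cache and activations. The bit widths are common quantization levels chosen for illustration; the source does not say which formats are distributed.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 753B is the parameter count from the release; the bit widths are assumptions.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(753, bits):,.1f} GB")
```

Even at 4 bits per weight the result is several hundred gigabytes, which is far beyond the memory of a typical laptop and explains why commenters point to rented GPUs or hosted providers.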

Current top coding models on OpenRouter by usage volume (token count):

MiMo-V2.5 (by xiaomi): the clear leader with 4.59T (trillion) tokens, or 22.5% of total token volume.
MiniMax M3 (by minimax): second place with 2.45T tokens (12.0%).
Hy3 preview (by tencent): third place with 1.43T tokens (7.0%).
Claude Opus 4.7 (by anthropic): fourth place with 1.17T tokens (5.7%).
DeepSeek V4 Pro (by deepseek): closes the top five with 1.14T tokens (5.6%).
DeepSeek V4 Flash (by deepseek): sixth place with 972B (billion) tokens (4.8%).
GLM 5.1 (by z-ai): seventh place with 952B tokens (4.7%).
GLM 5.2 (by z-ai): eighth place with 820B tokens (4.0%).
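The figures in this ranking can be cross-checked with a little arithmetic. The sketch below sums the listed token counts and shares, then derives the total volume each row implies (tokens divided by share). It uses only the numbers from the list; the small spread in the implied totals comes from rounding in the published values.

```python
# (model, tokens in trillions, reported share in percent), as listed above
usage = [
    ("MiMo-V2.5", 4.59, 22.5),
    ("MiniMax M3", 2.45, 12.0),
    ("Hy3 preview", 1.43, 7.0),
    ("Claude Opus 4.7", 1.17, 5.7),
    ("DeepSeek V4 Pro", 1.14, 5.6),
    ("DeepSeek V4 Flash", 0.972, 4.8),
    ("GLM 5.1", 0.952, 4.7),
    ("GLM 5.2", 0.820, 4.0),
]

listed_tokens = sum(tokens for _, tokens, _ in usage)
listed_share = sum(share for _, _, share in usage)

# Each row implies a total volume: tokens / (share / 100).
implied_totals = [tokens / (share / 100) for _, tokens, share in usage]
low, high = min(implied_totals), max(implied_totals)

print(f"listed models: {listed_tokens:.2f}T tokens, {listed_share:.1f}% of usage")
print(f"implied total volume: {low:.1f}T to {high:.1f}T tokens")
```

The eight models together account for roughly two thirds of the volume, and every row points to a total of a little over 20T tokens, so the list is internally consistent.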

GLM-5.2 on OpenCode
https://dev.to/danielbergholz/testing-glm-52-on-opencode-im-impressed-1780
The article's author, Daniel Bergholz, tested the model in real-world development conditions by integrating GLM-5.2 via OpenRouter into the free coding agent OpenCode.

In a practical test on an actual Next.js project, the model was tasked with implementing an article search feature with a 300ms debounce without cluttering the browser history. GLM-5.2 proved to be a somewhat slow but highly deliberate model: during the planning phase, it analyzed the project structure without additional prompting, recognized the difference between server and client components, and logically justified using client-side rendering for this task. It wrote clean, working code on the first attempt ("one-shot") and demonstrated a rare "restraint" for AI assistants by not trying to overcomplicate the existing project structure.
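For readers unfamiliar with debouncing, here is a minimal sketch of the idea in Python rather than the article's Next.js code, which is not reproduced in this digest. A search only fires once 300 ms pass without a newer keystroke. The Debouncer class and the simulated keystrokes are illustrative assumptions; the browser-history part of the task (typically handled by replacing the current URL instead of pushing a new entry) is not modeled here.

```python
import threading
import time
from typing import Callable, Optional


class Debouncer:
    """Run `action` only after `delay` seconds pass with no newer call."""

    def __init__(self, delay: float, action: Callable[[str], None]) -> None:
        self._delay = delay
        self._action = action
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def call(self, value: str) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer keystroke supersedes the pending search
            self._timer = threading.Timer(self._delay, self._action, args=(value,))
            self._timer.start()


searches: list[str] = []                      # records every search that actually runs
debounced_search = Debouncer(0.3, searches.append)

for partial in ("n", "ne", "nex", "next"):    # four quick keystrokes, 50 ms apart
    debounced_search.call(partial)
    time.sleep(0.05)

time.sleep(0.5)                               # let the 300 ms quiet period elapse
print(searches)
```

Only the final query reaches the search function, which is the point of the technique: intermediate keystrokes cost nothing, and the backend sees one request per pause in typing.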

The entire session, which included repository analysis, planning, coding, review, and the final fix, cost the author only $0.265 (less than 27 cents).

#trae #glm #top #opencode

OpenAI News.

GPT-5.5-Cyber and the Daybreak Initiative
https://openai.com/index/gpt-5-5-with-trusted-access-for-cyber/
GPT-5.5-Cyber has been announced as part of the Daybreak initiative. The model is tailored for defensive security, including vulnerability detection, threat modeling, and code patch generation. On the CyberGym benchmark, it scored 85.6%, outperforming the base GPT-5.5 (81.8%) and Anthropic Mythos 5 (83.8%). Currently, access is restricted to verified organizations.

Codex: Rate-Limit Resets
https://community.openai.com/t/flexible-rate-limit-resets-for-codex-and-a-method-to-get-a-reset/1383470
The primary update for June 2026 is the introduction of banked rate-limit resets. How it works: if you do not exhaust your limit within a 5-hour window, the remaining portion is "banked" and saved as a separate reset to be used later. These are not API credits or a monetary balance, but an additional allowance strictly within the ChatGPT subscription. Each banked reset remains valid for 30 days from the date of accrual.
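The expiry rule is easy to model. The sketch below tracks banked resets by accrual date and treats each as valid for 30 days, as stated above. Everything else is an assumption made for illustration: the ResetBank class is hypothetical, resets are treated as whole units, the oldest valid reset is spent first, and a reset is considered expired exactly 30 days after accrual. OpenAI has not documented these details.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

RESET_LIFETIME = timedelta(days=30)  # validity period stated in the announcement


@dataclass
class ResetBank:
    """Tracks banked resets by accrual time; a model for illustration only."""

    accrued_at: list[datetime] = field(default_factory=list)

    def bank(self, when: datetime) -> None:
        self.accrued_at.append(when)

    def available(self, now: datetime) -> int:
        """Count resets that have not yet passed the 30-day lifetime."""
        return sum(1 for t in self.accrued_at if now - t < RESET_LIFETIME)

    def redeem(self, now: datetime) -> bool:
        """Spend the oldest unexpired reset; return False if none is left."""
        valid = sorted(t for t in self.accrued_at if now - t < RESET_LIFETIME)
        if not valid:
            return False
        self.accrued_at.remove(valid[0])
        return True


bank = ResetBank()
bank.bank(datetime(2026, 6, 1))
bank.bank(datetime(2026, 6, 20))

print(bank.available(datetime(2026, 6, 25)))  # both resets are within 30 days
print(bank.available(datetime(2026, 7, 5)))   # the June 1 reset has aged out
```

Spending the oldest reset first is the choice that wastes the fewest resets to expiry, which is why the sketch uses it.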

Each Plus/Pro user has received one free reset, and up to three more can be earned through referrals until June 24. Some users did not see them appear until their first referral. Banked resets can be activated via the Codex desktop app under Settings → Usage remaining. Note: CLI and VS Code plugin users (as well as some Linux users) currently lack a native way to activate them and must use the desktop app. Unofficial community scripts for CLI workarounds have already emerged.

Additionally, a Codex Profile screen has been introduced, displaying usage stats and token activity graphs, which is particularly useful with the transition to token-based billing.

Codex-Maxxing: A Guide for Long-Running Sessions
https://openai.com/index/codex-maxxing-long-running-work/
OpenAI has published an official PDF guide / white paper titled Codex-Maxxing. This guide outlines how to utilize Codex as a persistent working agent handling long-term projects, rather than just answering individual queries. OpenAI envisions a future that goes beyond starting a new chat every time.

The term "Codex-maxxing" describes an approach in which users build an entire workflow system around the agent instead of simply asking it questions. The main points:

Codex can and should work with more than just code. It can control a computer, browse the web, or have MCP access to email, calendar, etc.
Instead of separate chats, use durable workflows (Durable Threads) with accumulated project history and context compaction.
Agent memory should be visible and isolated. Repositories store code, while a separate folder (Vault) stores context, decisions, open loops, current state of work, etc.
Steering (guidance during execution): Instead of assigning a task and waiting for the final result, you monitor the progress in real-time and provide adjustments along the way. This interaction resembles collaborating with a human colleague: you focus on the work itself (documents, code) rather than the chat window, providing feedback on specific parts.
Voice over text + mobile access: OpenAI believes that typed queries are often over-edited. Voice input allows users to convey emotions, raw ideas, and unstructured thoughts that modern models can process effectively. You can discuss progress with the agent on your phone while walking, while it performs tasks on your computer.
Utilizing Heartbeats (Thread Automation): scheduled automatic checks that allow the agent to monitor repository states or CI pipelines without human intervention. The agent does not wait for a new message from the user; it returns to the task according to a schedule.

Tasks are defined by clear, verifiable exit criteria (e.g., "achieve 100% test coverage" or "reduce deployment time by 30%"), and the agent works autonomously with regular checks until they are met. A poorly formulated task is "Implement the entire plan," whereas a well-formulated one is "Port the library to Rust, keep the API compatible, and consider the task complete only when all legacy tests pass successfully."
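The combination of scheduled heartbeats and a verifiable exit criterion can be sketched as a simple loop: on each heartbeat the agent does a unit of work, then checks whether the criterion holds. This is not Codex's actual scheduler; the function names, the toy test suite, and the heartbeat budget are hypothetical, and the waiting between heartbeats is left out so the example runs instantly.

```python
from typing import Callable


def run_until_done(
    work_step: Callable[[], None],
    exit_criterion: Callable[[], bool],
    max_heartbeats: int = 10,
) -> int:
    """Do one unit of work per heartbeat until the exit criterion holds.

    Returns the number of heartbeats used. A real scheduler would wait between
    heartbeats; that is omitted so the sketch runs instantly.
    """
    for beat in range(1, max_heartbeats + 1):
        work_step()
        if exit_criterion():
            return beat
    raise TimeoutError(f"exit criterion not met after {max_heartbeats} heartbeats")


# Toy task: "consider the task complete only when all legacy tests pass".
legacy_tests = {"parse": False, "render": False, "export": False}


def fix_one_test() -> None:
    """Stand-in for real agent work: make the next failing test pass."""
    for name, passing in legacy_tests.items():
        if not passing:
            legacy_tests[name] = True
            return


def all_legacy_tests_pass() -> bool:
    return all(legacy_tests.values())


beats = run_until_done(fix_one_test, all_legacy_tests_pass)
print(beats)
```

The important property is that the criterion is a function returning true or false. "Implement the entire plan" cannot be written as such a check, while "all legacy tests pass" can, which is the distinction the guide draws between poorly and well formulated tasks.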

#codex #newllmmodel
