2025-01-05 13:09 - CodeWithLLM

Aider LLM Leaderboards
https://aider.chat/docs/leaderboards/

The Polyglot benchmark measures how well LLMs can write code in several popular programming languages.

Aider works best with LLMs that are good at editing code, not just generating code well. To evaluate the editing skills of LLMs, Aider uses tests that assess the model's ability to consistently follow system prompts to successfully edit code.

At the start of 2025, the Chinese model DeepSeek V3 (a 671B-parameter MoE) is unexpectedly showing very good results. Token prices are discounted until February 8, to $0.14 per million input tokens and $0.28 per million output tokens, although the context window is reduced (the model can also be accessed through OpenRouter). At that price, it is far cheaper than o1 and claude-3.5-sonnet.

#newllmmodel #compare
