Anthropic's Claude 4 Opus brings built-in multi-step tool orchestration, cutting prompt-engineering overhead for complex agentic pipelines. Available via API today.
anthropic.comOpenAI's GPT-5 scores 92.1% on GPQA Diamond, surpassing expert-level performance on graduate-level reasoning tasks across STEM domains.
openai.comGoogle expanded Gemini 2.5 Pro's 2M-token context window to all free-tier users, making it the largest free context window in the market.
blog.googleLangGraph 0.4 adds checkpoint-replay for time-travel debugging and delivers a 30% throughput improvement on parallel branches. One breaking change to the State schema.
github.comGen-Verse drops OpenClaw-RL 0.8 featuring cooperative multi-agent reward shaping and a new benchmark suite. Community tests show 18% win-rate improvement on 1v1 tasks.
github.comMeta's Llama 3.3 70B achieves 85.7% on MMLU, matching GPT-4o on the popular benchmark while running on a single A100 GPU — a cost-efficiency milestone for open-weights models.
ai.meta.comMicrosoft's Azure AI Foundry now offers one-click MCP (Model Context Protocol) server deployment, letting enterprises spin up compatible tool servers in under 5 minutes.
azure.microsoft.comMistral AI's Small 3 model achieves higher scores than GPT-3.5-turbo on coding and reasoning benchmarks at one-third the API cost, making it a strong default for high-volume tasks.
mistral.ai