Model Bench - Search News

MSN on MSN

Google says its most powerful Gemini model yet is coming this month

Google confirmed that Gemini 3.5 Pro, the most powerful model in its Gemini lineup, is already running inside the company and ...

MSN on MSN · Opinion

Microsoft unveiled MAI-Code-1-Flash, its first model that turns descriptions into working code

Software developers working on complex, multi-file projects now have a new tool to evaluate after Microsoft released MAI-Code ...

Tech Times

MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks

MiniMax M3 launched June 1, 2026 with a 1-million-token context window and company-reported SWE-Bench Pro scores that edge ...

Live Science

Scientists design new 'AGI benchmark' that indicates whether any future AI model could cause 'catastrophic harm'

OpenAI scientists have designed MLE-bench — a compilation of 75 extremely difficult tests that can assess whether a future advanced AI agent is capable of modifying its own code and improving itself.
