A group of researchers has developed a new benchmark, dubbed LiveBench, to ease the task of evaluating large language models’ question-answering capabilities. The researchers released the benchmark on ...
MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.
MLCommons today released AILuminate, a first-of-its-kind safety test for large language models (LLMs). The v1.0 benchmark – which provides a series of safety grades for ...
MLCommons, a nonprofit that helps companies measure the performance of their artificial intelligence systems, is launching a new benchmark to gauge AI’s bad side too. The new benchmark is called AILuminate.
Stanford's 2026 AI Index: frontier models fail one in three attempts, lab transparency is declining, and benchmarks are ...
MLCommons has launched AILuminate, a benchmark designed to assess the safety of large language models and promote standardized AI safety measures.