Programming Language Benchmarks

Qwen3-Coder-Next offers vibe coders a powerful open source, ultra-sparse model with 10x higher throughput for repo tasks

On SWE-Bench Verified, the model achieved a score of 70.6%. This performance is notably competitive when placed alongside ...

TechAnnouncer

Beyond LeetCode: Top Reddit Recommendations for Coding Practice Alternatives

Built-in IDE: Code directly in the browser without needing to set up a local environment.

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

New benchmark shows top LLMs achieve only 29% pass rate on OpenTelemetry instrumentation, exposing the gap between ...

Slator

Italian Benchmark Evaluates Large Language Models, Includes AI Translation

Large language models (LLMs) have driven rapid progress in natural language processing (NLP), including AI translation. Yet most benchmarks used to evaluate these systems remain heavily ...

InfoWorld

8 old programming languages developers won’t quit

Newer languages might soak up all the glory, but these die-hard languages have their place. Here are eight languages developers still use daily, and what they’re good for. The computer revolution has ...

Business Insider

Sword Health Launches MindEval, the First Multi-Turn Mental Health Benchmark for Evaluating Large Language Models in Realistic Therapeutic Dialogue

New York, NY, Dec. 09, 2025 (GLOBE NEWSWIRE) -- Sword Health, the world’s leading AI Health company, today unveiled MindEval, the industry’s first benchmark designed to evaluate how large language ...

InfoWorld

R language is making a comeback – Tiobe

The R language for statistical computing has crept back into the top 10 in Tiobe’s monthly index of programming language popularity. “Programming language R is known for fitting statisticians and ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

Morningstar

Logical Intelligence Achieves 76 Percent on Putnam Benchmark, Highlighting Shift Beyond Large Language Models to Language-free, Mathematically Grounded Models

Over the last decade, artificial intelligence (AI) has been largely built around large language models (LLMs). These systems are built on language and predict words one after another in the form of tokens.

Wired

Ruby Is Not a Serious Programming Language

My little theory is that the concept of “imprinting” in psychology can just as easily be applied to programming: Much as a baby goose decides that the first moving life-form it encounters is its ...

TechCrunch

Google launches Gemini 3 with new coding app and record benchmark scores

On Tuesday, Google released Gemini 3, its latest and most advanced foundation model, which is now immediately available through the Gemini app and AI search interface. Coming just seven months after ...
