LLM Benchmark - Search News

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now A team of Abacus.AI, New York University, ...

Business Wire

Simbian Announces Industry’s First Benchmark to Comprehensively Measure LLM Performance in Security Operations Centers

New “AI SOC LLM Leaderboard” Uniquely Measures LLMs in Realistic IT Environment to Give SOC Teams and Vendors Guidance to Pick the Best LLM for Their Organization Simbian's industry-first benchmark ...

Forbes

NVIDIA H100 Dominates New MLPerf v3.0 Benchmark Results

To know how a system performs across a range of AI workloads, you look at its MLPerf benchmark numbers. AI is rapidly evolving, with generative AI workloads becoming increasingly prominent, and MLPerf ...

Security

Simbian launches new security benchmark with AI SOC LLM Leaderboard

Simbian today announced the “AI SOC LLM Leaderboard,” a comprehensive benchmark to measure LLM performance in Security Operations Centers (SOCs). The new benchmark compares LLMs across a diverse range ...

Infosecurity-magazine.com

Academics Develop Testing Benchmark for LLMs in Cyber Threat Intelligence

Large language models (LLMs) are increasingly used for cyber defense applications, although concerns about their reliability and accuracy remain a significant limitation in critical use cases. A team ...

SDxCentral

Nvidia sets benchmarking performance records with its H200 and TensorRT-LLM software

Nvidia has set new MLPerf performance benchmarking records on its H200 Tensor Core GPU and TensorRT-LLM software. MLPerf Inference is a benchmarking suite that measures inference performance across ...

VentureBeat

Nvidia, Intel claim new LLM training speed records in new MLPerf 3.1 benchmark

Training AI models is a whole lot faster in 2023, according to the results from the MLPerf Training 3.1 benchmark released today. The pace of innovation in the generative AI space is breathtaking to ...

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

InfoQ

Hugging Face Upgrades Open LLM Leaderboard v2 for Enhanced AI Model Comparison

Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with content, and download exclusive resources. Dany Lepage discusses the architectural ...

TechCrunch

This LLM framework takes a first stab at benchmarking Big AI’s compliance with the EU AI Act

While most countries’ lawmakers are still discussing how to put guardrails around artificial intelligence, the European Union is ahead of the pack, having passed a risk-based framework for regulating ...

Results that may be inaccessible to you are currently showing.

Hide inaccessible results