
Toward Secure & Trustworthy AI: Independent Benchmarking

Available Media

Slides (pdf)

Slides (Online)

Conference: InCyber Forum, 2025
Author: Elie Bursztein

GenAI is evolving at an unprecedented pace, with frequent releases of new large language models (LLMs) that bring performance improvements, efficiency gains, and new capabilities. Developers, researchers, and organizations looking to quickly leverage these model advances face a significant challenge: consistently and reliably evaluating model performance and safety, and determining which model is best suited to their use cases. To help address this need, Google DeepMind and Giskard are releasing LMEval, a large model evaluation framework, alongside the Phare Benchmark, an independent multilingual security and safety benchmark.

