
Toward Secure & Trustworthy AI: Independent Benchmarking

Available Media

Slides (pdf)

Slides (Online)

Conference: InCyber Forum, 2025
Author: Elie Bursztein

GenAI is evolving at an unprecedented pace, with frequent releases of new large language models (LLMs) that bring performance improvements, efficiency gains, and new capabilities. Developers, researchers, and organizations looking to quickly leverage these model advances face a significant challenge: consistently and reliably evaluating model performance and safety, and determining which model is best suited to their use cases. To help address this need, Google DeepMind and Giskard are releasing LMEval, a large model evaluation framework, alongside the Phare Benchmark, an independent multilingual security and safety benchmark.

