Challenges and Considerations in Language Model Evaluation
NLP and machine learning rely on benchmarks and evaluation to track progress in the field and to assess the efficacy of new models and methodologies. For this reason, good evaluation practices and accurate reporting are crucial. However, language models (LMs) not only inherit the challenges previously faced in benchmarking but also introduce a slew of novel considerations that can make proper comparison across models difficult, misleading, or near-impossible. In this talk, we will survey the state of language model evaluation and highlight current challenges in measuring LM performance, covering the evaluation methods, tasks, and benchmarks commonly used to track progress in language model research. We will then discuss how these common pitfalls can be addressed and what considerations should guide future work.
Lintang Sutawika
Researcher at EleutherAI
Lintang Sutawika (he/him) is a Researcher at EleutherAI and an incoming PhD student at Carnegie Mellon University. His research interests center on making language technologies more capable, interpretable, and ultimately safe and useful. His work spans understanding how language models work and developing novel methods to expand their capabilities, including the Pythia suite of open language models, inducing zero-shot capabilities through multitask finetuning, studying model training dynamics, and extending models to other languages. He is also a core maintainer of EleutherAI’s LM Evaluation Harness, a framework for language model evaluation.
Outline:
- A Key Challenge in LM Evaluation
- What do we want to evaluate?
- LM-Specific Complications
- Evaluating Models vs. Systems
- Life of a Benchmark
- Overfitting
- Addressing Evaluation Pitfalls