Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Authors: Linyu Liu, Yu Pan, Xiaocheng Li, Guanting Chen
Notes by: @gabchuayz
tldr
- Train a regression model (e.g. a random forest) to estimate the uncertainty of an LLM's response
- Input: the LLM's hidden-layer activations at the last token, OR entropy- and probability-related outputs
- Output: a task-specific score between 0 and 1 reflecting how certain the model is about the answer
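A minimal end-to-end sketch of this idea, assuming scikit-learn; the feature matrix and scores below are random placeholders standing in for real extracted features and task scores, and the hyperparameters are my own assumptions, not the paper's:

```python
# Minimal sketch: supervised uncertainty estimation with a random forest.
# Features and scores are random placeholders; n_estimators is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 320))  # e.g. activations + greybox statistics
scores = rng.uniform(size=1000)          # task scores z_i in [0, 1]

X_train, X_test, z_train, z_test = train_test_split(
    features, scores, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, z_train)
z_hat = model.predict(X_test)  # predicted score = estimated certainty per response
```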
Existing methods
- Based directly on the output of the LLM: multiple sampling / adding perturbations
- Unsupervised
- Applied to transformers, but not LLMs
Why do this?
- UI/UX
- Improved performance
- Hallucination detection
- Auto-Eval (?)
Points 1 and 4 (UI/UX and Auto-Eval) are my own reflections
Expressing the problem mathematically
- An LLM is given an input prompt $x$ and randomly generates a response $y$
- We typically use the generated response for a downstream task (e.g. Q&A, MCQ, translation), and these tasks have their own scoring function (e.g. ROUGE, BLEU): $z = s(x, y) \in [0, 1]$
- The task of uncertainty estimation for LLMs is learning a function $g$ such that $g(x, y)$ predicts the score $z$
1. Whitebox LLMs
- Use the LLM to generate responses for the sample prompts, and construct the raw dataset $\{(x_i, y_i, z_i)\}_{i=1}^{n}$, where $z_i = s(x_i, y_i)$
Note how the same prompt sampled multiple times with different responses counts as different training instances
- For each sample, extract the features to construct the uncertainty dataset $\{(v_i, z_i)\}_{i=1}^{n}$, where $v_i$ is the vector of selected features.
For whitebox LLMs, these are the hidden-layer activations. For the experiments in this paper, they use the activations from the middle and last layers.
From the paper: "we note that another direct feature for predicting $z_i$ is to ask the LLM 'how certain it is about the response' and incorporate its response to this question as a feature"
- Train a supervised learning model to predict the score $z_i$ based on the features $v_i$, using the uncertainty dataset.
- At inference time, generate the response with the LLM, extract the features, and use the learnt model to predict the uncertainty score (see the sketch below).
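A minimal sketch of the activation-extraction step, assuming the Hugging Face transformers package; gpt2 stands in for the paper's 7B models, and the (prompt, response) pair is a toy example:

```python
# Sketch: extract middle- and last-layer activations of the final token for
# one (prompt, response) pair; layer choice mirrors the notes above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: What is the capital of France?\nA:"
response = " Paris"
inputs = tokenizer(prompt + response, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

hidden = out.hidden_states                 # tuple: (embeddings, layer 1, ..., layer L)
middle = hidden[len(hidden) // 2][0, -1]   # middle layer, last token
last = hidden[-1][0, -1]                   # last layer, last token
v = torch.cat([middle, last])              # raw activation features for this sample
```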
Feature Selection
For this paper, they use 320 features: 20 from the greybox outputs (see below) and 300 selected from the hidden-layer activations:
- 100 by LASSO
- 100 by top mutual information
- 100 by top absolute Pearson correlation coefficient
They then train a random forest regressor on these 320 features
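A sketch of this three-way screening step, assuming scikit-learn; the LASSO alpha and the toy data shapes are my own assumptions, not values from the paper:

```python
# Pick the top 100 activation dimensions each by LASSO coefficient magnitude,
# mutual information, and |Pearson correlation| with the score z.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import Lasso

def select_features(H, z, k=100):
    # H: (n_samples, n_dims) hidden activations; z: (n_samples,) task scores
    by_lasso = np.argsort(-np.abs(Lasso(alpha=0.01).fit(H, z).coef_))[:k]
    by_mi = np.argsort(-mutual_info_regression(H, z))[:k]
    corr = np.array([abs(np.corrcoef(H[:, j], z)[0, 1]) for j in range(H.shape[1])])
    by_corr = np.argsort(-corr)[:k]
    return np.concatenate([by_lasso, by_mi, by_corr])  # 300 indices, overlaps possible

rng = np.random.default_rng(0)
H, z = rng.normal(size=(200, 500)), rng.uniform(size=200)
selected = H[:, select_features(H, z)]  # 300 screened activation features
```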
2. Greybox LLMs
- 20 entropy- and probability-related features computed from the output token distributions are used (see the table in the paper)
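The exact list of 20 features did not survive into these notes; a hedged sketch of features of this kind, an illustrative subset rather than the paper's exact list:

```python
# Summary statistics of the per-token output distributions: log-probabilities
# of the chosen tokens and entropies at each position.
import torch
import torch.nn.functional as F

def greybox_features(logits, token_ids):
    # logits: (seq_len, vocab) for the generated tokens; token_ids: (seq_len,)
    logprobs = F.log_softmax(logits, dim=-1)
    chosen = logprobs[torch.arange(len(token_ids)), token_ids]  # chosen-token log-probs
    entropy = -(logprobs.exp() * logprobs).sum(dim=-1)          # per-position entropy
    return torch.stack([
        chosen.mean(), chosen.min(), chosen.max(),     # likelihood statistics
        entropy.mean(), entropy.min(), entropy.max(),  # entropy statistics
    ])

# Example with dummy logits: a 5-token response over a 100-word vocabulary
feats = greybox_features(torch.randn(5, 100), torch.randint(100, (5,)))
```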
3. Blackbox LLMs
- Feed the prompt into the whitebox model to get the hidden-layer activations, and extract the features as before.
- In this paper, they treat one of Llama 7B and Gemma 7B as the black box, and use the other as the whitebox model for uncertainty estimation.
Numerical Results
3 tasks, each with its own scoring function:
- Q&A - ROUGE-1
- MCQ - Yes/No accuracy
- Translation - BLEU
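A sketch of how these task scores (the labels $z_i$) could be computed, assuming the rouge_score and nltk packages; the examples and scorer settings are illustrative assumptions:

```python
# Toy examples of the three scoring functions used as labels z_i.
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer

# Q&A: ROUGE-1 F1 between the generated answer and the reference
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
z_qa = scorer.score("Paris is the capital", "the capital is Paris")["rouge1"].fmeasure

# Translation: sentence-level BLEU (unigram weights for this short toy example)
z_mt = sentence_bleu([["the", "cat", "sits"]], ["a", "cat", "sits"], weights=(1, 0, 0, 0))

# MCQ: exact-match (Yes/No) accuracy on the chosen option
z_mcq = float("B" == "B")
```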
Results for Q&A and Translation: see the tables in the paper.
Choice of Layer for Hidden Activations
In the experiments, the middle-layer activations tend to perform better than the last-layer ones. This may come from the fact that the last layer focuses more on generating the next token than on summarizing information from the whole sentence, as discussed by Azaria and Mitchell (2023).