This new metric for evaluating question answering systems is called Semantic Answer Similarity (SAS). SAS measures the semantic similarity between two answer strings rather than just their lexical overlap, which makes it a closer approximation of human judgment than existing metrics such as Exact Match (EM) and F1. Under the hood, SAS uses a cross-encoder architecture that leverages a pre-trained semantic textual similarity model to score a pair of strings, returning a value between zero and one, where higher scores indicate greater semantic similarity.

To use SAS in Haystack, users initialize the SAS model together with the EvalAnswers() node and run the pipeline to evaluate their question answering system.

SAS is not without limitations: because it rewards semantic similarity rather than correctness, it can award a high score to an answer that is semantically close to the gold answer but factually wrong. Nevertheless, SAS gives a more faithful picture of how well a question answering system is doing than EM and F1 alone.
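To see why a purely lexical metric under-credits correct answers, consider a minimal sketch of SQuAD-style token-level F1 (a standard formulation, not necessarily Haystack's exact implementation). Two semantically equivalent answers with no shared tokens score zero, whereas a SAS-style semantic model would rate them as highly similar:

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style token-level F1 between two answer strings."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Count tokens appearing in both answers (multiset intersection).
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Semantically equivalent answers, yet zero lexical overlap:
print(token_f1("the US capital", "Washington, D.C."))  # 0.0
```

This is the failure mode SAS addresses: the cross-encoder scores the pair as a whole instead of matching tokens, so paraphrased but correct answers are no longer penalized.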