Word Error Rate (WER) is a widely used metric for evaluating Automatic Speech Recognition (ASR) systems, but it has several flaws: it fails to reflect that some mistakes matter far more than others, and it is sensitive to formatting differences. WER is computed by aligning the reference and recognized transcripts using Levenshtein distance, counting the substitutions, insertions, and deletions in that alignment, and dividing by the number of words in the reference. Because punctuation, capitalization, and variations in writing style all register as errors, the score can diverge sharply from how a human would judge the transcript. These limitations make WER hard to rely on as an evaluation tool, especially when comparing ASR systems across different vendors. In recent work, Speechmatics is exploring novel metrics that use large language models to score transcripts in line with human judgment, offering a more meaningful way to evaluate ASR systems.
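To make the calculation concrete, here is a minimal sketch of the standard word-level Levenshtein approach in Python. It is not Speechmatics' implementation, and the function name and example strings are illustrative only; real evaluation pipelines typically normalize text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()

    # d[i][j] = minimum number of edits needed to turn the first i
    # reference words into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost,  # substitution (or exact match)
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Formatting alone inflates the score: both transcripts say the same thing,
# but case, punctuation, and number formatting all count as word errors.
print(word_error_rate("the meeting is at ten a m okay",
                      "The meeting is at 10 AM, OK"))
```

In this toy example the hypothesis is a perfectly acceptable transcript of the reference, yet the naive calculation reports a WER of 5/8 ≈ 0.63, driven entirely by capitalization, punctuation, and written-number differences rather than by any recognition mistake.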