Modern Question Answering Systems Explained

Company

deepset

Date Published

Nov. 28, 2019

Author

Branden Chan

Word count

1422

Language

English

Hacker News points

None

URL

www.deepset.ai/blog/modern-question-answering-systems-explained

Summary

A modern question answering system works in several stages. During preprocessing, documents are split into passages, and both questions and passages are tokenized. The input sequence is then passed through a language model to generate contextualized word vectors, which are used to predict the start and end indices of the answer span. An aggregation layer then chooses the highest-scoring positive-answer span across all passages as the final prediction. Finally, the system formats the prediction by extracting the answer string and its character index from the original document. To improve performance, multiprocessing is incorporated in the data processing pipeline, reducing training time from 20 minutes to under 2 hours on a single machine. The open-source Haystack framework provides a developer-friendly way to build scalable QA systems for large documents.
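
The span prediction and aggregation steps described in the summary can be illustrated with a minimal sketch. It is not Haystack's API: the function names `best_span` and `aggregate`, the `max_answer_len` parameter, and the toy logits are all hypothetical. It also assumes a common convention in which a span pointing at the first token (index 0) means "no answer in this passage".

```python
from typing import List, Optional, Tuple

# (score, start_index, end_index) for one passage
Span = Tuple[float, int, int]


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Span:
    """Return the highest-scoring (start, end) pair with start <= end."""
    best: Span = (float("-inf"), 0, 0)
    for s, s_score in enumerate(start_logits):
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best[0]:
                best = (score, s, e)
    return best


def aggregate(passage_spans: List[Span]) -> Optional[Tuple[int, Span]]:
    """Pick the best positive-answer span across all passages.

    Assumption: a span of (0, 0) means "no answer" for that passage.
    Returns (passage_index, span), or None if no passage has an answer.
    """
    candidates = [(i, sp) for i, sp in enumerate(passage_spans)
                  if (sp[1], sp[2]) != (0, 0)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1][0])


# Toy per-token logits for two passages (made-up numbers).
passage_logits = [
    ([5.0, 0.1, 0.2, 0.1], [5.0, 0.1, 0.3, 0.2]),  # peaks at token 0: no answer
    ([1.0, 0.2, 4.0, 0.1], [1.0, 0.1, 0.5, 3.5]),  # answer at tokens 2..3
]
spans = [best_span(s, e) for s, e in passage_logits]
result = aggregate(spans)
print(result)
```

In this toy input the first passage votes for "no answer", so the aggregation step falls through to the second passage. A real system would then map the winning token indices back to character offsets in the original document.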