Company
Date Published
Author
Fabian Grewing, Joshua Christl and Aleks Matiiasevych
Word count
2757
Language
English
Hacker News points
None

Summary

DeepL's work on improving document translation centers on the difficulty of recreating translated PDFs, a format that is hard to process but essential for most users. The team developed a new quality metric, the Average Bounding Box Overlap Ratio, to assess and improve how accurately a translated document preserves the original layout, addressing shortcomings of pixel-based comparisons. They selected specific OCR and DOCX libraries for accurate text extraction and reintegration, and designed an algorithm that handles language expansion and contraction while keeping the original layout intact. The algorithm applies a hierarchy of constraints that prioritizes text positioning on the page and consistent font sizes. By iterating on and optimizing these processes, DeepL aims to keep delivering high-quality document translations across languages and formats.
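The summary names the Average Bounding Box Overlap Ratio but does not give its formula. As a minimal sketch only, the code below assumes that each text block in the original has exactly one counterpart in the translated document, and that the per-block overlap is intersection over union. Both assumptions are illustrative, and the `Box` class and function names are hypothetical rather than DeepL's definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate or inverted boxes count as zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (assumed per-box overlap measure)."""
    inter_w = min(a.x1, b.x1) - max(a.x0, b.x0)
    inter_h = min(a.y1, b.y1) - max(a.y0, b.y0)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def average_overlap_ratio(original: Sequence[Box], translated: Sequence[Box]) -> float:
    """Mean overlap across boxes paired by index (the pairing is an assumption)."""
    if len(original) != len(translated):
        raise ValueError("each original box needs exactly one translated counterpart")
    if not original:
        return 1.0  # nothing to compare, so the layout is trivially preserved
    total = sum(overlap_ratio(o, t) for o, t in zip(original, translated))
    return total / len(original)


if __name__ == "__main__":
    source_boxes = [Box(0, 0, 10, 10), Box(0, 20, 10, 30)]
    # The first block grew sideways after translation; the second is unchanged.
    target_boxes = [Box(5, 0, 15, 10), Box(0, 20, 10, 30)]
    print(round(average_overlap_ratio(source_boxes, target_boxes), 3))
```

Unlike a pixel-based comparison, a box-level score of this kind depends only on where text blocks land, so changed glyphs inside a well-placed block do not lower it.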