Document inlining: Crossing the modality gap with Compound AI

Post Details

Company

Fireworks AI

Date Published

Oct. 6, 2025

Author

-

Word Count

1,685

Language

English

Hacker News Points

-

Source URL

fireworks.ai/blog/document-inlining-launch

Summary

Fireworks has introduced Document Inlining, a system designed to address the challenges of processing multimedia content by converting various digital asset formats, such as PDFs and images, into text that Large Language Models (LLMs) can easily process. This solution aims to overcome the limitations of Vision Language Models (VLMs) that often struggle with non-textual data, resulting in reduced reasoning capabilities and increased costs. Document Inlining automates the transformation of documents into a structured text format, enabling LLMs to process and reason with this data effectively. By using a specialized parsing service, it handles complex document structures like tables and charts, enhancing the quality of results and improving processing speed through parallel transcription. Fireworks' approach allows for flexible input types, improved quality through specialized components, and ultra-simple usage compatible with the OpenAI API. The system has been shown to deliver superior performance compared to other models and promises to extend its capabilities to include audio inlining and long document searches in the future.