Supercharging Schema-On-Read: Logs in Object Storage Don’t Need a Data Catalog
Blog post from Imply
Machine data architectures are changing as organizations move logs and other machine data to object stores such as AWS S3, driven by growing telemetry volumes and rising costs. Modern lakehouse platforms make this possible by separating storage from compute, which lets teams retain more data at lower cost than traditional observability/SIEM infrastructure. The main drawback of this architecture is that it typically requires structured logs and predefined schemas, which slows the rapid, exploratory investigation that security incidents demand. Where traditional schema-on-write pipelines require schemas and structure to be defined before data is ingested, Lumi's Loglake feature uses schema-on-read, so unstructured logs in object storage can be queried directly. Teams can then reconstruct context dynamically and optimize performance without waiting on schema definitions, making operational and security investigations faster and more efficient. As the industry shifts toward separating storage and compute, the goal is to make large volumes of machine data instantly searchable and understandable without unnecessary preprocessing, which is the problem Lumi and Loglake aim to address.
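To make the schema-on-read idea concrete, here is a minimal Python sketch. It is not Lumi's or Loglake's implementation or API. The sample log lines, the `query` helper, and the `FAILED_LOGIN` pattern are all hypothetical, and an in-memory list stands in for objects that would normally be read from a store such as S3. The point is only that the raw lines are kept untouched and the field structure is supplied when the question is asked.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Stand-in for raw, unstructured log lines kept as-is in object storage.
# These sample lines are invented for illustration.
RAW_LOGS = [
    "2024-05-01T12:00:01Z sshd[811]: Failed password for root from 10.0.0.7 port 52211",
    "2024-05-01T12:00:03Z nginx: GET /health 200 3ms",
    "2024-05-01T12:00:09Z sshd[811]: Failed password for admin from 10.0.0.7 port 52214",
    "2024-05-01T12:01:15Z sshd[902]: Accepted password for alice from 10.0.0.9 port 40022",
]


def query(lines: Iterable[str], pattern: str) -> Iterator[dict]:
    """Apply a schema (named regex groups) at read time.

    Lines that do not match the pattern are skipped rather than rejected,
    so no schema has to exist before the data is stored.
    """
    compiled = re.compile(pattern)
    for line in lines:
        match = compiled.search(line)
        if match:
            yield match.groupdict()


# Hypothetical schema, defined only when an investigator needs it.
FAILED_LOGIN = (
    r"^(?P<ts>\S+) sshd\[\d+\]: "
    r"Failed password for (?P<user>\S+) from (?P<src_ip>\S+)"
)

failed = list(query(RAW_LOGS, FAILED_LOGIN))

# Aggregate over the fields extracted at read time.
failures_by_ip = Counter(row["src_ip"] for row in failed)
```

A different question, such as which users logged in successfully, would only need a different pattern passed to `query` over the same raw lines. In a schema-on-write pipeline, the equivalent change would usually mean updating the schema or parsing rules and re-ingesting the data.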