Document Processing and Elasticsearch

Post Details

Company

Elastic

Date Published

Nov. 12, 2014

Author

Njal Karevoll

Word Count

1,344

Source URL

www.elastic.co/blog/found-document-processing

Summary

Document processing means transforming incoming data before it is indexed in Elasticsearch, for example by tagging documents, rewriting fields, or calculating attributes dynamically. There are several ways to do this: the transform field in mappings, custom plugins, or an external pipeline built on tools such as Logstash or a message queue such as RabbitMQ. Small transformations can be handled inside Elasticsearch with the transform field or a custom plugin, but both run synchronously as part of indexing and compete with the cluster for resources. For more complex processing that has to scale, an external system is more flexible: it can process documents asynchronously and integrate with tools like Hadoop, Spark, or Docker containers. Decoupling document processing from Elasticsearch in this way allows resources to be allocated more efficiently and makes updates to the processing logic easier to manage, at the cost of a more sophisticated setup.
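As a rough illustration of the decoupled approach, the sketch below processes documents outside Elasticsearch and then serializes them into the newline-delimited format accepted by the bulk API. It is a minimal sketch under stated assumptions, not the setup described in the post: the in-process `queue.Queue` stands in for an external broker such as RabbitMQ, and the `process`, `to_bulk_lines`, and `drain` helpers, the `body`, `tags`, and `word_count` fields, and the `logs` index name are all hypothetical.

```python
import json
import queue


def process(doc):
    """Return a transformed copy of doc: one computed attribute plus a tag."""
    out = dict(doc)
    body = out.get("body", "")
    # Dynamically calculated attribute (hypothetical field name).
    out["word_count"] = len(body.split())
    # Tagging based on content (hypothetical rule).
    tags = list(out.get("tags", []))
    if "error" in body.lower() and "error" not in tags:
        tags.append("error")
    out["tags"] = tags
    return out


def to_bulk_lines(docs, index):
    """Serialize docs as a bulk payload: an action line, then a source line."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def drain(pending, index, batch_size=100):
    """Process up to batch_size queued documents and return one bulk payload.

    Returns an empty string when nothing is waiting in the queue.
    """
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(process(pending.get_nowait()))
        except queue.Empty:
            break
    return to_bulk_lines(batch, index) if batch else ""


if __name__ == "__main__":
    # Stand-in for an external message queue (assumption for illustration).
    pending = queue.Queue()
    pending.put({"id": "1", "body": "Disk error on node A"})
    pending.put({"id": "2", "body": "all good", "tags": ["info"]})
    print(drain(pending, "logs"), end="")
```

Because the worker only reads from a queue and emits a payload, it can run on separate machines and be redeployed without touching the cluster. Sending the payload to Elasticsearch is deliberately left out of the sketch.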