Smarter Machine Learning Job Placement in Elasticsearch
Blog post from Elastic
In this blog post, David Roberts describes how machine learning job placement in Elasticsearch was improved in version 6.1. Previously, ML jobs were assigned to nodes based purely on the number of jobs already running on each node, with no regard for actual resource usage. This could lead to uneven load and, in the worst case, memory exhaustion on a node.

As of 6.1, job allocation is based on estimated memory usage instead, so jobs are distributed across the cluster according to the resources they are expected to consume. The new xpack.ml.max_machine_memory_percent setting caps the share of a machine's memory that ML jobs may use and can be updated dynamically; a job is only opened on a node if doing so keeps total estimated ML memory within that limit.

Version 6.1 also lowers the default model_memory_limit for new jobs from 4GB to 1GB, encouraging users to specify a limit that matches what each job is actually expected to need. Because memory is now accounted for explicitly, the default maximum number of open jobs per node rises from 10 to 20, which makes running many small jobs on a single node considerably more efficient, provided the memory constraints are respected.

One caveat: in mixed-version clusters, for example during a rolling upgrade, allocation may temporarily fall back to the previous count-based logic to ensure stability.
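As an illustration of how these settings are used (the values here are examples, not recommendations), the memory cap is a dynamic cluster setting, while a job's memory limit lives in its analysis_limits at job-creation time; the endpoints below follow the 6.1-era X-Pack ML API, and the job name is made up:

```shell
# Cap ML memory usage at 30% of each machine's memory (dynamic cluster setting).
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "xpack.ml.max_machine_memory_percent": 30
  }
}'

# Create a job with an explicit model_memory_limit sized for its workload.
curl -X PUT "localhost:9200/_xpack/ml/anomaly_detectors/example_job" -H 'Content-Type: application/json' -d'
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "timestamp" },
  "analysis_limits": { "model_memory_limit": "512mb" }
}'
```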
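The placement decision itself can be sketched roughly as follows. This is a simplified illustration of the idea, not Elastic's actual implementation; the function name and its signature are hypothetical:

```python
def node_can_accept(machine_memory_bytes, assigned_job_bytes, new_job_bytes,
                    max_machine_memory_percent=30):
    """Hypothetical sketch of the 6.1-style placement check: a job is only
    opened on a node if the estimated memory of all ML jobs on that node
    stays within the configured percentage of the machine's memory."""
    budget = machine_memory_bytes * max_machine_memory_percent // 100
    return sum(assigned_job_bytes) + new_job_bytes <= budget

# A 64GB node with a 30% cap has roughly a 19.2GB ML memory budget.
GB = 1024 ** 3
print(node_can_accept(64 * GB, [10 * GB, 5 * GB], 4 * GB))  # 19GB fits: True
print(node_can_accept(64 * GB, [10 * GB, 5 * GB], 8 * GB))  # 23GB exceeds budget: False
```

In a cluster, a check like this would run per candidate node, and a job with no eligible node simply waits until capacity frees up, rather than being opened and risking memory exhaustion.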