Company:
Date Published:
Author: Saffi Hartal
Word count: 1710
Language: English
Hacker News points: None

Summary

The blog post examines the challenges of running an ETL application inside a Docker container, focusing on the memory caching that builds up when the application writes large amounts of data. What initially looked like a problem in the application's source code was traced back to how memory is managed for the container: by default, Docker imposes no memory limit, so the kernel's filesystem cache can grow unchecked, potentially exhausting memory and crashing the system. The post walks through several mitigation attempts, including explicitly clearing the memory caches and experimenting with file management strategies such as writing many smaller files or reusing a single file across batches of work. The most effective solution turned out to be reusing the same file name for every batch, which lets the kernel reclaim the cached filesystem buffers from the previous batch and eliminates the memory bloat. The post also highlights how unpredictable script behavior can be in a Docker environment, where results vary with factors such as cluster type and timing, and it underscores the importance of understanding the infrastructure beneath Docker's abstractions.
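For illustration, here is a minimal Python sketch of the batch-writing pattern the summary describes: every batch is written to the same file name, with an additional hint to the kernel that the cached pages can be dropped. The file path, batch sizes, and the posix_fadvise hint are assumptions added for this example, not code from the original post.

    import os

    BATCH_FILE = "/tmp/etl_batch.dat"    # hypothetical fixed name, reused for every batch
    NUM_BATCHES = 10
    BATCH_SIZE = 64 * 1024 * 1024        # 64 MiB of synthetic data per batch

    def write_batch(data, path):
        # Reuse the same path for every batch; opening with "wb" truncates the old contents.
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # flush dirty pages so the kernel can evict them
            # Best-effort hint (Linux): drop this file's pages from the page cache.
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)

    for _ in range(NUM_BATCHES):
        payload = os.urandom(BATCH_SIZE)   # stand-in for real ETL output
        write_batch(payload, BATCH_FILE)   # same name each time, so old cache entries can be reclaimed
        # ... consume or upload the batch here before the next iteration overwrites it ...

Since Docker applies no memory limit by default, an explicit cap (for example, docker run --memory) can also bound how much the container is allowed to consume.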