A research paper introduces a method for extending the context length of BERT, a popular Transformer-based model in natural language processing, by combining it with the Recurrent Memory Transformer (RMT), which adds token-based memory storage and segment-level recurrence. This approach enables BERT to handle inputs of up to 2 million tokens, surpassing the maximum input size previously reported for Transformer models, while keeping the memory footprint of the base model at 3.6 GB. The recurrent memory improves long-term dependency handling in natural language tasks and supports large-scale context processing, making the method suitable for memory-intensive applications. It also allows both local and global information to be stored and processed, and lets information flow between segments of the input sequence; because each segment has a fixed length, computation scales linearly with input length for any model size. The method additionally reduces the number of floating-point operations (FLOPs) by up to 295 times for multi-segment sequences compared with processing the full sequence at once, showing its efficiency and its potential for improving Transformer models in natural language understanding and generation tasks.
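
To make the segment-level recurrence concrete, the following is a minimal sketch of the idea in PyTorch. A generic Transformer encoder stands in for BERT, and identifiers such as `RecurrentMemorySketch`, `num_memory_tokens`, and `segment_len` are illustrative assumptions rather than the paper's actual implementation or hyperparameters: a long input is split into fixed-length segments, learned memory tokens are prepended to each segment, and the memory states produced by one segment are carried over as the memory input of the next.

```python
# Sketch of RMT-style segment-level recurrence with memory tokens.
# The encoder is a stand-in for BERT; sizes and names are illustrative.
import torch
import torch.nn as nn


class RecurrentMemorySketch(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768, num_memory_tokens=10,
                 segment_len=512, num_layers=2, nhead=12):
        super().__init__()
        self.segment_len = segment_len
        self.num_memory_tokens = num_memory_tokens
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learned embeddings that initialise the memory for the first segment.
        self.memory_init = nn.Parameter(torch.randn(num_memory_tokens, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, input_ids):
        # input_ids: (batch, total_len) with total_len typically >> segment_len
        batch = input_ids.size(0)
        memory = self.memory_init.unsqueeze(0).expand(batch, -1, -1)
        segment_outputs = []
        # Process the long input one fixed-length segment at a time,
        # so per-segment cost is constant and total cost grows linearly.
        for start in range(0, input_ids.size(1), self.segment_len):
            tokens = self.embed(input_ids[:, start:start + self.segment_len])
            # Prepend memory tokens so attention inside the segment can read
            # and update the state carried over from previous segments.
            hidden = self.encoder(torch.cat([memory, tokens], dim=1))
            # Updated memory states are passed on to the next segment.
            memory = hidden[:, :self.num_memory_tokens, :]
            segment_outputs.append(hidden[:, self.num_memory_tokens:, :])
        return torch.cat(segment_outputs, dim=1), memory


# Usage: a 4096-token input processed as eight 512-token segments.
model = RecurrentMemorySketch()
ids = torch.randint(0, 30522, (1, 4096))
outputs, final_memory = model(ids)
print(outputs.shape, final_memory.shape)  # (1, 4096, 768) and (1, 10, 768)
```

Because the backbone only ever sees `segment_len + num_memory_tokens` tokens at a time, the quadratic attention cost is bounded per segment, which is what allows the total input length to grow far beyond the backbone's native context window.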