LLMs and the Emerging ML Tech Stack
Blog post from Unstructured
Recent advancements in natural language processing (NLP) have led to a shift from traditional tech stacks focused on tasks like text classification and Named Entity Recognition to a new architecture optimized for Large Language Models (LLMs). The older stack, which relied heavily on knowledge graphs and custom-built machine learning pipelines, faced challenges such as slow deployment and high costs.

The emerging LLM tech stack, by contrast, is designed to streamline development by using off-the-shelf LLM endpoints, reducing the time and expense required to build NLP applications. Key components of this new stack include a data preprocessing pipeline, an embeddings endpoint paired with a vector store, LLM endpoints, and LLM programming frameworks such as LangChain, which facilitate application development by integrating components like embedding models and document loaders.

These innovations enable more efficient data processing and retrieval, real-time applications, and potential improvements in fine-tuning and transfer learning, while also highlighting ongoing exploration into indexing data and combining embeddings for richer LLM interactions.
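The flow described above (preprocess documents, embed them into a vector store, retrieve the most relevant context, and assemble a prompt for an LLM endpoint) can be sketched in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words "embedding" and the in-memory `VectorStore` class are stand-ins for a real embeddings endpoint and vector database, and all names here are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag-of-words counts. In the real stack,
    # this call would hit an embeddings endpoint instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    # Stand-in for a vector database: stores (vector, text) pairs and
    # returns the top-k documents most similar to a query.
    def __init__(self):
        self.items = []

    def add(self, text: str):
        self.items.append((embed(text), text))

    def query(self, text: str, k: int = 1):
        q = embed(text)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [t for _, t in ranked[:k]]

# Preprocessing would normally chunk and clean raw documents; here we
# add two already-clean snippets directly.
store = VectorStore()
store.add("LLM endpoints expose hosted large language models over an API.")
store.add("Knowledge graphs power entity linking in the older NLP stack.")

# Retrieve context and assemble a prompt for an LLM endpoint.
context = store.query("How do applications call large language models?", k=1)
prompt = f"Context: {context[0]}\nQuestion: How do applications call LLMs?"
```

In a real deployment the prompt would then be sent to an LLM endpoint; frameworks like LangChain wrap exactly this pattern of document loading, embedding, retrieval, and prompt construction.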