PureML: automated data cleanup and refactoring
Blog post from LlamaIndex
PureML, built by a team at the Agentic RAG-A-THON, is a proof of concept that deploys AI agents to automate and streamline data cleaning for machine learning, with the aim of reducing costs and improving model accuracy. With a particular focus on automotive applications, PureML tackles three main use cases: context-aware null handling, intelligent feature creation, and data consolidation. It integrates a Retrieval-Augmented Generation (RAG) system backed by generative AI and OpenAI's GPT-4 to improve data accuracy and enrich datasets, for example by automatically identifying and adding a vehicle's country of manufacture. The solution uses tools such as LlamaParse, for parsing source documents ahead of retrieval, and Reflex, for the user interface, and it earned recognition for its innovative use of technology. Some planned integrations, such as VESSL and Arize Phoenix, were not included in the initial demo, but the team remains dedicated to exploring additional use cases and welcomes interest from potential collaborators and investors.
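To make the three use cases concrete, here is a minimal Python sketch of what each step could look like on a toy vehicle table. This is not PureML's code: the function names, the field names (`vin`, `make`, `model`, `fuel`), and the `TOY_KNOWLEDGE_BASE` lookup are all assumptions made for illustration. In particular, the dictionary lookup only stands in for the RAG and GPT-4 retrieval step, and the null-filling rule (most common value among rows that share a context field) is one simple heuristic rather than the agent-driven approach the post describes.

```python
from collections import Counter, defaultdict

# Toy data standing in for retrieved knowledge (brand home country only).
# Hypothetical: the real system is described as querying a RAG index.
TOY_KNOWLEDGE_BASE = {"Toyota": "Japan", "Ford": "United States", "BMW": "Germany"}


def consolidate(rows, key):
    """Data consolidation: merge rows that share `key`,
    keeping the first non-null value seen for each field."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {})
        for field, value in row.items():
            if target.get(field) is None:
                target[field] = value
    return list(merged.values())


def fill_nulls_by_context(rows, target, context):
    """Context-aware null handling: fill a missing `target` with the most
    common value among rows that have the same `context` value."""
    seen = defaultdict(Counter)
    for row in rows:
        if row.get(target) is not None:
            seen[row.get(context)][row[target]] += 1
    filled = []
    for row in rows:
        new_row = dict(row)
        candidates = seen.get(new_row.get(context))
        if new_row.get(target) is None and candidates:
            new_row[target] = candidates.most_common(1)[0][0]
        filled.append(new_row)  # rows with no usable context stay null
    return filled


def add_country(rows, lookup=TOY_KNOWLEDGE_BASE):
    """Feature creation: add a `country` column from the lookup.
    Unknown makes get None rather than a guessed value."""
    return [{**row, "country": lookup.get(row.get("make"))} for row in rows]
```

Chaining the steps as `add_country(fill_nulls_by_context(consolidate(rows, "vin"), "fuel", "model"))` gives one cleaned row per vehicle. Values that cannot be inferred from context are left as nulls, which keeps the sketch from inventing data.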