Removing PII Data from OpenAI API Calls with Presidio and FastAPI
Blog post from Ploomber
Presidio, an open-source framework from Microsoft, can anonymize personally identifiable information (PII) before it is sent to OpenAI's API. Integrating it directly, however, means configuring Presidio and checking compliance separately in every application that calls the API. To address this operational overhead, the post proposes a more centralized solution: a reverse proxy built with FastAPI and Presidio. The proxy intercepts all OpenAI API calls, anonymizes sensitive data with Presidio, and forwards the sanitized requests to OpenAI, so privacy protection is applied consistently across an organization without changes to individual applications.
The setup has limitations. It currently removes PII only for the /chat/completions/ endpoint, and Presidio's default settings may not match every company's data policy, which can lead to information loss. For more customization and auditing, the post suggests an enterprise-grade solution that provides unique identifiers for redacted data and a user interface for customizing PII rules. The proxy is deployed on Ploomber Cloud, which keeps applications integrated with OpenAI's API while maintaining data privacy.
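The proxy flow described above (intercept, anonymize, forward) can be sketched roughly as follows. This is a minimal illustration and not the post's actual code: the function names `sanitize_chat_payload`, `build_presidio_anonymizer`, and `create_app`, the route path, the upstream URL, and the use of `httpx` for forwarding are all assumptions. It uses Presidio's default recognizers, handles only plain-string message contents, and does not support streaming responses.

```python
from typing import Any, Callable


def sanitize_chat_payload(
    payload: dict[str, Any], anonymize: Callable[[str], str]
) -> dict[str, Any]:
    """Return a copy of a chat request body with string message contents anonymized."""
    messages = []
    for message in payload.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            # Copy the message so the caller's payload is left untouched.
            message = {**message, "content": anonymize(content)}
        # Non-string contents (e.g. lists of parts) pass through unchanged here.
        messages.append(message)
    return {**payload, "messages": messages}


def build_presidio_anonymizer(language: str = "en") -> Callable[[str], str]:
    """Build a text anonymizer from Presidio's default analyzer and anonymizer."""
    # Imported lazily because Presidio loads an NLP model on startup.
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()

    def anonymize(text: str) -> str:
        results = analyzer.analyze(text=text, language=language)
        return anonymizer.anonymize(text=text, analyzer_results=results).text

    return anonymize


def create_app(upstream: str = "https://api.openai.com/v1"):
    """Create the FastAPI reverse proxy (hypothetical layout, assumed route path)."""
    import httpx
    from fastapi import FastAPI, Request
    from fastapi.responses import JSONResponse

    app = FastAPI()
    anonymize = build_presidio_anonymizer()

    @app.post("/chat/completions")
    async def chat_completions(request: Request):
        # Sanitize the incoming body before it leaves the organization.
        payload = sanitize_chat_payload(await request.json(), anonymize)
        # Pass the caller's API key through to the upstream API.
        headers = {"Authorization": request.headers.get("authorization", "")}
        async with httpx.AsyncClient(timeout=60.0) as client:
            upstream_response = await client.post(
                f"{upstream}/chat/completions", json=payload, headers=headers
            )
        return JSONResponse(
            upstream_response.json(), status_code=upstream_response.status_code
        )

    return app
```

If this were saved as a module named `proxy` (a hypothetical name), it could be served with `uvicorn proxy:create_app --factory`, and client applications would point their OpenAI base URL at the proxy instead of at OpenAI directly. Keeping the payload rewriting in a plain function that takes the anonymizer as an argument makes it straightforward to swap Presidio's defaults for rules that match a specific data policy.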