Removing PII Data from OpenAI API Calls with Presidio and FastAPI

Post Details

Company

Ploomber

Date Published

Jan. 23, 2025

Author

-

Word Count

1,497

Language

English

Source URL

ploomber.io/blog/pii-openai

Summary

Presidio, an open-source framework from Microsoft, can anonymize personally identifiable information (PII) before it reaches OpenAI's API. However, integrating it into every application that calls the API requires careful configuration and a separate compliance check for each one. To centralize this work, the post proposes a reverse proxy built with FastAPI and Presidio. The proxy intercepts OpenAI API calls, anonymizes sensitive data with Presidio, and forwards the sanitized requests to OpenAI. This gives an organization consistent privacy protection without reimplementing PII handling in each application. The setup has two limitations. It currently removes PII only for the /chat/completions/ endpoint. Also, Presidio's default settings may not match every company's data policy and can remove information a request actually needs. For finer control, the post suggests an enterprise-grade solution with customization and auditing features, including unique identifiers for redacted data and a user interface for customizing PII rules. The post deploys the proxy on Ploomber Cloud, so applications keep reaching OpenAI's API through it while their data stays private.
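The proxy architecture described above can be sketched roughly as follows. This is a minimal, hypothetical sketch and not the post's actual code. It assumes `httpx` for forwarding, the upstream URL `https://api.openai.com/v1/chat/completions`, and Presidio's default English recognizers. The helper names `sanitize_messages` and `create_app` are made up for illustration. The message-scrubbing step is kept as a pure function, so it can be checked without loading Presidio's spaCy model.

```python
"""Minimal PII-scrubbing reverse proxy sketch for OpenAI chat completions.

Hypothetical example, not the post's code: httpx, the upstream URL and the
helper names sanitize_messages/create_app are assumptions.
"""
from typing import Any, Callable, Dict, List, Optional

# Assumed upstream endpoint; the proxy exposes the same path locally.
OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def sanitize_messages(
    payload: Dict[str, Any], anonymize: Callable[[str], str]
) -> Dict[str, Any]:
    """Return a copy of a chat-completions payload whose message text has
    been passed through `anonymize`; the input payload is not mutated."""

    def clean(content: Any) -> Any:
        if isinstance(content, str):
            return anonymize(content)
        if isinstance(content, list):  # multi-part content: scrub text parts only
            return [
                {**part, "text": anonymize(part["text"])}
                if isinstance(part, dict) and part.get("type") == "text"
                else part
                for part in content
            ]
        return content  # e.g. None on assistant tool-call messages

    cleaned = dict(payload)  # shallow copy; "messages" is rebuilt below
    cleaned["messages"] = [
        {**msg, "content": clean(msg["content"])} if "content" in msg else msg
        for msg in payload.get("messages", [])
    ]
    return cleaned


def create_app(entities: Optional[List[str]] = None):
    """Build the FastAPI proxy. Heavy dependencies are imported here so the
    pure helper above can be used (and tested) without them."""
    import httpx
    from fastapi import FastAPI, Request
    from fastapi.responses import JSONResponse
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    analyzer = AnalyzerEngine()  # default NLP engine loads a spaCy English model
    anonymizer = AnonymizerEngine()

    def presidio_anonymize(text: str) -> str:
        # entities=None runs every default recognizer; pass a list such as
        # ["EMAIL_ADDRESS", "PHONE_NUMBER"] to follow a narrower data policy.
        results = analyzer.analyze(text=text, entities=entities, language="en")
        # The default operator replaces each detected span with "<ENTITY_TYPE>".
        return anonymizer.anonymize(text=text, analyzer_results=results).text

    app = FastAPI()

    @app.post("/chat/completions")
    async def chat_completions(request: Request):
        payload = sanitize_messages(await request.json(), presidio_anonymize)
        # Forward the caller's own API key unchanged.
        headers = {"Authorization": request.headers.get("authorization", "")}
        async with httpx.AsyncClient(timeout=60.0) as client:
            upstream = await client.post(OPENAI_CHAT_URL, json=payload, headers=headers)
        return JSONResponse(content=upstream.json(), status_code=upstream.status_code)

    return app
```

Assuming the file is saved as `proxy.py`, it could be started with `uvicorn proxy:create_app --factory`. Clients would then set the OpenAI Python SDK's `base_url` to the proxy (for example `http://localhost:8000`) rather than adding Presidio to their own code. Like the setup summarized above, this sketch only covers the chat completions endpoint. It also does not handle streaming responses (`"stream": true`).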