/plushcap/analysis/gretel-ai/teaching-large-language-models-to-zip-their-lips

Teaching large language models to zip their lips

What's this blog post about?

Gretel introduces Reinforcement Learning from Privacy Feedback (RLPF), a novel approach for reducing the likelihood that language models leak private information. RLPF combines reinforcement learning with quantitative privacy measures, using them as reward signals to improve language model capabilities in a multi-task fashion. Preliminary results show that RLPF can improve both privacy preservation and summarization quality, outperforming some existing models. The method may also have applications in reducing biased or discriminatory language in AI systems.
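
To make the reward idea in the summary concrete, here is a minimal, hypothetical Python sketch of how a privacy signal could be folded into an RL reward for a summarization model. This is not Gretel's implementation: the regex-based PII detector, the quality_score input, and the leak_weight parameter are all assumptions standing in for the real privacy measures and task metrics an RLPF-style training loop (for example, PPO fine-tuning) would use.

import re

# Hypothetical illustration only: names, patterns, and weights are assumptions,
# not Gretel's RLPF code.

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),              # SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),        # email addresses
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),  # phone-like numbers
]

def privacy_penalty(text: str) -> int:
    """Count apparent PII matches in generated text."""
    return sum(len(p.findall(text)) for p in PII_PATTERNS)

def reward(summary: str, quality_score: float, leak_weight: float = 1.0) -> float:
    """Combine a task-quality score with a privacy penalty into the scalar
    reward that a reinforcement learning loop would maximize."""
    return quality_score - leak_weight * privacy_penalty(summary)

if __name__ == "__main__":
    clean = "The patient improved after treatment."
    leaky = "John Doe (john@example.com, SSN 123-45-6789) improved after treatment."
    print(reward(clean, quality_score=0.9))  # higher reward: no detected PII
    print(reward(leaky, quality_score=0.9))  # lower reward: privacy penalty applied

In practice the penalty term would come from a proper privacy or PII scorer and the quality term from a summarization metric or reward model; the sketch only shows the shape of the combined, multi-objective reward that the reinforcement learning step then optimizes.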

Company
Gretel.ai

Date published
March 15, 2023

Author(s)
Andrew Carr

Word count
1195

Hacker News points
1

Language
English
