Wu X, Li J, Xu M, et al. DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models. EMNLP, 2023. (Top-tier international conference)
Abstract: DEPN (Detect and Edit Privacy Neurons) is a framework for detecting and editing privacy neurons in pretrained language models. A privacy neuron detector locates the neurons associated with private information, and the framework then sets the activations of those neurons to zero so that the model no longer recalls that information. DEPN also introduces a privacy neuron aggregator, which lets the model forget private information in batches rather than one item at a time. Experimental results show that DEPN significantly reduces the risk of private data exposure without compromising model performance. The study also examines the relationship between model memorization and privacy neurons, and demonstrates the robustness of the DEPN approach.
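The editing step described in the abstract (setting the activations of detected privacy neurons to zero) can be illustrated with a toy sketch. This is not the authors' implementation: the tiny feed-forward block, the function names `ffn_forward` and `aggregate_privacy_neurons`, and the frequency-threshold aggregation rule are all assumptions made for illustration. The detector itself is not implemented here; the per-sample neuron indices are simply taken as given.

```python
from collections import Counter


def ffn_forward(x, w_in, w_out, zeroed=frozenset()):
    """Toy feed-forward block: ReLU hidden layer followed by a linear output.

    Hidden neurons whose index is in `zeroed` have their activation set to 0,
    mimicking the editing step described in the abstract.
    """
    hidden = []
    for j, weights in enumerate(w_in):
        act = max(0.0, sum(w * xi for w, xi in zip(weights, x)))
        hidden.append(0.0 if j in zeroed else act)
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def aggregate_privacy_neurons(per_sample_neurons, min_fraction=0.5):
    """Hypothetical aggregation rule (an assumption, not taken from the paper):
    keep neurons flagged for at least `min_fraction` of the private samples."""
    counts = Counter(n for neurons in per_sample_neurons for n in set(neurons))
    threshold = min_fraction * len(per_sample_neurons)
    return frozenset(n for n, c in counts.items() if c >= threshold)


# Toy weights: 2 inputs, 3 hidden neurons, 1 output.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 1.0, 1.0]]

# Neuron indices that an (unimplemented) detector is assumed to have
# flagged for three private samples.
flagged = [[2], [2, 0], [1, 2]]

privacy_neurons = aggregate_privacy_neurons(flagged)
original = ffn_forward(x, w_in, w_out)
edited = ffn_forward(x, w_in, w_out, zeroed=privacy_neurons)

print(sorted(privacy_neurons))  # [2]
print(original, edited)         # [6.0] [3.0]
```

In a real pretrained model the same idea would be applied to hidden activations inside the network rather than to a hand-written layer. The sketch only shows that zeroing a small set of neurons changes the output contribution tied to them while leaving the remaining neurons untouched.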