FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model
Published in NeurIPS, 2025
Recommended citation: Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, and Xiaowei Huang. (2025). "FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model." NeurIPS.
