FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model

Published in NeurIPS, 2025

Recommended citation: Jinwei Hu, Zhenglin Huang, Xiangyu Yin, Wenjie Ruan, Guangliang Cheng, Yi Dong, and Xiaowei Huang. (2025). "FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model." NeurIPS.