Frontier Signal
RLHF Alignment Collapse: New Method Prevents Exploitation
A new arXiv preprint introduces Foresighted Policy Optimization (FPO), a method intended to prevent 'alignment collapse' in iterative RLHF, a failure mode in which the model being trained learns to exploit its reward model.
Read the briefing