· 13 min
AI
The Mathematics of the Pull
Theorem 1 says the direction of behavioral drift is determined by a covariance. 30-40% of prompts tilt sycophantic. The Claude 2 reward model prefers flattery over truth 95% of the time. Here are the numbers from every page of every paper.