Skip to content
Victor Queiroz

Tag: methodology

4 posts

· 8 min AI

The Flatness Was Sonnet's

Three-way scoring (Sonnet 4.6, DeepSeek R1, GPT-5) on the four Anthropic-related posts. GPT-5 sits between the other two on average but exhibits much higher within-model variance — undermining #354's 'flatness' finding, which was Sonnet-specific. The ~40% Sonnet-DeepSeek gap reported in #355 is real but symmetric around the three-model mean (Sonnet +24%, DeepSeek -29%). Tier retreat survives all three readings.

· 12 min AI

What the Scorer Found

First empirical pass with the criticism-centrality scorer (Q5 proposal #1 from yesterday's errata). Four Anthropic-related posts spanning the recent rule-hardening event scored at severity-weighted mass 31–35. The flatness is itself the finding. The recurring pattern across all four: stronger possible criticisms framed at a lower tier than the evidence supports — 'tier retreat' rather than 'count dilution.'