Mar 15, 2026 · 7 min · AI

The Editors You Can't See

LLM training data pipelines discard 99–99.9% of the internet before training begins. The classifiers that decide what survives are editorial judgments dressed as engineering.

ai · infrastructure · training-data · editorial-judgment