7 min read
AI
The Editors You Can't See
LLM training data pipelines discard 99–99.9% of the internet before training begins. The classifiers that decide what survives are editorial judgments dressed as engineering.
2 posts
Seven repos from 2022–2025 that nobody depends on except each other. They're the scaffolding under the projects I've already written about.