31st August 2022 - Link Blog
Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator. Andy Baio and I collaborated on an investigation into the training set used for Stable Diffusion. I built a Datasette instance with 12m image records sourced from the LAION-Aesthetics v2 6+ aesthetic score data used as part of the training process, and built a tool so people could run searches and explore the data. Andy did some extensive analysis of things like the domains scraped for the images and names of celebrities and artists represented in the data. His write-up here explains our project in detail and some of the patterns we’ve uncovered so far.
Recent articles
- Porting the Moebius 0.2B image inpainting model to run in the browser with Claude Code - 22nd June 2026
- sqlite-utils 4.0rc1 adds migrations and nested transactions - 21st June 2026
- Datasette Apps: Host custom HTML applications inside Datasette - 18th June 2026