Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yup, as soon as data is the bottleneck and not compute, diffusion wins. Tested following the Chinchilla scaling strategy from 7M to 2.5B parameters.

https://arxiv.org/abs/2507.15857



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: