
Kind of. You could theoretically use LoRA for this, but it probably wouldn't have enough capacity to be a proper substitute for the attention mechanism. Instead, a full MLP is trained as the input chunks get processed.
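To make the idea concrete, here's a minimal numpy sketch of training a small MLP on each incoming chunk so its weights, rather than an attention KV cache, store the chunk's key→value associations. Everything here (shapes, learning rate, the two-layer tanh MLP) is an illustrative assumption, not the actual method being discussed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 8, 32

# Hypothetical "fast-weight" MLP: its parameters play the role of the KV cache.
W1 = rng.normal(0, 0.1, (d, hidden))
W2 = rng.normal(0, 0.1, (hidden, d))

def mlp(x, W1, W2):
    h = np.tanh(x @ W1)
    return h @ W2, h

def train_on_chunk(keys, vals, W1, W2, lr=0.05, steps=50):
    # Plain gradient descent on MSE so the MLP memorizes the chunk's
    # key -> value pairs (standing in for what attention would retrieve).
    for _ in range(steps):
        out, h = mlp(keys, W1, W2)
        err = out - vals                      # (n, d)
        gW2 = h.T @ err / len(keys)
        gh = (err @ W2.T) * (1 - h**2)        # backprop through tanh
        gW1 = keys.T @ gh / len(keys)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

# Stream two chunks; the MLP absorbs each one instead of caching it.
for _ in range(2):
    keys = rng.normal(size=(16, d))
    vals = rng.normal(size=(16, d))
    loss_before = np.mean((mlp(keys, W1, W2)[0] - vals) ** 2)
    W1, W2 = train_on_chunk(keys, vals, W1, W2)
    loss_after = np.mean((mlp(keys, W1, W2)[0] - vals) ** 2)

print(loss_before, loss_after)
```

A LoRA-style low-rank update would constrain `W1`/`W2` to a small rank-r delta, which is the capacity limitation the comment is pointing at.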

