Hacker News

Yes, but MQA is limited to 6B, while the "other" larger non-RNN models in the table (Llama-2) are not trained on the same dataset, and Hawk and Griffin are 7B. Sorry, I don't understand your point.


The point is that it also beats the baseline at every other size (1B and 3B), so it wouldn't be surprising to see it beat a 7B transformer just as it beats the 6B model. Note 2 on page 5 probably explains why the sizes differ.



