Last time I tried QwQ or QvQ (a couple of days ago), their CoT was so long that ... | Hacker News


Alifatisk 11 months ago | on: QwQ-32B: Embracing the Power of Reinforcement Lear...

Last time I tried QwQ or QvQ (a couple of days ago), their CoT was so long that it almost seemed endless, like it was stuck in a loop. I hope this doesn't have the same issue.

lelag 11 months ago

If that's an issue, there's a workaround: use structured generation to force the model to output a </think> token once its reasoning passes some length threshold, which then forces it to write the final answer.

It's a method used to control thinking token generation showcased in this paper: https://arxiv.org/abs/2501.19393
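
To make the idea concrete, here is a minimal sketch of that kind of cap on thinking tokens. It is not the paper's code and not any library's API: `generate_with_budget`, `looping_model`, the `<eos>` marker, and the token-by-token callback are all hypothetical stand-ins, and the `</think>` delimiter is assumed. A real setup would more likely apply the override inside the sampler or a structured-generation library.

```python
from typing import Callable, List

END_THINK = "</think>"  # assumed end-of-thinking delimiter
EOS = "<eos>"           # hypothetical end-of-sequence marker


def generate_with_budget(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_think_tokens: int,
    max_answer_tokens: int = 64,
) -> List[str]:
    """Decode one token at a time, forcing END_THINK once the budget is spent."""
    out = list(prompt)
    thinking = True
    think_count = 0
    answer_count = 0
    while True:
        if thinking and think_count >= max_think_tokens:
            tok = END_THINK  # override the model: close the reasoning block
        else:
            tok = next_token(out)
        out.append(tok)
        if thinking:
            if tok == END_THINK:
                thinking = False  # switch to the final-answer phase
            else:
                think_count += 1
        else:
            if tok == EOS:
                break
            answer_count += 1
            if answer_count >= max_answer_tokens:
                break  # safety cap on the answer as well
    return out


def looping_model(tokens: List[str]) -> str:
    """Toy stand-in for an LLM: thinks forever, answers once thinking is closed."""
    if END_THINK not in tokens:
        return "hmm"
    after = len(tokens) - tokens.index(END_THINK) - 1
    return "42" if after == 0 else EOS


result = generate_with_budget(looping_model, ["<think>"], max_think_tokens=3)
print(result)
```

With the toy model above, which would otherwise never stop "thinking", the loop cuts the reasoning off after three tokens and the model goes on to answer.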

pomtato 11 months ago

It's not a bug, it's a feature!
