
Is this the best way to run your own models these days?


It's the easiest to set up, but you can get 2x-6x faster inference with TGI or vLLM depending on the scenario.


vLLM isn't even hard to set up!
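
For the skeptics, here's a minimal sketch of how little it takes with vLLM's offline Python API (the model ID below is just an example, swap in whatever you actually run):

    # pip install vllm
    from vllm import LLM, SamplingParams

    # Load a model and generate; the model ID is an assumption, any
    # Hugging Face model you have the hardware for works the same way.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
    params = SamplingParams(temperature=0.7, max_tokens=256)

    outputs = llm.generate(["Explain why paged attention speeds up inference."], params)
    print(outputs[0].outputs[0].text)

Serving is one command on top of that (python -m vllm.entrypoints.openai.api_server --model <model>), and you get an OpenAI-compatible endpoint.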

I find it so funny that HN is sitting in the stone age with LLM inference.

Meanwhile I'm here with SillyTavern hooked up to my own vLLM server, getting crazy fast performance on my models and a complete suite of tools for working with LLMs.
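
For anyone who wants to replicate this: SillyTavern just points at vLLM's OpenAI-compatible endpoint. A rough sketch of the same call from Python, assuming a local server on the default port 8000 (the model name and URL are assumptions for a default local setup):

    from openai import OpenAI

    # What the frontend does under the hood: POST chat completions to the
    # local vLLM OpenAI-compatible server. A local server ignores the api_key.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # whatever model the server loaded
        messages=[{"role": "user", "content": "Say hi in one sentence."}],
        max_tokens=64,
    )
    print(resp.choices[0].message.content)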

Most folks on here have never heard of SillyTavern, oobabooga, or any of the other LLM UI/UX projects (LM Studio). It's insane that no one like Adobe has built a pro/prosumer UI for LLMs yet.



