The Numbers: Benchmarking My LLM Gateway on an H100
A couple of weeks ago I wrote about rewriting my LLM gateway to bring it from MVP to production. The architectural claims were multi-tenancy, hybrid inference, and sub-5ms overhead. So I benchmarked it.
Jun 21, 2026 · 5 min read


