#continuous-batching

The Numbers: Benchmarking My LLM Gateway on an H100

A couple of weeks ago I wrote about rewriting my LLM gateway to take it from MVP to production. The architectural claims were multi-tenancy, hybrid inference, and sub-5ms overhead. So I benchmarked it.

Jun 21, 2026 · 5 min read
