Running Costs

Research papers, repositories, and articles about running costs

The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines

This paper systematically measures how hyperparameters such as batch size and maximum token count affect the throughput of two LLM inference libraries, vLLM and HuggingFace pipelines. It reports that tuning these hyperparameters can improve throughput over default settings by double-digit percentages, even when the hardware stays the same.

Matias Martinez