Performance

Research papers, repositories, and articles about performance

Mugi: Value Level Parallelism For Efficient LLMs

Mugi generalizes value-level parallelism, a hardware technique, to full LLM workloads. It accelerates core math operations and softmax, yielding more than 2x throughput and substantial energy savings on custom chips.

Daniel Price, Prabhu Vellaisamy