Be ahead of the curve
Research papers, repositories, and articles about performance
Mugi generalizes hardware techniques based on value-level parallelism to full LLM workloads. It accelerates core math operations and softmax, delivering more than 2x throughput and substantial energy savings on custom chips.